# Calculations

## In This Vignette

• Calculations and Calculation Groups
• Calculation Types
• Basic Calculation Methods
• Method 1: Summarising Values
• Method 2: Deriving values from other summarised values
• Moving the calculations in the pivot table
• Displaying calculations on rows
• Specifying calculations before row/column data groups
• Filtering data as part of a calculation
• % of Totals, Cumulative Sums, Running Differences, Rolling Averages, Ratios/Multiples
• Formatting calculated values
• Empty cells
• Using multiple data frames in the same pivot table
• Method 3: Custom calculation functions
• Method 4: Showing a value (no calculation)
• Performance considerations

## Calculations and Calculation Groups

Calculations define how (typically numerical) data is to be summarised/aggregated. Common ways of summarising data include sum, avg (mean, median, …), max, min, etc.

Within a pivottabler pivot table, calculations always belong to a Calculation Group. Calculation groups allow calculations to be defined that refer to other calculations.

Every pivot table always has a default calculation group (called default). This is sufficient for most scenarios and calculations groups are not referred to again in this vignette. All the calculations defined in this vignette sit in the default calculation group.

Creating additional calculation groups is only necessary for some advanced pivot table layouts - see the Irregular Layout vignette for an example.

## Calculation Types

The pivottabler package supports several different ways of calculating the values to display in the cells of the pivot table:

1. Summarise values (dplyr summarise or data.table calculate expression)
2. Deriving values from other summarised values
3. Custom calculation functions
4. Show a value (no calculation)

Calculations are added to the pivot table using the defineCalculation() function.

The first two of the methods listed above are described in more detail in the following section. The third and fourth methods are less commonly needed and are described at the end of this vignette.

## Basic Calculation Methods

### Calculation Method 1: Summarising Values

The most common way to calculate the pivot table is to provide an expression that describes how to summarise the data, e.g. defining a calculation that counts the number of trains:

library(pivottabler)
pt <- PivotTable$new() pt$addData(bhmtrains)
pt$addColumnDataGroups("TrainCategory") pt$addRowDataGroups("TOC")
pt$defineCalculation(calculationName="TotalTrains", summariseExpression="n()") pt$renderPivot()

The pivottabler package uses the dplyr package by default. The summariseExpression is therefore an expression that can be used with the dplyr summarise() function. The following shows several different example expressions:

library(pivottabler)
library(dplyr)
library(lubridate)

trains <- mutate(bhmtrains,
ArrivalDelta=difftime(ActualArrival, GbttArrival, units="mins"),
ArrivalDelay=ifelse(ArrivalDelta<0, 0, ArrivalDelta))

# create the pivot table
pt <- PivotTable$new() pt$addData(trains)
pt$addRowDataGroups("TOC", totalCaption="All TOCs") pt$defineCalculation(calculationName="TotalTrains", caption="Total Trains",
summariseExpression="n()")
pt$defineCalculation(calculationName="MinArrivalDelay", caption="Min Arr. Delay", summariseExpression="min(ArrivalDelay, na.rm=TRUE)") pt$defineCalculation(calculationName="MaxArrivalDelay", caption="Max Arr. Delay",
summariseExpression="max(ArrivalDelay, na.rm=TRUE)")
pt$defineCalculation(calculationName="MeanArrivalDelay", caption="Mean Arr. Delay", summariseExpression="mean(ArrivalDelay, na.rm=TRUE)", format="%.1f") pt$defineCalculation(calculationName="MedianArrivalDelay", caption="Median Arr. Delay",
summariseExpression="median(ArrivalDelay, na.rm=TRUE)")
pt$defineCalculation(calculationName="IQRArrivalDelay", caption="Delay IQR", summariseExpression="IQR(ArrivalDelay, na.rm=TRUE)") pt$defineCalculation(calculationName="SDArrivalDelay", caption="Delay Std. Dev.",
summariseExpression="sd(ArrivalDelay, na.rm=TRUE)", format="%.1f")
pt$renderPivot() The data.table package can also be used - see the Performance for details. Note that the “count” summarise expression is specified as “.N” when using data.table, not as “n()”. Also, when using data.table, be aware of some strange behaviour that leads to incorrect values in the pivot table when aggregating over columns that are also used in either row/column headings. Again, see the Performance for details. ### Calculation Method 2: Deriving values from other summarised values Calculations can be defined that refer to other calculations, by following these steps: 1. Specifying type="calculation", 2. Specifying the names of the calculations which this calculation is based on in the basedOn argument. 3. Specifying an expression for this calculation in the calculationExpression argument. The values of the base calculations are accessed as elements of the values list. For example, calculating the percentage of trains with an arrival delay of greater than five minutes: library(pivottabler) library(dplyr) library(lubridate) # derive some additional data trains <- mutate(bhmtrains, ArrivalDelta=difftime(ActualArrival, GbttArrival, units="mins"), ArrivalDelay=ifelse(ArrivalDelta<0, 0, ArrivalDelta), DelayedByMoreThan5Minutes=ifelse(ArrivalDelay>5,1,0)) # create the pivot table pt <- PivotTable$new()
pt$addData(trains) pt$addRowDataGroups("TOC", totalCaption="All TOCs")
pt$defineCalculation(calculationName="DelayedTrains", caption="Trains Arr. 5+ Mins Late", summariseExpression="sum(DelayedByMoreThan5Minutes, na.rm=TRUE)") pt$defineCalculation(calculationName="TotalTrains", caption="Total Trains",
summariseExpression="n()")
pt$defineCalculation(calculationName="DelayedPercent", caption="% Trains Arr. 5+ Mins Late", type="calculation", basedOn=c("DelayedTrains", "TotalTrains"), format="%.1f %%", calculationExpression="values$DelayedTrains/values$TotalTrains*100") pt$renderPivot()

The base calculations can be hidden by specifying visible=FALSE, e.g. to look at how the percentage of trains more than five minutes late varied by month and train operating company:

library(pivottabler)
library(dplyr)
library(lubridate)

trains <- mutate(bhmtrains,
GbttDateTime=if_else(is.na(GbttArrival), GbttDeparture, GbttArrival),
GbttMonth=make_date(year=year(GbttDateTime), month=month(GbttDateTime), day=1),
ArrivalDelta=difftime(ActualArrival, GbttArrival, units="mins"),
ArrivalDelay=ifelse(ArrivalDelta<0, 0, ArrivalDelta),
DelayedByMoreThan5Minutes=ifelse(ArrivalDelay>5,1,0))

# create the pivot table
pt <- PivotTable$new() pt$addData(trains)
pt$addColumnDataGroups("GbttMonth", dataFormat=list(format="%B %Y")) pt$addRowDataGroups("TOC", totalCaption="All TOCs")
pt$defineCalculation(calculationName="DelayedTrains", visible=FALSE, summariseExpression="sum(DelayedByMoreThan5Minutes, na.rm=TRUE)") pt$defineCalculation(calculationName="TotalTrains", visible=FALSE,
summariseExpression="n()")
pt$defineCalculation(calculationName="DelayedPercent", caption="% Trains Arr. 5+ Mins Late", type="calculation", basedOn=c("DelayedTrains", "TotalTrains"), format="%.1f %%", calculationExpression="values$DelayedTrains/values$TotalTrains*100") pt$renderPivot()

## Moving the calculations in the pivot table

By default the calculation headings (if visible) are placed in the pivot table as column headings, underneath any column data groups.

library(pivottabler)
pt <- PivotTable$new() pt$addData(bhmtrains)
pt$addColumnDataGroups("TrainCategory") pt$addRowDataGroups("TOC")
pt$defineCalculation(calculationName="NumberOfTrains", caption="Number of Trains", summariseExpression="n()") pt$defineCalculation(calculationName="MaximumSpeedMPH", caption="Maximum Speed (MPH)",
summariseExpression="max(SchedSpeedMPH, na.rm=TRUE)")
pt$renderPivot() The location of these headings can be moved as illustrated below. ### Displaying calculations on rows Calculations can be swapped onto the rows using the addRowCalculationGroups() method: library(pivottabler) pt <- PivotTable$new()
pt$addData(bhmtrains) pt$addColumnDataGroups("TrainCategory")
pt$addRowDataGroups("TOC") pt$defineCalculation(calculationName="NumberOfTrains", caption="Number of Trains",
summariseExpression="n()")
pt$defineCalculation(calculationName="MaximumSpeedMPH", caption="Maximum Speed (MPH)", summariseExpression="max(SchedSpeedMPH, na.rm=TRUE)") pt$addRowCalculationGroups()
pt$renderPivot() ## Specifying calculations before row/column data groups In the examples above, the row/column groups were specified first and then the calculations. It is equally possible to specify the calculations first. The calculation names then form the first level of the row/column groups, e.g. library(pivottabler) pt <- PivotTable$new()
pt$addData(bhmtrains) pt$defineCalculation(calculationName="NumberOfTrains", caption="Number of Trains",
summariseExpression="n()")
pt$defineCalculation(calculationName="MaximumSpeedMPH", caption="Maximum Speed (MPH)", summariseExpression="max(SchedSpeedMPH, na.rm=TRUE)") pt$addColumnCalculationGroups()
pt$addColumnDataGroups("TrainCategory") pt$addRowDataGroups("TOC")
pt$renderPivot() Similarly, on rows: library(pivottabler) pt <- PivotTable$new()
pt$addData(bhmtrains) pt$addColumnDataGroups("TrainCategory")
pt$defineCalculation(calculationName="NumberOfTrains", caption="Number of Trains", summariseExpression="n()") pt$defineCalculation(calculationName="MaximumSpeedMPH", caption="Maximum Speed (MPH)",
summariseExpression="max(SchedSpeedMPH, na.rm=TRUE)")
pt$addRowCalculationGroups() pt$addColumnDataGroups("PowerType")
pt$addRowDataGroups("TOC") pt$renderPivot()

## Filtering data as part of a calculation

It is possible to specify additional/different filter criteria as part of a calculation definition. This additional criteria can either be in the form of a PivotFilters object or a PivotFiltersOverrides object.

For example, to calculate the percentage of trains of each category that each train company operated at weekends, for each cell in the pivot table:

1. Additionally filter the data to trains operating at weekends and count the number of trains at weekends,
2. Count the total number of trains (with no additional filter, i.e. irrespective of weekday or weekend),
3. Calculate the percentage.
library(dplyr)
library(lubridate)
library(pivottabler)

# get the date of each train and whether that date is a weekday or weekend
trains <- bhmtrains %>%
mutate(GbttDateTime=if_else(is.na(GbttArrival), GbttDeparture, GbttArrival),
DayNumber=wday(GbttDateTime),
WeekdayOrWeekend=ifelse(DayNumber %in% c(1,7), "Weekend", "Weekday"))

# render the pivot table
pt <- PivotTable$new() pt$addData(trains)
pt$addColumnDataGroups("TrainCategory") pt$addRowDataGroups("TOC")
weekendFilter <- PivotFilters$new(pt, variableName="WeekdayOrWeekend", values="Weekend") pt$defineCalculation(calculationName="WeekendTrains", summariseExpression="n()",
filters=weekendFilter, visible=FALSE)
pt$defineCalculation(calculationName="TotalTrains", summariseExpression="n()", visible=FALSE) pt$defineCalculation(calculationName="WeekendTrainsPercentage",
type="calculation", basedOn=c("WeekendTrains", "TotalTrains"),
format="%.1f %%",
calculationExpression="values$WeekendTrains/values$TotalTrains*100")
pt$renderPivot() See the Appendix: Calculations vignette for many more examples. ## % of Totals, Cumulative Sums, Running Differences, Rolling Averages, Ratios/Multiples Changing the filter criteria used in a cell enables many additional types of calculations. See the Appendix: Calculations vignette for examples. Example: % of row total: library(pivottabler) pt <- PivotTable$new()
pt$addData(bhmtrains) pt$addColumnDataGroups("TrainCategory")
pt$addRowDataGroups("TOC") pt$defineCalculation(calculationName="CountTrains", summariseExpression="n()",
caption="Count", visible=FALSE)
filterOverrides <- PivotFilterOverrides$new(pt, keepOnlyFiltersFor="TOC") pt$defineCalculation(calculationName="TOCTotalTrains", filters=filterOverrides,
summariseExpression="n()", caption="TOC Total", visible=FALSE)
pt$defineCalculation(calculationName="PercentageOfTOCTrains", type="calculation", basedOn=c("CountTrains", "TOCTotalTrains"), calculationExpression="values$CountTrains/values$TOCTotalTrains*100", format="%.1f %%", caption="% of TOC") pt$renderPivot()

See the Appendix: Calculations vignette for an explanation of the above.

## Formatting calculated values

Each cell in a pivot table has two values: A rawValue that is the result of the calculation in the cell. The rawValue is typically the same data type as the variable the calculation is based on, e.g. a sum() of numerical values will result in a numerical value. The formattedValue is the value that is displayed in the pivot table. The data type of the formattedValue is typically character.

The formatting of calculation results is specified by setting the format parameter when calling the defineCalculation function.

A number of different approaches to formatting are supported:

• If format is a text value, then pivottabler invokes base::sprintf() with the specified format.
• If format is a list, then pivottabler invokes base::format(), where the elements in the list become arguments in the function call.
• If format is an R function, then this is invoked for each value.
• If format is not specified, then base::as.character() is invoked to provide a default formatted value.

The above are the same approaches used when formatting data groups. Examples of the first two approaches above can be found in previous examples in this vignette. An example of the third approach and more details can be found in the Data Groups vignette.

Some formatting behaviour depends on the data type of the value being formatted. For details see the Appendix: Details vignette.

## Empty cells

By default, where no data exists (for a particular combination of row and column headers) pivottabler will leave the pivot table cell empty. Sometimes it is desirable to display a value in these cells. This can be specified in two ways in the defineCalculation() function - either by specifying a value for either the noDataValue or noDataCaption arguments. The differences between these two options are as follows:

Comparison noDataValue argument noDataCaption argument
Allowed Data Type(s) integer or numeric character
format argument applies Yes (will be formatted) No (will be displayed as-is)
Will be used in other calculations Yes No

If the requirement is only to display a different value when there is no data, then noDataCaption is the right choice. Both approaches are demonstrated below, where the Virgin Trains, Ordinary Passenger cell has no data, so the empty cell value/caption is shown.

### noDataValue Example

library(pivottabler)
pt <- PivotTable$new() pt$addData(bhmtrains)
pt$addColumnDataGroups("TrainCategory") pt$addRowDataGroups("TOC")
pt$defineCalculation(calculationName="TotalTrains", summariseExpression="n()", noDataValue=0) pt$renderPivot()

### noDataCaption Example

library(pivottabler)
pt <- PivotTable$new() pt$addData(bhmtrains)
pt$addColumnDataGroups("TrainCategory") pt$addRowDataGroups("TOC")
pt$defineCalculation(calculationName="TotalTrains", summariseExpression="n()", noDataCaption="-") pt$renderPivot()

## Using multiple data frames in the same pivot table

A pivot table can display data from multiple data frames. The following summarises the possible functionality:

• A pivot table can contain multiple calculations. Many of the examples above in this vignette illustrate this.
• Each calculation must be based on one data frame. This is specified as part of the arguments to the defineCalculation() function1.
• Defining multiple calculations allows data from multiple data frames to be displayed in the pivot table.

Important: When adding multiple data frames to a pivot table, the data frame columns used for the data groups (i.e. row/column headings) must be conformed, i.e.:

• The columns from the data frames placed on the row/column headings in the pivot table must be present in all of the data frames added to the pivot table.
• Those columns must have the same names in all of the data frames added to the pivot table.
• The data values used in those columns should be consistent (e.g. “England” must be “England” in all data frames added to the pivot table, not “England” in one data frame and “Eng”, “ENGLAND”, etc. in other data frames).

It is also worth noting that only the first data frame added to the pivot table is used when generating the row/column headings.

The example below illustrates using two data frames with a single pivot table:

library(pivottabler)
library(dplyr)

trains <- mutate(bhmtrains,
ArrivalDelta=difftime(ActualArrival, GbttArrival, units="mins"),
ArrivalDelay=ifelse(ArrivalDelta<0, 0, ArrivalDelta),
DelayedByMoreThan5Minutes=ifelse(ArrivalDelay>5,1,0)) %>%
select(TrainCategory, TOC, DelayedByMoreThan5Minutes)
# in this example, bhmtraindisruption is joined to bhmtrains
# so that the TrainCategory and TOC columns are present in both
# data frames added to the pivot table
cancellations <- bhmtraindisruption %>%
inner_join(bhmtrains, by="ServiceId") %>%
mutate(CancelledInBirmingham=ifelse(LastCancellationLocation=="BHM",1,0)) %>%
select(TrainCategory, TOC, CancelledInBirmingham)

# create the pivot table
pt <- PivotTable$new() pt$addData(trains, "trains")
pt$addData(cancellations, "cancellations") pt$addColumnDataGroups("TrainCategory")
pt$addRowDataGroups("TOC") pt$defineCalculation(calculationName="DelayedTrains", dataName="trains",
caption="Delayed",
summariseExpression="sum(DelayedByMoreThan5Minutes, na.rm=TRUE)")
pt$defineCalculation(calculationName="CancelledTrains", dataName="cancellations", caption="Cancelled", summariseExpression="sum(CancelledInBirmingham, na.rm=TRUE)") pt$renderPivot()

In the example above, the number of trains more than five minutes late is calculated from the trains data frame and the number of trains cancelled at Birmingham New Street is calculated from the cancellations data frame.

The following two methods of calculating cell values are more advanced and less commonly needed.

### Calculation Method 3: Custom calculation functions

A custom calculation function allows more complex calculation logic to be used. Such a function is invoked once for each cell in the body of the pivot table. Custom calculation functions always have the same arguments defined:

• pivotCalculator is a helper object that offers various methods to assist in performing calculations,
• netFilters contains the definitions of the filter criteria coming from the row and column headers in the pivot table,
• format provides the formatting definition - this is the same value specified in the defineCalculation() call,
• baseValues provides access to the results of other calculations in the calculation group,
• cell provides access to more details about the individual cell that is being calculated
• The cell argument is provided to support future scenarios so is not explained here.

For example, if we wish to examine the worst single day performance, we need to:

1. For each date, calculate the percentage of trains more than five minutes late,
2. Sort this list (of dates and percentages) into descending order (by percentage of trains more than five minutes late),
3. Display the top percentage value from this list.
library(pivottabler)
library(dplyr)
library(lubridate)

trains <- mutate(bhmtrains,
GbttDateTime=if_else(is.na(GbttArrival), GbttDeparture, GbttArrival),
GbttDate=make_date(year=year(GbttDateTime), month=month(GbttDateTime), day=day(GbttDateTime)),
GbttMonth=make_date(year=year(GbttDateTime), month=month(GbttDateTime), day=1),
ArrivalDelta=difftime(ActualArrival, GbttArrival, units="mins"),
ArrivalDelay=ifelse(ArrivalDelta<0, 0, ArrivalDelta),
DelayedByMoreThan5Minutes=ifelse(ArrivalDelay>5,1,0))

# custom calculation function
getWorstSingleDayPerformance <- function(pivotCalculator, netFilters, format, baseValues, cell) {
# get the data frame
trains <- pivotCalculator$getDataFrame("trains") # apply the TOC and month filters coming from the headers in the pivot table filteredTrains <- pivotCalculator$getFilteredDataFrame(trains, netFilters)
# calculate the percentage of trains more than five minutes late by date
dateSummary <- filteredTrains %>%
group_by(GbttDate) %>%
summarise(DelayedPercent = sum(DelayedByMoreThan5Minutes, na.rm=TRUE) / n() * 100) %>%
arrange(desc(DelayedPercent))
# top value
tv <- dateSummary$DelayedPercent[1] # build the return value value <- list() value$rawValue <- tv
value$formattedValue <- pivotCalculator$formatValue(tv, format=format)
return(value)
}

# create the pivot table
pt <- PivotTable$new() pt$addData(trains, "trains")
pt$addColumnDataGroups("GbttMonth", dataFormat=list(format="%B %Y")) pt$addRowDataGroups("TOC", totalCaption="All TOCs")
pt$defineCalculation(calculationName="WorstSingleDayDelay", format="%.1f %%", type="function", calculationFunction=getWorstSingleDayPerformance) pt$renderPivot()

The return value from the custom function must be a list containing the raw result value (i.e. unformatted, that is either integer or numeric data type) and a formatted value (that is character data type).

Using a custom calculation function also enables additional possibilities, e.g. including additional information in the formatted value, in this case the date of the worst single day performance (where the code changes compared to the example above are highlighted):

library(pivottabler)
library(dplyr)
library(lubridate)

trains <- mutate(bhmtrains,
GbttDateTime=if_else(is.na(GbttArrival), GbttDeparture, GbttArrival),
GbttDate=make_date(year=year(GbttDateTime), month=month(GbttDateTime), day=day(GbttDateTime)),
GbttMonth=make_date(year=year(GbttDateTime), month=month(GbttDateTime), day=1),
ArrivalDelta=difftime(ActualArrival, GbttArrival, units="mins"),
ArrivalDelay=ifelse(ArrivalDelta<0, 0, ArrivalDelta),
DelayedByMoreThan5Minutes=ifelse(ArrivalDelay>5,1,0))

# custom calculation function
getWorstSingleDayPerformance <- function(pivotCalculator, netFilters, format, baseValues, cell) {
# get the data frame
trains <- pivotCalculator$getDataFrame("trains") # apply the TOC and month filters coming from the headers in the pivot table filteredTrains <- pivotCalculator$getFilteredDataFrame(trains, netFilters)
# calculate the percentage of trains more than five minutes late by date
dateSummary <- filteredTrains %>%
group_by(GbttDate) %>%
summarise(DelayedPercent = sum(DelayedByMoreThan5Minutes, na.rm=TRUE) / n() * 100) %>%
arrange(desc(DelayedPercent))
# top value
tv <- dateSummary$DelayedPercent[1] date <- dateSummary$GbttDate[1]             #     <<  CODE CHANGE  <<
# build the return value
value <- list()
value$rawValue <- tv value$formattedValue <- paste0(format(      #     <<  CODE CHANGE (AND BELOW)  <<
date, format="%a %d"), ":  ", pivotCalculator$formatValue(tv, format=format)) return(value) } # create the pivot table pt <- PivotTable$new()
pt$addData(trains, "trains") pt$addColumnDataGroups("GbttMonth", dataFormat=list(format="%B %Y"))
pt$addRowDataGroups("TOC", totalCaption="All TOCs") pt$defineCalculation(calculationName="WorstSingleDayDelay", format="%.1f %%",
type="function", calculationFunction=getWorstSingleDayPerformance)
pt$renderPivot() Including two values in each cell somewhat reduces the readability however. ### Calculation Method 4: Showing a value (no calculation) With this approach, the pivot table performs little or no calculations. The values to display are predominantly calculated in R code before the pivot table is created. This pivot table is used primarily as a visualisation mechanism. Returning to the original simple example of the number of trains operated by each train operating company: library(pivottabler) pt <- PivotTable$new()
pt$addData(bhmtrains) pt$addColumnDataGroups("TrainCategory")
pt$addRowDataGroups("TOC") pt$defineCalculation(calculationName="TotalTrains", summariseExpression="n()")
pt$renderPivot() In the example above, pivottabler calculated the values in each pivot table cell. We can alternatively calculate the values explicitly in R code and instead just use the pivot table to display them: library(pivottabler) # perform the aggregation in R code explicitly trains <- bhmtrains %>% group_by(TrainCategory, TOC) %>% summarise(NumberOfTrains=n()) %>% ungroup() # a sample of the aggregated data head(trains) ## # A tibble: 6 x 3 ## TrainCategory TOC NumberOfTrains ## <fct> <fct> <int> ## 1 Express Passenger Arriva Trains Wales 3079 ## 2 Express Passenger CrossCountry 22865 ## 3 Express Passenger London Midland 14487 ## 4 Express Passenger Virgin Trains 8594 ## 5 Ordinary Passenger Arriva Trains Wales 830 ## 6 Ordinary Passenger CrossCountry 63 # display this pre-calculated data pt <- PivotTable$new()
pt$addData(trains) pt$addColumnDataGroups("TrainCategory")
pt$addRowDataGroups("TOC") pt$defineCalculation(calculationName="TotalTrains", type="value", valueName="NumberOfTrains")
pt$renderPivot() In the current version of pivottabler there is no way to explicitly pre-calculate the totals. Instead, two workarounds are possible. Either the totals can be hidden or a summarise expression can be specified to calculate the totals. Both of these examples are presented below. #### Hiding the totals library(pivottabler) # perform the aggregation in R code explicitly trains <- bhmtrains %>% group_by(TrainCategory, TOC) %>% summarise(NumberOfTrains=n()) %>% ungroup() # display this pre-calculated data pt <- PivotTable$new()
pt$addData(trains) pt$addColumnDataGroups("TrainCategory", addTotal=FALSE)   #  <<  *** CODE CHANGE ***  <<
pt$addRowDataGroups("TOC", addTotal=FALSE) # << *** CODE CHANGE *** << pt$defineCalculation(calculationName="TotalTrains", type="value", valueName="NumberOfTrains")
pt$renderPivot() #### Calculating the totals library(pivottabler) # perform the aggregation in R code explicitly trains <- bhmtrains %>% group_by(TrainCategory, TOC) %>% summarise(NumberOfTrains=n()) %>% ungroup() # display this pre-calculated data pt <- PivotTable$new()
pt$addData(trains) pt$addColumnDataGroups("TrainCategory")
pt$addRowDataGroups("TOC") pt$defineCalculation(calculationName="TotalTrains",  # <<  *** CODE CHANGE (AND BELOW) *** <<
type="value", valueName="NumberOfTrains",
summariseExpression="sum(NumberOfTrains)")
pt\$renderPivot()

## Performance considerations

The pivottabler package supports two different evaluation modes for computing cell values: batch and sequential. The batch evaluation mode offers much higher performance, especially for large pivot tables.

The pivottabler package is also able to use two different packages when carrying out summarising cell calculations (method 1 calculations as described above in this vignette): dplyr and data.table. data.table offers slightly higher performance when used with large data frames (over ten million rows) with batch evaluation mode, though has some quirks/odd behaviours when aggregating over columns that are also used in the row/column headings of the pivot table.

Please see the Performance for more information about pivottabler performance and data.table.

1. If the pivot table contains only one data frame, then specifying the data frame when calling defineCalculation() is not necessary.