Model income tax and project

Hugh Parsonage


The functions model_income_tax and project are the core of the grattan package. Grattan applies them to the ATO’s 2% sample files to produce costings of changes to tax policy. The functions are both \(X^n \to X^n\). That is, they take a sample file and return a mutated sample file.

With the mutated sample file, the costing for that particular tax year is the weighted sum of the difference between the new_tax and the baseline_tax columns. We can also use the mutated sample file to perform distributional analysis, such as the average change in tax by taxable income percentile.
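The arithmetic behind a costing can be sketched in base R. The data frame below is a made-up stand-in for a mutated sample file (the figures are illustrative only); the column names follow those used in this vignette.

```r
# Toy stand-in for a mutated sample file; all values are made up for illustration.
toy <- data.frame(
  Taxable_Income = c(20000, 45000, 90000, 200000),
  baseline_tax   = c(0, 6000, 21000, 65000),
  new_tax        = c(0, 5900, 21200, 66000),
  WEIGHT         = 50L
)

# Each row of a 2% sample stands for 50 taxpayers, so weight the change in tax.
toy$delta <- toy$new_tax - toy$baseline_tax
costing <- sum(toy$delta * toy$WEIGHT)

# Distributional view: average change in tax by taxable income group
# (two groups here; a real sample file would support percentiles).
toy$income_group <- cut(toy$Taxable_Income,
                        breaks = quantile(toy$Taxable_Income, c(0, 0.5, 1)),
                        include.lowest = TRUE,
                        labels = c("lower", "upper"))
avg_change <- tapply(toy$delta, toy$income_group, mean)
```

A positive costing here means the policy raises revenue relative to the baseline.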

Since the input data consists of tax returns and the grattan package does not purport to generate inferences about the wider Australian population, these functions cannot (directly) analyse the effect of policies on households or on the wider population. For example, policies affecting welfare payments, changes to the tax settings of businesses or super funds, or changes which would tax people who do not currently file tax returns are not amenable to the kind of analysis these functions perform.

How to use model_income_tax

model_income_tax takes a sample file and returns a sample file under the settings given by the function arguments.

To start, let’s load the (minimal) packages we need. We’ll use the synthetic 2015-16 sample file contained in the suggested package taxstats1516. See ?install_taxstats for installation instructions. For future years, use the latest sample file from the ATO.

library(grattan)     # model_income_tax, project
library(data.table)  # := and data.table methods
library(magrittr)    # %>% and %T>%
library(hutils)      # select_grep, mutate_ntile
library(dplyr)       # if_else


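# Use the actual sample file if you've got it.
# Assumption: taxstats1516 provides the synthetic file as sample_file_1516_synth.
s1516 <- data.table::as.data.table(taxstats1516::sample_file_1516_synth)
s1516[, WEIGHT := 50L]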

This function is purely cosmetic.

#' @return Number formatted as dollar e.g. 30e3 => $30,000
dollar <- function (x, digits = 0) {
  nsmall <- digits
  commaz <- format(abs(x), nsmall = nsmall, trim = TRUE, big.mark = ",", 
                   scientific = FALSE, digits = 1L)
  if_else(x < 0, 
          paste0("\U2212","$", commaz),
          paste0("$", commaz))
}

Every call to model_income_tax has two mandatory arguments: sample_file and baseline_fy. These define the baseline_tax column in the result. When any other argument is left as NULL, the new_tax column is calculated using the corresponding tax setting that applied in baseline_fy.

s1516 %>%
  model_income_tax(baseline_fy = "2015-16") %>%
  select_grep("tax$", "Taxable_Income") %>%  # just look at relevant cols
  head

Taxable_Income  baseline_tax   new_tax
         28849          2155   2155.29
        210436         72060  72060.64
         22285           426    426.15
         58461         11592  11592.96
             0             0      0.00
         20078             0      0.00

Note that by default new_tax is a double precision vector, not rounded. You can use return. = "sample_file.int" to return new_tax as an integer column instead.

With the use of a simple function to test equality, we can see that new_tax is just the same as baseline_tax, as expected.

is_all_equal <- function(x, y) {
  if (is.integer(x) && is.integer(y)) {
    all(x == y)
  } else {
    isTRUE(all.equal(x, y))
  }
}

s1516 %>%
  model_income_tax(baseline_fy = "2015-16", 
                   return. = "sample_file.int") %>%
  select_grep("tax$", "Taxable_Income") %T>%
  .[, stopifnot(is_all_equal(baseline_tax, new_tax))] %>%
  head

Taxable_Income  baseline_tax  new_tax
         28849          2155     2155
        210436         72060    72060
         22285           426      426
         58461         11592    11592
             0             0        0
         20078             0        0

The choice of rounded, unrounded, or truncated values may matter for some analyses. For instance, tax liabilities are calculated using whole dollar amounts, so a truncated value may be appropriate when the values of new_tax for each row need to be very precise. Unrounded values may be important to determine changes in marginal tax rates. Rounded values may be the most appropriate choice for costings.

Changing ordinary tax parameters

You can change how the ‘ordinary tax’ is calculated by changing the arguments ordinary_tax_thresholds and ordinary_tax_rates. To replicate the 2015-16 tax scales, pass the 2015-16 thresholds and marginal rates to these two arguments.

Note that the temporary budget repair levy is not included by default, so I simulated it by topping up the $180,000 marginal tax rate. This simulation is imperfect because the small business tax offset does not offset levies. As a result, baseline_tax and new_tax are slightly different in s1516_no_changes. This is not a problem for tax years including and beyond 2018-19.
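How a pair of threshold and rate vectors defines ordinary tax can be sketched in base R. This is a minimal illustration, not grattan's implementation. The function name ordinary_tax is hypothetical, and the vectors are the 2015-16 scales with the top rate raised from 45% to 47% to approximate the 2% temporary budget repair levy, as described above.

```r
# 2015-16 thresholds; the top rate is 0.45 plus 0.02 to approximate the levy.
thresholds <- c(0, 18200, 37000, 80000, 180000)
rates      <- c(0, 0.19, 0.325, 0.37, 0.47)

# Hypothetical helper: tax the slice of income inside each bracket at its rate.
ordinary_tax <- function(income, thresholds, rates) {
  upper <- c(thresholds[-1], Inf)  # upper edge of each bracket
  vapply(income, function(x) {
    sum(pmax(0, pmin(x, upper) - thresholds) * rates)
  }, numeric(1))
}

tax_at <- ordinary_tax(c(18200, 37000, 80000, 180000, 200000),
                       thresholds, rates)
```

Vectors like these are what one would pass as ordinary_tax_thresholds and ordinary_tax_rates. The result excludes the Medicare levy and offsets, so it will not match baseline_tax for a given taxable income.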

Changing Medicare levy parameters

The Medicare levy is more complex to calculate than ordinary income tax. Each category of taxpayer has a lower and an upper threshold, and the thresholds differ for families and SAPTO-eligible individuals. Even the simplest modification requires changes to multiple parameters. Warnings are emitted whenever parameters are not internally consistent.

Let’s try to increase the Medicare levy rate from 2% to 3%. Observe the warning messages.

## Warning: `medicare_levy_upper_threshold` was not specified, but its default value would be inconsistent with the parameters that were specified.
## Its value has been set to:
##  medicare_levy_upper_threshold = 30479
## Warning: `medicare_levy_upper_sapto_threshold` was not specified, but its default value would be inconsistent with the parameters that were specified.
## Its value has been set to:
##  medicare_levy_upper_sapto_threshold = 48197

Note that the warning message says the parameter has been changed for you. Even so, you should never tolerate the warning; instead, set the parameter explicitly to the suggested value (if you agree with the warning message’s advice).
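The consistency condition behind these warnings can be sketched in base R. Between the two thresholds the levy is a taper on income above the lower threshold, and at the upper threshold that taper should meet the full-rate levy. The function names are hypothetical, and the 2015-16 lower thresholds of $21,335 (singles) and $33,738 (SAPTO-eligible singles) and the 10% taper are assumptions that do not appear elsewhere in this vignette.

```r
# Hypothetical helper: nil below `lower`, then the lesser of the tapered
# amount and the full-rate amount.
medicare_levy <- function(income, lower, rate = 0.02, taper = 0.10) {
  pmax(0, pmin(rate * income, taper * (income - lower)))
}

# Income at which taper * (income - lower) equals rate * income.
implied_upper_threshold <- function(lower, rate, taper) {
  lower * taper / (taper - rate)
}

upper_single <- round(implied_upper_threshold(21335, rate = 0.03, taper = 0.10))
upper_sapto  <- round(implied_upper_threshold(33738, rate = 0.03, taper = 0.10))
```

Under these assumptions, rounding to the nearest dollar gives 30479 and 48197, the values suggested in the warnings above.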

Since there are many degrees of freedom, and since thresholds are generally the things that are actually contemplated when making changes, warnings will suggest changing thresholds over changes to the rate or taper if there is a conflict. Only when the thresholds have been manually selected and there is still a conflict is a change to the taper or rate suggested. For example, if we didn’t want to change the upper threshold, but keep it at its 2015-16 value of $26,670, we could insist:

## Warning: `medicare_levy_lower_threshold` was not specified, but its default value would be inconsistent with the parameters that were specified.
## Its value has been set to:
##  medicare_levy_lower_threshold = 18668

The warning still assumes the taper and rate are the same, but it can no longer suggest a change to the upper threshold (since we provided it), so it suggests a change to the lower threshold. Only once we exhaust the thresholds it can adjust does the warning message start to include changing the taper:

## Warning: `medicare_levy_taper` was not specified, but its default value would be inconsistent with the parameters that were specified.
## Its value has been set to:
##  medicare_levy_taper = 0.15
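The same consistency condition can be solved for the taper when both thresholds are fixed. This is a sketch only: implied_taper is a hypothetical helper, and the lower threshold of $21,335 is an assumed 2015-16 value.

```r
# Taper at which the tapered levy meets the full-rate levy at `upper`:
#   taper * (upper - lower) == rate * upper
implied_taper <- function(lower, upper, rate) {
  rate * upper / (upper - lower)
}

taper_3pc <- implied_taper(lower = 21335, upper = 26670, rate = 0.03)
round(taper_3pc, 2)
```

With a 3% rate and both thresholds held fixed, the implied taper is roughly 0.15, in line with the warning above.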

Changes to the Low Income Tax Offset

Here is a change to the LITO so that the maximum offset is $1,000, rather than $445, with the 1.5% taper left as-is. Then we print the revenue forgone.

## [1] "-$4 billion"
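The shape of the offset under each setting can be sketched in base R. The function lito below is a hypothetical helper, not grattan's implementation, and the $37,000 income at which the taper starts is an assumption taken from the 2015-16 rules.

```r
# Hypothetical helper: full offset up to `min_bracket`, then reduced by
# `taper` for each dollar above it, never falling below zero.
lito <- function(income, max_offset = 445, taper = 0.015, min_bracket = 37000) {
  pmax(0, pmin(max_offset, max_offset - taper * (income - min_bracket)))
}

incomes  <- c(30000, 50000, 70000)
current  <- lito(incomes)                     # 2015-16 settings
proposed <- lito(incomes, max_offset = 1000)  # larger maximum, same taper
```

Because the taper is unchanged, a larger maximum offset also extends the income range over which some offset is received, which is part of why the change is costly.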

How to use project

The function project takes a sample file and returns a sample file. The other mandatory argument is h, the (integer) number of years ahead of the sample file to project.

Thus, to get a forecast for the 2018-19 tax year:

s1819 <- project(s1516, h = 3L)

This uses the internal forecast methods. To specify particular forecast outcomes, you can use the wage.series and lf.series arguments.

Wage and labour series

To compare the tax collections under these different assumptions, one would use income_tax separately:

Currently there is no interface to using the upper or lower bounds of the labour force or wage price indices. If you wanted the 80% upper bound of the prediction interval for salary out to 2020-21, for instance, you would pass Sw_amt to excl_vars and manually inflate.

## [1] "$50,884"
## [1] "$51,648"
## [1] "$65,782"
## [1] "$66,544"

Combining the two

To cost a reduction in the capital gains tax discount from 50% to 25% over the four years from 2018-19, we would run

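cgt_25pc_fwd_estimates <- 
  lapply(yr2fy(2019:2022), function(fy) {
    s1516 %>%
      project_to(to_fy = fy) %>%
      # baseline_fy = fy is an assumption; the original argument was lost here
      model_income_tax(baseline_fy = fy,
                       cgt_discount_rate = 0.25) %>%
      .[, fy_year := fy]
  }) %>%
  rbindlist  # stack the four years into one data.table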

Note that this takes a few seconds, most of which is spent within project. We could improve the speed of this by caching the intermediate objects, either as objects in the environment or as files (say, .fst files). You should consider doing this when you find yourself running project many times – likely you are just repeating calculations.
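One way to cache intermediate objects in the environment is sketched below in base R. The names cache, cached, and slow_projection are hypothetical, and slow_projection is only a stand-in for an expensive call such as project_to.

```r
cache <- new.env()

# Return the cached value for `key`, computing it only on a cache miss.
# `compute` is a zero-argument function.
cached <- function(key, compute) {
  if (!exists(key, envir = cache, inherits = FALSE)) {
    assign(key, compute(), envir = cache)
  }
  get(key, envir = cache, inherits = FALSE)
}

n_calls <- 0L
slow_projection <- function(fy) {  # hypothetical stand-in for project_to()
  n_calls <<- n_calls + 1L         # count how often the real work is done
  data.frame(fy_year = fy)
}

for (i in 1:3) {
  s1819_cached <- cached("2018-19", function() slow_projection("2018-19"))
}
```

The projection is computed once and reused on later requests. Writing each projected file to disk and reading it back would follow the same pattern, with the file path acting as the key.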

cgt_25pc_fwd_estimates %>%
  mutate_ntile("Taxable_Income", n = 5L, keyby = "fy_year") %>%
  .[, delta := new_tax - baseline_tax] %>%
  .[, .(totDelta = sum(delta),
        avgDelta = mean(delta)),
    keyby = .(fy_year, Taxable_IncomeQuintile)] %>%
  # cosmetic
  .[, lapply(.SD, round), keyby = key(.)]

fy_year  Taxable_IncomeQuintile  totDelta  avgDelta
2018-19                       1         0         0
2018-19                       2    380973         7
2018-19                       3   1686609        31
2018-19                       4   3335203        62
2018-19                       5  72763741      1349
2019-20                       1         0         0
2019-20                       2    398814         7
2019-20                       3   1748443        32
2019-20                       4   3452320        64
2019-20                       5  76423334      1417
2020-21                       1         0         0
2020-21                       2    434734         8
2020-21                       3   1767220        33
2020-21                       4   3609380        67
2020-21                       5  82787537      1535
2021-22                       1         0         0
2021-22                       2    455595         8
2021-22                       3   1834093        34
2021-22                       4   3694293        69
2021-22                       5  86490642      1604

lito_multi for custom offsets

While model_income_tax cannot anticipate everything tax policy makers might imagine, the argument lito_multi does provide a powerful mechanism for handling complicated offsets. The argument, if provided, must be a list of two components, x and y. These define an offset: for every point (x_i, y_i), the offset at a taxable income of x_i is y_i, with values between points interpolated linearly.

For example to simply mimic LITO in 2015-16:

## Empty data.table (0 rows) of 67 cols: Gender,age_range,Occ_code,Partner_status,Region,Lodgment_method...
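The interpolation rule can be sketched with base R's approx. This illustrates the idea only: the points below are one way to describe the 2015-16 LITO (the $37,000 start of the taper is an assumed value), offset_at is a hypothetical helper, and the exact endpoints grattan expects in x may differ.

```r
# Points (x_i, y_i): full offset up to 37,000, tapering at 1.5% to zero.
lito_points <- list(
  x = c(0, 37000, 37000 + 445 / 0.015, 1e9),
  y = c(445, 445, 0, 0)
)

# Hypothetical helper: linear interpolation between the points.
# rule = 2 holds the end values constant outside the range of x.
offset_at <- function(income, pts) {
  approx(x = pts$x, y = pts$y, xout = income, rule = 2)$y
}

offsets <- offset_at(c(20000, 50000, 100000), lito_points)
```

More complicated offsets, such as one that phases in before it phases out, only need more points in x and y.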

Budget_... parameters

These were used to cost policies proposed in the 2018 Budget period by the Government and the Opposition. They’re unlikely to have much use except in reproducing past results.


The Seniors and Pensioners Tax Offset (SAPTO) can also be modified. To cost the abolition of SAPTO, one would use:

To model a change to lower the SAPTO threshold from $32,279 to $27,000:
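The effect of lowering the threshold on an individual's offset can be sketched in base R. This is a deliberately simplified, singles-only illustration: sapto_single is a hypothetical helper, the maximum offset of $2,230 and the 12.5% taper are assumed 2015-16 values, and age eligibility, couples, and transfers of unused offset between partners are all ignored.

```r
# Hypothetical singles-only helper: full offset up to `lower_threshold`,
# then reduced by `taper` for each dollar of rebate income above it.
sapto_single <- function(rebate_income, max_offset = 2230,
                         lower_threshold = 32279, taper = 0.125) {
  pmax(0, pmin(max_offset,
               max_offset - taper * (rebate_income - lower_threshold)))
}

offset_now     <- sapto_single(40000)                           # threshold $32,279
offset_lowered <- sapto_single(40000, lower_threshold = 27000)  # threshold $27,000
```

Lowering the threshold starts the taper earlier, so the offset is smaller at every income above the new threshold.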

To cost the proposal in Age of entitlement: age-based tax breaks (2016):

## [1] "$383 million"