Introduction to the raustats package

David Mitchell

2019-03-12

Introduction

The raustats package allows researchers to quickly search and download selected Australian Bureau of Statistics (ABS) and Reserve Bank of Australia (RBA) data in a programmatic and reproducible fashion. This allows ABS and/or RBA data to be integrated seamlessly into analysis projects, and makes it easy to update analyses with the latest available data.

Australian Bureau of Statistics

The Australian Bureau of Statistics (ABS) is Australia’s national statistical agency, providing trusted official statistics on a wide range of economic, social, population and environmental matters of importance to Australia. Key ABS statistical collections include:

Reserve Bank of Australia

The Reserve Bank of Australia (RBA) is Australia’s central bank. In addition to its legislative responsibilities, it collects and publishes statistics on money, credit, the Australian banking system and other relevant economic metrics. Key RBA statistics include:

ABS & RBA data availability

The ABS and RBA currently make most, or all, of their statistics available primarily as Excel- and/or CSV-format spreadsheets. The Excel spreadsheets typically require some additional re-formatting to produce well-formatted (tidy) data.

The functions in this package help facilitate access to ABS and RBA statistics through R, returning data in long-format (tidy-like) tables.

The ABS is also developing a data access API, ABS.Stat, which is currently in Beta and provides access to a subset of ABS statistics. This package includes functions to search and download data via the ABS.Stat API. Please note that, as the API is still in development, changes to the API back-end may break the functions in this package.

Main features of the raustats package:

Quick-start guide

This section provides a quick-start guide on how to download ABS and RBA statistics.

First, load the library:

library(raustats)

Downloading ABS Catalogue Statistics

The abs_cat_stats function downloads ABS statistics by ABS Catalogue Number. For example, the latest Consumer Price Index (CPI) data set (ABS Catalogue no. 6401.0) can be downloaded with the following call:

To download the latest CPI data reported in Table 1 only (ABS groups Tables 1 and 2 together), simply supply a regular expression to the tables argument:

Alternatively, tables also accepts a regular expression that matches the table name as specified on the ABS data access page, as follows:

Statistics from previous releases can be accessed using the releases argument. A more detailed explanation of the abs_cat_stats function and further examples are provided below.

Downloading ABS data via ABS.Stat

The abs_stats function is used to download ABS statistics via the ABS.Stat API. For example, to download the latest CPI data series for ‘All Groups’ changes in prices across Australia and each of the eight capital cities, simply call:

The package includes additional functions to search for datasets and data series available on ABS.Stat. These are documented further below.

Downloading RBA data

The rba_stats function downloads statistics available on the RBA website. For example, the latest statistics covering the RBA’s assets and liabilities (RBA Statistical Table A1) can be downloaded with the following call:

A more detailed explanation of rba_stats and other RBA data access functions is provided below.

ABS statistics access functions

ABS Catalogue statistics functions

The ABS Catalogue statistics functions are split into core functions:

and several helper functions:

The helper functions are called by the core functions and should generally not need to be called directly by users—though there are some cases where these functions may be useful.

Finding available ABS Catalogue statistics

The ABS does not provide a consolidated searchable list of all current statistical collections available through the ABS Catalogue. A text search facility is available on the ABS website: www.abs.gov.au, but this package does not currently provide any functionality to access this facility. Instead, the abs_cat_cachelist data set contained in this package lists the more common ABS Catalogue statistics.

Accessing ABS Catalogue statistics with abs_cat_stats

As shown in the Quick Start section, the abs_cat_stats function provides easy access to ABS statistics by ABS Catalogue Number. The following examples demonstrate typical uses of the function and the various function arguments.

The simplest use of the abs_cat_stats function is to download all tables available in a specified ABS Catalogue series. The Quick Start illustrated how to download all CPI tables. The following example downloads the latest quarterly national accounts statistics (ABS Catalogue no. 5206.0):

The function returns a long-format (tidy) table with the following columns:

  • series_id – ABS series identifier
  • date – Date-format date
  • value – observation value
  • data_item_description – data item name and description
  • series_type – series type, generally one of: Original, Trend, Seasonally Adjusted
  • series_start – series start date
  • series_end – series end date
  • no_obs – number of series observations
  • unit – unit type (e.g. Percent, $ Millions, Index Numbers, Proportion)
  • data_type – data type (e.g. Derived)
  • freq (frequency) – Frequency (e.g. Annual, Quarterly, Monthly)
  • collection_month – collection month (integer)
  • catalogue_no – catalogue number (e.g. 5206.0, 6401.0)
  • publication_title – ABS publication title
  • table_no – ABS publication table number (integer)
  • table_title – ABS publication table title.
#>         date series_id value                    data_item_description
#> 1 1959-09-01 A2303155J  1100 Gross value of agricultural production ;
#> 2 1959-12-01 A2303155J  1084 Gross value of agricultural production ;
#> 3 1960-03-01 A2303155J  1071 Gross value of agricultural production ;
#> 4 1960-06-01 A2303155J  1061 Gross value of agricultural production ;
#> 5 1960-09-01 A2303155J  1069 Gross value of agricultural production ;
#> 6 1960-12-01 A2303155J  1095 Gross value of agricultural production ;
#>   series_type series_start series_end no_obs       unit data_type    freq
#> 1       Trend   1959-09-01 2018-12-01    238 $ Millions   DERIVED Quarter
#> 2       Trend   1959-09-01 2018-12-01    238 $ Millions   DERIVED Quarter
#> 3       Trend   1959-09-01 2018-12-01    238 $ Millions   DERIVED Quarter
#> 4       Trend   1959-09-01 2018-12-01    238 $ Millions   DERIVED Quarter
#> 5       Trend   1959-09-01 2018-12-01    238 $ Millions   DERIVED Quarter
#> 6       Trend   1959-09-01 2018-12-01    238 $ Millions   DERIVED Quarter
#>   collection_month catalogue_no
#> 1                3       5206.0
#> 2                3       5206.0
#> 3                3       5206.0
#> 4                3       5206.0
#> 5                3       5206.0
#> 6                3       5206.0
#>                                                        publication_title
#> 1 Australian National Accounts: National Income, Expenditure and Product
#> 2 Australian National Accounts: National Income, Expenditure and Product
#> 3 Australian National Accounts: National Income, Expenditure and Product
#> 4 Australian National Accounts: National Income, Expenditure and Product
#> 5 Australian National Accounts: National Income, Expenditure and Product
#> 6 Australian National Accounts: National Income, Expenditure and Product
#>   table_no                         table_title
#> 1       10 Agricultural Income, Current prices
#> 2       10 Agricultural Income, Current prices
#> 3       10 Agricultural Income, Current prices
#> 4       10 Agricultural Income, Current prices
#> 5       10 Agricultural Income, Current prices
#> 6       10 Agricultural Income, Current prices
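Each row holds a single observation, so the returned table can be filtered and transformed with ordinary data-frame operations. The base-R sketch below is not part of raustats. It rebuilds the six rows of series A2303155J printed above, keeping only the date, series_id and value columns, and computes quarter-on-quarter percentage changes. With the real result you would filter the full data frame by series_id in the same way.

```r
# First six observations of series A2303155J, copied from the output above
# (only the date, series_id and value columns are reproduced)
gva_ag <- data.frame(
  date = as.Date(c("1959-09-01", "1959-12-01", "1960-03-01",
                   "1960-06-01", "1960-09-01", "1960-12-01")),
  series_id = "A2303155J",
  value = c(1100, 1084, 1071, 1061, 1069, 1095),
  stringsAsFactors = FALSE
)

# Keep one series and sort by date before differencing
one <- gva_ag[gva_ag$series_id == "A2303155J", ]
one <- one[order(one$date), ]

# Quarter-on-quarter percentage change; the first quarter has no predecessor
one$pct_change <- c(NA, 100 * diff(one$value) / head(one$value, -1))
round(one$pct_change, 2)
```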

The tables argument selects which catalogue tables to download. It takes a regular expression that is matched against the ABS table names shown on the ABS web page for the specified catalogue number. By default (tables = "all"), the function downloads all available tables for the specified catalogue number. The following sample code downloads Tables 1 and 2 from Catalogue no. 5206.0.

The same result may be achieved by specifying one or more regular expressions matching one or more table names. For example:
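An unanchored regular expression matches anywhere in a table name, so a loose pattern can select more tables than intended. The base-R sketch below applies grepl() to table names copied from the national accounts listing in this vignette. The "Table 10" entry is assembled from the table_no and table_title columns in the output above. The exact matching rules inside abs_cat_stats, such as case sensitivity, are not stated here.

```r
# Table names from the 5206.0 listing; "Table 10" is assembled from the
# table_no/table_title columns shown earlier (illustrative only)
na_tables <- c(
  "Table 1. Key National Accounts Aggregates",
  "Table 2. Expenditure on Gross Domestic Product (GDP), Chain volume measures",
  "Table 3. Expenditure on Gross Domestic Product (GDP), Current prices",
  "Table 10. Agricultural Income, Current prices"
)

# A loose pattern: "Table 1" also matches "Table 10"
loose <- grepl("Table 1", na_tables)

# An anchored pattern that requires a full stop right after the table number
strict <- grepl("^Table (1|2)\\.", na_tables)

na_tables[strict]
```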

The releases argument enables users to download data from a specified release. By default, the function downloads the latest available data (i.e. releases="Latest"). The format is a date object or character string specifying the month and year of release. For example, the following sample code downloads Table 1 from the December 2017 release of the quarterly national accounts.

The releases argument also accepts multiple values. The following example downloads Table 1 from both the December quarter 2016 and the December quarter 2017 national accounts releases:
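Release dates held as Date objects can be converted into month-and-year labels in the ‘Jun 2017’ style used for releases in this vignette. The release_label() helper below is hypothetical and not part of raustats. It uses R's built-in month.abb, so the labels are in English regardless of the session locale.

```r
# Hypothetical helper: convert dates into "Mon YYYY" release labels.
# month.abb is always English, unlike format(x, "%b"), which follows the locale.
release_label <- function(dates) {
  dates <- as.Date(dates)
  month <- as.integer(format(dates, "%m"))
  paste(month.abb[month], format(dates, "%Y"))
}

releases <- release_label(c("2016-12-01", "2017-12-01"))
releases
```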

The abs_cat_stats function is designed to download both ABS time series spreadsheets (types="tss") and cross-section spreadsheets (types="css"), which the ABS calls ‘Data Cubes’. ABS time series spreadsheets generally have a standard format, with a single column for each series, several header rows containing series metadata and a single row containing the unique series identifiers. The format of ABS cross-section spreadsheets varies with the number of dimensions (categories) available in the data set. These tables typically have multiple uniquely-identifying header rows.

Presently, abs_cat_stats can only download and process time series spreadsheets. Functionality to handle ABS Data Cubes (cross-section spreadsheets, types="css") is planned for future versions.

In the meantime, data cubes can be downloaded by calling abs_cat_tables, abs_cat_download and abs_cat_unzip in sequence and passing the result to a read_excel function call. The following example downloads ABS labour force table LM1 - Labour force status by Age, Greater Capital City and Rest of State (ASGS), Marital status and Sex:

Finding available ABS Catalogue tables with abs_cat_tables

The abs_cat_tables function returns a list of all tables for one or more specified ABS Catalogue numbers. It can be used to identify the relevant table(s) to download. The following two examples return all available tables for the latest quarterly national accounts (5206.0) and CPI (6401.0), respectively.

#>   cat_no release
#> 1 5206.0  Latest
#> 2 5206.0  Latest
#> 3 5206.0  Latest
#> 4 5206.0  Latest
#> 5 5206.0  Latest
#> 6 5206.0  Latest
#>                                                                        item_name
#> 1                                      Table 1. Key National Accounts Aggregates
#> 2    Table 2. Expenditure on Gross Domestic Product (GDP), Chain volume measures
#> 3           Table 3. Expenditure on Gross Domestic Product (GDP), Current prices
#> 4      Table 4. Expenditure on Gross Domestic Product (GDP), Chain price indexes
#> 5 Table 5. Expenditure on Gross Domestic Product (GDP), Implicit price deflators
#> 6                  Table 6. Gross Value Added by Industry, Chain volume measures
#>   cat_no release
#> 1 6401.0  Latest
#> 2 6401.0  Latest
#> 3 6401.0  Latest
#> 4 6401.0  Latest
#> 5 6401.0  Latest
#> 6 6401.0  Latest
#>                                                                                                            item_name
#> 1                                              TABLES 1 and 2. CPI: All Groups, Index Numbers and Percentage Changes
#> 2        TABLES 3 and 4. CPI: Groups, Weighted Average of Eight Capital Cities, Index Numbers and Percentage Changes
#> 3                                                                TABLE 5. CPI: Groups, Index Numbers by Capital City
#> 4 TABLE 6. CPI: Group, Sub-group and Expenditure Class Contribution to Change in All Groups Indexes, by Capital City
#> 5                     TABLE 7. CPI: Group, Sub-group and Expenditure Class, Weighted Average of Eight Capital Cities
#> 6                                          TABLE 8. CPI: Analytical Series, Weighted Average of Eight Capital Cities

The abs_cat_tables function also has three additional arguments: releases, types and include_urls. The releases argument returns the list of downloadable tables from the specified release (by default, releases="Latest"). Lists of tables available in earlier releases can be obtained by specifying the month and year of release, e.g. releases = "Jun 2017". The types argument specifies which file types to include. Options are ‘tss’ – ABS Time Series Spreadsheets, ‘css’ – ABS Data Cubes, and ‘pub’ – ABS Publications. The default is types = c('tss', 'css'). The include_urls argument specifies whether to include the URLs of available data tables in the returned results. The default is include_urls=FALSE.

The following example returns all quarterly national accounts tables from the September and December 2017 quarter releases, including the table URLs.

#>   cat_no  release
#> 1 5206.0 Sep 2017
#> 2 5206.0 Sep 2017
#> 3 5206.0 Sep 2017
#> 4 5206.0 Sep 2017
#> 5 5206.0 Sep 2017
#> 6 5206.0 Sep 2017
#>                                                                        item_name
#> 1                                      Table 1. Key National Accounts Aggregates
#> 2    Table 2. Expenditure on Gross Domestic Product (GDP), Chain volume measures
#> 3           Table 3. Expenditure on Gross Domestic Product (GDP), Current prices
#> 4      Table 4. Expenditure on Gross Domestic Product (GDP), Chain price indexes
#> 5 Table 5. Expenditure on Gross Domestic Product (GDP), Implicit price deflators
#> 6                  Table 6. Gross Value Added by Industry, Chain volume measures
#>                                                                                                                                                                                                          path_1
#> 1                       https://www.abs.gov.au/ausstats/meisubs.NSF/log?openagent&5206001_key_aggregates.xls&5206.0&Time%20Series%20Spreadsheet&CB59A5311E58AB4ECA258248000BC36E&0&Dec%202017&07.03.2018&Latest
#> 2          https://www.abs.gov.au/ausstats/meisubs.NSF/log?openagent&5206002_expenditure_volume_measures.xls&5206.0&Time%20Series%20Spreadsheet&697CFA1F6B8D30D4CA258248000BC423&0&Dec%202017&07.03.2018&Latest
#> 3            https://www.abs.gov.au/ausstats/meisubs.NSF/log?openagent&5206003_expenditure_current_price.xls&5206.0&Time%20Series%20Spreadsheet&ACD9B3B6AF33687CCA258248000BC4E1&0&Dec%202017&07.03.2018&Latest
#> 4            https://www.abs.gov.au/ausstats/meisubs.NSF/log?openagent&5206004_expenditure_price_indexes.xls&5206.0&Time%20Series%20Spreadsheet&ACF054F5EA215051CA258248000BC57E&0&Dec%202017&07.03.2018&Latest
#> 5 https://www.abs.gov.au/ausstats/meisubs.NSF/log?openagent&5206005_expenditure_implicit_price_deflators.xls&5206.0&Time%20Series%20Spreadsheet&45471559D72CC8EBCA258248000BC615&0&Dec%202017&07.03.2018&Latest
#> 6                         https://www.abs.gov.au/ausstats/meisubs.NSF/log?openagent&5206006_industry_gva.xls&5206.0&Time%20Series%20Spreadsheet&199CB9B4D0E73023CA258248000BC6B9&0&Dec%202017&07.03.2018&Latest
#>                                                                                                                                                                                                          path_2
#> 1                       https://www.abs.gov.au/ausstats/meisubs.NSF/log?openagent&5206001_key_aggregates.zip&5206.0&Time%20Series%20Spreadsheet&CB59A5311E58AB4ECA258248000BC36E&0&Dec%202017&07.03.2018&Latest
#> 2          https://www.abs.gov.au/ausstats/meisubs.NSF/log?openagent&5206002_expenditure_volume_measures.zip&5206.0&Time%20Series%20Spreadsheet&697CFA1F6B8D30D4CA258248000BC423&0&Dec%202017&07.03.2018&Latest
#> 3            https://www.abs.gov.au/ausstats/meisubs.NSF/log?openagent&5206003_expenditure_current_price.zip&5206.0&Time%20Series%20Spreadsheet&ACD9B3B6AF33687CCA258248000BC4E1&0&Dec%202017&07.03.2018&Latest
#> 4            https://www.abs.gov.au/ausstats/meisubs.NSF/log?openagent&5206004_expenditure_price_indexes.zip&5206.0&Time%20Series%20Spreadsheet&ACF054F5EA215051CA258248000BC57E&0&Dec%202017&07.03.2018&Latest
#> 5 https://www.abs.gov.au/ausstats/meisubs.NSF/log?openagent&5206005_expenditure_implicit_price_deflators.zip&5206.0&Time%20Series%20Spreadsheet&45471559D72CC8EBCA258248000BC615&0&Dec%202017&07.03.2018&Latest
#> 6                         https://www.abs.gov.au/ausstats/meisubs.NSF/log?openagent&5206006_industry_gva.zip&5206.0&Time%20Series%20Spreadsheet&199CB9B4D0E73023CA258248000BC6B9&0&Dec%202017&07.03.2018&Latest
#>                                                                      item_name.1
#> 1                                      Table 1. Key National Accounts Aggregates
#> 2    Table 2. Expenditure on Gross Domestic Product (GDP), Chain volume measures
#> 3           Table 3. Expenditure on Gross Domestic Product (GDP), Current prices
#> 4      Table 4. Expenditure on Gross Domestic Product (GDP), Chain price indexes
#> 5 Table 5. Expenditure on Gross Domestic Product (GDP), Implicit price deflators
#> 6                  Table 6. Gross Value Added by Industry, Chain volume measures
#>                                                                                                                                                                                                        path_1.1
#> 1                       https://www.abs.gov.au/ausstats/meisubs.NSF/log?openagent&5206001_key_aggregates.xls&5206.0&Time%20Series%20Spreadsheet&CE1F279684368599CA2581ED001C1FF1&0&Sep%202017&06.12.2017&Latest
#> 2          https://www.abs.gov.au/ausstats/meisubs.NSF/log?openagent&5206002_expenditure_volume_measures.xls&5206.0&Time%20Series%20Spreadsheet&BB9C6B38AED399F6CA2581ED001C20B7&0&Sep%202017&06.12.2017&Latest
#> 3            https://www.abs.gov.au/ausstats/meisubs.NSF/log?openagent&5206003_expenditure_current_price.xls&5206.0&Time%20Series%20Spreadsheet&F8493FB3A611F956CA2581ED001C2238&0&Sep%202017&06.12.2017&Latest
#> 4            https://www.abs.gov.au/ausstats/meisubs.NSF/log?openagent&5206004_expenditure_price_indexes.xls&5206.0&Time%20Series%20Spreadsheet&071D54249F63577ACA2581ED001C22D2&0&Sep%202017&06.12.2017&Latest
#> 5 https://www.abs.gov.au/ausstats/meisubs.NSF/log?openagent&5206005_expenditure_implicit_price_deflators.xls&5206.0&Time%20Series%20Spreadsheet&7E21DF087C402045CA2581ED001C2365&0&Sep%202017&06.12.2017&Latest
#> 6                         https://www.abs.gov.au/ausstats/meisubs.NSF/log?openagent&5206006_industry_gva.xls&5206.0&Time%20Series%20Spreadsheet&81D262BF7896E156CA2581ED001C240D&0&Sep%202017&06.12.2017&Latest
#>                                                                                                                                                                                                        path_2.1
#> 1                       https://www.abs.gov.au/ausstats/meisubs.NSF/log?openagent&5206001_key_aggregates.zip&5206.0&Time%20Series%20Spreadsheet&CE1F279684368599CA2581ED001C1FF1&0&Sep%202017&06.12.2017&Latest
#> 2          https://www.abs.gov.au/ausstats/meisubs.NSF/log?openagent&5206002_expenditure_volume_measures.zip&5206.0&Time%20Series%20Spreadsheet&BB9C6B38AED399F6CA2581ED001C20B7&0&Sep%202017&06.12.2017&Latest
#> 3            https://www.abs.gov.au/ausstats/meisubs.NSF/log?openagent&5206003_expenditure_current_price.zip&5206.0&Time%20Series%20Spreadsheet&F8493FB3A611F956CA2581ED001C2238&0&Sep%202017&06.12.2017&Latest
#> 4            https://www.abs.gov.au/ausstats/meisubs.NSF/log?openagent&5206004_expenditure_price_indexes.zip&5206.0&Time%20Series%20Spreadsheet&071D54249F63577ACA2581ED001C22D2&0&Sep%202017&06.12.2017&Latest
#> 5 https://www.abs.gov.au/ausstats/meisubs.NSF/log?openagent&5206005_expenditure_implicit_price_deflators.zip&5206.0&Time%20Series%20Spreadsheet&7E21DF087C402045CA2581ED001C2365&0&Sep%202017&06.12.2017&Latest
#> 6                         https://www.abs.gov.au/ausstats/meisubs.NSF/log?openagent&5206006_industry_gva.zip&5206.0&Time%20Series%20Spreadsheet&81D262BF7896E156CA2581ED001C240D&0&Sep%202017&06.12.2017&Latest
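The URLs returned with include_urls = TRUE carry useful metadata in their query string. The sketch below splits one of the path_1 URLs printed above on "&" to recover the file name, catalogue number, file type and release. The field positions are inferred from these sample URLs only. They are not a documented ABS format and may change.

```r
# One of the path_1 URLs from the output above
url <- paste0(
  "https://www.abs.gov.au/ausstats/meisubs.NSF/log?openagent&",
  "5206001_key_aggregates.xls&5206.0&Time%20Series%20Spreadsheet&",
  "CB59A5311E58AB4ECA258248000BC36E&0&Dec%202017&07.03.2018&Latest"
)

# Split the query string on "&"; positions are inferred from the sample URLs
fields <- strsplit(url, "&", fixed = TRUE)[[1]]
file_name <- fields[2]
cat_no    <- fields[3]
file_type <- utils::URLdecode(fields[4])  # "%20" -> " "
release   <- utils::URLdecode(fields[7])
```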

The following example uses abs_cat_tables to return all downloadable Data Cubes for a non-time series collection: the Australian Statistical Geography Standard (ASGS) main structure classification and digital boundaries (Catalogue no. 1270.0.55.001).

#>          cat_no release
#> 1 1270.0.55.001  Latest
#> 2 1270.0.55.001  Latest
#> 3 1270.0.55.001  Latest
#> 4 1270.0.55.001  Latest
#> 5 1270.0.55.001  Latest
#> 6 1270.0.55.001  Latest
#>                                                        item_name
#> 1   New South Wales Mesh Blocks ASGS Edition 2016 in .csv Format
#> 2          Victoria Mesh Blocks ASGS Edition 2016 in .csv Format
#> 3        Queensland Mesh Blocks ASGS Edition 2016 in .csv Format
#> 4   South Australia Mesh Blocks ASGS Edition 2016 in .csv Format
#> 5 Western Australia Mesh Blocks ASGS Edition 2016 in .csv Format
#> 6          Tasmania Mesh Blocks ASGS Edition 2016 in .csv Format
#>                                                                                                                                                                                    path_1
#> 1 https://www.abs.gov.au/AUSSTATS/subscriber.nsf/log?openagent&1270055001_mb_2016_nsw_csv.zip&1270.0.55.001&Data%20Cubes&1FC672E70A77D52FCA257FED0013A0F7&0&July%202016&12.07.2016&Latest
#> 2 https://www.abs.gov.au/AUSSTATS/subscriber.nsf/log?openagent&1270055001_mb_2016_vic_csv.zip&1270.0.55.001&Data%20Cubes&F1EA82ECA7A762BCCA257FED0013A253&0&July%202016&12.07.2016&Latest
#> 3 https://www.abs.gov.au/AUSSTATS/subscriber.nsf/log?openagent&1270055001_mb_2016_qld_csv.zip&1270.0.55.001&Data%20Cubes&A6A81C7C2CE74FAACA257FED0013A344&0&July%202016&12.07.2016&Latest
#> 4  https://www.abs.gov.au/AUSSTATS/subscriber.nsf/log?openagent&1270055001_mb_2016_sa_csv.zip&1270.0.55.001&Data%20Cubes&5763C01CA9A3E566CA257FED0013A38D&0&July%202016&12.07.2016&Latest
#> 5  https://www.abs.gov.au/AUSSTATS/subscriber.nsf/log?openagent&1270055001_mb_2016_wa_csv.zip&1270.0.55.001&Data%20Cubes&6C293909851DCBFFCA257FED0013A3BF&0&July%202016&12.07.2016&Latest
#> 6 https://www.abs.gov.au/AUSSTATS/subscriber.nsf/log?openagent&1270055001_mb_2016_tas_csv.zip&1270.0.55.001&Data%20Cubes&A9B01B4DACD0BFEFCA257FED0013A3FC&0&July%202016&12.07.2016&Latest
#>                                                                             path_2
#> 1 New%20South%20Wales%20Mesh%20Blocks%20ASGS%20Edition%202016%20in%20.csv%20Format
#> 2            Victoria%20Mesh%20Blocks%20ASGS%20Edition%202016%20in%20.csv%20Format
#> 3          Queensland%20Mesh%20Blocks%20ASGS%20Edition%202016%20in%20.csv%20Format
#> 4   South%20Australia%20Mesh%20Blocks%20ASGS%20Edition%202016%20in%20.csv%20Format
#> 5 Western%20Australia%20Mesh%20Blocks%20ASGS%20Edition%202016%20in%20.csv%20Format
#> 6            Tasmania%20Mesh%20Blocks%20ASGS%20Edition%202016%20in%20.csv%20Format

Other ABS Catalogue helper functions

As noted above, abs_cat_stats and abs_cat_tables call several helper functions that download and parse the ABS Catalogue table files. The main ones are:

  • abs_cat_download
  • abs_cat_unzip
  • abs_read_tss

The following examples illustrate the use of these functions.

The abs_cat_download function downloads and saves ABS Catalogue tables from a supplied URL. It is called by abs_cat_stats, but can also be used directly to download one or more ABS Catalogue table files. It is most useful in conjunction with the abs_cat_tables function, as follows:

The abs_cat_unzip function extracts Excel files from compressed ABS zip archives (see example below). It uses the utils::unzip function, with some standard file locations. It has two arguments, files and exdir, which have similar meanings to the equivalent utils::unzip arguments. By default, exdir = tempdir().

The abs_read_tss function extracts data from standard-formatted ABS Catalogue time series spreadsheets and returns it as a long-format (tidy) data frame. The next example shows use of the function to read Table 1 from the national accounts (Catalogue 5206.0).
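The sketch below illustrates this kind of reshaping. It converts a toy time series spreadsheet into a long-format table. The toy sheet has header rows of metadata ending in a "Series ID" row, followed by one column per series. The layout, the made-up SERIES_B identifier and the tidy_tss() helper are assumptions for illustration only. Real ABS spreadsheets have more header rows, and this is not the abs_read_tss implementation.

```r
# Toy sheet (hypothetical layout): column 1 holds row labels, then dates;
# every other column holds one series
sheet <- rbind(
  c("Unit",        "$ Millions", "$ Millions"),
  c("Series Type", "Trend",      "Original"),
  c("Series ID",   "A2303155J",  "SERIES_B"),  # SERIES_B is a made-up ID
  c("1959-09-01",  "1100",       "1000"),
  c("1959-12-01",  "1084",       "1010")
)

# Hypothetical helper: header block -> metadata, remaining rows -> observations
tidy_tss <- function(sheet) {
  id_row <- which(sheet[, 1] == "Series ID")          # last header row
  meta <- sheet[seq_len(id_row), -1, drop = FALSE]    # one column per series
  rownames(meta) <- sheet[seq_len(id_row), 1]
  body <- sheet[-seq_len(id_row), , drop = FALSE]     # observation rows
  out <- lapply(seq_len(ncol(meta)), function(j) {
    data.frame(
      date        = as.Date(body[, 1]),
      series_id   = meta["Series ID", j],
      value       = as.numeric(body[, j + 1]),
      series_type = meta["Series Type", j],
      unit        = meta["Unit", j],
      stringsAsFactors = FALSE
    )
  })
  do.call(rbind, out)                                 # stack series vertically
}

long <- tidy_tss(sheet)
```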

ABS.Stat statistics access functions

The raustats package also includes a range of functions to list, search and download data sets and statistics available through the ABS.Stat API. The following subsections outline the key functions.

Finding available data with abs_datasets

The abs_datasets function returns a list of all datasets available through ABS.Stat. The function has two arguments: lang (default is English: lang="en") and include_notes (default: include_notes=FALSE). The following example shows the results with notes included.

#>                                 id agencyID
#> 1                 ATSI_BIRTHS_SUMM      ABS
#> 2                   ATSI_FERTILITY      ABS
#> 3 ABS_ABORIGINAL_POPPROJ_INDREGION      ABS
#> 4       ABORIGINAL_POP_PROJ_REMOTE      ABS
#> 5              ABORIGINAL_POP_PROJ      ABS
#> 6                              ALC      ABS
#>                                                                                 name
#> 1   Aboriginal and Torres Strait Islander births and confinements, summary, by state
#> 2                  Aboriginal and Torres Strait Islander fertility, by age, by state
#> 3 Aboriginal and Torres Strait Islander Population Projections by Indigenous Regions
#> 4      Aboriginal and Torres Strait Islander Population Projections, Remoteness Area
#> 5      Aboriginal and Torres Strait Islander Population Projections, State/Territory
#> 6                                         Apparent Consumption of Alcohol, Australia

Cached list of available ABS.Stat datasets: abs_cachelist

For performance, a cached list of datasets available through the ABS.Stat API is provided in the abs_cachelist data set included with raustats. abs_cachelist is the default source used in abs_search() and abs_stats() to find matching ABS datasets.

By default, abs_cachelist is in English. To search indicators in a different language, download an updated copy of abs_cachelist using abs_datasets(), specifying a different language.
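Searching a cached list is essentially a pattern match over dataset names. The sketch below applies a case-insensitive grepl() to the six datasets printed in the abs_datasets output above. find_datasets() is a hypothetical helper, not the abs_search implementation, and the columns of abs_cachelist itself may differ.

```r
# The six datasets printed in the abs_datasets output above
datasets <- data.frame(
  id = c("ATSI_BIRTHS_SUMM", "ATSI_FERTILITY", "ABS_ABORIGINAL_POPPROJ_INDREGION",
         "ABORIGINAL_POP_PROJ_REMOTE", "ABORIGINAL_POP_PROJ", "ALC"),
  name = c(
    "Aboriginal and Torres Strait Islander births and confinements, summary, by state",
    "Aboriginal and Torres Strait Islander fertility, by age, by state",
    "Aboriginal and Torres Strait Islander Population Projections by Indigenous Regions",
    "Aboriginal and Torres Strait Islander Population Projections, Remoteness Area",
    "Aboriginal and Torres Strait Islander Population Projections, State/Territory",
    "Apparent Consumption of Alcohol, Australia"
  ),
  stringsAsFactors = FALSE
)

# Hypothetical helper: case-insensitive regular-expression search on names
find_datasets <- function(cache, pattern) {
  hit <- grepl(pattern, cache$name, ignore.case = TRUE)
  cache[hit, c("id", "name"), drop = FALSE]
}

projections <- find_datasets(datasets, "population projections")
alcohol <- find_datasets(datasets, "alcohol")
```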

Checking dataset dimensions with abs_dimensions()

The abs_dimensions() function lists the names of all available dimensions in a dataset and their respective dimension types. Typical dimension types are ‘Dimension’, ‘TimeDimension’ and ‘Attribute’. Dimensions of type ‘Dimension’ are the ones used in the filter argument of the abs_stats function. The following example lists the data dimensions of the ‘CPI’ dataset.

A list of all available dimension codes and descriptions for a particular dataset can be viewed by selecting the relevant dataset from abs_cachelist or an updated cache list returned by abs_cache.

Downloading data with abs_stats()

The abs_stats() function returns data from specified datasets available via the ABS.Stat API. The following section outlines typical use of the abs_stats() function, and also describes each of the core function arguments.

The following example downloads original All groups CPI index numbers for each of the eight Australian state and territory capital cities and also the average for all capital cities.

The filter conditions are:

  • MEASURE=1 – ‘Index Numbers’
  • REGION=c(1:8,50) – Each of the eight capital cities (1–8) and all eight capital cities (50)
  • INDEX=10001 – ‘All groups CPI’
  • TSEST=10 – ‘Original’ observations
  • FREQUENCY=Q – Quarterly observations
#>         measure region          index adjustment_type frequency     time
#> 1 Index Numbers Sydney All groups CPI        Original Quarterly Sep-1948
#> 2 Index Numbers Sydney All groups CPI        Original Quarterly Dec-1948
#> 3 Index Numbers Sydney All groups CPI        Original Quarterly Mar-1949
#> 4 Index Numbers Sydney All groups CPI        Original Quarterly Jun-1949
#> 5 Index Numbers Sydney All groups CPI        Original Quarterly Sep-1949
#> 6 Index Numbers Sydney All groups CPI        Original Quarterly Dec-1949
#>   values obs_status unknown agency_id                     agency_name
#> 1    3.7          0      NA       ABS Australian Bureau of Statistics
#> 2    3.7          0      NA       ABS Australian Bureau of Statistics
#> 3    3.9          0      NA       ABS Australian Bureau of Statistics
#> 4    3.9          0      NA       ABS Australian Bureau of Statistics
#> 5    4.0          0      NA       ABS Australian Bureau of Statistics
#> 6    4.1          0      NA       ABS Australian Bureau of Statistics
#>                             dataset_name
#> 1 Consumer Price Index (CPI) 17th Series
#> 2 Consumer Price Index (CPI) 17th Series
#> 3 Consumer Price Index (CPI) 17th Series
#> 4 Consumer Price Index (CPI) 17th Series
#> 5 Consumer Price Index (CPI) 17th Series
#> 6 Consumer Price Index (CPI) 17th Series
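ABS.Stat is an SDMX-based API. SDMX REST queries conventionally encode a filter as a positional key, with codes within a dimension joined by "+" and dimensions separated by ".". The sketch below turns the CPI filter above into such a key. The sdmx_key() helper is hypothetical and assumes the list elements are in the dataset's dimension order. This section does not say that abs_stats builds its URL exactly this way.

```r
# Filter from the CPI example above
filter <- list(MEASURE = 1, REGION = c(1:8, 50), INDEX = 10001,
               TSEST = 10, FREQUENCY = "Q")

# Format one dimension's codes; trim/scientific avoid padded or "1e+05" codes
code_string <- function(codes) {
  if (is.numeric(codes)) codes <- format(codes, trim = TRUE, scientific = FALSE)
  paste(codes, collapse = "+")                # "+" = OR within a dimension
}

# Hypothetical helper: assumes list order matches the dataset's dimension order
sdmx_key <- function(filter) {
  paste(vapply(filter, code_string, character(1)), collapse = ".")
}

key <- sdmx_key(filter)
```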

The filter argument can also be set to “all”, in which case the function attempts to download all observations available for the specified dataset. However, for large datasets this may breach the ABS.Stat API query length, record and/or session time constraints (see footnote 1). Queries that breach these limits need to be split into multiple separate calls to obtain the required data.

For example, the following abs_stats function call attempts to download all series available for the CPI dataset, but the resulting query (1191 characters) exceeds the maximum request URL length.

By default, abs_stats checks the query string length and the estimated number of records to be returned, and halts execution if the query breaches either limit. Setting enforce_api_limits = FALSE (default: TRUE) skips these checks and submits the query to the ABS.Stat API anyway, though this is not recommended.
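The length check can be thought of as a simple pre-flight test on the query URL before it is sent. In the sketch below, check_query_length() is hypothetical and the base URL is a placeholder, not the real ABS.Stat endpoint. The 1000-character limit is an arbitrary value chosen for the example, since this section does not state the actual ABS.Stat limit.

```r
# Hypothetical pre-flight check in the spirit of enforce_api_limits = TRUE
check_query_length <- function(url, max_chars) {
  n <- nchar(url)
  if (n > max_chars) {
    stop(sprintf("query URL has %d characters; limit is %d", n, max_chars),
         call. = FALSE)
  }
  invisible(n)                                # return the length if it passes
}

# Placeholder base URL; the long key lists 400 codes in one dimension
base <- "https://example.org/sdmx-json/data/CPI/"
long_url  <- paste0(base, paste(1:400, collapse = "+"), "/all")
short_url <- paste0(base, "1.1+2+3+4+5+6+7+8+50.10001.10.Q/all")

ok  <- tryCatch(check_query_length(short_url, 1000), error = function(e) NA)
bad <- tryCatch(check_query_length(long_url, 1000),
                error = function(e) conditionMessage(e))
```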

Setting return_url = TRUE (default: FALSE) returns the RESTful URL query string without submitting the query to the ABS.Stat API, as shown in the following example function call and output.

The abs_search function can also be used to construct the filter. For example, the following code block produces the same filter list used in the previous example, which can then be supplied to the filter argument of abs_stats.

Users can also limit the date range returned by specifying one or both of the start_date and end_date arguments. These accept either numeric or character values. A numeric value must be a four-digit year. A character value can be a month, quarter, half-year or financial year, formatted as follows: month – ‘2016-M01’, quarter – ‘2016-Q1’, half-year – ‘2016-B2’, financial year – ‘2016-17’. The following example returns all CPI observations between September 2015 and June 2018.
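The four character formats listed above can be distinguished with anchored regular expressions. The hypothetical period_type() helper below classifies a single start_date or end_date value. It is a sketch of the documented formats, not the validation abs_stats actually performs. For example, it does not check that a financial year's second part follows the first.

```r
# Hypothetical helper: classify one start_date/end_date value
period_type <- function(x) {
  if (is.numeric(x)) {
    # numeric values must be a four-digit year
    return(if (x >= 1000 && x <= 9999 && x == round(x)) "year" else NA_character_)
  }
  patterns <- c(
    month          = "^[0-9]{4}-M(0[1-9]|1[0-2])$",  # e.g. 2016-M01
    quarter        = "^[0-9]{4}-Q[1-4]$",            # e.g. 2016-Q1
    half_year      = "^[0-9]{4}-B[12]$",             # e.g. 2016-B2
    financial_year = "^[0-9]{4}-[0-9]{2}$"           # e.g. 2016-17
  )
  hit <- names(patterns)[vapply(patterns, grepl, logical(1), x = x)]
  if (length(hit) == 1) hit else NA_character_
}

# The September 2015 to June 2018 range expressed as quarters
range_types <- c(period_type("2015-Q3"), period_type("2018-Q2"))
```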

The other arguments, dimensionAtObservation and detail, provide refinements to the URL query. Users generally need not modify them; the defaults are dimensionAtObservation='AllDimensions' and detail='Full'.

RBA statistics access functions

Finding available RBA data tables with rba_table_cache

The rba_table_cache function returns a dataset of all available RBA statistical tables. The function scans the RBA website and returns a list of all Statistical tables, Historical data tables and Discontinued data tables. (The rba_cachelist data set included in this package contains a pre-saved list of all available RBA statistical tables.)

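The dataset has four columns:

  • table_no – RBA table number (e.g. A1)
  • table_name – table name
  • table_type – table type (e.g. statistical table)
  • url – URL of the table spreadsheet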

The rba_table_cache function has no arguments. The following example shows use of the function and the returned output.

#>   table_no
#> 1       A1
#> 2       A1
#> 3       A2
#> 4       A3
#> 5       A4
#> 6       A5
#>                                                              table_name
#> 1                                      Liabilities and Assets - Summary
#> 2                                     Liabilities and Assets - Detailed
#> 3                                               Monetary Policy Changes
#> 4                                      Open Market Operations - Current
#> 5 Foreign Exchange Transactions and Holdings of Official Reserve Assets
#> 6               Daily Foreign Exchange Market Intervention Transactions
#>          table_type
#> 1 statistical table
#> 2 statistical table
#> 3 statistical table
#> 4 statistical table
#> 5 statistical table
#> 6 statistical table
#>                                                                  url
#> 1  https://www.rba.gov.au/statistics/tables/xls/a01whist-summary.xls
#> 2 https://www.rba.gov.au/statistics/tables/xls/a01whist-detailed.xls
#> 3           https://www.rba.gov.au/statistics/tables/xls/a02hist.xls
#> 4               https://www.rba.gov.au/statistics/tables/xls/a03.xls
#> 5           https://www.rba.gov.au/statistics/tables/xls/a04hist.xls
#> 6           https://www.rba.gov.au/statistics/tables/xls/a05hist.xls
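Because the table list is an ordinary data frame, it can be subset with base R. The sketch below rebuilds the six rows printed above (it does not call rba_table_cache itself) and extracts the URLs for table number A1, which, as the output shows, corresponds to two separate files.

```r
## Data frame rebuilt from the six rows printed above (illustration only)
rba_tables <- data.frame(
  table_no   = c("A1", "A1", "A2", "A3", "A4", "A5"),
  table_name = c("Liabilities and Assets - Summary",
                 "Liabilities and Assets - Detailed",
                 "Monetary Policy Changes",
                 "Open Market Operations - Current",
                 "Foreign Exchange Transactions and Holdings of Official Reserve Assets",
                 "Daily Foreign Exchange Market Intervention Transactions"),
  table_type = "statistical table",
  url = paste0("https://www.rba.gov.au/statistics/tables/xls/",
               c("a01whist-summary.xls", "a01whist-detailed.xls", "a02hist.xls",
                 "a03.xls", "a04hist.xls", "a05hist.xls")),
  stringsAsFactors = FALSE
)

## Table number A1 maps to two files (summary and detailed)
a1_urls <- rba_tables$url[rba_tables$table_no == "A1"]
a1_urls
```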

Accessing RBA statistical tables with rba_stats()

As previously outlined in the Quick Start section, the rba_stats() function provides easy access to RBA statistical tables. It has three mutually exclusive arguments for selecting RBA statistical tables: table_no, pattern and url. Specifying table_no selects tables matching the specified RBA table number, e.g. A1, B1 or B11.1. Specifying pattern selects all RBA tables matching the regular expression supplied in pattern. The url argument can be used to specify one or more valid RBA statistical table URLs.
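One common way to enforce mutually exclusive arguments in R is to count how many were supplied with missing(). The sketch below uses a hypothetical function, `check_rba_selection()`, to illustrate the rule; it is not the code used inside rba_stats.

```r
## Hypothetical sketch (not raustats code): require exactly one of the three
## table-selection arguments and return the name of the one supplied.
check_rba_selection <- function(table_no, pattern, url) {
  supplied <- c(table_no = !missing(table_no),
                pattern  = !missing(pattern),
                url      = !missing(url))
  if (sum(supplied) != 1)
    stop("Specify exactly one of 'table_no', 'pattern' or 'url'.")
  names(supplied)[supplied]
}

check_rba_selection(table_no = "A1")  # "table_no"
## check_rba_selection(table_no = "A1", url = "...") would raise an error
```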

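The function returns a long-format (tidy) table with the following columns: date, series_id, value, title, description, frequency, type, units, source, publication_date, table_no and table_name.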

The following examples demonstrate typical uses of the function and the various function arguments. The first example downloads RBA Statistical Table A1 – Liabilities and Assets using table_no.

#>         date series_id value                                 title
#> 1 1994-06-01 ARBALCRFW   633 Capital and Reserve Bank Reserve Fund
#> 2 1994-06-08 ARBALCRFW   633 Capital and Reserve Bank Reserve Fund
#> 3 1994-06-15 ARBALCRFW   633 Capital and Reserve Bank Reserve Fund
#> 4 1994-06-22 ARBALCRFW   633 Capital and Reserve Bank Reserve Fund
#> 5 1994-06-29 ARBALCRFW   633 Capital and Reserve Bank Reserve Fund
#> 6 1994-07-06 ARBALCRFW   633 Capital and Reserve Bank Reserve Fund
#>                                                             description
#> 1 Capital and Reserve Bank Reserve Fund; see Notes for further details.
#> 2 Capital and Reserve Bank Reserve Fund; see Notes for further details.
#> 3 Capital and Reserve Bank Reserve Fund; see Notes for further details.
#> 4 Capital and Reserve Bank Reserve Fund; see Notes for further details.
#> 5 Capital and Reserve Bank Reserve Fund; see Notes for further details.
#> 6 Capital and Reserve Bank Reserve Fund; see Notes for further details.
#>   frequency     type     units source publication_date table_no
#> 1    Weekly Original $ million    RBA       2019-03-08       A1
#> 2    Weekly Original $ million    RBA       2019-03-08       A1
#> 3    Weekly Original $ million    RBA       2019-03-08       A1
#> 4    Weekly Original $ million    RBA       2019-03-08       A1
#> 5    Weekly Original $ million    RBA       2019-03-08       A1
#> 6    Weekly Original $ million    RBA       2019-03-08       A1
#>                                                     table_name
#> 1 RESERVE BANK OF AUSTRALIA - LIABILITIES AND ASSETS - SUMMARY
#> 2 RESERVE BANK OF AUSTRALIA - LIABILITIES AND ASSETS - SUMMARY
#> 3 RESERVE BANK OF AUSTRALIA - LIABILITIES AND ASSETS - SUMMARY
#> 4 RESERVE BANK OF AUSTRALIA - LIABILITIES AND ASSETS - SUMMARY
#> 5 RESERVE BANK OF AUSTRALIA - LIABILITIES AND ASSETS - SUMMARY
#> 6 RESERVE BANK OF AUSTRALIA - LIABILITIES AND ASSETS - SUMMARY

The second example downloads the same tables using the pattern argument, selecting all tables that match the regular expression ‘Liabilities and Assets.+A1’.

And the third example downloads the same tables using the relevant URLs. The example presented below first returns a list of all RBA tables matching the supplied regular expression (‘Liabilities and Assets.+A1’) and then uses the returned URLs to download each data set.

#>         date series_id value                                 title
#> 1 1994-06-01 ARBALCRFW   633 Capital and Reserve Bank Reserve Fund
#> 2 1994-06-08 ARBALCRFW   633 Capital and Reserve Bank Reserve Fund
#> 3 1994-06-15 ARBALCRFW   633 Capital and Reserve Bank Reserve Fund
#> 4 1994-06-22 ARBALCRFW   633 Capital and Reserve Bank Reserve Fund
#> 5 1994-06-29 ARBALCRFW   633 Capital and Reserve Bank Reserve Fund
#> 6 1994-07-06 ARBALCRFW   633 Capital and Reserve Bank Reserve Fund
#>                                                             description
#> 1 Capital and Reserve Bank Reserve Fund; see Notes for further details.
#> 2 Capital and Reserve Bank Reserve Fund; see Notes for further details.
#> 3 Capital and Reserve Bank Reserve Fund; see Notes for further details.
#> 4 Capital and Reserve Bank Reserve Fund; see Notes for further details.
#> 5 Capital and Reserve Bank Reserve Fund; see Notes for further details.
#> 6 Capital and Reserve Bank Reserve Fund; see Notes for further details.
#>   frequency     type     units source publication_date table_no
#> 1    Weekly Original $ million    RBA       2019-03-08       A1
#> 2    Weekly Original $ million    RBA       2019-03-08       A1
#> 3    Weekly Original $ million    RBA       2019-03-08       A1
#> 4    Weekly Original $ million    RBA       2019-03-08       A1
#> 5    Weekly Original $ million    RBA       2019-03-08       A1
#> 6    Weekly Original $ million    RBA       2019-03-08       A1
#>                                                     table_name
#> 1 RESERVE BANK OF AUSTRALIA - LIABILITIES AND ASSETS - SUMMARY
#> 2 RESERVE BANK OF AUSTRALIA - LIABILITIES AND ASSETS - SUMMARY
#> 3 RESERVE BANK OF AUSTRALIA - LIABILITIES AND ASSETS - SUMMARY
#> 4 RESERVE BANK OF AUSTRALIA - LIABILITIES AND ASSETS - SUMMARY
#> 5 RESERVE BANK OF AUSTRALIA - LIABILITIES AND ASSETS - SUMMARY
#> 6 RESERVE BANK OF AUSTRALIA - LIABILITIES AND ASSETS - SUMMARY

Again, the update_cache argument specifies whether the function uses the list of RBA tables supplied with the package, rba_cachelist (the default), or updates the list of tables from the RBA website. The rba_stats function also optionally accepts the rba_search arguments fields and ignore.case, which control pattern matching.
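Judging by the table names printed earlier, a pattern such as ‘Liabilities and Assets.+A1’ can only match if a table's name and number are searched together as one string. The sketch below assumes the selected fields are pasted together in the order given and then matched; the helper `search_tables()` and its default field order are hypothetical and are not rba_search's actual implementation.

```r
## Hypothetical sketch: paste the selected fields row by row, then match
## the regular expression against the combined text.
search_tables <- function(tables, pattern,
                          fields = c("table_name", "table_no"),
                          ignore.case = TRUE) {
  text <- do.call(paste, unname(as.list(tables[fields])))
  tables[grepl(pattern, text, ignore.case = ignore.case), , drop = FALSE]
}

tables <- data.frame(
  table_no   = c("A1", "A1", "A2"),
  table_name = c("Liabilities and Assets - Summary",
                 "Liabilities and Assets - Detailed",
                 "Monetary Policy Changes"),
  stringsAsFactors = FALSE
)

## Matches both A1 tables; with ignore.case = TRUE a lower-case pattern also works
search_tables(tables, "Liabilities and Assets.+A1")
search_tables(tables, "liabilities and assets.+a1")
```

Under this assumption the field order matters: pasting table_no before table_name would put ‘A1’ ahead of the name, and the same pattern would match nothing.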

At present, the rba_stats function only handles Statistical tables, which have a consistently structured format comprising full metadata and observations. Functionality to handle Historical data tables and Discontinued data tables is in progress.

Other RBA helper functions

There are also two RBA helper functions, called by rba_stats, that download and parse the RBA statistical tables. These functions should not ordinarily need to be called directly, but there may be situations in which users wish to do so. The functions are rba_download and rba_read_tss.

The following examples illustrate the use of these functions.

The rba_download function downloads and saves RBA tables from the supplied URLs and returns a character vector of the saved local filenames. It is called inside the rba_stats function and can be used to download one or more RBA statistical table files directly. It is most useful in conjunction with the rba_search function.
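For readers who want to see what such a download step involves, the sketch below saves each URL into a local directory with base R's download.file() and returns the local paths. The functions `rba_local_paths()` and `download_rba_files()` are hypothetical and are not the rba_download implementation.

```r
## Hypothetical sketch (not raustats code): build local file paths from URLs.
rba_local_paths <- function(urls, exdir = tempdir()) {
  file.path(exdir, basename(urls))
}

## Download each URL and return the saved file paths
download_rba_files <- function(urls, exdir = tempdir()) {
  dest <- rba_local_paths(urls, exdir)
  for (i in seq_along(urls)) {
    ## mode = "wb" keeps binary Excel files intact on Windows
    utils::download.file(urls[i], destfile = dest[i], mode = "wb", quiet = TRUE)
  }
  dest
}

## Usage (requires an internet connection):
## files <- download_rba_files("https://www.rba.gov.au/statistics/tables/xls/a03.xls")
```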

The rba_read_tss function extracts data from standard-format RBA statistical tables and returns it as a long-format (tidy) data frame. It is also called by the rba_stats function. The following call will extract data from the previously downloaded tables.
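The core of turning a spreadsheet into a long-format table is a wide-to-long reshape. The sketch below assumes a simplified wide layout with a date column followed by one column per series; the helper `wide_to_long()` and the toy column names and values are made up for illustration, and real RBA files also carry metadata rows that rba_read_tss handles.

```r
## Hypothetical sketch: reshape a wide table (one column per series) into
## long format with date, series_id and value columns.
wide_to_long <- function(wide, date_col = "date") {
  series <- setdiff(names(wide), date_col)
  data.frame(date      = rep(wide[[date_col]], times = length(series)),
             series_id = rep(series, each = nrow(wide)),
             value     = unlist(wide[series], use.names = FALSE),
             stringsAsFactors = FALSE)
}

## Toy input: column names and values are illustrative, not RBA data
wide <- data.frame(date = as.Date(c("2019-01-02", "2019-01-09")),
                   SERIES_A = c(10, 11),
                   SERIES_B = c(20, 21))
wide_to_long(wide)
```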

Using data returned by raustats

Statistics returned by the raustats package functions are generally in long (tidy) format data frames. This provides for easy integration with packages like ggplot2, tidyr and dplyr. The following example illustrates how data downloaded using raustats can be easily transformed and plotted. This example uses data from the ABS’ Private New Capital Expenditure and Expected Expenditure (ABS Catalogue no. 5625.0).

First, download selected tables from Catalogue no. 5625.0.

## Download selected tables from ABS Catalogue no. 5625.0
capex_q <-
  abs_cat_stats("5625.0",
                tables=c("Actual Expenditure by Type of Asset and Industry - Current Prices",
                         "Actual Expenditure, By Type of Industry - Chain Volume Measures",
                         "Actual and Expected Capital Expenditure by Industry.+:Current Prices"))

Then add a new variable denoting Australian state/territory.

library(dplyr)
## Add state/territory variable
capex_q <- capex_q %>%
  mutate(state = sub(sprintf(".*(%s).*",
                             paste(c("New South Wales","Victoria","Queensland","South Australia",
                                     "Western Australia","Tasmania","Northern Territory",
                                     "Australian Capital Territory","Total \\(State\\)"),
                                   collapse="|")),
                     "\\1", data_item_description, ignore.case=TRUE))

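One property of the sub() call above is worth knowing: when a description contains none of the listed names, sub() returns it unchanged, so the state column for such rows holds the full description rather than NA. The sketch below applies the same pattern to made-up descriptions (not actual ABS data items) to show this.

```r
## Same pattern as above, applied to made-up descriptions for illustration
states <- c("New South Wales", "Victoria", "Queensland", "South Australia",
            "Western Australia", "Tasmania", "Northern Territory",
            "Australian Capital Territory", "Total \\(State\\)")
state_pattern <- sprintf(".*(%s).*", paste(states, collapse = "|"))

examples <- c("Actual ; Mining ; Victoria ; Total (Type of Asset) ;",
              "Actual ; Mining ; Total (State) ; Total (Type of Asset) ;",
              "Actual ; Mining ; Total (Type of Asset) ;")

## The third description has no state, so it is returned unchanged
sub(state_pattern, "\\1", examples, ignore.case = TRUE)
```

If such rows should be marked explicitly, grepl() with the same pattern can be used to set them to NA.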
Finally, plot the quarterly time series of mining-sector capital expenditure (at current prices) by Australian state and territory using ggplot2.

library(ggplot2)
## Filter mining capital expenditure
capex_q_min <- capex_q %>%
  filter(grepl("mining", data_item_description, ignore.case=TRUE)) %>%
  filter(grepl("actual", data_item_description, ignore.case=TRUE)) %>%
  filter(grepl("current price", data_item_description, ignore.case=TRUE)) %>%
  filter(grepl("Total \\(Type of Asset.+\\)", data_item_description, ignore.case=TRUE))

ggplot(data=capex_q_min) +
  geom_line(aes(x=date, y=value/10^3, colour=state)) +
  scale_x_date(date_labels="%b\n%Y") +
  scale_y_continuous(limits=c(0, NA)) +
  labs(title="Australian mining sector capital expenditure, by state",
       y="Capital expenditure ($ billion)", x=NULL) +
  guides(colour = guide_legend(title=NULL)) + 
  theme(plot.title = element_text(hjust=0.5),
        legend.box = "horizontal",
        legend.position = "bottom",
        axis.text.x=element_text(angle=0, size=8))

Unresolved issues

There are a few behaviours of the ABS.Stat API that may help explain any unexpected results. As the ABS.Stat API is still in Beta release and subject to revision, some of these issues will be addressed in future versions of the raustats package, as ABS.Stat transitions towards a stable release version.

Searching in other languages

The ABS.Stat API (Beta version) is designed to support multiple languages. However, at the time of writing, French appears to be the only other language included in the data sets. Moreover, the text for many records denoted as French appears to be in English. The ABS API functions included in this library allow users to specify their preferred language, but this functionality has been little tested to date.

Query, data and session constraints

As already noted, the ABS.Stat API has a query string length limit (1000 characters) and record return limit (1 million records). We have not tested how rigorously these limits are enforced. Accordingly, the abs_stats function includes the enforce_api_limits argument to catch these limits before the query is submitted. The argument also allows the user to override the limits and submit the query anyway. This argument may be deprecated in future versions.
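A pre-submission check of this kind can be as simple as comparing the query length and an estimated record count against the published limits. The helper `check_abs_query()` below is a hypothetical sketch using the limits quoted above; it is not the enforce_api_limits implementation in raustats.

```r
## Hypothetical sketch: return a character vector describing any limits the
## query would exceed (empty if none). n_records is an estimated count.
check_abs_query <- function(url, n_records = NA,
                            max_chars = 1000, max_records = 1e6) {
  problems <- character(0)
  if (nchar(url) > max_chars)
    problems <- c(problems,
                  sprintf("query string has %d characters (limit %d)",
                          nchar(url), max_chars))
  if (!is.na(n_records) && n_records > max_records)
    problems <- c(problems,
                  sprintf("request covers %s records (limit %s)",
                          format(n_records, big.mark = ",", scientific = FALSE),
                          format(max_records, big.mark = ",", scientific = FALSE)))
  problems
}

check_abs_query(strrep("a", 1001), n_records = 2e6)
```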

The ABS.Stat API also has a 10-minute session time limit for users to download datasets via the SDMX service. The functions in raustats almost exclusively use the SDMX-JSON query interface, so it is not clear whether the session time limit applies. If time limits are an issue, the ABS advises users to submit multiple smaller requests to retrieve the required data.
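One straightforward way to follow this advice is to split a long date range into consecutive blocks and request each block separately. The helper `split_years()` and its default block size are hypothetical; each resulting start/end pair could then be supplied as four-digit years to the start_date and end_date arguments.

```r
## Hypothetical sketch: split a span of years into consecutive blocks so that
## each block can be requested separately.
split_years <- function(start_year, end_year, block_size = 5) {
  years <- seq(start_year, end_year)
  blocks <- split(years, ceiling(seq_along(years) / block_size))
  data.frame(start_date = unname(vapply(blocks, min, numeric(1))),
             end_date   = unname(vapply(blocks, max, numeric(1))))
}

## Three blocks: 2000-2004, 2005-2009 and 2010-2012
split_years(2000, 2012, block_size = 5)
```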

Resources


  1. ABS.Stat has a query string length limit (maximum of 1000 characters) on URL queries, a record return limit (1 million records) and session time limits (maximum 10-minute session time limit). (See the ABS.Stat Web Services User Guide FAQ for further details.)