R package ropenaq

M. Salmon

2018-02-28

Introduction

This R package provides access to the OpenAQ API. OpenAQ is a community of scientists, software developers, and lovers of open environmental data who are building an open, real-time database that provides programmatic and historical access to air quality data. See their website at https://openaq.org/ and the API documentation at https://docs.openaq.org/. The package contains 5 functions that correspond to the 5 types of query offered by the OpenAQ API: cities, countries, latest, locations and measurements. The package uses the dplyr package: all output tables are data.frame (dplyr “tbl_df”) objects that can be further processed and analysed.

What data can you get?

Since November 2017, the API only provides access to the latest 90 days of OpenAQ data. The complete OpenAQ data set can be accessed via Amazon S3. See this announcement. You can interact with Amazon S3 using the aws.s3 package. The maintainer of ropenaq plans to write tutorials about how to access OpenAQ data and will keep the documentation of ropenaq up to date regarding changes in data access.

Finding measurements availability

Three functions of the package let you list the available information. The data are organised hierarchically: measurements come from locations, locations belong to cities, and cities belong to countries.

The aq_countries function

The aq_countries function lists the countries for which information is available on the platform. It is the simplest function because it takes no arguments. Each country is identified by its ISO 3166-1 alpha-2 code.

library("ropenaq")
countries_table <- aq_countries()
library("knitr")
kable(countries_table)
name code cities locations count
Andorra AD 2 3 16346
Argentina AR 1 4 14976
Australia AU 18 99 3421950
Austria AT 16 306 1521351
Bahrain BH 1 1 14724
Bangladesh BD 1 2 16523
Belgium BE 14 191 1280070
Bosnia and Herzegovina BA 8 17 715247
Brazil BR 72 119 2812094
Canada CA 11 165 2174471
Chile CL 138 113 4337881
China CN 21 74 547807
Colombia CO 1 1 15327
Croatia HR 16 49 260470
Czech Republic CZ 15 200 1344886
Denmark DK 7 25 187282
Ethiopia ET 1 2 21427
Finland FI 35 107 589845
France FR 134 1171 6743082
Germany DE 36 1026 7118364
Ghana GH 1 11 1595
Gibraltar GI 2 6 36099
Hong Kong HK 9 16 84882
Hungary HU 14 50 480631
India IN 62 171 7276092
Indonesia ID 2 3 37550
Ireland IE 11 26 90345
Israel IL 14 137 62225518
Italy IT 45 104 579467
Kosovo XK 1 1 14826
Kuwait KW 1 1 7251
Latvia LV 4 4 35927
Lithuania LT 8 17 100796
Luxembourg LU 3 7 73071
Macedonia, the Former Yugoslav Republic of MK 16 30 344980
Malta MT 4 4 46226
Mexico MX 5 95 1826197
Mongolia MN 25 40 2147519
Nepal NP 1 4 26318
Netherlands NL 68 112 5195292
Nigeria NG 1 1 2541
Norway NO 32 70 1155546
Peru PE 1 19 437327
Philippines PH 1 1 958
Poland PL 10 16 547921
Portugal PT 15 64 197578
Russian Federation RU 1 49 187117
Serbia RS 4 5 15823
Singapore SG 1 1 1275
Slovakia SK 8 38 385334
Slovenia SI 8 8 27183
South Africa ZA 1 11 189963
Spain ES 115 1066 8242089
Sri Lanka LK 1 1 2687
Sweden SE 3 13 203211
Switzerland CH 14 25 267841
Taiwan, Province of China TW 30 77 2938513
Thailand TH 33 64 2700224
Turkey TR 40 142 3900703
Uganda UG 1 1 7275
United Arab Emirates AE 1 1 1121
United Kingdom GB 112 162 5332550
United States US 747 1946 28129949
Viet Nam VN 2 3 34345
attr(countries_table, "meta")
#> # A tibble: 1 x 6
#>   name       license   website                   page limit found
#>   <fct>      <fct>     <fct>                    <int> <int> <int>
#> 1 openaq-api CC BY 4.0 https://docs.openaq.org/     1 10000    64
attr(countries_table, "timestamp")
#> # A tibble: 1 x 1
#>   queriedAt          
#>   <dttm>             
#> 1 2018-02-28 20:39:51
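Because the result of aq_countries is an ordinary data frame, standard R tools can summarise it. The sketch below is illustrative only: so that it runs offline, it rebuilds five rows of the table above by hand (in practice you would start from countries_table), computes the average number of measurements per location, and sorts by total count. The column name per_location is a hypothetical addition; dplyr verbs such as arrange() would work equally well.

```r
# Five rows copied by hand from the aq_countries() output above, so the
# sketch runs offline; in practice you would start from countries_table.
countries <- data.frame(
  name = c("Germany", "India", "Israel", "Spain", "United States"),
  code = c("DE", "IN", "IL", "ES", "US"),
  locations = c(1026, 171, 137, 1066, 1946),
  count = c(7118364, 7276092, 62225518, 8242089, 28129949),
  stringsAsFactors = FALSE
)

# Hypothetical column: average number of measurements per location,
# rounded to a whole number.
countries$per_location <- round(countries$count / countries$locations)

# Sort by total number of measurements, largest first.
ranked <- countries[order(countries$count, decreasing = TRUE), ]
ranked[, c("code", "count", "per_location")]
```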

The aq_cities function

The aq_cities function returns all cities for which information is available on the platform. For each city, it gives the number of locations, the number of measurements, the URL-encoded city name, and the country the city is in.

cities_table <- aq_cities()
kable(head(cities_table))
city country locations count cityURL
Escaldes-Engordany AD 2 16032 Escaldes-Engordany
unused AD 1 314 unused
Abu Dhabi AE 1 1121 Abu+Dhabi
Buenos Aires AR 4 14976 Buenos+Aires
Austria AT 174 121987 Austria
Amt der Niedersterreichischen Landesregierung AT 39 322499 Amt+der+Nieder%EF%BF%BDsterreichischen+Landesregierung

The optional country argument restricts the query to a given country instead of the whole world.

cities_tableIndia <- aq_cities(country="IN", limit = 10, page = 1)
kable(cities_tableIndia)
city country locations count cityURL
Delhi IN 35 1181981 Delhi
Rohtak IN 1 98711 Rohtak
Hyderabad IN 15 486472 Hyderabad
Gurgaon IN 1 152863 Gurgaon
Jaipur IN 6 202925 Jaipur
Nagpur IN 5 74415 Nagpur
Ahmedabad IN 2 62603 Ahmedabad
Pali IN 2 24504 Pali
Kolkata IN 7 169391 Kolkata
Visakhapatnam IN 8 212254 Visakhapatnam

If you enter a country that is not on the platform (or misspell a code), an error is thrown.

tryCatch(aq_cities(country = "PANEM"), error = function(e) conditionMessage(e))

The aq_locations function

The aq_locations function has far more arguments than the first two functions. You can filter locations by country, city, location, or parameter (valid values are “pm25”, “pm10”, “so2”, “no2”, “o3”, “co” and “bc”), from and/or up to a given date, for values between a minimum and a maximum, and within a circle around a central point using the latitude, longitude and radius arguments. The output table also contains URL-encoded strings for the city and the location. Below is an example.

Here we only look for locations with PM2.5 information in Chennai, India.

locations_chennai <- aq_locations(country = "IN", city = "Chennai", parameter = "pm25")
kable(locations_chennai)
location city country count sourceNames lastUpdated firstUpdated sourceName latitude longitude pm25 pm10 no2 so2 o3 co bc cityURL locationURL
Alandur Bus Depot Chennai IN 13224 CPCB 1519271100 1487450700 CPCB 12.99711 80.19151 TRUE FALSE FALSE FALSE FALSE FALSE FALSE Chennai Alandur+Bus+Depot
IIT Chennai IN 16204 CPCB 1519271100 1487442600 CPCB 12.99251 80.23745 TRUE FALSE FALSE FALSE FALSE FALSE FALSE Chennai IIT
Manali Chennai IN 19515 CPCB 1519271100 1487452500 CPCB 13.16454 80.26285 TRUE FALSE FALSE FALSE FALSE FALSE FALSE Chennai Manali
US Diplomatic Post: Chennai Chennai IN 17531 StateAir_Chennai 1519846200 1449869400 StateAir_Chennai 13.08784 80.27847 TRUE FALSE FALSE FALSE FALSE FALSE FALSE Chennai US+Diplomatic+Post%3A+Chennai
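The lastUpdated and firstUpdated columns are printed as large integers. Assuming they are Unix timestamps (seconds since 1970-01-01 UTC), which is consistent with the dates elsewhere in this vignette, they can be converted to readable dates. The helper name to_utc is hypothetical.

```r
# lastUpdated and firstUpdated values copied from the Chennai table above.
# Assumption: they are Unix timestamps (seconds since 1970-01-01 UTC).
last_updated <- c(1519271100, 1519846200)
first_updated <- 1449869400

# Hypothetical helper: convert seconds since the epoch to a UTC date-time.
to_utc <- function(seconds) {
  as.POSIXct(seconds, origin = "1970-01-01", tz = "UTC")
}

format(to_utc(last_updated), "%Y-%m-%d %H:%M:%S")
format(to_utc(first_updated), "%Y-%m-%d %H:%M:%S")
```

Under this reading, the three CPCB stations were last updated at 2018-02-22 03:45:00 UTC and the US Diplomatic Post at 2018-02-28 19:30:00 UTC.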
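The latitude, longitude and radius arguments select locations within a circle around a central point, and that filtering happens on the OpenAQ server. To illustrate what such a circle covers, the sketch below computes great-circle (haversine) distances locally from the coordinates in the Chennai table above and keeps the stations within 10 km of the US Diplomatic Post. The spherical-Earth model, the 10 km radius and the function name haversine_km are assumptions for illustration; the server may compute distances differently.

```r
# Great-circle (haversine) distance in kilometres, assuming a spherical
# Earth with mean radius 6371 km.
haversine_km <- function(lat1, lon1, lat2, lon2) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * 6371 * asin(sqrt(a))
}

# Coordinates copied from the Chennai locations table above.
stations <- data.frame(
  location = c("Alandur Bus Depot", "IIT", "Manali",
               "US Diplomatic Post: Chennai"),
  latitude = c(12.99711, 12.99251, 13.16454, 13.08784),
  longitude = c(80.19151, 80.23745, 80.26285, 80.27847),
  stringsAsFactors = FALSE
)

# Centre the circle on the US Diplomatic Post and use a 10 km radius.
center_lat <- 13.08784
center_lon <- 80.27847
stations$distance_km <- haversine_km(center_lat, center_lon,
                                     stations$latitude, stations$longitude)
stations[stations$distance_km <= 10, "location"]
```

The corresponding server-side query would be something like aq_locations(latitude = 13.08784, longitude = 80.27847, radius = 10000), assuming, as in the OpenAQ API documentation, that the radius is given in meters.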

Getting measurements

Two functions return measurement data: aq_measurements and aq_latest. In both, the city and location arguments need to be given as URL-encoded strings.
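The cityURL and locationURL columns shown above replace spaces with + and percent-encode characters such as : and ,. A minimal sketch of a helper that reproduces this encoding with base R's utils::URLencode follows; the name to_openaq_url is hypothetical, and the encoding rules are inferred from the example columns rather than taken from the package.

```r
# Hypothetical helper (not part of ropenaq): encode a city or location name
# the way the cityURL/locationURL columns above appear to be encoded.
to_openaq_url <- function(x) {
  vapply(x, function(s) {
    # reserved = TRUE also percent-encodes characters such as ":" and ","
    encoded <- utils::URLencode(s, reserved = TRUE)
    # The URL columns use "+" rather than "%20" for spaces
    gsub("%20", "+", encoded, fixed = TRUE)
  }, character(1), USE.NAMES = FALSE)
}

to_openaq_url(c("Buenos Aires", "US Diplomatic Post: Chennai",
                "Bollaram Industrial Area, Hyderabad - TSPCB"))
```

In practice, the simplest option is often to copy the value from the cityURL or locationURL column returned by aq_cities or aq_locations.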

The aq_measurements function

The aq_measurements function has many arguments for making a query specific to, say, a given parameter at a given location, or to a circle around a central point defined by the latitude, longitude and radius arguments. Below we get the PM2.5 measurements for Delhi, India.

results_table <- aq_measurements(country = "IN", city = "Delhi", parameter = "pm25", limit = 10, page = 1)
kable(results_table)
location parameter value unit country city latitude longitude dateUTC dateLocal cityURL locationURL
US Diplomatic Post: New Delhi pm25 108 µg/m³ IN Delhi 28.63576 77.22445 2018-02-28 20:30:00 2018-03-01 02:00:00 Delhi US+Diplomatic+Post%3A+New+Delhi
US Diplomatic Post: New Delhi pm25 107 µg/m³ IN Delhi 28.63576 77.22445 2018-02-28 19:30:00 2018-03-01 01:00:00 Delhi US+Diplomatic+Post%3A+New+Delhi
US Diplomatic Post: New Delhi pm25 108 µg/m³ IN Delhi 28.63576 77.22445 2018-02-28 18:30:00 2018-03-01 00:00:00 Delhi US+Diplomatic+Post%3A+New+Delhi
US Diplomatic Post: New Delhi pm25 102 µg/m³ IN Delhi 28.63576 77.22445 2018-02-28 17:30:00 2018-02-28 23:00:00 Delhi US+Diplomatic+Post%3A+New+Delhi
US Diplomatic Post: New Delhi pm25 88 µg/m³ IN Delhi 28.63576 77.22445 2018-02-28 16:30:00 2018-02-28 22:00:00 Delhi US+Diplomatic+Post%3A+New+Delhi
US Diplomatic Post: New Delhi pm25 74 µg/m³ IN Delhi 28.63576 77.22445 2018-02-28 15:30:00 2018-02-28 21:00:00 Delhi US+Diplomatic+Post%3A+New+Delhi
US Diplomatic Post: New Delhi pm25 69 µg/m³ IN Delhi 28.63576 77.22445 2018-02-28 14:30:00 2018-02-28 20:00:00 Delhi US+Diplomatic+Post%3A+New+Delhi
US Diplomatic Post: New Delhi pm25 68 µg/m³ IN Delhi 28.63576 77.22445 2018-02-28 13:30:00 2018-02-28 19:00:00 Delhi US+Diplomatic+Post%3A+New+Delhi
US Diplomatic Post: New Delhi pm25 62 µg/m³ IN Delhi 28.63576 77.22445 2018-02-28 12:30:00 2018-02-28 18:00:00 Delhi US+Diplomatic+Post%3A+New+Delhi
US Diplomatic Post: New Delhi pm25 75 µg/m³ IN Delhi 28.63576 77.22445 2018-02-28 11:30:00 2018-02-28 17:00:00 Delhi US+Diplomatic+Post%3A+New+Delhi
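In the table above, dateLocal is 5 hours 30 minutes ahead of dateUTC, which matches India Standard Time (UTC+05:30). The sketch below copies the ten values by hand so it runs offline, reconstructs the local times with the IANA time zone Asia/Kolkata, and computes simple summary statistics. With a live result you would use results_table$value and results_table$dateUTC instead, assuming dateUTC is returned as a date-time.

```r
# The ten PM2.5 values copied from the Delhi table above (µg/m³).
pm25 <- c(108, 107, 108, 102, 88, 74, 69, 68, 62, 75)

# The UTC timestamps are hourly, from 20:30 back to 11:30 on 2018-02-28.
date_utc <- as.POSIXct("2018-02-28 20:30:00", tz = "UTC") - 3600 * (0:9)

# India Standard Time (Asia/Kolkata) is UTC+05:30, the offset between the
# dateUTC and dateLocal columns.
date_local <- format(date_utc, "%Y-%m-%d %H:%M:%S", tz = "Asia/Kolkata")

summary_stats <- c(mean = mean(pm25), max = max(pm25), min = min(pm25))
summary_stats
```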

Omitting the parameter argument returns all available parameters in the same table.

The aq_latest function

This function returns a table with the latest measurements for the locations selected by the arguments. If all arguments are NULL, it returns the latest measurements for all locations. Below are the latest values for Hyderabad at the time this vignette was compiled.

tableLatest <- aq_latest(country="IN", city="Hyderabad")
kable(head(tableLatest))
location city country distance latitude longitude parameter value lastUpdated unit sourceName averagingPeriod_value averagingPeriod_unit cityURL locationURL
Bollaram Industrial Area Hyderabad IN NA NA NA co 420.0 2017-02-17 05:15:00 µg/m³ CPCB 0.25 hours Hyderabad Bollaram+Industrial+Area
Bollaram Industrial Area Hyderabad IN NA NA NA pm25 55.0 2017-02-17 05:15:00 µg/m³ CPCB 0.25 hours Hyderabad Bollaram+Industrial+Area
Bollaram Industrial Area Hyderabad IN NA NA NA pm10 137.0 2017-02-17 05:15:00 µg/m³ CPCB 0.25 hours Hyderabad Bollaram+Industrial+Area
Bollaram Industrial Area Hyderabad IN NA NA NA no2 16.2 2017-02-17 05:15:00 µg/m³ CPCB 0.25 hours Hyderabad Bollaram+Industrial+Area
Bollaram Industrial Area Hyderabad IN NA NA NA so2 16.8 2017-02-17 05:15:00 µg/m³ CPCB 0.25 hours Hyderabad Bollaram+Industrial+Area
Bollaram Industrial Area, Hyderabad - TSPCB Hyderabad IN NA NA NA pm25 72.0 2018-02-22 03:15:00 µg/m³ CPCB 0.25 hours Hyderabad Bollaram+Industrial+Area%2C+Hyderabad+-+TSPCB
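The table shows that a "latest" value is not necessarily recent: the first location last reported in February 2017. A minimal sketch for keeping only locations updated within a chosen window follows. It uses one row per location copied from the table above, takes the queriedAt timestamp printed for aq_countries as an approximate query time, assumes lastUpdated is in UTC, and uses an arbitrary 30-day window.

```r
# One row per location, with lastUpdated copied from the Hyderabad table
# above (all parameters of a location share the same timestamp there).
# Assumption: the timestamps are in UTC.
latest <- data.frame(
  location = c("Bollaram Industrial Area",
               "Bollaram Industrial Area, Hyderabad - TSPCB"),
  lastUpdated = as.POSIXct(c("2017-02-17 05:15:00", "2018-02-22 03:15:00"),
                           tz = "UTC"),
  stringsAsFactors = FALSE
)

# Approximate query time (the queriedAt shown for aq_countries) and an
# arbitrary, illustrative 30-day freshness window.
queried_at <- as.POSIXct("2018-02-28 20:39:51", tz = "UTC")
age_days <- as.numeric(difftime(queried_at, latest$lastUpdated, units = "days"))
fresh <- latest[age_days <= 30, ]
fresh$location
```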

Paging and limit

All endpoints/functions have limit and page arguments, which indicate, respectively, how many results per page should be returned and which page should be queried. If you don’t specify them, all results for the query are retrieved by default using asynchronous requests, which can nonetheless take a while depending on the total number of results.

aq_measurements(city = "Delhi", parameter = "pm25")
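If you prefer to control the paging yourself, for example to fetch pages one at a time, you can loop over pages explicitly. The sketch below relies on the page, limit and found columns of the "meta" attribute shown for aq_countries; the helper fetch_all_pages is hypothetical, and a fake fetcher stands in for the API so the example runs offline.

```r
# Hypothetical helper: fetch every page of a query one page at a time.
# `fetch_page` takes a page number and returns a data frame whose "meta"
# attribute has `found` (total results) and `limit` (results per page).
fetch_all_pages <- function(fetch_page) {
  first <- fetch_page(1)
  meta <- attr(first, "meta")
  n_pages <- ceiling(meta$found / meta$limit)
  if (n_pages <= 1) {
    return(first)
  }
  rest <- lapply(2:n_pages, fetch_page)
  do.call(rbind, c(list(first), rest))
}

# Offline stand-in for a real query: 5 results served 2 per page.
fake_fetch <- function(page) {
  ids <- seq(2 * page - 1, min(2 * page, 5))
  out <- data.frame(id = ids)
  attr(out, "meta") <- data.frame(page = page, limit = 2, found = 5)
  out
}

all_results <- fetch_all_pages(fake_fetch)
```

With the package, the fetcher could be function(p) aq_measurements(city = "Delhi", parameter = "pm25", limit = 1000, page = p); each page is a separate request and counts against the rate limit.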

Rate limiting

In October 2017, the API introduced a rate limit of 2,000 requests every 5 minutes, so keep this in mind when running many queries. If a request receives a response with status 429 (Too Many Requests), the package waits 5 minutes.
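The package handles status 429 itself. For readers calling the API directly, a minimal sketch of the same wait-and-retry idea follows. The request function returning a list with a status element, the helper name retry_on_429 and the three-attempt cap are all assumptions; sleep is a parameter so the logic can be tested without waiting. The limit also implies an average budget of 300 / 2000 = 0.15 seconds per request.

```r
# Hypothetical helper: retry a request when the server answers
# 429 (Too Many Requests), pausing between attempts.
# Assumption: `request` returns a list with a `status` element.
# `sleep` is injectable so the logic can be tested without waiting.
retry_on_429 <- function(request, max_tries = 3, wait_seconds = 5 * 60,
                         sleep = Sys.sleep) {
  for (attempt in seq_len(max_tries)) {
    response <- request()
    if (response$status != 429) {
      return(response)
    }
    # Wait before the next attempt, but not after the last one
    if (attempt < max_tries) sleep(wait_seconds)
  }
  stop("Still rate limited after ", max_tries, " attempts")
}

# Average spacing implied by 2,000 requests per 300 seconds.
min_spacing_seconds <- 300 / 2000
```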

Other packages of interest for getting air quality data