Tidygeocoder provides a unified interface for performing both forward and reverse geocoding queries with a variety of geocoder services. In forward geocoding you provide an address to the geocoder service and you get latitude and longitude coordinates in return. In reverse geocoding you provide the latitude and longitude and the geocoder service will return that location’s address. In both cases, other data about the location can be provided by the geocoder service.
The `geocode()` and `geo()` functions are for forward geocoding while the `reverse_geocode()` and `reverse_geo()` functions perform reverse geocoding. The `geocode()` and `reverse_geocode()` functions extract either addresses (forward geocoding) or coordinates (reverse geocoding) from the input dataframe and pass this data to the `geo()` and `reverse_geo()` functions respectively, which execute the geocoding queries. All extra arguments (`...`) given to `geocode()` are passed to `geo()`, and extra arguments given to `reverse_geocode()` are passed to `reverse_geo()`.
library(tibble)
library(dplyr)
library(tidygeocoder)
address_single <- tibble(singlelineaddress = c(
  "11 Wall St, NY, NY",
  "600 Peachtree Street NE, Atlanta, Georgia"
))
address_components <- tribble(
  ~street, ~cty, ~st,
  "11 Wall St", "NY", "NY",
  "600 Peachtree Street NE", "Atlanta", "GA"
)
You can use the `address` argument to specify single-line addresses. Note that when multiple addresses are provided, the batch geocoding functionality of the Census geocoder service is used. Additionally, `verbose = TRUE` displays logs to the console.
census_s1 <- address_single %>%
  geocode(address = singlelineaddress, method = "census", verbose = TRUE)
#> Number of Unique Addresses: 2
#> Executing batch geocoding...
#> Batch limit: 10,000
#> Passing 2 addresses to the census batch geocoder
#> Querying API URL: https://geocoding.geo.census.gov/geocoder/locations/addressbatch
#> Passing the following parameters to the API:
#> format : "json"
#> benchmark : "Public_AR_Current"
#> vintage : "Current_Current"
#> Query completed in: 0.9 seconds
#>
singlelineaddress | lat | long |
---|---|---|
11 Wall St, NY, NY | 40.70747 | -74.01122 |
600 Peachtree Street NE, Atlanta, Georgia | 33.77085 | -84.38505 |
Alternatively, you can run the same query with the `geo()` function by passing the address values from the dataframe directly. In either `geo()` or `geocode()`, the `lat` and `long` arguments are used to name the resulting latitude and longitude fields. Here the `method` argument is used to specify the "osm" (Nominatim) geocoder service. Refer to the `geo()` function documentation for the possible values of the `method` argument.
osm_s1 <- geo(
  address = address_single$singlelineaddress, method = "osm",
  lat = latitude, long = longitude
)
address | latitude | longitude |
---|---|---|
11 Wall St, NY, NY | 40.70707 | -74.01117 |
600 Peachtree Street NE, Atlanta, Georgia | 33.77086 | -84.38614 |
Instead of single-line addresses, you can use any combination of the following arguments to specify your addresses: `street`, `city`, `state`, `county`, `postalcode`, and `country`.
census_c1 <- address_components %>%
  geocode(street = street, city = cty, state = st, method = "census")
street | cty | st | lat | long |
---|---|---|---|---|
11 Wall St | NY | NY | 40.70747 | -74.01122 |
600 Peachtree Street NE | Atlanta | GA | 33.77085 | -84.38505 |
The `cascade` method first tries one geocoder service and then uses a second geocoder service to attempt any addresses that were not found. By default it first uses the Census Geocoder and then OSM, but you can specify any two methods you want (in order) with the `cascade_order` argument.
addr_comp1 <- address_components %>%
  bind_rows(
    tibble(
      cty = c("Toronto", "Tokyo"),
      country = c("Canada", "Japan")
    )
  )
cascade1 <- addr_comp1 %>% geocode(
  street = street, state = st, city = cty,
  country = country, method = "cascade"
)
street | cty | st | country | lat | long | geo_method |
---|---|---|---|---|---|---|
11 Wall St | NY | NY | NA | 40.70747 | -74.01122 | census |
600 Peachtree Street NE | Atlanta | GA | NA | 33.77085 | -84.38505 | census |
NA | Toronto | NA | Canada | 43.64655 | -79.41953 | osm |
NA | Tokyo | NA | Japan | 35.65870 | 139.40728 | osm |
To return the full geocoder service results (not just latitude and longitude), specify `full_results = TRUE`. Additionally, for the Census geocoder you can get fields for geographies such as Census tracts by specifying `return_type = 'geographies'`. Be sure to use `full_results = TRUE` with `return_type = 'geographies'` in order to allow the Census geography columns to be returned.
census_full1 <- address_single %>% geocode(
  address = singlelineaddress,
  method = "census", full_results = TRUE, return_type = "geographies"
)
singlelineaddress | lat | long | id | input_address | match_indicator | match_type | matched_address | tiger_line_id | tiger_side | state_fips | county_fips | census_tract | census_block |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
11 Wall St, NY, NY | 40.70747 | -74.01122 | 1 | 11 Wall St, NY, NY, , , | Match | Exact | 11 WALL ST, NEW YORK, NY, 10005 | 59659656 | R | 36 | 061 | 000700 | 1004 |
600 Peachtree Street NE, Atlanta, Georgia | 33.77085 | -84.38505 | 2 | 600 Peachtree Street NE, Atlanta, Georgia, , , | Match | Non_Exact | 600 PEACHTREE ST, ATLANTA, GA, 30308 | 17343689 | L | 13 | 121 | 001902 | 2003 |
As mentioned earlier, the `geocode()` function passes addresses in dataframes to the `geo()` function for geocoding, so we can also use the `geo()` function directly in a similar way:
salz <- geo("Salzburg, Austria", method = "osm", full_results = TRUE) %>%
  select(-licence)
address | lat | long | place_id | osm_type | osm_id | boundingbox | display_name | class | type | importance | icon |
---|---|---|---|---|---|---|---|---|---|---|---|
Salzburg, Austria | 47.79813 | 13.04648 | 257918086 | relation | 86538 | 47.7512115, 47.8543925, 12.9856478, 13.1272842 | Salzburg, 5020, Österreich | boundary | administrative | 0.6854709 | https://nominatim.openstreetmap.org/ui/mapicons//poi_boundary_administrative.p.20.png |
For reverse geocoding you'll use `reverse_geocode()` instead of `geocode()` and `reverse_geo()` instead of `geo()`. Note that the reverse geocoding functions are structured very similarly to the forward geocoding functions and share many of the same arguments (`method`, `limit`, `full_results`, etc.). For reverse geocoding you will provide latitude and longitude coordinates as inputs and the location's address will be returned by the geocoder service.
Below, the `reverse_geocode()` function is used to reverse geocode coordinates in a dataframe. The `lat` and `long` arguments specify the columns that contain the latitude and longitude data. The `address` argument can be used to name the single-line address column that is returned from the geocoder. Just as with forward geocoding, the `method` argument is used to specify the geocoder service.
lat_longs1 <- tibble(
  latitude = c(38.895865, 43.6534817),
  longitude = c(-77.0307713, -79.3839347)
)
rev1 <- lat_longs1 %>%
  reverse_geocode(lat = latitude, long = longitude, address = addr, method = "osm")
latitude | longitude | addr |
---|---|---|
38.89587 | -77.03077 | Freedom Plaza, 1455, Pennsylvania Avenue Northwest, Penn Quarter, Washington, District of Columbia, 20004, United States |
43.65348 | -79.38393 | Toronto City Hall, 100, Queen Street West, Financial District, Spadina—Fort York, Old Toronto, Toronto, Golden Horseshoe, Ontario, M5H 2N2, Canada |
The same query can also be performed by passing the latitudes and longitudes directly to the `reverse_geo()` function. Here we will use `full_results = TRUE` so that the full results are returned (not just the single-line address column).
rev2 <- reverse_geo(
  lat = lat_longs1$latitude,
  long = lat_longs1$longitude,
  method = "osm",
  full_results = TRUE
)
glimpse(rev2)
#> Rows: 2
#> Columns: 22
#> $ lat <dbl> 38.89587, 43.65348
#> $ long <dbl> -77.03077, -79.38393
#> $ address <chr> "Freedom Plaza, 1455, Pennsylvania Avenue Northwest, P…
#> $ place_id <int> 259487183, 137497520
#> $ licence <chr> "Data © OpenStreetMap contributors, ODbL 1.0. https://…
#> $ osm_type <chr> "relation", "way"
#> $ osm_id <int> 8060882, 198500761
#> $ osm_lat <chr> "38.895849999999996", "43.6536032"
#> $ osm_lon <chr> "-77.03077367444483", "-79.38400547469666"
#> $ tourism <chr> "Freedom Plaza", NA
#> $ house_number <chr> "1455", "100"
#> $ road <chr> "Pennsylvania Avenue Northwest", "Queen Street West"
#> $ quarter <chr> "Penn Quarter", "Spadina—Fort York"
#> $ city <chr> "Washington", "Old Toronto"
#> $ state <chr> "District of Columbia", "Ontario"
#> $ postcode <chr> "20004", "M5H 2N2"
#> $ country <chr> "United States", "Canada"
#> $ country_code <chr> "us", "ca"
#> $ boundingbox <list> [<"38.8956276", "38.896068", "-77.03182", "-77.029727…
#> $ amenity <chr> NA, "Toronto City Hall"
#> $ neighbourhood <chr> NA, "Financial District"
#> $ state_district <chr> NA, "Golden Horseshoe"
Only unique input data (either addresses or coordinates) is passed to geocoder services even if your data contains duplicates. NA and blank inputs are excluded from queries. Input latitudes and longitudes are also limited to the range of possible values. Below is an example of how duplicate and missing data is handled by tidygeocoder. As the console messages show, only the two unique addresses are passed to the geocoder service.
# create a dataset with duplicate and NA addresses
duplicate_addrs <- address_single %>%
  bind_rows(address_single) %>%
  bind_rows(tibble(singlelineaddress = rep(NA, 3)))
duplicates_geocoded <- duplicate_addrs %>%
  geocode(singlelineaddress, verbose = TRUE)
#> Number of Unique Addresses: 2
#> Executing single address geocoding...
#> Number of Unique Addresses: 1
#> Querying API URL: https://nominatim.openstreetmap.org/search
#> Passing the following parameters to the API:
#> q : "11 Wall St, NY, NY"
#> limit : "1"
#> format : "json"
#> HTTP Status Code: 200
#> Query completed in: 0.2 seconds
#> Total query time (including sleep): 1 seconds
#>
#> Number of Unique Addresses: 1
#> Querying API URL: https://nominatim.openstreetmap.org/search
#> Passing the following parameters to the API:
#> q : "600 Peachtree Street NE, Atlanta, Georgia"
#> limit : "1"
#> format : "json"
#> HTTP Status Code: 200
#> Query completed in: 0.1 seconds
#> Total query time (including sleep): 1 seconds
#>
singlelineaddress | lat | long |
---|---|---|
11 Wall St, NY, NY | 40.70707 | -74.01117 |
600 Peachtree Street NE, Atlanta, Georgia | 33.77086 | -84.38614 |
11 Wall St, NY, NY | 40.70707 | -74.01117 |
600 Peachtree Street NE, Atlanta, Georgia | 33.77086 | -84.38614 |
NA | NA | NA |
NA | NA | NA |
NA | NA | NA |
As shown above, duplicates will not be removed from your results by default. However, you can return only unique results by using `unique_only = TRUE`. Note that passing `unique_only = TRUE` to `geocode()` or `reverse_geocode()` will result in the original dataframe format (including column names) being discarded in favor of the standard field names (i.e. "address", "lat", "long", etc.).
uniqueonly1 <- duplicate_addrs %>%
  geocode(singlelineaddress, unique_only = TRUE)
address | lat | long |
---|---|---|
11 Wall St, NY, NY | 40.70707 | -74.01117 |
600 Peachtree Street NE, Atlanta, Georgia | 33.77086 | -84.38614 |
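The same option applies to reverse geocoding. The following is a minimal sketch, assuming a hypothetical `dup_coords` tibble in which one coordinate pair is repeated. `no_query = TRUE` is set so that nothing is sent to a geocoder service, which means the returned address values are NA; remove it to run a real query.

```r
library(tibble)
library(dplyr)
library(tidygeocoder)

# hypothetical input: the first coordinate pair appears twice
dup_coords <- tibble(
  latitude = c(38.895865, 38.895865, 43.6534817),
  longitude = c(-77.0307713, -77.0307713, -79.3839347)
)

# unique_only = TRUE asks for unique results only;
# no_query = TRUE skips the API call, so no addresses are looked up
rev_unique <- dup_coords %>%
  reverse_geocode(
    lat = latitude, long = longitude, method = "osm",
    unique_only = TRUE, no_query = TRUE
  )
```

As with forward geocoding, the input column names (`latitude`, `longitude`) should not be expected in the output when `unique_only = TRUE` is used.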
The `limit` argument can be specified to allow multiple results (rows) per input if available. The maximum value for the `limit` argument is often 100 for geocoder services. To use the default `limit` value for the selected geocoder service you can use `limit = NULL`, which will prevent the limit parameter from being included in the query.
geo_limit <- geo(
  c("Lima, Peru", "Cairo, Egypt"),
  method = "osm",
  limit = 3, full_results = TRUE
)
glimpse(geo_limit)
#> Rows: 5
#> Columns: 13
#> $ address <chr> "Lima, Peru", "Lima, Peru", "Lima, Peru", "Cairo, Egypt"…
#> $ lat <dbl> -12.06211, -12.20011, -11.99997, 30.04439, 30.03325
#> $ long <dbl> -77.03653, -76.28506, -76.83322, 31.23573, 31.56216
#> $ place_id <int> 258244064, 258655931, 258306819, 258758158, 298971141
#> $ licence <chr> "Data © OpenStreetMap contributors, ODbL 1.0. https://os…
#> $ osm_type <chr> "relation", "relation", "relation", "relation", "relatio…
#> $ osm_id <int> 1944756, 1944659, 1944670, 5466227, 4103336
#> $ boundingbox <list> [<"-12.0797663", "-12.0303496", "-77.0884555", "-77.001…
#> $ display_name <chr> "Lima, Perú", "Lima, Perú", "Lima, Perú", "القاهرة, محاف…
#> $ class <chr> "boundary", "boundary", "boundary", "place", "boundary"
#> $ type <chr> "administrative", "administrative", "administrative", "c…
#> $ importance <dbl> 0.7830015, 0.6119761, 0.5934835, 0.6960286, 0.4835559
#> $ icon <chr> "https://nominatim.openstreetmap.org/ui/mapicons//poi_bo…
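To see the effect of `limit = NULL` without contacting a service, the query can be run with `no_query = TRUE` and `verbose = TRUE`. This is only a sketch; the expectation, based on the description of `limit = NULL` above, is that no limit parameter appears among the parameters printed to the console.

```r
library(tidygeocoder)

# limit = NULL leaves the limit parameter out of the query so that the
# geocoder service's own default applies; no_query = TRUE prevents any
# request from being sent, so lat and long are returned as NA
geo_default_limit <- geo(
  "Lima, Peru",
  method = "osm",
  limit = NULL, no_query = TRUE, verbose = TRUE
)
```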
To pass specific API parameters directly for a given `method`, you can use the `custom_query` argument. For example, the Nominatim (OSM) geocoder has a `polygon_geojson` parameter that can be used to return GeoJSON geometry content. To pass this parameter you can supply it in a named list using the `custom_query` argument:
cairo_geo <- geo("Cairo, Egypt",
  method = "osm", full_results = TRUE,
  custom_query = list(polygon_geojson = 1), verbose = TRUE
)
#> Number of Unique Addresses: 1
#> Querying API URL: https://nominatim.openstreetmap.org/search
#> Passing the following parameters to the API:
#> q : "Cairo, Egypt"
#> limit : "1"
#> polygon_geojson : "1"
#> format : "json"
#> HTTP Status Code: 200
#> Query completed in: 0.1 seconds
#> Total query time (including sleep): 1 seconds
#>
glimpse(cairo_geo)
#> Rows: 1
#> Columns: 15
#> $ address <chr> "Cairo, Egypt"
#> $ lat <dbl> 30.04439
#> $ long <dbl> 31.23573
#> $ place_id <int> 258758158
#> $ licence <chr> "Data © OpenStreetMap contributors, ODbL 1.0. htt…
#> $ osm_type <chr> "relation"
#> $ osm_id <int> 5466227
#> $ boundingbox <list> [<"29.7483062", "30.3209168", "31.2200331", "31.…
#> $ display_name <chr> "القاهرة, محافظة القاهرة, مصر"
#> $ class <chr> "place"
#> $ type <chr> "city"
#> $ importance <dbl> 0.6960286
#> $ icon <chr> "https://nominatim.openstreetmap.org/ui/mapicons/…
#> $ geojson.type <chr> "Polygon"
#> $ geojson.coordinates <list> [<array[1 x 119 x 2]>]
To test a query without sending any data to a geocoder service, you can use `no_query = TRUE` (NA results are returned).
noquery1 <- geo(c("Vancouver, Canada", "Las Vegas, NV"),
  no_query = TRUE,
  method = "arcgis"
)
#> Number of Unique Addresses: 2
#> Executing single address geocoding...
#> Number of Unique Addresses: 1
#> Querying API URL: https://geocode.arcgis.com/arcgis/rest/services/World/GeocodeServer/findAddressCandidates
#> Passing the following parameters to the API:
#> SingleLine : "Vancouver, Canada"
#> maxLocations : "1"
#> f : "json"
#> Number of Unique Addresses: 1
#> Querying API URL: https://geocode.arcgis.com/arcgis/rest/services/World/GeocodeServer/findAddressCandidates
#> Passing the following parameters to the API:
#> SingleLine : "Las Vegas, NV"
#> maxLocations : "1"
#> f : "json"
address | lat | long |
---|---|---|
Vancouver, Canada | NA | NA |
Las Vegas, NV | NA | NA |
Additional usage notes for the `geocode()`, `geo()`, `reverse_geocode()`, and `reverse_geo()` functions:
- The API URL can be customized with the `api_url` argument. Alternatively, the `iq_region` and `geocodio_v` arguments are helpers for customizing the API URL.
- The `min_time` argument will default to a value based on the maximum query rate of the given geocoder service. If you are using a local Nominatim server or have a commercial geocoder plan that has less restrictive usage limits then you can manually set `min_time` to a lower value (such as 0).
- The `mode` argument can be used to specify whether batch geocoding or single address/coordinate geocoding should be used. By default batch geocoding will be used if available when more than one address or coordinate is provided (with some noted exceptions for slower batch geocoder services).
- The `return_addresses` and `return_coords` parameters (for forward and reverse geocoding respectively) can be used to toggle whether the input addresses or coordinates are returned. Setting these parameters to `FALSE` is necessary to use batch geocoding if `limit` is greater than 1 or NULL.
- For the `reverse_geocode()` and `geocode()` functions, the `return_input` argument can be used to toggle whether the input dataset is included in the returned dataframe.
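Several of these arguments can be combined in one call. The sketch below uses a hypothetical `places` tibble and sets `no_query = TRUE` so that no request is sent; the particular values (`mode = "single"`, `min_time = 2`) are illustrative assumptions rather than recommendations.

```r
library(tibble)
library(dplyr)
library(tidygeocoder)

# hypothetical input data
places <- tibble(place = c("Vancouver, Canada", "Las Vegas, NV"))

notes_demo <- places %>%
  geocode(
    address = place, method = "osm",
    mode = "single",       # use single address geocoding rather than batch
    min_time = 2,          # minimum time per query, in seconds
    return_input = FALSE,  # do not include the input dataset in the result
    no_query = TRUE        # test run: nothing is sent to the service
  )
```

Dropping `no_query = TRUE` turns this into a live query against the Nominatim service, subject to that service's usage limits.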