Geocoding and reverse geocoding services are widely used to provide coordinate and location data, including longitude, latitude, formatted location names, and administrative regions at different levels. Several packages provide geocoding services, such as tidygeocoder, baidumap and baidugeo. However, some of them do not always return precise results for locations in China, and some have stopped working after backend API upgrades.
amapGeocode is built to provide high-precision geocoding and reverse geocoding, and it provides an interface to the AutoNavi (高德) Maps API geocoding services. API docs can be found here and here. The two main functions are getCoord(), which takes a character location name as input, and getLocation(), which takes numeric longitude and latitude values as inputs.
The getCoord() function extracts coordinate information from an input character location name and outputs the result as a data.table, XML, or JSON (as a list). The getLocation() function extracts location information from input numeric longitude and latitude values and outputs the result in the same formats. The data.table output is highly readable and can be used as an alternative to data.frame.
amapGeocode is inspired by baidumap and baidugeo. If you prefer the Baidu Map API, these packages are good choices.
However, AutoNavi offers significantly higher precision; in my case, the results from Baidu were unsatisfactory.
Since v0.5, parallel operation has finally come to amapGeocode, with the parallel package as the backend. It brings a very large performance improvement for batch queries. Here is a demo from my PC, which has the following specification.
- CPU: AMD Ryzen 3600 @ 3.6GHz (6 cores with 12 threads)
- RAM: 32GB DDR4 2933MHz
- System Disk: Sandisk Ultra 3D NVME aka. WD SN550 with 1TB
- OS: Windows 10 Pro @ Insider Fast Ring (Build 20257.1)
- Internet: CMCC @ 200Mbps from Chengdu, Sichuan, China
library(amapGeocode)
library(readr)
sample_site <- read_csv("https://gist.githubusercontent.com/womeimingzi11/0fa3f4744f3ebc0f4484a52649f556e5/raw/47a69157f3e26c4d3bc993f3715b9ba88cda9d93/sample_site.csv")
str(sample_site)
#> tibble [300 x 1] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
#> $ address: chr [1:300] "四川省 绵阳市 游仙区 四川省绵阳市游仙区" "四川省 自贡市 自流井区 自流井区五星街珍珠寺新森3组" "四川省 绵阳市 游仙区 游仙区" "四川省 眉山市 仁寿县 向家乡平安村1组" ...
#> - attr(*, "spec")=
#> .. cols(
#> .. address = col_character()
#> .. )
# Here is the old implementation
start_time <- proc.time()
old <- lapply(sample_site$address, amapGeocode:::getCoord.individual)
proc.time() - start_time
#> user system elapsed
#> 2.80 0.33 76.83
# Here is the new implementation
start_time <- proc.time()
new <- getCoord(sample_site$address)
proc.time() - start_time
#> user system elapsed
#> 0.03 0.12 8.09
Around 8 to 10 times faster with 300 records.
All you need to do is upgrade amapGeocode to the latest version, without changing any code!
Note that parallel performance depends entirely on the number of available threads, so the speed-up will differ from device to device.
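The speed-up can be checked from the two elapsed times reported above. This is only arithmetic on the benchmark figures from this one machine, not a general guarantee.

```r
# Elapsed wall-clock seconds copied from the two benchmark runs above
elapsed_old <- 76.83  # sequential lapply() over getCoord.individual
elapsed_new <- 8.09   # getCoord() with the parallel backend

speedup <- elapsed_old / elapsed_new
round(speedup, 1)
#> [1] 9.5
```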
You can install the released version of amapGeocode from CRAN with:
To install the development version, run the following command:
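A sketch of both installation routes. The CRAN call is the standard one; for the development version, the remotes package and the repository path womeimingzi11/amapGeocode are assumptions (the account name is taken from the sample-data URL above), so check them against the project page.

```r
# Released version from CRAN
install.packages("amapGeocode")

# Development version from GitHub
# (assumed installer and repository path, see the note above)
# install.packages("remotes")
remotes::install_github("womeimingzi11/amapGeocode")
```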
Before you start geocoding and reverse geocoding, please apply for an AutoNavi Map API key. Set amap_key globally with the following command:
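A minimal sketch of setting the key, assuming the package reads it from the R option named amap_key. The option name comes from the text above; the use of options() and the placeholder value are assumptions.

```r
# Store the key as a global R option for the current session.
# Replace the placeholder with the key issued by AutoNavi.
options(amap_key = "REPLACE_THIS_WITH_YOUR_KEY")

# Confirm that the option is set
getOption("amap_key")
```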
Then get the geocoding result with the getCoord function.
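A hedged sketch of the two calls whose output is shown below: one address, then a character vector of addresses. The input strings are assumptions inferred from the formatted_address column of the output, and the guard simply skips the requests when the package or the key is missing.

```r
# Example inputs (assumed; inferred from the output tables)
single_address <- "四川省中医院"
addresses <- c("四川省中医院", "四川省人民医院", "成都中医药大学十二桥校区")

# Only send requests when amapGeocode is installed and a key has been set
can_query <- requireNamespace("amapGeocode", quietly = TRUE) &&
  nzchar(getOption("amap_key", default = ""))

if (can_query) {
  res_one <- amapGeocode::getCoord(single_address)  # a single address
  res_many <- amapGeocode::getCoord(addresses)      # a vector of addresses
  print(res_many)
}
```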
lng | lat | formatted_address | country | province | city | district | township | street | number | citycode | adcode |
---|---|---|---|---|---|---|---|---|---|---|---|
104.0431 | 30.6678 | 四川省成都市金牛区四川省中医院 | 中国 | 四川省 | 成都市 | 金牛区 | NA | NA | NA | 028 | 510106 |
lng | lat | formatted_address | country | province | city | district | township | street | number | citycode | adcode |
---|---|---|---|---|---|---|---|---|---|---|---|
104.0431 | 30.66780 | 四川省成都市金牛区四川省中医院 | 中国 | 四川省 | 成都市 | 金牛区 | NA | NA | NA | 028 | 510106 |
104.0390 | 30.66362 | 四川省成都市青羊区四川省人民医院 | 中国 | 四川省 | 成都市 | 青羊区 | NA | NA | NA | 028 | 510105 |
104.0439 | 30.66629 | 四川省成都市金牛区成都中医药大学十二桥校区 | 中国 | 四川省 | 成都市 | 金牛区 | NA | NA | NA | 028 | 510106 |
The response from the AutoNavi Map API is JSON or XML. For readability, it is transformed into a data.table, because the to_table argument is TRUE by default.
If you want the response as JSON or XML, just set to_table = FALSE. The result can then be parsed further with extractCoord.
# An individual request
res <- getCoord("成都中医药大学", output = "XML", to_table = FALSE)
res
#> {xml_document}
#> <response>
#> [1] <status>1</status>
#> [2] <info>OK</info>
#> [3] <infocode>10000</infocode>
#> [4] <count>1</count>
#> [5] <geocodes type="list">\n <geocode>\n <formatted_address>四川省成都市金牛区成都中医 ...
extractCoord converts such a response into a data.table.
lng | lat | formatted_address | country | province | city | district | township | street | number | citycode | adcode |
---|---|---|---|---|---|---|---|---|---|---|---|
104.0433 | 30.66686 | 四川省成都市金牛区成都中医药大学 | 中国 | 四川省 | 成都市 | 金牛区 | NA | NA | NA | 028 | 510106 |
Get the result of reverse geocoding with the getLocation function.
formatted_address | country | province | city | district | township | citycode | towncode |
---|---|---|---|---|---|---|---|
四川省成都市金牛区西安路街道成都中医药大学附属医院腹泻门诊成都中医药大学(十二桥校区) | 中国 | 四川省 | 成都市 | 金牛区 | 西安路街道 | 028 | 510106024000 |
res <- getLocation(104.043284, 30.666864, output = "XML", to_table = FALSE)
res
#> {xml_document}
#> <response>
#> [1] <status>1</status>
#> [2] <info>OK</info>
#> [3] <infocode>10000</infocode>
#> [4] <regeocode>\n <formatted_address>四川省成都市金牛区西安路街道成都中医药大学附属医院腹泻门诊成都中医药大学(十二 ...
extractLocation converts such a response into a data.table.
formatted_address | country | province | city | district | township | citycode | towncode |
---|---|---|---|---|---|---|---|
四川省成都市金牛区西安路街道成都中医药大学附属医院腹泻门诊成都中医药大学(十二桥校区) | 中国 | 四川省 | 成都市 | 金牛区 | 西安路街道 | 028 | 510106024000 |
Get subordinate administrative regions with the getAdmin function.
getAdmin differs from the other functions: whether the to_table argument is TRUE or FALSE, the results are not joined into a single table across different parent administrative regions. For example, with to_table = TRUE, all the lower-level administrative regions of Province A are bound into one data.table, and those of Province B into another, but the two tables are not bound together.
This is because the function supports different administrative region levels, so binding their results together would be meaningless.
res <- getAdmin("四川省", output = "XML", to_table = FALSE)
res
#> {xml_document}
#> <response>
#> [1] <status>1</status>
#> [2] <info>OK</info>
#> [3] <infocode>10000</infocode>
#> [4] <count>1</count>
#> [5] <suggestion>\n <keywords type="list"/>\n <cities type="list"/>\n</sugge ...
#> [6] <districts type="list">\n <district>\n <citycode/>\n <adcode>51000 ...
extractAdmin converts such a response into a tibble.
res
#> {xml_document}
#> <response>
#> [1] <status>1</status>
#> [2] <info>OK</info>
#> [3] <infocode>10000</infocode>
#> [4] <count>1</count>
#> [5] <suggestion>\n <keywords type="list"/>\n <cities type="list"/>\n</sugge ...
#> [6] <districts type="list">\n <district>\n <citycode/>\n <adcode>51000 ...
tb <- extractAdmin(res)
knitr::kable(tb)
lng | lat | name | level | citycode | adcode |
---|---|---|---|---|---|
105.8298 | 32.43367 | 广元市 | city | 0839 | 510800 |
104.0657 | 30.65946 | 成都市 | city | 028 | 510100 |
104.7417 | 31.46402 | 绵阳市 | city | 0816 | 510700 |
106.6334 | 30.45640 | 广安市 | city | 0826 | 511600 |
104.3987 | 31.12799 | 德阳市 | city | 0838 | 510600 |
106.7537 | 31.85881 | 巴中市 | city | 0827 | 511900 |
106.0830 | 30.79528 | 南充市 | city | 0817 | 511300 |
104.7734 | 29.35277 | 自贡市 | city | 0813 | 510300 |
105.4433 | 28.88914 | 泸州市 | city | 0830 | 510500 |
104.6419 | 30.12221 | 资阳市 | city | 0832 | 512000 |
103.8318 | 30.04832 | 眉山市 | city | 1833 | 511400 |
103.7613 | 29.58202 | 乐山市 | city | 0833 | 511100 |
104.6308 | 28.76019 | 宜宾市 | city | 0831 | 511500 |
105.5713 | 30.51331 | 遂宁市 | city | 0825 | 510900 |
105.0661 | 29.58708 | 内江市 | city | 1832 | 511000 |
107.5023 | 31.20948 | 达州市 | city | 0818 | 511700 |
102.2587 | 27.88676 | 凉山彝族自治州 | city | 0834 | 513400 |
101.7160 | 26.58045 | 攀枝花市 | city | 0812 | 510400 |
103.0010 | 29.98772 | 雅安市 | city | 0835 | 511800 |
102.2214 | 31.89979 | 阿坝藏族羌族自治州 | city | 0837 | 513200 |
101.9638 | 30.05066 | 甘孜藏族自治州 | city | 0836 | 513300 |
Convert coordinates with the convertCoord function. Here is how to convert coordinates from GPS to the AutoNavi coordinate system.
Please note, this is still a very experimental function because I have no experience in converting coordinates. The implementation of the input method is not as refined as I would like. If you have any good ideas, please let me know, or just fork the repo and open a pull request.
lng | lat |
---|---|
116.4876 | 39.99175 |
res <- convertCoord("116.481499,39.990475", coordsys = "gps", to_table = FALSE)
res
#> $status
#> [1] "1"
#>
#> $info
#> [1] "ok"
#>
#> $infocode
#> [1] "10000"
#>
#> $locations
#> [1] "116.487585177952,39.991754014757"
extractConvertCoord converts such a response into a data.table.
lng | lat |
---|---|
116.4876 | 39.99175 |
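To make the relationship between the raw response and the table concrete, here is a minimal base R sketch that splits the locations string from the response above into numeric lng and lat columns. parse_locations() is a hypothetical helper written for this illustration; it is not the package's implementation of extractConvertCoord, and it returns a plain data.frame rather than a data.table.

```r
# Hypothetical helper: turn "lng,lat" strings into a two-column data.frame
parse_locations <- function(locations) {
  parts <- strsplit(locations, ",", fixed = TRUE)
  data.frame(
    lng = as.numeric(vapply(parts, `[`, character(1), 1)),
    lat = as.numeric(vapply(parts, `[`, character(1), 2))
  )
}

# The locations field copied from the response shown above
parse_locations("116.487585177952,39.991754014757")
```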
More functions and improvements are coming soon!
Yes! Feel free to pass a list, a vector, or a column of a table, just as you would in other packages.
Yes! The parallel operation is automatic.
However, more testing is still needed, so there may be some problems. Feel free to open an Issue!
Unfortunately, there is no plan to add internal parallel support to amapGeocode. Here are some reasons:
1. The aim of amapGeocode is to be a package that is easy to use. Parallel operation can indeed improve performance many times over, especially when there are half a million queries. However, parallel operation is often platform-limited, and I don’t have enough time or machines to test on different platforms. In fact, even on macOS, the system I’m relatively familiar with, I have already encountered a lot of weird parallel issues, and I have neither the intention nor the experience to fix them.
2. The query limits. For most free users or developers, the daily query limit and the queries-per-second limit are more than enough: 30,000 queries per day and 200 queries per second. With parallel operation, however, these limits are relatively easy to exceed. For paying developers, this may cause serious financial trouble.
So anybody who wants to send millions of requests with amapGeocode is welcome to implement the parallel operations manually.
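For readers who do want to parallelise manually, the following is a rough sketch using the parallel package. geocode_stub() is a hypothetical stand-in for a real request so that the example runs without a key or network access; the batch size, worker count and pause are illustrative assumptions, chosen only to show how batching can roughly keep a run under a per-second limit.

```r
library(parallel)

# Hypothetical stand-in for a real geocoding request (e.g. getCoord)
geocode_stub <- function(address) {
  data.frame(address = address, n_char = nchar(address))
}

# Process addresses in batches, pausing between batches so that the
# request rate stays roughly under a queries-per-second limit
geocode_in_batches <- function(addresses, cl, batch_size = 200, pause = 1) {
  batches <- split(addresses, ceiling(seq_along(addresses) / batch_size))
  results <- vector("list", length(batches))
  for (i in seq_along(batches)) {
    results[[i]] <- parLapply(cl, batches[[i]], geocode_stub)
    if (i < length(batches)) Sys.sleep(pause)
  }
  # Flatten the per-batch lists and bind all rows into one data.frame
  do.call(rbind, unlist(results, recursive = FALSE))
}

cl <- makeCluster(2)  # two worker processes
res <- geocode_in_batches(paste("site", 1:10), cl, batch_size = 4, pause = 0.1)
stopCluster(cl)
nrow(res)
```

With a real request function in place of the stub, the daily quota still applies, so the total number of addresses per run needs its own check.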
It’s very common for API upgrades to break downstream applications such as amapGeocode. Feel free to let me know once it’s broken, or just open an Issue.
The hex sticker was created with the hexSticker package, using world data from rnaturalearth.