`taxize` allows users to search over many taxonomic data sources for species names (scientific and common) and download upstream and downstream taxonomic hierarchical information, among other things.
The `taxize` tutorial can be found at http://ropensci.org/tutorials/taxize.html.
The functions in the package that hit a specific API have a prefix and suffix separated by an underscore. They follow the format of `service_whatitdoes`. For example, `gnr_resolve` uses the Global Names Resolver API to resolve species names. General functions in the package that don't hit a specific API don't have two words separated by an underscore, e.g., `classification`.
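As a rough illustration of the naming convention, the sketch below checks whether a function name starts with one of the service prefixes. The helper `service_of()` is hypothetical and not part of `taxize`; the prefix list is copied from the "Function prefix" column of the data source table in this README.

```r
# Service prefixes copied from the "Function prefix" column of the table.
known_prefixes <- c("eol", "tnrs", "itis", "gnr", "gni", "iucn", "tp", "tpl",
                    "col", "ncbi", "vascan", "ipni", "bold", "nbn", "fg",
                    "eubon", "ion")

# Hypothetical helper (not part of taxize): return the service prefix of a
# function name, or NA when the name does not start with a known prefix
# followed by an underscore.
service_of <- function(fn_name) {
  parts <- strsplit(fn_name, "_", fixed = TRUE)[[1]]
  if (length(parts) >= 2 && parts[1] %in% known_prefixes) {
    parts[1]
  } else {
    NA_character_
  }
}

service_of("gnr_resolve")     # "gnr"
service_of("tpl_get")         # "tpl"
service_of("classification")  # NA
```

Matching against a fixed list of prefixes, rather than splitting on any underscore, keeps names without a service prefix from being misread.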
You need API keys for Encyclopedia of Life (EOL) and Tropicos.
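One common way to keep keys out of scripts is to read them from environment variables. The sketch below is only an assumption about how you might organise this: `read_api_key()` and the variable name `MY_TROPICOS_KEY` are made up for illustration, and how a key is handed to `taxize` depends on the version you have installed.

```r
# Hypothetical helper (not part of taxize): read an API key from an
# environment variable and fail early with a clear message if it is missing.
read_api_key <- function(env_var) {
  key <- Sys.getenv(env_var, unset = "")
  if (!nzchar(key)) {
    stop("No API key found in environment variable ", env_var, call. = FALSE)
  }
  key
}

# "MY_TROPICOS_KEY" is an assumed variable name chosen for this sketch;
# the value is a placeholder, not a real key.
Sys.setenv(MY_TROPICOS_KEY = "placeholder-not-a-real-key")
key <- read_api_key("MY_TROPICOS_KEY")

# The key could then be passed on, assuming the function you call accepts
# a `key` argument in your installed version:
# tp_search(name = "Poa annua", key = key)
```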
Note that a few data sources require SOAP web services, which are difficult to support in R across all operating systems. So far these include the World Register of Marine Species, the Pan-European Species directories Infrastructure, and Mycobank. Data sources that use SOAP web services have been moved to a new package called `taxizesoap`. Find it at https://github.com/ropensci/taxizesoap.
taxize
Source | Function prefix | API Docs | API key |
---|---|---|---|
Encyclopedia of Life | `eol` | link | link |
Taxonomic Name Resolution Service | `tnrs` | "api.phylotastic.org/tnrs" | none |
Integrated Taxonomic Information Service | `itis` | link | none |
Global Names Resolver | `gnr` | link | none |
Global Names Index | `gni` | link | none |
IUCN Red List | `iucn` | link | none |
Tropicos | `tp` | link | link |
Theplantlist dot org | `tpl` | ** | none |
Catalogue of Life | `col` | link | none |
National Center for Biotechnology Information | `ncbi` | none | none |
CANADENSYS Vascan name search API | `vascan` | link | none |
International Plant Names Index (IPNI) | `ipni` | link | none |
Barcode of Life Data Systems (BOLD) | `bold` | link | none |
National Biodiversity Network (UK) | `nbn` | link | none |
Index Fungorum | `fg` | link | none |
EU BON | `eubon` | link | none |
Index of Names (ION) | `ion` | link | none |
**: There are none! We suggest using the `TPL` and `TPLck` functions in the Taxonstand package. We provide two functions to get bulk data: `tpl_families` and `tpl_get`.
***: There are none! The function scrapes the web directly.
See the newdatasource tag in the issue tracker
For more examples see the tutorial
install.packages("taxize")
Windows users install Rtools first.
install.packages("devtools")
devtools::install_github("ropensci/taxize")
library('taxize')
A lot of `taxize` revolves around taxonomic identifiers. Because names can be a mess (misspellings, synonyms, etc.), it's better to first get an identifier that a particular data source knows about. From there we can move on to acquiring more fun taxonomic data.
uids <- get_uid(c("Chironomus riparius", "Chaetopteryx"))
Classifications - think of a species, then all the taxonomic ranks up from that species, like genus, family, order, class, kingdom.
out <- classification(uids)
lapply(out, head)
#> $`315576`
#> name rank id
#> 1 cellular organisms no rank 131567
#> 2 Eukaryota superkingdom 2759
#> 3 Opisthokonta no rank 33154
#> 4 Metazoa kingdom 33208
#> 5 Eumetazoa no rank 6072
#> 6 Bilateria no rank 33213
#>
#> $`492549`
#> name rank id
#> 1 cellular organisms no rank 131567
#> 2 Eukaryota superkingdom 2759
#> 3 Opisthokonta no rank 33154
#> 4 Metazoa kingdom 33208
#> 5 Eumetazoa no rank 6072
#> 6 Bilateria no rank 33213
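Each element of a classification is a data frame with `name`, `rank` and `id` columns, so pulling out a single rank is a plain subsetting step. The sketch below rebuilds the first six rows from the printed output above by hand so that it runs offline; `name_at_rank()` is a hypothetical helper, not a `taxize` function.

```r
# The first rows of a classification, copied from the printed output above.
cls <- data.frame(
  name = c("cellular organisms", "Eukaryota", "Opisthokonta",
           "Metazoa", "Eumetazoa", "Bilateria"),
  rank = c("no rank", "superkingdom", "no rank",
           "kingdom", "no rank", "no rank"),
  id = c("131567", "2759", "33154", "33208", "6072", "33213"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: return the name at a given rank, or NA if that rank
# is absent from the classification.
name_at_rank <- function(cls, wanted) {
  hit <- cls$name[cls$rank == wanted]
  if (length(hit) == 0) NA_character_ else hit[1]
}

name_at_rank(cls, "kingdom")  # "Metazoa"
name_at_rank(cls, "family")   # NA, because only the first six rows are used here
```

With a real result, something like `lapply(out, name_at_rank, wanted = "kingdom")` should apply the same lookup to every taxon, assuming each element of `out` is a data frame with these columns.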
Get immediate children of Salmo. In this case, Salmo is a genus, so this gives species within the genus.
children("Salmo", db = 'ncbi')
#> $Salmo
#> childtaxa_id childtaxa_name childtaxa_rank
#> 1 1509524 Salmo marmoratus x Salmo trutta species
#> 2 1484545 Salmo cf. cenerinus BOLD:AAB3872 species
#> 3 1483130 Salmo zrmanjaensis species
#> 4 1483129 Salmo visovacensis species
#> 5 1483128 Salmo rhodanensis species
#> 6 1483127 Salmo pellegrini species
#> 7 1483126 Salmo opimus species
#> 8 1483125 Salmo macedonicus species
#> 9 1483124 Salmo lourosensis species
#> 10 1483123 Salmo labecula species
#> 11 1483122 Salmo farioides species
#> 12 1483121 Salmo chilo species
#> 13 1483120 Salmo cettii species
#> 14 1483119 Salmo cenerinus species
#> 15 1483118 Salmo aphelios species
#> 16 1483117 Salmo akairos species
#> 17 1201173 Salmo peristericus species
#> 18 1035833 Salmo ischchan species
#> 19 700588 Salmo labrax species
#> 20 237411 Salmo obtusirostris species
#> 21 235141 Salmo platycephalus species
#> 22 234793 Salmo letnica species
#> 23 62065 Salmo ohridanus species
#> 24 33518 Salmo marmoratus species
#> 25 33516 Salmo fibreni species
#> 26 33515 Salmo carpio species
#> 27 8032 Salmo trutta species
#> 28 8030 Salmo salar species
#>
#> attr(,"class")
#> [1] "children"
#> attr(,"db")
#> [1] "ncbi"
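The children listed above include a hybrid and a provisional "cf." name alongside ordinary species. If you only want plain binomials, one possible filter is a regular expression on `childtaxa_name`. The sketch below uses a hand-copied subset of the rows above so that it runs offline; the pattern is a simplifying assumption and would also drop epithets containing hyphens or other characters.

```r
# A few rows copied by hand from the printed children() output above.
kids <- data.frame(
  childtaxa_id = c("1509524", "1484545", "1483130", "8032", "8030"),
  childtaxa_name = c("Salmo marmoratus x Salmo trutta",
                     "Salmo cf. cenerinus BOLD:AAB3872",
                     "Salmo zrmanjaensis",
                     "Salmo trutta",
                     "Salmo salar"),
  childtaxa_rank = "species",
  stringsAsFactors = FALSE
)

# Keep only names of the form "Genus epithet": exactly two words, a
# capitalised genus and an all-lowercase epithet.
is_binomial <- grepl("^[A-Z][a-z]+ [a-z]+$", kids$childtaxa_name)
clean_kids <- kids[is_binomial, ]
clean_kids$childtaxa_name
```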
Get all species in the genus Apis.
downstream("Apis", db = 'itis', downto = 'Species', verbose = FALSE)
#> $Apis
#> tsn parentname parenttsn taxonname rankid rankname
#> 1 154396 Apis 154395 Apis mellifera 220 Species
#> 2 763550 Apis 154395 Apis andreniformis 220 Species
#> 3 763551 Apis 154395 Apis cerana 220 Species
#> 4 763552 Apis 154395 Apis dorsata 220 Species
#> 5 763553 Apis 154395 Apis florea 220 Species
#> 6 763554 Apis 154395 Apis koschevnikovi 220 Species
#> 7 763555 Apis 154395 Apis nigrocincta 220 Species
#>
#> attr(,"class")
#> [1] "downstream"
#> attr(,"db")
#> [1] "itis"
Get all genera up from the species Pinus contorta (this includes the genus of the species, and its co-genera within the same family).
upstream("Pinus contorta", db = 'itis', upto = 'Genus', verbose=FALSE)
#> $`Pinus contorta`
#> tsn parentname parenttsn taxonname rankid rankname
#> 1 18031 Pinaceae 18030 Abies 180 Genus
#> 2 18033 Pinaceae 18030 Picea 180 Genus
#> 3 18035 Pinaceae 18030 Pinus 180 Genus
#> 4 183396 Pinaceae 18030 Tsuga 180 Genus
#> 5 183405 Pinaceae 18030 Cedrus 180 Genus
#> 6 183409 Pinaceae 18030 Larix 180 Genus
#> 7 183418 Pinaceae 18030 Pseudotsuga 180 Genus
#> 8 822529 Pinaceae 18030 Keteleeria 180 Genus
#> 9 822530 Pinaceae 18030 Pseudolarix 180 Genus
#>
#> attr(,"class")
#> [1] "upstream"
#> attr(,"db")
#> [1] "itis"
synonyms("Acer drummondii", db="itis")
#> $`Acer drummondii`
#> sub_tsn acc_name acc_tsn message
#> 1 183671 Acer rubrum var. drummondii 526853 no syns found
get_ids(names="Salvelinus fontinalis", db = c('itis', 'ncbi'), verbose=FALSE)
#> $itis
#> Salvelinus fontinalis
#> "162003"
#> attr(,"match")
#> [1] "found"
#> attr(,"uri")
#> [1] "http://www.itis.gov/servlet/SingleRpt/SingleRpt?search_topic=TSN&search_value=162003"
#> attr(,"class")
#> [1] "tsn"
#>
#> $ncbi
#> Salvelinus fontinalis
#> "8038"
#> attr(,"class")
#> [1] "uid"
#> attr(,"match")
#> [1] "found"
#> attr(,"uri")
#> [1] "http://www.ncbi.nlm.nih.gov/taxonomy/8038"
#>
#> attr(,"class")
#> [1] "ids"
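When ids come back from several databases, it can be handy to flatten them into one table. The sketch below uses plain named character vectors as stand-ins for the result printed above, with the values copied from that output, so it runs without a network connection. Real results carry extra attributes such as `match` and `uri`, which `as.character()` drops.

```r
# Stand-in for the get_ids() result printed above: one named id per database.
ids <- list(
  itis = c("Salvelinus fontinalis" = "162003"),
  ncbi = c("Salvelinus fontinalis" = "8038")
)

# Build one row per database/name pair.
id_table <- do.call(rbind, lapply(names(ids), function(db) {
  data.frame(
    db = db,
    name = names(ids[[db]]),
    id = as.character(ids[[db]]),
    stringsAsFactors = FALSE
  )
}))

id_table
```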
You can limit results to certain rows when getting ids in any of the `get_*()` functions.
get_ids(names="Poa annua", db = "gbif", rows=1)
#> $gbif
#> Poa annua
#> "2704179"
#> attr(,"class")
#> [1] "gbifid"
#> attr(,"match")
#> [1] "found"
#> attr(,"uri")
#> [1] "http://www.gbif.org/species/2704179"
#>
#> attr(,"class")
#> [1] "ids"
Furthermore, you can get back all ids, if that's your jam, with the `get_*_()` functions (all `get_*()` functions with an additional `_` underscore at the end of the function name).
get_ids_(c("Chironomus riparius", "Pinus contorta"), db = 'nbn', rows=1:3)
#> $nbn
#> $nbn$`Chironomus riparius`
#> ptaxonversionkey searchmatchtitle rank namestatus
#> 1 NBNSYS0000027573 Chironomus riparius Species Recommended
#> 2 NHMSYS0001718042 Elaphrus riparius Species Recommended
#> 3 NBNSYS0000023345 Paederus riparius Species Recommended
#>
#> $nbn$`Pinus contorta`
#> ptaxonversionkey searchmatchtitle rank namestatus
#> 1 NHMSYS0000494848 Pinus contorta var. contorta Variety Recommended
#> 2 NBNSYS0000004786 Pinus contorta Species Recommended
#> 3 NHMSYS0000494848 Pinus contorta subsp. contorta Subspecies Recommended
#>
#>
#> attr(,"class")
#> [1] "ids"
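Once all candidates are returned, you can choose among them yourself. A minimal sketch, assuming each candidate table has the columns shown above: keep only the rows whose `searchmatchtitle` equals the query exactly. The data frame is copied by hand from the "Pinus contorta" output, and `exact_match()` is a hypothetical helper.

```r
# Candidate rows for "Pinus contorta", copied from the printed output above.
candidates <- data.frame(
  ptaxonversionkey = c("NHMSYS0000494848", "NBNSYS0000004786", "NHMSYS0000494848"),
  searchmatchtitle = c("Pinus contorta var. contorta",
                       "Pinus contorta",
                       "Pinus contorta subsp. contorta"),
  rank = c("Variety", "Species", "Subspecies"),
  namestatus = "Recommended",
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep only candidates whose matched title is exactly
# the name that was searched for.
exact_match <- function(candidates, query) {
  candidates[candidates$searchmatchtitle == query, , drop = FALSE]
}

exact_match(candidates, "Pinus contorta")$ptaxonversionkey  # "NBNSYS0000004786"
```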
sci2comm('Helianthus annuus', db = 'itis')
#> $`Helianthus annuus`
#> [1] "common sunflower" "sunflower" "wild sunflower"
#> [4] "annual sunflower"
comm2sci("black bear", db = "itis")
#> $`black bear`
#> [1] "Ursus thibetanus" "Ursus thibetanus"
#> [3] "Chiropotes satanas" "Ursus americanus luteolus"
#> [5] "Ursus americanus americanus" "Ursus americanus"
#> [7] "Ursus americanus"
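As the output above shows, a common name can map to repeated and infraspecific scientific names. The base R sketch below, using the vector copied from that output, removes duplicates and then reduces each name to its first two words to get a species-level list.

```r
# Scientific names returned for "black bear", copied from the output above.
hits <- c("Ursus thibetanus", "Ursus thibetanus",
          "Chiropotes satanas", "Ursus americanus luteolus",
          "Ursus americanus americanus", "Ursus americanus",
          "Ursus americanus")

# Drop exact duplicates.
distinct_names <- unique(hits)

# Reduce each name to "Genus epithet" (its first two words), then de-duplicate.
binomials <- unique(vapply(
  strsplit(hits, " ", fixed = TRUE),
  function(words) paste(words[1:2], collapse = " "),
  character(1)
))

distinct_names
binomials
```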
`numeric` to uid
as.uid(315567)
#> [1] "315567"
#> attr(,"class")
#> [1] "uid"
#> attr(,"match")
#> [1] "found"
#> attr(,"uri")
#> [1] "http://www.ncbi.nlm.nih.gov/taxonomy/315567"
`list` to uid
as.uid(list("315567", "3339", "9696"))
#> [1] "315567" "3339" "9696"
#> attr(,"class")
#> [1] "uid"
#> attr(,"match")
#> [1] "found" "found" "found"
#> attr(,"uri")
#> [1] "http://www.ncbi.nlm.nih.gov/taxonomy/315567"
#> [2] "http://www.ncbi.nlm.nih.gov/taxonomy/3339"
#> [3] "http://www.ncbi.nlm.nih.gov/taxonomy/9696"
out <- as.uid(c(315567, 3339, 9696))
(res <- data.frame(out))
#> ids class match uri
#> 1 315567 uid found http://www.ncbi.nlm.nih.gov/taxonomy/315567
#> 2 3339 uid found http://www.ncbi.nlm.nih.gov/taxonomy/3339
#> 3 9696 uid found http://www.ncbi.nlm.nih.gov/taxonomy/9696
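The `uri` column above follows a simple pattern: a base URL followed by the id. The sketch below rebuilds that column by hand from character ids. `ncbi_uri()` is a hypothetical helper, and it only formats strings; it does not check that an id exists.

```r
# Hypothetical helper: build the taxonomy URL for one or more NCBI uids,
# following the pattern seen in the printed output above.
ncbi_uri <- function(uid) {
  paste0("http://www.ncbi.nlm.nih.gov/taxonomy/", uid)
}

# Ids are kept as character strings so they are never reformatted.
uids <- c("315567", "3339", "9696")
uid_table <- data.frame(ids = uids, uri = ncbi_uri(uids), stringsAsFactors = FALSE)
uid_table
```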
Check out our milestones to see what we plan to get done for each version.
To get citation information for `taxize` in R, run `citation(package = 'taxize')`.