taxize allows users to search over many taxonomic data sources for species names (scientific and common) and download upstream and downstream taxonomic hierarchical information, among other things.
The taxize tutorial can be found at http://ropensci.org/tutorials/taxize.html.
The functions in the package that hit a specific API have a prefix and suffix separated by an underscore. They follow the format service_whatitdoes. For example, gnr_resolve uses the Global Names Resolver API to resolve species names. General functions in the package that don't hit a specific API don't have two words separated by an underscore, e.g., classification.
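The naming convention can be illustrated with a small base R helper. This is only a sketch: split_fn_name is a hypothetical function, not part of taxize, and it assumes the service prefix is everything before the first underscore.

```r
# Hypothetical helper (not part of taxize): split a function name into its
# service prefix and action, following the service_whatitdoes pattern.
split_fn_name <- function(fn) {
  parts <- strsplit(fn, "_", fixed = TRUE)[[1]]
  if (length(parts) < 2) {
    # No underscore: a general function with no service prefix
    return(list(service = NA_character_, action = fn))
  }
  list(service = parts[1], action = paste(parts[-1], collapse = "_"))
}

split_fn_name("gnr_resolve")     # service "gnr", action "resolve"
split_fn_name("classification")  # service NA, action "classification"
```

Treat the result as a hint rather than a rule: names such as get_uid, used later in this README, contain an underscore without a service prefix.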
You need API keys for Encyclopedia of Life (EOL), the Universal Biological Indexer and Organizer (uBio), Tropicos, and Plantminer.
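One common way to keep keys out of scripts is to store them in environment variables. The sketch below uses only base R; get_api_key and the variable name TAXIZE_DEMO_KEY are hypothetical and are not defined by taxize. How a key is then handed to a given taxize function is not covered here, so check that function's documentation.

```r
# Hypothetical helper (not part of taxize): read an API key from an
# environment variable and fail early if it has not been set.
get_api_key <- function(env_var) {
  key <- Sys.getenv(env_var, unset = "")
  if (!nzchar(key)) {
    stop("No API key found in environment variable ", env_var, call. = FALSE)
  }
  key
}

# TAXIZE_DEMO_KEY is a made-up variable name used only for this example.
Sys.setenv(TAXIZE_DEMO_KEY = "my-secret-key")
get_api_key("TAXIZE_DEMO_KEY")
```

In practice the variable would be set outside the script, for example in a user-level .Renviron file, so the key never appears in shared code.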
Note that a few data sources require SOAP web services, which are difficult to support in R across all operating systems. So far these include the World Register of Marine Species, the Pan-European Species directories Infrastructure, and Mycobank. Data sources that use SOAP web services have been moved to a new package called taxizesoap. Find it at https://github.com/ropensci/taxizesoap.
Data sources currently implemented in taxize
Source | Function prefix | API Docs | API key |
---|---|---|---|
Encyclopedia of Life | eol | link | link |
Taxonomic Name Resolution Service | tnrs | "api.phylotastic.org/tnrs" | none |
Integrated Taxonomic Information System | itis | link | none |
Phylomatic | phylomatic | link | none |
uBio | ubio | link | link |
Global Names Resolver | gnr | link | none |
Global Names Index | gni | link | none |
IUCN Red List | iucn | link | none |
Tropicos | tp | link | link |
Plantminer | plantminer | link | link |
Theplantlist dot org | tpl | ** | none |
Catalogue of Life | col | link | none |
Global Invasive Species Database | gisd | *** | none |
National Center for Biotechnology Information | ncbi | none | none |
CANADENSYS Vascan name search API | vascan | link | none |
International Plant Names Index (IPNI) | ipni | link | none |
Barcode of Life Data Systems (BOLD) | bold | link | none |
National Biodiversity Network (UK) | nbn | link | none |
**: There are none! We suggest using the TPL and TPLck functions in the Taxonstand package. We provide two functions to get bulk data: tpl_families and tpl_get.
***: There are none! The function scrapes the web directly.
For more examples see the tutorial
install.packages("taxize")
Windows users should install Rtools first.
install.packages("devtools")
devtools::install_github("ropensci/taxize")
library('taxize')
A lot of taxize revolves around taxonomic identifiers. Because names can be a mess (misspellings, synonyms, etc.), it's better to get an identifier that a particular data source knows about; then we can move on to acquiring more fun taxonomic data.
uids <- get_uid(c("Chironomus riparius", "Chaetopteryx"))
Classifications - think of a species, then all the taxonomic ranks up from that species, like genus, family, order, class, kingdom.
out <- classification(uids)
lapply(out, head)
#> $`315576`
#> name rank id
#> 1 cellular organisms no rank 131567
#> 2 Eukaryota superkingdom 2759
#> 3 Opisthokonta no rank 33154
#> 4 Metazoa kingdom 33208
#> 5 Eumetazoa no rank 6072
#> 6 Bilateria no rank 33213
#>
#> $`492549`
#> name rank id
#> 1 cellular organisms no rank 131567
#> 2 Eukaryota superkingdom 2759
#> 3 Opisthokonta no rank 33154
#> 4 Metazoa kingdom 33208
#> 5 Eumetazoa no rank 6072
#> 6 Bilateria no rank 33213
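Each element of the result prints as a data frame with name, rank and id columns, so a single rank can be looked up with ordinary subsetting. The sketch below works offline on a toy data frame copied from the first rows of the output above; name_at_rank is a hypothetical helper, and storing id as character is an assumption.

```r
# Toy stand-in for one element of the list returned by classification(),
# built from rows of the printed output (id stored as character by assumption).
cls <- data.frame(
  name = c("cellular organisms", "Eukaryota", "Opisthokonta", "Metazoa"),
  rank = c("no rank", "superkingdom", "no rank", "kingdom"),
  id   = c("131567", "2759", "33154", "33208"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: return the name at a given rank, or NA if absent.
name_at_rank <- function(x, rank) {
  hit <- x$name[x$rank == rank]
  if (length(hit) == 0) NA_character_ else hit[1]
}

name_at_rank(cls, "kingdom")  # "Metazoa"
name_at_rank(cls, "family")   # NA, because the toy data stops at kingdom
```

With a real result, the same helper could be applied to every taxon at once, e.g. sapply(out, name_at_rank, rank = "kingdom").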
Get immediate children of Salmo. In this case, Salmo is a genus, so this gives species within the genus.
children("Salmo", db = 'ncbi')
#> $Salmo
#> childtaxa_id childtaxa_name childtaxa_rank
#> 1 1509524 Salmo marmoratus x Salmo trutta species
#> 2 1484545 Salmo cf. cenerinus BOLD:AAB3872 species
#> 3 1483130 Salmo zrmanjaensis species
#> 4 1483129 Salmo visovacensis species
#> 5 1483128 Salmo rhodanensis species
#> 6 1483127 Salmo pellegrini species
#> 7 1483126 Salmo opimus species
#> 8 1483125 Salmo macedonicus species
#> 9 1483124 Salmo lourosensis species
#> 10 1483123 Salmo labecula species
#> 11 1483122 Salmo farioides species
#> 12 1483121 Salmo chilo species
#> 13 1483120 Salmo cettii species
#> 14 1483119 Salmo cenerinus species
#> 15 1483118 Salmo aphelios species
#> 16 1483117 Salmo akairos species
#> 17 1201173 Salmo peristericus species
#> 18 1035833 Salmo ischchan species
#> 19 700588 Salmo labrax species
#> 20 237411 Salmo obtusirostris species
#> 21 235141 Salmo platycephalus species
#> 22 234793 Salmo letnica species
#> 23 62065 Salmo ohridanus species
#> 24 33518 Salmo marmoratus species
#> 25 33516 Salmo fibreni species
#> 26 33515 Salmo carpio species
#> 27 8032 Salmo trutta species
#> 28 8030 Salmo salar species
#>
#> attr(,"class")
#> [1] "children"
#> attr(,"db")
#> [1] "ncbi"
Get all species in the genus Apis
downstream("Apis", db = 'itis', downto = 'Species', verbose = FALSE)
#> $Apis
#> tsn parentname parenttsn taxonname rankid rankname
#> 1 154396 Apis 154395 Apis mellifera 220 Species
#> 2 763550 Apis 154395 Apis andreniformis 220 Species
#> 3 763551 Apis 154395 Apis cerana 220 Species
#> 4 763552 Apis 154395 Apis dorsata 220 Species
#> 5 763553 Apis 154395 Apis florea 220 Species
#> 6 763554 Apis 154395 Apis koschevnikovi 220 Species
#> 7 763555 Apis 154395 Apis nigrocincta 220 Species
#>
#> attr(,"class")
#> [1] "downstream"
#> attr(,"db")
#> [1] "itis"
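The result prints as a named list with one data frame per queried taxon, so the species names can be pulled out as a plain character vector. A minimal offline sketch, using a toy list built from three rows of the output above (the real object also carries class and db attributes, omitted here):

```r
# Toy stand-in for the downstream() result, copied from the printed rows.
out <- list(
  Apis = data.frame(
    tsn       = c("154396", "763550", "763551"),
    taxonname = c("Apis mellifera", "Apis andreniformis", "Apis cerana"),
    rankname  = c("Species", "Species", "Species"),
    stringsAsFactors = FALSE
  )
)

# Keep only rows at the requested rank and return the names, sorted.
apis <- out$Apis
species <- sort(apis$taxonname[apis$rankname == "Species"])
species
```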
Get all genera up from the species Pinus contorta (this includes the genus of the species, and its co-genera within the same family).
upstream("Pinus contorta", db = 'itis', upto = 'Genus', verbose=FALSE)
#> $`Pinus contorta`
#> tsn parentname parenttsn taxonname rankid rankname
#> 1 18031 Pinaceae 18030 Abies 180 Genus
#> 2 18033 Pinaceae 18030 Picea 180 Genus
#> 3 18035 Pinaceae 18030 Pinus 180 Genus
#> 4 183396 Pinaceae 18030 Tsuga 180 Genus
#> 5 183405 Pinaceae 18030 Cedrus 180 Genus
#> 6 183409 Pinaceae 18030 Larix 180 Genus
#> 7 183418 Pinaceae 18030 Pseudotsuga 180 Genus
#> 8 822529 Pinaceae 18030 Keteleeria 180 Genus
#> 9 822530 Pinaceae 18030 Pseudolarix 180 Genus
#>
#> attr(,"class")
#> [1] "upstream"
#> attr(,"db")
#> [1] "itis"
synonyms("Salmo friderici", db='ubio')
#> ubioid target family rank
#> 1 2529704 Salmo friderici Pisces species
#> 2 169693 Salmo friderici Pisces species
#> $`Salmo friderici`
#> namebankid namestring
#> 1 130562 Leporinus friderici friderici
#> 2 169693 Salmo friderici
#> 3 2495407 Leporinus friderici friderici
#> fullnamestring
#> 1 Leporinus friderici friderici (Bloch, 1794)
#> 2 Salmo friderici Bloch, 1794
#> 3 Leporinus friderici friderici
get_ids(names="Salvelinus fontinalis", db = c('ubio','ncbi'), verbose=FALSE)
#> ubioid target family rank
#> 1 2501330 Salvelinus fontinalis Pisces species
#> 2 6581534 Salvelinus fontinalis Salmonidae species
#> 3 137827 Salvelinus fontinalis Pisces species
#> 4 6244425 Salvelinus fontinalis Salmonidae trinomial
#> 5 7130714 Salvelinus fontinalis Salmonidae trinomial
#> 6 6653671 Salvelinus fontinalis Salmonidae trinomial
#> $ubio
#> Salvelinus fontinalis
#> "2501330"
#> attr(,"class")
#> [1] "ubioid"
#> attr(,"match")
#> [1] "found"
#> attr(,"uri")
#> [1] "http://www.ubio.org/browser/details.php?namebankID=2501330"
#>
#> $ncbi
#> Salvelinus fontinalis
#> "8038"
#> attr(,"class")
#> [1] "uid"
#> attr(,"match")
#> [1] "found"
#> attr(,"uri")
#> [1] "http://www.ncbi.nlm.nih.gov/taxonomy/8038"
#>
#> attr(,"class")
#> [1] "ids"
You can limit results to certain rows when getting ids in any of the get_*() functions.
get_ids(names="Poa annua", db = "gbif", rows=1)
#> $gbif
#> Poa annua
#> "2704179"
#> attr(,"class")
#> [1] "gbifid"
#> attr(,"match")
#> [1] "found"
#> attr(,"uri")
#> [1] "http://www.gbif.org/species/2704179"
#>
#> attr(,"class")
#> [1] "ids"
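The printed ids carry match and uri attributes, which can be checked before an id is used downstream. The sketch below rebuilds the gbif result by hand with structure() so it runs offline; in real use the object would come from get_ids(), and treating "found" as the only usable match value is an assumption.

```r
# Toy object mirroring the printed gbif id; not created by taxize here.
gbif_id <- structure(
  "2704179",
  class = "gbifid",
  match = "found",
  uri   = "http://www.gbif.org/species/2704179"
)

attr(gbif_id, "match")
attr(gbif_id, "uri")

# Only keep the bare id string when the lookup reports a match.
taxon_key <- NA_character_
if (identical(attr(gbif_id, "match"), "found")) {
  taxon_key <- as.character(unclass(gbif_id))
}
taxon_key
```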
Furthermore, you can get back all ids, if that's your jam, with the get_*_() functions (all get_*() functions with an additional underscore at the end of the function name).
get_ids_(c("Chironomus riparius", "Pinus contorta"), db = 'nbn', rows=1:3)
#> $nbn
#> $nbn$`Chironomus riparius`
#> ptaxonVersionKey searchMatchTitle rank nameStatus
#> 1 NBNSYS0000027573 Chironomus riparius Species Recommended
#> 2 NBNSYS0000023345 Paederus riparius Species Recommended
#> 3 NHMSYS0001718042 Elaphrus riparius Species Recommended
#>
#> $nbn$`Pinus contorta`
#> ptaxonVersionKey searchMatchTitle rank nameStatus
#> 1 NHMSYS0000494848 Pinus contorta var. contorta Variety Recommended
#> 2 NBNSYS0000004786 Pinus contorta Species Recommended
#> 3 NHMSYS0000494848 Pinus contorta subsp. contorta Subspecies Recommended
#>
#>
#> attr(,"class")
#> [1] "ids"
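Because these functions return every candidate row, picking the right one is left to you. A minimal offline sketch of one way to do that, using a toy data frame copied from the Pinus contorta rows above; the rule "exact title match at species rank" is an illustrative assumption, not something taxize applies.

```r
# Toy copy of the candidate rows printed for Pinus contorta.
candidates <- data.frame(
  ptaxonVersionKey = c("NHMSYS0000494848", "NBNSYS0000004786", "NHMSYS0000494848"),
  searchMatchTitle = c("Pinus contorta var. contorta", "Pinus contorta",
                       "Pinus contorta subsp. contorta"),
  rank             = c("Variety", "Species", "Subspecies"),
  stringsAsFactors = FALSE
)

# Keep the row whose title matches the query exactly and whose rank is Species.
query <- "Pinus contorta"
exact <- candidates[candidates$searchMatchTitle == query &
                      candidates$rank == "Species", ]
exact$ptaxonVersionKey
```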
sci2comm('Helianthus annuus', db = 'itis')
#> $`Helianthus annuus`
#> [1] "common sunflower" "sunflower" "wild sunflower"
#> [4] "annual sunflower"
comm2sci("black bear", db = "itis")
#> $`black bear`
#> [1] "Ursus americanus luteolus" "Ursus americanus americanus"
#> [3] "Ursus americanus" "Ursus americanus"
#> [5] "Chiropotes satanas" "Ursus thibetanus"
#> [7] "Ursus thibetanus"
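The output above contains repeated names, so it can be worth deduplicating before further use. A small offline sketch with base R, using a list copied by hand from the printed result:

```r
# Toy copy of the comm2sci() result shown above.
res <- list(
  `black bear` = c(
    "Ursus americanus luteolus", "Ursus americanus americanus",
    "Ursus americanus", "Ursus americanus",
    "Chiropotes satanas", "Ursus thibetanus", "Ursus thibetanus"
  )
)

# Drop repeated names within each common-name entry, keeping first-seen order.
unique_names <- lapply(res, unique)
unique_names[["black bear"]]
```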
numeric to uid
as.uid(315567)
#> [1] "315567"
#> attr(,"class")
#> [1] "uid"
#> attr(,"match")
#> [1] "found"
#> attr(,"uri")
#> [1] "http://www.ncbi.nlm.nih.gov/taxonomy/315567"
list to uid
as.uid(list("315567", "3339", "9696"))
#> [1] "315567" "3339" "9696"
#> attr(,"class")
#> [1] "uid"
#> attr(,"match")
#> [1] "found" "found" "found"
#> attr(,"uri")
#> [1] "http://www.ncbi.nlm.nih.gov/taxonomy/315567"
#> [2] "http://www.ncbi.nlm.nih.gov/taxonomy/3339"
#> [3] "http://www.ncbi.nlm.nih.gov/taxonomy/9696"
out <- as.uid(c(315567, 3339, 9696))
(res <- data.frame(out))
#> ids class match uri
#> 1 315567 uid found http://www.ncbi.nlm.nih.gov/taxonomy/315567
#> 2 3339 uid found http://www.ncbi.nlm.nih.gov/taxonomy/3339
#> 3 9696 uid found http://www.ncbi.nlm.nih.gov/taxonomy/9696
Check out our milestones to see what we plan to get done for each version.
Get citation information for taxize in R by running citation(package = 'taxize').