`taxize` allows users to search over many taxonomic data sources for species names (scientific and common) and download upstream and downstream taxonomic hierarchical information, among other things.
The `taxize` tutorial can be found at https://ropensci.org/tutorials/taxize.html
The functions in the package that hit a specific API have a prefix and suffix separated by an underscore. They follow the format `service_whatitdoes`. For example, `gnr_resolve` uses the Global Names Resolver API to resolve species names. General functions in the package that don't hit a specific API don't have two words separated by an underscore, e.g., `classification`.
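As a rough illustration of the naming convention, the sketch below splits a function name at its first underscore. `split_fn_name` is a hypothetical helper written for this README, not part of `taxize`. It is a naive split and has no knowledge of which prefixes correspond to real services.

```r
# Split a function name into a service prefix and an action suffix.
# Names without an underscore are treated as general functions.
# Hypothetical helper for illustration only; not part of taxize.
split_fn_name <- function(fn) {
  parts <- strsplit(fn, "_", fixed = TRUE)[[1]]
  if (length(parts) < 2) {
    return(list(service = NA_character_, action = fn))
  }
  list(service = parts[1], action = paste(parts[-1], collapse = "_"))
}

split_fn_name("gnr_resolve")     # service "gnr", action "resolve"
split_fn_name("classification")  # service NA, action "classification"
```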
You need API keys for Encyclopedia of Life (EOL), Tropicos, IUCN, and NatureServe.
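One common way to keep keys out of scripts is to store them in environment variables and read them at run time. The sketch below shows that pattern in base R only. The helper `read_api_key` and the variable name `MY_TROPICOS_KEY` are assumptions made for illustration and are not names that `taxize` defines.

```r
# Read an API key from an environment variable, failing early if it is unset.
# The variable name is the caller's choice, not a taxize convention.
read_api_key <- function(env_var) {
  key <- Sys.getenv(env_var, unset = "")
  if (!nzchar(key)) {
    stop("No API key found in environment variable: ", env_var, call. = FALSE)
  }
  key
}

# Hypothetical variable name and placeholder value, set here only so the
# example runs; in practice the variable would be set outside the script.
Sys.setenv(MY_TROPICOS_KEY = "not-a-real-key")
tropicos_key <- read_api_key("MY_TROPICOS_KEY")
```

How the key is then handed to a given function depends on that function; check its help page for the relevant argument or option.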
Note that a few data sources require SOAP web services, which are difficult to support in R across all operating systems. These include: Pan-European Species directories Infrastructure and Mycobank. Data sources that use SOAP web services have been moved to `taxizesoap` at https://github.com/ropensci/taxizesoap.
taxize
Source | Function prefix | API Docs | API key |
---|---|---|---|
Encyclopedia of Life | `eol` | link | link |
Taxonomic Name Resolution Service | `tnrs` | “api.phylotastic.org/tnrs” | none |
Integrated Taxonomic Information Service | `itis` | link | none |
Global Names Resolver | `gnr` | link | none |
Global Names Index | `gni` | link | none |
IUCN Red List | `iucn` | link | link |
Tropicos | `tp` | link | link |
Theplantlist dot org | `tpl` | ** | none |
Catalogue of Life | `col` | link | none |
National Center for Biotechnology Information | `ncbi` | none | none |
CANADENSYS Vascan name search API | `vascan` | link | none |
International Plant Names Index (IPNI) | `ipni` | link | none |
Barcode of Life Data Systems (BOLD) | `bold` | link | none |
National Biodiversity Network (UK) | `nbn` | link | none |
Index Fungorum | `fg` | link | none |
EU BON | `eubon` | link | none |
Index of Names (ION) | `ion` | link | none |
Open Tree of Life (TOL) | `tol` | link | none |
World Register of Marine Species (WoRMS) | `worms` | link | none |
NatureServe | `natserv` | link | link |
Wikipedia | `wiki` | link | none |
**: There are none! We suggest using the `TPL` and `TPLck` functions in the Taxonstand package. We provide two functions to get bulk data: `tpl_families` and `tpl_get`.
***: There are none! The function scrapes the web directly.
See the newdatasource tag in the issue tracker
For more examples see the tutorial
install.packages("taxize")
Windows users install Rtools first.
install.packages("devtools")
devtools::install_github("ropensci/taxize")
library('taxize')
A lot of `taxize` revolves around taxonomic identifiers. Because, as you know, names can be a mess (misspellings, synonyms, etc.), it's better to get an identifier that a particular data source knows about; then we can move on to acquiring more fun taxonomic data.
uids <- get_uid(c("Chironomus riparius", "Chaetopteryx"))
Classifications - think of a species, then all the taxonomic ranks up from that species, like genus, family, order, class, kingdom.
out <- classification(uids)
lapply(out, head)
#> $`315576`
#> name rank id
#> 1 cellular organisms no rank 131567
#> 2 Eukaryota superkingdom 2759
#> 3 Opisthokonta no rank 33154
#> 4 Metazoa kingdom 33208
#> 5 Eumetazoa no rank 6072
#> 6 Bilateria no rank 33213
#>
#> $`492549`
#> name rank id
#> 1 cellular organisms no rank 131567
#> 2 Eukaryota superkingdom 2759
#> 3 Opisthokonta no rank 33154
#> 4 Metazoa kingdom 33208
#> 5 Eumetazoa no rank 6072
#> 6 Bilateria no rank 33213
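Each element of the result prints as a table with `name`, `rank`, and `id` columns, so ordinary data frame subsetting can pull out a single rank. The sketch below uses a hand-built stand-in copied from the first rows of the output shown above, so it runs without network access. `name_at_rank` is a hypothetical helper, and storing `id` as character is an assumption.

```r
# Hand-built stand-in for one element of the classification() result,
# copied from the first six rows of the printed output.
cls <- data.frame(
  name = c("cellular organisms", "Eukaryota", "Opisthokonta",
           "Metazoa", "Eumetazoa", "Bilateria"),
  rank = c("no rank", "superkingdom", "no rank",
           "kingdom", "no rank", "no rank"),
  id   = c("131567", "2759", "33154", "33208", "6072", "33213"),
  stringsAsFactors = FALSE
)

# Return the first name at a given rank, or NA if that rank is absent.
# Hypothetical helper for illustration only; not part of taxize.
name_at_rank <- function(x, rank) {
  hit <- x$name[x$rank == rank]
  if (length(hit) == 0) NA_character_ else hit[1]
}

name_at_rank(cls, "kingdom")
#> [1] "Metazoa"
name_at_rank(cls, "family")
#> [1] NA
```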
Get immediate children of Salmo. In this case, Salmo is a genus, so this gives species within the genus.
children("Salmo", db = 'ncbi')
#> $Salmo
#> childtaxa_id childtaxa_name childtaxa_rank
#> 1 1509524 Salmo marmoratus x Salmo trutta species
#> 2 1484545 Salmo cf. cenerinus BOLD:AAB3872 species
#> 3 1483130 Salmo zrmanjaensis species
#> 4 1483129 Salmo visovacensis species
#> 5 1483128 Salmo rhodanensis species
#> 6 1483127 Salmo pellegrini species
#> 7 1483126 Salmo opimus species
#> 8 1483125 Salmo macedonicus species
#> 9 1483124 Salmo lourosensis species
#> 10 1483123 Salmo labecula species
#> 11 1483122 Salmo farioides species
#> 12 1483121 Salmo chilo species
#> 13 1483120 Salmo cettii species
#> 14 1483119 Salmo cenerinus species
#> 15 1483118 Salmo aphelios species
#> 16 1483117 Salmo akairos species
#> 17 1201173 Salmo peristericus species
#> 18 1035833 Salmo ischchan species
#> 19 700588 Salmo labrax species
#> 20 237411 Salmo obtusirostris species
#> 21 235141 Salmo platycephalus species
#> 22 234793 Salmo letnica species
#> 23 62065 Salmo ohridanus species
#> 24 33518 Salmo marmoratus species
#> 25 33516 Salmo fibreni species
#> 26 33515 Salmo carpio species
#> 27 8032 Salmo trutta species
#> 28 8030 Salmo salar species
#>
#> attr(,"class")
#> [1] "children"
#> attr(,"db")
#> [1] "ncbi"
Get all species in the genus Apis
downstream(as.tsn(154395), db = 'itis', downto = 'species', verbose = FALSE)
#> $`154395`
#> tsn parentname parenttsn taxonname rankid rankname
#> 1 154396 Apis 154395 Apis mellifera 220 species
#> 2 763550 Apis 154395 Apis andreniformis 220 species
#> 3 763551 Apis 154395 Apis cerana 220 species
#> 4 763552 Apis 154395 Apis dorsata 220 species
#> 5 763553 Apis 154395 Apis florea 220 species
#> 6 763554 Apis 154395 Apis koschevnikovi 220 species
#> 7 763555 Apis 154395 Apis nigrocincta 220 species
#>
#> attr(,"class")
#> [1] "downstream"
#> attr(,"db")
#> [1] "itis"
Get all genera up from the species Pinus contorta (this includes the genus of the species, and its co-genera within the same family).
upstream("Pinus contorta", db = 'itis', upto = 'Genus', verbose=FALSE)
#> tsn target
#> 1 183327 Pinus contorta
#> 2 183332 Pinus contorta ssp. bolanderi
#> 3 822698 Pinus contorta ssp. contorta
#> 4 183329 Pinus contorta ssp. latifolia
#> 5 183330 Pinus contorta ssp. murrayana
#> 6 529672 Pinus contorta var. bolanderi
#> 7 183328 Pinus contorta var. contorta
#> 8 529673 Pinus contorta var. latifolia
#> 9 529674 Pinus contorta var. murrayana
#> commonNames
#> 1 scrub pine,shore pine,tamarack pine,lodgepole pine
#> 2 Bolander's beach pine
#> 3 NA
#> 4 black pine,Rocky Mountain lodgepole pine
#> 5 tamarack pine,Sierra lodgepole pine
#> 6 Bolander beach pine
#> 7 coast pine,lodgepole pine,beach pine,shore pine
#> 8 tall lodgepole pine,lodgepole pine,Rocky Mountain lodgepole pine
#> 9 Murray's lodgepole pine,Sierra lodgepole pine,tamarack pine
#> nameUsage
#> 1 accepted
#> 2 not accepted
#> 3 not accepted
#> 4 not accepted
#> 5 not accepted
#> 6 accepted
#> 7 accepted
#> 8 accepted
#> 9 accepted
#> $`Pinus contorta`
#> tsn parentname parenttsn taxonname rankid rankname
#> 1 18031 Pinaceae 18030 Abies 180 genus
#> 2 18033 Pinaceae 18030 Picea 180 genus
#> 3 18035 Pinaceae 18030 Pinus 180 genus
#> 4 183396 Pinaceae 18030 Tsuga 180 genus
#> 5 183405 Pinaceae 18030 Cedrus 180 genus
#> 6 183409 Pinaceae 18030 Larix 180 genus
#> 7 183418 Pinaceae 18030 Pseudotsuga 180 genus
#> 8 822529 Pinaceae 18030 Keteleeria 180 genus
#> 9 822530 Pinaceae 18030 Pseudolarix 180 genus
#>
#> attr(,"class")
#> [1] "upstream"
#> attr(,"db")
#> [1] "itis"
synonyms("Acer drummondii", db="itis")
#> tsn target commonNames nameUsage
#> 1 183671 Acer drummondii NA not accepted
#> 2 183672 Rufacer drummondii NA not accepted
#> $`Acer drummondii`
#> sub_tsn acc_name acc_tsn
#> 1 183671 Acer rubrum var. drummondii 526853
#> 2 183671 Acer rubrum var. drummondii 526853
#> 3 183671 Acer rubrum var. drummondii 526853
#> acc_author syn_author
#> 1 (Hook. & Arn. ex Nutt.) Sarg. (Hook. & Arn. ex Nutt.) E. Murray
#> 2 (Hook. & Arn. ex Nutt.) Sarg. Hook. & Arn. ex Nutt.
#> 3 (Hook. & Arn. ex Nutt.) Sarg. (Hook. & Arn. ex Nutt.) Small
#> syn_name syn_tsn
#> 1 Acer rubrum ssp. drummondii 28730
#> 2 Acer drummondii 183671
#> 3 Rufacer drummondii 183672
#>
#> attr(,"class")
#> [1] "synonyms"
#> attr(,"db")
#> [1] "itis"
get_ids(names="Salvelinus fontinalis", db = c('itis', 'ncbi'), verbose=FALSE)
#> $itis
#> Salvelinus fontinalis
#> "162003"
#> attr(,"match")
#> [1] "found"
#> attr(,"multiple_matches")
#> [1] FALSE
#> attr(,"pattern_match")
#> [1] FALSE
#> attr(,"uri")
#> [1] "http://www.itis.gov/servlet/SingleRpt/SingleRpt?search_topic=TSN&search_value=162003"
#> attr(,"class")
#> [1] "tsn"
#>
#> $ncbi
#> Salvelinus fontinalis
#> "8038"
#> attr(,"class")
#> [1] "uid"
#> attr(,"match")
#> [1] "found"
#> attr(,"multiple_matches")
#> [1] FALSE
#> attr(,"pattern_match")
#> [1] FALSE
#> attr(,"uri")
#> [1] "https://www.ncbi.nlm.nih.gov/taxonomy/8038"
#>
#> attr(,"class")
#> [1] "ids"
You can limit results to certain rows when getting ids in any of the `get_*()` functions.
get_ids(names="Poa annua", db = "gbif", rows=1)
#> $gbif
#> Poa annua
#> "2704179"
#> attr(,"class")
#> [1] "gbifid"
#> attr(,"match")
#> [1] "found"
#> attr(,"multiple_matches")
#> [1] TRUE
#> attr(,"pattern_match")
#> [1] FALSE
#> attr(,"uri")
#> [1] "http://www.gbif.org/species/2704179"
#>
#> attr(,"class")
#> [1] "ids"
Furthermore, you can just get back all ids, if that's your jam, with the `get_*_()` functions (all `get_*()` functions with an additional `_` underscore at the end of the function name).
get_ids_(c("Chironomus riparius", "Pinus contorta"), db = 'nbn', rows=1:3)
#> $nbn
#> $nbn$`Chironomus riparius`
#> guid scientificName rank taxonomicStatus
#> 1 NBNSYS0000027573 Chironomus riparius species accepted
#> 2 NHMSYS0000864966 Damaeus (Damaeus) riparius species accepted
#> 3 NHMSYS0021059238 Rhizoclonium riparium species accepted
#>
#> $nbn$`Pinus contorta`
#> guid scientificName rank taxonomicStatus
#> 1 NBNSYS0000004786 Pinus contorta species accepted
#> 2 NHMSYS0000494858 Pinus contorta var. murrayana variety accepted
#> 3 NHMSYS0000494848 Pinus contorta var. contorta variety accepted
#>
#>
#> attr(,"class")
#> [1] "ids"
sci2comm('Helianthus annuus', db = 'itis')
#> tsn target
#> 1 36616 Helianthus annuus
#> 2 525928 Helianthus annuus ssp. jaegeri
#> 3 525929 Helianthus annuus ssp. lenticularis
#> 4 525930 Helianthus annuus ssp. texanus
#> 5 536095 Helianthus annuus var. lenticularis
#> 6 536096 Helianthus annuus var. macrocarpus
#> 7 536097 Helianthus annuus var. texanus
#> commonNames nameUsage
#> 1 annual sunflower,sunflower,wild sunflower,common sunflower accepted
#> 2 NA not accepted
#> 3 NA not accepted
#> 4 NA not accepted
#> 5 NA not accepted
#> 6 NA not accepted
#> 7 NA not accepted
#> $`Helianthus annuus`
#> [1] "common sunflower" "sunflower" "wild sunflower"
#> [4] "annual sunflower"
comm2sci("black bear", db = "itis")
#> $`black bear`
#> [1] "Chiropotes satanas" "Ursus thibetanus"
#> [3] "Ursus thibetanus" "Ursus americanus luteolus"
#> [5] "Ursus americanus" "Ursus americanus"
#> [7] "Ursus americanus americanus"
spp <- c("Sus scrofa", "Homo sapiens", "Nycticebus coucang")
lowest_common(spp, db = "ncbi")
#> name rank id
#> 21 Boreoeutheria below-class 1437010
`numeric` to uid
as.uid(315567)
#> [1] "315567"
#> attr(,"class")
#> [1] "uid"
#> attr(,"match")
#> [1] "found"
#> attr(,"multiple_matches")
#> [1] FALSE
#> attr(,"pattern_match")
#> [1] FALSE
#> attr(,"uri")
#> [1] "http://www.ncbi.nlm.nih.gov/taxonomy/315567"
`list` to uid
as.uid(list("315567", "3339", "9696"))
#> [1] "315567" "3339" "9696"
#> attr(,"class")
#> [1] "uid"
#> attr(,"match")
#> [1] "found" "found" "found"
#> attr(,"multiple_matches")
#> [1] FALSE FALSE FALSE
#> attr(,"pattern_match")
#> [1] FALSE FALSE FALSE
#> attr(,"uri")
#> [1] "http://www.ncbi.nlm.nih.gov/taxonomy/315567"
#> [2] "http://www.ncbi.nlm.nih.gov/taxonomy/3339"
#> [3] "http://www.ncbi.nlm.nih.gov/taxonomy/9696"
out <- as.uid(c(315567, 3339, 9696))
(res <- data.frame(out))
#> ids class match multiple_matches pattern_match
#> 1 315567 uid found FALSE FALSE
#> 2 3339 uid found FALSE FALSE
#> 3 9696 uid found FALSE FALSE
#> uri
#> 1 http://www.ncbi.nlm.nih.gov/taxonomy/315567
#> 2 http://www.ncbi.nlm.nih.gov/taxonomy/3339
#> 3 http://www.ncbi.nlm.nih.gov/taxonomy/9696
Alphabetical
Check out our milestones to see what we plan to get done for each version.
Get citation information for `taxize` in R by running `citation(package = 'taxize')`.