bold
is an R package to connect to BOLD Systems via their API. Functions in bold
let you search for sequence data, specimen data, sequence + specimen data, and download raw trace files.
See also the taxize book for more options for taxonomic workflows with BOLD: https://ropenscilabs.github.io/taxize-book/
Install
Install bold
from CRAN
install.packages("bold")
Or install the development version from GitHub
devtools::install_github("ropensci/bold")
Load the package
library("bold")
bold_tax_name
searches for names with names.
bold_tax_name(name = 'Diplura')
#> taxid taxon tax_rank tax_division parentid parentname
#> 1 603673 Diplura genus Protists 53974 Scytosiphonaceae
#> 2 734358 Diplura class Animals 20 Arthropoda
#> specimenrecords taxonrep representitive_image.image
#> 1 6 <NA> <NA>
#> 2 294 Diplura SSWLA/BIOUG09581-E05+1434047314.jpg
#> representitive_image.apectratio input
#> 1 NA Diplura
#> 2 1.339 Diplura
bold_tax_name(name = c('Diplura', 'Osmia'))
#> taxid taxon tax_rank tax_division parentid parentname
#> 1 603673 Diplura genus Protists 53974 Scytosiphonaceae
#> 2 734358 Diplura class Animals 20 Arthropoda
#> 3 4940 Osmia genus Animals 4962 Megachilinae
#> specimenrecords taxonrep representitive_image.image
#> 1 6 <NA> <NA>
#> 2 294 Diplura SSWLA/BIOUG09581-E05+1434047314.jpg
#> 3 2238 Osmia BUSA/IMG_3061.jpg
#> representitive_image.apectratio input
#> 1 NA Diplura
#> 2 1.339 Diplura
#> 3 1.486 Osmia
bold_tax_id
searches for names with BOLD identifiers.
bold_tax_id(id = 88899)
#> input taxid taxon tax_rank tax_division parentid parentname
#> 1 88899 88899 Momotus genus Animals 88898 Momotidae
bold_tax_id(id = c(88899, 125295))
#> input taxid taxon tax_rank tax_division parentid parentname
#> 1 88899 88899 Momotus genus Animals 88898 Momotidae
#> 2 125295 125295 Helianthus genus Plants 151101 Asteroideae
#> taxonrep
#> 1 <NA>
#> 2 Helianthus
The BOLD sequence API gives back sequence data, with a bit of metadata.
The default is to get a list back
bold_seq(taxon = 'Coelioxys')[1:2]
#> [[1]]
#> [[1]]$id
#> [1] "ABEE013-17"
#>
#> [[1]]$name
#> [1] "Coelioxys afra"
#>
#> [[1]]$gene
#> [1] "ABEE013-17"
#>
#> [[1]]$sequence
#> [1] "AATATTATATATAATTTTTGCAATTTGATCAGGTATAATTGGATCTTCATTAAGAATAATTATTCGAATAGAATTAAGAACTCCAGGAAGATGAATCAACAACGATCAAATTTATAATTCTTTTATTACAGCTCATGCATTTTTAATAATTTTTTTTTTAGTAATACCATTTTTAATTGGAGGATTTGGAAATTGATTAGTACCTTTAATACTAGGAGCCCCCGATATAGCTTTTCCACGAATAAATAATGTAAGATTTTGACTATTACCTCCCTCAATTTTCTTATTATTATCAAGAACCCTAATTAACCCAAGAGCTGGTACTGGATGAACTGTATATCCTCCTTTATCCTTATATACATTTCATGCCTCACCTTCCGTTGATTTAGCAATTTTTTCACTTCATTTATCAGGAATTTCATCAATTATTGGATCAATAAATTTTATTGTTACAATCTTAATAATAAAAAATTTTTCTTTAAATTATAGACAAATACCATTATTTTCATGATCAGTTTTAATTACTACAATTTTACTTTTATTATCACTACCAATTTTAGCTGGAGCAATTACTATACTCCTATTTGATCGAAATTTAAATACCTCATTCTTTGACCCAATAGGAGGAGGAGATCCAATTTTATATCAACATTTATTT-----------------"
#>
#>
#> [[2]]
#> [[2]]$id
#> [1] "ABEE014-17"
#>
#> [[2]]$name
#> [1] "Coelioxys aurolimbata"
#>
#> [[2]]$gene
#> [1] "ABEE014-17"
#>
#> [[2]]$sequence
#> [1] "------------------TGCAATCTGATCAGGAATAATTGGATCTTCATTAAGAATAATTATTCGAATAGAATTAAGAATCCCAGGATCTTGGATTAATAATGATCAAATTTATAACTCTTTTATTACAGCTCATGCATTTTTAATAATTTTTTTTCTGGTTATACCATTTTTAATTGGAGGATTTGGAAATTGATTAGCCCCTTTAATGTTAGGAGCTCCAGATATAGCCTTCCCTCGAATAAATAATATTAGATTTTGATTATTACCTCCTTCTTTATTAATATTATTAATTAGTAATTTAATTAACCCTAGACCAGGAACAGGATGAACAATTTACCCTCCTTTATCTTTATATAATTATCATCCATCACCATCTGTAGATTTAGCAATTTTTTCTTTACATTTATCAGGAGTTTCATCTATTATCGGGTCAATAAATTTTATTGTAACAATTTTAATAATAAAAAATTATTCAATAAACTACAATCAAATACCTTTATTCCCATGATCAGTTTTGATTACTACAATTTTATTATTATTATCTCTACCTGTTTTAGCAGGAGCAATCACAATATTATTATTTGATCGTAATTTAAACTCATCATTTTTTGACCCTTTAGGAGGAGGAGATCCTATTCTATACCAACATTTATTT-----------------"
You can optionally get back the crul
response object
res <- bold_seq(taxon = 'Coelioxys', response = TRUE)
res$response_headers
#> $status
#> [1] "HTTP/1.1 200 OK"
#>
#> $date
#> [1] "Wed, 14 Nov 2018 00:48:10 GMT"
#>
#> $server
#> [1] "Apache/2.2.15 (Red Hat)"
#>
#> $`x-powered-by`
#> [1] "PHP/5.3.15"
#>
#> $`content-disposition`
#> [1] "attachment; filename=fasta.fas"
#>
#> $connection
#> [1] "close"
#>
#> $`transfer-encoding`
#> [1] "chunked"
#>
#> $`content-type`
#> [1] "application/x-download"
You can do geographic searches
bold_seq(geo = "USA")
#> [[1]]
#> [[1]]$id
#> [1] "GBAN1777-08"
#>
#> [[1]]$name
#> [1] "Macrobdella decora"
#>
#> [[1]]$gene
#> [1] "GBAN1777-08"
#>
#> [[1]]$sequence
#> [1] "---------------------------------ATTGGAATCTTGTATTTCTTATTAGGTACATGATCTGCTATAGTAGGGACCTCTATA---AGAATAATTATTCGAATTGAATTAGCTCAACCTGGGTCGTTTTTAGGAAAT---GATCAAATTTACAATACTATTGTTACTGCTCATGGATTAATTATAATTTTTTTTATAGTAATACCTATTTTAATTGGAGGGTTTGGTAATTGATTAATTCCGCTAATA---ATTGGTTCTCCTGATATAGCTTTTCCACGTCTTAATAATTTAAGATTTTGATTACTTCCGCCATCTTTAACTATACTTTTTTGTTCATCTATAGTCGAAAATGGAGTAGGTACTGGATGGACTATTTACCCTCCTTTAGCAGATAACATTGCTCATTCTGGACCTTCTGTAGATATA---GCAATTTTTTCACTTCATTTAGCTGGTGCTTCTTCTATTTTAGGTTCATTAAATTTTATTACTACTGTAGTTAATATACGATGACCAGGGATATCTATAGAGCGAATTCCTTTATTTATTTGATCCGTAATTATTACTACTGTATTGCTATTATTATCTTTACCAGTATTAGCAGCT---GCTATTTCAATATTATTAACAGATCGTAACTTAAATACTAGATTTTTTGACCCAATAGGAGGAGGGGATCCTATTTTATTCCAACATTTATTTTGATTTTTTGGCCACCCTGAAGTTTATATTTTAATTTTACCAGGATTTGGAGCTATTTCTCATGTAGTAAGTCATAACTCT---AAAAAATTAGAACCGTTTGGATCATTAGGGATATTATATGCAATAATTGGAATTGCAATTTTAGGTTTTATTGTTTGAGCACATCATATATTTACAGTAGGTCTTGATGTAGATACACGAGCTTATTTTACAGCAGCTACAATAGTTATTGCTGTTCCTACAGGAATTAAAGTATTTAGGTGATTG---GCAACT"
#>
#>
#> [[2]]
#> [[2]]$id
#> [1] "GBAN1780-08"
#>
#> [[2]]$name
#> [1] "Haemopis terrestris"
#>
#> [[2]]$gene
#> [1] "GBAN1780-08"
#>
#> [[2]]$sequence
#> [1] "---------------------------------ATTGGAACWTTWTATTTTATTTTNGGNGCTTGATCTGCTATATTNGGGATCTCAATA---AGGAATATTATTCGAATTGAGCCATCTCAACCTGGGAGATTATTAGGAAAT---GATCAATTATATAATTCATTAGTAACAGCTCATGGATTAATTATAATTTTCTTTATGGTTATGCCTATTTTGATTGGTGGGTTTGGTAATTGATTACTACCTTTAATA---ATTGGAGCCCCTGATATAGCTTTTCCTCGATTAAATAATTTAAGTTTTTGATTATTACCACCTTCATTAATTATATTGTTAAGATCCTCTATTATTGAAAGAGGGGTAGGTACAGGTTGAACCTTATATCCTCCTTTAGCAGATAGATTATTTCATTCAGGTCCATCGGTAGATATA---GCTATTTTTTCATTACATATAGCTGGAGCATCATCTATTTTAGGCTCATTAAACTTTATTTCTACAATTATTAATATACGAATTAAAGGTATAAGATCTGATCGAGTACCTTTATTTGTATGATCAGTTGTTATTACAACAGTTCTGTTATTATTGTCTTTACCTGTTTTAGCTGCA---GCTATTACTATATTATTAACAGATCGTAATTTAAATACTACTTTTTTTGATCCTATAGGAGGTGGAGATCCAGTATTGTTTCAACACTTATTTTGATTTTTTGGTCATCCAGAAGTATATATTTTGATTTTACCAGGATTTGGAGCAATTTCTCATATTATTACAAATAATTCT---AAAAAATTGGAACCTTTTGGATCTCTTGGTATAATTTATGCTATAATTGGAATTGCAGTTTTAGGGTTTATTGTATGAGCCCATCATATATTTACTGTAGGATTAGATGTTGATACTCGAGCTTATTTTACAGCAGCTACTATAGTTATTGCTGTTCCTACTGGTATTAAAGTTTTTAGGTGATTA---GCAACA"
#>
#>
#> [[3]]
#> [[3]]$id
#> [1] "GBNM0293-06"
#>
#> [[3]]$name
#> [1] "Steinernema carpocapsae"
#>
#> [[3]]$gene
#> [1] "GBNM0293-06"
#>
#> [[3]]$sequence
#> [1] "---------------------------------------------------------------------------------ACAAGATTATCTCTTATTATTCGTTTAGAGTTGGCTCAACCTGGTCTTCTTTTGGGTAAT---GGTCAATTATATAATTCTATTATTACTGCTCATGCTATTCTTATAATTTTTTTCATAGTTATACCTAGAATAATTGGTGGTTTTGGTAATTGAATATTACCTTTAATATTGGGGGCTCCTGATATAAGTTTTCCACGTTTGAATAATTTAAGTTTTTGATTGCTACCAACTGCTATATTTTTGATTTTAGATTCTTGTTTTGTTGACACTGGTTGTGGTACTAGTTGAACTGTTTATCCTCCTTTGAGG---ACTTTAGGTCACCCTGGYAGAAGTGTAGATTTAGCTATTTTTAGTCTTCATTGTGCAGGAATTAGCTCAATTTTAGGGGCTATTAATTTTATATGTACTACAAAAAATCTTCGTAGTAGTTCTATTTCTTTGGAACATATAAGACTTTTTGTTTGGGCTGTTTTTGTTACTGTTTTTTTATTAGTTTTATCTTTACCTGTTTTAGCTGGTGCTATTACTATGCTTTTAACAGACCGTAATTTAAATACTTCTTTTTTT------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------"
#>
#>
#> [[4]]
#> [[4]]$id
#> [1] "NEONV108-11"
#>
#> [[4]]$name
#> [1] "Aedes thelcter"
#>
#> [[4]]$gene
#> [1] "NEONV108-11"
#>
#> [[4]]$sequence
#> [1] "AACTTTATACTTCATCTTCGGAGTTTGATCAGGAATAGTTGGTACATCATTAAGAATTTTAATTCGTGCTGAATTAAGTCAACCAGGTATATTTATTGGAAATGACCAAATTTATAATGTAATTGTTACAGCTCATGCTTTTATTATAATTTTCTTTATAGTTATACCTATTATAATTGGAGGATTTGGAAATTGACTAGTTCCTCTAATATTAGGAGCCCCAGATATAGCTTTCCCTCGAATAAATAATATAAGTTTTTGAATACTACCTCCCTCATTAACTCTTCTACTTTCAAGTAGTATAGTAGAAAATGGATCAGGAACAGGATGAACAGTTTATCCACCTCTTTCATCTGGAACTGCTCATGCAGGAGCCTCTGTTGATTTAACTATTTTTTCTCTTCATTTAGCCGGAGTTTCATCAATTTTAGGGGCTGTAAATTTTATTACTACTGTAATTAATATACGATCTGCAGGAATTACTCTTGATCGACTACCTTTATTCGTTTGATCTGTAGTAATTACAGCTGTTTTATTACTTCTTTCACTTCCTGTATTAGCTGGAGCTATTACAATACTATTAACTGATCGAAATTTAAATACATCTTTCTTTGATCCAATTGGAGGAGGAGACCCAATTTTATACCAACATTTATTT"
#>
#>
#> [[5]]
#> [[5]]$id
#> [1] "NEONV109-11"
#>
#> [[5]]$name
#> [1] "Aedes thelcter"
#>
#> [[5]]$gene
#> [1] "NEONV109-11"
#>
#> [[5]]$sequence
#> [1] "AACTTTATACTTCATCTTCGGAGTTTGATCAGGAATAGTTGGTACATCATTAAGAATTTTAATTCGTGCTGAATTAAGTCAACCAGGTATATTTATTGGAAATGACCAAATTTATAATGTAATTGTTACAGCTCATGCTTTTATTATAATTTTCTTTATAGTTATACCTATTATAATTGGAGGATTTGGAAATTGACTAGTTCCTCTAATATTAGGAGCCCCAGATATAGCTTTCCCTCGAATAAATAATATAAGTTTTTGAATACTACCTCCCTCATTAACTCTTCTACTTTCAAGTAGTATAGTAGAAAATGGGTCAGGAACAGGATGAACAGTTTATCCACCTCTTTCATCTGGAACTGCTCATGCAGGAGCCTCTGTTGATTTAACTATTTTTTCTCTTCATTTAGCCGGAGTTTCATCAATTTTAGGGGCTGTAAATTTTATTACTACTGTAATTAATATACGATCTGCAGGAATTACTCTTGATCGACTACCTTTATTCGTTTGATCTGTAGTAATTACAGCTGTTTTATTACTTCTTTCACTTCCTGTATTAGCTGGAGCTATTACAATACTATTAACTGATCGAAATTTAAATACATCTTTCTTTGACCCAATTGGAGGGGGAGACCCAATTTTATACCAACATTTATTT"
And you can search by researcher name
bold_seq(researchers = 'Thibaud Decaens')[[1]]
#> $id
#> [1] "MORCA001-14"
#>
#> $name
#> [1] "Lepidoptera"
#>
#> $gene
#> [1] "MORCA001-14"
#>
#> $sequence
#> [1] "AACATTATATTTTATTTTTGGTATTTGAGCAGGAATAATTGGAACTTCTTTAAGTTTATTAATTCGAGCTGAATTGGGTAATCCCGGTTCTTTAATTGGTGATGACCAAATCTATAATACTATTGTAACAGCCCATGCTTTTATTATAATTTTCTTTATAGTTATACCTATTATAATTGGAGGATTTGGAAATTGACTAGTTCCTTTAATACTAGGAGCCCCTGATATAGCTTTCCCCCGAATAAATAATATAAGATTTTGATTACTACCCCCTTCAATTACACTTTTAATTTCTAGAAGAATTGTAGAAAATGGAGCAGGTACTGGATGAACAGTCTACCCCCCTCTTTCATCTAATATCGCTCATGGAGGTAGTTCAGTTGATTTAGCTATTTTTTCCTTACATTTAGCTGGTATTTCATCTATTTTAGGTGCTATTAATTTTATTACAACAATTATTAACATACGATTAAATAAACTATCATTTGATCAAATACCCCTATTTGTTTGAGCTGTAGGAATTACAGCATTTTTACTTTTATTATCTTTACCTGTTTTAGCGGGAGCTATTACAATACTATTAACTGATCGTAATTTAAACACTTCATTTTTTGATCCCGCGGGAGGAGGAGATCCAATTTTATATCAACACTTATTT"
by taxon IDs
bold_seq(ids = c('ACRJP618-11', 'ACRJP619-11'))
#> [[1]]
#> [[1]]$id
#> [1] "ACRJP618-11"
#>
#> [[1]]$name
#> [1] "Lepidoptera"
#>
#> [[1]]$gene
#> [1] "ACRJP618-11"
#>
#> [[1]]$sequence
#> [1] "------------------------TTGAGCAGGCATAGTAGGAACTTCTCTTAGTCTTATTATTCGAACAGAATTAGGAAATCCAGGATTTTTAATTGGAGATGATCAAATCTACAATACTATTGTTACGGCTCATGCTTTTATTATAATTTTTTTTATAGTTATACCTATTATAATTGGAGGATTTGGTAATTGATTAGTTCCCCTTATACTAGGAGCCCCAGATATAGCTTTCCCTCGAATAAACAATATAAGTTTTTGGCTTCTTCCCCCTTCACTATTACTTTTAATTTCCAGAAGAATTGTTGAAAATGGAGCTGGAACTGGATGAACAGTTTATCCCCCACTGTCATCTAATATTGCCCATAGAGGTACATCAGTAGATTTAGCTATTTTTTCTTTACATTTAGCAGGTATTTCCTCTATTTTAGGAGCGATTAATTTTATTACTACAATTATTAATATACGAATTAACAGTATAAATTATGATCAAATACCACTATTTGTGTGATCAGTAGGAATTACTGCTTTACTCTTATTACTTTCTCTTCCAGTATTAGCAGGTGCTATCACTATATTATTAACGGATCGAAATTTAAATACATCATTTTTTGATCCTGCAGGAGGAGGAGATCCAATTTTATATCAACATTTATTT"
#>
#>
#> [[2]]
#> [[2]]$id
#> [1] "ACRJP619-11"
#>
#> [[2]]$name
#> [1] "Lepidoptera"
#>
#> [[2]]$gene
#> [1] "ACRJP619-11"
#>
#> [[2]]$sequence
#> [1] "AACTTTATATTTTATTTTTGGTATTTGAGCAGGCATAGTAGGAACTTCTCTTAGTCTTATTATTCGAACAGAATTAGGAAATCCAGGATTTTTAATTGGAGATGATCAAATCTACAATACTATTGTTACGGCTCATGCTTTTATTATAATTTTTTTTATAGTTATACCTATTATAATTGGAGGATTTGGTAATTGATTAGTTCCCCTTATACTAGGAGCCCCAGATATAGCTTTCCCTCGAATAAACAATATAAGTTTTTGGCTTCTTCCCCCTTCACTATTACTTTTAATTTCCAGAAGAATTGTTGAAAATGGAGCTGGAACTGGATGAACAGTTTATCCCCCACTGTCATCTAATATTGCCCATAGAGGTACATCAGTAGATTTAGCTATTTTTTCTTTACATTTAGCAGGTATTTCCTCTATTTTAGGAGCGATTAATTTTATTACTACAATTATTAATATACGAATTAACAGTATAAATTATGATCAAATACCACTATTTGTGTGATCAGTAGGAATTACTGCTTTACTCTTATTACTTTCTCTTCCAGTATTAGCAGGTGCTATCACTATATTATTAACGGATCGAAATTTAAATACATCATTTTTTGATCCTGCAGGAGGAGGAGATCCAATTTTATATCAACATTTATTT"
by container (containers include project codes and dataset codes)
bold_seq(container = 'ACRJP')[[1]]
#> $id
#> [1] "ACRJP001-09"
#>
#> $name
#> [1] "Lepidoptera"
#>
#> $gene
#> [1] "ACRJP001-09"
#>
#> $sequence
#> [1] "AACATTATATTTTATTTTTGGGATCTGATCTGGAATAGTAGGGACATCTTTAAGTATACTAATTCGAATAGAACTAGGAAATCCTGGATGTTTAATTGGGGATGATCAAATTTATAATACTATTGTTACAGCTCATGCTTTTATTATAATTTTTTTTATAGTTATACCCATTATAATTGGAGGTTTTGGCAATTGACTTGTACCATTAATATTAGGAGCCCCTGATATAGCATTTCCCCGAATAAATAATATAAGATTTTGACTTCTTCCCCCCTCATTAATTTTATTAATTTCAAGAAGAATTGTTGAAAATGGAGCAGGAACAGGATGAACAGTCTATCCTCCATTATCTTCTAATATTGCGCATAGAGGATCCTCTGTTGATTTAGCTATTTTCTCACTTCATTTAGCAGGAATTTCTTCTATTTTAGGAGCAATTAATTTTATTACAACTATTATTAATATACGAATAAATAATTTACTTTTTGACCAAATACCTCTATTTGTTTGAGCAGTAGGTATTACAGCTGTTCTTCTTTTATTATCATTACCAGTATTAGCAGGAGCAATTACCATACTATTAACAGATCGTAATTTAAATACTTCTTTCTTTGATCCTGCTGGAGGAGGAGATCCAATTTTATACCAACATTTATTT"
by bin (a bin is a Barcode Index Number)
bold_seq(bin = 'BOLD:AAA5125')[[1]]
#> $id
#> [1] "ASARD6776-12"
#>
#> $name
#> [1] "Eacles ormondei"
#>
#> $gene
#> [1] "ASARD6776-12"
#>
#> $sequence
#> [1] "AACTTTATATTTTATTTTTGGAATTTGAGCAGGTATAGTAGGAACTTCTTTAAGATTACTAATTCGAGCAGAATTAGGTACCCCCGGATCTTTAATTGGAGATGACCAAATTTATAATACCATTGTAACAGCTCATGCTTTTATTATAATTTTTTTTATAGTTATACCTATTATAATTGGAGGATTTGGAAATTGATTAGTACCCCTAATACTAGGAGCTCCTGATATAGCTTTCCCCCGAATAAATAATATAAGATTTTGACTATTACCCCCATCTTTAACCCTTTTAATTTCTAGAAGAATTGTCGAAAATGGAGCTGGAACTGGATGAACAGTTTATCCCCCCCTTTCATCTAATATTGCTCATGGAGGCTCTTCTGTTGATTTAGCTATTTTTTCCCTTCATCTAGCTGGAATCTCATCAATTTTAGGAGCTATTAATTTTATCACAACAATCATTAATATACGACTAAATAATATAATATTTGACCAAATACCTTTATTTGTATGAGCTGTTGGTATTACAGCATTTCTTTTATTGTTATCTTTACCTGTACTAGCTGGAGCTATTACTATACTTTTAACAGATCGAAACTTAAATACATCATTTTTTGACCCAGCAGGAGGAGGAGATCCTATTCTCTATCAACATTTATTT"
And there are more ways to query, check out the docs for ?bold_seq
.
The BOLD specimen API doesn't give back sequences, only specimen data. By default you download tsv
format data, which is given back to you as a data.frame
res <- bold_specimens(taxon = 'Osmia')
head(res[,1:8])
#> processid sampleid recordID catalognum
#> 1 ABEE151-17 NHMW-HYM 773 8362250
#> 2 ABEE184-17 NHMW-HYM 2142 8362283 NHMW-HYM 2142
#> 3 ABEE185-17 NHMW-HYM 2144 8362284 NHMW-HYM 2144
#> 4 ABEE188-17 NHMW-HYM 2242 8362287 NHMW-HYM 2242
#> 5 ABEE190-17 NHMW-HYM 2259 8362289 NHMW-HYM 2259
#> 6 GBAH3878-08 EU726629 856409 EU726629
#> fieldnum
#> 1
#> 2 NBH2 Zimmermann 2017.04.03 Oesterreich WienAUTWienZimmermann
#> 3 NBH2 Zimmermann 2017.04.03 Oesterreich WienAUTWienZimmermann
#> 4 NHB3 Schoder 2017.05.30 Oesterreich WienAUTWienSchoder
#> 5 NBH9 Schoder 2017.06.15 Oesterreich WienAUTWienSchoder
#> 6
#> institution_storing collection_code bin_uri
#> 1 Naturhistorisches Museum Wien NA BOLD:AAE5409
#> 2 Naturhistorisches Museum Wien NA BOLD:AAE5409
#> 3 Naturhistorisches Museum Wien NA BOLD:ADJ1069
#> 4 Naturhistorisches Museum Wien NA BOLD:AAF2155
#> 5 Naturhistorisches Museum Wien NA BOLD:AAD0313
#> 6 Mined from GenBank, NCBI NA BOLD:AAA4494
You can optionally get back the data in XML
format
bold_specimens(taxon = 'Osmia', format = 'xml')
<?xml version="1.0" encoding="UTF-8"?>
<bold_records xsi:noNamespaceSchemaLocation="http://www.boldsystems.org/schemas/BOLDPublic_record.xsd" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<record>
<record_id>1470124</record_id>
<processid>BOM1525-10</processid>
<bin_uri>BOLD:AAN3337</bin_uri>
<specimen_identifiers>
<sampleid>DHB 1011</sampleid>
<catalognum>DHB 1011</catalognum>
<fieldnum>DHB1011</fieldnum>
<institution_storing>Marjorie Barrick Museum</institution_storing>
</specimen_identifiers>
<taxonomy>
You can choose to get the crul
response object back if you'd rather work with the raw data returned from the BOLD API.
res <- bold_specimens(taxon = 'Osmia', format = 'xml', response = TRUE)
res$url
#> [1] "http://v4.boldsystems.org/index.php/API_Public/specimen?taxon=Osmia&format=xml"
res$status_code
#> [1] 200
res$response_headers
#> $status
#> [1] "HTTP/1.1 200 OK"
#>
#> $date
#> [1] "Wed, 14 Nov 2018 00:48:44 GMT"
#>
#> $server
#> [1] "Apache/2.2.15 (Red Hat)"
#>
#> $`x-powered-by`
#> [1] "PHP/5.3.15"
#>
#> $`content-disposition`
#> [1] "attachment; filename=bold_data.xml"
#>
#> $connection
#> [1] "close"
#>
#> $`transfer-encoding`
#> [1] "chunked"
#>
#> $`content-type`
#> [1] "application/x-download"
The specimen/sequence combined API gives back specimen and sequence data. Like the specimen API, this one gives by default tsv
format data, which is given back to you as a data.frame
. Here, we're setting sepfasta=TRUE
so that the sequence data is given back as a list, and taken out of the data.frame
returned so the data.frame
is more manageable.
res <- bold_seqspec(taxon = 'Osmia', sepfasta = TRUE)
res$fasta[1:2]
#> $`ABEE151-17`
#> [1] "----------------------------------------------------------TTTTTGCTATATGATCAGGTACAGTAGGTTCAGCTATAAGAATTATTATTCGAATAGAACTTAGAGTTCCAGGATCATGAATTTCTAATGACCAAATTTATAATACTTTAGTAACTGCTCATGCTTTTTTAATAATTTTCTTTCTTGTAATACCATTTCTAATTGGAGGATTTGGAAATTGATTAATTCCTTTAATATTAGGAATTCCAGATATAGCCTTTCCACGAATAAATAATATTAGATTTTGACTTTTACCACCTTCTTTAATATTATTAATATTAAGAAATTTTATAAATCCAAGTCCAGGAACTGGATGAACTGTTTATCCTCCTCTTTCATCTTATATATTTCATTCTTCCCCATCAGTAGATTTAGCAATTTTTTCATTACATATTTCCGGATTATCCTCTATTATAGGTTCATTAAATTTTATTGTCACAATTATTATAATAAAAAATATTTCATTAAAACATACTCAATTACCCTTATTTTCTTGATCTGTATTTATTACTACTATTTTATTACTTTTCTCTCTCCCAGTTTTAGCTGGAGCTATTACTATACTTTTATTTGATCGAAATTTTAACACCTCATTTTTTGACCCGACGGGAGGTGGAGATCCAATTTTATACCAACATTTATTTTGATTTTTTGGACAT-----------------------"
#>
#> $`ABEE184-17`
#> [1] "-------CTCACTATAGGGATTCAACCAATCATAAAGATATTGGAATTCTTTATATAATTTTTGCTATATGATCAGGTACAGTAGGTTCAGCTATAAGAATTATTATTCGAATAGAACTTAGAGTTCCAGGATCATGAATTTCTAATGACCAAATTTATAATACTTTAGTAACTGCTCATGCTTTTTTAATAATTTTCTTTCTTGTAATACCATTTCTAATTGGAGGATTTGGAAATTGATTAATTCCTTTAATATTAGGAATTCCAGATATAGCCTTTCCACGAATAAATAATATTAGATTTTGACTTTTACCACCTTCTTTAATATTATTAATATTAAGAAATTTTATAAATCCAAGTCCAGGAACTGGATGAACTGTTTATCCTCCTCTTTCATCTTATATATTTCATTCTTCCCCATCAGTAGATTTAGCAATTTTTTCATTACATATTTCCGGATTATCCTCTATTATAGGTTCATTAAATTTTATTGTCACAATTATTATAATAAAAAATATTTCATTAAAACATACTCAATTACCCTTATTTTCTTGATCTGTATTTATTACTACTATTTTATTACTTTTCTCTCTCCCAGTTTTAGCTGGAGCTATTACTATACTTTTATTTGATCGAAATTTTAACACCTCATTTTTTGACC-------------------------------------------------------------------------------"
Or you can index to a specific sequence like
res$fasta['GBAH0293-06']
#> $`GBAH0293-06`
#> [1] "------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------TTAATGTTAGGGATTCCAGATATAGCTTTTCCACGAATAAATAATATTAGATTTTGACTGTTACCTCCATCTTTAATATTATTACTTTTAAGAAATTTTTTAAATCCAAGTCCTGGAACAGGATGAACAGTTTATCCTCCTTTATCATCAAATTTATTTCATTCTTCTCCTTCAGTTGATTTAGCAATTTTTTCTTTACATATTTCAGGTTTATCTTCTATTATAGGTTCATTAAATTTTATTGTTACAATTATTATAATAAAAAATATTTCTTTAAAATATATTCAATTACCTTTATTTTCTTGATCTGTATTTATTACTACTATTCTTTTATTATTTTCTTTACCTGTATTAGCTGGAGCTATTACTATATTATTATTTGATCGAAATTTTAATACATCTTTTTTTGATCCAACAGGAGGGGGAGATCCAATTCTTTATCAACATTTATTTTGATTTTTTGGTCATCCTGAAGTTTATATTTTAATTTTACCTGGATTTGGATTAATTTCTCAAATTATTTCTAATGAAAGAGGAAAAAAAGAAACTTTTGGAAATATTGGTATAATTTATGCTATATTAAGAATTGGACTTTTAGGTTTTATTGTT---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------"
This function downloads files to your machine - it does not load them into your R session - but prints out where the files are for your information.
bold_trace(taxon = 'Osmia', quiet = TRUE)