Introducing europepmc, an R interface to Europe PMC RESTful API

Najko Jahn

2018-04-20

What is searched?

Europe PMC is a repository of life science literature. Europe PMC ingests all PubMed content and extends its index with other sources, including Agricola, a bibliographic database of citations to the agricultural literature, and Biological Patents.

Index coverage

For more background on Europe PMC, see:

https://europepmc.org/About

Levchenko, M., Gou, Y., Graef, F., Hamelers, A., Huang, Z., Ide-Smith, M., … McEntyre, J. (2017). Europe PMC in 2017. Nucleic Acids Research, 46(D1), D1254–D1260. https://doi.org/10.1093/nar/gkx1005

How to search Europe PMC with R?

This client supports the Europe PMC search syntax. If you are unfamiliar with searching Europe PMC, check out the Europe PMC query builder, a helpful tool for composing queries. To use your Europe PMC queries in R, simply copy and paste the search string into the search functions of this package.

The following sections show some examples of how to search Europe PMC.

Managing search results

By default, 100 records are returned, but the number of results can be expanded or limited with the limit parameter.

europepmc::epmc_search('"Human malaria parasites"', limit = 10)
#> # A tibble: 10 x 27
#>    id     source pmid   doi    title    authorString    journalTitle issue
#>    <chr>  <chr>  <chr>  <chr>  <chr>    <chr>           <chr>        <chr>
#>  1 29109… MED    29109… 10.11… Validat… Uddin T, McFad… Antimicrob … 1    
#>  2 28902… MED    28902… 10.11… A genet… Sayers CP, Mol… Cell Microb… 1    
#>  3 27894… MED    27894… 10.10… Plasmod… Maeno Y, Culle… Parasitology 4    
#>  4 28900… MED    28900… 10.11… Can Mix… Singh US, Siwa… Biomed Res … <NA> 
#>  5 29669… MED    29669… 10.11… The bio… Awono-Ambene P… Parasit Vec… 1    
#>  6 29370… MED    29370… 10.13… A novel… Komaki-Yasuda … PLoS One     1    
#>  7 27748… MED    27748… 10.10… Non-hum… Martinelli A, … Parasitology 1    
#>  8 PMC55… PMC    <NA>   <NA>   Can Mix… Singh US, Siwa… Biomed Res … <NA> 
#>  9 28525… MED    28525… 10.10… The use… Othman AS, Mar… Expert Rev … 7    
#> 10 28531… MED    28531… 10.13… Experim… Singh N, Barne… PLoS One     5    
#> # ... with 19 more variables: journalVolume <chr>, pubYear <chr>,
#> #   journalIssn <chr>, pubType <chr>, isOpenAccess <chr>, inEPMC <chr>,
#> #   inPMC <chr>, hasPDF <chr>, hasBook <chr>, citedByCount <int>,
#> #   hasReferences <chr>, hasTextMinedTerms <chr>,
#> #   hasDbCrossReferences <chr>, hasLabsLinks <chr>,
#> #   hasTMAccessionNumbers <chr>, firstPublicationDate <chr>,
#> #   pageInfo <chr>, pmcid <chr>, hasSuppl <chr>

Results are sorted by relevance by default. Other orderings are available via the sort parameter.
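The sort parameter changes the order on the server side. As a minimal sketch, assuming your installed version accepts sort = "cited" (check ?europepmc::epmc_search for the accepted values), a call could look like europepmc::epmc_search('"Human malaria parasites"', sort = "cited", limit = 10). A result table can also be re-ordered locally with base R. The helper rank_by_citations() below is hypothetical, not part of europepmc, and the toy input reuses IDs and citation counts from the DOI example in this vignette.

```r
# Hypothetical helper, not part of europepmc: re-order an epmc_search()
# result locally, most cited first. It only needs the citedByCount column.
rank_by_citations <- function(hits) {
  hits[order(hits$citedByCount, decreasing = TRUE), , drop = FALSE]
}

# Toy stand-in for an epmc_search() result (two of the columns it returns)
toy_hits <- data.frame(
  id = c("28957815", "28941317", "29018132", "28623611"),
  citedByCount = c(0L, 0L, 1L, 1L),
  stringsAsFactors = FALSE
)
ranked <- rank_by_citations(toy_hits)
ranked$citedByCount
```

Server-side sorting saves work when only the top records are fetched with a small limit. Local re-ordering is useful once a result table is already in memory.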

Loop over queries

Sometimes you may want to send more than one search to Europe PMC at once. A simple solution is plyr::ldply():

my_dois <- c(
  "10.1159/000479962",
  "10.1002/sctm.17-0081",
  "10.1161/strokeaha.117.018077",
  "10.1007/s12017-017-8447-9"
)
plyr::ldply(my_dois, function(x) {
  europepmc::epmc_search(paste0("DOI:", x))
})
#>         id source     pmid                          doi
#> 1 28957815    MED 28957815            10.1159/000479962
#> 2 28941317    MED 28941317         10.1002/sctm.17-0081
#> 3 29018132    MED 29018132 10.1161/strokeaha.117.018077
#> 4 28623611    MED 28623611    10.1007/s12017-017-8447-9
#>                                                                                                                                 title
#> 1 Clinical Relevance of Patent Foramen Ovale and Atrial Septum Aneurysm in Stroke: Findings of a Single-Center Cross-Sectional Study.
#> 2                                 Concise Review: Extracellular Vesicles Overcoming Limitations of Cell Therapies in Ischemic Stroke.
#> 3                                                 One-Stop Management of Acute Stroke Patients: Minimizing Door-to-Reperfusion Times.
#> 4              Deferiprone Rescues Behavioral Deficits Induced by Mild Iron Exposure in a Mouse Model of Alpha-Synuclein Aggregation.
#>                                                                                                    authorString
#> 1                                 Schnieder M, Siddiqui T, Karch A, Bähr M, Hasenfuss G, Liman J, Schroeter MR.
#> 2                                                                    Doeppner TR, Bähr M, Hermann DM, Giebel B.
#> 3 Psychogios MN, Behme D, Schregel K, Tsogkas I, Maier IL, Leyhe JR, Zapf A, Tran J, Bähr M, Liman J, Knauth M.
#> 4                                     Carboni E, Tatenhorst L, Tönges L, Barski E, Dambeck V, Bähr M, Lingor P.
#>            journalTitle issue journalVolume pubYear            journalIssn
#> 1            Eur Neurol   5-6            78    2017 0014-3022; 1421-9913; 
#> 2 Stem Cells Transl Med    11             6    2017 2157-6564; 2157-6580; 
#> 3                Stroke    11            48    2017 0039-2499; 1524-4628; 
#> 4    Neuromolecular Med   2-3            19    2017 1535-1084; 1559-1174; 
#>    pageInfo
#> 1   264-269
#> 2 2044-2052
#> 3 3152-3155
#> 4   309-321
#>                                                               pubType
#> 1                                                     journal article
#> 2                                           review; journal article; 
#> 3 clinical trial; research support, non-u.s. gov't; journal article; 
#> 4                                 research-article; journal article; 
#>   isOpenAccess inEPMC inPMC hasPDF hasBook citedByCount hasReferences
#> 1            N      N     N      N       N            0             Y
#> 2            N      N     N      N       N            0             Y
#> 3            N      N     N      N       N            1             N
#> 4            Y      Y     N      Y       N            1             Y
#>   hasTextMinedTerms hasDbCrossReferences hasLabsLinks
#> 1                 N                    N            Y
#> 2                 N                    N            Y
#> 3                 N                    N            Y
#> 4                 Y                    N            Y
#>   hasTMAccessionNumbers firstPublicationDate      pmcid hasSuppl
#> 1                     N           2017-09-28       <NA>     <NA>
#> 2                     N           2017-09-23       <NA>     <NA>
#> 3                     N           2017-10-10       <NA>     <NA>
#> 4                     Y           2017-06-16 PMC5570801        Y
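The ldply() loop sends one request per DOI. A possible alternative is to fold all DOIs into a single query. This sketch assumes that Europe PMC's boolean OR combines DOI: field searches the same way it combines other terms. doi_query() is a hypothetical helper, not part of europepmc.

```r
# Hypothetical helper, not part of europepmc: join several DOI: field
# searches with OR, so one request can replace one request per DOI.
doi_query <- function(dois) {
  paste0("DOI:", dois, collapse = " OR ")
}

my_dois <- c(
  "10.1159/000479962",
  "10.1002/sctm.17-0081",
  "10.1161/strokeaha.117.018077",
  "10.1007/s12017-017-8447-9"
)
q <- doi_query(my_dois)
q
```

The combined string can then be passed on with europepmc::epmc_search(q). For very long DOI lists the request URL may become unwieldy, so splitting the list into batches and looping as above is a reasonable fallback.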

Output options

By default, a non-nested data frame, printed as a tibble, is returned. Other formats are output = "id_list", which returns a list of IDs and sources, and output = "raw", which returns the full metadata as a list. Please be aware that these lists can become very large.

More advanced options to search Europe PMC

Text-mined terms

Europe PMC parses article metadata for various concepts and terms.

Semantic type   Description/Examples
accession       A unique identifier given to a DNA or protein sequence record
chemical        e.g. Granzymes, Peptides, Hydrogen
disease         e.g. dysthymias, gid, icterohemorrhagic
efo             Experimental Factor Ontology, e.g. generation, health, mortality rate, scale, findings, genome etc.
gene_protein    e.g. atp, cl-43, ecoriir, gng11, ipt1, mlks
go_term         A Gene Ontology (GO) term, e.g. annealing, neuroblasts
organism        e.g. pneumocystidomycetes, sarus, terebratulide
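Semantic types are used like field names, as in disease:meningitis in the next example, so several of them can be combined into one query string. The helper tm_query() below is a hypothetical sketch, not a europepmc function. It assumes each semantic type in the table can be queried as type:term, and it quotes multi-word terms the same way as the phrase search '"Human malaria parasites"' shown earlier.

```r
# Hypothetical helper, not part of europepmc: build a query from named
# terms, using each name as the semantic-type field.
tm_query <- function(terms, op = "AND") {
  stopifnot(!is.null(names(terms)), all(nzchar(names(terms))))
  # Quote multi-word terms so they are searched as phrases
  values <- ifelse(grepl(" ", terms), paste0('"', terms, '"'), terms)
  paste0(names(terms), ":", values, collapse = paste0(" ", op, " "))
}

q <- tm_query(c(disease = "meningitis", efo = "mortality rate"))
q
```

The resulting string is passed to europepmc::epmc_search() like any hand-written query. Whether a given combination of terms returns hits depends on the index.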

Here’s how to search for publications about meningitis:

europepmc::epmc_search('disease:meningitis')
#> # A tibble: 100 x 27
#>    id     source pmid  pmcid doi   title  authorString  journalTitle issue
#>    <chr>  <chr>  <chr> <chr> <chr> <chr>  <chr>         <chr>        <chr>
#>  1 29304… MED    2930… PMC5… 10.1… Evalu… Mpoza E, Muk… PLoS One     1    
#>  2 29495… MED    2949… PMC5… 10.3… Menin… McCarthy PC,… Vaccines (B… 1    
#>  3 29253… MED    2925… PMC5… 10.1… Early… Kambiré D, S… J Infect     3    
#>  4 29580… MED    2958… PMC5… 10.1… Forei… Nasher F, Fö… BMC Microbi… 1    
#>  5 29509… MED    2950… PMC5… 10.3… Genet… Ousmane S, D… Antibiotics… 1    
#>  6 29454… MED    2945… PMC5… 10.1… Cereb… Takahashi K,… J Neuroinfl… 1    
#>  7 29547… MED    2954… PMC5… 10.1… Bioin… Andreae CA, … PLoS One     3    
#>  8 29594… MED    2959… PMC5… 10.3… Loop-… Seki M, Kilg… Front Pedia… <NA> 
#>  9 29364… MED    2936… PMC5… 10.1… The c… Yaesoubi R, … PLoS Med     1    
#> 10 29593… MED    2959… PMC5… 10.1… Preva… Lee H, Seo Y… Sci Rep      1    
#> # ... with 90 more rows, and 18 more variables: journalVolume <chr>,
#> #   pubYear <chr>, journalIssn <chr>, pageInfo <chr>, pubType <chr>,
#> #   isOpenAccess <chr>, inEPMC <chr>, inPMC <chr>, hasPDF <chr>,
#> #   hasBook <chr>, hasSuppl <chr>, citedByCount <int>,
#> #   hasReferences <chr>, hasTextMinedTerms <chr>,
#> #   hasDbCrossReferences <chr>, hasLabsLinks <chr>,
#> #   hasTMAccessionNumbers <chr>, firstPublicationDate <chr>

To see which other terms were text-mined at the article level, use the europepmc::epmc_tm() function.

Data integrations

Europe PMC also lets you search for cross-references between Europe PMC and other databases. For instance, to get publications from 2016 that are cited by entries in the Protein Data Bank in Europe:

europepmc::epmc_search('(HAS_PDB:y) AND FIRST_PDATE:2016')
#> # A tibble: 100 x 27
#>    id     source pmid  pmcid doi   title  authorString  journalTitle issue
#>    <chr>  <chr>  <chr> <chr> <chr> <chr>  <chr>         <chr>        <chr>
#>  1 28089… MED    2808… PMC5… 10.1… Struc… Sluchanko NN… Structure    2    
#>  2 28035… MED    2803… PMC5… 10.1… Struc… Waz S, Nakam… J Biol Chem  7    
#>  3 28030… MED    2803… PMC5… 10.1… Struc… Christensen … PLoS One     12   
#>  4 28028… MED    2802… PMC5… 10.1… Struc… Dow GT, Gilb… Protein Sci  3    
#>  5 28024… MED    2802… PMC5… 10.1… Cryst… Kuk AC, Mash… Nat Struct … 2    
#>  6 28011… MED    2801… PMC5… 10.1… Struc… Levdikov VM,… J Biol Chem  7    
#>  7 28009… MED    2800… PMC5… 10.1… Struc… Zhao H, Wei … Sci Rep      <NA> 
#>  8 28005… MED    2800… <NA>  10.1… Cycli… Coxon CR, An… J Med Chem   5    
#>  9 28004… MED    2800… <NA>  10.1… Disco… Cheeseman MD… J Med Chem   1    
#> 10 28065… MED    2806… <NA>  10.1… Kobuv… Klima M, Cha… Structure    2    
#> # ... with 90 more rows, and 18 more variables: journalVolume <chr>,
#> #   pubYear <chr>, journalIssn <chr>, pageInfo <chr>, pubType <chr>,
#> #   isOpenAccess <chr>, inEPMC <chr>, inPMC <chr>, hasPDF <chr>,
#> #   hasBook <chr>, hasSuppl <chr>, citedByCount <int>,
#> #   hasReferences <chr>, hasTextMinedTerms <chr>,
#> #   hasDbCrossReferences <chr>, hasLabsLinks <chr>,
#> #   hasTMAccessionNumbers <chr>, firstPublicationDate <chr>
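To compare several publication years, one option is to generate one query per year and loop over them as in the DOI example. The sketch below only builds the query strings with sprintf(), reusing the HAS_PDB/FIRST_PDATE syntax from the query above. The year range is an arbitrary illustration.

```r
# One cross-reference query per publication year (illustrative years)
years <- 2014:2016
pdb_queries <- sprintf("(HAS_PDB:y) AND FIRST_PDATE:%d", years)
names(pdb_queries) <- years
pdb_queries
```

Each query can then be sent with, e.g., plyr::ldply(pdb_queries, europepmc::epmc_search). Keep in mind that each call returns at most limit records (100 by default), so row counts reflect the limit rather than the total number of matching articles.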

Besides the Protein Data Bank in Europe, cross-references to several other databases are supported.

To retrieve metadata about these external database links, use europepmc::epmc_db().

Citations and reference sections

Europe PMC also provides citation metadata and reference sections. To retrieve the citations of an article, use:

europepmc::epmc_citations("9338777", limit = 500)
#> # A tibble: 216 x 11
#>    id     source citationType  title authorString journalAbbrevia… pubYear
#>    <chr>  <chr>  <chr>         <chr> <chr>        <chr>              <int>
#>  1 28437… MED    "research su… Thre… Colon-Moran… Virology            2017
#>  2 28054… MED    journal arti… Anti… Inoue Y, Yo… Ann Biomed Eng      2017
#>  3 27832… MED    "research-ar… Tran… Kim N, Choi… PLoS One            2016
#>  4 27649… MED    "research-ar… Comp… Nascimento … PLoS One            2016
#>  5 27527… MED    "review-arti… How … Denner J.    Viruses             2016
#>  6 27466… MED    "research su… Exis… Kuse K, Ito… J Virol             2016
#>  7 26991… MED    journal arti… Micr… Plotzki E, … Xenotransplanta…    2016
#>  8 26067… MED    "brief-repor… Comp… Tang HB, Ou… Genome Announc      2015
#>  9 26043… MED    "research su… Tole… Denner J, P… Virus Res           2015
#> 10 25956… MED    "research su… Viru… Plotzki E, … Virus Res           2015
#> # ... with 206 more rows, and 4 more variables: volume <chr>,
#> #   pageInfo <chr>, citedByCount <int>, issue <chr>
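The returned table can be summarised with base R. As a small sketch, the hypothetical helper citations_per_year() counts citing articles per publication year. The toy input holds only the pubYear values of the ten rows printed above, so the counts describe those rows, not all 216 citations.

```r
# Hypothetical helper, not part of europepmc: tabulate citing articles
# by the pubYear column of an epmc_citations() result.
citations_per_year <- function(citations) {
  table(citations$pubYear)
}

# Toy input: pubYear of the ten rows printed above
toy_citations <- data.frame(
  pubYear = c(2017L, 2017L, 2016L, 2016L, 2016L, 2016L, 2016L, 2015L, 2015L, 2015L)
)
per_year <- citations_per_year(toy_citations)
per_year
```

With network access, the same helper applies to the full result: citations_per_year(europepmc::epmc_citations("9338777", limit = 500)).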

To retrieve the reference section of an article:

europepmc::epmc_refs("28632490", limit = 200)
#> # A tibble: 169 x 19
#>    id     source citationType title   authorString  journalAbbrevia… issue
#>    <chr>  <chr>  <chr>        <chr>   <chr>         <chr>            <chr>
#>  1 12002… MED    JOURNAL ART… Triclo… Adolfsson-Er… Chemosphere      9-10 
#>  2 18795… MED    JOURNAL ART… In vit… Ahn KC, Zhao… Environ. Health… 9    
#>  3 18556… MED    JOURNAL ART… Effect… Aiello AE, C… Am J Public Hea… 8    
#>  4 17683… MED    JOURNAL ART… Consum… Aiello AE, L… Clin. Infect. D… <NA> 
#>  5 15273… MED    JOURNAL ART… Relati… Aiello AE, M… Antimicrob. Age… 8    
#>  6 18207… MED    JOURNAL ART… The in… Allmyr M, Ha… Sci. Total Envi… 1    
#>  7 17007… MED    JOURNAL ART… Triclo… Allmyr M, Ad… Sci. Total Envi… 1    
#>  8 26948… MED    JOURNAL ART… Pressu… Alvarez-Rive… J Chromatogr A   <NA> 
#>  9 23192… MED    JOURNAL ART… Exposu… Anderson SE,… Toxicol. Sci.    1    
#> 10 25837… MED    JOURNAL ART… Observ… Vladar EK, L… Methods Cell Bi… <NA> 
#> # ... with 159 more rows, and 12 more variables: pubYear <int>,
#> #   volume <chr>, pageInfo <chr>, citedOrder <int>, match <chr>,
#> #   essn <chr>, issn <chr>, publicationTitle <chr>, publisherLoc <chr>,
#> #   publisherName <chr>, externalLink <chr>, doi <chr>

Fulltext access

Europe PMC provides access not only to metadata but also to full texts. Adding AND (OPEN_ACCESS:y) to your search query returns only those articles for which Europe PMC also holds the full text.
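The open-access filter can be appended to any existing query. The sketch below uses a hypothetical helper, oa_only(), which is not part of europepmc. It wraps the original query in parentheses so that an OR inside it (for example a combined DOI query) is not split apart by the appended AND.

```r
# Hypothetical helper, not part of europepmc: restrict a query to articles
# whose full text Europe PMC holds.
oa_only <- function(query) {
  # Parenthesise the original query so any OR inside it stays grouped
  paste0("(", query, ") AND (OPEN_ACCESS:y)")
}

q <- oa_only('"Human malaria parasites"')
q
```

The result is used like any other query string, e.g. europepmc::epmc_search(q).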

The full text as XML can be accessed via the PubMed Central ID (PMCID):

europepmc::epmc_ftxt("PMC3257301")
#> {xml_document}
#> <article article-type="research-article" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:mml="http://www.w3.org/1998/Math/MathML">
#> [1] <front>\n  <journal-meta>\n    <journal-id journal-id-type="nlm-ta"> ...
#> [2] <body>\n  <sec id="s1">\n    <title>Introduction</title>\n    <p>Atm ...
#> [3] <back>\n  <ack>\n    <p>We would like to thank Dr. C. Gourlay and Dr ...
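The {xml_document} print-out suggests that epmc_ftxt() returns an xml2 document, so xml2's XPath functions can pull out parts of the article. The sketch below extracts top-level section titles. To keep it runnable offline, it parses a tiny hand-written stand-in rather than the real article. It assumes the JATS layout seen above, with <body> containing <sec> elements that each have a <title>.

```r
# Tiny hand-written stand-in for the JATS XML returned by epmc_ftxt()
doc <- xml2::read_xml(
  "<article>
     <body>
       <sec id='s1'><title>Introduction</title><p>Text</p></sec>
       <sec id='s2'><title>Results</title><p>Text</p></sec>
     </body>
   </article>"
)

# Top-level section titles only; use '//body//sec/title' to include subsections
section_titles <- xml2::xml_text(xml2::xml_find_all(doc, "//body/sec/title"))
section_titles
```

For the real article, replace the stand-in with doc <- europepmc::epmc_ftxt("PMC3257301").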