The rppo package contains just two functions: one to query terms from the Plant Phenology Ontology (PPO) and another to query data from the Global Plant Phenology Data Portal (PPO data portal). The three examples that follow illustrate their use: the first two sections cover the ppo_terms and ppo_data functions individually, and the third shows how to use the two functions together.
It is frequently useful to look through the list of present and absent terms contained in the PPO. The ppo_terms function returns present terms, absent terms, or both, with columns containing a termID, label, definition, and full URI for each term. Use the termIDs returned by this function to query the PPO data portal with the ppo_data function. The following example stores the present terms in a “present_terms” data frame and prints a sample slice of it.
present_terms <- ppo_terms(present = TRUE)
# print the first five rows, with just the termIDs and labels
print(present_terms[1:5,c("termID","label")])
#> termID label
#> 1 obo:PPO_0002359 abscised cones or seeds present
#> 2 obo:PPO_0002358 abscised fruits or seeds present
#> 3 obo:PPO_0002357 abscised leaves present
#> 4 obo:PPO_0002311 breaking leaf buds present
#> 5 obo:PPO_0002346 cones present
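Because ppo_data expects termIDs rather than labels, a small lookup step is often handy. The sketch below is only an illustration: instead of calling the service it uses a hand-built data frame (`terms_example`, a hypothetical name) that copies the five rows of the output shown above, and it searches the labels with base R's grepl.

```r
# Hand-built stand-in for the output of ppo_terms(present = TRUE);
# the rows are copied from the sample output above (assumption: the real
# data frame has the same termID and label columns).
terms_example <- data.frame(
  termID = c("obo:PPO_0002359", "obo:PPO_0002358", "obo:PPO_0002357",
             "obo:PPO_0002311", "obo:PPO_0002346"),
  label = c("abscised cones or seeds present",
            "abscised fruits or seeds present",
            "abscised leaves present",
            "breaking leaf buds present",
            "cones present"),
  stringsAsFactors = FALSE
)

# Keep only the rows whose label mentions "leaf" or "leaves"
leaf_terms <- terms_example[grepl("leaf|leaves", terms_example$label), ]
print(leaf_terms$termID)
```

With a live call, the same filter could be applied to the data frame returned by ppo_terms, and a matching termID then passed as the termID argument of ppo_data.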
The ppo_data function queries the PPO Data Portal, passing values to the database and extracting matching results. It returns a list with five elements: 1) data, a data frame containing the matching records; 2) readme, a string containing usage information and some statistics about the query itself; 3) citation, a string containing information about proper citation; 4) number_possible, an integer giving the total number of matching results, which is useful when a limit has been specified; and 5) a status code returned from the service.
The “df” variable below is populated from the data element of the results list, and the example prints the first record.
results <- ppo_data(genus = "Quercus", fromYear = 2013, toYear = 2013, fromDay = 100, toDay = 110, termID = 'obo:PPO_0002313', limit = 10)
df <- results$data
print(df[1:1,])
#> dayOfYear year genus specificEpithet latitude longitude
#> 1 106 2013 Quercus lobata 34.67545 -120.0407
#> termID
#> 1 obo:PPO_0002316,obo:PPO_0002322,obo:PPO_0002018,obo:PPO_0002318,obo:PPO_0002022,obo:PPO_0002312,obo:PPO_0002313,obo:PPO_0002014,obo:PPO_0002024,obo:PPO_0002015,obo:PPO_0002000,obo:PPO_0002017,obo:PPO_0002020,obo:PPO_0002320,obo:PPO_0002315
#> source eventId
#> 1 USA-NPN http://n2t.net/ark:/21547/Amg22054478
The readme and citation text returned in the results list can be accessed through the readme and citation elements. Note that the file “citation_and_data_use_policies.txt” referred to in the readme can be displayed using cat(results$citation).
cat(results$readme)
#> The following contains information about your download from the Global Plant
#> Phenology Database. Please refer to the citation_and_data_use_policies.txt
#> file for important information about data usage policies, licensing, and
#> citation protocols for each dataset. This file contains summary information
#> about the query that was run.
#>
#> data file = data.csv
#> date query ran = Tue Jun 05 2018 19:44:07 GMT-0400 (EDT)
#> query = +genus:Quercus AND +plantStructurePresenceTypes:"http://purl.obolibrary.org/obo/PPO_0002313" AND +year:>=2013 AND +year:<=2013 AND +dayOfYear:>=100 AND +dayOfYear:<=110 AND source:USA-NPN,NEON
#> fields returned = dayOfYear,year,genus,specificEpithet,latitude,longitude,source,eventId
#> user specified limit = 10
#> total results possible = 518
#> total results returned = 0
The results list also reports the number of possible results in the result set, which is useful if the submitted query had a limit. For example, in the query above the limit is set to 10, but we may want to know how many records would have been returned had no limit been set.
cat(results$number_possible)
#> 518
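One way to use number_possible is to check whether a limit cut the result set short. The following is a minimal sketch on a mocked results list rather than a live query. The element names data and number_possible follow the description above, while `results_example`, `limit_used`, and the placeholder rows are hypothetical and introduced only for illustration.

```r
# Mocked stand-in for the list returned by ppo_data(); only the two
# elements needed here are filled in. The ten rows are placeholders,
# and 518 is the number_possible value from the example above.
results_example <- list(
  data = data.frame(dayOfYear = rep(106L, 10), year = rep(2013L, 10)),
  number_possible = 518L
)
limit_used <- 10L

# Compare the rows actually returned with the rows that were possible
returned <- nrow(results_example$data)
remaining <- results_example$number_possible - returned

if (remaining > 0) {
  message(returned, " of ", results_example$number_possible,
          " possible records returned; ", remaining,
          " were left out by limit = ", limit_used)
}
```

If remaining is greater than zero, rerunning the query with a larger limit (or none) would be the natural next step.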
Here we generate a data frame showing the frequency of “present” and “absent” terms for a particular query. The query is for genus = “Quercus” at a latitude of 47 or greater. For each row in the returned data frame, ppo_data will typically return multiple terms in the termID field, corresponding to phenological stages as defined by the PPO. For our example, we generate a frequency table of the number of times each “present” or “absent” term occurs in the entire returned dataset. Note that the termID field returned by ppo_data contains “presence” terms in addition to “present” and “absent” terms, while the ppo_terms function returns only “present” and “absent” terms. Thus, our frequency distribution counts only “present” and “absent” terms (for an in-depth discussion of the difference between “presence”, “present”, and “absent”, see https://www.frontiersin.org/articles/10.3389/fpls.2018.00517/full). Finally, since termIDs are returned as identifiers rather than easily readable text, this example maps termIDs to labels. The resulting data frame has two columns: 1) the term label, and 2) the number of times that label appeared in the result set.
###############################################################################
# Generate a frequency data frame showing the number of times each termID
# is populated for genus equals "Quercus" above latitude of 47
# Note that all latitude/longitude queries need to be in the format of a
# bounding box
###############################################################################
df <- ppo_data(
  genus = "Quercus",
  bbox = "47,-180,90,180")
#> sending request for data ...
#> https://www.plantphenology.org/api/v2/download/?q=%2Bgenus:Quercus+AND+%2Blatitude:>=47+AND+%2Blatitude:<=90+AND+%2Blongitude:>=-180+AND+%2Blongitude:<=180+AND+source:USA-NPN,NEON&source=latitude,longitude,year,dayOfYear,termID
# return just the termID column
t1 <- df$data[,c('termID')]
# paste each cell into one string
t2 <- paste(t1, collapse = ",")
# split strings at ,
t3 <- strsplit(t2, ",")
# create a frequency table as a data frame
freqFrame <- as.data.frame(table(t3))
# create a new data frame that we want to populate
resultFrame <- data.frame(
  label = character(),
  frequency = integer(),
  stringsAsFactors = FALSE)
###############################################################################
# Replace termIDs with labels in frequency frame
###############################################################################
# fetch "present" and "absent" terms using `ppo_terms`
termList <- ppo_terms(absent = TRUE, present = TRUE);
#> sending request for terms ...
# loop over all "present" and "absent" terms
for (term in seq_len(nrow(termList))) {
  termListTermID <- termList[term, 'termID']
  termListLabel <- termList[term, 'label']
  # loop over all rows that have a frequency generated
  for (row in seq_len(nrow(freqFrame))) {
    freqFrameTermID <- freqFrame[row, 't3']
    freqFrameFrequency <- freqFrame[row, 'Freq']
    # Populate resultFrame with matching "present" or "absent" labels.
    # In this step, we will ignore "presence" terms
    # found in the frequency frame since ppo_terms only returns
    # "present" and "absent" terms.
    if (freqFrameTermID == termListTermID) {
      # use list() rather than c() so the frequency stays an integer
      resultFrame[nrow(resultFrame) + 1, ] <- list(termListLabel, freqFrameFrequency)
    }
  }
}
# print results, showing term labels and a frequency count
print(resultFrame)
#> label frequency
#> 1 abscised cones or seeds absent 365
#> 2 abscised fruits or seeds absent 365
#> 3 abscised leaves absent 365
#> 4 abscised leaves present 4
#> 5 breaking leaf buds absent 159
#> 6 breaking leaf buds present 32
#> 7 expanded immature true leaves absent 159
#> 8 expanding true leaves present 54
#> 9 expanding unfolded true leaves absent 159
#> 10 expanding unfolded true leaves present 22
#> 11 floral structures present 16
#> 12 flowers present 7
#> 13 fruits present 12
#> 14 immature unfolded true leaves absent 159
#> 15 immature unfolded true leaves present 22
#> 16 leaf buds present 32
#> 17 mature true leaves absent 159
#> 18 new above-ground shoot-borne shoot systems present 32
#> 19 new shoot system present 32
#> 20 non-dormant leaf buds present 32
#> 21 non-senesced floral structures absent 175
#> 22 non-senesced floral structures present 9
#> 23 non-senesced flowers present 7
#> 24 non-senescing unfolded true leaves absent 159
#> 25 non-senescing unfolded true leaves present 22
#> 26 open floral structures absent 175
#> 27 open flower heads absent 181
#> 28 open flowers absent 181
#> 29 open flowers present 7
#> 30 pollen-releasing floral structures absent 533
#> 31 pollen-releasing flower heads absent 358
#> 32 pollen-releasing flowers absent 358
#> 33 pollen-releasing flowers present 4
#> 34 reproductive structures present 28
#> 35 ripe fruits absent 362
#> 36 ripening fruits absent 176
#> 37 ripening fruits present 12
#> 38 senescing true leaves absent 342
#> 39 senescing true leaves present 6
#> 40 true leaves present 101
#> 41 unfolded true leaves absent 159
#> 42 unfolded true leaves present 69
#> 43 unfolding true leaves present 32
#> 44 unopened floral structures absent 175
#> 45 unripe fruits absent 176
#> 46 vascular leaves present 101
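The nested loop above can also be written more compactly with base R's table and merge. The following is a sketch under stated assumptions, not the package's own method: it assumes the same termID and label column names used in the examples above, and it runs on two small hand-made inputs (`term_ids_example` and `terms_example`, both hypothetical) so that it does not need to contact the service. The termIDs and labels are copied from the earlier sample output; the way they are combined into cells is made up for illustration.

```r
# Mock termID cells, each a comma-separated string like those in df$data$termID
term_ids_example <- c(
  "obo:PPO_0002357,obo:PPO_0002311,obo:PPO_0002000",
  "obo:PPO_0002311,obo:PPO_0002000"
)

# Mock subset of ppo_terms() output (labels copied from the first example)
terms_example <- data.frame(
  termID = c("obo:PPO_0002357", "obo:PPO_0002311"),
  label = c("abscised leaves present", "breaking leaf buds present"),
  stringsAsFactors = FALSE
)

# Split every cell at commas and count how often each termID occurs
all_ids <- unlist(strsplit(term_ids_example, ",", fixed = TRUE))
freq <- as.data.frame(table(termID = all_ids), stringsAsFactors = FALSE)

# merge() performs an inner join by default, so termIDs without a label in
# terms_example (here obo:PPO_0002000) are dropped, as in the loop version
result_example <- merge(terms_example, freq, by = "termID")
result_example <- result_example[order(result_example$label), c("label", "Freq")]
names(result_example) <- c("label", "frequency")
print(result_example)
```

The join replaces the row-by-row comparison, which tends to matter once the result set grows beyond a few hundred terms, and the frequency column keeps its integer type.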