This task view contains information about using R to obtain and parse data from the web. Base R does not ship with many tools for interacting with the web. Thankfully, a growing number of contributed packages fill that gap. A list of available packages and functions is presented below, grouped by the type of activity.
If you have any comments or suggestions for additions or improvements to this task view, go to GitHub and
submit an issue
, or make some changes and
submit a pull request
. If you can't contribute on GitHub,
send Scott an email
. If you have an issue with one of the packages discussed below, please contact the maintainer of that package.
If you know of a web service, API, data source, or other online resource that is not yet supported by an R package, consider adding it to
the package development to do list on GitHub
.
Tools for Working with the Web from R
Parsing Data from the Web
-
downloading files
:
download.file()
is in base R and is a commonly used way to download a file. However, downloading files over HTTPS is not supported in R's internal method for
download.file(). The
download()
function in the package
downloader
wraps
download.file(), and takes all the same arguments, but works for https across platforms.
-
tabular data as txt, csv, etc.
: You can use
read.table(),
read.csv(), and friends to read a table directly from a URL, or after acquiring the csv file from the web via e.g.,
getURL()
from RCurl.
read.csv()
works with http but not https, i.e.:
read.csv("http://..."), but not
read.csv("https://..."). You can download a file first before reading the file in R, and you can use
downloader
to download over https.
read.table()
and friends also have a
text
parameter, so you can read a table that is held in a string with line breaks.
-
JSON I/O
: JSON is
JavaScript Object Notation
. There are three packages for reading and writing JSON:
rjson,
RJSONIO, and
jsonlite.
jsonlite
includes a different parser from
RJSONIO
called
yajl
. We recommend using
jsonlite. Check out the paper describing jsonlite by Jeroen Ooms
http://arxiv.org/abs/1403.2805
.
-
XML/HTML I/O
: The package
XML
contains functions for parsing XML and HTML, and supports XPath for searching XML (think regex for strings). A helpful function to read data from one or more HTML tables is
readHTMLTable().
XML
also includes
XPATH
parsing ability, see
xpathApply()
and
xpathSApply(). The
XML2R
package is a collection of convenient functions for coercing XML into data frames (development version
on GitHub
). An alternative to
XML
is
selectr
, which parses CSS3 Selectors and translates them to XPath 1.0 expressions.
The
XML
package is often used for parsing XML and HTML, but
selectr
translates CSS selectors to XPath, so you can use CSS selectors instead of XPath. The
selectorgadget browser extension
can be used to identify page elements.
RHTMLForms
reads HTML documents and obtains a description of each of the forms it contains, along with the different elements and hidden fields.
scrapeR
provides additional tools for scraping data from HTML and XML documents.
htmltab
extracts structured information from HTML tables, similar to
XML::readHTMLTable
of the
XML
package, but automatically expands row and column spans in the header and body cells, and users are given more control over the identification of header and body rows which will end up in the R table.
-
rvest
: rvest scrapes HTML from web pages, and is designed to work with
magrittr
to make it easy to express common web scraping tasks.
-
The
tldextract
package extracts top-level domains and subdomains from a host name. It's a port of
a Python library of the same name
.
-
webutils: Utility functions for developing web applications. Parsers for
application/x-www-form-urlencoded
as well as
multipart/form-data.
Source on Github
-
urltools: URL encoding, decoding, parsing, and parameter extraction.
Source on Github
-
The
repmis
package contains a
source_data()
command to load and cache plain-text data from a URL (either http or https). It also includes
source_Dropbox()
for downloading/caching plain-text data from non-public Dropbox folders and
source_XlsxData()
for downloading/caching Excel xlsx sheets.
-
rsdmx
provides tools to read data and metadata documents exchanged through the Statistical Data and Metadata Exchange (SDMX) framework. The package currently focuses on the SDMX XML standard format (SDMX-ML).
project website (Github)
.
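The tabular and JSON parsing steps described in this section can be sketched offline. This is only a minimal illustration under stated assumptions: the sample strings are made up, the variable names are hypothetical, and the jsonlite package is assumed to be installed.

```r
library(jsonlite)

# Hypothetical sample data; in practice these strings would come from a
# download (e.g. via download.file() or RCurl's getURL()).
csv_text  <- "id,name\n1,alpha\n2,beta\n"
json_text <- '[{"id":1,"name":"alpha"},{"id":2,"name":"beta"}]'

# read.csv() can parse a table held in a string through its `text` argument.
tab <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# fromJSON() turns a JSON array of records into a data frame.
records <- fromJSON(json_text)

# toJSON() serialises the data frame back to a JSON string.
json_out <- toJSON(records)

print(tab)
print(records)
```

Both routes end in an ordinary data frame, so downstream code does not need to know which format the data arrived in.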
Curl, HTTP, FTP, HTML, XML, SOAP
-
RCurl: A low level curl wrapper that allows one to compose general HTTP requests and provides convenient functions to fetch URIs, get/post forms, etc. and process the results returned by the Web server. This provides a great deal of control over the HTTP/FTP connection and the form of the request while providing a higher-level interface than is available just using R socket connections. It also provides tools for Web authentication.
-
httr: A light wrapper around
RCurl
that makes many things easier, but still allows you to access the lower level functionality of
RCurl. It has convenient http verbs:
GET(),
POST(),
PUT(),
DELETE(),
PATCH(),
HEAD(),
BROWSE(). These wrapper functions are more convenient to use, though less configurable than their counterparts in
RCurl. The equivalent of httr's
GET()
in
RCurl
is
getForm(). Likewise, the equivalent of
httr
's
POST()
in
RCurl
is
postForm(). HTTP status codes are helpful for debugging HTTP calls. This package makes them easier to work with; for example,
stop_for_status()
checks the HTTP status code of a response object and stops the function if the call was not successful. See also
warn_for_status(). Note that you can pass in additional Curl options to the
config
parameter in http calls.
-
curl: The
curl
package has a function
curl()
that's a drop-in replacement for base
url()
with better performance and support for http 2.0, ssl (https, ftps), gzip, deflate and more. There's also a replacement for
download.file()
called
curl_download().
Source on Github
-
The
XMLRPC
package provides an implementation of XML-RPC, a relatively simple remote procedure call mechanism that uses HTTP and XML. This can be used for communicating between processes on a single machine or for accessing Web services from within R.
-
The
XMLSchema
package provides facilities in R for reading XML schema documents and processing them to create definitions for R classes and functions for converting XML nodes to instances of those classes. It provides the framework for meta-computing with XML schema in R.
-
RTidyHTML
interfaces to the libtidy library for correcting HTML documents that are not well-formed. This library corrects common errors in HTML documents.
-
W3CMarkupValidator
provides an R Interface to W3C Markup Validation Services for validating HTML documents.
-
SSOAP
provides a client-side SOAP (Simple Object Access Protocol) mechanism. It aims to provide a high-level interface to invoke SOAP methods provided by a SOAP server.
-
Rcompression: Interface to zlib and bzip2 libraries for performing in-memory compression and decompression in R. This is useful when receiving or sending contents to remote servers, e.g. Web services, HTTP requests via RCurl. (not on CRAN)
-
The
CGIwithR
package allows one to use R scripts as CGI programs for generating dynamic Web content. HTML forms and other mechanisms to submit dynamic requests can be used to provide input to R scripts via the Web to create content that is determined within that R script. (not on CRAN)
-
httpRequest: HTTP Request protocols. Implements the GET, POST and multipart POST request.
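As a rough sketch of the httr workflow described above (an HTTP verb, a status check, and an extra curl option passed through config), something like the following could be used. The helper name fetch_text and the placeholder URL are assumptions made for illustration; they are not part of httr.

```r
library(httr)

# Hypothetical helper: fetch a URL and return the response body as text.
fetch_text <- function(url, seconds = 10) {
  # Extra curl options go through `config`; timeout() sets a time limit.
  resp <- GET(url, config = timeout(seconds))
  # Raise an R error if the request was not successful.
  stop_for_status(resp)
  # Extract the body as a character string.
  content(resp, as = "text", encoding = "UTF-8")
}

# Usage (needs network access; the URL is a placeholder):
# txt <- fetch_text("http://example.org/data.csv")
```

Swapping stop_for_status() for warn_for_status() would let the function continue with a warning instead of stopping.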
Authentication
-
Using web resources can require authentication, either via API keys, OAuth, a username:password combination, or other means. Additionally, some web resources require that authentication be passed in the header of an HTTP call, which requires a little bit of extra work. API keys and username:password combos can be included in the URL for a call to a web resource (API key: http://api.foo.org/?key=yourkey; user/pass: http://username:password@api.foo.org), or can be specified via commands in
RCurl
or
httr. OAuth is the most complicated authentication process, and can be most easily done using
httr. See the 6 demos within
httr, three for OAuth 1.0 (linkedin, twitter, vimeo) and three for OAuth 2.0 (facebook, GitHub, google).
ROAuth
is a package that provides a separate R interface to OAuth. OAuth is easier to do in
httr, so start there.
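The simpler authentication styles mentioned above (a key in the URL, a username and password, or a value in the request header) might look like this with httr. This is a sketch only: the host and key are the placeholders from the text, the helper names are hypothetical, and the "Bearer" header scheme is an assumption, since services differ in what they expect.

```r
library(httr)

# API key as a query parameter (placeholder host and key from the text).
key_url <- modify_url("http://api.foo.org/", query = list(key = "yourkey"))

# Hypothetical helper: username/password via HTTP basic authentication.
get_with_password <- function(url, user, password) {
  GET(url, authenticate(user, password))
}

# Hypothetical helper: a token sent in the request header instead of the URL.
# The "Bearer" scheme is an assumption; check the service's documentation.
get_with_token <- function(url, token) {
  GET(url, add_headers(Authorization = paste("Bearer", token)))
}

print(key_url)
```

Keeping credentials out of the URL, where the service allows it, avoids leaking them into logs and browser histories.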
Web Frameworks
-
DeployR Open
is a server-based framework for integrating R into other applications via Web Services.
-
The
shiny
package makes it easy to build interactive web applications with R.
-
The
Rook
web server interface contains the specification and convenience software for building and running Rook applications.
-
The
opencpu
framework for embedded statistical computation and reproducible research exposes a web API interfacing R, LaTeX and Pandoc.
This API is used for example to integrate statistical functionality into systems, share and execute scripts or reports on centralized servers, and build R based apps.
-
A package by
Yihui Xie
called
servr
provides a simple HTTP server to serve files under a given directory based on the
httpuv
package.
-
The
httpuv
package, made by Joe Cheng at RStudio, provides low-level socket and protocol support for handling HTTP and WebSocket requests directly within R. A related package, which
httpuv
may replace, is websockets, also made by Joe Cheng.
-
websockets
: A simple HTML5 websocket interface for R, by Joe Cheng. Available in CRAN archives.
-
Plot.ly is a company that allows you to create visualizations on the web using R (and Python). They have an R package in development
here
, as well as access to their services via
a REST API
. (not on CRAN)
-
The
WADL
package provides tools to process Web Application Description Language (WADL) documents and to programmatically generate R functions to interface to the REST methods described in those WADL documents. (not on CRAN)
-
The
RDCOMServer
provides a mechanism to export R objects as (D)COM objects in Windows. It can be used along with the
RDCOMClient
package which provides user-level access from R to other COM servers. (not on CRAN)
-
The
RSelenium
package (development version on GitHub
here
) provides a set of R bindings for the Selenium 2.0 webdriver using the
JsonWireProtocol
. Selenium automates browsers. Using RSelenium you can automate browsers locally or remotely. This can aid in automated application testing, load testing and web scraping. Examples are given interacting with popular projects such as
shiny
and
sauceLabs
.
-
rapporter.net
provides an online environment (SaaS) to host and run
rapport
statistical report templates in the cloud.
-
neocities
wraps the API for the
Neocities
web hosting service. (not on CRAN)
-
The
Tiki
Wiki CMS/Groupware framework has an R plugin (
PluginR
) to run R code from wiki pages, and use data from their own collected web databases (trackers). A demo:
http://r.tiki.org
. More info in a
useR!2013 presentation
.
-
The
MediaWiki
software has an extension (
Extension:R
) to run R code from wiki pages, and use uploaded data. Links to demo pages (in German) can be found at the
category page for R scripts
at MM-Stat. A mailing list is available:
R-sig-mediawiki
.
-
whisker: Implementation of logicless templating based on
Mustache
in R. Mustache syntax is described in
http://mustache.github.io/mustache.5.html
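To give a feel for the logicless templating that whisker provides, here is a small sketch. The template text and the data values are made up for illustration, and the whisker package is assumed to be installed.

```r
library(whisker)

# A Mustache template: {{name}} and {{count}} are placeholders that are
# filled in from the data list; the template itself contains no logic.
template <- "Hello {{name}}, you have {{count}} new messages."

# Hypothetical values to substitute into the template.
data <- list(name = "Ada", count = 3)

# whisker.render() returns the rendered text as a character string.
out <- whisker.render(template, data)
print(out)
```

Because the template carries no code, the same template file could be shared with Mustache implementations in other languages.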
JavaScript
-
ggvis
makes it easy to describe interactive web graphics in R. It fuses the ideas of ggplot2 and
shiny, rendering graphics on the web with Vega.
-
rCharts
(not on CRAN) allows for interactive Javascript charts from R.
-
rVega
(not on CRAN) is an R wrapper for Vega.
-
clickme
(not on CRAN) is an R package to create interactive plots.
-
animint
(not on CRAN) allows an interactive animation to be defined using a list of ggplots with clickSelects and showSelected aesthetics, then exported to CSV/JSON/D3/JavaScript for viewing in a web browser.
-
The
SpiderMonkey
package provides a means of evaluating JavaScript code, creating JavaScript objects and calling JavaScript functions and methods from within R. This can work by embedding the JavaScript engine within an R session or by embedding R in a browser such as Firefox and being able to call R from JavaScript and call back to JavaScript from R.
-
d3Network: Tools for creating D3 JavaScript network, tree, dendrogram, and Sankey graphs from R.
Code sharing
-
gistr: Work with GitHub gists (
gist.github.com
) from R.
gistr
allows you to create new gists, update gists with new files, rename files, delete files, get and delete gists, star and un-star gists, fork gists, open a gist in your default browser, get embed code for a gist, list gist commits, and get rate limit information when authenticated.
Source on Github
Data Sources on the Web Accessible via R
Agriculture
-
cimis
: R package for retrieving data from CIMIS, the California Irrigation Management Information System. Available in CRAN archives only.
-
FAOSTAT: The package hosts a list of functions to download, manipulate, construct and aggregate agricultural statistics provided by the FAOSTAT (Food and Agriculture Organization of the United Nations) database.
Amazon Web Services
-
awsConnect
(not on CRAN): A package that uses the AWS Command Line Interface to control EC2 and S3. Only available for Linux and Mac OS.
-
MTurkR: Access to Amazon Mechanical Turk Requester API via R. Development version on GitHub
here
.
-
RAmazonDBREST
provides an interface to Amazon's Simple DB API.
-
RAmazonS3
package provides the basic infrastructure within R for communicating with the S3 Amazon storage server.
This is a commercial server that allows one to store content and retrieve it from any machine connected to the Internet.
-
s3mpi
(not on CRAN): Another package for interacting with Amazon S3.
-
segue: Another package for managing EC2 instances and S3 storage, which includes a parallel lapply function for the Elastic Map Reduce (EMR) engine called
emrlapply(). Uses Hadoop Streaming on Amazon's EMR in order to get simple parallel computation.
Astronomy
-
RStars: Star-API provides API access to the American Museum of Natural History's Digital Universe Data, including positions, luminosity, color, and other data on over 100,000 stars as well as constellations, exo-planets, clusters and others.
Source on Github
.
E-commerce
Chemistry
-
rpubchem: Interface to the PubChem Collection.
Cloud hosting
-
analogsea
: A general purpose R client for the Digital Ocean v2 API. In addition, the package includes functions to install various R tools including base R, RStudio server, and more. There's an improving interface to interact with docker on your remote droplets via this package.
Data Depots
-
ckanr
: A generic R client to interact with the CKAN data portal software API (
http://ckan.org/
). Allows user to swap out the base URL to use any CKAN instance.
-
dataone
: Read/write access to data and metadata from the
DataONE network
of Member Node data repositories.
-
dvn: Provides access to The Dataverse Network API.
-
factualR: Thin wrapper for the
Factual.com
server API.
-
rfigshare: Programmatic interface for
Figshare.com
.
-
infochimps
: An R wrapper for the infochimps.com API services, from
Drew Conway
. The CRAN version is archived. Development is available on GitHub
here
.
-
jSonarR: Enables users to access MongoDB by running queries and returning their results in R data frames. jSonarR uses data processing and conversion capabilities in the jSonar Analytics Platform and the
JSON Studio Gateway
, to convert JSON to a tabular format.
-
Quandl: A package that interacts directly with the
Quandl
API to offer data in a number of formats usable in R, as well as the ability to upload and search.
-
rdatamarket: Fetches data from DataMarket.com, either as timeseries in zoo form (dmseries) or as long-form data frames (dmlist).
-
rerddap
: A generic R client to interact with any ERDDAP instance, which is a special case of OPeNDAP (
https://en.wikipedia.org/wiki/OPeNDAP
), or
Open-source Project for a Network Data Access Protocol
. Allows user to swap out the base URL to use any ERDDAP instance.
-
RSocrata
: (temporarily archived on CRAN for email bounce) Provided with a Socrata dataset resource URL, or a Socrata SoDA web API query, returns an R data frame. Converts dates to POSIX format. Supports CSV and JSON. Manages throttling by Socrata.
-
yhatr: Lets you deploy, maintain, and invoke models via the
Yhat
REST API.
Data Science Tools
Earth Science
-
BerkeleyEarth
: Data input for Berkeley Earth Surface Temperature. Archived on CRAN.
-
CHCN: A compilation of historical through contemporary climate measurements scraped from the Environment Canada website, including tools for scraping data, creating metadata and formatting temperature files.
-
crn: Provides the core functions required to download and format data from the Climate Reference Network. Both daily and hourly data are downloaded from the ftp, a consolidated file of all stations is created, and station metadata is extracted. In addition, functions for selecting individual variables and creating R-friendly datasets for them are provided.
-
dataRetrieval: Collection of functions to help retrieve USGS data from either web services or user-provided data files.
on GitHub
.
-
decctools: Provides functions for retrieving energy statistics from the United Kingdom Department of Energy and Climate Change and related data sources. The current version focuses on total final energy consumption statistics at the local authority, MSOA, and LSOA geographies. Methods for calculating the generation mix of grid electricity and its associated carbon intensity are also provided.
-
GhcnDaily: A package that downloads and processes Global Historical Climatology Network (GHCN) daily data from the National Climatic Data Center (NCDC).
-
marmap: Import, plot and analyze bathymetric and topographic data from NOAA.
-
Metadata
: Collates metadata for climate surface stations. Archived on CRAN.
-
meteoForecast: meteoForecast is a package that provides access to several Numerical Weather Prediction services, both in raster format and as a time series for a location. Currently it works with
GFS
,
Meteogalicia
,
OpenMeteo
,
NAM
, and
RAP
.
Source on Github
-
okmesonet: Retrieves Oklahoma (USA) Mesonet climatological data provided by the Oklahoma Climatological Survey.
-
raincpc: The Climate Prediction Center's (CPC) daily rainfall data for the entire world, from 1979 to the present, at a resolution of 50 km (0.5 degrees lat-lon). This package provides functionality to download and process the raw data from CPC.
-
rainfreq: Estimates of rainfall at desired frequency and desired duration are often required in the design of dams and other hydraulic structures, catastrophe risk modeling, environmental planning and management. One major source of such estimates for the USA is the NOAA National Weather Service's (NWS) division of Hydrometeorological Design Studies Center (HDSC). Raw data from NWS-HDSC is available at 1-km resolution and comes as a huge number of GIS files.
-
rFDSN: Search for and download seismic time series in miniSEED format (a minimalist version of the Standard for the Exchange of Earthquake Data) from
International Federation of Digital Seismograph Networks
repositories. This package can also be used to gather information about seismic networks (stations, channels, locations, etc) and find historical earthquake data (origins, magnitudes, etc).
-
RNCEP: Obtain, organize, and visualize
NCEP
weather data.
-
rnoaa: R interface to NOAA Climate data API.
-
rNOMADS: An interface to the
NOAA Operational Model Archive and Distribution System (NOMADS)
that allows download of global and regional weather model data, and supports a variety of models ranging from global weather data to an altitude of 40 km, to high resolution regional weather models, to wave and sea ice models. It can also retrieve archived NOMADS models. Source:
rnomads.
-
rnrfa: Utility functions to retrieve data from the UK National River Flow Archive via an API (http://www.ceh.ac.uk/data/nrfa/). There are functions to retrieve stations falling in a bounding box, to generate a map, and to extract time series and general information.
-
soilDB: A collection of functions for reading data from USDA-NCSS soil databases.
-
sos4R: A client for Sensor Observation Services (SOS) as specified by the Open Geospatial Consortium (OGC). It allows users to retrieve metadata from SOS web services and to interactively create requests for near real-time observation data based on the available sensors, phenomena, observations, etc. using thematic, temporal and spatial filtering.
-
waterData: An R Package for retrieval, analysis, and anomaly calculation of daily hydrologic time series data.
-
weatherData: Functions that help in fetching weather data from websites. Given a location and a date range, these functions help fetch weather data (temperature, pressure etc.) for any weather related analysis.
Ecological and Evolutionary Biology
-
ALA4R
(not on CRAN): Programmatic R interface to the
Atlas of Living Australia
.
-
dismo: Species distribution modeling, with wrappers to some APIs.
-
ecoengine: ecoengine (
http://ecoengine.berkeley.edu/
) provides access to more than 2 million georeferenced specimen records from the Berkeley Natural History Museums.
http://bnhm.berkeley.edu/
-
ecoretriever: Provides an R interface to the
EcoData Retriever
via the EcoData Retriever's command line interface. The EcoData Retriever automates the tasks of finding, downloading, and cleaning ecological datasets, and then stores them in a local database (including SQLite, MySQL, etc.).
On GitHub
.
-
flora: Retrieve taxonomical information of botanical names from the Flora do Brasil website.
-
neotoma
(not on CRAN): Programmatic R interface to the Neotoma Paleoecological Database.
-
paleobioDB: Functions to wrap each endpoint of the PaleobioDB API, plus functions to visualize and process the fossil data. The API documentation for the Paleobiology Database can be found at
http://paleobiodb.org/data1.1/
.
-
rbison: Wrapper to the USGS Bison API.
-
Rcolombos: This package provides programmatic access to Colombos, a web based interface for exploring and analyzing comprehensive organism-specific cross-platform expression compendia of bacterial organisms.
-
rebird: A programmatic interface to the eBird database.
-
rdopa
(not on CRAN): Access data from the
Digital Observatory for Protected Areas
(DOPA) REST API.
Source on Github
-
Reol: An R interface to the Encyclopedia of Life (EOL) API. Includes functions for downloading and extracting information off the EOL pages.
-
rfishbase: A programmatic interface to fishbase.org.
-
rfisheries: Package for interacting with fisheries databases at openfisheries.org.
-
rgbif: Interface to the Global Biodiversity Information Facility API methods.
-
rnbn: An R interface to the
UK National Biodiversity Network
. Development version on GitHub
here
.
-
rnpn
(not on CRAN): Wrapper to the National Phenology Network database API.
-
rPlant: An R interface to the many computational resources iPlant offers through their RESTful application programming interface. Currently,
rPlant
functions interact with the iPlant foundational API, the Taxonomic Name Resolution Service API, and the Phylotastic Taxosaurus API. Before using rPlant, users will have to register with the
iPlant Collaborative
-
rvertnet: A wrapper to the VertNet collections database API.
-
rWBclimate: R interface for the World Bank climate data.
-
rYoutheria: A programmatic interface to web-services of Youtheria, an online database of mammalian trait data. Development version on GitHub
here
-
spocc: A programmatic interface to many species occurrence data sources, including GBIF, USGS's BISON, iNaturalist, Berkeley Ecoinformatics Engine, eBird, AntWeb, and more as sources become easily available.
-
taxize: Taxonomic information from around the web.
-
The
tpl
package, created by Gustavo Carvalho, doesn't interact with the web directly, but queries locally stored data from
theplantlist.org
, and data will be updated when theplantlist updates, which is not very often. There is another package for interacting with this same data, called
Taxonstand.
-
treebase: An R package for discovery, access and manipulation of online phylogenies.
Economics and Business
-
blsAPI: Get data from the U.S. Bureau of Labor Statistics API. Users provide parameters as specified in
http://www.bls.gov/developers/api_signature.htm
and the function returns a JSON string.
Source on Github
-
ONETr
searches and retrieves occupational data from
O*NET Online
. Development version on GitHub
here
.
-
pxweb: Generic interface for the PX-Web/PC-Axis API. The PX-Web/PC-Axis API is used by organizations such as Statistics Sweden and Statistics Finland to disseminate data. The R package can interact with all PX-Web/PC-Axis APIs to fetch information about the data hierarchy, extract metadata and extract and parse statistics to R data.frame format.
Source on GitHub
.
-
psidR
contains functions to download and format longitudinal datasets from the Panel Study of Income Dynamics (PSID).
-
WDI: Search, extract and format data from the World Bank's World Development Indicators.
-
The
Zillow
package provides an R interface to the
Zillow
Web Service API. It allows one to get the Zillow estimate for the price of a particular property specified by street address and ZIP code (or city and state), to find information (e.g. size of property and lot, number of bedrooms and bathrooms, year built) about a given property, and to get comparable properties.
Finance
-
Datastream2R
(not on CRAN): Another package for accessing the Datastream service. This package downloads data from the Thomson Reuters DataStream DWE server, which provides XML access to the Datastream database of economic and financial information.
-
fImport: Environment for teaching "Financial Engineering and Computational Finance"
-
IBrokers: Provides native R access to Interactive Brokers Trader Workstation API.
-
pdfetch: A package for downloading economic and financial time series from public sources.
-
quantmod: Functions for financial quantitative modelling as well as data acquisition, plotting and other utilities.
-
Rbitcoin: Interact with Bitcoin. Supports both public and private API calls and HTTP over SSL. Includes debug messages for Rbitcoin and RCurl, and error handling.
-
rbitcoinchartsapi: An R package for the
BitCoinCharts.com
API. From their website: "Bitcoincharts provides financial and technical data related to the Bitcoin network and this data can be accessed via a JSON application programming interface (API)."
-
RCryptsy
wraps the API for the
Cryptsy
crypto-currency trading platform.
Source on GitHub
.
-
RDatastream
(not on CRAN): An R interface to the
Thomson Dataworks Enterprise SOAP API
(paid), with some convenience functions for retrieving Datastream data specifically.
-
RJSDMX: Retrieve data and metadata from SDMX compliant data providers.
On Github
.
-
TFX: Connects to TrueFX(tm) for free streaming real-time and historical tick-by-tick market data for dealable interbank foreign exchange rates with millisecond detail.
-
Thinknum: Interacts with the
Thinknum
API.
-
tseries: Includes the
get.hist.quote
function for retrieving historical financial data.
-
ustyc: US Treasury yield curve data retrieval. Development version on GitHub
here
.
Genes and Genomes
-
cgdsr: R-Based API for accessing the MSKCC Cancer Genomics Data Server (CGDS).
-
rsnps: This package is a programmatic interface to various SNP datasets on the web: openSNP, NCBI's dbSNP database, and Broad Institute SNP Annotation and Proxy Search. This package started as a library to interact with openSNP alone, so most functions deal with openSNP.
-
chromer: A programmatic interface to the
Chromosome Counts Database
.
Source on Github
-
The
mygene.r
package is an R client for accessing
Mygene.info
annotation and query services.
-
primerTree: Visually Assessing the Specificity and Informativeness of Primer Pairs.
-
seq2R
: Detect compositional changes in genomic sequences - with some interaction with GenBank. Archived on CRAN.
-
seqinr: Exploratory data analysis and data visualization for biological sequence (DNA and protein) data.
-
NCBI EUtils web services: See the NCBI section
Google Web Services
-
bigrquery
(not on CRAN): An interface to Google's bigquery from R.
-
GFusionTables
(not on CRAN): An R interface to Google Fusion Tables. Google Fusion Tables is a data management system in the cloud. This package provides R functions to browse the Fusion Tables catalog, retrieve data from Fusion Tables storage into R, and upload data from R to Fusion Tables.
-
googlePublicData
: (archived on CRAN for email bounce) An R library to build Google's public data explorer DSPL metadata files.
-
googleVis: Interface between R and the Google chart tools.
-
plotGoogleMaps: Plot SP or SPT (STIDF, STFDF) data as HTML map mashup over Google Maps.
-
plotKML: Visualization of spatial and spatio-temporal objects in Google Earth.
-
RGA: Provides functions for accessing and retrieving data from the
Google Analytics APIs
. Supports OAuth 2.0 authorization. Also, the
RGA
package provides a shiny app to explore data. There is another R package for the same service (RGoogleAnalytics); see the next entry.
-
RGoogleAnalytics: Provides functions for accessing and retrieving data from the Google Analytics API.
Source on Github
. There is another R package for the same service (RGA); see the previous entry.
-
The
RGoogleDocs
package is an example of using the RCurl and XML packages to quickly develop an interface to the Google Documents API.
-
RGoogleStorage
provides programmatic access to the Google Storage API. This allows R users to access and store data on Google's storage. We can upload and download content, create, list and delete folders/buckets, and set access control permissions on objects and buckets.
-
RGoogleTrends
provides programmatic access to Google Trends data. This is information about the popularity of a particular query.
-
translate: Bindings for the Google Translate API v2
-
translateR
provides bindings for both Google and Microsoft translation APIs.
Government
-
acs: Download, manipulate, and present data from the US Census American Community Survey.
-
BerlinData: Easy access to
http://daten.berlin.de
. It allows you to search through the data catalogue and to download the data directly from within R. Development version on GitHub
here
.
-
dkstat
(not on CRAN): A package to access the
StatBank API
from
Statistics Denmark
.
-
EIAdata: U.S.
Energy Information Administration (EIA)
API client.
-
enigma:
Enigma
holds many public datasets from governments, companies, universities, and organizations. Enigma provides an API for data, metadata, and statistics on each of the datasets. enigma is an R client to interact with the Enigma API, including getting the data and metadata for datasets in Enigma, as well as collecting statistics on datasets. In addition, you can download a gzipped csv file of a dataset if you want the whole dataset. An API key from Enigma is required to use enigma.
Source on Github
.
-
federalregister: Client package for the U.S. Federal Register API. Development version on GitHub
here
.
-
govStatJPN: Functions to get public survey data in Japan.
-
polidata: Access to various political data APIs, including e.g.
Google Civic Information API
or
Sunlight Congress API
for US Congress data, and
POPONG API
for South Korea National Assembly data.
Source on Github
-
pollstR: An R client for the Huffpost Pollster API. Development version on GitHub
here
.
-
pvsR: An R package to interact with the Project Vote Smart API for scientific research.
-
recalls: Access U.S. Federal Government Recall Data. Development version on GitHub
here
.
-
ropensecretsapi: An R package for the OpenSecrets.org web services API.
-
RPublica: ProPublica API Client. Development version on GitHub
here
.
-
rsunlight: R client for the Sunlight Labs APIs. There are functions for Sunlight Labs Congress, Transparency, Open States, Real Time Congress, Capitol Words, and Influence Explorer APIs. Data outputs are R lists. There are also a few convenience functions for visualizing data and writing data to .csv.
Source on GitHub
.
-
rtimes
: (not on CRAN) R client for the New York Times APIs, including the Congress, Article Search, Campaign Finance, and Geographic APIs. The focus is on the APIs that deal with political data, with Article Search and Geographic included for good measure.
Source on GitHub
.
-
sorvi: Various tools for retrieving and working with Finnish open government data. Development version on GitHub
here
.
-
wethepeople
: An R client for interacting with the White House's "We The People" petition API.
Literature, Metadata, Text, and Altmetrics
-
alm: R wrapper to the altmetrics API platform developed by PLoS.
-
aRxiv: An R client for the arXiv API, a repository of electronic preprints for computer science, mathematics, physics, quantitative biology, quantitative finance, and statistics.
Source on Github
.
-
The
Aspell
package provides an interface to the aspell library for checking the spelling of words and documents.
-
boilerpipeR: Generic Extraction of main text content from HTML files; removal of ads, sidebars and headers using the boilerpipe Java library.
-
JSTORr
(Not on CRAN): Simple text mining of journal articles from JSTOR's Data for Research service
-
ngramr: Retrieve and plot word frequencies through time from the Google Ngram Viewer.
-
OAIHarvester: Harvest metadata using the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH).
-
pubmed.mineR: An R package for text mining of
PubMed Abstracts
. Supports fetching text and XML from PubMed.
-
rAltmetric: Query and visualize metrics from Altmetric.com.
-
rbhl: R interface to the Biodiversity Heritage Library (BHL) API.
-
RefManageR: Import and manage BibTeX and BibLaTeX references with RefManageR.
-
rentrez: Talk with NCBI entrez using R.
-
RMendeley
: Implementation of the Mendeley API in R. Temporarily archived on CRAN until the package is updated for the new Mendeley API.
-
rmetadata
(not on CRAN): Get scholarly metadata from around the web.
-
rorcid
(not on CRAN): A programmatic interface to the Orcid.org API.
-
rplos: A programmatic interface to the Web Service methods provided by the Public Library of Science journals for search.
-
rpubmed
(not on CRAN): Tools for extracting and processing Pubmed and Pubmed Central records.
-
scholar
provides functions to extract citation data from Google Scholar. Convenience functions are also provided for comparing multiple scholars and predicting future h-index values.
-
The
Sxslt
package is an R interface to Dan Veillard's libxslt translator. It allows R programmers to use XSLT directly from within R, and also allows XSL code to make use of R functions.
-
tm.plugin.webmining: Extensible text retrieval framework for news feeds in XML (RSS, ATOM) and JSON formats. Currently, the following feeds are implemented: Google Blog Search, Google Finance, Google News, NYTimes Article Search, Reuters News Feed, Yahoo Finance and Yahoo Inplay.
-
WikipediR: WikipediR is a wrapper for the MediaWiki API, aimed particularly at the Wikimedia 'production' wikis, such as Wikipedia.
Source on Github
Machine Learning as a Service
-
bigml: BigML, a machine learning web service.
-
indicoio: R-based client for Machine Learning APIs at
http://indico.io
. Wrappers for Positive/Negative Sentiment Analysis, Political Sentiment Analysis, Image Feature Extraction, Facial Emotion Recognition, Facial Feature Extraction, and Language Detection.
Source on Github
-
MTurkR: Access to Amazon Mechanical Turk Requester API via R.
Maps
-
The
GeoIP
package maps IP addresses and host names to geographic locations - latitude, longitude, region, city, zip code, etc.
-
ggmap: Allows for the easy visualization of spatial data and models on top of Google Maps, OpenStreetMaps, Stamen Maps, or CloudMade Maps using ggplot2.
-
leafletR: Allows you to display your spatial data on interactive web-maps using the open-source JavaScript library Leaflet.
-
osmar: This package provides infrastructure to access OpenStreetMap data from different sources, to work with the data in a common R manner, and to convert the data into structures provided by existing R packages (e.g., into sp and igraph objects).
-
The
R2GoogleMaps
package - which is different from
RgoogleMaps
- provides a mechanism to generate JavaScript code from R that displays data using Google Maps.
-
RgoogleMaps: This package serves two purposes: it provides a comfortable R interface to query the Google server for static maps, and it uses the map as a background image to overlay plots within R.
-
The
RKML
package provides users with high-level facilities to generate KML, the Keyhole Markup Language, for display in, e.g., Google Earth.
-
RKMLDevice
allows you to create R graphics in KML format so that they can be displayed on Google Earth (or Google Maps).
-
rydn
(not on CRAN): R package to interface with the Yahoo Developers network geolocation APIs.
Marketing
-
anametrix: Bidirectional connector to Anametrix API.
Media: Images, Graphics, Videos, Music
-
colourlovers: Extracts colors and multi-color patterns from
COLOURlovers
, for use in creating R graphics color palettes. Development version on GitHub
here
.
-
imguR: A package to share plots using the image hosting service
Imgur.com
. The development version is on GitHub
here
. knitr also has a function
imgur_upload()
to upload images from literate programming documents.
-
meme
(not on CRAN): Provides the ability to create internet memes from template images using several online meme-generation services.
-
RLastFM
: A package to interface to the last.fm API. Archived on CRAN.
-
rscribd
(not on CRAN): API client for publishing documents to
Scribd
.
-
The
RUbigraph
package provides an R interface to a Ubigraph server for drawing interactive, dynamic graphs.
You can add and remove vertices/nodes and edges in a graph and change their attributes/characteristics such as shape, color, size.
NCBI
-
hoardeR: Information retrieval from NCBI databases, with main focus on Blast.
-
rentrez: Talk with NCBI Eutils API using R. This is probably the best package to interact with NCBI EUtils. You can get data across all the databases in NCBI EUtils.
Source on Github
-
reutils: Interface with NCBI databases such as PubMed, Genbank, or GEO via the Entrez Programming Utilities (EUtils).
Source on Github
.
-
RISmed: Download content from NCBI databases. Intended for analyses of NCBI database content, not reference management. See rpubmed for more literature-oriented tools for NCBI.
News
-
GuardianR: Provides an interface to the Open Platform's Content API of the Guardian Media Group. It retrieves content from news outlets The Observer, The Guardian, and guardian.co.uk from 1999 to current day.
-
rtimes
(not on CRAN): R client for the New York Times APIs, including the Congress, Article Search, Campaign Finance, and Geographic APIs.
Other
-
AWS.tools
: An R package to interact with Amazon Web Services (EC2/S3). The CRAN version is archived.
Development version is available on GitHub
-
datamart: Provides an S4 infrastructure for unified handling of internal datasets and web based data sources. Examples include dbpedia, eurostat and sourceforge.
-
discgolf
(not on CRAN): Provides R client to interact with the API for the
Discourse
web forum platform. The API is for an installed instance of Discourse, not for the Discourse site itself.
-
gmailr: Access the Gmail RESTful API from R
-
qualtrics
(not on CRAN): Provides functions to interact with the
Qualtrics
online survey tool.
-
mailR: Interface to Apache Commons Email to send emails from within R.
-
pushoverr: Sending push notifications to mobile devices (iOS and Android) and desktop using
Pushover
.
Source on Github
-
rDrop
(not on CRAN): Dropbox interface.
-
redcapAPI: Access data stored in REDCap databases using an API. REDCap (Research Electronic Data CAPture) is a web application for building and managing online surveys and databases developed at Vanderbilt University.
Source on Github
.
-
RForcecom: RForcecom provides a connection to Force.com and Salesforce.com from R.
-
Rmonkey
(not on CRAN): Provides programmatic access to
Survey Monkey
for creating simple surveys and retrieving survey results.
-
RPushbullet: Provides an easy-to-use interface for the Pushbullet service which provides fast and efficient notifications between computers, phones and tablets. By
Dirk Eddelbuettel
-
slackr: R client for Slack.com messaging platform.
Source on Github
-
sos4R: R client for the OGC Sensor Observation Service.
-
zendeskR: This package provides an R wrapper for the Zendesk API.
Public Health
-
cdcfluview
: (not on CRAN) R client for CDC FluView data (WHO and ILINet).
-
rClinicalCodes: R tools for integrating with the www.clinicalcodes.org web repository, by
David Springate
-
rclinicaltrials: ClinicalTrials.gov is a registry and results database of publicly and privately supported clinical studies of human participants conducted around the world. This is an R client for that data.
Source on Github
Social media
-
plusser
has been designed to facilitate the retrieval of Google+ profiles, pages and posts. It also provides search facilities. Currently a Google+ API key is required for accessing Google+ data.
-
Rfacebook: Provides an interface to the Facebook API.
-
The
Rflickr
package provides an R interface to the Flickr photo management and sharing application Web service. (not on CRAN)
-
Rlinkedin
(not on CRAN): R client for the LinkedIn API. Auth is via OAuth.
-
RTwitterAPI
(not on CRAN): Yet another Twitter R client.
-
SocialMediaMineR
is an analytic tool that returns information about the popularity of a URL on social media sites.
-
streamR: This package provides a series of functions that allow R users to access Twitter's filter, sample, and user streams, and to parse the output into data frames. OAuth authentication is supported.
-
tumblR: R client for the Tumblr API (
https://www.tumblr.com/docs/en/api/v2
). Tumblr is a microblogging platform and social networking website
https://www.tumblr.com
.
Source on Github
-
twitteR: Provides an interface to the Twitter web API.
Sports
-
bbscrapeR
(not on CRAN): Tools for Collecting Data from nba.com and wnba.com
-
fbRanks: Association Football (Soccer) Ranking via Poisson Regression - uses time dependent Poisson regression and a record of goals scored in matches to rank teams via estimated attack and defense strengths.
-
nhlscrapr: Compiling the NHL Real Time Scoring System Database for easy use in R.
-
pitchRx: Tools for Collecting and Visualizing Major League Baseball PITCHfx Data
Web Analytics
-
GTrendsR
(Not on CRAN): R functions to perform and display Google Trends queries. Another Github package (
rGtrends
) is now deprecated, but supported a previous version of Google Trends and may still be useful for developers.
-
rgauges: This package provides functions to interact with the Gaug.es API. Gaug.es is a web analytics service, like Google Analytics. You need a Gaug.es account to use this package.
-
RGA: Provides functions for accessing and retrieving data from the
Google Analytics APIs
. Supports OAuth 2.0 authorization. Also, the
RGA
package provides a shiny app to explore data. There is another R package for the same service (RGoogleAnalytics); see next entry.
-
RGoogleAnalytics: Provides functions for accessing and retrieving data from the Google Analytics API.
Source on Github
. There is another R package for the same service (RGA); see previous entry.
-
RGoogleTrends
provides programmatic access to Google Trends data. This is information about the popularity of a particular query.
-
RSiteCatalyst: Functions for accessing the Adobe Analytics (Omniture SiteCatalyst) Reporting API.