This Task View contains information about how to use R and the World Wide Web together. The base version of R does not ship with many tools for interacting with the web. Thankfully, a growing number of add-on packages fill this gap. This task view focuses on packages for obtaining web-based data and information, frameworks for building web-based R applications, and online services that can be accessed from R. A list of available packages and functions is presented below, grouped by the type of activity. The
rOpenSci Task View: Open Data
provides further discussion of online data sources that can be accessed from R.
If you have any comments or suggestions for additions or improvements for this Task View, go to GitHub and
submit an issue
, or make some changes and
submit a pull request
. If you can’t contribute on GitHub,
send Scott an email
. If you have an issue with one of the packages discussed below, please contact the maintainer of that package. If you know of a web service, API, data source, or other online resource that is not yet supported by an R package, consider adding it to
the package development to do list on GitHub
.
Core Tools For HTTP Requests
There are two packages that should cover most use cases of interacting with the web from R.
httr
provides a user-friendly interface for executing HTTP methods (GET, POST, PUT, HEAD, DELETE, etc.) and provides support for modern web authentication protocols (OAuth 1.0, OAuth 2.0). HTTP status codes are helpful for debugging HTTP calls. httr makes this easier using, for example,
stop_for_status(), which gets the HTTP status code from a response object and stops the function if the call was not successful. (See also
warn_for_status().) Note that you can pass in additional libcurl options to the
config
parameter in HTTP calls.
curl
is a lower-level package that provides a closer interface between R and the
libcurl C library
, but is less user-friendly. It may be useful for operations on web-based XML or to perform FTP operations.
curl::curl()
is an SSL-compatible replacement for base R’s
url()
and has support for HTTP/2, SSL (https, ftps), gzip, deflate and more. For websites serving insecure HTTP (i.e., using the “http” rather than the “https” prefix), most R functions can extract data directly, including
read.table
and
read.csv; this also applies to functions in add-on packages such as
jsonlite::fromJSON()
and
XML::xmlParse(). For more specific situations, the following resources may be useful:
-
RCurl
is another low level client for libcurl. Of the two low-level curl clients, we recommend using
curl.
httpRequest
is another low-level package for HTTP requests that implements the GET, POST and multipart POST verbs, but we do not recommend its use.
-
crul
(
GitHub
) is an R6-based HTTP client that provides asynchronous HTTP requests, a pagination helper, HTTP mocking via
webmockr, and request caching via
vcr. It targets developers of other R packages more so than end users.
-
curlconverter
(not on CRAN) is a useful package for converting curl command-line code, for example from a browser developer’s console, into R code.
-
request
is a high-level package that is useful for developing other API client packages.
httping
provides simplified tools to ping and time HTTP requests, around
httr
calls.
httpcache
provides a mechanism for caching HTTP requests.
-
For dynamically generated webpages (i.e., those requiring user interaction to display results),
RSelenium
can be used to automate those interactions and extract page contents. It provides a set of bindings for the Selenium 2.0 webdriver using the
JsonWireProtocol
. It can also aid in automated application testing, load testing, and web scraping.
seleniumPipes
(
GitHub
) provides a “pipe”-oriented interface to the same.
rdom
(not on CRAN) uses
phantomjs
to access a webpage’s Document Object Model (DOM).
decapitated
provides headless Chrome browser orchestration.
-
For capturing static content of web pages,
postlightmercury
is a client for the web service
Mercury
that turns web pages into structured and clean text.
-
Another, higher-level alternative package useful for webscraping is
rvest, which is designed to work with
magrittr
to make it easy to express common web scraping tasks.
-
Many base R tools can be used to download web content, provided that the website does not use SSL (i.e., the URL does not have the “https” prefix).
download.file()
is a general purpose function that can be used to download a remote file. For SSL, the
download()
function in
downloader
wraps
download.file(), and takes all the same arguments.
-
Tabular data sets (e.g., txt, csv, etc.) can be input using
read.table(),
read.csv(), and friends, again assuming that the files are not hosted via SSL. An alternative is to use
httr::GET
(or
RCurl::getURL) to first read the file into R as a character vector before parsing with
read.table(text=...), or you can download the file to a local directory.
rio
(
GitHub
) provides an
import()
function that can read a number of common data formats directly from an https:// URL. The
repmis
function
source_data()
can load and cache plain-text data from a URL (either http or https). That package also includes
source_Dropbox()
for downloading/caching plain-text data from non-public Dropbox folders and
source_XlsxData()
for downloading/caching Excel xlsx sheets.
-
Authentication
: Using web resources can require authentication, either via API keys, OAuth, a username:password combination, or other means. Additionally, some web resources require that authentication credentials be passed in the header of an HTTP call, which requires a little extra work. API keys and username:password combinations can be embedded in the URL for a call to a web resource (API key: http://api.foo.org/?key=yourkey; user/pass: http://username:password@api.foo.org), or can be specified via commands in
RCurl
or
httr. OAuth is the most complicated authentication process, and can be most easily done using
httr. See the 6 demos within
httr, three for OAuth 1.0 (linkedin, twitter, vimeo) and three for OAuth 2.0 (facebook, GitHub, google).
ROAuth
is a package that provides a separate R interface to OAuth. OAuth is easier to do in
httr, so start there.
googleAuthR
provides an OAuth 2.0 setup specifically for Google web services.
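As a rough illustration of the request, status-check, and authentication steps described in this list, the following sketch downloads a CSV file over HTTPS with httr and parses it with base R. The function name fetch_csv and its arguments are hypothetical, chosen only for this example. It assumes the httr package is installed and that a protected server accepts HTTP basic authentication.

```r
# Minimal sketch: fetch a CSV file over HTTPS and parse it into a data frame.
# `fetch_csv`, `user`, and `password` are illustrative names, not part of httr.
fetch_csv <- function(url, user = NULL, password = NULL) {
  # Send credentials via HTTP basic authentication only when both are supplied
  resp <- if (is.null(user) || is.null(password)) {
    httr::GET(url)
  } else {
    httr::GET(url, httr::authenticate(user, password))
  }

  # Turn HTTP error statuses (4xx, 5xx) into R errors
  httr::stop_for_status(resp)

  # Read the response body as text, then parse it with base R
  txt <- httr::content(resp, as = "text", encoding = "UTF-8")
  utils::read.csv(text = txt, stringsAsFactors = FALSE)
}
```

A call such as fetch_csv("https://example.org/data.csv") would return a data frame if the request succeeds; the URL is a placeholder.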
Handling HTTP Errors/Codes
-
fauxpas
brings a set of Ruby- or Python-like R6 classes for each individual HTTP status code, allowing simple and verbose messages, with a choice of using messages, warnings, or stops.
-
httpcode
is a simple package to help a user/package find HTTP status codes and associated messages by name or number.
Parsing Structured Web Data
The vast majority of web-based data is structured as plain text, HTML, XML, or JSON (JavaScript Object Notation). Web service APIs increasingly rely on JSON, but XML is still prevalent in many applications. There are several packages for working specifically with these formats. These functions can be used to interact directly with insecure webpages or can be used to parse locally stored or in-memory web files.
-
XML
: There are two packages for working with XML:
XML
and
xml2
(
GitHub
). Both support general XML (and HTML) parsing, including XPath queries. The package
xml2
is less fully featured, but more user friendly with respect to memory management, classes (e.g., XML node vs. node set vs. document), and namespaces. Of the two, only
XML
supports
de novo
creation of XML nodes and documents. The
XML2R
(
GitHub
) package is a collection of convenient functions for coercing XML into data frames. An alternative to
XML
is
selectr
, which parses CSS3 selectors and translates them to XPath 1.0 expressions. The
XML
package is often used for parsing XML and HTML, and because selectr translates CSS selectors to XPath, CSS selectors can be used in place of XPath expressions.
-
HTML
: All of the tools that work with XML also work for HTML, though HTML is - in practice - more prone to be malformed. Some tools are designed specifically to work with HTML.
xml2::read_html()
is a good first function to use for importing HTML.
htmltools
provides functions to create HTML elements.
htmltab
(
GitHub
) extracts structured information from HTML tables, similar to
XML::readHTMLTable
of the
XML
package, but automatically expands row and column spans in the header and body cells, and users are given more control over the identification of header and body rows which will end up in the R table. The
selectorgadget browser extension
can be used to identify page elements.
RHTMLForms
reads HTML documents and obtains a description of each of the forms it contains, along with the different elements and hidden fields.
scrapeR
provides additional tools for scraping data from HTML documents.
htmltidy
(
GitHub
) provides tools to “tidy” messy HTML documents.
htm2txt
uses regular expressions to convert HTML documents to plain text by removing all HTML tags.
Rcrawler
does crawling and scraping of web pages.
-
JSON
: There are several packages for reading and writing JSON:
rjson,
RJSONIO, and
jsonlite.
jsonlite
includes a different parser from
RJSONIO
called
yajl
. We recommend using
jsonlite. Check out the paper describing jsonlite by Jeroen Ooms
https://arxiv.org/abs/1403.2805
.
jqr
provides bindings for the fast JSON library,
jq
.
jsonvalidate
(
GitHub
) validates JSON against a schema using the “is-my-json-valid” Javascript library;
validatejsonr
does the same using the RapidJSON C++ library;
ajv
does the same using the ajv Javascript library.
ndjson
(
GitHub
) supports the “ndjson” format.
-
RSS/Atom
:
feedeR
can be used to parse RSS or Atom feeds.
tidyRSS
parses RSS, Atom XML/JSON and geoRSS into a tidy data.frame.
-
swagger
can be used to automatically generate functions for working with a web service API that provides documentation in
Swagger.io
format.
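To make the parsing workflow concrete, here is a small sketch that parses in-memory JSON with jsonlite and in-memory XML with xml2. The sample strings and variable names are invented for illustration, and the example assumes both packages are installed.

```r
library(jsonlite)
library(xml2)

# JSON: an array of objects is simplified to a data frame
json_txt <- '[{"package": "httr", "type": "high-level"},
              {"package": "curl", "type": "low-level"}]'
pkgs <- fromJSON(json_txt)

# XML: select nodes with an XPath query, then read an attribute from each
xml_txt <- '<packages><package name="httr"/><package name="curl"/></packages>'
doc <- read_xml(xml_txt)
pkg_names <- xml_attr(xml_find_all(doc, "//package"), "name")
```

The same functions accept a file path or URL in place of the literal string, so the parsing step looks the same for downloaded content.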
Tools for Working with URLs
-
The
httr::parse_url()
function can be used to extract portions of a URL. The
RCurl::URLencode()
and
utils::URLencode()
functions can be used to encode character strings for use in URLs.
utils::URLdecode()
decodes back to the original strings.
urltools
(
GitHub
) can also handle URL encoding, decoding, parsing, and parameter extraction.
-
The
tldextract
package extracts top level domains and subdomains from a host name. It’s a port of
a Python library of the same name
.
-
iptools
can facilitate working with IPv4 addresses, including for use in geolocation.
-
urlshorteneR
offers URL expansion and analysis for Bit.ly, Goo.gl, and is.gd.
longurl
uses the longurl.org API to provide similar functionality.
-
gdns
provides access to Google’s secure HTTP-based DNS resolution service.
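As a brief sketch of the encoding and decoding functions mentioned above, the following uses only base R (the utils package) to percent-encode a search term, build a query URL, and decode it again. The host example.org and the variable names are placeholders.

```r
# Encode a free-text search term so it is safe to place in a query string;
# reserved = TRUE also encodes reserved characters such as "&"
query <- "R & the web"
encoded <- utils::URLencode(query, reserved = TRUE)

# Build the request URL (example.org is a placeholder host)
request_url <- paste0("https://example.org/search?q=", encoded)

# Decoding recovers the original string
decoded <- utils::URLdecode(encoded)
stopifnot(identical(decoded, query))
```

Without reserved = TRUE, the "&" would be left as-is and a server would read it as a parameter separator.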
Tools for Working with Scraped Webpage Contents
-
Several packages can be used for parsing HTML documents.
boilerpipeR
provides generic extraction of main text content from HTML files; removal of ads, sidebars and headers using the boilerpipe Java library.
RTidyHTML
interfaces to the libtidy library for correcting HTML documents that are not well-formed. This library corrects common errors in HTML documents.
W3CMarkupValidator
provides an R Interface to W3C Markup Validation Services for validating HTML documents.
-
For XML documents, the
XMLSchema
package provides facilities in R for reading XML schema documents and processing them to create definitions for R classes and functions for converting XML nodes to instances of those classes. It provides the framework for meta-computing with XML schema in R.
xslt
is an extension for the
xml2
package to transform XML documents by applying an xslt style-sheet. (It can be seen as a modern replacement for
Sxslt, which is an interface to Dan Veillard’s libxslt translator, and the
SXalan
package.) This may be useful for webscraping, as well as transforming XML markup into another human- or machine-readable format (e.g., HTML, JSON, plain text, etc.).
SSOAP
provides a client-side SOAP (Simple Object Access Protocol) mechanism. It aims to provide a high-level interface to invoke SOAP methods provided by a SOAP server.
XMLRPC
provides an implementation of XML-RPC, a relatively simple remote procedure call mechanism that uses HTTP and XML. This can be used for communicating between processes on a single machine or for accessing Web services from within R.
-
Rcompression
(not on CRAN): Interface to zlib and bzip2 libraries for performing in-memory compression and decompression in R. This is useful when receiving or sending contents to remote servers, e.g. Web services, HTTP requests via RCurl.
-
tm.plugin.webmining: Extensible text retrieval framework for news feeds in XML (RSS, ATOM) and JSON formats. Currently, the following feeds are implemented: Google Blog Search, Google Finance, Google News, NYTimes Article Search, Reuters News Feed, Yahoo Finance and Yahoo Inplay.
-
webshot
uses
PhantomJS
to provide screenshots of web pages without a browser. It can be useful for testing websites (such as Shiny applications).
Security
-
securitytxt
identifies and parses web security policy files.
Other Useful Packages and Functions
-
Javascript
:
V8
(
GitHub
) is an R interface to Google’s open source, high performance JavaScript engine. It can wrap Javascript libraries as well as NPM packages. The
SpiderMonkey
package provides another means of evaluating JavaScript code, creating JavaScript objects and calling JavaScript functions and methods from within R. This can work by embedding the JavaScript engine within an R session or by embedding R in a browser such as Firefox and being able to call R from JavaScript and call back to JavaScript from R. The
js
package wraps
V8
and validates, reformats, optimizes and analyzes JavaScript code.
-
Email
:
mailR
is an interface to Apache Commons Email to send emails from within R.
sendmailR
provides a simple SMTP client.
gmailr
provides access to Google’s gmail.com RESTful API.
-
Mocking
:
webmockr
is a library for stubbing and setting expectations on HTTP requests. It is inspired by Ruby’s
webmock. This package only helps mock HTTP requests, and returns nothing when requests match expectations. webmockr integrates with the HTTP packages
crul
and
httr. See
Testing
for mocking with returned responses.
-
Testing
:
vcr
provides an interface to easily cache HTTP requests in R package test suites (but can be used outside of testing use cases as well). vcr relies on
webmockr
to do the HTTP request mocking. vcr integrates with the HTTP packages
crul
and
httr.
httptest
provides a framework for testing packages that communicate with HTTP APIs, offering tools for mocking APIs, for recording real API responses for use as mocks, and for making assertions about HTTP requests, all without requiring a live connection to the API server at runtime.
-
Miscellaneous
:
webutils
contains various functions for developing web applications, including parsers for
application/x-www-form-urlencoded
as well as
multipart/form-data.
mime
(
GitHub
) guesses the MIME type for a file from its extension.
rsdmx
provides tools to read data and metadata documents exchanged through the Statistical Data and Metadata Exchange (SDMX) framework. The package currently focuses on the SDMX XML standard format (SDMX-ML).
robotstxt
provides functions and classes for parsing robots.txt files and checking access permissions;
spiderbar
does the same.
uaparserjs
(
GitHub
) uses the javascript
“ua-parser” library
to parse User-Agent HTTP headers.
rjsonapi
consumes APIs that follow the
JSON API Specification
.
rapiclient
is a client for consuming APIs that follow the
Open API format
.
restfulr
models a RESTful service as if it were a nested R list.
Web and Server Frameworks
-
DeployR
is part of Microsoft R Server that provides support for integrating R as an application and website backend.
-
The
shiny
package makes it easy to build interactive web applications with R.
-
Other web frameworks include:
fiery
, which is meant to be more flexible but less easy to use than shiny (reqres
and
routr
are utilities used by fiery that provide HTTP request and response classes, and HTTP routing, respectively);
prairie
(not on CRAN) which is a lightweight web framework that uses magrittr-style syntax and is modeled after
expressjs
;
rcloud
provides an iPython notebook-style web-based R interface; and
Rook, which contains the specification and convenience software for building and running Rook applications.
-
The
opencpu
framework for embedded statistical computation and reproducible research exposes a web API interfacing R, LaTeX and Pandoc. This API is used for example to integrate statistical functionality into systems, share and execute scripts or reports on centralized servers, and build R based apps.
-
Several general purpose server/client frameworks for R exist.
Rserve
and
RSclient
provide server and client functionality for TCP/IP or local socket interfaces.
httpuv
provides low-level socket and protocol support for handling HTTP and WebSocket requests directly within R. Another related package, which
httpuv
perhaps replaces, is
websockets
.
servr
provides a simple HTTP server to serve files under a given directory based on httpuv.
-
Several packages offer functionality for turning R code into a web API.
jug
is a simple API-builder web framework, built around
httpuv.
FastRWeb
provides some basic infrastructure for this.
plumber
allows you to create a REST API by decorating existing R source code.
-
The
WADL
package provides tools to process Web Application Description Language (WADL) documents and to programmatically generate R functions to interface to the REST methods described in those WADL documents. (not on CRAN)
-
The
RDCOMServer
provides a mechanism to export R objects as (D)COM objects in Windows. It can be used along with the
RDCOMClient
package which provides user-level access from R to other COM servers. (not on CRAN)
-
rapporter.net
provides an online environment (SaaS) to host and run
rapport
statistical report templates in the cloud.
-
radiant
(
GitHub
) is a Shiny-based GUI for R that runs in a browser from a server or local machine.
-
neocities
wraps the API for the
Neocities
web hosting service. (not on CRAN)
-
The
Tiki
Wiki CMS/Groupware framework has an R plugin (
PluginR
) to run R code from wiki pages, and use data from their own collected web databases (trackers). A demo:
https://r.tiki.org/tiki-index.php
.
-
The
MediaWiki
has an extension (
Extension:R
) to run R code from wiki pages, and use uploaded data. A mailing list used to be available: R-sig-mediawiki.
-
whisker: Implementation of logicless templating based on
Mustache
in R. Mustache syntax is described in
http://mustache.github.io/mustache.5.html
-
CGIwithR
(not on CRAN) allows one to use R scripts as CGI programs for generating dynamic Web content. HTML forms and other mechanisms to submit dynamic requests can be used to provide input to R scripts via the Web to create content that is determined within that R script.
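For readers new to these frameworks, a minimal shiny application might look like the sketch below. It assumes the shiny package is installed; the input and output IDs ("n", "hist") and the plot itself are arbitrary choices made for illustration.

```r
library(shiny)

# User interface: one slider input and one plot output
ui <- fluidPage(
  sliderInput("n", "Sample size", min = 10, max = 500, value = 100),
  plotOutput("hist")
)

# Server logic: redraw the histogram whenever the slider value changes
server <- function(input, output) {
  output$hist <- renderPlot({
    hist(rnorm(input$n), main = "Random normal draws")
  })
}

# Combine both parts into an app object; nothing is served until it is run
app <- shinyApp(ui = ui, server = server)
```

In an interactive session, runApp(app) starts a local web server and opens the application in a browser.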
Web Services
Cloud Computing and Storage
-
Amazon Web Services is a popular, proprietary cloud service offering a suite of computing, storage, and infrastructure tools.
aws.signature
provides functionality for generating AWS API request signatures.
-
Simple Storage Service (S3)
is a commercial service that allows one to store content and retrieve it from any machine connected to the Internet.
aws.s3
is probably the best R client for S3.
RAmazonS3
and
s3mpi
(not on CRAN) provide basic infrastructure for communicating with S3.
awsConnect
(not on CRAN) is another package using the AWS Command Line Interface to control EC2 and S3, which is only available for Linux and Mac OS.
-
Elastic Cloud Compute (EC2)
is a cloud computing service. AWS.tools and
awsConnect
(not on CRAN) both use the AWS command line interface to control EC2.
segue
(not on CRAN) is another package for managing EC2 instances and S3 storage, which includes a parallel version of
lapply()
for the Elastic Map Reduce (EMR) engine called
emrlapply(). It uses Hadoop Streaming on Amazon’s EMR in order to get simple parallel computation.
-
Simple Notification Service (SNS)
is a service for Pub/Sub messaging and mobile notifications for microservices, distributed systems, and serverless applications; the corresponding R client is
aws.sns
.
-
AWS Polly
is a Text-to-Speech (TTS) cloud service that converts text into lifelike speech; the corresponding R client is
aws.polly
.
-
DBREST
:
RAmazonDBREST
provides an interface to Amazon’s Simple DB API.
-
The cloudyr project
, which is currently under active development on GitHub, aims to provide a unified interface to the full Amazon Web Services suite without the need for external system dependencies.
-
googleComputeEngineR
interacts with the Google Compute Engine API, and lets you create, start and stop instances in the Google Cloud.
-
Cloud Storage
:
googleCloudStorageR
interfaces with Google Cloud Storage.
boxr
(
GitHub
) is a lightweight, high-level interface for the
box.com API
.
rdrop2
is a Dropbox interface that provides access to a full suite of file operations, including dir/copy/move/delete operations, account information (including quotas) and the ability to upload and download files from any Dropbox account.
backblazer
(
GitHub
) provides access to the Backblaze B2 storage API.
-
Docker
:
analogsea
is a general purpose client for the Digital Ocean v2 API. In addition, the package includes functions to install various R tools including base R, RStudio server, and more. There’s an improving interface to interact with docker on your remote droplets via this package.
-
crunch
(
GitHub
) provides an interface to the
crunch.io
storage and analytics platform.
crplyr
(
GitHub
) implements
dplyr
methods on top of Crunch, and
crunchy
(
GitHub
) facilitates making Shiny apps on Crunch.
-
rrefine
provides a client for the
OpenRefine
(formerly Google Refine) data cleaning service.
Document and Code Sharing
-
Code Sharing
:
gistr
(
GitHub
) works with GitHub gists (
gist.github.com
) from R, allowing you to create new gists, update gists with new files, rename files, delete files, get and delete gists, star and un-star gists, fork gists, open a gist in your default browser, get embed code for a gist, list gist commits, and get rate limit information when authenticated.
git2r
provides bindings to the git version control system and two packages provide access to the GitHub API:
gh
and
rgithub
(not on CRAN), all of which can facilitate code or data sharing via GitHub.
gitlabr
is a
GitLab
-specific client.
pastebin
is a client for
https://pastebin.com/
, a code sharing site.
-
Data archiving
:
dataverse
(
GitHub
) provides access to Dataverse 4 APIs.
rfigshare
(
GitHub
) connects with
Figshare.com
.
dataone
provides read/write access to data and metadata from the
DataONE network
of Member Node data repositories.
dataone
(
GitHub
) provides a client for
DataONE
repositories.
-
Google Drive/Google Documents
:
driver
(not on CRAN) is a thin client for the Google Drive API. The
RGoogleDocs
package is an example of using the RCurl and XML packages to quickly develop an interface to the Google Documents API.
RGoogleStorage
provides programmatic access to the Google Storage API. This allows R users to access and store data on Google’s storage. We can upload and download content, create, list and delete folders/buckets, and set access control permissions on objects and buckets.
-
Google Sheets
:
googlesheets
(
GitHub
) can access private or public Google Sheets by title, key, or URL. Extract data or edit data. Create, delete, rename, copy, upload, or download spreadsheets and worksheets.
gsheet
(
GitHub
) can download Google Sheets using just the sharing link. Spreadsheets can be downloaded as a data frame, or as plain text to parse manually.
-
imguR
(
GitHub
) is a package to share plots using the image hosting service
Imgur.com
. knitr also has a function
imgur_upload()
to upload images from literate programming documents.
-
rscribd
(not on CRAN): API client for publishing documents to
Scribd
.
Data Analysis and Processing Services
-
Crowdsourcing
: Amazon Mechanical Turk is a paid crowdsourcing platform that can be used to semi-automate tasks that are not easily automated.
MTurkR
(
GitHub
) provides access to the Amazon Mechanical Turk Requester API.
microworkers
(not on CRAN) can distribute tasks and retrieve results for the Microworkers.com platform.
-
Geospatial/Geolocation/Geocoding
: Several packages connect to geolocation/geocoding services.
rgeolocate
(
GitHub
) offers several online and offline tools.
rydn
(not on CRAN) is an interface to the Yahoo Developers network geolocation APIs, and
ipapi
can be used to geolocate IPv4/6 addresses and/or domain names using the
http://ip-api.com/
API.
threewords
connects to the
What3Words API
, which represents every 3-meter by 3-meter square on earth as a three-word phrase.
opencage
(
GitHub
) provides access to the
OpenCage
geocoding service.
geoparser
(
GitHub
) interfaces with the
Geoparser.io
web service to identify place names from plain text.
nominatim
(not on CRAN) connects to the
OpenStreetMap Nominatim API
for reverse geocoding.
PostcodesioR
(not on CRAN) provides post code lookup and geocoding for the United Kingdom.
geosapi
is an R client for the
GeoServer
REST API, an open source implementation used widely for serving spatial data.
geonapi
provides an interface to the
GeoNetwork
legacy API, an open source catalogue for managing geographic metadata.
ows4R
is a new R client for the
OGC
standard Web-Services, such as Web Feature Service (WFS) for data and Catalogue Service (CSW) for metadata.
-
Image Processing
:
RoogleVision
(not on CRAN) links to the Google Cloud Vision image recognition service.
-
Machine Learning as a Service
: Several packages provide access to cloud-based machine learning services.
AzureML
links with the Microsoft Azure machine learning service.
bigml
(
GitHub
) connects to BigML.
OpenML
(
GitHub
) is the official client for
the OpenML API
.
ddeploy
wraps the
Duke Analytics model deployment API
.
clarifai
(
GitHub
) is a
Clarifai.com
client that enables automated image description.
rLTP
(
GitHub
) accesses the
ltp-cloud service
.
languagelayeR
is a client for Languagelayer, a language detection API.
googlepredictionapi
(not on CRAN) is an R client for the
Google Prediction API
, a suite of cloud machine learning tools.
yhatr
lets you deploy, maintain, and invoke models via the Yhat REST API.
datarobot
works with Data Robot’s predictive modeling platform.
mscsweblm4r
(
GitHub
) interfaces with the Microsoft Cognitive Services Web Language Model API and
mscstexta4r
(
GitHub
) uses the Microsoft Cognitive Services Text Analytics REST API.
rosetteApi
links to the
Rosette
text analysis API.
birdnik
(
GitHub
) provides an interface to
Wordnik
, an online dictionary.
googleLanguageR
provides interfaces to Google’s Cloud Translation API, Natural Language API, Cloud Speech API, and the Cloud Text-to-Speech API.
-
Machine Translation
:
translate
provides bindings for the Google Translate API v2 and
translateR
provides bindings for both Google and Microsoft translation APIs.
mstranslator
provides an R wrapper for the Microsoft Translator API.
RYandexTranslate
(
GitHub
) connects to
Yandex Translate
.
transcribeR
provides automated audio transcription via the HP IDOL service.
-
Document Processing
:
abbyyR
(
GitHub
)
and
captr
(
GitHub
) connect to optical character recognition (OCR) APIs.
pdftables
(
GitHub
) uses
the PDFTables.com webservice
to extract tables from PDFs.
-
Mapping
:
osmar
provides infrastructure to access OpenStreetMap data from different sources to work with the data in common R manner and to convert data into available infrastructure provided by existing R packages (e.g., into sp and igraph objects).
osrm
(
GitHub
) provides shortest paths and travel times from OpenStreetMap.
osmplotr
(
GitHub
) extracts customizable map images from OpenStreetMap.
RgoogleMaps
serves two purposes: it provides a comfortable R interface to query the Google server for static maps, and it uses the map as a background image to overlay plots within R.
R2GoogleMaps
provides a mechanism to generate JavaScript code from R that displays data using Google Maps.
placement
(
GitHub
) provides drive time and geolocation services from Google Maps.
RKMLDevice
allows users to create R graphics in Keyhole Markup Language (KML) format in a manner that allows them to be displayed on Google Earth (or Google Maps), and
RKML
provides users with high-level facilities to generate KML.
plotKML
can visualize spatial and spatio-temporal objects in Google Earth.
plotGoogleMaps
plots SP or SPT (STIDF, STFDF) data as an HTML map mashup over Google Maps.
ggmap
allows for the easy visualization of spatial data and models on top of Google Maps, OpenStreetMaps, Stamen Maps, or CloudMade Maps using ggplot2.
mapsapi
is an sf-compatible interface to Google Maps API.
leafletR: Allows you to display your spatial data on interactive web-maps using the open-source JavaScript library Leaflet.
cartodb-r
(not on CRAN) provides an API interface to
Cartodb.com
.
openadds
(
GitHub
) is an
Openaddresses
client, and
banR
provides access to the “Base Adresses Nationale” (BAN) API for French addresses.
-
Online Surveys
: qualtRics (not on CRAN) and
qualtrics
(not on CRAN) provide functions to interact with
Qualtrics
.
WufooR
(
GitHub
) can retrieve data from
Wufoo.com
forms.
redcapAPI
(
GitHub
) can provide access to data stored in a REDCap (Research Electronic Data CAPture) database, which is a web application for building and managing online surveys and databases developed at Vanderbilt University.
formr
facilitates use of the
formr
survey framework, which is built on openCPU.
Rexperigen
is a client for the
Experigen experimental platform
.
-
Visualization
: Plot.ly is a company that allows you to create visualizations on the web using R (and Python), which is accessible via
plotly.
googleVis
provides an interface between R and the Google chart tools. The
RUbigraph
package provides an R interface to a Ubigraph server for drawing interactive, dynamic graphs. You can add and remove vertices/nodes and edges in a graph and change their attributes/characteristics such as shape, color, size. Interactive, Javascript-enabled graphics are an increasingly useful output format for data analysis.
ggvis
makes it easy to describe interactive web graphics in R. It fuses the ideas of ggplot2 and
shiny, rendering graphics on the web with Vega.
d3Network
provides tools for creating D3 JavaScript network, tree, dendrogram, and Sankey graphs from R.
clickme
(not on CRAN) allows for interactive Javascript charts from R.
animint
(not on CRAN) allows an interactive animation to be defined using a list of ggplots with clickSelects and showSelected aesthetics, then exported to CSV/JSON/D3/JavaScript for viewing in a web browser.
rVega
(not on CRAN) is an R wrapper for Vega.
Social Media Clients
-
plusser
has been designed to facilitate the retrieval of Google+ profiles, pages and posts. It also provides search facilities. Currently a Google+ API key is required for accessing Google+ data.
-
Rfacebook
and
facebook.S4
provide interfaces to the Facebook API.
-
The
Rflickr
package provides an interface to the Flickr photo management and sharing application Web service. (not on CRAN)
-
instaR
(
GitHub
) is a client for the
Instagram API
.
-
Rlinkedin
(not on CRAN) is a client for the LinkedIn API. Auth is via OAuth.
-
rpinterest
connects to the
Pinterest
API.
-
tumblR
(
GitHub
) is a client for the Tumblr API (
https://www.tumblr.com/docs/en/api/v2
). Tumblr is a microblogging platform and social networking website
https://www.tumblr.com/
.
-
vkR
is a client for VK, a social networking site based in Russia.
-
meetupr
is a client for the Meetup.com API.
-
Twitter
:
twitteR
(
GitHub
) provides an interface to the Twitter web API. It claims to be deprecated in favor of
rtweet
(
GitHub
).
RTwitterAPI
(not on CRAN) is another Twitter client.
twitterreport
(not on CRAN) focuses on report generation based on Twitter data.
streamR
provides a series of functions that allow users to access Twitter’s filter, sample, and user streams, and to parse the output into data frames. OAuth authentication is supported.
tweet2r
is an alternative implementation geared toward SQLite and postGIS databases.
graphTweets
produces a network graph from a data.frame of tweets.
tweetscores
(not on CRAN) implements a political ideology scaling measure for specified Twitter users.
-
brandwatchR
is a package to retrieve data from the Brandwatch social listening API. Both raw text and aggregate statistics are available, as well as project and query management functions.
Web Analytics Services
-
Google Trends
:
gtrendsR
offers functions to perform and display Google Trends queries.
RGoogleTrends
provides an alternative.
-
Google Analytics
:
googleAnalyticsR,
ganalytics,
GAR, and
RGA
provide functions for accessing and retrieving data from the
Google Analytics APIs
. The latter supports OAuth 2.0 authorization.
RGA
provides a shiny app to explore data.
searchConsoleR
links to the
Google Search Console
(formerly Webmaster Tools).
-
Online Advertising
:
fbRads
can manage Facebook ads via the Facebook Marketing API.
RDoubleClick
(not on CRAN) can retrieve data from Google’s DoubleClick Campaign Manager Reporting API.
RSmartlyIO
(
GitHub
) loads Facebook and Instagram advertising data provided by
Smartly.io
.
-
Other services
:
RSiteCatalyst
has functions for accessing the Adobe Analytics (Omniture SiteCatalyst) Reporting API.
-
RAdwords
(
GitHub
) is a package for loading Google Adwords data.
-
webreadr
(
GitHub
) can process various common forms of request log, including the Common and Combined Web Log formats and AWS logs.
Web Services for R Package Development
-
R-Hub
http://log.r-hub.io/
is a project to enable package builds across all architectures.
rhub
is a package that interfaces with R-Hub to allow you to check a package on the platform.
Other Web Services
-
Fitness Apps
:
fitbitScraper
(
GitHub
) retrieves Fitbit data.
RGoogleFit
provides similar functionality for
Google Fit
.
-
Push Notifications
:
RPushbullet
provides an easy-to-use interface for the Pushbullet service which provides fast and efficient notifications between computers, phones and tablets.
pushoverr
(
GitHub
) can send push notifications to mobile devices (iOS and Android) and desktop using
Pushover
.
notifyme
(
GitHub
) can control Philips Hue lighting.
-
Reference/bibliography/citation management
:
RefManageR
imports and manages BibTeX and BibLaTeX references with RefManager.
rorcid
(
GitHub
) is a programmatic interface to the
Orcid.org
API, which can be used for identifying scientific authors and their publications (e.g., by DOI).
rdatacite
connects to
DataCite
, which manages DOIs and metadata for scholarly datasets.
scholar
provides functions to extract citation data from Google Scholar. Convenience functions are also provided for comparing multiple scholars and predicting future h-index values.
mathpix
converts an image of a formula (typeset or handwritten) to LaTeX code via the Mathpix web service.
-
Literature
:
rplos
is a programmatic interface to the Web Service methods provided by the Public Library of Science journals for search.
rpubmed
(not on CRAN) provides tools for extracting and processing Pubmed and Pubmed Central records, and
europepmc
connects to the Europe PubMed Central service.
pubmed.mineR
is a package for text mining of
PubMed Abstracts
that supports fetching text and XML from PubMed.
jstor
provides functions and helpers to import metadata, n-grams and full texts from the Data for Research service by JSTOR;
JSTORr
does a similar thing.
aRxiv
is a client for the API of arXiv, a repository of electronic preprints for computer science, mathematics, physics, quantitative biology, quantitative finance, and statistics.
roadoi
provides an interface to the
Unpaywall API
for finding free full-text versions of academic papers.
rcoreoa
is an interface to the
CORE API
, a search interface for open access scholarly articles.
rcrossref
is an interface to Crossref’s API;
crminer
extracts full text from scholarly articles retrieved via Crossref’s Text and Data Mining service; and
fulltext
is a general purpose package to search for, retrieve and extract full text from scholarly articles.
-
Automated Metadata Harvesting
:
oai
and
OAIHarvester
harvest metadata using the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) standard.
rresync
is a client for the
ResourceSync framework
, a sort of replacement for OAI-PMH.
-
Wikipedia
:
WikipediR
(
GitHub
) is a wrapper for the MediaWiki API, aimed particularly at the Wikimedia ‘production’ wikis, such as Wikipedia.
rwikidata
and
WikidataR
(
GitHub
) can request data from
Wikidata.org
, the free knowledgebase.
wikipediatrend
(
GitHub
) provides access to Wikipedia page access statistics.
WikiSocio
can retrieve contributor lists and revision data.
WikidataQueryServiceR
is a client for the
Wikidata Query Service
.
-
bigrquery
(
GitHub
): An interface to Google’s BigQuery.
-
sparkbq
(
GitHub
): Google BigQuery support for sparklyr.
-
colourlovers
(
GitHub
) extracts colors and multi-color patterns from
COLOURlovers
, for use in creating R graphics color palettes.
-
cymruservices
queries
Team Cymru
web security services.
-
datamart: Provides an S4 infrastructure for unified handling of internal datasets and web based data sources. Examples include dbpedia, eurostat and sourceforge.
-
discgolf
(
GitHub
) provides a client to interact with the API for the
Discourse
web forum platform. The API is for an installed instance of Discourse, not for the Discourse site itself.
-
rdpla
(GitHub: https://github.com/ropensci/rdpla) works with the
Digital Public Library of America
API.
-
factualR: Thin wrapper for the
Factual.com
server API.
-
HIBPwned
is a client for
Have I Been Pwned
.
-
internetarchive: API client for internet archive metadata.
-
irced
(not on CRAN) is an IRC chat client.
-
jSonarR: Enables users to access MongoDB by running queries and returning their results in data.frames. jSonarR uses data processing and conversion capabilities in the jSonar Analytics Platform and the
JSON Studio Gateway
, to convert JSON to a tabular format.
-
LendingClub
connects with the
LendingClub API
.
-
livechatR
is a client for the
LiveChat API
.
-
lucr
performs currency conversions using
Open Exchange Rates
.
-
mockaRoo
(not on CRAN) uses the
MockaRoo API
to generate mock or fake data based on an input schema.
-
pivotaltrackR
provides an interface to the API for
Pivotal Tracker
, an agile project management tool.
-
randNames
(
GitHub
) generates random names and personal identifying information using the
https://randomapi.com/
API.
-
Rblpapi
(
GitHub
) is a client for Bloomberg Finance L.P.
ROpenFIGI
(
GitHub
) provides an interface to Bloomberg’s
OpenFIGI
API.
-
rerddap: A generic R client to interact with any ERDDAP instance, which is a special case of OPeNDAP (
https://en.wikipedia.org/wiki/OPeNDAP
), or
Open-source Project for a Network Data Access Protocol
. Allows the user to swap out the base URL to use any ERDDAP instance.
-
ripplerestr
provides an interface to the
Ripple
protocol for making financial transactions.
-
refimpact
connects to
the UK Research Excellence Framework 2014 Impact Case Studies Database API
.
-
restimizeapi
provides an interface to trading website
estimize.com
.
-
RForcecom: Provides a connection to Force.com and Salesforce.com.
-
Rgoodreads
(not on CRAN) interacts with
Goodreads
.
-
Two packages,
owmr
and
ROpenWeatherMap, work with the
Open Weather Map API
.
-
RSauceLabs
(
GitHub
) connects to
SauceLabs
.
-
RSocrata
accesses data from Socrata open data portals.
soql
is a pipe-oriented set of tools for constructing Socrata queries.
-
RStripe
provides an interface to
Stripe
, an online payment processor.
-
RZabbix
links with the
Zabbix network monitoring service API
.
-
rwars
retrieves and reformats data from the
Star Wars API (SWAPI)
.
-
slackr
(
GitHub
) is a client for the Slack.com messaging platform.
-
SlideShaRe
(not on CRAN) is a client for Slideshare.
-
stackr
(not on CRAN): An unofficial wrapper for the read-only features of the
Stack Exchange API
.
-
telegram
(
GitHub
) connects with the Telegram Bot API.
-
trelloR
(
GitHub
) connects to the
Trello API
.
-
tuber
is a YouTube API client and
tubern
is a client for the YouTube Analytics and Reporting API.
-
ubeR
is an interface to the Uber API.
-
udapi
connects to Urban Dictionary.
-
useRsnap
(not on CRAN) provides an interface to the API for
Usersnap
, a tool for collecting feedback from web application users.
-
yummlyr
(
GitHub
) provides an interface to the
Yummly
recipe database.
-
zendeskR: This package provides a wrapper for the Zendesk API.
-
ZillowR
is a client for the Zillow real estate service.
-
docuSignr
provides an interface to the DocuSign
REST API
.
-
giphyr
is an R interface to the
Giphy API
for GIFs.
-
duckduckr
is an R interface to
DuckDuckGo’s Instant Answer API