Task View on Web Technologies #rstats

Task Views on R offer a good way to navigate the 5000-plus packages. They are here: http://cran.r-project.org/web/views/

ROpenSci has a CRAN Task View, except it is on GitHub, and it is about using web services from within R. I think it is more like an R API View!

UPDATE: It has been updated and is now on CRAN (link below).

I wish CRAN Views allowed MORE Markdown, especially in the views, or on the whole site …

😉

http://cran.r-project.org/web/views/WebTechnologies.html

You can see a lot of "not on CRAN" packages, huh, ed?

CRAN Task View: Web Technologies and Services

Maintainer: Scott Chamberlain, Karthik Ram, Christopher Gandrud, Patrick Mair
Contact: scott at ropensci.org
Version: 2013-10-02
This task view contains information about using R to obtain and parse data from the web. The base version of R does not ship with many tools for interacting with the web. Thankfully, there are an increasingly large number of tools for interacting with the web. If you have any comments or suggestions for additions or improvements please contact the maintainer of the task view. A list of available packages and functions is presented below, grouped by the type of activity.

Tools for Working with the Web from R

Parsing Data from the Web

  • The repmis package contains a source_data() command to load plain-text data from a URL (either http or https).
  • The package XML contains functions for parsing XML and HTML, and supports XPath for searching XML (think regex for strings). A helpful function to read data from one or more HTML tables is readHTMLTable().
  • scrapeR provides additional tools for scraping data from HTML and XML documents.
  • The XML2R package (to be on CRAN soon) is a collection of convenient functions for coercing XML into data frames.
  • The rjson package converts R objects into JavaScript Object Notation (JSON) objects and vice versa.
  • An alternative to rjson is RJSONIO, which also converts to and from data in JSON format (it is fast for parsing).
  • An alternative to the XML package is selectr, which parses CSS3 Selectors and translates them to XPath 1.0 expressions.
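To make the parsing tools above a little more concrete, here is a minimal sketch using XML and rjson. It assumes both packages are installed. The HTML snippet and the variable names (html, pkgs, pkg_names, json, back) are made up for illustration and do not come from the task view.

```r
# Sketch only: assumes the XML and rjson packages are installed.
library(XML)
library(rjson)

# A tiny, made-up HTML page so the example needs no network access.
html <- "<html><body><table>
  <tr><th>pkg</th><th>on_cran</th></tr>
  <tr><td>httr</td><td>yes</td></tr>
  <tr><td>ggvis</td><td>no</td></tr>
</table></body></html>"

doc <- htmlParse(html, asText = TRUE)

# readHTMLTable() returns a list with one data frame per <table>.
tables <- readHTMLTable(doc, stringsAsFactors = FALSE)
pkgs <- tables[[1]]

# XPath query: the text of the first <td> cell in every row that has one.
pkg_names <- xpathSApply(doc, "//tr/td[1]", xmlValue)

# Round-trip a small R list through JSON with rjson.
json <- toJSON(list(name = "httr", verbs = c("GET", "POST")))
back <- fromJSON(json)
```

For a real page you would pass a URL or a downloaded file to htmlParse() instead of an inline string.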

Curl/HTTP/FTP and Authentication:

  • RCurl: A low-level curl wrapper that allows one to compose general HTTP requests and provides convenient functions to fetch URIs, get/post forms, etc. and process the results returned by the Web server. This provides a great deal of control over the HTTP/FTP connection and the form of the request while providing a higher-level interface than is available just using R socket connections. It also provides tools for Web authentication.
  • httr: A light wrapper around RCurl that makes many things easier, but still allows you to access the lower-level functionality of RCurl. It has convenient HTTP verbs: GET(), POST(), PUT(), DELETE(), PATCH(), HEAD(), BROWSE(). These wrapper functions are more convenient to use, though less configurable, than their counterparts in RCurl. HTTP status codes are helpful for debugging HTTP calls, and this package makes them easier to work with: for example, stop_for_status() gets the HTTP status code from a response object and stops the function if the call was not successful.
  • Using web resources can require authentication, either via API keys, OAuth, username:password combination, or via other means. ROAuth is a package that provides a separate R interface to OAuth. OAuth is the most complicated authentication process, and can be most easily done using httr (see package demos).
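As a rough illustration of the httr workflow described above, the sketch below wraps GET() and stop_for_status() in a small helper. It assumes httr is installed. fetch_text() is a hypothetical name of my own, not a function from httr, and the call itself is left commented out because it needs network access.

```r
# Sketch only: assumes the httr package is installed.
library(httr)

# fetch_text() is a hypothetical helper, not part of httr.
fetch_text <- function(url) {
  resp <- GET(url)
  # Raise an R error if the server answered with a 4xx or 5xx status.
  stop_for_status(resp)
  # Return the response body as a single character string.
  content(resp, as = "text")
}

# Usage (needs network access):
# page <- fetch_text("http://cran.r-project.org/web/views/WebTechnologies.html")
```

Checking the status right after the request means a failed call stops early with a readable error, rather than handing an error page to the parsing step.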

Web Frameworks

  • The shiny package makes it easy to build interactive web applications with R.
  • The Rook web server interface contains the specification and convenience software for building and running Rook applications.
  • The opencpu framework for embedded statistical computation and reproducible research exposes a web API interfacing R, LaTeX and Pandoc. This API is used for example to integrate statistical functionality into systems, share and execute scripts or reports on centralized servers, and build R based apps.

JavaScript

  • ggvis (not on CRAN) makes it easy to describe interactive web graphics in R. It fuses the ideas of ggplot2 and shiny, rendering graphics on the web with Vega.
  • rCharts (not on CRAN) allows for interactive javascript charts from R.
  • rVega (not on CRAN) is an R wrapper for Vega.
  • clickme (not on CRAN) is an R package to create interactive plots.

Data Sources on the Web Accessible via R

Ecological and Evolutionary Biology

  • rvertnet: A wrapper to the VertNet collections database API.
  • rgbif: Interface to the Global Biodiversity Information Facility API methods.
  • rfishbase: A programmatic interface to fishbase.org.
  • treebase: An R package for discovery, access and manipulation of online phylogenies.
  • taxize: Taxonomic information from around the web.
  • dismo: Species distribution modeling, with wrappers to some APIs.
  • rnbn (not on CRAN): Access to the UK National Biodiversity Network data.
  • rWBclimate (not on CRAN): R interface for the World Bank climate data.
  • rbison (not on CRAN): Wrapper to the USGS Bison API.
  • neotoma (not on CRAN): Programmatic R interface to the Neotoma Paleoecological Database.
  • rnoaa (not on CRAN): R interface to NOAA Climate data API.
  • rnpn (not on CRAN): Wrapper to the National Phenology Network database API.
  • rfisheries: Package for interacting with fisheries databases at openfisheries.org.
  • rebird: A programmatic interface to the eBird database.
  • flora: Retrieve taxonomical information of botanical names from the Flora do Brasil website.
  • Rcolombos: This package provides programmatic access to Colombos, a web based interface for exploring and analyzing comprehensive organism-specific cross-platform expression compendia of bacterial organisms.
  • Reol: An R interface to the Encyclopedia of Life (EOL) API. Includes functions for downloading and extracting information off the EOL pages.
  • rPlant: An R interface to the many computational resources iPlant offers through their RESTful application programming interface. Currently, rPlant functions interact with the iPlant foundational API, the Taxonomic Name Resolution Service API, and the Phylotastic Taxosaurus API. Before using rPlant, users will have to register with the iPlant Collaborative. http://www.iplantcollaborative.org/discover/discovery-environment

Genes and Genomes

  • cgdsr: R-Based API for accessing the MSKCC Cancer Genomics Data Server (CGDS).
  • rsnps (not on CRAN): Wrapper to the openSNP data API and the Broad Institute SNP Annotation and Proxy Search.
  • rentrez: Talk with NCBI entrez using R.

Earth Science

  • RNCEP: Obtain, organize, and visualize NCEP weather data.
  • crn: Provides the core functions required to download and format data from the Climate Reference Network. Both daily and hourly data are downloaded from the FTP site, a consolidated file of all stations is created, and station metadata is extracted. In addition, functions for selecting individual variables and creating R-friendly datasets for them are provided.
  • BerkeleyEarth: Data input for Berkeley Earth Surface Temperature.
  • waterData: An R Package for retrieval, analysis, and anomaly calculation of daily hydrologic time series data.
  • CHCN: A compilation of historical through contemporary climate measurements scraped from the Environment Canada website, including tools for scraping data, creating metadata and formatting temperature files.
  • decctools: Provides functions for retrieving energy statistics from the United Kingdom Department of Energy and Climate Change and related data sources. The current version focuses on total final energy consumption statistics at the local authority, MSOA, and LSOA geographies. Methods for calculating the generation mix of grid electricity and its associated carbon intensity are also provided.
  • Metadata: Collates metadata for climate surface stations.
  • sos4R: A client for Sensor Observation Services (SOS) as specified by the Open Geospatial Consortium (OGC). It allows users to retrieve metadata from SOS web services and to interactively create requests for near real-time observation data based on the available sensors, phenomena, observations et cetera using thematic, temporal and spatial filtering.

Economics

  • WDI: Search, extract and format data from the World Bank’s World Development Indicators.
  • FAOSTAT: The package hosts a list of functions to download, manipulate, construct and aggregate agricultural statistics provided by the FAOSTAT (Food and Agricultural Organization of the United Nations) database.

Chemistry

  • rpubchem: Interface to the PubChem Collection.

Agriculture

  • cimis: R package for retrieving data from CIMIS, the California Irrigation Management Information System.

Literature, Metadata, Text, and Altmetrics

  • rplos: A programmatic interface to the Web Service methods provided by the Public Library of Science journals for search.
  • rbhl (not on CRAN): R interface to the Biodiversity Heritage Library (BHL) API.
  • rmetadata (not on CRAN): Get scholarly metadata from around the web.
  • RMendeley: Implementation of the Mendeley API in R.
  • rentrez: Talk with NCBI entrez using R.
  • rorcid (not on CRAN): A programmatic interface to the Orcid.org API.
  • rpubmed (not on CRAN): Tools for extracting and processing Pubmed and Pubmed Central records.
  • rAltmetric (not on CRAN): Query and visualize metrics from Altmetric.com.
  • rImpactStory: Programmatic interface to the ImpactStory API.
  • alm (not on CRAN): R wrapper to the almetrics API platform developed by PLoS.
  • ngramr: Retrieve and plot word frequencies through time from the Google Ngram Viewer.

Marketing

  • anametrix: Bidirectional connector to Anametrix API.

Data Depots

  • dvn: Provides access to The Dataverse Network API.
  • rfigshare: Programmatic interface for Figshare.
  • factualR: Thin wrapper for the Factual.com server API.
  • dataone: A package that provides read/write access to data and metadata from the DataONE network of Member Node data repositories.
  • yhatr: Lets you deploy, maintain, and invoke models via the Yhat REST API.
  • RSocrata: Provided with a Socrata dataset resource URL, or a Socrata SoDA web API query, returns an R data frame. Converts dates to POSIX format. Supports CSV and JSON. Manages throttling by Socrata.

Machine Learning as a Service

  • bigml: BigML, a machine learning web service.
  • MTurkR: Access to Amazon Mechanical Turk Requester API via R.

Web Analytics

  • rgauges (not on CRAN): Interface to Gaug.es API.
  • RSiteCatalyst: Functions for accessing the Adobe Analytics (Omniture SiteCatalyst) Reporting API.
  • r-google-analytics (not on CRAN): Provides access to Google Analytics.

News

  • GuardianR: Provides an interface to the Open Platform’s Content API of the Guardian Media Group. It retrieves content from news outlets The Observer, The Guardian, and guardian.co.uk from 1999 to current day.

Images, Videos, Music

  • imguR: A package to share plots using the image hosting service imgur.com.
  • RLastFM: A package to interface to the last.fm API.

Sports

  • nhlscrapr: Compiling the NHL Real Time Scoring System Database for easy use in R.

Maps

  • RgoogleMaps: This package serves two purposes: It provides a comfortable R interface to query the Google server for static maps, and use the map as a background image to overlay plots within R.
  • osmar: This package provides infrastructure to access OpenStreetMap data from different sources to work with the data in common R manner and to convert data into available infrastructure provided by existing R packages (e.g., into sp and igraph objects).
  • ggmap: Allows for the easy visualization of spatial data and models on top of Google Maps, OpenStreetMaps, Stamen Maps, or CloudMade Maps using ggplot2.

Social media

  • streamR: This package provides a series of functions that allow R users to access Twitter’s filter, sample, and user streams, and to parse the output into data frames. OAuth authentication is supported.
  • twitteR: Provides an interface to the Twitter web API.

Government

  • wethepeople: An R client for interacting with the White House’s “We The People” petition API.
  • govdat: Interface to various APIs for government data, including New York Times congress API, and the Sunlight Foundation set of APIs.
  • govStatJPN: Functions to get public survey data in Japan.

Other

  • sos4R: R client for the OGC Sensor Observation Service.
  • datamart: Provides an S4 infrastructure for unified handling of internal datasets and web based data sources. Examples include dbpedia, eurostat and sourceforge.
  • rDrop (not on CRAN): Dropbox interface.
  • zendeskR: This package provides an R wrapper for the Zendesk API.

The original was on GitHub:

https://github.com/ropensci/webservices

Some new packages I really liked!

  • rDrop: Dropbox interface.
  • nhlscrapr: Compiling the NHL Real Time Scoring System Database for easy use in R.
  • osmar: This package provides infrastructure to access OpenStreetMap data from different sources, to work with the data in common R manner, and to convert data into available infrastructure provided by existing R packages (e.g., into sp and igraph objects).
  • MTurkR: Access to Amazon Mechanical Turk Requester API via R.
  • rgauges (not on CRAN): Interface to Gaug.es API.
  • RSiteCatalyst: Adobe Analytics (Omniture SiteCatalyst) Reporting API.
  • GuardianR: Provides an interface to the Open Platform’s Content API of the Guardian Media Group.
  • imguR: A package to share plots using the image hosting service imgur.com.
  • RLastFM: A package to interface to the last.fm API.

Author: Ajay Ohri

http://about.me/ajayohri

1 thought on “Task View on Web Technologies #rstats”

  1. Wow, there is a Guardian API. Had no idea, thats Awesome! HT, I just set this up, so accessible to pull data out the content API.
