Using #Rstats for online data access

Several R packages can read data directly from online sources.
They are as follows:

1) Google Prediction API package: http://code.google.com/p/r-google-prediction-api-v12/

You can upload your data to Google Storage and then use this package to train a model on it with the Google Prediction API.

# install package
install.packages("googlepredictionapi_0.12.tar.gz", repos=NULL, type="source")

library(rjson)
library(RCurl)
library(googlepredictionapi)

#--- initialize

# disable SSL peer verification - see: http://code.google.com/p/r-google-analytics/issues/detail?id=1#c5 & http://www.omegahat.org/RCurl/FAQ.html
# (cainfo expects a CA bundle file such as the cacert.pem shipped with RCurl; capath expects a directory)
options(RCurlOptions = list(cainfo = system.file("CurlSSL", "cacert.pem", package = "RCurl"), ssl.verifypeer = FALSE))

# put your own email, password and API key below
myEmail <- "***"
myPassword <- "***"
myAPIkey <- "***"

# put the path to python.exe on your computer and the path to the gsutil directory
myPython <- "c:/Python27/python.exe"
myGSUtilPath <- "c:/gsutil/"

myVerbose <- FALSE

#--- work

# upload a local CSV file to Google Storage and start training; the local file must be in the R working directory
my.model <- PredictionApiTrain(data="./language_id_pl.txt",remote.file="gs://prediction_example/prediction_models/languages")

# alternative: initiate training of a model already uploaded to Google Storage
my.model <- PredictionApiTrain(data="gs://prediction_example/prediction_models/languages",tillDone=FALSE) # tillDone - repeat checking till model is trained

# check whether the model is trained; not needed if tillDone=TRUE was set above
result <- PredictionApiCheckTrainingStatus("prediction_example","prediction_models/languages",verbose=TRUE)

# you can adapt the result returned by PredictionApiCheckTrainingStatus to 'predictionapimodel' class used in predictions
my.model <- WrapModel(result)

summary(my.model)

# check new data against model (I have added some Polish-language texts to the Google Prediction API 'Hello World' example)
predict(my.model,"'Prezydent Obama spotkał się z parlamentarzystami'")

# please note, this package returns all labels and scores for the given text in this format:
# [1] "Polish"   "French"   "Spanish"  "English"  "0.36195"  "0.26396"  "0.260067" "0.114022"

# some other prediction request
predict(my.model,"'This is a test'")

# list objects in a Google Storage bucket
PredictionApiListObjects("prediction_example","prediction_models/languages",verbose=TRUE)
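Because the package returns labels and scores together in one character vector (see the format in the comment above), a small base-R sketch can reshape that vector into a data frame sorted by score. It assumes the vector always holds n labels followed by n scores; the helper name scores_to_df() is hypothetical, and the values in res are copied from that comment rather than from a live API call.

```r
# Output format shown above: n labels followed by n scores, all as strings
res <- c("Polish", "French", "Spanish", "English",
         "0.36195", "0.26396", "0.260067", "0.114022")

# Hypothetical helper: split the vector in half, convert scores to numbers,
# and sort from most to least likely label
scores_to_df <- function(res) {
  n <- length(res) %/% 2
  df <- data.frame(label = res[seq_len(n)],
                   score = as.numeric(res[n + seq_len(n)]),
                   stringsAsFactors = FALSE)
  df[order(df$score, decreasing = TRUE), ]
}

scores <- scores_to_df(res)
scores$label[1]
# [1] "Polish"
```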

2) RCurl from http://www.omegahat.org/RCurl/

RCurl allows us to download files from Web servers, post forms, use HTTPS (secure HTTP), use persistent connections, upload files, use binary content, handle redirects and password authentication, etc.

The primary top-level entry points are
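A minimal sketch of using RCurl's getURL() to pull a remote CSV file into a data frame. The helper name read_remote_csv() and the URL in the usage line are hypothetical; followlocation = TRUE is a libcurl option passed through getURL() so that redirects are followed.

```r
library(RCurl)

# Hypothetical helper: fetch a CSV file over HTTP(S) and parse it into a data frame
read_remote_csv <- function(url) {
  txt <- getURL(url, followlocation = TRUE)  # follow redirects via libcurl
  read.csv(text = txt, stringsAsFactors = FALSE)
}

# Usage (requires internet access; the URL is a placeholder):
# df <- read_remote_csv("https://example.com/data.csv")
```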

3) The basic package for this was httpRequest: http://cran.r-project.org/web/packages/httpRequest/

It implements the HTTP GET, POST and multipart POST requests.

4) The Infochimps package provides functions to access all of the APIs currently available on infochimps.com. For more information see http://api.infochimps.com/

http://infochimps.com/ has 14,000 data sets.

The free Baboon plan at http://api.infochimps.com/ allows 100,000 API calls per month, with bursts of up to 2,000 calls per hour, and requires attribution.

5) WDI gives access to World Bank data only: http://cran.r-project.org/web/packages/WDI/index.html

Search, extract and format data from the World Bank’s World Development Indicators
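A minimal sketch of the two steps WDI supports: searching for an indicator code with WDIsearch(), which should work offline because it searches the indicator list bundled with the package, and downloading the series with WDI(), which needs internet access. The indicator code "NY.GDP.PCAP.KD" (GDP per capita, constant US$), the country codes, the years and the helper name fetch_gdp() are illustrative choices, not taken from the original post.

```r
library(WDI)

# Search the bundled indicator list (no download needed)
hits <- WDIsearch("gdp per capita")
head(hits)

# Hypothetical helper: download one indicator for a few countries (needs internet)
fetch_gdp <- function(countries = c("PL", "FR"), start = 2000, end = 2010) {
  WDI(country = countries, indicator = "NY.GDP.PCAP.KD",
      start = start, end = end)
}

# Usage:
# gdp <- fetch_gdp()
```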

6) quantmod lets you download financial data from Yahoo Finance:

http://cran.r-project.org/web/packages/quantmod/index.html

Also see http://www.quantmod.com/
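A minimal sketch of downloading prices with quantmod's getSymbols(). The helper name fetch_prices(), the ticker and the start date are illustrative assumptions, and the call only works while the Yahoo Finance endpoint is reachable. Passing auto.assign = FALSE makes getSymbols() return the xts object instead of creating a variable named after the ticker.

```r
library(quantmod)

# Hypothetical helper: daily OHLC prices for one ticker from Yahoo Finance
fetch_prices <- function(symbol, from = "2020-01-01") {
  getSymbols(symbol, src = "yahoo", from = from, auto.assign = FALSE)
}

# Usage (requires internet access):
# aapl <- fetch_prices("AAPL")
# head(Cl(aapl))   # closing prices
```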

7) The latest package is rdatamarket.

It fetches data from DataMarket.com, either as time series in zoo form (dmseries) or as long-form data frames (dmlist).

Also see https://github.com/DataMarket/rdatamarket

http://datamarket.com/ has 100 million time series from the most important data providers, such as the UN, World Bank and Eurostat.

8) XML package

Most packages in this category depend on the XML package, which reads and creates XML (and HTML) documents (including DTDs), both local and accessible via HTTP or FTP.

http://cran.r-project.org/web/packages/XML/index.html
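As a minimal, offline sketch of what the XML package does, the code below parses a small inline HTML table (a made-up stand-in for a page fetched from the Web) and extracts its contents in two ways: with an XPath query through xpathSApply(), and with readHTMLTable(), which turns each table into a data frame.

```r
library(XML)

# Made-up HTML standing in for a downloaded page
html <- "<html><body><table>
<tr><th>country</th><th>value</th></tr>
<tr><td>PL</td><td>10</td></tr>
<tr><td>FR</td><td>20</td></tr>
</table></body></html>"

doc <- htmlParse(html, asText = TRUE)

# Text of every <td> cell, via XPath
cells <- xpathSApply(doc, "//td", xmlValue)
cells
# [1] "PL" "10" "FR" "20"

# Each <table> as a data frame
tables <- readHTMLTable(doc, stringsAsFactors = FALSE)
str(tables[[1]])
```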

9) The RBloomberg package can access Bloomberg data (but requires a Bloomberg installation on a Windows PC).

10) Another useful package is scrapeR: http://cran.r-project.org/web/packages/scrapeR/index.html

Additional note:

Many people find rjson useful for data interchange.

From http://cran.r-project.org/web/packages/rjson/index.html

Converts R object into JSON objects and vice-versa

http://www.json.org/

JSON (JavaScript Object Notation) is a lightweight data-interchange format.

It is easy for humans to read and write. It is easy for machines to parse and generate.
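A minimal sketch of an rjson round trip: toJSON() turns an R list into a JSON string, and fromJSON() parses it back. The list contents are illustrative only.

```r
library(rjson)

# An R list with a string, a number and a character vector
x <- list(name = "Polish", score = 0.36195, tags = c("pl", "lang"))

json <- toJSON(x)     # R list -> JSON string
cat(json, "\n")

y <- fromJSON(json)   # JSON string -> R list
y$name
# [1] "Polish"
```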