There are multiple R packages for reading data straight from online sources.
These are as follows:
1) Google PredictionAPI package – http://code.google.com/p/r-google-prediction-api-v12/
You can upload your data to Google Storage and then use this package to train a model on it with the Google Prediction API.
```r
# install package
install.packages("googlepredictionapi_0.12.tar.gz", repos = NULL, type = "source")
library(rjson)
library(RCurl)
library(googlepredictionapi)

#--- initialize
# turn off SSL check - see: http://code.google.com/p/r-google-analytics/issues/detail?id=1#c5 & http://www.omegahat.org/RCurl/FAQ.html
options(RCurlOptions = list(capath = system.file("CurlSSL", "cacert.pem", package = "RCurl"),
                            ssl.verifypeer = FALSE))

# put your own email, password and API key below
myEmail <- "***"
myPassword <- "***"
myAPIkey <- "***"

# put path to python.exe on your computer and path to gsutil directory
myPython <- "c:/Python27/python.exe"
myGSUtilPath <- "c:/gsutil/"
myVerbose <- FALSE

#--- work
# upload local CSV file to Google Storage and initiate training; local file must be in R working directory
my.model <- PredictionApiTrain(data = "./language_id_pl.txt",
                               remote.file = "gs://prediction_example/prediction_models/languages")

# alternative: initiate training of a model already uploaded to Google Storage
my.model <- PredictionApiTrain(data = "gs://prediction_example/prediction_models/languages",
                               tillDone = FALSE)  # tillDone - repeat checking till model is trained

# check whether model is trained; if tillDone=TRUE was set above, there is no need for that
result <- PredictionApiCheckTrainingStatus("prediction_example", "prediction_models/languages", verbose = TRUE)

# you can adapt the result returned by PredictionApiCheckTrainingStatus to 'predictionapimodel' class used in predictions
my.model <- WrapModel(result)
summary(my.model)

# check new data against model (I have added some Polish-language texts to the Google Prediction API 'Hello World' example)
predict(my.model, "'Prezydent Obama spotkał się z parlamentarzystami'")
# please note, this package returns all labels and scores for the given data in this format:
# [1] "Polish" "French" "Spanish" "English" "0.36195" "0.26396" "0.260067" "0.114022"

# some other prediction request
predict(my.model, "'This is a test'")

# list objects in a Google Storage bucket
PredictionApiListObjects("prediction_example", "prediction_models/languages", verbose = TRUE)
```
2) RCurl from http://www.omegahat.org/RCurl/
RCurl allows us to download files from Web servers, post forms, use HTTPS (secure HTTP), use persistent connections, upload files, handle binary content and redirects, use password authentication, and more.
The primary top-level entry points are getURL(), getURLContent(), getForm(), postForm(), curlPerform() and curlMultiPerform().
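A minimal sketch of getURL(), the simplest of these entry points. To keep it runnable without a network connection it reads a temporary local file through a file:// URL (libcurl handles that scheme as well); with a real source you would pass an http:// or https:// address instead. The file and its contents are made up for illustration.

```r
library(RCurl)

# Write a small local CSV file (hypothetical data) so the example runs offline.
path <- tempfile(fileext = ".csv")
writeLines(c("x,y", "1,2", "3,4"), path)

# getURL() returns the body of the resource as a single character string.
# Swap this file:// URL for an http(s):// URL to read from a Web server.
url <- paste0("file://", normalizePath(path, winslash = "/"))
txt <- getURL(url)

# Parse the downloaded text as CSV.
df <- read.csv(text = txt)
print(df)
```

Because getURL() hands back plain text, the same read.csv(text = ...) step works for any CSV served over HTTP.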
3) The basic package for this was httpRequest http://cran.r-project.org/web/packages/httpRequest/
It implements the HTTP GET, POST and multipart POST requests.
4) The Infochimps package provides functions to access all of the APIs currently available on infochimps.com. For more information see http://api.infochimps.com/
http://infochimps.com/ has 14,000 data sets.
The free "Baboon" account at http://api.infochimps.com/ includes:
- 100,000 API calls/month
- 2,000 calls/hr burst
- Attribution required
5) WDI: this R package gives access to World Bank data only. http://cran.r-project.org/web/packages/WDI/index.html
It lets you search, extract and format data from the World Bank’s World Development Indicators.
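A short sketch, assuming a recent version of the WDI package: WDIsearch() looks up indicator codes in the catalogue shipped with the package, and WDI() downloads the values. The download is wrapped in a hypothetical helper, fetch_gdp(), because it needs an internet connection; the country codes and years are arbitrary examples.

```r
library(WDI)

# Search the indicator catalogue bundled with the package (WDI_data);
# with the default cache = NULL this lookup does not go online.
hits <- WDIsearch(string = "GDP per capita", field = "name")
print(head(hits))

# Hypothetical helper: download one indicator for a few countries (needs internet).
# NY.GDP.PCAP.KD is the World Bank code for GDP per capita in constant US$.
fetch_gdp <- function(countries = c("US", "CA", "MX"), start = 2005, end = 2010) {
  WDI(country = countries, indicator = "NY.GDP.PCAP.KD", start = start, end = end)
}
```

Calling fetch_gdp() returns a data frame with one row per country and year, which can go straight into plotting or modelling code.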
6) quantmod allows you to download financial data from Yahoo Finance.
http://cran.r-project.org/web/packages/quantmod/index.html
Also see http://www.quantmod.com/
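The usual quantmod pattern is getSymbols() followed by accessor functions such as Cl(). In this sketch the download sits inside a hypothetical helper, fetch_prices(), since it needs internet access, and a tiny hand-made xts series with made-up prices stands in for Yahoo data so the return calculation runs offline.

```r
library(quantmod)  # also attaches xts and zoo

# Hypothetical helper: daily OHLC prices for one ticker from Yahoo Finance (needs internet).
# auto.assign = FALSE returns the xts object instead of creating a variable
# named after the ticker in the calling environment.
fetch_prices <- function(symbol, from = "2011-01-01") {
  getSymbols(symbol, src = "yahoo", from = from, auto.assign = FALSE)
}

# Simple daily returns from the closing prices; Cl() selects the column
# whose name contains "Close". The first day has no previous close, hence NA.
close_returns <- function(prices) {
  cl <- as.numeric(Cl(prices))
  c(NA, diff(cl) / head(cl, -1))
}

# Offline stand-in using the SYMBOL.Open / SYMBOL.Close naming that getSymbols() produces.
toy <- xts(cbind(DEMO.Open = c(10, 11, 12), DEMO.Close = c(10, 11, 12.1)),
           order.by = as.Date(c("2011-01-03", "2011-01-04", "2011-01-05")))
returns <- close_returns(toy)
print(returns)
```

With a connection, close_returns(fetch_prices("AAPL")) would apply the same calculation to real prices.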
7) The latest package is rdatamarket.
It fetches data from DataMarket.com, either as time series in zoo form (dmseries) or as long-form data frames (dmlist).
Also see https://github.com/DataMarket/rdatamarket
http://datamarket.com/ has 100 million time series from the most important data providers, such as the UN, World Bank and Eurostat.
8) XML package
Most packages in this category end up depending on the XML package, which is used for reading and creating XML (and HTML) documents (including DTDs), both local and accessible via HTTP or FTP.
http://cran.r-project.org/web/packages/XML/index.html
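A minimal scraping sketch with the XML package: parse an HTML table and pull cells out with XPath through xpathSApply() and xmlValue(). The page is an inline string with made-up values so the example runs offline; for a live page you could fetch the HTML first (for example with RCurl's getURL()) and parse the result the same way.

```r
library(XML)

# Hypothetical HTML page containing a small table.
page <- "<html><body><table>
<tr><th>country</th><th>population</th></tr>
<tr><td>A</td><td>10</td></tr>
<tr><td>B</td><td>20</td></tr>
</table></body></html>"

# asText = TRUE tells htmlParse() that the argument is markup, not a file name or URL.
doc <- htmlParse(page, asText = TRUE)

# Select the first and second cell of every data row and extract their text.
country <- xpathSApply(doc, "//tr/td[1]", xmlValue)
population <- as.numeric(xpathSApply(doc, "//tr/td[2]", xmlValue))

print(data.frame(country = country, population = population))
```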
9) The RBloomberg package can access Bloomberg data (but requires a Bloomberg installation on a Windows PC).
10) Another package worth mentioning is scrapeR: http://cran.r-project.org/web/packages/scrapeR/index.html
Additional note:
Many people find rjson useful for data interchange.
From http://cran.r-project.org/web/packages/rjson/index.html:
It converts R objects into JSON objects and vice versa.
JSON (JavaScript Object Notation) is a lightweight data-interchange format.
It is easy for humans to read and write. It is easy for machines to parse and generate.
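A quick round trip with rjson: toJSON() serialises an R list to a JSON string and fromJSON() parses it back. The record below is a made-up example.

```r
library(rjson)

# Hypothetical record to exchange.
record <- list(name = "example", values = c(1, 2, 3), ok = TRUE)

# R list -> JSON text (a single character string).
json <- toJSON(record)
cat(json, "\n")

# JSON text -> R list.
parsed <- fromJSON(json)
str(parsed)
```

Named lists become JSON objects and vectors with more than one element become arrays, so the parsed result has the same shape as the original list.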
A very nice and comprehensive list is also available here: http://moderntoolmaking.blogspot.com/2011/08/25-more-ways-to-bring-data-into-r.html