Tag Archives: Application programming interface
Trying out Google Prediction API from R
So I saw the news at the NY R Meetup and decided to have a go at the Prediction API package (which first started off as a blog post at
http://onertipaday.blogspot.com/2010/11/r-wrapper-for-google-prediction-api.html ).
1) My OS was Ubuntu 10.10 Netbook.
Ubuntu has a slight glitch, plus a workaround, for installing the RCurl package, on which the Google Prediction API package depends: before RCurl will install, you first need to install the Ubuntu package libcurl4-gnutls-dev.
Once you have installed that using Synaptic,
simply start R.
2) Install the packages rjson and RCurl using install.packages and choosing a CRAN mirror.
Since GooglePredictionAPI is not yet on CRAN,
3) Download that package from
4) You need to copy this downloaded package to your “first library” folder.
When you start R, simply run
.libPaths()[1]
and that's the folder you copy the downloaded GooglePredictionAPI package to.
5) Now the following line works at the R prompt:
> install.packages("googlepredictionapi_0.1.tar.gz", repos=NULL, type="source")
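The install steps so far can be collected into one short script. This is only a sketch, not something shipped with the package: the helper name missing_packages and its installed argument are made up for illustration, and the tarball name and the “first library” location are the ones used above. The actual install calls are left commented out so nothing is changed by accident.

```r
# Hypothetical helper: report which of the needed packages are not yet installed.
# `installed` defaults to the names of the packages R can currently see.
missing_packages <- function(needed, installed = rownames(installed.packages())) {
  setdiff(needed, installed)
}

# The two CRAN dependencies named above.
to_install <- missing_packages(c("rjson", "RCurl"))
if (length(to_install) > 0) {
  message("Still to install from CRAN: ", paste(to_install, collapse = ", "))
  # install.packages(to_install)   # uncomment to actually install
}

# The downloaded tarball, assumed to have been copied to the first library folder.
tarball <- file.path(.libPaths()[1], "googlepredictionapi_0.1.tar.gz")
if (file.exists(tarball)) {
  message("Found ", tarball)
  # install.packages(tarball, repos = NULL, type = "source")   # uncomment to install
} else {
  message("Tarball not found at ", tarball)
}
```

Passing the full path to install.packages means the command works regardless of the current working directory.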
6) Uploading data to Google Storage using the GUI (rather than gsutil)
Just go to https://sandbox.google.com/storage/
and that's the Google Storage Manager.
Notes on training data:
Use a CSV file.
The first column is the score column (like 1,0 or a prediction score).
There are no headers, so delete the header row from the data file and move the dependent variable to the first column. (Note: I used data from the Kaggle contest for R package recommendation at
http://kaggle.com/R?viewtype=data )
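The reshaping described in these notes can be done in R itself before uploading. Here is a minimal sketch using base R only; the helper name write_training_csv and the example data frame are made up for illustration and are not the Kaggle data. Note that write.table quotes text columns by default, and I have not checked how the Prediction API treats quoted values.

```r
# Hypothetical helper: put the dependent variable first and write a
# header-less CSV, the layout described in the notes above.
write_training_csv <- function(df, target, path) {
  stopifnot(target %in% names(df))
  # Target column first, all remaining columns in their original order.
  ordered <- df[, c(target, setdiff(names(df), target)), drop = FALSE]
  # col.names = FALSE drops the header row; row.names = FALSE drops row labels.
  write.table(ordered, file = path, sep = ",",
              row.names = FALSE, col.names = FALSE)
  invisible(path)
}

# Made-up example data: `installed` plays the role of the 1/0 score column.
example <- data.frame(
  views     = c(10, 3, 7),
  depends   = c(2, 0, 1),
  installed = c(1, 0, 1)
)
out <- write_training_csv(example, "installed", tempfile(fileext = ".csv"))
readLines(out)
```

The resulting file has the score in the first column and no header row, so it can be uploaded as is.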
6) The good stuff:
Once you type in the basic syntax, the first time it will ask for your Google credentials (email and password).
It then starts showing you the time elapsed for training.
Now you can disconnect and go off. (Actually, I got disconnected by accident and came back in about 5 minutes, so this is my guess at what happened rather than something I verified. Don't blame me, test it for yourself.)
When you come back (hopefully before the token expires) you can see the status of your request (see below).
> library(rjson)
> library(RCurl)
Loading required package: bitops
> library(googlepredictionapi)
> my.model <- PredictionApiTrain(data="gs://numtraindata/training_data")
The request for training has sent, now trying to check if training is completed
Training on numtraindata/training_data: time:2.09 seconds
Training on numtraindata/training_data: time:7.00 seconds
7) Note that I changed the format of the URL where my data is located. Go to your Google Storage Manager and right-click on the file name to copy the link address ( https://sandbox.google.com/storage/numtraindata/training_data.csv ),
then change it to gs://numtraindata/training_data (that kind of helps avoid syntax errors).
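If you would rather not edit the link by hand, the same rewrite can be scripted. This is a small sketch with a made-up helper name, to_gs_path. It simply mirrors what I did above (strip the Storage Manager prefix and the .csv extension); whether dropping the extension is actually required is an assumption I have not verified.

```r
# Hypothetical helper: turn a Google Storage Manager link into the gs:// form
# used in the training call. The URL prefix is the one shown in this post.
to_gs_path <- function(url, prefix = "https://sandbox.google.com/storage/") {
  stopifnot(startsWith(url, prefix))
  path <- substring(url, nchar(prefix) + 1)   # leaves "bucket/object.csv"
  path <- sub("\\.csv$", "", path)            # mirror the post: drop the .csv extension
  paste0("gs://", path)
}

gs <- to_gs_path("https://sandbox.google.com/storage/numtraindata/training_data.csv")
print(gs)
```

The helper stops with an error if the link does not start with the expected prefix, which is a cheap way to catch a mistyped URL before training starts.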
8) From the kind of high level instructions at https://code.google.com/p/google-prediction-api-r-client/, you could also try this on a local file
Usage
## Load googlepredictionapi and dependent libraries
library(rjson)
library(RCurl)
library(googlepredictionapi)

## Make a training call to the Prediction API against data in the Google Storage.
## Replace MYBUCKET and MYDATA with your data.
my.model <- PredictionApiTrain(data="gs://MYBUCKET/MYDATA")

## Alternatively, make a training call against training data stored locally as a CSV file.
## Replace MYPATH and MYFILE with your data.
my.model <- PredictionApiTrain(data="MYPATH/MYFILE.csv")
At the time of writing my data was still getting trained, so I will keep you posted on what happens.
How to Analyze Wikileaks Data – R SPARQL
Drew Conway, one of the very, very few Project R voices I used to respect until recently, declared on his blog http://www.drewconway.com/zia/
Why I Will Not Analyze The New WikiLeaks Data
and followed it up with how HE analyzed the post announcing the non-analysis.
“If you have not visited the site in a week or so you will have missed my previous post on analyzing WikiLeaks data, which from the traffic and 35 Comments and 255 Reactions was at least somewhat controversial. Given this rare spotlight I thought it would be fun to use the infochimps API to map out the geo-location of everyone that visited the blog post over the last few days. Unfortunately, after nearly two years with the same web hosting service, only today did I realize that I was not capturing daily log files for my domain”
Anyway, non-American users of Project R can analyze the WikiLeaks data using the R SPARQL package. I would advise American friends not to use this approach or attempt to analyze any of the data, because technically the data is still classified and its possession is illegal (which is the reason federal employees and organizations receiving federal funds have been advised not to use this or any WikiLeaks dataset).
https://code.google.com/p/r-sparql/
Overview
R is a programming language designed for statistics.
R Sparql allows you to run SPARQL queries inside R and store the results as an R data frame.
The main objective is to allow the integration of ontologies with statistics.
It requires Java and rJava to be installed.
Example (in R console):
> library(sparql)
> data <- query("<SPARQL query>", "<RDF file or remote SPARQL endpoint>")
and the data is in a remote SPARQL endpoint; see http://www.ckan.net/package/cablegate
SPARQL is an easy language to pick up, but dammit I am not supposed to blog on my vacations.
http://code.google.com/p/r-sparql/wiki/GettingStarted
Getting Started
1. Installation
1.1 Make sure Java is installed and is the default JVM:
$ sudo apt-get install sun-java6-bin sun-java6-jre sun-java6-jdk
$ sudo update-java-alternatives -s java-6-sun
1.2 Configure R to use the correct version of Java
$ sudo R CMD javareconf
1.3 Install the rJava library
$ R
> install.packages("rJava")
> q()
1.4 Download and install the sparql library
Download: http://code.google.com/p/r-sparql/downloads/list
$ R CMD INSTALL sparql-0.1-X.tar.gz
2. Executing a SPARQL query
2.1 Start R
# Load the library
library(sparql)
# Run the query
result <- query("SELECT ... ", "http://...")
# Print the result
print(result)
3. Examples
3.1 The Query can be a string or a local file:
query("SELECT ?date ?number ?season WHERE { ... }", "local-file.rdf")
query("my-query.rq", "local-file.rdf")
The package will detect if my-query.rq exists and will load it from the file.
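To make the “string or file” behaviour concrete, here is a rough sketch of how such detection could work in plain R. It is an illustration only: resolve_query is a made-up name, and I have not read the package source, so the real implementation may well differ.

```r
# Hypothetical sketch of "string or file" detection; the real package may differ.
resolve_query <- function(query) {
  if (file.exists(query)) {
    # The argument names an existing file: read the query text from it.
    paste(readLines(query, warn = FALSE), collapse = "\n")
  } else {
    # Otherwise assume the argument already is the query text.
    query
  }
}

# Usage with a temporary .rq file and with an inline query string.
rq <- tempfile(fileext = ".rq")
writeLines(c("SELECT ?s ?p ?o", "WHERE { ?s ?p ?o }"), rq)
from_file   <- resolve_query(rq)
from_string <- resolve_query("SELECT ?s WHERE { ?s ?p ?o }")
```

One consequence of this kind of check is that a query string which happens to match an existing file name would be read as a file, so it pays to keep .rq files clearly named.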
3.3 The URI can be a file or a URL (for remote queries):
query("SELECT ... ","local-file.db")
query("SELECT ... ","http://dbpedia.org/sparql")
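When pointing at a remote endpoint it helps to keep the first query small. The sketch below builds a SELECT query with a LIMIT clause and checks whether the target looks remote. Both helper names (is_remote and build_select) are made up for this post, and the final query() call is commented out because it needs the sparql package, Java and network access.

```r
# Hypothetical helper: does the second argument look like a remote endpoint?
is_remote <- function(uri) grepl("^https?://", uri)

# Hypothetical helper: build a small SELECT query with a LIMIT clause.
build_select <- function(vars, pattern, limit = 10) {
  sprintf("SELECT %s WHERE { %s } LIMIT %d",
          paste0("?", vars, collapse = " "), pattern, as.integer(limit))
}

q <- build_select(c("s", "p"), "?s ?p ?o", limit = 5)
print(q)
print(is_remote("http://dbpedia.org/sparql"))
print(is_remote("local-file.db"))

# result <- query(q, "http://dbpedia.org/sparql")   # requires library(sparql)
```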
3.4 Get some examples here: http://code.google.com/p/r-sparql/downloads/list
SPARQL Tutorial-
http://openjena.org/ARQ/Tutorial/index.html
Also read-
http://webr3.org/blog/linked-data/virtuoso-6-sparqlgeo-and-linked-data/
and from the favorite blog of Project R, also known as the NY Times:
In May 2009, the Obama administration started putting raw government data on the Web. It started with 47 data sets. Today, there are more than 270,000 government data sets, spanning every imaginable category from public health to foreign aid.
