Save the Data

Breakdown of political party representation in...
Image via Wikipedia

I just read an online cause here-

http://sunlightfoundation.com/savethedata/

Some of the most important technology programs that keep Washington accountable are in danger of being eliminated. Data.gov, USASpending.gov, the IT Dashboard and other federal data transparency and government accountability programs are facing a massive budget cut, despite only being a tiny fraction of the national budget. Help save the data and make sure that Congress doesn’t leave the American people in the dark.

I wonder why the federal government/ non profit agencies can help create a SPARQL database, and in days of cloud computing, why a tech major cannot donate storage space to it, after all despite US corporate tax rate being high, US technological companies do end up paying a lower rate thanks to tax breaks/routing overseas revenue.

In the new age data is power, and the US has led in its mission to use technology to further its own values even especially in Middle East. The datasets should be made public and transitioned to the private sector/academia for research and re designing for data augmentation with out straining the massive deficit /borrowing/ fighting 3 wars. Of particular interest would be datasets of campaign finances  and donors especially given large number of retail/small donors/internet marketing in elections as it will also help serve as an example of democracy and change. Even countries like China can create a corruption/expense efficiency tracking internal dashboard with restricted rights to help with rural and urban governance.

How to Analyze Wikileaks Data – R SPARQL

Logo for R
Image via Wikipedia

Drew Conway- one of the very very few Project R voices I used to respect until recently. declared on his blog http://www.drewconway.com/zia/

Why I Will Not Analyze The New WikiLeaks Data

and followed it up with how HE analyzed the post announcing the non-analysis.

“If you have not visited the site in a week or so you will have missed my previous post on analyzing WikiLeaks data, which from the traffic and 35 Comments and 255 Reactions was at least somewhat controversial. Given this rare spotlight I thought it would be fun to use the infochimps API to map out the geo-location of everyone that visited the blog post over the last few days. Unfortunately, after nearly two years with the same web hosting service, only today did I realize that I was not capturing daily log files for my domain”

Anyways – non American users of R Project can analyze the Wikileaks data using the R SPARQL package I would advise American friends not to use this approach or attempt to analyze any data because technically the data is still classified and it’s possession is illegal (which is the reason Federal employees and organizations receiving federal funds have advised not to use this or any WikiLeaks dataset)

https://code.google.com/p/r-sparql/

Overview

R is a programming language designed for statistics.

R Sparql allows you to run SPARQL Queries inside R and store it as a R data frame.

The main objective is to allow the integration of Ontologies with Statistics.

It requires Java and rJava installed.

Example (in R console):

> library(sparql)> data <- query("SPARQL query>","RDF file or remote SPARQL Endpoint")

and the data in a remote SPARQL  http://www.ckan.net/package/cablegate

SPARQL is an easy language to pick  up, but dammit I am not supposed to blog on my vacations.

http://code.google.com/p/r-sparql/wiki/GettingStarted

Getting Started

1. Installation

1.1 Make sure Java is installed and is the default JVM:

$ sudo apt-get install sun-java6-bin sun-java6-jre sun-java6-jdk$ sudo update-java-alternatives -s java-6-sun

1.2 Configure R to use the correct version of Java

$ sudo R CMD javareconf

1.3 Install the rJava library

$ R> install.packages("rJava")> q()

1.4 Download and install the sparql library

Download: http://code.google.com/p/r-sparql/downloads/list

$ R CMD INSTALL sparql-0.1-X.tar.gz

2. Executing a SPARQL query

2.1 Start R

#Load the librarylibrary(sparql)#Run the queryresult <- query("SELECT ... ", "http://...")#Print the resultprint(result)

3. Examples

3.1 The Query can be a string or a local file:

query("SELECT ?date ?number ?season WHERE {  ... }", "local-file.rdf")
query("my-query.rq", "local-file.rdf")

The package will detect if my-query.rq exists and will load it from the file.

3.3 The uri can be a file or an url (for remote queries):

query("SELECT ... ","local-file.db")
query("SELECT ... ","http://dbpedia.org/sparql")

3.4 Get some examples here: http://code.google.com/p/r-sparql/downloads/list

SPARQL Tutorial-

http://openjena.org/ARQ/Tutorial/index.html

Also read-

http://webr3.org/blog/linked-data/virtuoso-6-sparqlgeo-and-linked-data/

and from the favorite blog of Project R- Also known as NY Times

http://bits.blogs.nytimes.com/2010/11/15/sorting-through-the-government-data-explosion/?twt=nytimesbits

In May 2009, the Obama administration started putting raw 
government data on the Web. 
It started with 47 data sets. Today, there are more than
 270,000 government data sets, spanning every imaginable 
category from public health to foreign aid.