Software Review- BigML.com – Machine Learning meets the Cloud

I had a chance to dekko the new startup BigML https://bigml.com/ and was suitably impressed by the briefing and my own puttering around the site. Here is my review-

1) The website is very intutively designed- You can create a dataset from an uploaded file in one click and you can create a Decision Tree model in one click as well. I wish other cloud computing websites like Google Prediction API make design so intutive and easy to understand. Also unlike Google Prediction API, the models are not black box models, but have a description which can be understood.

2) It includes some well known data sources for people trying it out. They were kind enough to offer 5 invite codes for readers of Decisionstats ( if you want to check it yourself, use the codes below the post, note they are one time only , so the first five get the invites.

BigML is still invite only but plan to get into open release soon.

3) Data Sources can only be by uploading files (csv) but they plan to change this hopefully to get data from buckets (s3? or Google?) and from URLs.

4) The one click operation to convert data source into a dataset shows a histogram (distribution) of individual variables.The back end is clojure , because the team explained it made the easiest sense and fit with Java. The good news (?) is you would never see the clojure code at the back end. You can read about it from http://clojure.org/

As cloud computing takes off (someday) I expect clojure popularity to take off as well.

Clojure is a dynamic programming language that targets the Java Virtual Machine (and the CLR, and JavaScript). It is designed to be a general-purpose language, combining the approachability and interactive development of a scripting language with an efficient and robust infrastructure for multithreaded programming. Clojure is a compiled language – it compiles directly to JVM bytecode, yet remains completely dynamic. Every feature supported by Clojure is supported at runtime. Clojure provides easy access to the Java frameworks, with optional type hints and type inference, to ensure that calls to Java can avoid reflection.

Clojure is a dialect of Lisp

5) As of now decision trees is the only distributed algol, but they expect to roll out other machine learning stuff soon. Hopefully this includes regression (as logit and linear) and k means clustering. The trees are created and pruned in real time which gives a slightly animated (and impressive effect). and yes model building is an one click operation.

The real time -live pruning is really impressive and I wonder why /how it can ever be replicated in other software based on desktop, because of the sheer interactive nature.

Making the model is just half the work. Creating predictions and scoring the model is what is really the money-earner. It is one click and customization is quite intuitive. It is not quite PMML compliant yet so I hope some Zemanta like functionality can be added so huge amounts of models can be applied to predictions or score data in real time.

If you are a developer/data hacker, you should check out this section too- it is quite impressive that the designers of BigML have planned for API access so early.

https://bigml.com/developers

BigML.io gives you:

Secure programmatic access to all your BigML resources.

Fully white-box access to your datasets and models.

Asynchronous creation of datasets and models.

Near real-time predictions.

Note: For your convenience, some of the snippets below include your real username and API key.

Please keep them secret.

REST API

BigML.io conforms to the design principles of Representational State Transfer (REST). BigML.io is enterely HTTP-based.

BigML.io gives you access to four basic resources: Source, Dataset, Model and Prediction. You cancreate, read, update, and delete resources using the respective standard HTTP methods: POST, GET,PUT and DELETE.

All communication with BigML.io is JSON formatted except for source creation. Source creation is handled with a HTTP PUT using the “multipart/form-data” content-type

HTTPS

All access to BigML.io must be performed over HTTPS

and https://bigml.com/developers/quick_start ( In think an R package which uses JSON ,RCurl would further help in enhancing ease of usage).

Summary-

Overall a welcome addition to make software in the real of cloud computing and statistical computation/business analytics both easy to use and easy to deploy with fail safe mechanisms built in.

Check out https://bigml.com/ for yourself to see.

The invite codes are here -one time use only- first five get the invites- so click and try your luck, machine learning on the cloud.

If you dont get an invite (or it is already used, just leave your email there and wait a couple of days to get approval)

Cyber Cold War

I try to write on cyber conflict without getting into the politics of why someone is hacking someone else. I always get beaten by someone in the comments thread when I write on politics.

But recent events have forced me to update my usual “how-to” cyber conflict to “why” cyber conflict. This is because of a terrorist attack in my hometown Delhi.

(updated-

http://www.nytimes.com/2012/02/14/world/middleeast/israeli-embassy-officials-attacked-in-india-and-georgia.html?_r=1&hp

Iran allegedly tried (as per Israel) to assassinate the wife of Israeli Defence Attache in Delhi using a magnetic bomb, India as she went to school to pick up her kids, somebody else put a grenade in Israeli embassy car in Georgia which was found in time.

Based on reports , initial work suggests the bomb was much more sophisticated than local terrorists, but the terrorists seemed to have some local recce work done.

India has 0 history of antisemitism but this is the second time Israelis have been targeted since 26/11 Mumbai attacks. India buys 12 % of oil annually from Iran (and refuses to join the oil embargo called by US and Europe)

Cyber Conflict is less painful than conflict, which is inevitable as long as mankind exists. Also the Western hemisphere needs a moon shot (cyber conflict could be the Sputnik like moment) and with declining and aging populations but better technology, Western Hemisphere govts need cyber conflict as they are running out of humans to fight their wars. Eastern govt. are even more obnoxious in using children for conflict propaganda, and corruption.

Last week CIA.gov website went down

This week Iranian govt is allegedly blocking https traffic on eve of Annual Revolution Day (what a coincidence!)

Some resources to help Internet users in Iran (or maybe this could be a dummy test for the big one – hacking the great firewall of China)

News from Hacker News-

http://news.ycombinator.com/item?id=3575029

I’m writing this to report the serious troubles we have regarding accessing Internet in Iran at the moment. Since Thursday Iranian government has shutted down the https protocol which has caused almost all google services (gmail, and google.com itself) to become inaccessible. Almost all websites that reply on Google APIs (like wolfram alpha) won’t work. Accessing to any website that replies on https (just imaging how many websites use this protocol, from Arch Wiki to bank websites). Also accessing many proxies is also impossible. There are almost no official reports on this and with many websites and my email accounts restricted I can just confirm this based on my own and friends experience. I have just found one report here:

Iran Shut Down Gmail , Google , Yahoo and sites using “Https” Protocol

The reason for this horrible shutdown is that the Iranian regime celebrates 1979 Islamic revolution tomorrow.

I just wanted to let you guys know about this. If you have any solution regarding bypassing this restriction please help!

The boys at Tor think they can help-

~~but its not so elegant, as I prefer creating a batch file rather than explain coding to newbies.~~

this is still getting to better and easier interfaces

https://www.torproject.org/projects/obfsproxy-instructions.html.en

Obfsproxy Instructions

client torrc

Step 1: Install dependencies, obfsproxy, and Tor

You will need a C compiler (gcc), the autoconf and autotools build system, the git revision control system, pkg-config andlibtool, libevent-2 and its headers, and the development headers of OpenSSL.

On Debian testing or Ubuntu oneiric, you could do:
# apt-get install autoconf autotools-dev gcc git pkg-config libtool libevent-2.0-5 libevent-dev libevent-openssl-2.0-5 libssl-dev

If you’re on a more stable Linux, you can either try our experimental backport libevent2 debs or build libevent2 from source.

Clone obfsproxy from its git repository:
$ git clone https://git.torproject.org/obfsproxy.git
The above command should create and populate a directory named ‘obfsproxy’ in your current directory.

Compile obfsproxy:
$ cd obfsproxy
$ ./autogen.sh && ./configure && make

Optionally, as root install obfsproxy in your system:
# make install

If you prefer not to install obfsproxy as root, you can instead just modify the Transport lines in your torrc file (explained below) to point to your obfsproxy binary.

You will need Tor 0.2.3.11-alpha or later.

Step 2a: If you’re the client…

First, you need to learn the address of a bridge that supports obfsproxy. If you don’t know any, try asking a friend to set one up for you. Then the appropriate lines to your tor configuration file:

UseBridges 1
Bridge obfs2 128.31.0.34:1051
ClientTransportPlugin obfs2 exec /usr/local/bin/obfsproxy --managed

Don’t forget to replace 128.31.0.34:1051 with the IP address and port that the bridge’s obfsproxy is listening on.
Congratulations! Your traffic should now be obfuscated by obfsproxy. You are done! You can now start using Tor.

For old fashioned tunnel creation under Seas of English Channel-

http://dag.wieers.com/howto/ssh-http-tunneling/

Tunneling SSH over HTTP(S)

This document explains how to set up an Apache server and SSH client to allow tunneling SSH over HTTP(S). This can be useful on restricted networks that either firewall everything except HTTP traffic (tcp/80,tcp/443) or require users to use a local (HTTP) proxy.

A lot of people asked why doing it like this if you can just make sshd listen on port 443. Well, that might work if your environment is not hardened like I have seen at several companies, but this setup has a few advantages.

You can proxy to anywhere (see the Proxy directive in Apache) based on names
You can proxy to any port you like (see the AllowCONNECT directive in Apache)
It works even when there is a layer-7 protocol firewall
If you enable proxytunnel ssl support, it is indistinguishable from real SSL traffic
You can come up with nice hostnames like ‘downloads.yourdomain.com’ and ‘pictures.yourdomain.com’ and for normal users these will look like normal websites when visited.
There are many possibilities for doing authentication further along the path
You can do proxy-bouncing to the n-th degree to mask where you’re coming from or going to (however this requires more changes to proxytunnel, currently I only added support for one remote proxy)
You do not have to dedicate an IP-address for sshd, you can still run an HTTPS site

and some crypto for young people

http://users.telenet.be/d.rijmenants/en/onetimepad.htm

Me- What am I doing about it? I am just writing poems on hacking at http://poemsforkush.com

Note on Internet Privacy (Updated)and a note on DNSCrypt

I noticed the brouaha on Google’s privacy policy. I am afraid that social networks capture much more private information than search engines (even if they integrate my browser history, my social network, my emails, my search engine keywords) – I am still okay. All they are going to do is sell me better ads (maybe than just flood me with ads hoping to get a click). Of course Microsoft should take it one step forward and capture data from my desktop as well for better ads, that would really complete the curve. In any case , with the Patriot Act, most information is available to the Government anyway.

But it does make sense to have an easier to understand privacy policy, and one of my disappointments is the complete lack of visual appeal in such notices. Make things simple as possible, but no simpler, as Al-E said.

Privacy activists forget that ads run on models built on AGGREGATED data, and most models are scored automatically. Unless you do something really weird and fake like, chances are the data pertaining to you gets automatically collected, algorithmic-ally aggregated, then modeled and scored, and a corresponding ad to your score, or segment is shown to you. Probably no human eyes see raw data (but big G can clarify that)

( I also noticed Google gets a lot of free advice from bloggers. hey, if you were really good at giving advice to Google- they WILL hire you !)

on to another tool based (than legalese based approach to privacy)

I noticed tools like DNSCrypt increase internet security, so that all my integrated data goes straight to people I am okay with having it (ad sellers not governments!)

Unfortunately it is Mac Only, and I will wait for Windows or X based tools for a better review. I noticed some lag in updating these tools , so I can only guess that the boys of Baltimore have been there, so it is best used for home users alone.

Maybe they can find a chrome extension for DNS dummies.

http://www.opendns.com/technology/dnscrypt/

Why DNSCrypt is so significant

In the same way the SSL turns HTTP web traffic into HTTPS encrypted Web traffic, DNSCrypt turns regular DNS traffic into encrypted DNS traffic that is secure from eavesdropping and man-in-the-middle attacks. It doesn’t require any changes to domain names or how they work, it simply provides a method for securely encrypting communication between our customers and our DNS servers in our data centers. We know that claims alone don’t work in the security world, however, so we’ve opened up the source to our DNSCrypt code base and it’s available onGitHub.

DNSCrypt has the potential to be the most impactful advancement in Internet security since SSL, significantly improving every single Internet user’s online security and privacy.

and

http://dnscurve.org/crypto.html

The DNSCurve project adds link-level public-key protection to DNS packets. This page discusses the cryptographic tools used in DNSCurve.

Elliptic-curve cryptography

DNSCurve uses elliptic-curve cryptography, not RSA.

RSA is somewhat older than elliptic-curve cryptography: RSA was introduced in 1977, while elliptic-curve cryptography was introduced in 1985. However, RSA has shown many more weaknesses than elliptic-curve cryptography. RSA’s effective security level was dramatically reduced by the linear sieve in the late 1970s, by the quadratic sieve and ECM in the 1980s, and by the number-field sieve in the 1990s. For comparison, a few attacks have been developed against some rare elliptic curves having special algebraic structures, and the amount of computer power available to attackers has predictably increased, but typical elliptic curves require just as much computer power to break today as they required twenty years ago.

IEEE P1363 standardized elliptic-curve cryptography in the late 1990s, including a stringent list of security criteria for elliptic curves. NIST used the IEEE P1363 criteria to select fifteen specific elliptic curves at five different security levels. In 2005, NSA issued a new “Suite B” standard, recommending the NIST elliptic curves (at two specific security levels) for all public-key cryptography and withdrawing previous recommendations of RSA.

Some specific types of elliptic-curve cryptography are patented, but DNSCurve does not use any of those types of elliptic-curve cryptography.

Occupy the Internet

BORN IN THE USA

Continue reading “Occupy the Internet”

Google Plus API- statistical text mining anyone

For the past year and two I have noticed a lot of statistical analysis using #rstats /R on unstructured text generated in real time by the social network Twitter. From an analytic point of view , Google Plus is an interesting social network , as it is a social network that is new and arrived after the analytic tools are relatively refined. It is thus an interesting use case for evolution of people behavior measured globally AFTER analytic tools in text mining are evolved and we can thus measure how people behave and that behavior varies as the social network and its user interface evolves.

And it would also be a nice benchmark to do sentiment analysis across multiple social networks.

Some interesting use cases of using Twitter that have been used in R.

Using R to search Twitter for analysis

http://www.franklincenterhq.org/2429/using-r-to-search-twitter-for-analysis/

Text Data Mining With Twitter And R

http://heuristically.wordpress.com/2011/04/08/text-data-mining-twitter-r/

TWITTER FROM R… SURE, WHY NOT!

http://www.cerebralmastication.com/2009/06/twitter-from-r-sure-why-not/

A package called TwitteR

http://cran.r-project.org/web/packages/twitteR/

http://cran.r-project.org/web/packages/twitteR/vignettes/twitteR.pdf

slides from my R tutorial on Twitter text mining #rstats

http://jeffreybreen.wordpress.com/2011/07/04/twitter-text-mining-r-slides/

Generating graphs of retweets and @-messages on Twitter using R and Gephi

http://blog.ynada.com/339

But with Google Plus API now active

The Console lets you see and manage the following project information:

Activated APIs – Activate one or more APIs to enable traffic monitoring, filtering, and billing, and API-specific pages for your project. Read more about activating APIs here.
Traffic information – The Console reports traffic information for each activated API. Additionally, you can cap or filter usage by API. Read more about traffic reporting and request filtering here.
Billing information – When you activate billing, your activated APIs can exceed the courtesy usage quota. Usage fees are billed to the Google Checkout account that you specify. Read more about billing here.
Project keys – Each project is identified by either an API key or an OAuth 2.0 token. Use this key/token in your API requests to identify the project, in order to record usage data, enforce your filtering restrictions, and bill usage to the proper project. You can use the Console to generate or revoke API keys or OAuth 2.0 certificates to use in your application. Read more about keys here.
Team members – You can specify additional members with read, write, or ownership access to this project’s Console page. Read more about team members here.

https://code.google.com/apis/console/b/0/

Google+ API			Courtesy limit: 1,000 queries/day

Effective limits:

API	Per-User Limit	Used	Courtesy Limit
Google+ API	5.0 requests/second/user	0%	1,000 queries/day

http://developers.google.com/+/api/

API Calls

Most of the Google+ API follows a RESTful API design, meaning that you use standard HTTP methods to retrieve and manipulate resources. For example, to get the profile of a user, you might send an HTTP request like:

GET https://www.googleapis.com/plus/v1/people/userId

Common Parameters

Different API methods require parameters to be passed either as part of the URL path or as query parameters. Additionally, there are a few parameters that are common to all API endpoints. These are all passed as optional query parameters.

Parameter Name	Value	Description
`callback`	`string`	Specifies a JavaScript function that will be passed the response data for using the API with JSONP.
`fields`	`string`	Selector specifying which fields to include in a partial response.
`key`	`string`	API key. Your API key identifies your project and provides you with API access, quota, and reports. Required unless you provide an OAuth 2.0 token.
`access_token`	`string`	OAuth 2.0 token for the current user. Learn more about OAuth.
`prettyPrint`	`boolean`	If set to “true”, data output will include line breaks and indentation to make it more readable. If set to “false”, unnecessary whitespace is removed, reducing the size of the response. Defaults to “true”.
`userIp`	`string`	Identifies the IP address of the end user for whom the API call is being made. This allows per-user quotas to be enforced when calling the API from a server-side application. Learn more about Capping Usage.

Data Formats

Resources in the Google+ API are represented using JSON data formats. For example, retrieving a user’s profile may result in a response like:

{
  "kind": "plus#person",
  "id": "118051310819094153327",
  "displayName": "Chirag Shah",
  "url": "https://plus.google.com/118051310819094153327",
  "image": {
    "url": "https://lh5.googleusercontent.com/-XnZDEoiF09Y/AAAAAAAAAAI/AAAAAAAAYCI/7fow4a2UTMU/photo.jpg"
  }
}

Common Properties

While each type of resource will have its own unique representation, there are a number of common properties that are found in almost all resource representations.

Property Name	Value	Description
`displayName`	`string`	This is the name of the resource, suitable for displaying to a user.
`id`	`string`	This property uniquely identifies a resource. Every resource of a given kind will have a unique `id`. Even though an `id` may sometimes look like a number, it should always be treated as a string.
`kind`	`string`	This identifies what kind of resource a JSON object represents. This is particularly useful when programmatically determining how to parse an unknown object.
`url`	`string`	This is the primary URL, or permalink, for the resource.

Pagination

In requests that can respond with potentially large collections, such as Activities list, each response contains a limited number of items, set by maxResults(default: 20). Each response also contains a nextPageToken property. To obtain the next page of items, you pass this value of nextPageToken to the pageTokenproperty of the next request. Repeat this process to page through the full collection.

For example, calling Activities list returns a response with nextPageToken:

{
  "kind": "plus#activityFeed",
  "title": "Plus Public Activities Feed",
  "nextPageToken": "CKaEL",
  "items": [
    {
      "kind": "plus#activity",
      "id": "123456789",
      ...
    },
    ...
  ]
  ...
}

To get the next page of activities, pass the value of this token in with your next Activities list request:

https://www.googleapis.com/plus/v1/people/me/activities/public?pageToken=CKaEL

As before, the response to this request includes nextPageToken, which you can pass in to get the next page of results. You can continue this cycle to get new pages — for the last page, “nextPageToken” will be absent.

it would be interesting the first wave of analysis on this new social network and see if it is any different from others, if at all.

After all, an API is only as good as the analysis and applications that can be done on the data it provides

Cloud Computing using Python

I liked the new features in PiCloud , which is a cloud computing way to use Python. Python is increasingly popular as a computational language, and the cloud is the way where HW is headed to atleast as of 2011-12

http://www.picloud.com/

The new features allows you to publish your own functions as urls.

By publishing your Python functions to URLs. Why would you want to publish a function?

To call your Python functions from a programming language other than Python.

To use PiCloud from Google AppEngine, which does not support our native client library.

To easily setup a scalable RPC system.

Here’s a peek at the interface:

You publish a Python function

cloud.rest.publish(your_func, ‘myfunction’)

We give you a URL Back

https://api.picloud.com/r/2/myfunction/

You make an HTTP request using your method of choice to the URL

curl -k -u ‘key:secret_key’ https://api.picloud.com/r/2/myfunction/

It certainly is an interesting development and I am wondering how other languages can adopt this paradigm as well.

For R, as of now http://www.cloudnumbers.com/ seems to be the only player in the cloud.

It would be exciting to see more players in the cloud statistical analytical space.

Using Google Fusion Tables from #rstats

But after all that- I was quite happy to see Google Fusion Tables within Google Docs. Databases as a service ? Not quite but still quite good, and lets see how it goes.

https://www.google.com/fusiontables/DataSource?dsrcid=implicit&hl=en_US&pli=1

http://googlesystem.blogspot.com/2011/09/fusion-tables-new-google-docs-app.html

But what interests me more is

http://code.google.com/apis/fusiontables/docs/developers_guide.html

The Google Fusion Tables API is a set of statements that you can use to search for and retrieve Google Fusion Tables data, insert new data, update existing data, and delete data. The API statements are sent to the Google Fusion Tables server using HTTP GET requests (for queries) and POST requests (for inserts, updates, and deletes) from a Web client application. The API is language agnostic: you can write your program in any language you prefer, as long as it provides some way to embed the API calls in HTTP requests.

The Google Fusion Tables API does not provide the mechanism for submitting the GET and POST requests. Typically, you will use an existing code library that provides such functionality; for example, the code libraries that have been developed for the Google GData API. You can also write your own code to implement GET and POST requests.

Also see http://code.google.com/apis/fusiontables/docs/sample_code.html

Google Fusion Tables API Sample Code

Libraries

SQL API

Language	Library	Public repository	Samples
Python	Fusion Tables Python Client Library	fusion-tables-client-python/	Samples
PHP	Fusion Tables PHP Client Library	fusion-tables-client-php/	Samples

Featured Samples

An easy way to learn how to use an API can be to look at sample code. The table above provides links to some basic samples for each of the languages shown. This section highlights particularly interesting samples for the Fusion Tables API.

SQL API

Language	Featured samples	API version
cURL	Hello, cURLA simple example showing how to use curl to access Fusion Tables.	SQL API
Google Apps Script	Google Apps Script ExampleA simple example showing how to run the SHOW TABLES query in Google Spreadsheets.	SQL API
Java	Hello, WorldA simple walkthrough that shows how the Google Fusion Tables API statements work. OAuth example on fusion-tables-apiThe Google Fusion Tables team shows how OAuth authorization enables you to use the Google Fusion Tables API from a foreign web server with delegated authorization.	SQL API
Python	Docs List ExampleDemonstrates how to: List tables Set permissions on tables Move a table to a folder	Docs List API
Android (Java)	Basic Sample ApplicationDemo application shows how to create a crowd-sourcing application that allows users to report potholes and save the data to a Fusion Table.	SQL API
JavaScript – FusionTablesLayer	Using the FusionTablesLayer, you can display data on a Google Map The basics Basic FusionTablesLayer example Multiple Layers per map Heatmap Changing the layer query Using a select menu Using a slider On zoom Using checkboxes Autocomplete Textbox Spatial Queries (see also “Working with Geographic Data”) Find features within a radius Find features within a bounding box Find the n nearest features Dynamic Styling Basic Styling Demo Multiple Columns (with Legend!) Advanced Custom Info Windows Info Window with Driving Directions Custom Marker Icons Legend Pan and Zoom to User-Entered Address Also check out FusionTablesLayer Builder, which generates all the code necessary to include a Google Map with a Fusion Table Layer on your own website.	FusionTablesLayer, Google Maps API
JavaScript – Google Chart Tools	Using the Google Chart Tools, you can request data from Fusion Tables to use in visualizations or to display directly in an HTML page. Note: responses are limited to 500 rows of data. Basic Request Data Table Visualization Bar Chart Visualization Word Cloud Visualization	Google Chart Tools

External Resources

Google Fusion Tables is dedicated to providing code examples that illustrate typical uses, best practices, and really cool tricks. If you do something with the Google Fusion Tables API that you think would be interesting to others, please contact us at googletables-feedback@google.com about adding your code to our Examples page.

Shape EscapeA tool for uploading shape files to Fusion Tables.
GDALOGR Simple Feature Library has incorporated Fusion Tables as a supported format.
Arc2CloudArc2Earth has included support for upload to Fusion Tables via Arc2Cloud.
Java and Google App EngineODK Aggregate is an AppEngine application by the Open Data Kit team, uses Google Fusion Tables to store survey data that is collected through input forms on Android mobile phones. Notable code:
- Create a Fusion Table – FusionTableServlet.java
- Insert data into a Fusion Table – SubmissionFusionTable.java
R packageAndrei Lopatenko has written an R interface to Fusion Tables so Fusion Tables can be used as the data store for R.
RubySimon Tokumine has written a Ruby gem for access to Fusion Tables from Ruby.

Updated-You can use Google Fusion Tables from within R from http://andrei.lopatenko.com/rstat/fusion-tables.R

ft.connect <- function(username, password) {
  url = "https://www.google.com/accounts/ClientLogin";
  params = list(Email = username, Passwd = password, accountType="GOOGLE", service= "fusiontables", source = "R_client_API")
 connection = postForm(uri = url, .params = params)
 if (length(grep("error", connection, ignore.case = TRUE))) {
 	stop("The wrong username or password")
 	return ("")
 }
 authn = strsplit(connection, "\nAuth=")[[c(1,2)]]
 auth = strsplit(authn, "\n")[[c(1,1)]]
 return (auth)
}

ft.disconnect <- function(connection) {
}

ft.executestatement <- function(auth, statement) {
      url = "http://tables.googlelabs.com/api/query"
      params = list( sql = statement)
      connection.string = paste("GoogleLogin auth=", auth, sep="")
      opts = list( httpheader = c("Authorization" = connection.string))
      result = postForm(uri = url, .params = params, .opts = opts)
      if (length(grep("<HTML>\n<HEAD>\n<TITLE>Parse error", result, ignore.case = TRUE))) {
      	stop(paste("incorrect sql statement:", statement))
      }
      return (result)
}

ft.showtables <- function(auth) {
   url = "http://tables.googlelabs.com/api/query"
   params = list( sql = "SHOW TABLES")
   connection.string = paste("GoogleLogin auth=", auth, sep="")
   opts = list( httpheader = c("Authorization" = connection.string))
   result = getForm(uri = url, .params = params, .opts = opts)
   tables = strsplit(result, "\n")
   tableid = c()
   tablename = c()
   for (i in 2:length(tables[[1]])) {
     	str = tables[[c(1,i)]]
   	    tnames = strsplit(str,",")
   	    tableid[i-1] = tnames[[c(1,1)]]
   	    tablename[i-1] = tnames[[c(1,2)]]
   	}
   	tables = data.frame( ids = tableid, names = tablename)
    return (tables)
}

ft.describetablebyid <- function(auth, tid) {
   url = "http://tables.googlelabs.com/api/query"
   params = list( sql = paste("DESCRIBE", tid))
   connection.string = paste("GoogleLogin auth=", auth, sep="")
   opts = list( httpheader = c("Authorization" = connection.string))
   result = getForm(uri = url, .params = params, .opts = opts)
   columns = strsplit(result,"\n")
   colid = c()
   colname = c()
   coltype = c()
   for (i in 2:length(columns[[1]])) {
     	str = columns[[c(1,i)]]
   	    cnames = strsplit(str,",")
   	    colid[i-1] = cnames[[c(1,1)]]
   	    colname[i-1] = cnames[[c(1,2)]]
   	    coltype[i-1] = cnames[[c(1,3)]]
   	}
   	cols = data.frame(ids = colid, names = colname, types = coltype)
    return (cols)
}

ft.describetable <- function (auth, table_name) {
   table_id = ft.idfromtablename(auth, table_name)
   result = ft.describetablebyid(auth, table_id)
   return (result)
}

ft.idfromtablename <- function(auth, table_name) {
    tables = ft.showtables(auth)
	tableid = tables$ids[tables$names == table_name]
	return (tableid)
}

ft.importdata <- function(auth, table_name) {
	tableid = ft.idfromtablename(auth, table_name)
	columns = ft.describetablebyid(auth, tableid)
	column_spec = ""
	for (i in 1:length(columns)) {
		column_spec = paste(column_spec, columns[i, 2])
		if (i < length(columns)) {
			column_spec = paste(column_spec, ",", sep="")
		}
	}
	mdata = matrix(columns$names,
	              nrow = 1, ncol = length(columns),
	              dimnames(list(c("dummy"), columns$names)), byrow=TRUE)
	select = paste("SELECT", column_spec)
	select = paste(select, "FROM")
	select = paste(select, tableid)
	result = ft.executestatement(auth, select)
    numcols = length(columns)
    rows = strsplit(result, "\n")
    for (i in 3:length(rows[[1]])) {
    	row = strsplit(rows[[c(1,i)]], ",")
    	mdata = rbind(mdata, row[[1]])
   	}
   	output.frame = data.frame(mdata[2:length(mdata[,1]), 1])
   	for (i in 2:ncol(mdata)) {
   		output.frame = cbind(output.frame, mdata[2:length(mdata[,i]),i])
   	}
   	colnames(output.frame) = columns$names
    return (output.frame)
}

quote_value <- function(value, to_quote = FALSE, quote = "'") {
	 ret_value = ""
     if (to_quote) {
     	ret_value = paste(quote, paste(value, quote, sep=""), sep="")
     } else {
     	ret_value = value
     }
     return (ret_value)
}

converttostring <- function(arr, separator = ", ", column_types) {
	con_string = ""
	for (i in 1:(length(arr) - 1)) {
		value = quote_value(arr[i], column_types[i] != "number")
		con_string = paste(con_string, value)
	    con_string = paste(con_string, separator, sep="")
	}

    if (length(arr) >= 1) {
    	value = quote_value(arr[length(arr)], column_types[length(arr)] != "NUMBER")
    	con_string = paste(con_string, value)
    }
}

ft.exportdata <- function(auth, input_frame, table_name, create_table) {
	if (create_table) {
       create.table = "CREATE TABLE "
       create.table = paste(create.table, table_name)
       create.table = paste(create.table, "(")
       cnames = colnames(input_frame)
       for (columnname in cnames) {
         create.table = paste(create.table, columnname)
    	 create.table = paste(create.table, ":string", sep="")
    	   if (columnname != cnames[length(cnames)]){
    		  create.table = paste(create.table, ",", sep="")
           }
       }
      create.table = paste(create.table, ")")
      result = ft.executestatement(auth, create.table)
    }
    if (length(input_frame[,1]) > 0) {
    	tableid = ft.idfromtablename(auth, table_name)
	    columns = ft.describetablebyid(auth, tableid)
	    column_spec = ""
	    for (i in 1:length(columns$names)) {
		   column_spec = paste(column_spec, columns[i, 2])
		   if (i < length(columns$names)) {
			  column_spec = paste(column_spec, ",", sep="")
		   }
	    }
    	insert_prefix = "INSERT INTO "
    	insert_prefix = paste(insert_prefix, tableid)
    	insert_prefix = paste(insert_prefix, "(")
    	insert_prefix = paste(insert_prefix, column_spec)
    	insert_prefix = paste(insert_prefix, ") values (")
    	insert_suffix = ");"
    	insert_sql_big = ""
    	for (i in 1:length(input_frame[,1])) {
    		data = unlist(input_frame[i,])
    		values = converttostring(data, column_types  = columns$types)
    		insert_sql = paste(insert_prefix, values)
    		insert_sql = paste(insert_sql, insert_suffix) ;
    		insert_sql_big = paste(insert_sql_big, insert_sql)
    		if (i %% 500 == 0) {
    			ft.executestatement(auth, insert_sql_big)
    			insert_sql_big = ""
    		}
    	}
        ft.executestatement(auth, insert_sql_big)
    }
}

REST API

HTTPS

Please share:

Obfsproxy Instructions

Step 1: Install dependencies, obfsproxy, and Tor

Step 2a: If you’re the client…

Please share:

Elliptic-curve cryptography

Please share:

Please share:

Effective limits:

Common Parameters

Data Formats

Common Properties

Pagination

Please share:

Please share:

Google Fusion Tables API Sample Code

Libraries

SQL API

Featured Samples

SQL API

External Resources

Please share: