Google Plus API- statistical text mining anyone

For the past year and two I have noticed a lot of statistical analysis using #rstats /R on unstructured text generated in real time by the social network Twitter. From an analytic point of view , Google Plus is an interesting social network , as it is a social network that is new and arrived after the analytic tools are relatively refined. It is thus an interesting use case for evolution of people behavior measured globally AFTER analytic tools in text mining are evolved and we can thus measure how people behave and that behavior varies as the social network and its user interface evolves.

And it would also be  a nice benchmark to do sentiment analysis across multiple social networks.

Some interesting use cases of using Twitter that have been used in R.

  • Using R to search Twitter for analysis
http://www.franklincenterhq.org/2429/using-r-to-search-twitter-for-analysis/
  • Text Data Mining With Twitter And R
  • TWITTER FROM R… SURE, WHY NOT!
  • A package called TwitteR
  • slides from my R tutorial on Twitter text mining #rstats
  • Generating graphs of retweets and @-messages on Twitter using R and Gephi
But with Google Plus API now active

The Console lets you see and manage the following project information:

  • Activated APIs – Activate one or more APIs to enable traffic monitoring, filtering, and billing, and API-specific pages for your project. Read more about activating APIs here.
  • Traffic information – The Console reports traffic information for each activated API. Additionally, you can cap or filter usage by API. Read more about traffic reporting and request filtering here.
  • Billing information – When you activate billing, your activated APIs can exceed the courtesy usage quota. Usage fees are billed to the Google Checkout account that you specify. Read more about billing here.
  • Project keys – Each project is identified by either an API key or an OAuth 2.0 token. Use this key/token in your API requests to identify the project, in order to record usage data, enforce your filtering restrictions, and bill usage to the proper project. You can use the Console to generate or revoke API keys or OAuth 2.0 certificates to use in your application. Read more about keys here.
  • Team members – You can specify additional members with read, write, or ownership access to this project’s Console page. Read more about team members here.
Google+ API Courtesy limit: 1,000 queries/day

Effective limits:

API Per-User Limit Used Courtesy Limit
Google+ API 5.0 requests/second/user 0% 1,000 queries/day
API Calls
Most of the Google+ API follows a RESTful API design, meaning that you use standard HTTP methods to retrieve and manipulate resources. For example, to get the profile of a user, you might send an HTTP request like:

GET https://www.googleapis.com/plus/v1/people/userId

Common Parameters

Different API methods require parameters to be passed either as part of the URL path or as query parameters. Additionally, there are a few parameters that are common to all API endpoints. These are all passed as optional query parameters.

Parameter Name

Value

Description

callback

string

Specifies a JavaScript function that will be passed the response data for using the API with JSONP.

fields

string

Selector specifying which fields to include in a partial response.

key

string

API key. Your API key identifies your project and provides you with API access, quota, and reports. Required unless you provide an OAuth 2.0 token.

access_token

string

OAuth 2.0 token for the current user. Learn more about OAuth.

prettyPrint

boolean

If set to “true”, data output will include line breaks and indentation to make it more readable. If set to “false”, unnecessary whitespace is removed, reducing the size of the response. Defaults to “true”.

userIp

string

Identifies the IP address of the end user for whom the API call is being made. This allows per-user quotas to be enforced when calling the API from a server-side application. Learn more about Capping Usage.

Data Formats

Resources in the Google+ API are represented using JSON data formats. For example, retrieving a user’s profile may result in a response like:

{
  "kind": "plus#person",
  "id": "118051310819094153327",
  "displayName": "Chirag Shah",
  "url": "https://plus.google.com/118051310819094153327",
  "image": {
    "url": "https://lh5.googleusercontent.com/-XnZDEoiF09Y/AAAAAAAAAAI/AAAAAAAAYCI/7fow4a2UTMU/photo.jpg"
  }
}

Common Properties

While each type of resource will have its own unique representation, there are a number of common properties that are found in almost all resource representations.

Property Name

Value

Description

displayName

string

This is the name of the resource, suitable for displaying to a user.

id

string

This property uniquely identifies a resource. Every resource of a given kind will have a unique id. Even though an id may sometimes look like a number, it should always be treated as a string.

kind

string

This identifies what kind of resource a JSON object represents. This is particularly useful when programmatically determining how to parse an unknown object.

url

string

This is the primary URL, or permalink, for the resource.

Pagination

In requests that can respond with potentially large collections, such as Activities list, each response contains a limited number of items, set by maxResults(default: 20). Each response also contains a nextPageToken property. To obtain the next page of items, you pass this value of nextPageToken to the pageTokenproperty of the next request. Repeat this process to page through the full collection.

For example, calling Activities list returns a response with nextPageToken:

{
  "kind": "plus#activityFeed",
  "title": "Plus Public Activities Feed",
  "nextPageToken": "CKaEL",
  "items": [
    {
      "kind": "plus#activity",
      "id": "123456789",
      ...
    },
    ...
  ]
  ...
}

To get the next page of activities, pass the value of this token in with your next Activities list request:

https://www.googleapis.com/plus/v1/people/me/activities/public?pageToken=CKaEL

As before, the response to this request includes nextPageToken, which you can pass in to get the next page of results. You can continue this cycle to get new pages — for the last page, “nextPageToken” will be absent.

 

it would be interesting the first wave of analysis on this new social network and see if it is any different from others, if at all.
After all, an API is only as good as the analysis and applications  that can be done on the data it provides

 

+ 1 your website -updated

how to add the all new plus one button to your own website

just go here.

submit form

wait

https://services.google.com/fb/forms/plusonesignup/

also see https://profiles.google.com/u/0/+1/personalization/

or read the hack here

http://www.yvoschaap.com/weblog/the_google_1_button_discovered

The buttons does exists because there is personalisation option available refering to non-Google sites.

Google claims the button is “coming soon” but I couldn’t wait, so I looked around the code, and looked some more, untill I found the button endpoint hiding from me, obfuscated, in a stray piece of javascript.

Check out these live Google +1 buttons:

at

http://fanity.com/


			

Protovis a graphical toolkit for visualization

I just found about a new data visualization tool called Protovis http://vis.stanford.edu/protovis/ex/

Protovis composes custom views of data with simple marks such as bars and dots. Unlike low-level graphics libraries that quickly become tedious for visualization, Protovis defines marks through dynamic properties that encode data, allowing inheritancescales and layouts to simplify construction.

Protovis is free and open-source and is a Stanford project. It has been used in web interface R Node (which I will talk later )

http://squirelove.net/r-node/doku.php

Conventional

While Protovis is designed for custom visualization, it is still easy to create many standard chart types. These simpler examples serve as an introduction to the language, demonstrating key abstractions such as quantitative and ordinal scales, while hinting at more advanced features, including stack layout.

Custom

Many charting libraries provide stock chart designs, but offer only limited customization; Protovis excels at custom visualization design through a concise representation and precise control over graphical marks. These examples, including a few recreations of unusual historical designs, demonstrate the language’s expressiveness.

 

 

Try Protovis today 🙂 http://vis.stanford.edu/protovis/

It uses JavaScript and SVG for web-native visualizations; no plugin required (though you will need a modern web browser)! Although programming experience is helpful, Protovis is mostly declarative and designed to be learned by example.

Opera’s Minimalistic Peer to peer OS Browser

mijn Opera Unite Fridge
Image by Jaap Stronks via Flickr

Yes Opera is a browser but you may as well call it an OS. With an uncluttered design, some mind bending Opera Unite Peer to Peer features (in a browser!) withhttp://unite.opera.com/applications/, and nifty widgets- try singing some Opera. I really dont know how browsers make money, especially since they are suing each other all the time, but well- heres to more choice – if you don’t want a corporation owned browser lusting to sell your leaked privacy data to Don Draper- Opera is a good choice- much better than Sea Monkey and the Fox .

I really liked the option to make my own web server in 2 clicks,and share stuff. The bit trorrent support is really nice but I wonder if there was any Scandinavian brotherly ports in bit torrent sharing 😉 , me hearties

Opera's Minimalistic Peer to peer OS Browser

mijn Opera Unite Fridge
Image by Jaap Stronks via Flickr

Yes Opera is a browser but you may as well call it an OS. With an uncluttered design, some mind bending Opera Unite Peer to Peer features (in a browser!) withhttp://unite.opera.com/applications/, and nifty widgets- try singing some Opera. I really dont know how browsers make money, especially since they are suing each other all the time, but well- heres to more choice – if you don’t want a corporation owned browser lusting to sell your leaked privacy data to Don Draper- Opera is a good choice- much better than Sea Monkey and the Fox .

I really liked the option to make my own web server in 2 clicks,and share stuff. The bit trorrent support is really nice but I wonder if there was any Scandinavian brotherly ports in bit torrent sharing 😉 , me hearties

Cloud Computing with R

Illusion of Depth and Space (4/22) - Rotating ...
Image by Dominic's pics via Flickr

Here is a short list of resources and material I put together as starting points for R and Cloud Computing It’s a bit messy but overall should serve quite comprehensively.

Cloud computing is a commonly used expression to imply a generational change in computing from desktop-servers to remote and massive computing connections,shared computers, enabled by high bandwidth across the internet.

As per the National Institute of Standards and Technology Definition,
Cloud computing is a model for enabling convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction.

(Citation: The NIST Definition of Cloud Computing

Authors: Peter Mell and Tim Grance
Version 15, 10-7-09
National Institute of Standards and Technology, Information Technology Laboratory
http://csrc.nist.gov/groups/SNS/cloud-computing/cloud-def-v15.doc)

R is an integrated suite of software facilities for data manipulation, calculation and graphical display.

From http://cran.r-project.org/doc/FAQ/R-FAQ.html#R-Web-Interfaces

R Web Interfaces

Rweb is developed and maintained by Jeff Banfield. The Rweb Home Page provides access to all three versions of Rweb—a simple text entry form that returns output and graphs, a more sophisticated JavaScript version that provides a multiple window environment, and a set of point and click modules that are useful for introductory statistics courses and require no knowledge of the R language. All of the Rweb versions can analyze Web accessible datasets if a URL is provided.
The paper “Rweb: Web-based Statistical Analysis”, providing a detailed explanation of the different versions of Rweb and an overview of how Rweb works, was published in the Journal of Statistical Software (http://www.jstatsoft.org/v04/i01/).

Ulf Bartel has developed R-Online, a simple on-line programming environment for R which intends to make the first steps in statistical programming with R (especially with time series) as easy as possible. There is no need for a local installation since the only requirement for the user is a JavaScript capable browser. See http://osvisions.com/r-online/ for more information.

Rcgi is a CGI WWW interface to R by MJ Ray. It had the ability to use “embedded code”: you could mix user input and code, allowing the HTMLauthor to do anything from load in data sets to enter most of the commands for users without writing CGI scripts. Graphical output was possible in PostScript or GIF formats and the executed code was presented to the user for revision. However, it is not clear if the project is still active.

Currently, a modified version of Rcgi by Mai Zhou (actually, two versions: one with (bitmap) graphics and one without) as well as the original code are available from http://www.ms.uky.edu/~statweb/.

CGI-based web access to R is also provided at http://hermes.sdu.dk/cgi-bin/go/. There are many additional examples of web interfaces to R which basically allow to submit R code to a remote server, see for example the collection of links available from http://biostat.mc.vanderbilt.edu/twiki/bin/view/Main/StatCompCourse.

David Firth has written CGIwithR, an R add-on package available from CRAN. It provides some simple extensions to R to facilitate running R scripts through the CGI interface to a web server, and allows submission of data using both GET and POST methods. It is easily installed using Apache under Linux and in principle should run on any platform that supports R and a web server provided that the installer has the necessary security permissions. David’s paper “CGIwithR: Facilities for Processing Web Forms Using R” was published in the Journal of Statistical Software (http://www.jstatsoft.org/v08/i10/). The package is now maintained by Duncan Temple Lang and has a web page athttp://www.omegahat.org/CGIwithR/.

Rpad, developed and actively maintained by Tom Short, provides a sophisticated environment which combines some of the features of the previous approaches with quite a bit of JavaScript, allowing for a GUI-like behavior (with sortable tables, clickable graphics, editable output), etc.
Jeff Horner is working on the R/Apache Integration Project which embeds the R interpreter inside Apache 2 (and beyond). A tutorial and presentation are available from the project web page at http://biostat.mc.vanderbilt.edu/twiki/bin/view/Main/RApacheProject.

Rserve is a project actively developed by Simon Urbanek. It implements a TCP/IP server which allows other programs to use facilities of R. Clients are available from the web site for Java and C++ (and could be written for other languages that support TCP/IP sockets).

OpenStatServer is being developed by a team lead by Greg Warnes; it aims “to provide clean access to computational modules defined in a variety of computational environments (R, SAS, Matlab, etc) via a single well-defined client interface” and to turn computational services into web services.

Two projects use PHP to provide a web interface to R. R_PHP_Online by Steve Chen (though it is unclear if this project is still active) is somewhat similar to the above Rcgi and Rweb. R-php is actively developed by Alfredo Pontillo and Angelo Mineo and provides both a web interface to R and a set of pre-specified analyses that need no R code input.

webbioc is “an integrated web interface for doing microarray analysis using several of the Bioconductor packages” and is designed to be installed at local sites as a shared computing resource.

Rwui is a web application to create user-friendly web interfaces for R scripts. All code for the web interface is created automatically. There is no need for the user to do any extra scripting or learn any new scripting techniques. Rwui can also be found at http://rwui.cryst.bbk.ac.uk.

Finally, the R.rsp package by Henrik Bengtsson introduces “R Server Pages”. Analogous to Java Server Pages, an R server page is typically HTMLwith embedded R code that gets evaluated when the page is requested. The package includes an internal cross-platform HTTP server implemented in Tcl, so provides a good framework for including web-based user interfaces in packages. The approach is similar to the use of the brew package withRapache with the advantage of cross-platform support and easy installation.

Also additional R Cloud Computing Use Cases
http://wwwdev.ebi.ac.uk/Tools/rcloud/

ArrayExpress R/Bioconductor Workbench

Remote access to R/Bioconductor on EBI’s 64-bit Linux Cluster

Start the workbench by downloading the package for your operating system (Macintosh or Windows), or via Java Web Start, and you will get access to an instance of R running on one of EBI’s powerful machines. You can install additional packages, upload your own data, work with graphics and collaborate with colleagues, all as if you are running R locally, but unlimited by your machine’s memory, processor or data storage capacity.

  • Most up-to-date R version built for multicore CPUs
  • Access to all Bioconductor packages
  • Access to our computing infrastructure
  • Fast access to data stored in EBI’s repositories (e.g., public microarray data in ArrayExpress)

Using R Google Docs
http://www.omegahat.org/RGoogleDocs/run.pdf
It uses the XML and RCurl packages and illustrates that it is relatively quick and easy
to use their primitives to interact with Web services.

Using R with Amazon
Citation
http://rgrossman.com/2009/05/17/running-r-on-amazons-ec2/

Amazon’s EC2 is a type of cloud that provides on demand computing infrastructures called an Amazon Machine Images or AMIs. In general, these types of cloud provide several benefits:

  • Simple and convenient to use. An AMI contains your applications, libraries, data and all associated configuration settings. You simply access it. You don’t need to configure it. This applies not only to applications like R, but also can include any third-party data that you require.
  • On-demand availability. AMIs are available over the Internet whenever you need them. You can configure the AMIs yourself without involving the service provider. You don’t need to order any hardware and set it up.
  • Elastic access. With elastic access, you can rapidly provision and access the additional resources you need. Again, no human intervention from the service provider is required. This type of elastic capacity can be used to handle surge requirements when you might need many machines for a short time in order to complete a computation.
  • Pay per use. The cost of 1 AMI for 100 hours and 100 AMI for 1 hour is the same. With pay per use pricing, which is sometimes called utility pricing, you simply pay for the resources that you use.

Connecting to R on Amazon EC2- Detailed tutorials
Ubuntu Linux version
https://decisionstats.com/2010/09/25/running-r-on-amazon-ec2/
and Windows R version
https://decisionstats.com/2010/10/02/running-r-on-amazon-ec2-windows/

Connecting R to Data on Google Storage and Computing on Google Prediction API
https://github.com/onertipaday/predictionapirwrapper
R wrapper for working with Google Prediction API

This package consists in a bunch of functions allowing the user to test Google Prediction API from R.
It requires the user to have access to both Google Storage for Developers and Google Prediction API:
see
http://code.google.com/apis/storage/ and http://code.google.com/apis/predict/ for details.

Example usage:

#This example requires you had previously created a bucket named data_language on your Google Storage and you had uploaded a CSV file named language_id.txt (your data) into this bucket – see for details
library(predictionapirwrapper)

and Elastic R for Cloud Computing
http://user2010.org/tutorials/Chine.html

Abstract

Elastic-R is a new portal built using the Biocep-R platform. It enables statisticians, computational scientists, financial analysts, educators and students to use cloud resources seamlessly; to work with R engines and use their full capabilities from within simple browsers; to collaborate, share and reuse functions, algorithms, user interfaces, R sessions, servers; and to perform elastic distributed computing with any number of virtual machines to solve computationally intensive problems.
Also see Karim Chine’s http://biocep-distrib.r-forge.r-project.org/

R for Salesforce.com

At the point of writing this, there seem to be zero R based apps on Salesforce.com This could be a big opportunity for developers as both Apex and R have similar structures Developers could write free code in R and charge for their translated version in Apex on Salesforce.com

Force.com and Salesforce have many (1009) apps at
http://sites.force.com/appexchange/home for cloud computing for
businesses, but very few forecasting and statistical simulation apps.

Example of Monte Carlo based app is here
http://sites.force.com/appexchange/listingDetail?listingId=a0N300000016cT9EAI#

These are like iPhone apps except meant for business purposes (I am
unaware if any university is offering salesforce.com integration
though google apps and amazon related research seems to be on)

Force.com uses a language called Apex  and you can see
http://wiki.developerforce.com/index.php/App_Logic and
http://wiki.developerforce.com/index.php/An_Introduction_to_Formulas
Apex is similar to R in that is OOPs

SAS Institute has an existing product for taking in Salesforce.com data.

A new SAS data surveyor is
available to access data from the Customer Relationship Management
(CRM) software vendor Salesforce.com. at
http://support.sas.com/documentation/cdl/en/whatsnew/62580/HTML/default/viewer.htm#datasurveyorwhatsnew902.htm)

Personal Note-Mentioning SAS in an email to a R list is a big no-no in terms of getting a response and love. Same for being careless about which R help list to email (like R devel or R packages or R help)

For python based cloud see http://pi-cloud.com

Search, Sports,Social Media,SlideShares, Scribd

An image of a house fly eye surface by using S...
Image via Wikipedia

Some slideshare.net presentations I really liked.

A tutorial on SEO and SEM-

Carole Ann Matignon deals with optimization and scheduling, rules in the…….NFL!

 

 

Carole, We are waiting for the sequel on  analytics on football and the beer game.

Social Media Screw-Ups

Social Media doesnt matter at all- Social Media matters a lot- Still undecided? Take a look

Slideshare is a great VISUAL interface on sharing content. I liked Google Docs embedding as well, but Matt Mullenberg and Matt Cutts seemed to have stopped talking. Mullenberg is going like Zuckenberg, not willing to align with Sergey Mikhaylovich Brin. or maybe they are afraid of Big Brother Brin. Google loves Java and Javascript (even when they are getting sued for it)- while Matt M  hates it- bad for RIA I guess.

Scribd also is a great way to share content- and probably is small enough for. WordPress.com to allow embedding

Thats the reason why I sometimes prefer Scribd for sharing my poetry to Slideshare and Google Docs. Also I like the enhanced analytics and the much easier and evolved interface for reading. Slideshare is much more successful than Scribd because it is open to sharing with everyone- scribd tries to get you to register …;)

(* Also see MIT’s beer game at http://beergame.mit.edu/ which is ahem different from Duke’s beer games).