BigML meets R #rstats

I am just checking the nice new R package created by BigML.com co-founder Justin Donaldson. The name of the new package is bigml, which can confuse a bit since there do exist many big suffix named packages in R (including biglm)

The bigml package is available at CRAN http://cran.r-project.org/web/packages/bigml/index.html

I just tweaked the code given at http://blog.bigml.com/2012/05/10/r-you-ready-for-bigml/ to include the ssl authentication code at http://www.brocktibert.com/blog/2012/01/19/358/

so it goes

> library(bigml)
Loading required package: RJSONIO
Loading required package: RCurl
Loading required package: bitops
Loading required package: plyr
> setCredentials(“bigml_username”,”API_key”)

# download the file needed for authentication
download.file(url="http://curl.haxx.se/ca/cacert.pem", destfile="cacert.pem")

# set the curl options
curl <- getCurlHandle()
options(RCurlOptions = list(capath = system.file("CurlSSL", "cacert.pem",
package = "RCurl"),
ssl.verifypeer = FALSE))
curlSetOpt(.opts = list(proxy = 'proxyserver:port'), curl = curl)

> iris.model = quickModel(iris, objective_field = ‘Species’)

Of course there are lots of goodies added here , so read the post yourself at http://blog.bigml.com/2012/05/10/r-you-ready-for-bigml/

Incidentally , the author of this R package (bigml) Justin Donalsdon who goes by name sudojudo at http://twitter.com/#!/sudojudo has also recently authored two other R packages including tsne at  http://cran.r-project.org/web/packages/tsne/index.html (tsne: T-distributed Stochastic Neighbor Embedding for R (t-SNE) -A “pure R” implementation of the t-SNE algorithm) and a GUI toolbar http://cran.r-project.org/web/packages/sculpt3d/index.html (sculpt3d is a GTK+ toolbar that allows for more interactive control of a dataset inside the RGL plot window. Controls for simple brushing, highlighting, labeling, and mouseMode changes are provided by point-and-click rather than through the R terminal interface)

This along with the fact the their recently released python bindings for bigml.com was one of the top news at Hacker News- shows bigML.com is going for some traction in bringing cloud computing, better software interfaces and data mining together!

Using Google Analytics with R

Some code to read in data from Google Analytics data. Some modifications include adding the SSL authentication code and modifying (in bold) the table.id parameter to choose correct website from a GA profile with many websites

The Google Analytics Package files can be downloaded from http://code.google.com/p/r-google-analytics/downloads/list

It provides access to Google Analytics data natively from the R Statistical Computing programming language. You can use this library to retrieve an R data.frame with Google Analytics data. Then perform advanced statistical analysis, like time series analysis and regressions.

Supported Features

  • Access to v2 of the Google Analytics Data Export API Data Feed
  • A QueryBuilder class to simplify creating API queries
  • API response is converted directly into R as a data.frame
  • Library returns the aggregates, and confidence intervals of the metrics, dynamically if they exist
  • Auto-pagination to return more than 10,000 rows of information by combining multiple data requests. (Upper Limit 1M rows)
  • Authorization through the ClientLogin routine
  • Access to all the profiles ids for the authorized user
  • Full documentation and unit tests
Code-

> library(XML)

>

> library(RCurl)

Loading required package: bitops

>

> #Change path name in the following to the folder you downloaded the Google Analytics Package

>

> source(“C:/Users/KUs/Desktop/CANADA/R/RGoogleAnalytics/R/RGoogleAnalytics.R”)

>

> source(“C:/Users/KUs/Desktop/CANADA/R/RGoogleAnalytics/R/QueryBuilder.R”)

> # download the file needed for authentication

> download.file(url=”http://curl.haxx.se/ca/cacert.pem&#8221;, destfile=”cacert.pem”)

trying URL ‘http://curl.haxx.se/ca/cacert.pem&#8217; Content type ‘text/plain’ length 215993 bytes (210 Kb) opened

URL downloaded 210 Kb

>

> # set the curl options

> curl <- getCurlHandle()

> options(RCurlOptions = list(capath = system.file(“CurlSSL”, “cacert.pem”,

+ package = “RCurl”),

+ ssl.verifypeer = FALSE))

> curlSetOpt(.opts = list(proxy = ‘proxyserver:port’), curl = curl)

An object of class “CURLHandle” Slot “ref”: <pointer: 0000000006AA2B70>

>

> # 1. Create a new Google Analytics API object

>

> ga <- RGoogleAnalytics()

>

> # 2. Authorize the object with your Google Analytics Account Credentials

>

> ga$SetCredentials(“USERNAME”, “PASSWORD”)

>

> # 3. Get the list of different profiles, to help build the query

>

> profiles <- ga$GetProfileData()

>

> profiles #Error Check to See if we get the right website

$profile AccountName ProfileName TableId

1 dudeofdata.com dudeofdata.com ga:44926237

2 knol.google.com knol.google.com ga:45564890

3 decisionstats.com decisionstats.com ga:46751946

$total.results

total.results

1 3

>

> # 4. Build the Data Export API query

>

> #Modify the start.date and end.date parameters based on data requirements

>

> #Modify the table.id at table.id = paste(profiles$profile[X,3]) to get the X th website in your profile

> # 4. Build the Data Export API query

> query <- QueryBuilder() > query$Init(start.date = “2012-01-09”, + end.date = “2012-03-20”, + dimensions = “ga:date”,

+ metrics = “ga:visitors”,

+ sort = “ga:date”,

+ table.id = paste(profiles$profile[3,3]))

>

>

> #5. Make a request to get the data from the API

>

> ga.data <- ga$GetReportData(query)

[1] “Executing query: https://www.google.com/analytics/feeds/data?start-date=2012%2D01%2D09&end-date=2012%2D03%2D20&dimensions=ga%3Adate&metrics=ga%3Avisitors&sort=ga%3Adate&ids=ga%3A46751946&#8221;

>

> #6. Look at the returned data

>

> str(ga.data)

List of 3

$ data :’data.frame’: 72 obs. of 2 variables: ..

$ ga:date : chr [1:72] “20120109” “20120110” “20120111” “20120112” … ..

$ ga:visitors: num [1:72] 394 405 381 390 323 47 169 67 94 89 …

$ aggr.totals :’data.frame’: 1 obs. of 1 variable: ..

$ aggregate.totals: num 28348

$ total.results: num 72

>

> head(ga.data$data)

ga:date ga:visitors

1 20120109 394

2 20120110 405

3 20120111 381

4 20120112 390

5 20120113 323

6 20120114 47 >

> #Plotting the Traffic >

> plot(ga.data$data[,2],type=”l”)

Update- Some errors come from pasting Latex directly to WordPress. Here is some code , made pretty-r in case you want to play with the GA api

library(XML)

library(RCurl)

#Change path name in the following to the folder you downloaded the Google Analytics Package 

source("C:/Users/KUs/Desktop/CANADA/R/RGoogleAnalytics/R/RGoogleAnalytics.R")

source("C:/Users/KUs/Desktop/CANADA/R/RGoogleAnalytics/R/QueryBuilder.R")
# download the file needed for authentication
download.file(url="http://curl.haxx.se/ca/cacert.pem", destfile="cacert.pem")

# set the curl options
curl <- getCurlHandle()
options(RCurlOptions = list(capath = system.file("CurlSSL", "cacert.pem",
package = "RCurl"),
ssl.verifypeer = FALSE))
curlSetOpt(.opts = list(proxy = 'proxyserver:port'), curl = curl)

# 1. Create a new Google Analytics API object 

ga <- RGoogleAnalytics()

# 2. Authorize the object with your Google Analytics Account Credentials 

ga$SetCredentials("ohri2007@gmail.com", "XXXXXXX")

# 3. Get the list of different profiles, to help build the query

profiles <- ga$GetProfileData()

profiles #Error Check to See if we get the right website

# 4. Build the Data Export API query 

#Modify the start.date and end.date parameters based on data requirements 

#Modify the table.id at table.id = paste(profiles$profile[X,3]) to get the X th website in your profile 
# 4. Build the Data Export API query
query <- QueryBuilder()
query$Init(start.date = "2012-01-09",
                   end.date = "2012-03-20",
                   dimensions = "ga:date",
                   metrics = "ga:visitors",
                   sort = "ga:date",
                   table.id = paste(profiles$profile[3,3]))

#5. Make a request to get the data from the API 

ga.data <- ga$GetReportData(query)

#6. Look at the returned data 

str(ga.data)

head(ga.data$data)

#Plotting the Traffic 

plot(ga.data$data[,2],type="l")

Created by Pretty R at inside-R.org

Reading Google Docs using R Curl

 

and finally I can download my Google spreadsheet file using-

require(RCurl)

download.file(url=”http://curl.haxx.se/ca/cacert.pem&#8221;, destfile=”cacert.pem”)

url=”https://docs.google.com/spreadsheet/pub?key=0AtYMMvghK2ytcldUcWNNZTltcXdIZUZ2MWU0R1NfeWc&output=csv&#8221;

b <- getURL(url,cainfo=”cacert.pem”)

write.table(b,quote = FALSE, sep = “,”,file=”test.csv”)

 

Previously (for past 3 5 7 hours)-

The codes at http://blog.revolutionanalytics.com/2011/09/using-google-spreadsheets-with-r-an-update.html dont work thanks to the SSL authentication issue. and the packages at [http://www.omegahat.org/RGoogleDocs/] and [https://r-forge.r-project.org/projects/rgoogledata/] are missing in action. 

So I mixed the codes at http://blog.revolutionanalytics.com/2009/09/how-to-use-a-google-spreadsheet-as-data-in-r.html and http://www.brocktibert.com/blog/2012/01/19/358/

and I get this error while using http://thebiobucket.blogspot.in/2012/03/r-function-to-read-data-from-google.html#more

Error in read.table(file = file, header = header, sep = sep, quote = quote, :
more columns than column names

 

Cyber Cold War

I try to write on cyber conflict without getting into the politics of why someone is hacking someone else. I always get beaten by someone in the comments thread when I write on politics.

But recent events have forced me to update my usual “how-to” cyber conflict to “why” cyber conflict. This is because of a terrorist attack in my hometown Delhi.

(updated-

http://www.nytimes.com/2012/02/14/world/middleeast/israeli-embassy-officials-attacked-in-india-and-georgia.html?_r=1&hp

Iran allegedly tried  (as per Israel) to assassinate the wife of Israeli Defence Attache in Delhi using a magnetic bomb, India as she went to school to pick up her kids, somebody else put a grenade in Israeli embassy car in Georgia which was found in time. 

Based on reports , initial work suggests the bomb was much more sophisticated than local terrorists, but the terrorists seemed to have some local recce work done.

India has 0 history of antisemitism but this is the second time Israelis have been targeted since 26/11 Mumbai attacks. India buys 12 % of oil annually from Iran (and refuses to join the oil embargo called by US and Europe)

Cyber Conflict is less painful than conflict, which is inevitable as long as mankind exists. Also the Western hemisphere needs a moon shot (cyber conflict could be the Sputnik like moment) and with declining and aging populations but better technology, Western Hemisphere govts need cyber conflict as they are running out of humans to fight their wars. Eastern govt. are even more obnoxious in using children for conflict propaganda, and corruption.

Last week CIA.gov website went down

This week Iranian govt is allegedly blocking https traffic on eve of Annual Revolution Day (what a coincidence!)

 

Some resources to help Internet users in Iran (or maybe this could be a dummy test for the big one – hacking the great firewall of China)

News from Hacker News-

http://news.ycombinator.com/item?id=3575029

 

I’m writing this to report the serious troubles we have regarding accessing Internet in Iran at the moment. Since Thursday Iranian government has shutted down the https protocol which has caused almost all google services (gmail, and google.com itself) to become inaccessible. Almost all websites that reply on Google APIs (like wolfram alpha) won’t work. Accessing to any website that replies on https (just imaging how many websites use this protocol, from Arch Wiki to bank websites). Also accessing many proxies is also impossible. There are almost no official reports on this and with many websites and my email accounts restricted I can just confirm this based on my own and friends experience. I have just found one report here:

Iran Shut Down Gmail , Google , Yahoo and sites using “Https” Protocol

The reason for this horrible shutdown is that the Iranian regime celebrates 1979 Islamic revolution tomorrow.

I just wanted to let you guys know about this. If you have any solution regarding bypassing this restriction please help!

 

The boys at Tor think they can help-

but its not so elegant, as I prefer creating a  batch file rather than explain coding to newbies. 

this is still getting to better and easier interfaces

https://www.torproject.org/projects/obfsproxy-instructions.html.en

Obfsproxy Instructions

client torrc

Step 1: Install dependencies, obfsproxy, and Tor

 

You will need a C compiler (gcc), the autoconf and autotools build system, the git revision control system, pkg-config andlibtoollibevent-2 and its headers, and the development headers of OpenSSL.

On Debian testing or Ubuntu oneiric, you could do:
# apt-get install autoconf autotools-dev gcc git pkg-config libtool libevent-2.0-5 libevent-dev libevent-openssl-2.0-5 libssl-dev

If you’re on a more stable Linux, you can either try our experimental backport libevent2 debs or build libevent2 from source.

Clone obfsproxy from its git repository:
$ git clone https://git.torproject.org/obfsproxy.git
The above command should create and populate a directory named ‘obfsproxy’ in your current directory.

Compile obfsproxy:
$ cd obfsproxy
$ ./autogen.sh && ./configure && make

Optionally, as root install obfsproxy in your system:
# make install

If you prefer not to install obfsproxy as root, you can instead just modify the Transport lines in your torrc file (explained below) to point to your obfsproxy binary.

You will need Tor 0.2.3.11-alpha or later.


Step 2a: If you’re the client…

 

First, you need to learn the address of a bridge that supports obfsproxy. If you don’t know any, try asking a friend to set one up for you. Then the appropriate lines to your tor configuration file:

UseBridges 1
Bridge obfs2 128.31.0.34:1051
ClientTransportPlugin obfs2 exec /usr/local/bin/obfsproxy --managed

Don’t forget to replace 128.31.0.34:1051 with the IP address and port that the bridge’s obfsproxy is listening on.
 Congratulations! Your traffic should now be obfuscated by obfsproxy. You are done! You can now start using Tor.

For old fashioned tunnel creation under Seas of English Channel-

http://dag.wieers.com/howto/ssh-http-tunneling/

Tunneling SSH over HTTP(S)
This document explains how to set up an Apache server and SSH client to allow tunneling SSH over HTTP(S). This can be useful on restricted networks that either firewall everything except HTTP traffic (tcp/80,tcp/443) or require users to use a local (HTTP) proxy.
A lot of people asked why doing it like this if you can just make sshd listen on port 443. Well, that might work if your environment is not hardened like I have seen at several companies, but this setup has a few advantages.

  • You can proxy to anywhere (see the Proxy directive in Apache) based on names
  • You can proxy to any port you like (see the AllowCONNECT directive in Apache)
  • It works even when there is a layer-7 protocol firewall
  • If you enable proxytunnel ssl support, it is indistinguishable from real SSL traffic
  • You can come up with nice hostnames like ‘downloads.yourdomain.com’ and ‘pictures.yourdomain.com’ and for normal users these will look like normal websites when visited.
  • There are many possibilities for doing authentication further along the path
  • You can do proxy-bouncing to the n-th degree to mask where you’re coming from or going to (however this requires more changes to proxytunnel, currently I only added support for one remote proxy)
  • You do not have to dedicate an IP-address for sshd, you can still run an HTTPS site

Related-

http://opensourceandhackystuff.blogspot.in/2012/02/captive-portal-security-part-1.html

and some crypto for young people

http://users.telenet.be/d.rijmenants/en/onetimepad.htm

 

Me- What am I doing about it? I am just writing poems on hacking at http://poemsforkush.com

Note on Internet Privacy (Updated)and a note on DNSCrypt

I noticed the brouaha on Google’s privacy policy. I am afraid that social networks capture much more private information than search engines (even if they integrate my browser history, my social network, my emails, my search engine keywords) – I am still okay. All they are going to do is sell me better ads (maybe than just flood me with ads hoping to get a click). Of course Microsoft should take it one step forward and capture data from my desktop as well for better ads, that would really complete the curve. In any case , with the Patriot Act, most information is available to the Government anyway.

But it does make sense to have an easier to understand privacy policy, and one of my disappointments is the complete lack of visual appeal in such notices. Make things simple as possible, but no simpler, as Al-E said.

 

Privacy activists forget that ads run on models built on AGGREGATED data, and most models are scored automatically. Unless you do something really weird and fake like, chances are the data pertaining to you gets automatically collected, algorithmic-ally aggregated, then modeled and scored, and a corresponding ad to your score, or segment is shown to you. Probably no human eyes see raw data (but big G can clarify that)

 

( I also noticed Google gets a lot of free advice from bloggers. hey, if you were really good at giving advice to Google- they WILL hire you !)

on to another tool based (than legalese based approach to privacy)

I noticed tools like DNSCrypt increase internet security, so that all my integrated data goes straight to people I am okay with having it (ad sellers not governments!)

Unfortunately it is Mac Only, and I will wait for Windows or X based tools for a better review. I noticed some lag in updating these tools , so I can only guess that the boys of Baltimore have been there, so it is best used for home users alone.

 

Maybe they can find a chrome extension for DNS dummies.

http://www.opendns.com/technology/dnscrypt/

Why DNSCrypt is so significant

In the same way the SSL turns HTTP web traffic into HTTPS encrypted Web traffic, DNSCrypt turns regular DNS traffic into encrypted DNS traffic that is secure from eavesdropping and man-in-the-middle attacks.  It doesn’t require any changes to domain names or how they work, it simply provides a method for securely encrypting communication between our customers and our DNS servers in our data centers.  We know that claims alone don’t work in the security world, however, so we’ve opened up the source to our DNSCrypt code base and it’s available onGitHub.

DNSCrypt has the potential to be the most impactful advancement in Internet security since SSL, significantly improving every single Internet user’s online security and privacy.

and

http://dnscurve.org/crypto.html

The DNSCurve project adds link-level public-key protection to DNS packets. This page discusses the cryptographic tools used in DNSCurve.

Elliptic-curve cryptography

DNSCurve uses elliptic-curve cryptography, not RSA.

RSA is somewhat older than elliptic-curve cryptography: RSA was introduced in 1977, while elliptic-curve cryptography was introduced in 1985. However, RSA has shown many more weaknesses than elliptic-curve cryptography. RSA’s effective security level was dramatically reduced by the linear sieve in the late 1970s, by the quadratic sieve and ECM in the 1980s, and by the number-field sieve in the 1990s. For comparison, a few attacks have been developed against some rare elliptic curves having special algebraic structures, and the amount of computer power available to attackers has predictably increased, but typical elliptic curves require just as much computer power to break today as they required twenty years ago.

IEEE P1363 standardized elliptic-curve cryptography in the late 1990s, including a stringent list of security criteria for elliptic curves. NIST used the IEEE P1363 criteria to select fifteen specific elliptic curves at five different security levels. In 2005, NSA issued a new “Suite B” standard, recommending the NIST elliptic curves (at two specific security levels) for all public-key cryptography and withdrawing previous recommendations of RSA.

Some specific types of elliptic-curve cryptography are patented, but DNSCurve does not use any of those types of elliptic-curve cryptography.

 

Using #Rstats for online data access

There are multiple packages in R to read data straight from online datasets.
These are as follows- Continue reading “Using #Rstats for online data access”

Facebook to Google Plus Migration

and there is a new tool on that already but you are on your own if your data gets redirected. Does Chrome take legal liability for malware extensions? Dunno-and yes it works on Chrome alone (at the point of speaking)

https://chrome.google.com/webstore/detail/ficlccidpkaiepnnboobcmafnnfoomga

 

Facebook Friend Exporter
Logo 

Facebook Friend Exporter
Verified author: mohamedmansour.com
Free
Get *your* data contact out of Facebook to Google Contacts or CSV, whether they want you to or not.
103 ratings
5,527 users
Install
Description
Get *your* data contact out of Facebook, whether they want you to or not. You gave them your friends and allowed them to store that data, and you have right to take it back out! Facebook doesn't own my friends. Only available in English Facebook. Any other language will not work.

SOURCE CODE: http://goo.gl/VtRCl (GitHub) fb-exporter

PRE NOTICE:
 1 - Must have English version of Facebook for this to work (you can switch)
 2 - Do not enable SSL for Facebook use HTTP not HTTPS
 3 - If you need any help running this, contact me. Commenting below will be lost.
 4 - An "Export" button will appear on Facebooks toolbar after refresh once installed.
 5 - Please disable all Facebook Extensions that you have downloaded, many of them affect the page. For example "Better Facebook" breaks this extension.

This extension will allow you to get your friends information that they shared to you: Continue reading "Facebook to Google Plus Migration"