Updating R for Business Analytics

I just updated my R for Business Analytics site (http://rforanalytics.wordpress.com/). The additions are listed below; the complete list is at http://rforanalytics.wordpress.com/. What I am trying to do is build a kind of Task View dedicated to Business Analytics (aimed at Business Analysts and Data Scientists), with slightly better HTML (maybe Markdown later on) and some visual appeal.


Interviews with R Community



Jeroen Ooms (OpenCPU)


Christian (Statace)


Ian Fellows (Deducer)


Jeff Allen (Trestle)


Gergely Daróczi (Rapporter)


ODBC / Databases for R (including Hadoop and NoSQL)


R with MongoDB


This R package provides an interface to the NoSQL MongoDB database
using the MongoDB C-driver version 0.8


R with JSON


This package is a fork of the RJSONIO package 

R with CouchDB


R with MonetDB


MonetDB.R: Connect MonetDB to R

Allows you to pull data from MonetDB into R

Cassandra with R


Neo4j with R


# Function for querying Neo4j from within R
# from http://stackoverflow.com/questions/11188918/use-neo4j-with-r
library(RCurl)    # basicTextGatherer(), curlPerform(), curlEscape()
library(RJSONIO)  # fromJSON()

query <- function(querystring) {
    # Collect the HTTP response body as text
    h <- basicTextGatherer()
    curlPerform(url = "localhost:7474/db/data/ext/CypherPlugin/graphdb/execute_query",
        postfields = paste("query", curlEscape(querystring),
        sep = "="), writefunction = h$update, verbose = FALSE)
    # Parse the JSON response and turn the rows into a data frame
    result <- fromJSON(h$value())
    data <- data.frame(t(sapply(result$data, unlist)))
    names(data) <- result$columns
    data
}
# -------------------------------------- 
# import all data into neo4j
# --------------------------------------
nrow(venueDataset)  # number of venues


RHadoop consists of the following packages:

  • NEW! plyrmr – higher level plyr-like data processing for structured data, powered by rmr
  • rmr – functions providing Hadoop MapReduce functionality in R
  • rhdfs – functions providing file management of the HDFS from within R
  • rhbase – functions providing database management for the HBase distributed database from within R

R with Spark


SparkR is an R package that provides a light-weight frontend to use Apache Spark from R. SparkR exposes the Spark API through the RDD class and allows users to interactively run jobs from the R shell on a cluster.

R with Hive


RHive is an R extension facilitating distributed computing via HIVE query. RHive allows easy usage of HQL (Hive SQL) in R, and allows easy usage of R objects and R functions in Hive.


D&R (Divide and Recombine) with R – RHIPE (dormant)



A package to connect and run queries on Cloudera Impala (thanks to Mu Sigma)


Pig with R


Updates at StatAce: Early access to make your own R-in-the-browser GUI #rstats

The guys at StatAce released major updates. I am particularly excited about the ability to create a custom GUI box for your own analysis, or for sharing with consulting clients or students.

What does that mean? Basically they are making it a bit like R Commander Extensions: if you have a package or analysis you would rather do visually (than in code), you can create a GUI module for it. The modular extension is quite cool in my opinion, but further proof will be in how well designed the pudding is.


Public sharing of results
Now you can share your analysis results for the world to see (example). Just click Share in the results pane.

Google Drive integration
We added integration with Google Drive. This makes collaboration and synchronization of large files even easier. Don’t forget we also support Dropbox. Just click the Connect to menu in the file manager.

Plots zoom and SVG export
Now you can open plots in a separate window that supports zoom in and zoom out. From it, you can also export to the SVG format which is ideal for printing. Just click the lens icon next to any plot.

Point-and-click PCA + data transformation without R knowledge
You can now carry out a PCA by just pointing and clicking through Analysis > Dimensional Analysis > Principal Components Analysis. We also added the Data menu, which allows you to filter and sort datasets without any knowledge of R.

(Secret) Build your own visual dialog box to run R code
Do you have colleagues who don’t know R but need to use functionality you developed? Do you do consulting and want your customers to be able to run your models with point-and-click? Do you want to share a piece of R code with the world in an easy-to-use way?
StatAce now allows you to easily create a custom graphical interface for your R code. The process is entirely visual (no coding) and is what we use to build our own Data & Analysis menus (e.g. the bivariate correlation and linear regression dialog boxes). We are testing the functionality with a limited number of users, and their feedback has been great. Drop us a line at predict@statace.com to request early access.






Comparing PIG with Hive SQL


a = LOAD 'nyse' USING org.apache.hcatalog.pig.HCatLoader();
b = FILTER a BY stock_symbol =='IBM' ;
c = group b all;
d = foreach c generate AVG(b.stock_volume);
dump d;

In SQL (Hive)

select AVG(stock_volume) from nyse where stock_symbol =="IBM"

(from the HDP 2.0 Hortonworks Sandbox example)

Also see



Installing Scala on CentOS

Scala files are now here http://www.scala-lang.org/files/archive/

wget http://www.scala-lang.org/files/archive/scala-2.10.1.tgz
tar xvf scala-2.10.1.tgz
sudo mv scala-2.10.1 /usr/lib
sudo ln -s /usr/lib/scala-2.10.1 /usr/lib/scala
export PATH=$PATH:/usr/lib/scala/bin
scala -version
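
Note that the export above only lasts for the current shell session. A minimal sketch of one way to make it stick, assuming a bash setup that reads ~/.bashrc; the helper name add_to_path_once and the choice of profile file are illustrative assumptions, not part of the original steps:

```shell
# Append a PATH entry to a profile file, but only if it is not already there.
# add_to_path_once is a hypothetical helper name used for illustration.
add_to_path_once() {
    dir="$1"       # directory to add, e.g. /usr/lib/scala/bin
    profile="$2"   # profile file to edit, e.g. "$HOME/.bashrc" (assumed)
    line="export PATH=\$PATH:$dir"
    # Create the file if it does not exist yet.
    touch "$profile"
    # -x matches the whole line, -F treats the pattern as a fixed string,
    # so running the helper twice does not add a duplicate entry.
    if ! grep -qxF "$line" "$profile"; then
        printf '%s\n' "$line" >> "$profile"
    fi
}
```

Calling add_to_path_once /usr/lib/scala/bin "$HOME/.bashrc" once and then opening a new shell should make scala -version work without repeating the export.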




“NSA was not aware of the recently identified vulnerability in OpenSSL, the so-called Heartbleed vulnerability, until it was made public in a private sector cybersecurity report,” the Office of the Director of National Intelligence said in a statement to HuffPost. “Reports that say otherwise are wrong.”

Thanks to Mr Snowden, we have an NSA slide showing how the NSA had cracked SSL before???




In an NSA presentation slide on “Google Cloud Exploitation,” however, a sketch shows where the “Public Internet” meets the internal “Google Cloud” where their data resides. In hand-printed letters, the drawing notes that encryption is “added and removed here!” The artist adds a smiley face, a cheeky celebration of victory over Google security.



with added stuff

and from



PS: Don't be Evil, Google. Just give all the data to the Government 😉

Interview Torch Browser

Here is an interview with Torch browser, which embeds BitTorrent within the Chromium open source browser and is compatible with all Chrome downloads. With 1 million new users in 2014, Torch is lighting up the Internet at http://www.torchbrowser.com/

Ajay- Why did you create Torch, and how did you create it? What is your startup story:
Torch- Torch browser was a project born out of the demand for easy one-click media and file sharing. Rather than bogging down one's browser with multiple file sharing, torrenting, and media sharing software or extensions, we wanted to streamline these concepts into a one-click, user-friendly experience. We also wanted to eventually integrate that simplicity into a one-step solution that would give users instant access to their desired media or file share. For this reason we have advanced Torch to include features such as no-wait torrenting (once the torrent starts to download, it can already be played), Drag & Drop (which allows for simple and quick searching), instant audio extraction, a file download accelerator, Torch Music, and so much more.

What has the user feedback been, and what are your stats:
The user feedback has been extremely positive. You can see this by following our Facebook page at http://facebook.com/torchbrowser, where we proudly boast over 1MM likes.
We constantly take polls requesting user feedback and feature requests. Even though one cannot please everyone, we do our best to develop Torch according to the majority's requests. If enough people want it, we will build it. Because ultimately it’s the user’s browser, not ours.

How do you intend to monetize:
This is a valid and common question. Currently our focus is on filling the needs of our users, developing for multiple platforms, adding features and improving overall system performance. At the same time, we are discussing possible future premium services that will generate revenues.

Any plans to embed Tor in Torch, to enable a relay server through a single click in the browser? And what about Chrome plugins like MAFIAAFire:
All Chromium plugins available at the Chrome web store are compatible with Torch. The issue of bypassing unwarranted blocks is under serious consideration by our team. We are all for the complete and public access to all things internet. If it’s on the net, and it’s intended to be shared, then it should be accessible to all. Censorship and unjustified blocks are not things that we feel comfortable with and we don’t believe that our users need to tolerate this.

What legal concerns if any did you encounter or plan for during the product design phase:
There are no legal concerns. Like other browsers and similar applications, Torch is a general purpose tool for browsing, sharing and downloading internet content.

Do you collect user data:
Torch does not collect any personal data. The only data collected by Torch is non-personally identifiable and used for technical and functional purposes only. You can read more concerning this in our privacy policy at: http://torchbrowser.com/privacy

Your views on NSA spying internationally:
This question is a bit vague and beyond the scope of our development, given that system security should be managed via personal OS firewall settings. However, unwarranted intrusions into one's personal domain are not something that makes anyone feel comfortable, nor should they be tolerated. With that said, we are evaluating the benefits of including added browsing protection into a future build.

Again, thank you for the opportunity, and we hope that this has been helpful.

The Torch Team.

Writing for kdnuggets.com

I have been writing freelance for kdnuggets.com

It's a great learning experience for me and helps me become a better writer, especially on analytics and programming.

Here is a list of the articles (interviews are in bold). I will keep updating this list when there are new additions.

  1. Interview: Ingo Mierswa, RapidMiner CEO on “Predaction” and Key Turning Points June 2014
  2. Guide to Data Science Cheat Sheets 2014/05/12
  3. Book Review: Data Just Right 2014/04/03
  4. Exclusive Interview: Richard Socher, founder of etcML, Easy Text Classification Startup 2014/03/31
  5. Trifacta – Tackling Data Wrangling with Automation and Machine Learning 2014/03/17
  6. Paxata automates Data Preparation for Big Data Analytics 2014/03/07
  7. etcML Promises to Make Text Classification Easy  2014/03/05
  8. Wolfram Breakthrough Knowledge-based Programming Language – what it means for Data Science? 2014/03/02