Interview Jeroen Ooms OpenCPU #rstats

Below is an interview with Jeroen Ooms, a pioneer in R and web development. Jeroen contributes to R by developing packages and web applications for multiple projects.


Ajay- What are you working on these days?
Jeroen- My research revolves around the challenges and opportunities of using R in embedded applications and scalable systems. After developing numerous web applications, I started the OpenCPU project about a year and a half ago as a first attempt at a complete framework for proper integration of R in web services. As I work on this, I run into challenges that shape my research and sometimes become projects in their own right. For example, the RAppArmor package provides the security framework for OpenCPU, but can be used for other purposes as well. RAppArmor interfaces to some methods in the Linux kernel related to setting security and resource limits. The GitHub page contains the source code, installation instructions, video demos, and a draft of a paper for the Journal of Statistical Software. Another example of a problem that appeared in OpenCPU is that applications that used to work were breaking unexpectedly later on due to changes in dependency packages on CRAN. This is actually a general problem that affects almost all R users, as it compromises the reliability of CRAN packages and the reproducibility of results. In a paper (forthcoming in The R Journal), this problem is discussed in more detail and directions for improvement are suggested. A preprint of the paper is available on arXiv: http://arxiv.org/abs/1303.2140.

I am also working on software not directly related to R. For example, in project Mobilize we teach high school students in Los Angeles the basics of collecting and analyzing data. They use mobile devices to upload surveys with questions, photos, GPS locations, etc., using the ohmage software. Within Mobilize and Ohmage, I am in charge of developing web applications that help students visualize the data they collaboratively collected. One public demo with actual data collected by students about snacking behavior is available at: http://jeroenooms.github.com/snack. The application allows students to explore their data by filtering, zooming, browsing, comparing, etc. It helps students and teachers to access and learn from their data without complicated tools or programming. This approach would easily generalize to other fields, like medical data or BI. The great thing about this application is that it is fully client side; the backend is simply a CSV file. So it is very easy to deploy and maintain.

Ajay- What’s your take on the difference between OpenCPU and RevoDeployR?
Jeroen- RevoDeployR and OpenCPU both provide a system for development of R web applications, but in a fairly different context. OpenCPU is open source and written completely in R, whereas RevoDeployR is proprietary and written in Java. I think Revolution focusses more on a complete solution in a corporate environment. It integrates with the Revolution Enterprise suite and their other big data products, and has built-in functionality for authentication, managing privileges, server administration, support for MS Windows, etc. OpenCPU on the other hand is much smaller and should be seen as just a computational backend, analogous to a database backend. It exposes a clean HTTP API for calling R functions from larger systems, but is not a complete end-product in itself.

OpenCPU is designed to make it easy for a statistician to expose statistical functionality that will be used by web developers who do not need to understand or learn R. One interesting example is how we use OpenCPU inside OpenMHealth, a project that designs an architecture for mobile applications in the health domain. Part of the architecture are so-called “Data Processing Units”, aka DPUs. These are simple, modular I/O units that do various sorts of data processing, similar to Unix tools, but over HTTPS. For example, the mobility DPU is used to calculate distances between GPS coordinates via a simple HTTP call, which OpenCPU maps to the corresponding R function implementing the haversine formula.
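As a rough illustration of the kind of calculation such a mobility DPU performs, here is a minimal sketch of the haversine formula written as a shell function. It is not the OpenCPU or OpenMHealth code (there the calculation is an R function behind an HTTP call); the function name haversine_km, the argument order, and the mean Earth radius of 6371 km are assumptions made for this example.

```shell
#!/bin/sh
# Hypothetical helper: great-circle distance in km between two GPS points.
# Arguments: lat1 lon1 lat2 lon2, all in decimal degrees.
# Assumes a spherical Earth with a mean radius of 6371 km.
haversine_km() {
  awk -v lat1="$1" -v lon1="$2" -v lat2="$3" -v lon2="$4" 'BEGIN {
    pi = atan2(0, -1)
    rad = pi / 180
    dlat = (lat2 - lat1) * rad
    dlon = (lon2 - lon1) * rad
    a = sin(dlat / 2) ^ 2 + cos(lat1 * rad) * cos(lat2 * rad) * sin(dlon / 2) ^ 2
    # awk has no asin(), so use the equivalent atan2 form of the formula.
    c = 2 * atan2(sqrt(a), sqrt(1 - a))
    printf "%.2f\n", 6371 * c
  }'
}

# Distance covered by one degree of longitude along the equator.
haversine_km 0 0 0 1
```

The function only needs awk, so it can be pasted into any POSIX shell; a web service would wrap the same arithmetic behind a single request.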

Ajay- What are your views on Shiny by RStudio?
Jeroen- RStudio seems very promising. Like Revolution, they deliver a more full featured product than any of my projects. However, RStudio is completely open source, which is great because it allows anyone to leverage the software and make it part of their projects. I think this is one of the reasons why the product has gotten a lot of traction in the community, which has in turn provided RStudio with great feedback to further improve the product. It illustrates how open source can be a win-win situation. I am currently developing a package to run OpenCPU inside RStudio, which will make developing and running OpenCPU apps much easier.

Ajay- Are you still developing excellent RApache web apps (which IMHO could be used for visualization like business intelligence tools?)
Jeroen- The OpenCPU framework was a result of those webapps (including ggplot2 for graphical exploratory analysis, lme4 for online random effects modeling, stockplot for stock predictions and irttool.com, an R web application for online IRT analysis). I started developing some of those apps a couple of years ago, and realized that I was repeating a large share of the infrastructure for each application. Based on those experiences I extracted a general purpose framework. Once the framework is done, I’ll go back to developing applications 🙂

Ajay- You have helped build web apps, OpenCPU, RAppArmor, Ohmage, Snack and mobility apps. What is your thesis topic?
Jeroen- My thesis revolves around all of the technical and social challenges of moving statistical computing beyond the academic and private labs, into more public, accessible and social places. Currently statistics is still done mostly manually by specialists using software to load data, perform some analysis, and produce results that end up in a report or presentation. There are great opportunities to leverage the open source analysis and visualization methods that R has to offer as part of open source stacks, services, systems and applications. However, several problems need to be addressed before this can actually be put in production. I hope my doctoral research will contribute to taking a step in that direction.

Ajay- R is RAM constrained but the cloud offers lots of RAM. Do you see R increasing in usage on the cloud? Why or why not?
Jeroen- Statistical computing can greatly benefit from the resources that the cloud has to offer. Software like OpenCPU, RStudio, Shiny and RevoDeployR all provide some approach to moving computation to centralized servers. This is only the beginning. Statisticians, researchers and analysts will increasingly share and publish data, code and results on social cloud-based computing platforms. This will address some of the hardware challenges, but also contribute towards reproducible research and further socialize data analysis, i.e. improve learning, collaboration and integration.

That said, the cloud is not going to solve all problems. You mention the need for more memory, but that is only one direction to scale in. Some of the issues we need to address are more fundamental and require new algorithms, different paradigms, or a cultural change. There are many exciting efforts going on that are at least as relevant as big hardware. Gelman’s mc-stan implements a new MC method that makes Bayesian inference easier and faster while supporting more complex models. This is going to make advanced Bayesian methods more accessible to applied researchers, i.e. scale in terms of complexity and applicability. JavaScript is also rapidly becoming more interesting. Performance of Google’s JavaScript engine V8 outruns any other scripting language at this point, and the huge JavaScript community provides countless excellent software libraries. For example, D3 is a graphics library that is about to surpass R in terms of functionality, reliability and user base. The snack viz that I developed for Mobilize is based largely on D3. Finally, Julia is another young language for technical computing with lots of activity and very smart people behind it. These developments are just as important for the future of statistical computing as big data solutions.

About-
You can read more on Jeroen and his work at http://jeroenooms.github.com/ and reach out to him here: http://www.linkedin.com/in/datajeroen

Running R and RStudio Server on Red Hat Linux RHEL #rstats

Installing R

  • sudo rpm -ivh http://dl.fedoraproject.org/pub/epel/6/i386/epel-release-6-8.noarch.rpm

(OR sudo rpm -ivh http://dl.fedoraproject.org/pub/epel/6/x86_64/epel-release-6-8.noarch.rpm )

THEN

  • sudo yum install R

THEN

  • sudo R

(To paste into the Linux terminal window, use Shift + Insert.)

To Install RStudio (from http://www.rstudio.com/ide/download/server)

32-bit

  •  wget http://download2.rstudio.org/rstudio-server-0.97.320-i686.rpm
  •  sudo yum install --nogpgcheck rstudio-server-0.97.320-i686.rpm

OR 64-bit

  •  wget http://download2.rstudio.org/rstudio-server-0.97.320-x86_64.rpm
  •  sudo yum install --nogpgcheck rstudio-server-0.97.320-x86_64.rpm

Then

  • sudo rstudio-server verify-installation
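The choice between the 32-bit and 64-bit packages above can be scripted rather than made by hand. The sketch below maps a machine type, as printed by uname -m, to the package file names used in this post; the function name rstudio_rpm_for is hypothetical and the version number is simply the one quoted above, so adjust both as needed.

```shell
#!/bin/sh
# Hypothetical helper: map a machine type (as printed by `uname -m`)
# to the RStudio Server package file name used in this post.
rstudio_rpm_for() {
  case "$1" in
    x86_64) echo "rstudio-server-0.97.320-x86_64.rpm" ;;
    i?86)   echo "rstudio-server-0.97.320-i686.rpm" ;;
    *)      echo "unsupported architecture: $1" >&2; return 1 ;;
  esac
}

# Example use (commented out so nothing is downloaded by accident):
#   rpm=$(rstudio_rpm_for "$(uname -m)") &&
#     wget "http://download2.rstudio.org/$rpm" &&
#     sudo yum install --nogpgcheck "$rpm"
```

Returning a non-zero status for unknown architectures lets the && chain stop before any download is attempted.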

Changing Firewalls in your RHEL

-Change to root

  • sudo bash

-Change directory

  • cd /etc/sysconfig

-Read iptables (the firewall rules file)

  • vi iptables

(To quit vi, press Escape, then type :q and press Enter.)

-Change Iptables to open port 8787

  • /sbin/iptables -A INPUT -p tcp --dport 8787 -j ACCEPT

Add new user name (here newuser1)

  • sudo useradd newuser1

Set the password for the new user

  • sudo passwd newuser1

Now just log in at IPADDRESS:8787 with the user name and password above

(Credit: IBM SmartCloud Support, http://www.youtube.com/watch?v=woVjq83gJkg&feature=player_embedded, RStudio help, David Walker http://datamgmt.com/installing-r-and-rstudio-on-redhat-or-centos-linux/, www.google.com, Michael Grieb)
 

 

Interview Pranay Agrawal Co-Founder Fractal Analytics

Here is an interview with Pranay Agrawal, Executive Vice President- Global Client Development, Fractal Analytics – one of India’s leading analytics services providers and one of the pioneers in analytics services delivery.

Ajay- Describe Fractal Analytics’ journey as a startup to a pioneer in the Predictive Analytics Services industry. What were some of the key turning points in the field of analytics that you have noticed during these times?


Pranay- In 2000, Fractal Analytics started as a pure-play analytics services company in India with a focus on financial services. Five years later, we expanded our operations to the United States and opened new verticals. Today, we have the widest global footprint among analytics providers, with experience handling data and a deep understanding of consumer behavior in over 150 countries. We have matured from an analytics service organization to a productized analytics services firm, specializing in the consumer goods, retail, financial services, insurance and technology verticals.
We are at the forefront of a massive inflection point with Big Data Analytics at the center. We have witnessed the transformation of analytics within our clients from a cost center to the most critical division that drives competitive advantage. Advances are quickly converging in computer science, artificial intelligence, machine learning and game theory, changing the way analytics is consumed by B2B and B2C companies. Companies that use analytics well are poised to excel in innovation, customer engagement and business performance.

Ajay- What are analytical tools that you use at Fractal Analytics? Are there any trends in analytical software usage that you have observed?

Pranay- We are tool-agnostic: we serve our clients on whatever platforms they need so that they can quickly and effectively operationalize the results we deliver. We use R, SAS, SPSS, SpotFire, Tableau, Xcelsius, Webfocus, Microstrategy and Qlikview. We are seeing an increase in adoption of open source platforms such as R and of specialized dashboard tools like Tableau and Qlikview, plus an entire spectrum of emerging tools that support Hadoop and NoSQL data structures to process, manage and extract information from Big Data.

Ajay- What are Fractal Analytics’ plans for Big Data Analytics?

Pranay- We see our clients being overwhelmed by the increasing complexity of the data. While they are all excited by the possibilities of Big Data, the on-the-ground struggle to realize its full potential continues. The analytics paradigm is changing in the context of Big Data. Our solutions focus on keeping things super-simple for our clients while delivering the analytical sophistication that Big Data makes possible.
Let’s take our Customer Genomics solution for retailers as an example. Retailers are collecting information about shopper behavior through every transaction. They want to transform their business to make it more customer-centric but do not know how to go about it. Our Customer Genomics solution uses advanced machine learning algorithms to label every shopper across more than 80 different dimensions. Retailers use these labels to identify which products they should deep-discount depending on what price-sensitive shoppers buy. Armed with this intelligence, they are transforming the way they plan their assortment, planogram and targeted promotions.

We are also building harmonization engines using Concordia to enable real-time update of Customer Genomics based on every direct, social, or shopping transaction. This will further bridge the gap between marketing actions and consumer behavior to drive loyalty, market share and profitability.

Ajay- What are some of the key things that differentiate Fractal Analytics from the rest of the industry? How are you different?

Pranay- We are one of the pioneering pure-play analytics firms, with over a decade of experience consulting with Fortune 500 companies. What clients most appreciate about working with us includes:

  • Experience managing structured and unstructured Big Data (volume, variety) with a deep understanding of consumer behavior in more than 150 countries
  • Advanced analytics leveraging supervised machine-learning platforms
  • Proprietary products, for example: Concordia for data harmonization, Customer Genomics for consumer insights and personalized marketing, Pincer for pricing optimization, Eavesdrop for social media listening, Medley for assortment optimization in the retail industry and Known Value Item for retail stores
  • Deep industry expertise that enables us to leverage cross-industry knowledge to solve a wide range of marketing problems
  • The lowest attrition rates in the industry and a very selective hiring process, which make us a great place to work

Ajay- What are some of the initiatives that you have taken to ensure employee satisfaction and happiness?

Pranay- We believe happy employees create happy customers. We are building a great place to work by taking a personal interest in grooming people. Our people are highly engaged as evidenced by 33% new hire referrals and the highest Glassdoor ratings in our industry.
We recognize the accomplishments and contributions made through many programs such as:

  1. FractElite – where peers nominate and defend the best of us
  2. Recognition board – where anyone can write a visible thank you
  3. Value cards – where anyone can acknowledge great role model behavior in one or more values
  4. Townhall – a quarterly all hands where we announce anniversaries and FractElite awards, with an open forum to ask questions
  5. Employee engagement surveys – to measure and report out on satisfaction programs
  6. Open access to managers and leadership team – to ensure we understand and appreciate each person’s unique goals and ambitions, coach for high performance, and laud their success

Ajay- How happy are Fractal Analytics customers quantitatively?  What is your retention rate- and what plans do you have for 2013?

Pranay- As consultants, delivering value with great service is critical to our growth, which has nearly doubled in the last year. Most of our clients have been with us for over five years and we are typically considered a strategic partner.
We conduct client satisfaction surveys during and after each project to measure our performance and identify opportunities to serve our clients better. In 2013, we will continue partnering with our clients to define additional process improvements from applying best practice in engagement management to building more advanced analytics and automated services to put high-impact decisions into our clients’ hands faster.

About

Pranay Agrawal - Pranay co-founded Fractal Analytics in 2000 and heads client engagement worldwide. He has an MBA from the Indian Institute of Management (IIM) Ahmedabad and a Bachelors in Accounting from Bangalore University, and is a Certified Financial Risk Manager from GARP. He is also available online at http://www.linkedin.com/in/pranayfractal

Fractal Analytics is a provider of predictive analytics and decision sciences to financial services, insurance, consumer goods, retail, technology, pharma and telecommunication industries. Fractal Analytics helps companies compete on analytics and in understanding, predicting and influencing consumer behavior. Over 20 Fortune 500 financial services, consumer packaged goods, retail and insurance companies partner with Fractal to make better data driven decisions and institutionalize analytics inside their organizations.

Fractal sets up analytical centers of excellence for its clients to tackle tough big data challenges, improve decision management, help understand, predict & influence consumer behavior, increase marketing effectiveness, reduce risk and optimize business results.

 

Book Review- Predictive Analytics: The Power to Predict Who Will Click, Buy, Lie, or Die

I recently read a review copy of Dr Eric Siegel’s new book Predictive Analytics: The Power to Predict Who Will Click, Buy, Lie, or Die.

(Disclaimer: I interviewed Eric here in September 2009, and we have been in touch over the years as his Predictive Analytics Conference became a blog partner and then a sponsor here at Decisionstats.com. PAWCON also took off, becoming the biggest brand in independent analytics conferences.)

So it was with a slight note of optimism that I opened this book, and it has so far exceeded my expectations. This is a very lucidly written, well-explained book that can help people at all levels of analytics. There is a wealth of information here across a wide variety of business domains, and the beautifully designed book also has great tables, examples, quotes and cartoons that make a very readable case for predictive analytics. Of course, Eric has his views: he loves ensemble modeling and conversion uplift models and cares about privacy concerns, and his academic background helps him explain even technical things thoroughly and in an interesting manner.

And at $14,6 it is quite a steal from Amazon. So buy a copy and read it. I would recommend it for helping build a case for predictive analytics, for evangelizing to clients, and even for students in grad school programs. This is how analytics books should be: easy to read, lucid enough to remember and practical to execute!

Countering Communist China’s CyberWar

How the West Counters China

  • Using United Nations and WTO to present evidence to push for financial penalties
  • Define cyber-retaliation rules of engagement and doctrine for hacking attacks
  • Delineate the blurred lines between Anonymous, state-sponsored hacks, hacktivism and cyber criminals, and build clear rules of engagement
  • Provoke Chinese Naval and Air Assets (using the Opium War’s lessons)
  • Create a digital cyber-warfare alliance using Australia, Japan, Taiwan, South Korea, India , Tibetan Exiles and NATO

How China can counter the West

  • Build a dossier of false or misplaced allegations that are leveled at China and use them when something sticks
  • Highlight Western Government’s breaches of citizen privacy and digital surveillance
  • Highlight efforts of intellectual property theft, monopolistic actions and industrial espionage in the West
  • Host more black hat conferences within Macau and Hong Kong if not mainland China
  • Support Anonymous and Digital Activism as potential allies

“The supreme art of war is to subdue the enemy without fighting.” - Sun Tzu

The rise of the global MOOCs (Massive Open Online Courses)

Threatening the monopoly of corrupted universities that cater to rising educational aspirations with mediocre, recycled courses, the MOOC revolution now breaks out of North America to reach global universities. Of course, Coursera leads, and I liked the tight integration with Meetup.com.

May I suggest they look at Codecademy and build much more gamification than is currently available in their own courses.

 

From-

http://blog.coursera.org/post/43625628117/29-new-schools-92-new-courses-5-languages-4

Today we’re welcoming 29 new universities to the Coursera community of 2.7 million students and 33 existing universities across 4 continents!

also see

http://viz.coursera.org/2013-02-20-globe/


and

https://www.edx.org/press/edx-expands-internationally

To date, edX has more than 700,000 individuals on its platform, who account for more than 900,000 course enrollments.

EdX, the not-for-profit online learning enterprise founded by Harvard University and the Massachusetts Institute of Technology (MIT), announced today the international expansion of its X University Consortium with the addition of six new global higher education institutions. The Australian National University (ANU), Delft University of Technology in the Netherlands, École Polytechnique Fédérale de Lausanne (EPFL) in Switzerland, McGill University and the University of Toronto in Canada, and Rice University in the United States are joining the Consortium

Do you want 100,000 extra analysts on your platform or language? Why don’t you let your training department design a MOOC for it? The more people that are trained in your software or platform, the more brand ambassadors you have to sell it to future clients. Companies like 10gen are already doing it for MongoDB. And Oracle is doing it for R Enterprise (see https://decisionstats.com/2012/10/23/10gen-and-online-education-mongodb/).

What is a MOOC?

http://en.wikipedia.org/wiki/Massive_open_online_course

A massive open online course (MOOC) is an online course aiming at large-scale participation and open access via the web. MOOCs are a recent development in distance education using open educational resources. They are similar to college courses, but typically do not offer academic credit. Other forms of assessment or certification may be available including those based on learning analytics for online environments.

MOOCs originated within the open educational resources (OER) movement and connectivist roots. Several MOOC-type projects have emerged independently, such as Coursera, Udacity, and edX. Others, like Canvas Network and CourseSites by Blackboard Inc have evolved from learning management systems.

Also see-

http://www.mooc-list.com/ - a list of Massive Open Online Courses (free online courses) offered by the best universities and entities.

Dhiraj Rajaram creates India’s first billion dollar analytics company

Dhiraj Rajaram was recently featured in the Economic Times as the CEO and founder of India’s first analytics startup with a billion dollar valuation.

Mu Sigma attracts a clutch of foreign investors, gets valued at $1 billion, Dhiraj Rajaram is now king of data

This year, the company which employs 2,500 people across a development centre in Bangalore and offices in the US, UK and Australia, will build a data analytics lab in the US and hire 400 data scientists there.

I first met Dhiraj in 2008 Q1 for a job. We didn’t agree, partly because I needed to be close to my son (who was 4 months old), and I ended up taking a contract with another Bangalore based company. What impressed me at that time was something I rarely see in India’s analytics entrepreneurs-

1) A Grand Vision- Dhiraj said- I am trying to build the largest math factory in the world.

2) Focus- Dhiraj was focused only on analytics projects. No quick and easy outsourcing of low-end tasks for him.

3) Positivity- Not once during the entire two hour interaction did he say a negative word on competition, attrition, challenges or pressures.

4) Flamboyance- I wonder sometimes why a colorful culture like India’s ends up with people being so meek in corporate culture. Dhiraj was probably one of the most flamboyant senior analytics leaders.

But there were some concerns I had in 2008 Q1, including plans for an IPO (I thought that was early) and senior management flux (the COO left in a few months).

Anyway, Dhiraj grew the 200-strong team to around 900 by 2010 Q3. He called me again for a job interview, and again we found that there was nothing I was really good at for an analytics company, given my interest in open source, blogging and writing books, and my morbid fear of managing people in operations. However, I noticed some changes-

  1. There were greater signs of a process driven orientation (including messages to keep meetings short)
  2. There were newer people in senior management
  3. Dhiraj was slightly more restrained in his frank talk (given his increasing stature and the demands on his time and attention)
  4. I loved the sign on his office- Jugad. Literally that means ingenuity in Hindi, and it gives a glimpse into the maverick, brilliant and flamboyant nature of the CEO.

Again, there were some odd points. Mu Sigma continued to have the perception (true or false, I don’t know) of high attrition at junior levels. There were also rumours that Dhiraj had become a bit autocratic in management (of which I found no sign). I found that the biggest problem Mu Sigma and Dhiraj had was that they were creating enemies just by shaking up the slow IT services mindset of India, where easy money was available just by low quality labor arbitrage. This cultural opposition to anything new (like a pure analytics company) or anything rapid (like a company that scales up organically) could have stopped lesser men, but Mu Sigma moved on.

So it was quite nice to read the news that an Indian company had finally broken the 1 billion mark. Allow me some leeway here. I truly believe analytics and maths have no nationality. But if you see the rampant poverty in India, what we need is more aggressive and impatient businessmen like Mr Rajaram, rather than the chalta hain (“it is okay”) attitude.

Dhiraj and team, take a bow. You make us proud!