Latest Interview – RapidMiner CEO Ingo Mierswa

Here is an interview I did with the CEO of RapidMiner, Ingo Mierswa. Ingo, who is something of a prodigy and genius with multilingual capabilities and a stellar academic and business record, talks about navigating the journey of an open source startup.

http://www.kdnuggets.com/2014/06/interview-ingo-mierswa-rapidminer-analytics-turning-points.html

Popularized by Michael (Monty) Widenius, one of the founders of MySQL and an investor in RapidMiner, business source is a commercial software license model that offers many of the benefits of open source, but with a built-in time delay on users being able to access new versions of our products.

 

Related-

  1. Guide to Data Science Cheat Sheets 2014/05/12
  2. Book Review: Data Just Right 2014/04/03
  3. Exclusive Interview: Richard Socher, founder of etcML, Easy Text Classification Startup 2014/03/31
  4. Trifacta – Tackling Data Wrangling with Automation and Machine Learning 2014/03/17
  5. Paxata automates Data Preparation for Big Data Analytics 2014/03/07
  6. etcML Promises to Make Text Classification Easy  2014/03/05
  7. Wolfram Breakthrough Knowledge-based Programming Language – what it means for Data Science? 2014/03/02

10 for 10 – Packt lowers cost of books for students and researchers alike

The high cost of textbooks and science books is an open scandal. Despite this, publishers are barely profitable, and the ecosystem is ripe for disruption.

Packt is one such player. I have reviewed many books for them (in return I get ebooks and books, some of which I give to my students).

Now they have an intriguing offer.

As you are aware, this month, Packt is celebrating 10 years of success with over 2000 Titles in its Library. To celebrate this huge milestone, we have come up with an exciting opportunity for collaboration which you might be interested in.

Packt is offering all of its eBooks and Videos at just $10 each. This campaign is specifically aimed towards thanking all our customers for their support and opening up our comprehensive range of titles just for $10 each. This promotion covers every title and customers can stock up on as many copies as they like until July 5th. I hope you find this as a great opportunity to explore what’s new and maintain your personal and professional development.

Interested? You can see the offer at http://www.packtpub.com/10years

Disclosure: The author was offered 2 free ebooks as part of this campaign on social media. Books are one thing he is willing to blog for 😉

Analysing Google Plus posts using R language #rstats

Here is a short post on retrieving information from the Google+ API using R, and then analysing it.

To create an API key:

  1. Go to the Google Developers Console.
  2. Create or select a project.
  3. In the sidebar on the left, select APIs & auth.
  4. In the displayed list of APIs, find the Google+ API and set its status to ON.
  5. In the sidebar on the left, select Credentials.
  6. Create an API key by clicking Create New Key. Select the appropriate kind of key: Server key. Then click Create.

From: https://developers.google.com/+/api/oauth

And the R code:

#install.packages("plusser")  # run once if the package is not installed
library(plusser)
help(plusser)
library(RCurl)
# Point RCurl at its bundled CA certificates so HTTPS requests can be verified
options(RCurlOptions = list(cainfo = system.file("CurlSSL", "cacert.pem", package = "RCurl")))
# Register the API key created in the steps above (use your own key here)
setAPIkey('AIzaSyBtYqDsAtzp4FOS7FGbrc_n6mD-uJIOvcQ')
# Fetch and inspect the public profile
myProfile <- harvestProfile("+AjayOhri", parseFun = parseProfile)
str(myProfile)
# Fetch and inspect the public posts
myposts <- harvestPage("+AjayOhri", parseFun = parsePost, results = 1, nextToken = NULL, cr = 1)
str(myposts)
head(myposts)
plot(myposts$ti, myposts$nC) # number of comments
plot(myposts$ti, myposts$nP) # number of likes or plus 1
plot(myposts$ti, myposts$nR) # number of reshares


You can also see the RPubs document here: http://rpubs.com/decisionstats2/plusser. Now you can do text analysis and sentiment analysis on myposts$msg, and do social media analysis on what kind of content people like.


For better results, use a google plus id (page or person) which has a lot of PUBLIC posts!
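
As a rough starting point for the text analysis mentioned above, here is a minimal base R sketch that counts word frequencies. The msgs vector is made-up sample data standing in for myposts$msg, and the stop word list is only illustrative, so treat this as an assumption-laden sketch rather than a finished analysis.

```r
# Hypothetical sample messages standing in for myposts$msg
msgs <- c("R is great for statistics",
          "Python and R work well together",
          "Great post on R and big data")

# Lower-case, replace anything that is not a letter or space, split on whitespace
words <- unlist(strsplit(gsub("[^a-z ]", " ", tolower(msgs)), "\\s+"))
words <- words[nchar(words) > 0]

# Drop a few common stop words (illustrative list, not exhaustive)
stop_words <- c("is", "for", "and", "on", "the", "a")
words <- words[!words %in% stop_words]

# Tabulate and sort, most frequent words first
word_freq <- sort(table(words), decreasing = TRUE)
print(head(word_freq, 5))
```

With real data you would replace msgs with myposts$msg, and could then compare the frequent words in posts that got many plus ones against those that got few.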

 

ggvis is awesomeness personified #rstats

 

Hu ha! The latest sexy software from our man Dr Hadley Wickham and his ninjas at RStudio. Now YOU can build business intelligence software for FREE. How good is it? Time will tell whether someone can use it to give Tableau Software and QlikView a run for their money.

Seriously, I would like to see ONE implementation of RHadoop and Shiny with ggplot2 and d3.

(Big data analytics indeed 😉 )

from

———————————-

http://ggvis.rstudio.com/

ggvis is a data visualization package for R which lets you:

  • Declaratively describe data graphics with a syntax similar in spirit to ggplot2.
  • Create rich interactive graphics that you can play with locally in Rstudio or in your browser.
  • Leverage shiny’s infrastructure to publish interactive graphics usable from any browser (either within your company or to the world).

The goal is to combine the best of R (e.g. every modelling function you can imagine) and the best of the web (everyone has a web browser). Data manipulation and transformation are done in R, and the graphics are rendered in a web browser, using Vega. For RStudio users, ggvis graphics display in a viewer panel, which is possible because RStudio is a web browser.

Please note that the API has changed significantly between ggvis 0.1 and 0.3. Documentation for the old version is here.
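
To give a feel for the declarative syntax described above, here is a minimal sketch of a scatter plot. It assumes the ggvis package is installed (the code skips the plot if it is not) and uses the built-in mtcars dataset, which is my choice for illustration and not something taken from the ggvis page.

```r
# Built-in dataset; keep only the two columns we plot
plot_data <- mtcars[, c("wt", "mpg")]

# ggvis may not be installed, so guard the call
if (requireNamespace("ggvis", quietly = TRUE)) {
  library(ggvis)
  # ~wt and ~mpg map columns to x and y, much like aes() in ggplot2
  p <- plot_data %>%
    ggvis(~wt, ~mpg) %>%
    layer_points()
  # Typing p at the console renders it in the RStudio viewer or a browser
} else {
  message("ggvis is not installed; try install.packages('ggvis')")
}
```

The plot object is only built here, not displayed, so the snippet also runs in a non-interactive session.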


Great Way to learn Git easily

A great way to learn Git easily is here: https://try.github.io/


This is a much better designed Code School project than the one for R:

http://tryr.codeschool.com/

However, Swirl is a great way to learn R interactively. Its only drawback is that it needs to be integrated with something like http://www.r-fiddle.org/#/ for a truly automated, browser-only version.

Why do I favor automated elearning solutions now? Because teaching the same thing again and again can be boring for the teacher, and videos can be boring for the students. Note how the potential student is given positive reinforcement to boost his morale, something any good teacher knows.

There is more to open source statistics than R #rstats

There is more to open source statistics than R, and other things your professor never told you

I see a disturbing trend among members of the R community, particularly its evangelical wing, of claiming that R is the panacea and cure for all things statistical. No single piece of software can handle all parts of the data analytics pipeline equally well; there will always be trade-offs based on perceptual assessments of both current needs and future trends.

While an employee of a proprietary software company can and will always claim that his software is the best and fastest for everything, what has happened over the past few years is that open source analytics people have been neatly split into Pythonistas and R users.

This is a disturbing trend. Rather than mimic and copy libraries and packages between R and Python, there should be a movement for greater interoperability and transparency in cross-training. Why can't you teach Pandas at an R meetup, and why can't you learn ggplot2 at a Python meetup?

United we stand and both R and Python communities will gain. Divided, the opponents of open source will end up appropriating the work of the community and laugh all the way to the bank.

More R and more Python. Like rum with cola. Not separately, but together. Is that a pipe dream, or will it benefit industry? I would love to see at least one big startup making products and services in both!

 

 

Revolution Analytics and RStudio- Different approaches to being open source in #rstats

RStudio is free (as in beer) and free (as in speech). You pay for RStudio services (including training, enterprise and pro editions). But the software is both open source and free for everyone. The services are basically how they pay for bread and pizza.

https://www.shinyapps.io/

https://www.shinyapps.io/pricing

ShinyApps.io is currently in Alpha which means we’re still figuring out exactly how pricing will work for the service. We do know that we’ll have a tiered pricing model in hopes of making the service accessible to as many different groups as we can. We will offer a free tier for users with light needs and feature requirements.

We’ll announce the specifics of the pricing model for ShinyApps.io in the coming months.


 

http://www.rstudio.com/products/rstudio-server-pro/

RStudio Server Pro lets multiple users share access to powerful compute resources (memory, processors, etc.). Team leaders can centralize the installation and configuration of their R environment with the visibility and control needed to manage it all effectively.


http://www.rstudio.com/pricing/smb-pricing/

RStudio offers discounts on RStudio Server Pro and Shiny Server Pro for businesses up to $5 million in annual revenue. Our goal is to make it so that small startups and developers can get started easily with a credit card. Our intention is to charge a fair price for the value derived and to grow with each small business as they gain value from our products.

To qualify for Small Business discounts, businesses must:

  • Disclose last year’s annual revenue to RStudio on request (for example, provide an accounting statement)
  • Display “Powered by RStudio™ Shiny” at the bottom of all Shiny application pages
  • Complete the order online (links below)
    • Accept the standard “click-through” RStudio license agreement
    • Pay by credit card
  • Repeat these steps to re-qualify annually at the time of renewal


—–

The champion of the Enterprise software for the R language remains Revolution Analytics.

They offer source code for all, and free software for academics. They have training services, and they are well ahead of RStudio in partnering formally with other players and corporates in the ecosystem. By cleverly using consultants, including noted package creators, they have managed to keep their costs down and their research output high, including RHadoop, DeployR and the earlier optimization efforts.

But the basic software is not free, the RevoScaleR package does not have a community edition, and there are no SMB discounts. Part of the reason is that Revolution was initially funded by Intel and Microsoft, while RStudio chugs along on its own. Revolution Analytics has also changed CEOs three times and at one point fired half its staff, while RStudio has cautiously and steadily ramped up.

This is one reason why a lot more people use software from RStudio, a lot fewer use software from Revolution Analytics, and the RevoScaleR package is not so widely known in industry.

http://buy.revolutionanalytics.com/

Revolution R Enterprise Workstation

Your workstation license entitles you to exclusive use by a single named user and excludes automated use of the software, including scheduled batch processing and embedding into other software applications; includes the Revolution R Productivity Environment on the Windows platform only.

Revolution R Enterprise Entry Workstation: Up to 4 cores
Revolution R Enterprise Power Workstation: Up to 8 cores

Revolution R Enterprise Server

A server license of Revolution R Enterprise supports unlimited users, and is required for automated applications including scheduled batch processing, and embedding into other software applications. A server license includes use of the DeployR Web Services framework.

Revolution R Enterprise Training

Revolution Analytics provides world class training, designed and delivered by R programming experts, to ensure that you and your team are immediately productive and able to take advantage of all the features and functions available in Revolution R Enterprise Workstation and Server. In addition to our core product courses, we provide industry specific training opportunities as well as custom, on-site training to bring your entire team up to speed all at the same time.


Even SAS University Edition now has more generous licensing than Revolution Analytics' policy for RevoScaleR.

 

Yesterday’s revolutionaries for analytics are today’s contented conservatives.

Lip service is paid to FOSS and FOAS by the so-called decade-long flag bearers of open source at Revolution Analytics.

But isn’t it ironic, don’t you think?