Eric Siegel’s Book

A message from Eric Siegel, founder of the Predictive Analytics World conference, whom we have interviewed here:

To drive early orders for my about-to-launch book, “Predictive Analytics: The Power to Predict Who Will Click, Buy, Lie, or Die” (published by Wiley Feb 18), we’re providing this offer:

PREORDER NOW ($15 on Amazon, currently) and receive:

1) FREE ACCESS to the first of four modules of my online training program, “Predictive Analytics Applied”

2) A 40% DISCOUNT CODE off the full online training ($495), or its in-person version, “Predictive Analytics for Business, Marketing and Web” ($1,495 – April 25-26 in NYC)

 

DETAILS ON THIS OFFER: http://www.pawcon.com/blog/?p=855

– – – – – – – – – – – –

5 REASONS THIS BOOK MATTERS TO EXPERTS: http://www.pawcon.com/patimes/january13

FULL PREFACE: http://www.pawcon.com/patimes/december12

39 COLLEAGUES WHO LOVED THIS BOOK: http://www.pawcon.com/book/praise.php

MORE INFO: http://www.thepredictionbook.com

– – – – – – – – – – – –

“Exciting and engaging – reads like a thriller!”
– Marianna Dizik, Statistician, Google

In this rich, entertaining primer, former Columbia University professor and Predictive Analytics World founder Eric Siegel reveals the power and perils of prediction.

“What Nate Silver did for poker and politics, this does for everything else. A broad, well-written book easily accessible to non-nerd readers.”

– David Leinweber, author, Nerds on Wall Street: Math, Machines and Wired Markets

“This book is an operating manual for 21st century life. Drawing predictions from big data is at the heart of nearly everything, whether it’s in science, business, finance, sports, or politics. And Eric Siegel is the ideal guide.”

—Stephen Baker, author, The Numerati and Final Jeopardy: Man vs Machine and the Quest to Know Everything

With a foreword from Thomas H. Davenport, coauthor of Competing on Analytics.

Blog disclaimer: Eric is the founder of PAW Conferences, which is a blog sponsor here.

 

Visualizing Hadley Wickham #rstats

I like the visual appeal of commits by users over time at Github. For example, we can see Hadley Wickham is committed. But you already knew that. Nice to see a calendar heat map being used effectively.

Now if we could only do that to CRAN (?) commits. Come on, Brian, you are not too old for this.


Mathematica latest software to offer built-in Integration with R #rstats

Just got a message from the good chaps at Wolfram Alpha/Mathematica

 

Mathematica 9 offers built-in ways to integrate R code into your Mathematica workflow, combining Mathematica’s broad range of capabilities with the statistical computing language. RLink uses J/Link and rJava/JRI Java libraries to allow the user to exchange data between Mathematica and R and to execute R code from within Mathematica. With RLink, R users can use thousands of functions from across the full Mathematica system.

see more at

http://www.wolfram.com/mathematica/new-in-9/built-in-integration-with-r/


The making of an R startup Part 1 #rstats

Note: Decisionstats.com has done almost 105 interviews with people in analytics, technology startups and thought leaders (you can see them here: http://goo.gl/m3l31). We have covered some R authors (R for SAS and SPSS Users, Data Mining using R, Machine Learning for Hackers) and noted R package creators (ggplot2, RCommander, rattle GUI, forecast).

But what we truly enjoy is interviewing startups in the R ecosystem, including the founders of Revolution Analytics, Inference for R, RStudio and Cloudnumbers.

The latest startup in the R ecosystem with a promising product is Rapporter.net. It has actually been around for some time, but with the launch of their new product we asked them about the trials and tribulations of creating an open source startup in the data science field.

This is part 1 of the interview with Gergely Daróczi, co-founder of the Rapporter project.


Ajay- Describe the journey of Rapporter till now, and your product plans for 2013.

Greg- The idea of Rapporter presented itself more than 3 years ago, while I was giving statistics, SPSS and R courses at different Hungarian universities and, at the same time, creating custom statistical reports for a number of companies for a living.
Long story short, the three Hungarian co-founders faced similar problems in both sectors: students, just like business clients, admired the capabilities of R and the wide variety of tools found on CRAN, but were not at all eager to learn how to use them.
So we tried to work out how to let non-R users also build on the resources of R, and we came up with the idea of an intuitive web interface as an R front-end.

Real development of a helper R package (which later became “rapport”) was started in January 2011 by Aleksandar Blagotić and me, in our spare time and mostly just for fun, as we had a dream of using “annotated statistical templates” in R after a few conversations on StackOverflow. We also worked on a front-end in the form of an Rserve-driven PHP engine with MySQL, which was later dropped and completely rewritten after some trying experiences and serious benchmarking.

We released the “rapport” package to the public at the end of 2011 on GitHub, and a few weeks later on CRAN too. Although we did our best to create decent documentation and some live examples, we somehow forgot to spread the news of the new package to the R community, so “rapport” did not attract any serious attention.

Even so, our enthusiasm for annotated R “templates” did not wane as time passed, so we continued to work on “rapport” by adding new features, and Aleksandar started to fortify his Ruby on Rails skills. We also dropped Rserve with the MySQL back-end, and introduced Jeffrey Horner’s awesome RApache with some NoSQL databases.
To be honest, this change delayed the release of Rapporter by a year and caused no end of headaches on our end, but in the long run it was a really smart move, as we now own an easily scalable and highly available cluster of servers.

But back to 2012.

As “rapport” got too complex over time with newly added features, Aleksandar and I decided to split the package, a move which gave birth to “pander”. At that time “knitr” was becoming more and more familiar among R users, so it was a brave move to release “another” similar package. But the roots of “pander” were more than one year old, we used some custom methods not available in “knitr” (like capturing the R object beside the printed output of chunks), we needed tweakable global options instead of chunk options, and we really wanted to build on the power of Pandoc, just like before.

So we had a package for converting R objects to Pandoc’s markdown with a general S3 method, and another package to run that automatically and also capture plots and images in a brew-like document with various output formats, like pdf, docx, odt etc.
In the summer, while Aleksandar dealt with the web interface, I worked on some new features in our packages:
• automatic and robust caching of chunks with various options for performance reasons,
• automatically unifying “base”, “lattice” and “ggplot2” images to the same style with user options – like major/minor grid color, font family, color palette, margins etc.
• adding other global options to “pander”, to let our expected clients later personalize their custom report style with a few clicks.
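
The "one general method that converts any object to markdown" idea can be illustrated with a small sketch. It is written in Python rather than R, uses `functools.singledispatch` as a stand-in for S3 dispatch, and is not the actual “pander” code; the function name `to_markdown` and the output formats chosen here are assumptions made for illustration only.

```python
from functools import singledispatch


@singledispatch
def to_markdown(obj):
    """Fallback for unknown types: show the object's repr as inline code."""
    return f"`{obj!r}`"


@to_markdown.register(str)
def _str_to_markdown(text):
    # Plain strings are passed through unchanged.
    return text


@to_markdown.register(int)
@to_markdown.register(float)
def _number_to_markdown(value):
    # Numbers are rendered with their default string form.
    return str(value)


@to_markdown.register(list)
def _list_to_markdown(items):
    # Each element is converted recursively and emitted as a bullet.
    return "\n".join(f"* {to_markdown(item)}" for item in items)


@to_markdown.register(dict)
def _dict_to_markdown(mapping):
    # A mapping becomes a two-column pipe table.
    header = "| key | value |\n|---|---|"
    rows = [f"| {key} | {to_markdown(value)} |" for key, value in mapping.items()]
    return "\n".join([header, *rows])


if __name__ == "__main__":
    print(to_markdown({"cases": 120, "mean age": 34.5}))
    print(to_markdown(["first", 2, 3.0]))
```

The caller only ever uses `to_markdown(x)`; supporting a new object type means registering one more function, which is roughly the appeal of a generic method in this setting.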

At the same time, we were searching for different options to prevent running malicious code in the parallel R sessions, which might compromise all our users’ sensitive data. Unfortunately no full blown solution existed at that time, and we really wanted to stand clear of running some Java based interpreters in our network.
So I started to create a parser for R commands, which was supposed to filter out malicious R commands before evaluation. A bout of flu gave me some spare time to implement “sandboxR” with an open and live “hack my R server” demo, which ended up being a great challenge on my side, but proved to really work after all.
I also had a few conversations with Jeroen Ooms (the author of the awesome OpenCPU), who faced similar problems on his servers and was eager to prevent the issues with the help of AppArmor. The great news of “RAppArmor” did not make “sandboxR” needless (as AppArmor just cannot regulate inner R calls), but we started to evaluate all user-specified R commands in a separate hat, which allowed me to make “sandboxR” more permissive with blacklisted functions.
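
The general technique described here, parsing a command and rejecting it before evaluation if it calls a blacklisted function, can be sketched as follows. This is an illustration in Python using the standard `ast` module, not the “sandboxR” implementation; the names `BLOCKED_CALLS`, `find_blocked_calls` and `is_allowed` and the contents of the blacklist are assumptions.

```python
import ast

# Hypothetical blacklist, chosen only for this illustration.
BLOCKED_CALLS = {"eval", "exec", "open", "compile", "__import__"}


def find_blocked_calls(source):
    """Parse the source without running it and list blacklisted constructs."""
    tree = ast.parse(source)
    found = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Call):
            func = node.func
            # Direct calls such as open(...)
            if isinstance(func, ast.Name) and func.id in BLOCKED_CALLS:
                found.add(func.id)
            # Attribute calls such as builtins.open(...)
            elif isinstance(func, ast.Attribute) and func.attr in BLOCKED_CALLS:
                found.add(func.attr)
        elif isinstance(node, (ast.Import, ast.ImportFrom)):
            # Import statements are rejected wholesale in this sketch.
            found.add("import")
    return sorted(found)


def is_allowed(source):
    """Return True only if no blacklisted construct was found."""
    return not find_blocked_calls(source)


if __name__ == "__main__":
    print(find_blocked_calls("x = sum([1, 2, 3])"))
    print(find_blocked_calls("open('/etc/passwd').read()"))
```

A name-based filter like this is easy to get around on its own (for instance through aliasing), which fits the point made above: the parser and operating-system level confinement such as AppArmor complement each other rather than replace each other.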
In the middle of the summer, I realized that we had an almost working web application, with any number of R workers able to serve tons of users thanks to the flexible NoSQL database back-ends. But we had no legal background to release such a service, nor had I any solid financial background to found one; moreover, the Rapporter project had already taken a huge amount from my family budget.

As I was against letting venture capital dominate the project, and did not find any accelerator that would take on a project with a maturing, almost market-ready product, a few associates and I decided to found a UK company on our own, having confidence in the future and God.

So we founded Easystats Ltd, the company running rapporter.net, in July, and decided to release the first beta and pretty stable version of the application to the public at the end of September. At that time users could:
• upload and use text or SPSS sav data sets,
• specify more than 20 global options to be applied to all generated reports (like plot themes, table width, date format, decimal mark and number of digits, separators and copula in vectors etc.),
• create reports with the help of predefined statistical “templates”,
• “fork” (clone) any of our templates and modify without restriction, or create new statistical templates from scratch,
• edit the body or remove any part of the reports, resize images with the mouse, or even with a finger on touch devices,
• and export reports to pdf, odt or docx formats.

A number of new features have been introduced since then:
• OpenBUGS integration with more permissive security profiles,
• custom styles for the exported documents (in LaTeX, docx and odt format), to generate unique and possibly branded reports,
• sharing public or even private reports with anyone by a simple hyperlink, without the need to register on rapporter.net,
• and integrating templates in any homepage, blog post or even HTML mail, so that anyone can use the power of R with a few clicks, building on the knowledge of template authors and our reliable back-end.

Although 2 years ago I was pretty sure that this job would be finished in a few months and that we would possibly have a successful project in a year or two, now I am certain that a bunch of new features will make Rapporter more and more user-friendly, intuitive and extensible in the next few years.
Currently, we are working hard on a redesigned GUI, at last with the help of a dedicated UX team (a really important structural change in the life of Rapporter, as we can now really assign and split tasks just as we dreamed of when the project was a two-man show). It is to be finished no later than the first quarter of the year. Beside design issues, this change will also result in some new features, like ordering the templates, data sets and reports by popularity, rating or relevance for the currently active data set, and letting users alter the style of the resulting reports in a more seamless way.

The next planned tasks for 2013 include:
• a “data transformation” front-end, which would let users rename and label variables in any uploaded data set, specify the level of measurement, recode/categorize or create new variables with the help of existing ones and/or any R functions,
• editing tables in reports on the fly (change the decimal mark, highlight some elements, rename columns and split tables to multiple pages with a simple click),
• a more robust API to let third-party users temporarily upload data to be used in the analysis,
• option to use multiple data sets in a template and to let users merge or connect data online,
• and some top-secret surprises.

Beside the above tasks, which we came up with ourselves, our team is really interested in any feedback from users, which might change the above order or add new tasks with higher priority, so be sure to add your two cents on our support page.

We will also have to come up with some account plans with reasonable pricing in 2013 for the hosted service, to let us cover the server fees and development expenses. But of course Rapporter will remain free forever for users with basic needs (like analyzing data sets with only a few hundred cases) and for anyone in the academic sector, and we also plan to provide an option to run Rapporter “off-site” on any Unix-like environment.

Ajay- What are some of the Big Data use cases I can do with Rapporter?

Greg- Although we released the Rapporter beta only a few months ago, we have already heard some pretty promising use cases from our (potential) clients.

But I must emphasize that, for now, we are not committed to dealing with Big Data in the sense of user-contributed data sets with billions of cases; we are rather concentrating on providing an intuitive and responsive way of analyzing traditional, survey-like data frames of up to about 100,000 cases.

Anyway, to be on topic: a really promising project, Optimum Dosing Strategies, had already been using Rapporter’s API for a number of weeks in 2012 to compute optimal doses for different kinds of antibiotics, based on Monte Carlo simulation and Bayesian adaptive feedback among other methods.
This collaboration lets the ID-ODS team develop a powerful calculator with full-blown reports ready to be attached to medical records, without any special technical knowledge on their side: we maintain the R engine and the integration part, and they code in R. This results in pleased clients all over the world, which makes us happy too.

We really look forward to shipping a number of educational templates to be used in real life at several (multilingual) universities from September 2013. These templates would let teachers show customizable and interactive reports to students, with any number of comments and narrative paragraphs. These introductory statistical modules would provide a free alternative to other desktop software used in education.

In the next few months, a part of our team will focus on spatial analysis templates, which would mean that our users could not just map, but really analyze any of their spatially related data with a few clicks and clear parameters.

Another feature request of a client seems to be a really exciting idea. Currently, Google Analytics and other tracking services provide basic options to view, filter and export the historical data of websites, blogs etc.
Creating an interface between Rapporter and the tracking services to fetch the most recent data is no longer beyond possibility with the help of existing API resources, so our clients could generate annotated usage reports for any specified period of time, without restrictions. Just to emphasize some potential add-ons: using the time-series R packages in the analysis, or creating real-time “dashboards” with optional forecasts about live data.

Of course you could think of other kinds of live or historical data instead of Google Analytics, as creating a template for e.g. transaction data or the gas usage of a household could be addressed at any time. And please do not forget about the use cases referenced in the third question (“[…]Rapporter can help: […]”).

But wait: the beauty of Rapporter is that you could implement all of the above ideas by yourself in our system, even without any help from us.

Ajay- What are some of the things that can be done more easily with Rapporter than with plain vanilla R?

Greg- Rapporter is basically developed for creating reproducible, literate and annotated statistical modules (a.k.a. “templates”), which means that passing a data set and a list of variables with some optional arguments ends up in a full-blown written report with automatically styled tables and charts.

So using Rapporter is like writing “Sweave” or “knitr” documents, but you write the template only once, and then apply that to any number of data sets with a simple click on an intuitive user interface.
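
The "write the template once, apply it to any number of data sets" idea can be sketched in a few lines. This is a Python illustration, not Rapporter's template format or API; `describe_template`, `run_template` and the sample data are all hypothetical.

```python
from statistics import mean, stdev


def describe_template(dataset_name, rows, variable):
    """A tiny 'template': narrative text with statistics filled in."""
    values = [row[variable] for row in rows]
    return (
        f"## {variable} in {dataset_name}\n\n"
        f"The variable has {len(values)} valid cases, "
        f"a mean of {mean(values):.2f} and "
        f"a standard deviation of {stdev(values):.2f}."
    )


def run_template(template, datasets, variable):
    """Apply the same template to every data set and collect the reports."""
    return {
        name: template(name, rows, variable)
        for name, rows in datasets.items()
    }


if __name__ == "__main__":
    # Made-up example data, for illustration only.
    datasets = {
        "survey_a": [{"age": 20}, {"age": 30}, {"age": 40}],
        "survey_b": [{"age": 25}, {"age": 35}],
    }
    for report in run_template(describe_template, datasets, "age").values():
        print(report, end="\n\n")
```

The template author writes the narrative and the statistics once; everyone else only chooses a data set and a variable.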

Beside this major objective: as Rapporter runs in the cloud, sharing reports and templates (or even data sets) with collaborators or with anyone on the Internet is really easy. Our users can post and share any R code for free and without restrictions, or release templates with a specified license and/or fees in a secured environment.

This means that Rapporter can help:

  1. scholars to share scientific results or methods with a reproducible and instantly available demo and/or dedicated implementation along with publications,
  2. teachers to create self-explanatory statistical templates which would help students internalize the subject by practice,
  3. any R developer to share a live and interactive demo of the implemented features of their functions with a few clicks,
  4. businesses to use a statistical platform without restrictions for a reasonable monthly fee instead of expensive and non-portable statistical programs,
  5. governments and national statistical offices to publicize census or other big data with a scientific and reliable analytic tool with annotated and clear reports, while ensuring the anonymity of the respondents by automatically applying custom methods (like data swapping, rounding, micro-aggregation, PRAM, adding noise etc.) to the tables and results, etc.

And of course, do not forget one of our main objectives: to open up the world of R to non-R users too, with an intuitive, guiding user interface.

(To be continued)

About

Gergely Daróczi is coordinating the development of Rapporter and maintaining its R packages. Besides trying to be active in some open-source projects and on StackOverflow, he is a PhD candidate in sociology and a lecturer at Corvinus University of Budapest and Pázmány Péter Catholic University in Hungary.

Rapporter is a web application helping you to create comprehensive, reliable statistical reports on any mobile device or PC, using an intuitive user interface.

The application builds on the power of R beside other technologies and is intended to be used in any browser, doing the heavy computations on the server side. Some might consider Rapporter as a customizable graphical user interface to R, running in the cloud.

Currently, Rapporter is under heavy development and only invited alpha testers can access the application. Please sign up for an invitation if you want to have an early-bird insight on Rapporter.


The second meetup for R New Delhi Users

The R Users of New Delhi met for the second time on Dec 15, 2012. We meet on the third Saturday of every month.


We talked about epidemiology using the epicalc package (we have 1 doctor and 1 biostatistician), cloud computing (we have two IT guys) and business analytics. We also discussed the GUIs R Commander, Rattle and Deducer for beginners and people transitioning to R from other analytics software. We also discussed the R for SAS and SPSS Users books and the R for Data Mining book. The free book on R for epidemiology (http://cran.r-project.org/doc/contrib/Epicalc_Book.pdf) was mentioned. Not bad for 1 hour.

We are currently unfunded and unsponsored. I hope to get some sponsors to give away R books (excluding my own) to encourage users and group members. The only catch to joining this meetup group: you either need to attend (if you are local) or present something (if you are not in Delhi).

I have been trying to get this group to go from Vector to Matrix level to get a bigger sponsorship from Revolution, but I am constrained by meeting in a public cafe. That is due to change, since we managed to get one sponsor for a meeting place in Noida (a business school batchmate who owns his office).

http://www.revolutionanalytics.com/news-events/r-user-group/

Deadlines for applications are:

  • March 31, 2013 for Matrix and Array level groups.
  • September 30, 2013 for Vector level groups.

2013 Sponsorship Levels

The size of the annual grant depends on the size of your group.

Level | For groups that are | Requirements | Annual Grant ($USD)
Vector | Just getting started | A group name, group webpage, and a focus on R. (Here are some tips on starting up a new R user group.) | $100
Matrix | Smaller but established | 3 meetings in last 6 months with 30 attendees or more. | $500
Array | Larger groups | 3 meetings in last 6 months with 60 attendees or more. | $1000

 

Try and learn R – for Free

A free online course on learning R (sponsored by O’Reilly)

http://tryr.codeschool.com/

Table of Contents

  1. R Syntax: A gentle introduction to R expressions, variables, and functions
  2. Vectors: Grouping values into vectors, then doing arithmetic and graphs with them
  3. Matrices: Creating and graphing two-dimensional data sets
  4. Summary Statistics: Calculating and plotting some basic statistics: mean, median, and standard deviation
  5. Factors: Creating and plotting categorized data
  6. Data Frames: Organizing values into data frames, loading frames from files and merging them
  7. Working With Real-World Data: Testing for correlation between data sets, linear models and installing additional packages


 

New Delhi R User group meets up

Inspired by David Smith’s blog post at http://blog.revolutionanalytics.com/2012/10/r-user-group-sponsorship-applications-open-for-2013.html I set up a meetup group for New Delhi at http://www.meetup.com/New-Delhi-R-UseR-Group/ (to my surprise, India had only one R user meetup group before this, in Bangalore). The first meeting was awesome. We met in a cafe, and the plan going forward is to cover cross-domain learning and collaboration on tools, startups, mashups and training.

Hopefully we can reach out to analytics enthusiasts in Mumbai and Chennai to help kickstart R user groups there. Indian companies like Mu Sigma have been using R more and more in analytics (offshoring). You can even use the sponsorship from Revolution Analytics to start your meetup group. Meetup.com gives you a 50% discount if you pay 6 months in advance, and given Oracle’s and IBM/Google’s big Indian presence, I hope they lend a hand to user groups for R in India as well.