DecisionStats Interview: Scott Draves, Beaker Notebook

As part of my research for Python for R Users: A Data Science Approach (Wiley, 2016), here is an interview with Scott Draves, awesome software artist and developer at Beaker Notebook. Beaker Notebook lets you use multiple languages (such as Python, R, JS and Scala) seamlessly together in the same interface.

Ajay Ohri (AO) - What inspired you to make Beaker Notebook? What were some of the design decisions you made? How does it compare to Jupyter Notebook, and what do you see as the product roadmap ahead for Beaker (including current limitations, if any)?

Scott Draves (S) – Two Sigma uses a variety of tools. Some have been developed internally over many years, some are open source like Linux, Java, R, and IPython, and some are commercial such as MATLAB and Excel.  Beaker is inspired by all these systems, and many more. It’s a new synthesis on new infrastructure.  The design favors ease of use and high quality. Beaker is about working automatically with one click, and also having total programmability.

Jupyter (which was called IPython when we started) is definitely one of our inspirations. In fact, Beaker is compatible with it, and when you run Python in Beaker, it's talking to your existing IPython backend. Beaker uses nginx as a reverse proxy to make a collection of backends (one for each language, plus Beaker's core server) appear as a single application.

Our roadmap is published on the wiki: https://github.com/twosigma/beaker-notebook/wiki/Roadmap
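The nginx arrangement Scott describes can be pictured as a prefix-routing table: each request path is matched against a prefix and sent to the backend that owns it, so the browser only ever talks to one origin. Below is a minimal Python sketch of that routing decision only. The prefixes and ports are hypothetical, not Beaker's actual configuration, and a real deployment leaves this to nginx `location` blocks with `proxy_pass`.

```python
# Minimal sketch of prefix-based reverse-proxy routing, in the spirit of the
# nginx setup described in the interview. All prefixes and ports are hypothetical.
ROUTES = {
    "/python/": "http://127.0.0.1:8801",  # hypothetical IPython backend
    "/r/": "http://127.0.0.1:8802",       # hypothetical R backend
    "/scala/": "http://127.0.0.1:8803",   # hypothetical Scala backend
    "/": "http://127.0.0.1:8800",         # hypothetical Beaker core server (fallback)
}


def route(path, routes=ROUTES):
    """Return the backend for a request path by longest-prefix match,
    similar to how nginx chooses among plain prefix `location` blocks."""
    candidates = [prefix for prefix in routes if path.startswith(prefix)]
    if not candidates:
        raise LookupError(f"no backend configured for {path!r}")
    return routes[max(candidates, key=len)]


if __name__ == "__main__":
    print(route("/python/execute"))  # http://127.0.0.1:8801
    print(route("/r/eval"))          # http://127.0.0.1:8802
    print(route("/index.html"))      # http://127.0.0.1:8800
```

Because every language server sits behind the same front door, the notebook UI can treat them as one application, which is the design choice the interview highlights.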


AO- To pass objects from Python to R, I need rpy2. How does Beaker simplify this process? For example, if I want to use auto.arima from the forecast package on a Pandas time series, how would I do it?

S- Beaker's autotranslation is simpler because it focuses on the data. That means your R and Python code co-exist in independent cells, each in its native syntax, but they can communicate via the Beaker object that is reflected to exist in all languages. By contrast, with rpy2 you access R through a Python syntax. For example, instead of

robjects.StrVector(['abc', 'def'])

in Beaker you can just say

c('abc', 'def')

As for auto.arima, let me first note that, by coincidence, the #1 Google hit for [auto.arima] is a web site that uses a Flame as its banner, i.e. an image made with an algorithm I open sourced in the early 90s.

But anyway, I took the example from the bottom of that page and made it work with a random Pandas data frame.  Here’s the Beaker notebook: https://pub.beakernotebook.com/#/publications/56648fcc-2e8e-41a6-aa4a-1249ee39023c
One improvement in the works is replacing beaker::get('df') with beaker$df.
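Scott's point is that autotranslation moves data, not syntax. The toy model below, in Python, illustrates that idea: one cell publishes a value under a name, the value is stored in a language-neutral form, and another cell retrieves it by name, much as `beaker::get('df')` does from R. The `SharedNamespace` class and the use of JSON as the interchange format are assumptions for illustration, not Beaker's actual implementation.

```python
import json


class SharedNamespace:
    """Toy stand-in for a notebook-wide object shared by all cells.
    Hypothetical design: values are stored as JSON text, so any language
    with a JSON parser could read them back as its own native types."""

    def __init__(self):
        self._store = {}

    def set(self, name, value):
        # Serialize on write: only data crosses the language boundary, not syntax.
        self._store[name] = json.dumps(value)

    def get(self, name):
        # Deserialize on read into the reader's native types.
        return json.loads(self._store[name])

    def raw(self, name):
        # The language-neutral text another language would actually receive.
        return self._store[name]


shared = SharedNamespace()

# A "Python cell" publishes a small column-oriented table.
shared.set("df", {"x": [1, 2, 3], "label": ["abc", "def", "ghi"]})

# Another cell reads the same data back by name.
df = shared.get("df")
print(df["label"])  # ['abc', 'def', 'ghi']

# A plain string vector travels as a JSON array.
shared.set("v", ["abc", "def"])
print(shared.raw("v"))  # ["abc", "def"]
```

The JSON text `["abc", "def"]` carries the same data that R writes as `c('abc', 'def')`; in R, `jsonlite::fromJSON` would read it back as a character vector. Whether Beaker uses exactly this encoding is an assumption of this sketch.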

AO- Which industries or businesses would benefit most from the ability to use Python, R, JS, Scala, etc. in the same notebook?

 

S- Any industry that works rapidly with data in a quantitative and scientific style benefits.  Autotranslation increases your options and makes experimentation and mash-ups easy.  That would include traditional sciences such as genetics and physics, and also business applications in finance, data mining, and machine learning.  But users come from all over. Beaker is at its heart a general purpose tool for exploring with code and data, so we believe the benefits could be widespread.
AO- How would Beaker Notebook be useful for Big Data Analytics and Data Science?
S- Beaker is great for exploring data sets with code, visualization, and tables.  And you can turn your research into applications, without recoding, because Beaker notebooks are repeatable, reproducible, and remixable.
AO- Describe your journey in science, including earlier famous projects like Electric Sheep. What were the key points, and what keeps you learning new technologies or pivoting to new projects?

S- The long version: my journey started at Brown University developing fnord, a generative GUI and language for mathematical research and teaching, especially calculus and differential geometry. Think curves and surfaces in 3D with sliders. I was also working at IRIS, then I worked for Andries van Dam in his graphics group and for Thomas Banchoff in the Math dept. Back in 1988-1990 there was a phenomenal network of people to collaborate with and learn from. That's when I became interested in Open Source, initially through the Emacs text editor and LISP UI, which was the first project that I ever contributed back to.

I did my PhD research at CMU SCS, CS Dept. Early on I had some fortuitous internships, one at SGI working on IRIS Explorer and the other in Tokyo at NTT-Data. It was there, on an unused supercomputer, that I generated the first Flames, what became the first Open Source artwork. Later, back in Pittsburgh, I developed Bomb, an "interactive visual musical instrument" that got me into making projection installations and eventually VJing. I was very lucky to have Peter Lee as my teacher and adviser; he helped me find my voice and also gave me plenty of rope.

My research at CMU culminated in a thesis on metaprogramming for media processing, i.e. using compilers and types to build low-latency, high-bandwidth systems that are still flexible and allow dynamic experimentation. The thesis document was generated by a markup language implemented in Scheme that compiled and ran my research code, measured its performance, generated the graphs, and could output LaTeX for typesetting or HTML for the web. That was published in 1997, all open source.

I graduated and went to San Francisco, and worked at Transmeta, along with Linus Torvalds, on a virtual microprocessor, and then at another startup doing internet streaming media infrastructure (by now it was 1999). It was this startup/tech environment of the Bay Area, including Burning Man and the VJ scene, that gave birth to the Electric Sheep. It's been evolving ever since.

AO- How can we further increase the supply of data scientists? How would people in education find Beaker Notebook useful?
S- Improving UIs like Beaker's makes it easier for people to get started with data science. And because Beaker has one UI for multiple languages, students can spend more time on the scientific and statistical concepts, and less time learning a new GUI for each new language. Being a web application, Beaker can also be delivered as a service (Domino Data Lab does this already), which helps deal with configuration, installation and OS problems on uncontrolled student laptops. So I hope data in education, and also data in civic discourse, will benefit and expand.
About
Scott Draves is the inventor of Fractal Flames[1] and the leader of the distributed computing project Electric Sheep.[2][3]  He is currently employed by Two Sigma to develop the Beaker Notebook.

Beaker is a notebook-style development environment for working interactively with large and complex datasets. Its plugin-based architecture allows you to switch between languages.

Related-

 

How to use R and Python together

If you can have 31 flavours of ice cream, why can't you have at least two flavours for open source data science? R for data visualization and statistical libraries, Python for machine learning and the production environment. As part of my research for my upcoming book "Python for R Users: A Data Science Approach", here are some ways to use both Python and R:

  1. rpy2 – a communication channel from Python to R. rpy2 is an interface to R running embedded in a Python process. The project is mature, stable, and widely used. A lucid example of using it is given at A Slug's Guide to Python: https://sites.google.com/site/aslugsguidetopython/data-analysis/pandas/calling-r-from-python
  2. conda and Jupyter – You can use an R kernel from within Jupyter/IPython. See https://www.continuum.io/conda-for-r and https://www.continuum.io/blog/developer/jupyter-and-conda-r. This uses the R kernel for Jupyter at http://irkernel.github.io/. Here is a tutorial I wrote in Jupyter, but in Python alone: http://nbviewer.ipython.org/gist/decisionstats/c1684daaeecf62dd4bf4
  3. Beaker Notebook – You can get Beaker from http://beakernotebook.com/. This is a relatively new kind of software that allows you to mix Python and R within the same notebook (unlike Jupyter, which gives you either a Python or an R kernel). Here is a notebook I created: https://pub.beakernotebook.com/#/publications/5657e715-bdaf-4787-99fc-a0d7f37c3e38 Beaker even allows JS, Scala and other languages within the same notebook, so it is an amazing idea. I also note that they are silver sponsors of http://user2016.org/ through their parent company https://www.twosigma.com/.


Using multiple languages in data science is clearly an idea whose time has come. Tools like Jupyter, rpy2 and Beaker can also speed up this exciting trend. The customer should dictate the need for data science, the need should dictate the software, and the software should dictate which data scientists to choose or skill up. Right now, we choose data scientists and software first and then try to fit them to the project use case.

Have an amazing 2016 for data science from the DecisionStats team and I hope you liked us in 2015!

 

 

Python for R Users A Data Science Approach

Coming up in the new year is my new book on enabling polyglotism in data science. It is called Python for R Users: A Data Science Approach, by Wiley (due in 2016).

It will basically expose the target reader (a data science professional) to a small subset of the Python language that is most pertinent to data science.


What the Internet does for people like me in developing countries

  1. It gives us access to the best knowledge, teaching and experts for free
  2. It gives us unfettered entertainment: free music on YouTube, and TV shows like Game of Thrones instead of waiting years for our government to approve them
  3. It allows us to criticize our leaders on blogs, Facebook and Twitter without getting censored by corrupt politicians and a corrupt media-government nexus
  4. It allows us to keep in touch via Skype and Facebook with people far away without straining our purse
  5. It allows us to learn a lot without paying a lot

That is just me, an urban citizen in a relatively decent economy. The benefits to underprivileged people are even greater.

Superpowers

India used to be a superpower, but we declined. China was a superpower, then it declined. So did Britain. So did Soviet Russia. The United States remains the aging Rocky Balboa of the superpowers, but you can see some decline in influence compared to when Clinton was President.

What do superpowers do?

  • They invest a lot of money in arms and defence
  • They earn a lot of money from trade so they can invest it in arms
  • They put their own interests ahead of the interests of their neighbours and competitors
  • They pretend to go to war if you hurt a single citizen, but they themselves do not do much when thousands of their citizens are maltreated by pollution, by exploitative working conditions, by small arms and guns, by crime, by inequality

Ultimately I think Switzerland is the only superpower. Their superpower lies in not pretending to be super at all.

During trade and now climate negotiations, the past, present and future superpowers collide. The needs of the many are more important than the egos of a few politicians, the brilliance of their advisers and the theatrics of a few.

Does the planet need a CEO? Probably yes, and the United Nations has failed to be a superpower or any power at all. It is just a conference holding organization.

The greatest generation, which won World War 2 in the West and defeated colonialism in the East, was succeeded by the Baby Boomer generation, which just boomed and consumed. The next generation will pay the price of the past few generations. The country that takes the best care of its next generation, building a healthy, productive workforce for both economic and defence deployment, will win the race to be the Superbpower. That's not a typo. Stop being a superpower and start being a superb power.

In the meantime, I would rather see Matt Damon colonize Mars and Rocky Balboa teach boxing to the next generation.