DecisionStats Interview Scott Draves Beaker Notebook

As part of my research for Python for R Users: A Data Science Approach (Wiley 2016), here is an interview with Scott Draves, awesome software artist and developer at Beaker Notebook. Beaker Notebook lets you use multiple languages (such as Python, R, JS, and Scala) together seamlessly in the same interface.

Ajay Ohri (AO) - What inspired you to make Beaker Notebook? What are some of the design decisions you took? How does it compare to Jupyter Notebook, and what do you see as the product roadmap ahead for Beaker (with current limitations, if any)?

Scott Draves (S) – Two Sigma uses a variety of tools. Some have been developed internally over many years, some are open source like Linux, Java, R, and IPython, and some are commercial such as MATLAB and Excel.  Beaker is inspired by all these systems, and many more. It’s a new synthesis on new infrastructure.  The design favors ease of use and high quality. Beaker is about working automatically with one click, and also having total programmability.

Jupyter (which was called IPython when we started) is definitely one of our inspirations.  In fact, Beaker is compatible with it, and when you run Python in Beaker, it’s talking to your existing IPython backend.  Beaker uses nginx as a reverse proxy to make a collection of backends (one for each language, plus Beaker’s core server) appear as a single application.
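(Editor's illustration, not part of the interview: the reverse-proxy idea can be sketched as a routing table that maps URL prefixes to per-language backends, with the core server as the fallback. The prefixes, port numbers, and the pick_backend helper below are all invented for this sketch and are not Beaker's actual nginx configuration.)

```python
# Hypothetical routing table: URL path prefix -> local backend port.
# These prefixes and ports are made up for illustration only.
ROUTES = {
    "/python/": 9001,
    "/r/": 9002,
    "/scala/": 9003,
}
CORE_PORT = 9000  # stand-in for the core server, which handles everything else


def pick_backend(path: str) -> int:
    """Return the port of the backend that should receive a request path."""
    # Longest matching prefix wins, similar in spirit to nginx prefix locations.
    matches = [prefix for prefix in ROUTES if path.startswith(prefix)]
    if not matches:
        return CORE_PORT
    return ROUTES[max(matches, key=len)]


for path in ["/python/execute", "/r/execute", "/notebook/open"]:
    print(path, "->", pick_backend(path))
```

From the browser's point of view there is a single address; only the proxy knows that different paths are served by different processes.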

Our roadmap is published on the wiki: https://github.com/twosigma/beaker-notebook/wiki/Roadmap


AO- To pass objects from Python to R I need rpy2. How does Beaker simplify this process? For example, if I want to use auto.arima from the forecast package on a Pandas time series, how would I do it?

S- Beaker’s autotranslation is simpler because it focuses on the data.  That means your R and Python code co-exist in independent cells, each in its native syntax, but they can communicate via the Beaker object, which is exposed in all languages.  With rpy2, by contrast, you access R through Python syntax.  For example, instead of

robjects.StrVector(['abc', 'def'])

in Beaker you can just say

c('abc', 'def')
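(Editor's illustration, not part of the interview: one way to picture a shared object that "focuses on the data" is a namespace that stores every value in a language-neutral form, so each language can read it back as its own native type. The SharedNamespace class below is a toy written for this post, and JSON is an assumed interchange format. It is not Beaker's API or implementation.)

```python
import json


class SharedNamespace:
    """Toy stand-in for a notebook-wide shared object (hypothetical)."""

    def __init__(self):
        # Bypass our own __setattr__ so the backing dict is a real attribute.
        object.__setattr__(self, "_store", {})

    def __setattr__(self, name, value):
        # Serialize on write so only language-neutral data is kept.
        self._store[name] = json.dumps(value)

    def __getattr__(self, name):
        # Called only when normal lookup fails, i.e. for shared variables.
        try:
            return json.loads(self._store[name])
        except KeyError:
            raise AttributeError(name) from None


shared = SharedNamespace()
shared.words = ["abc", "def"]  # what a cell in one language might write
print(shared.words)            # what a cell in another language would read back
```

Only plain data crosses the boundary in this sketch, which is why neither side needs to wrap the other language's objects in its own syntax.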

As for auto.arima, let me first note that, by coincidence, the #1 Google hit for [auto.arima] is a web site that uses a Flame as its banner, i.e., an image made with an algorithm that I open sourced in the early 90s (see below).

Screen Shot 2015-12-06 at 1.49.17 PM

But anyway, I took the example from the bottom of that page and made it work with a random Pandas data frame.  Here’s the Beaker notebook: https://pub.beakernotebook.com/#/publications/56648fcc-2e8e-41a6-aa4a-1249ee39023c

One improvement in the works is replacing beaker::get('df') with beaker$df.

AO- Which industries or businesses would most benefit from the ability to use Python, R, JS, Scala, etc. in the same notebook?

 

S- Any industry that works rapidly with data in a quantitative and scientific style benefits.  Autotranslation increases your options and makes experimentation and mash-ups easy.  That would include traditional sciences such as genetics and physics, and also business applications in finance, data mining, and machine learning.  But users come from all over. Beaker is at its heart a general purpose tool for exploring with code and data, so we believe the benefits could be widespread.

AO- How would Beaker Notebook be useful for Big Data Analytics and Data Science?

S- Beaker is great for exploring data sets with code, visualization, and tables.  And you can turn your research into applications, without recoding, because Beaker notebooks are repeatable, reproducible, and remixable.

AO- Describe your journey in science, including earlier famous projects like Electric Sheep. What were the key points, and what keeps you learning new technologies or pivoting to new projects?

S- The long version: my journey started at Brown University developing fnord, a generative GUI and language for mathematical research and teaching, especially calculus and differential geometry. Think curves and surfaces in 3D with sliders.  I was also working at IRIS, then I worked for Andries van Dam in his graphics group and for Thomas Banchoff in the Math dept.  Back in 1988-1990 there was a phenomenal network of people to collaborate with and learn from.  That’s when I became interested in Open Source, initially through the Emacs text editor and LISP UI, which was the first project that I ever contributed back to.

I did my PhD research at CMU SCS, CS Dept.  Early on I had some fortuitous internships, one at SGI working on IRIS Explorer and the other in Tokyo at NTT-Data.  It was there, on an unused supercomputer, that I generated the first Flames, which became the first Open Source artwork.  Later, back in Pittsburgh, I developed Bomb, an “interactive visual musical instrument” that got me into making projection installations and eventually VJing.  I was very lucky to have Peter Lee as my teacher and adviser; he helped me find my voice and also gave me plenty of rope.

My research at CMU culminated in a thesis on meta-programming for media processing, i.e., using compilers and types to build low-latency, high-bandwidth systems that are still flexible and allow dynamic experimentation.  The thesis document was generated by a markup language implemented in Scheme that compiled and ran my research code, measured its performance, generated the graphs, and could produce LaTeX for typesetting or HTML for the web.  That was published in 1997, all open source.

I graduated and went to San Francisco, where I worked at Transmeta, along with Linus Torvalds, on a virtual microprocessor, and at another startup doing internet streaming media infrastructure (now it was 1999).  It was this startup/tech environment of the Bay Area, including Burning Man and the VJ scene, that gave birth to the Electric Sheep.  It’s been evolving ever since.

AO- How can we further increase the supply of Data Scientists? How would people in education find Beaker Notebook useful?

S- Improving UIs like Beaker’s makes it easier for people to get started with data science.  And because Beaker has one UI for multiple languages, students can spend more time on the scientific and statistical concepts, and less time learning a new GUI for each new language.  Being a web application, Beaker can also be delivered as a service (Domino Data Lab does this already), which helps deal with config/install/OS problems on uncontrolled student laptops.  So I hope data in education and also data in civic discourse will benefit and expand.

About
Scott Draves is the inventor of Fractal Flames and the leader of the distributed computing project Electric Sheep.  He is currently employed by Two Sigma to develop the Beaker Notebook.

Beaker is a notebook-style development environment for working interactively with large and complex datasets. Its plugin-based architecture allows you to switch between languages


Author: Ajay Ohri

http://about.me/ajayohri
