Interview Anne Milley JMP

Here is an interview with Anne Milley, Senior Director of Analytic Strategy at JMP.

Ajay- Review – How was the year 2012 for Analytics in general and JMP in particular?
Anne- 2012 was great!  Growing interest in analytics is evident—more analytics books, blogs, LinkedIn groups, conferences, training, capability, integration….  JMP had another good year of worldwide double-digit growth.

Ajay-  Forecast- What is your forecast for analytics in terms of top 5 paradigms for 2013?
Anne- In an earlier blog post, I predicted that we will continue to see more lively data and information visualizations—by that I mean more interactive and dynamic graphics for both data analysts and information consumers.
We will continue to hear about big data, data science and other trendy terms. As we amass more and more data histories, we can expect to see more innovations in time series visualization. I am excited by the growing interest we see in spatial and image analysis/visualization and hope those trends continue—especially more objective, data-driven image analysis in medicine! Perhaps this is not a forecast but a strong desire: to see more people realize and benefit from the power of experimental design. We are pleased that more companies—most recently SiSoft—have integrated with JMP to make DOE a more seamless part of the design engineer’s workflow.

Ajay- Cloud- Cloud computing seems to be the next generation of computing. What are JMP’s plans for cloud computing?
Anne- With so much memory and compute power on the desktop, there is still plenty of action on PCs. That said, JMP is Citrix-certified and we do see interest in remote desktop virtualization, but we don’t support public clouds.

Ajay- Events- What are your plans for the International Year of Statistics at JMP?
Anne- We kicked off our Analytically Speaking webcast series this year with John Sall in recognition of the first-ever International Year of Statistics. We have a series of blog posts on our International Year of Statistics site that features a noteworthy statistician each month, and in keeping with the goals of Statistics2013, we are happy to:

  • increase awareness of statistics and why it’s essential,
  • encourage people to consider it as a profession and/or enhance their skills with more statistical knowledge, and
  • promote innovation in the sciences of probability and statistics.

Both JMP and SAS are doing a variety of other things to help celebrate statistics all year long!

Ajay- Education Training-  How does JMP plan to leverage the MOOC paradigm (massive open online course) as offered by providers like Coursera etc.?
Anne- Thanks to you for posting this to the JMP Professional Network  on LinkedIn, where there is some great discussion on this topic.  The MOOC concept is wonderful—offering people the ability to invest in themselves, enhance their understanding on such a wide variety of topics, improve their communities….  Since more and more professors are teaching with JMP, it would be great to see courses on various areas of statistics (especially since this is the International Year of Statistics!) using JMP. JMP strives to remove complexity and drudgery from the analysis process so the analyst can stay in flow and focus on solving the problem at hand. For instance, the one-click bootstrap is a great example of something that should be promoted in an intro stats class. Imagine getting to appreciate the applied results and see the effects of sampling variability without having to know distribution theory. It’s good that people have options to enhance their skills—people can download a 30-day free trial of JMP and browse our learning library as well.
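JMP’s one-click bootstrap is a point-and-click feature, but the underlying idea is easy to see in code. Here is a minimal R sketch (my own illustration, not JMP output) of resampling a sample mean to show sampling variability without any distribution theory:

    # Minimal illustration of the bootstrap idea: resample the data with
    # replacement many times and look at the spread of the statistic.
    set.seed(42)
    x <- rnorm(50, mean = 100, sd = 15)            # a toy sample
    boot_means <- replicate(2000, mean(sample(x, replace = TRUE)))
    hist(boot_means, main = "Bootstrap distribution of the mean",
         xlab = "Resampled mean")
    quantile(boot_means, c(0.025, 0.975))          # rough 95% interval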

Ajay- Product- What are some of the exciting things JMP users and fans can look forward to in the next releases this year?
Anne- There are a number of enhancements and new capabilities planned for new releases of the JMP family of products, but you will have to wait to hear details…. OK, I’ll share a few!  JMP Clinical 4.1 will have more sophisticated fraud detection. We are also excited about releasing version 11 of JMP and JMP Pro this September.  JMP’s DOE capability is well-known, and we are pleased to offer a brand new class of experimental design—definitive screening designs. This innovation has already been recognized with The 2012 Statistics in Chemistry Award to Scott Allen of Novomer in collaboration with Bradley Jones in the JMP division of SAS. You will hear more about the new releases of JMP and JMP Pro at  Discovery Summit in San Antonio—we are excited to have Nate Silver as our headliner!

About-

Anne Milley directs analytic strategy in JMP Product Marketing at SAS. Her ties to SAS began with bank failure prediction at FHLB Dallas, and she continued using SAS at 7-Eleven Corporation in Strategic Planning. She has authored papers and served on committees for SAS Education conferences, KDD, and SIAM. In 2008, she completed a 5-month assignment at a UK bank. Milley completed her M.A. in Economics at Florida Atlantic University, did post-graduate work at RWTH Aachen, and is proficient in German.

JMP-

Introduced in 1989, JMP has grown into a family of statistical discovery products used worldwide in almost every industry. JMP is statistical discovery software  that links dynamic data visualization with robust statistics, in memory and on the desktop. From its beginnings, JMP software has empowered its users by enabling interactive analytics on the desktop. JMP products continue to complement – and are often deployed with – analytics solutions that provide server-based business intelligence.

 

Interview – Naveen Gattu, COO and Co-Founder at Gramener #dataviz

Here is an interview with Naveen Gattu, COO and co-founder of Gramener, one of the most happening data science companies.


Ajay- Describe the story so far for Gramener. What have been the key turning points ?

Naveen- All the founders of Gramener are first-generation entrepreneurs. We started our careers with IBM and were very successful in our corporate jobs with hefty pay packages, but there was always a thought at the back of our minds: can’t we work for ourselves and have FUN?

With this thought in mind, six of us got together in 2010 to lay the foundation for Gramener. With our consulting experience we wanted to get into business analytics, but we soon realized that while plenty of people were doing great analytics, the presentation was often not effective. We wanted to establish a niche for ourselves and create an offering to make “Data Consumption” easy and joyful.

 

Our significant milestone was Airtel Continue reading “Interview – Naveen Gattu, COO and Co-Founder at Gramener #dataviz”

Interview Jeff Allen Trestle Technology #rstats #rshiny

Here is an interview with Jeff Allen who works with R and the new package Shiny in his technology startup. We featured his RGL Demo in our list of Shiny Demos- here


Ajay- Describe how you started using R. What are some of the benefits you noticed on moving to R?

Jeff- I began using R in an internship while working on my undergraduate degree. I was provided with some unformatted R code and asked to modularize the code then wrap it up into an R package for distribution alongside a publication.

To be honest, as a Computer Science student with training more heavily emphasizing the big high-level languages, R took some getting used to for me. It wasn’t until after I concluded that initial project and began using R to do my own data analysis that I began to realize its potential and value. It was the first scripting language which really made interactive use appealing to me — the experience of exploring a dataset in R was unlike anything Continue reading “Interview Jeff Allen Trestle Technology #rstats #rshiny”

The making of an R startup Part 1 #rstats

Note- Decisionstats.com has done almost 105 interviews in the field of analytics, technology startups and thought leaders (you can see them here: http://goo.gl/m3l31). We have covered some of the R authors (R for SAS and SPSS Users, Data Mining using R, Machine Learning for Hackers), and noted R package creators (ggplot2, RCommander, rattle GUI, forecast).

But what we truly enjoy are interviews with startups in the R ecosystem, including the founders of Revolution Analytics, Inference for R, RStudio, and Cloudnumbers.

The latest startup in the R ecosystem with a promising product is Rapporter.net. It has actually been around for some time, but with the launch of their new product we ask them about the trials and tribulations of creating an open-source startup in the data science field.

This is part 1 of the interview with Gergely Daróczi, co-founder of the Rapporter project.


Ajay- Describe the journey of Rapporter till now, and your product plans for 2013.

Greg- The idea of Rapporter presented itself more than 3 years ago, while I was giving statistics, SPSS and R courses at different Hungarian universities and also creating custom statistical reports for a number of companies for a living.
Long story short, the three Hungarian co-founders faced similar problems in both sectors: students, just like business clients, admired the capabilities of R and the wide variety of tools found on CRAN, but were not at all eager to learn how to use them.
So we made some plans for how to let non-R users also build on the resources of R, and we came up with the idea of an intuitive web interface as an R front-end.

The real development of a helper R package (which later became “rapport”) started in January 2011 by Aleksandar Blagotić and me in our spare time, rather just for fun, as we had a dream about using “annotated statistical templates” in R after a few conversations on StackOverflow. We also worked on a front-end in the form of an Rserve-driven PHP engine with MySQL – to be dropped and completely rewritten later after some trying experiences and serious benchmarking.

We released the “rapport” package to the public at the end of 2011 on GitHub, and a few weeks later on CRAN too. Despite the fact that we did our best to create decent documentation and some live examples, we somehow forgot to spread the news of the new package to the R community, so “rapport” did not attract any serious attention.

Even so, our enthusiasm for annotated R “templates” did not wane as time passed, so we continued to work on “rapport” by adding new features, and Aleksandar also started to fortify his Ruby on Rails skills. We also dropped the Rserve-with-MySQL back-end and introduced Jeffrey Horner’s awesome RApache with some NoSQL databases.
To be honest, this change resulted in a one-year delay in releasing Rapporter and no end of headaches on our side, but in the long run it was a really smart move after all, as we now own an easily scalable and highly available cluster of servers.

But back to 2012.

As “rapport” grew too complex over time with newly added features, Aleksandar and I decided to split the package, a move which gave birth to “pander”. At that time “knitr” was becoming more and more familiar to R users, so it was a brave move to release “another” similar package, but the roots of “pander” were more than a year old: we used some custom methods not available in “knitr” (like capturing the R object beside the printed output of chunks), we needed tweakable global options instead of chunk options, and we really wanted to build on the power of Pandoc – just like before.

So we had a package for converting R objects to Pandoc’s markdown with a general S3 method, and another package to automatically run that, capture plots and images, and render a brew-like document to various output formats – like pdf, docx, odt etc.
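For readers who have not seen these packages, here is a minimal sketch of what “pander” does (the objects chosen are purely illustrative): it takes ordinary R objects and renders them as Pandoc markdown through a generic S3 method.

    # pander() dispatches on the class of the object and emits Pandoc markdown.
    library(pander)
    pander(head(mtcars[, 1:4]))                         # a data frame becomes a markdown table
    pander(summary(lm(mpg ~ wt + hp, data = mtcars)))   # a model summary becomes a formatted table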
In the summer, while Aleksandar dealt with the web interface, I worked on some new features in our packages:
• automatic and robust caching of chunks with various options for performance reasons,
• automatically unifying “base”, “lattice” and “ggplot2” images to the same style with user options – like major/minor grid color, font family, color palette, margins etc.
• adding other global options to “pander”, to let our prospective clients later personalize their custom report style with a few clicks (see the sketch after this list).
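As a rough sketch of what such global options look like in “pander” (a few of its documented options with arbitrary values; this is not Rapporter’s own configuration):

    library(pander)
    # Global options apply to every subsequent pander() call,
    # unlike chunk-level options in knitr.
    panderOptions("digits", 3)               # round numbers in tables
    panderOptions("decimal.mark", ",")       # e.g. a European decimal mark
    panderOptions("table.split.table", Inf)  # do not split wide tables
    pander(summary(cars))                    # rendered with the options above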

At the same time, we were searching for different options to prevent running malicious code in the parallel R sessions, which might compromise all our users’ sensitive data. Unfortunately no full-blown solution existed at that time, and we really wanted to steer clear of running some Java-based interpreters in our network.
So I started to create a parser for R commands, which was supposed to filter out malicious R commands before evaluation, and a bout of flu gave me some spare time to implement “sandboxR” with an open and live “hack my R server” demo, which ended up being a great challenge on my side, but proved to really work after all.
I also had a few conversations with Jeroen Ooms (the author of the awesome OpenCPU), who faced similar problems on his servers and was eager to prevent the issues with the help of AppArmor. The great news of “RAppArmor” did not make “sandboxR” needless (as AppArmor just cannot regulate inner R calls), but we started to evaluate all user-specified R commands in a separate hat, which allowed me to make “sandboxR” more permissive with its blacklisted functions.
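sandboxR’s actual implementation is more involved, but the general idea of statically inspecting code before evaluation can be sketched in base R (a simplified illustration of the technique, not the package’s real code):

    # Parse the code, collect every name it references, and refuse to run it
    # if anything is on a blacklist of dangerous functions.
    blacklist <- c("system", "unlink", "file.remove", "eval", "source")

    safe_eval <- function(code_string) {
      exprs <- parse(text = code_string)
      used  <- unique(unlist(lapply(exprs, all.names)))
      bad   <- intersect(used, blacklist)
      if (length(bad) > 0)
        stop("Blocked calls found: ", paste(bad, collapse = ", "))
      eval(exprs, envir = new.env(parent = baseenv()))
    }

    safe_eval("mean(1:10)")        # runs fine
    # safe_eval("system('ls')")    # would be rejected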
In the middle of the summer, I realized that we had an almost-working web application, with any number of R workers able to serve tons of users on the flexible NoSQL database back-ends, but we had no legal entity to release such a service under, nor did I have any solid financial backing to found one – moreover, the Rapporter project had already taken a huge amount from my family budget.

As I was against letting venture capital dominate the project, and did not find any accelerator that would take on a project with a maturing, almost market-ready product, a few associates and I decided to found a UK company on our own, having confidence in the future and in God.

So we founded Easystats Ltd, the company running rapporter.net, in July, and decided to release the first beta and pretty stable version of the application to the public at the end of September. At that time users could:
• upload and use text or SPSS sav data sets,
• specify more than 20 global options to be applied to all generated reports (like plot themes, table width, date format, decimal mark and number of digits, separators and copula in vectors etc.),
• create reports with the help of predefined statistical “templates”,
• “fork” (clone) any of our templates and modify without restriction, or create new statistical templates from scratch,
• edit the body or remove any part of the reports, resize images with the mouse or even with finger on touch-devices,
• and export reports to pdf, odt or docx formats.

A number of new features have been introduced since then:

• OpenBUGS integration with more permissive security profiles,
• custom styles for the exported documents (in LaTeX, docx and odt format), so users can generate unique and possibly branded reports,
• sharing public or even private reports with anyone via a simple hyperlink, without the need to register on rapporter.net,
• and letting our users integrate their templates into any homepage, blog post or even HTML mail, so that anyone can use the power of R with a few clicks, building on the knowledge of template authors and our reliable back-end.
Although 2 years ago I was pretty sure that this job would be finished in a few months and that we would possibly have a successful project in a year or two, now I am certain that a bunch of new features will make Rapporter more and more user-friendly, intuitive and extensible in the next few years.
Currently, we are working hard on a redesigned GUI, at last with the help of a dedicated UX team (which was a really important structural change in the life of Rapporter, as we can now really assign and split tasks just like we dreamed of when the project was a two-man show), which is to be finished no later than the first quarter of the year. Besides design issues, this change will also result in some new features, like ordering the templates, data sets and reports by popularity, rating or relevance for the currently active data set, and also letting users alter the style of the resulting reports in a more seamless way.

The next planned tasks for 2013 include:
• a “data transformation” front-end, which would let users rename and label variables in any uploaded data set, specify the level of measurement, recode/categorize variables or create new variables with the help of existing ones and/or any R functions,
• edit tables in reports on the fly (change the decimal mark, highlight some elements, rename columns and split tables to multiple pages with a simple click),
• a more robust API to let third-party users temporarily upload data to be used in the analysis,
• option to use multiple data sets in a template and to let users merge or connect data online,
• and some top-secret surprises.

Besides the above tasks, which were drawn up by us, our team is really interested in any feedback from users, which might change the above order or add new tasks with higher priority, so be sure to add your two cents on our support page.

And we will have to come up with some account plans with reasonable pricing in 2013 for the hosted service, to let us cover the server fees and development expenses. But of course Rapporter will remain free forever for users with basic needs (like analyzing data sets with only a few hundred cases) and for anyone in the academic sector, and we also plan to provide an option to run Rapporter “off-site” in any Unix-like environment.

Ajay- What are some of the Big Data use cases I can do with Rapporter?

Greg- Although we released the Rapporter beta only a few months ago, we have already heard some pretty promising use cases from our (potential) clients.

But I must emphasize that, at first, we are not committed to dealing with Big Data in the sense of user-contributed data sets with billions of cases; rather, we are concentrating on providing an intuitive and responsive way of analyzing traditional, survey-like data frames of up to about 100,000 cases.

Anyway, to be on topic: a really promising project of Optimum Dosing Strategies has been using Rapporter’s API for a number of weeks, even back in 2012, to compute optimal doses for different kinds of antibiotics based on Monte Carlo simulation and Bayesian adaptive feedback, among other methods.
This collaboration lets the ID-ODS team develop a powerful calculator with full-blown reports ready to be attached to medical records – without any special technical knowledge needed on their side: they code in R, while we maintain the R engine and the integration part. This results in pleased clients all over the world, which makes us happy too.

We really look forward to shipping a number of educational templates to be used in real life at several (multilingual) universities from September 2013. These templates would let teachers show customizable and interactive reports to students with any number of comments and narrative paragraphs; such introductory statistics modules would provide a free alternative to other desktop software used in education.

In the next few months, a part of our team will focus on spatial analysis templates, which would mean that our users could not just map, but really analyze any of their spatially related data with a few clicks and clear parameters.

Another feature request of a client seems to be a really exciting idea. Currently, Google Analytics and other tracking services provide basic options to view, filter and export the historical data of websites, blogs etc.
Creating an interface between Rapporter and these tracking services to fetch the most recent data is no longer beyond possibility with the help of existing API resources, so our clients could generate annotated usage reports for any specified period of time – without restrictions. Just to emphasize some potential add-ons: using the time-series R packages in the analysis, or creating real-time “dashboards” with optional forecasts on live data.

Of course, you could think of other kinds of live or historical data instead of Google Analytics – creating a template for, e.g., transaction data or the gas usage of a household could be addressed at any time – and please do not forget about the use cases referenced in the 3rd question (“[…] Rapporter can help: […]”).

But wait: the beauty of Rapporter is that you could implement all of the above ideas by yourself in our system, even without any help from us.

Ajay- What are some of things that can be easily done with Rapporter than with your plain vanilla R?

Greg- Rapporter is basically developed for creating reproducible, literate and annotated statistical modules (a.k.a. “templates”), which means that passing in a data set and a list of variables with some optional arguments ends up in a full-blown written report with automatically styled tables and charts.

So using Rapporter is like writing “Sweave” or “knitr” documents, but you write the template only once and then apply it to any number of data sets with a simple click on an intuitive user interface.
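For a flavour of what driving a “rapport” template from plain R looks like, here is a rough sketch (the template name “Descriptives” and the variable argument “var” are assumptions on my part for illustration; check the package documentation for the bundled templates and their parameters):

    # A template is written once, then applied to any compatible data set.
    library(rapport)
    report <- rapport("Descriptives", data = mtcars, var = "mpg")  # template name/args assumed
    report   # prints the generated report in the console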

Besides this major objective: as Rapporter runs in the cloud, and sharing reports and templates (or even data sets) with collaborators or with anyone on the Internet is really easy, our users can post and share any R code for free and without restrictions, or release their templates with a specified license and/or fees in a secured environment.

This means that Rapporter can help:

  1. scholars to share scientific results or methods with a reproducible and instantly available demo and/or dedicated implementation alongside their publications,
  2. teachers to create self-explanatory statistical templates which would help students internalize the subject through practice,
  3. any R developer to share a live and interactive demo of the features implemented in their functions with a few clicks,
  4. businesses to use a statistical platform without restrictions for a reasonable monthly fee instead of expensive and non-portable statistical programs,
  5. governments and national statistical offices to publicize census or other big data with a scientific and reliable analytic tool, with annotated and clear reports, while ensuring the anonymity of respondents by automatically applying custom disclosure-control methods (like data swapping, rounding, micro-aggregation, PRAM, adding noise etc.) to the tables and results – a generic sketch of such methods follows this list.
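To make the last point concrete, here is a generic base-R sketch of two of the simpler disclosure-control ideas mentioned above, rounding and adding noise (a toy illustration of the techniques, not Rapporter’s implementation):

    # Toy statistical disclosure control: perturb and coarsen an aggregate
    # before publishing it.
    set.seed(1)
    income <- round(rlnorm(200, meanlog = 10, sdlog = 0.5))

    publish_mean <- function(x, base = 100, noise_sd = 50) {
      noisy <- mean(x) + rnorm(1, sd = noise_sd)   # additive random noise
      base * round(noisy / base)                   # round to a coarse base
    }

    mean(income)          # exact value, kept private
    publish_mean(income)  # perturbed value, safer to publish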

And of course, do not forget one of our main objectives: to open up the world of R to non-R users too, with an intuitive, guiding user interface.

(To be continued)-

About

Gergely Daróczi coordinates the development of Rapporter and maintains its R packages. Besides trying to be active in some open-source projects and on StackOverflow, he is a PhD candidate in sociology and a lecturer at Corvinus University of Budapest and Pázmány Péter Catholic University in Hungary.

Rapporter is a web application helping you to create comprehensive, reliable statistical reports on any mobile device or PC, using an intuitive user interface.

The application builds on the power of R, among other technologies, and is intended to be used in any browser, with the heavy computations done on the server side. Some might consider Rapporter a customizable graphical user interface to R – running in the cloud.

Currently, Rapporter is under heavy development and only invited alpha testers can access the application. Please sign up for an invitation if you want an early-bird look at Rapporter.


Interview Rob Kabacoff, Author Quick-R #rstats

 

Here is an interview with Rob Kabacoff, Ph.D., author and creator of the popular R reference website Quick-R (http://www.statmethods.net/).

Ajay- What are the reasons you started using R?

Rob- I had been using SAS and SPSS for many years when I applied for a position that required a solid command of R programming. I had some experience using S in the early days and wanted to refresh my knowledge before the interview. I was very surprised to see how the language and platform had grown, and how powerful and comprehensive it had become in its new incarnation. It quickly became apparent that I would not be able to develop any kind of expertise in time for the interview. However, despite turning down the position, I became smitten with the language, and continue to use and study it to this day.

Ajay- What were your motivations in writing Quick-R and designing your website?

Rob- Although I was an experienced programmer and statistician, I found R a very difficult language to learn. The number of packages and functions available can feel overwhelming, and it can be hard to get a handle on the language as a whole. I learn best by teaching, so I created Quick-R as a place where people who were familiar with statistics, but not R, could jump into the language rapidly. It started out as a simple cookbook and has expanded ever since.

Ajay- What has been the feedback to your website so far?

Rob- The feedback has been amazing. I have received roughly 500 emails thanking me for the site, and there are 10,000+ unique visitors a day. A couple of years ago Manning Publications asked me to write a book about R, and Quick-R turned into “R in Action: Data Analysis and Graphics with R”. After only one year I am already writing a second edition (R changes fast!), but I still support Quick-R every day. Knowing how much it is used is incredibly gratifying.

Ajay- Name some consulting projects in which you used R to great effect (or real-world case studies with confidential details suppressed).

Rob– I do a lot of research on global leadership. The goal is to understand how leaders in different countries approach the leadership role, what behaviors they rely on, what behaviors they expect from others, and what values they bring to the table.
Differences among leaders in different cultures can be enormous – and understanding them can reduce misunderstandings, conflicts, and tensions. Such research frequently entails comparing the leadership behaviors of business executives and government officials in dozens of countries on dozens (or hundreds) of variables. It can be very challenging to understand such complicated observational data and communicate it in a meaningful way to a nontechnical audience. R really excels at both model building and graphics. In particular, I rely on packages like relaimpo to help identify the relative importance of variables in predicting leadership effectiveness, and graphics packages like ggplot2 to build plots that convey the results in easily digestible ways.
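As a flavour of that workflow, here is a minimal sketch combining relaimpo and ggplot2 on a built-in dataset (an illustration only; this is not Rob’s leadership data or models):

    # Relative importance of predictors in a linear model, then a quick plot.
    library(relaimpo)
    library(ggplot2)

    fit <- lm(mpg ~ wt + hp + disp + qsec, data = mtcars)
    ri  <- calc.relimp(fit, type = "lmg", rela = TRUE)   # share of R^2 per predictor

    df <- data.frame(predictor = names(ri@lmg), importance = ri@lmg)
    ggplot(df, aes(x = reorder(predictor, importance), y = importance)) +
      geom_bar(stat = "identity") +
      coord_flip() +
      labs(x = NULL, y = "Relative importance (share of R^2)")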

Ajay- Initiatives like Coursera, multiple free video lectures on the internet, and helpful websites like yours are helping introduce R to a broader audience than just a niche one. How can we make learning statistics and its tools more popular?

Rob- I love statistics, and actually think that it is becoming increasingly popular on its own. With the advent of big data, fast and powerful software, and the internet as a driving force, the field of statistics is finally becoming sexy. I am amazed at the number of jobs I see for data scientists of all types (analysts, programmers, modelers, data miners) listed in popular websites like Monster and Career Builder. I think that quantitatively oriented students will always gravitate to languages like R if there are practical books, videos, and websites that show real world applications. Once you see how something can be used, I think you are more willing to buckle down and learn the nitty-gritty details necessary to make it work. For people averse to programming, I think that easy to use GUIs become increasingly important. This is why IBM SPSS has done so well. RCommander and Deducer are good examples of GUIs that can help you to incorporate R into courses that do not include programming.

Ajay- How can we make statistics books more affordable to students while adequately compensating authors, including through the use of web-based tools?

Rob- Boy, that is a tough one. Quick-R is obviously free, and I donate the time and expense it takes to keep it running because I want to contribute to the community. Writing is much harder than I ever imagined, and the hundreds of hours it took to write R in Action were exhausting and painful. Even if I didn’t get royalties, I probably would still have written it, but I might not be doing a second edition now. To be honest, only a small portion of the income from traditionally published books goes to authors. The rest goes to the publisher, and I can’t speak to costs or profit. To bring the cost down, we would have to reduce the cost to publishers, their profits, or find an alternative distribution model. One solution may be to have authors publish small texts (booklets) that are less time-consuming to write and can be offered in PDF format for free or for a small fee. These can be practical how-to books, explanations of frequently misunderstood topics, or solutions to particular problems. Additionally, I have found that authors will frequently work for recognition (won’t we all?) as well as money. Rewarding authors with attention, and with opportunities to speak, teach, etc., may be very motivating for many such individuals. Perhaps we could create and promote more websites that aggregate donated online textbooks – giving aspiring authors an opportunity and an outlet for their writing, and an audience in the process.

About-

Rob has been a statistical consultant and research methodologist for more than 25 years. His Ph.D. was originally in psychology. For the past 15 years he has been head of research for Management Research Group, a global HR development firm in Portland, Maine and Dublin, Ireland.
Rob primarily studies cross-cultural leadership and issues of workplace diversity. Before that, Kabacoff was a graduate school professor in Southern Florida for 10 years, teaching multivariate statistics and statistical programming (and, surprisingly, family therapy and adult psychopathology).

The book inspired by the Quick-R website is now available! It takes the material there and significantly expands upon it. If you are interested, you can get it here. Use promo code ria38 for a 38% discount.

R in Action
Data Analysis and Graphics with R
Robert I. Kabacoff

August, 2011 | 472 pages
ISBN 9781935182399

 

Interview Rob J Hyndman Forecasting Expert #rstats

Here is an interview with Prof. Rob J Hyndman, who has created many time series forecasting methods and authored books as well as R packages on the subject.

Ajay -Describe your journey from being a student of science to a Professor. What were some key turning points along that journey?
 
Rob- I started a science honours degree at the University of Melbourne in 1985. By the end of 1985 I found myself simultaneously working as a statistical consultant (having completed all of one year of statistics courses!). For the next three years I studied mathematics, statistics and computer science at university, and tried to learn whatever I needed to in order to help my growing group of clients. Often we would cover things in classes that I’d already taught myself through my consulting work. That really set the trend for the rest of my career. I’ve always been an academic on the one hand, and a statistical consultant on the other. The consulting work has led me to learn a lot of things that I would not otherwise have come across, and has also encouraged me to focus on research problems that are of direct relevance to the clients I work with.
I never set out to be an academic. In fact, I thought that I would get a job in the business world as soon as I finished my degree. But once I completed the degree, I was offered a position as a statistical consultant within the University of Melbourne, helping researchers in various disciplines and doing some commercial work. After a year, I was getting bored doing only consulting, and I thought it would be interesting to do a PhD. I was lucky enough to be offered a generous scholarship which meant I was paid more to study than to continue working.
Again, I thought that I would probably go and get a job in the business world after I finished my PhD. But I finished it early and my scholarship was going to be cut off once I submitted my thesis. So instead, I offered to teach classes for free at the university and delayed submitting my thesis until the scholarship period ran out. That turned out to be a smart move because the university saw that I was a good teacher, and offered me a lecturing position starting immediately I submitted my thesis. So I sort of fell into an academic career.
I’ve kept up the consulting work part-time because it is interesting, and it gives me a little extra money. But I’ve also stayed an academic because I love the freedom to be able to work on anything that takes my fancy.
Ajay- Describe your upcoming book on Forecasting.
 
Rob- My first textbook on forecasting (with Makridakis and Wheelwright) was written a few years after I finished my PhD. It has been very popular, but it costs a lot of money (about $140 on Amazon). I estimate that I get about $1 for every book sold. The rest goes to the publisher (Wiley) and all they do is print, market and distribute it. I even typeset the whole thing myself and they print directly from the files I provided. It is now about 15 years since the book was written and it badly needs updating. I had a choice of writing a new edition with Wiley or doing something completely new. I decided to do a new one, largely because I didn’t want a publisher to make a lot of money out of students using my hard work.
It seems to me that students try to avoid buying textbooks and will search around looking for suitable online material instead. Often the online material is of very low quality and contains many errors.
As I wasn’t making much money on my textbook, and the facilities now exist to make online publishing very easy, I decided to try a publishing experiment. So my new textbook will be online and completely free. So far it is about 2/3 completed and is available at http://otexts.com/fpp/. I am hoping that my co-author (George Athanasopoulos) and I will finish it off before the end of 2012.
The book is intended to provide a comprehensive introduction to forecasting methods. We don’t attempt to discuss the theory much, but provide enough information for people to use the methods in practice. It is tied to the forecast package in R, and we provide code to show how to use the various forecasting methods.
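As a taste of that style, here is a minimal sketch on a built-in dataset (my own example, not code from the book):

    # Compare two forecasting methods on a holdout period.
    library(forecast)

    train <- window(AirPassengers, end = c(1958, 12))
    test  <- window(AirPassengers, start = c(1959, 1))

    fc_arima  <- forecast(auto.arima(train), h = length(test))
    fc_snaive <- snaive(train, h = length(test))

    accuracy(fc_arima, test)    # out-of-sample error measures
    accuracy(fc_snaive, test)
    plot(fc_arima)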
The idea of online textbooks makes a lot of sense. They are continuously updated so if we find a mistake we fix it immediately. Also, we can add new sections, or update parts of the book, as required rather than waiting for a new edition to come out. We can also add richer content including video, dynamic graphics, etc.
For readers that want a print edition, we will be aiming to produce a print version of the book every year (available via Amazon).
I like the idea so much I’m trying to set up a new publishing platform (otexts.com) to enable other authors to do the same sort of thing. It is taking longer than I would like to make that happen, but probably next year we should have something ready for other authors to use.
Ajay- How can we make textbooks cheaper for students as well as compensate authors fairly?
 
Rob- Well free is definitely cheaper, and there are a few businesses trying to make free online textbooks a reality. Apart from my own efforts, http://www.flatworldknowledge.com/ is producing a lot of free textbooks. And textbookrevolution.org is another great resource.
With otexts.com, we will compensate authors in two ways. First, the print versions of a book will be sold (although at a vastly cheaper rate than other commercial publishers). The royalties on print sales will be split 50/50 with the authors. Second, we plan to have some features of each book available for subscription only (e.g., solutions to exercises, some multimedia content, etc.). Again, the subscription fees will be split 50/50 with the authors.
Ajay- Suppose a person who used to use forecasting software from another company decides to switch to R. How easy and lucid do you think the current documentation on the R website is for business analytics practitioners such as these in the corporate world?
 
Rob- The documentation on the R website is not very good for newcomers, but there are a lot of other R resources now available. One of the best introductions is Matloff’s “The Art of R Programming”. Provided someone has done some programming before (e.g., VBA, python or java), learning R is a breeze. The people who have trouble are those who have only ever used menu interfaces such as Excel. Then they are not only learning R, but learning to think about computing in a different way from what they are used to, and that can be tricky. However, it is well worth it. Once you know how to code, you can do so much more.  I wish some basic programming was part of every business and statistics degree.
If you are working in a particular area, then it is often best to find a book that uses R in that discipline. For example, if you want to do forecasting, you can use my book (otexts.com/fpp/). Or if you are using R for data visualization, get hold of Hadley Wickham’s ggplot2 book.
Ajay- In a long and storied career, what is the best forecast you ever made? And the worst?
 
 Rob- Actually, my best work is not so much in making forecasts as in developing new forecasting methodology. I’m very proud of my forecasting models for electricity demand which are now used for all long-term planning of electricity capacity in Australia (see  http://robjhyndman.com/papers/peak-electricity-demand/  for the details). Also, my methods for population forecasting (http://robjhyndman.com/papers/stochastic-population-forecasts/ ) are pretty good (in my opinion!). These methods are now used by some national governments (but not Australia!) for their official population forecasts.
Of course, I’ve made some bad forecasts, but usually when I’ve tried to do more than is reasonable given the available data. One of my earliest consulting jobs involved forecasting the sales for a large car manufacturer. They wanted forecasts for the next fifteen years using less than ten years of historical data. I should have refused as it is unreasonable to forecast that far ahead using so little data. But I was young and naive and wanted the work. So I did the forecasts, and they were clearly outside the company’s (reasonable) expectations, and they then refused to pay me. Lesson learned. It’s better to refuse work than do it poorly.

Probably the biggest impact I’ve had is in helping the Australian government forecast the national health budget. In 2001 and 2002, they had underestimated health expenditure by nearly $1 billion in each year, which is a lot of money to have to find, even for a national government. I was invited to assist them in developing a new forecasting method, which I did. The new method has forecast errors of the order of plus or minus $50 million, which is much more manageable. The method I developed for them was the basis of the ETS models discussed in my 2008 book on exponential smoothing (www.exponentialsmoothing.net). And now anyone can use the method with the ets() function in the forecast package for R.
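A minimal example of that function (the dataset is chosen only for illustration):

    # Fit an exponential smoothing state space (ETS) model and forecast ahead.
    library(forecast)

    fit <- ets(USAccDeaths)      # model form selected automatically
    summary(fit)
    plot(forecast(fit, h = 24))  # two-year forecast with prediction intervals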
About-
Rob J Hyndman is Professor of Statistics in the Department of Econometrics and Business Statistics at Monash University and Director of the Monash University Business & Economic Forecasting Unit. He is also Editor-in-Chief of the International Journal of Forecasting and a Director of the International Institute of Forecasters. Rob is the author of over 100 research papers in statistical science. In 2007, he received the Moran medal from the Australian Academy of Science for his contributions to statistical research, especially in the area of statistical forecasting. For 25 years, Rob has maintained an active consulting practice, assisting hundreds of companies and organizations. His recent consulting work has involved forecasting electricity demand, tourism demand, the Australian government health budget and case volume at a US call centre.

Interview John Myles White , Machine Learning for Hackers

Here is an interview with one of the younger researchers  and rock stars of the R Project, John Myles White,  co-author of Machine Learning for Hackers.

Ajay- What inspired you guys to write Machine Learning for Hackers. What has been the public response to the book. Are you planning to write a second edition or a next book?

John- We decided to write Machine Learning for Hackers because there were so many people interested in learning more about Machine Learning who found the standard textbooks a little difficult to understand, either because they lacked the mathematical background expected of readers or because it wasn’t clear how to translate the mathematical definitions in those books into usable programs. Most Machine Learning books are written for audiences who will not only be using Machine Learning techniques in their applied work, but also actively inventing new Machine Learning algorithms. The amount of information needed to do both can be daunting, because, as one friend pointed out, it’s similar to insisting that everyone learn how to build a compiler before they can start to program. For most people, it’s better to let them try out programming and get a taste for it before you teach them about the nuts and bolts of compiler design. If they like programming, they can delve into the details later.

We once said that Machine Learning for Hackers  is supposed to be a chemistry set for Machine Learning and I still think that’s the right description: it’s meant to get readers excited about Machine Learning and hopefully expose them to enough ideas and tools that they can start to explore on their own more effectively. It’s like a warmup for standard academic books like Bishop’s.
The public response to the book has been phenomenal. It’s been amazing to see how many people have bought the book and how many people have told us they found it helpful. Even friends with substantial expertise in statistics have said they’ve found a few nuggets of new information in the book, especially regarding text analysis and social network analysis — topics that Drew and I spend a lot of time thinking about, but are not thoroughly covered in standard statistics and Machine Learning  undergraduate curricula.
I hope we write a second edition. It was our first book and we learned a ton about how to write at length from the experience. I’m about to announce later this week that I’m writing a second book, which will be a very short eBook for O’Reilly. Stay tuned for details.

Ajay-  What are the key things that a potential reader can learn from this book?

John- We cover most of the nuts and bolts of introductory statistics in our book: summary statistics, regression and classification using linear and logistic regression, PCA and k-Nearest Neighbors. We also cover topics that are less well known but just as important: density plots vs. histograms, regularization, cross-validation, MDS, social network analysis and SVMs. I hope a reader walks away from the book having a feel for what different basic algorithms do and why they work for some problems and not others. I also hope we do just a little to shift a future generation of modeling culture towards regularization and cross-validation.
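For readers who want to see what two of those ideas look like in practice, here is a small sketch of cross-validated, regularized logistic regression with glmnet (my own example, not code from the book):

    # L1-regularized logistic regression with the penalty chosen by cross-validation.
    library(glmnet)

    set.seed(123)
    x <- as.matrix(mtcars[, c("mpg", "wt", "hp", "disp")])
    y <- mtcars$am                                   # binary outcome: transmission type

    cv_fit <- cv.glmnet(x, y, family = "binomial", alpha = 1, nfolds = 5)

    cv_fit$lambda.min                # penalty with the lowest cross-validated error
    coef(cv_fit, s = "lambda.min")   # some coefficients shrunk toward (or to) zero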

Ajay- Describe your journey as a science student up till your Phd. What are you current research interests and what initiatives have you done with them?

John-As an undergraduate I studied math and neuroscience. I then took some time off and came back to do a Ph.D. in psychology, focusing on mathematical modeling of both the brain and behavior. There’s a rich tradition of machine learning and statistics in psychology, so I got increasingly interested in ML methods during my years as a grad student. I’m about to finish my Ph.D. this year. My research interests all fall under one heading: decision theory. I want to understand both how people make decisions (which is what psychology teaches us) and how they should make decisions (which is what statistics and ML teach us). My thesis is focused on how people make decisions when there are both short-term and long-term consequences to be considered. For non-psychologists, the classic example is probably the explore-exploit dilemma. I’ve been working to import more of the main ideas from stats and ML into psychology for modeling how real people handle that trade-off. For psychologists, the classic example is the Marshmallow experiment. Most of my research work has focused on the latter: what makes us patient and how can we measure patience?

Ajay- How can academia and private sector solve the shortage of trained data scientists (assuming there is one)?

John- There’s definitely a shortage of trained data scientists: most companies are finding it difficult to hire someone with the real chops needed to do useful work with Big Data. The skill set required to be useful at a company like Facebook or Twitter is much more advanced than many people realize, so I think it will be some time until there are undergraduates coming out with the right stuff. But there’s huge demand, so I’m sure the market will clear sooner or later.

The changes that are required in academia to prepare students for this kind of work are pretty numerous, but the most obvious required change is that quantitative people need to be learning how to program properly, which is rare in academia, even in many CS departments. Writing one-off programs that no one will ever have to reuse and that only work on toy data sets doesn’t prepare you for working with huge amounts of messy data that exhibit shifting patterns. If you need to learn how to program seriously before you can do useful work, you’re not very valuable to companies who need employees that can hit the ground running. The companies that have done best in building up data teams, like LinkedIn, have learned to train people as they come in since the proper training isn’t typically available outside those companies.
Of course, on the flipside, the people who do know how to program well need to start learning more about theory and need to start to have a better grasp of basic mathematical models like linear and logistic regressions. Lots of CS students seem not to enjoy their theory classes, but theory really does prepare you for thinking about what you can learn from data. You may not use automata theory if you work at Foursquare, but you will need to be able to reason carefully and analytically. Doing math is just like lifting weights: if you’re not good at it right now, you just need to dig in and get yourself in shape.
About-
John Myles White is a Ph.D. student in the Princeton Psychology Department, where he studies human decision-making both theoretically and experimentally. Along with the political scientist Drew Conway, he is the author of a book published by O’Reilly Media entitled “Machine Learning for Hackers”, which is meant to introduce experienced programmers to the machine learning toolkit. He is also working with Mark Hansen on a book for laypeople about exploratory data analysis. John is the lead maintainer for several R packages, including ProjectTemplate and log4r.

(TIL he has played in several rock bands!)

—–
You can read more in his own words at his blog at http://www.johnmyleswhite.com/about/
He can be contacted via social media at Google Plus at https://plus.google.com/109658960610931658914 or twitter at twitter.com/johnmyleswhite/