Home » Interviews

Category Archives: Interviews

Writing on APIs for Programmable Web

I have been writing free lance on APIs for Programmable Web. Here is an updated list of the articles, many of these would be of interest to analytics users. Note- some of these are interviews and they are in bold. Note to regular readers: I keep updating this list , and at each updation bring it to the front page, then allowing the blog postings to slide it down!

Scoreoid Aims to Gamify the World Using APIs January 27th, 2014

Plot.ly’s Plot to Visualize More Data January 22nd, 2014

LumenData’s Acquisition of Algorithms.io is a Win-Win January 8th, 2014

Yactraq API Sees Huge Growth in 2013  January 6th, 2014

Scrape.it Describes a Better Way to Extract Data December 20th, 2013

Exclusive Interview: App Store Analytics API December 4th, 2013

APIs Enter 3d Printing Industry November 29th, 2013

PW Interview: José Luis Martinez of Textalytics November 6th, 2013

PW Interview Simon Chan PredictionIO November 5th, 2013

PW Interview: Scott Gimpel Founder and CEO FantasyData.com October 23rd, 2013

PW Interview Brandon Levy, cofounder and CEO of Stitch Labs October 8th, 2013

PW Interview: Jolo Balbin Co-Founder Text Teaser  September 18th, 2013

PW Interview:Bob Bickel CoFounder Redline13 July 29th, 2013

PW Interview : Brandon Wirtz CTO Stremor.com   July 4th, 2013

PW Interview: Andy Bartley, CEO Algorithms.io  June 4th, 2013

PW Interview: Francisco J Martin, CEO BigML.com 2013/05/30

PW Interview: Tal Rotbart Founder- CTO, SpringSense 2013/05/28

PW Interview: Jeh Daruwala CEO Yactraq API, Behavorial Targeting for videos 2013/05/13

PW Interview: Michael Schonfeld of Dwolla API on Innovation Meeting the Payment Web  2013/05/02

PW Interview: Stephen Balaban of Lamda Labs on the Face Recognition API  2013/04/29

PW Interview: Amber Feng, Stripe API, The Payment Web 2013/04/24

PW Interview: Greg Lamp and Austin Ogilvie of Yhat on Shipping Predictive Models via API   2013/04/22

Google Mirror API documentation is open for developers   2013/04/18

PW Interview: Ricky Robinett, Ordr.in API, Ordering Food meets API    2013/04/16

PW Interview: Jacob Perkins, Text Processing API, NLP meets API   2013/04/10

Amazon EC2 On Demand Windows Instances -Prices reduced by 20%  2013/04/08

Amazon S3 API Requests prices slashed by half  2013/04/02

PW Interview: Stuart Battersby, Chatterbox API, Machine Learning meets Social 2013/04/02

PW Interview: Karthik Ram, rOpenSci, Wrapping all science API2013/03/20

Viralheat Human Intent API- To buy or not to buy 2013/03/13

Interview Tammer Kamel CEO and Founder Quandl 2013/03/07

YHatHQ API: Calling Hosted Statistical Models 2013/03/04

Quandl API: A Wikipedia for Numerical Data 2013/02/25

Amazon Redshift API is out of limited preview and available! 2013/02/18

Windows Azure Media Services REST API 2013/02/14

Data Science Toolkit Wraps Many Data Services in One API 2013/02/11

Diving into Codeacademy’s API Lessons 2013/01/31

Google APIs finetuning Cloud Storage JSON API 2013/01/29

Ergast API Puts Car Racing Fans in the Driver’s Seat 2012/12/05
Springer APIs- Fostering Innovation via API Contests 2012/11/20
Statistically programming the web – Shiny,HttR and RevoDeploy API 2012/11/19
Google Cloud SQL API- Bigger ,Faster and now Free 2012/11/12
A Look at the Web’s Most Popular API -Google Maps API 2012/10/09
Cloud Storage APIs for the next generation Enterprise 2012/09/26
Last.fm API: Sultan of Musical APIs 2012/09/12
Socrata Data API: Keeping Government Open 2012/08/29
BigML API Gets Bigger 2012/08/22
Bing APIs: the Empire Strikes Back 2012/08/15
Google Cloud SQL: Relational Database on the Cloud 2012/08/13
Google BigQuery API Makes Big Data Analytics Easy 2012/08/05
Your Store in The Cloud -Google Cloud Storage API 2012/08/01
Predict the future with Google Prediction API 2012/07/30
The Romney vs Obama API 2012/07/27

Interview: Linkurious aims to simplify graph databases

linkurious-239x60-trHere is an interview with a really interesting startup Linkurious and it’s co-founders Sebastien Heymann( also co-founder of Gephi) and Jean Villedieu. They are hoping to making graph databases easier to use and thus spur on their usage.

Decisionstats (DS)-  How did you come up about setting across your startup

Linkurious (L) -A lot of businesses are struggling to understand the connections within their data. Who are the persons connected to this financial transaction? What happens to the telecommunication network if this antenna fails? Who is the most influential person in this community? There are a lot of questions that involve a deep understanding of graphs. Most business intelligence and data visualization tools are not adapted for these questions because they have a hard time handling queries about connections and because their interface is not suited for network visualization.
I noticed this because I co-founded a graph visualization software called Gephi a few years ago. It quickly became a reference and the software was downloaded 250k times last year. It really helped people understand the connections in their data in a new way.
In 2013, this success inspired me to found Linkurious. The idea is to provide a solution that’s easy to use to democratize graph visualization.

What does it mean?
We want to help people understand the connection in their data. Linkurious is really easy to use and optimized for the exploration of graphs.
You can install it in minutes. Then, it gives you a search interface through which you can query the data. What’s special about our software is that the result of your search is represented as a graph that you can explore dynamically. Contrary to Gephi or other graph visualization tools, Linkurious only shows you a limited subset of your data and not the whole graph. The goal here is to focus on what the user is looking for and help him find an answer faster.
In order to do that, Linkurious also comes with the ability to filter nodes or color them according to their properties. This way, it’s much faster to understand the data.

DS- How do you support packages from Python , and R and other languages like Julia? What is Linkurious based on?

L- Linkurious is largely based on a stack of open-source technologies. We rely on Neo4j, the leading graph database to store and access the data. Neo4j can handle really large datasets, this means that our users can access the information much faster than with a traditional SQL database. Neo4j also comes with a query language that allows “smart search”, locating nodes and relationships based on rules like “what’s the shortest path between these 2 nodes?” or “who among the close network of this person has been to London and loves sushi”. That’s the kind of things that Facebook delivers via Graph Search and it’s exciting to see these technologies applied in the business world.
We also use Nodejs, Sigmajs and ElasticSearch.

DS-  Name  a few case studies where enterprises have used graphical analysis for great benefit?

L- There really are a lot of use cases for graph visualization and we are learning about it almost every day. There are well know applications that are connected to security. For example, graph databases are great to identify suspicious patterns across a variety of data sources. People using false identities to defraud bank tend to share addresses, phone numbers or names. Without graphs, it’s hard to see how they are connected and they tend to remain undetected until it’s too late. Graph visualization can be triggered by alert systems. Then, analysts can investigate the data and decide whether the alert should be escalated or not.
In the telecom industry, you can use graph to map your network and identify weak links, assess the potential of a failure (i.e. impact analysis). Graph visualization helps understand these information and better manage the network.

We also have clients in the logistics, health or consulting industry. Every data oriented industry needs data visualization tools, and graphs offer powerful ways to ask new questions and reveal unforeseen information.

DS-What are some of the challenges with creating, sustaining and maintaining a cutting edge technology startup in Europe and France

L- There are a lot of challenges with creating and sustaining a challenges. I think the bigger ones are not necessarily location-related. The main issue is to build something people want. It’s certainly been our biggest challenge. We’ve used a lean startup approach to ship a prototype of our product as fast as we could. The first version of Linkurious was buggy and didn’t much interest from customers. But we did get feedback from a few people who really liked it. Since then, we’ve been focusing on them to develop our vision of Linkurious. We are pleased with the results, I think we are on the right path but it’s really a journey.
As for the more location-related challenges, I think France usually gets a bad rep for not being start-up friendly. Our experience has been quite the contrary. There are administrative annoyances but we also benefit from generous benefits, access to great engineers and a burgeoning startup eco-system!


The mission of Linurio.us is  to help users access and navigate graph databases in a simple manner so they can make sense of their data.

Some of their interesting solutions are here.

Interview Anne Milley JMP

 An interview with noted analytics thought leader Anne Milley from JMP. Anne talks of statistics, cloud computing, culture of JMP, globalization and analytics in general.

DecisionStats(DS) How was 2013 as a year for statistics in general and JMP in particular?  

Anne Milley-  (AM) I’d say the first-ever  International Year of Statistics (Statistics2013) was a great success! We hope to carry some of that momentum into 2014. We are fans of the UK’s 10-year GetStats campaign—they are in the third year, and it seems to be going really well. JMP had a very good year as well, with worldwide double-digit growth again. We are pleased to have launched version 11 of JMP and JMP Pro last year at our annual Discovery Summit user conference.

DS-  Any cloud computing plans for JMP?

AM- We are exploring options, but with memory and storage still so incredibly cheap on the desktop, the responsiveness of local, in-memory computing on Macs or Windows operating systems remains compelling. John Sall said it best in a blog post he wrote in December.  It is our intention to have a public cloud offering in 2014.

DS- Describe the company culture and environment in the JMP division. Any global plans?

AM- John Sall’s passion to bring interactive, intuitive data visualization and analysis on the desktop continues. There is a strong commitment in the JMP division to speeding the statistical discovery process and making it fun. It’s a powerfully motivating factor to work in an environment where that passion and purpose are shared, and where we get to interact with customers who are also passionate users of JMP, many of whom use JMP and SAS together.

While a majority of JMP personnel are in Cary, North Carolina, almost half the staff are contributing from other states and countries. JMP is sold everywhere we have SAS offices (in 59 countries). JMP has localized versions in seven languages, and we keep getting requests for more.

DS- You have been a SAS Institute veteran for 15 years now. What are some of the ups and downs you remember as milestones in the field of analytics?

AM- The most exciting milestone is that analytics has been getting more attention in the last few years, thanks to a combination of factors. Analytics is a very inclusive term (statistics, optimization, data mining, machine learning, data science, etc.), but statistics is the main discipline we draw on when we are trying to make informed decisions in the face of uncertainty. In the early days of data mining, there was a tension between statisticians and data miners/machine learners, but we now have a richer set of methods (with more solid theoretic underpinnings) with which to analyze data and make better decisions. We have better ways to automate parts of the model-building process as well, which is important with ever-wider data. In the early days of data mining, I remember many reacting with “Why spend so much time dredging through opportunistically collected data, when statistics has so much more to offer, like design of experiments?” There is still some merit to that, and maybe we will see the pendulum swing back to doing more with information-rich data.

DS- What are your top three forecasts for analytics technology in 2014?

AM- My perspective may be different than others on what’s trending in analytics technology, but as we try to do more with more data, here are my top three picks:

  • We will continue to innovate new ways to visualize data and statistical output to capitalize on our high visual bandwidth. (Examples of some of our recent innovations can be found on the JMP Blog.)

  • We will continue to see innovative ways to create more analytic bandwidth and democratize analytics—for example, more quickly build and deploy analytic applications and interactive visualizations for others to use.

  • We will see more integration with commonly used analytical tools and infrastructure to help analysts be more productive.

DS-  How do you maintain work-life balance?

AM- I enjoy what I do and the great people I work with; that is part of what motivates me each day and is added to the long list of things for which I’m grateful. Outside of work, I enjoy spending time with family, regular exercise, organic gardening and other creative pursuits.

DS-As a senior technology management person working for the past 15 years, do you think technology is a better employer for women employees than it was in the 1990s? What steps can be done to increase this?

AM- I certainly see more support for women in technology with various women-in-technology organizations and programs around the world. And I also see more encouragement for girls and young women to get more exposure to science, technology, engineering, math, and statistics and consider the career options knowledge of these areas could bring. But there is more to do. I would like to add statistics to the STEM list explicitly since many still consider statistics a branch of math and don’t appreciate that statistics is the science/language of science. (Florence Nightingale said that statistics is “the most important science in the whole world.”) This year, we will see the first Women in Statistics Conference “enticing, elevating, and empowering careers of women in statistics.” There are several organizations and programs out there advocating for women in science, engineering, statistics and math, which is great. The resources such organizations provide for networking, mentoring, career development and making role models more visible are important in raising awareness on what the impediments are and how to overcome them. We should all read Sheryl Sandberg’s re-release of Lean In for Graduates (due out in April). Thank you for asking this question!


Anne oversees product management and analytic strategy in JMP Product Marketing. She is a contributing faculty member for the International Institute of Analytics.

2013 Thank You Note

I would like to write a thank you note to  some of the people who helped make Decisionstats.com possible . We had a total of 150,644 views this year.For that, I have to thank you dear readers for putting up with me- it is now our seventh year.

Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec Total
13,940 12,153 12,948 13,371 12,778  12,085  12,894  11,934  9,914  14,764  12,907  10,956  150,644

I would like to thank Chris  (of Mashape) for helping me with some of the interviews I wrote here .I did 26 interviews this year for Programmable Web and a total of 30+ articles including the interviews in 2013.

Of course- we have now reached 116 excellent interviews on Decisionstats.com alone ( see http://goo.gl/V6UsCG )I would like to thank each one of the interviewees who took precious time to fill out the questions.

Sponsors- I would like to thank Dr Eric Siegel ( individually as an author and as founder chair of www.pawcon.com ) , Nadja and Ingo (for Rapid-Miner) , Dr Jonathan ( for Datamind) , Chris M (for Statace.com ) , Gergely ( Author) and many more during all these six years who have kept us afloat and the servers warm in these days of cold reflection, including Gregory (of KDNuggets.com) and erstwhile AsterData founders.

Training Partners- I would like to thank Lovleen Bhatia ( of Edureka  for giving me the opportunity to make http://www.edureka.in/r-for-analytics which now has 1721 learners as per http://www.edureka.in/)

I would also specially say Thank you to Jigsaw Academy for giving me the opportunity to create
the first affordable and quality R course in Asia http://analyticstraining.com/2013/jigsaw-completes-training-of-300-students-on-r/

These training courses including those by Datamind and Coursera remain a formidable and affordable alternative to many others catching up in the analytics education game in India ( an issue I wrote here)

Each and Everyone of my students (past and present) and Everyone in the #rstats  and SAS-L community, including people who may have been left out.

Thank you sir, for helping me and Decisionstats.com !

Wish each one of you a very happy and Joyous Happy New Year and a great and prosperous 2014!

Karl Rexer Interview on the state of Analytics

To cap off a wonderful year, we have decided to interview Karl Rexer , founder of http://www.rexeranalytics.com/ and of the data mining survey that is considered the Industry benchmark for the state of the industry in analytics.

Ajay: Describe the history behind doing the survey , how you came up with the idea and what all players do you think survey the data mining and statistical software market apart from you

 Karl: Since the early 2000s I’ve been involved on the organizing and review committees for several data mining conferences and workshops. Early in the 2000s, in the hallways at these conferences I heard many analytic practitioners discussing and comparing their algorithms, data sources, challenges, tools, etc. Since we were already conducting online surveys for several of our clients, and my network of data miners is pretty large, I realized that we could easily do a survey of data miners, and share the results with the data mining community. I saw that the gap was there (and the interest), and we could help fill it. It was a way to give back to the data mining community, and also to raise awareness in the marketplace for my company, Rexer Analytics. So in 2007 we launched the first Data Miner Survey. In the first year, 314 data miners participated, and it’s just grown from there. In each of the last two surveys, over 1200 people participated. The interest we’ve seen in our research summary reports has also been astounding – we get thousands of requests for the summary reports each year. Overall, this just confirms what we originally thought: both inside the industry and beyond, people are hungry for information about data mining.

Are there other surveys and reviews of analytic professionals and the analytic marketplace? Sure. And there’s room for a variety of methodologies and perspectives. Forester and Gartner produce several reports that cover the analytic marketplace – they largely focus on software evaluations and IT trends. There are also surveys of CIOs and IT professionals that sometimes cover analytic topics. James Taylor (Decision Management Solutions) conducted an interesting study this year of Predictive Analytics in the Cloud. And of course, there are also the KDnuggets single-question polls that provide a pulse on people’s views of topical issues.

Ajay: Over the years- what broad trends have you seen in the survey in terms of paradigms- name your top 5 insights over these years

Karl: Well, I can’t think of a fifth one, but I’ve got four key findings and trends we’ve seen over the years we’ve been doing the Data Miner Surveys:

  1. The dramatic rise of open-source data mining tools, especially R. Since 2010, R has been the most-used data mining tool. And in 2013, 70% of data miners report using R. R is frequently used along with other tools, but we also see an increasing number of data miners selecting R as their primary tool.
  2. Data miners consistently report that regression, decision trees, and cluster analysis are the key algorithms they turn to. Each of the surveys, from 2007 through 2013, has shown this same core triad of algorithms.
  3. The challenges data miners face are also consistent: Across multiple years, the #1 challenge data miners report has been “dirty data.”. The other top challenges are “explaining data mining to others” and “difficult access to data”. In response to the 2010 survey, data miners described their best practices in overcoming these three key challenges. A summary of their ideas is available on our website here: http://www.rexeranalytics.com/Overcoming_Challenges.html. And three linked “challenge” pages contain almost 200 verbatim best practice ideas collected from survey respondents.
  4. We also see that there is excitement among analytic professionals, high job satisfaction, and room for more and better analytics. People report that the number of analytic projects is increasing, and the size of analytic teams is increasing too. But still there’s room for much wider and more sophisticated use of analytics – only a minority of data miners consider their companies to be analytically sophisticated.

 Ajay: What percentage of people are now doing analytics on the cloud, on mobile, tablet , versus desktop

Karl: In the past few years we’ve seen a doubling in the percent of people who report doing some of their analytics using cloud environments. It’s still the minority of data miners, but it’s grown from 7% in 2010 to 10% in 2011, and 19% reporting using cloud environments in 2013.

Ajay:Your survey is free. How does it help your consulting practice?

Karl: Our main motivation for doing the Data Miner Survey is to contribute to the data mining community. We don’t want to charge a fee for the summary reports, because we want to get the information into as many people’s hands as possible. And we want people to feel free to send the report on to their friends and colleagues.

However, the Data Miner Survey does also help Rexer Analytics. It helps to raise the visibility of our company. It increases the traffic and links to our website, and therefore helps our Google rankings. And it is a great conversation starter.

Ajay: Name some statistics on how popular your survey has become over time- in terms of people filling the survey , and people reading the survey

Karl: In 2007 when we launched the first Data Miner Survey, 314 data miners participated, and it’s grown nicely from there. In each of the last two surveys, over 1200 people participated. The interest we’ve seen in our research summary reports has also been growing at a dramatic rate – recently we’ve been getting thousands of requests for the summary reports each year. Additionally, we have been unveiling the highlights of the surveys with a presentation at the Fall Predictive Analytics World conferences, and it is always a popular talk.

But the most gratifying aspects about the expanded interest in our Data Miner Survey are two things:

  1. The great conversations that the Data Miner Survey has initiated. I have wonderful conversations with people by phone, email and at conferences and at colleges about the findings, the trends, and about all the great ideas people have for new and exciting ways that they want to apply analytics in their domains – everything from human resource planning to cancer research, and customer retention to fraud detection. And many people have contributed ideas for new questions or topics that we have incorporated into the survey.
  2. Seeing that people in the data mining community find the survey results useful. Many students and young people entering the field have told us the summary reports provide a great overview of the field and emerging trends. And many software vendors have told us that the survey helps them better understand the needs and preferences of hands-on data mining practitioners. I’m often surprised to see new people and places that are reading and appreciating our survey. We get emails from all corners of the globe, asking questions about the survey, or asking to share it with others. Sometime last year after receiving a question from an academic researcher in Asia, I decided to check Google Scholar to see who is citing the Data Miner Survey in their books and published papers. The list was long. And the list of online news stories, blogs and other mentions of the Data Miner Survey was even longer. I started a list of citations, with links back to the places that are citing the Data Miner Survey – you can look at the list here: http://www.rexeranalytics.com/Data_Miner_Survey_Citations.html – there are over 100 places citing our research, and the list includes 15 languages. But even more surprising was finding that someone had created a Wikipedia entry about the Data Miner Surveys. I made a couple small edits, but then I stopped. The accepted rule in the Wikipedia community is to not edit things that one has a personal interest in. However, I want to encourage any Wikipedia authors out there to go and help update https://en.wikipedia.org/wiki/Rexer%27s_Annual_Data_Miner_Survey.

 Ajay -What do you think are the top 3 insightful charts from your 2013 Report

Karl-  OK, it’s tough for me to pick only 3.  I think that you should pick the three that you think are the most insightful, and then blog about them and the reasons you think they’re important.

 But if you want me to pick 3, then here are three good ones:
— R Usage graph on page 16 
Screenshot from 2013-12-26 06:37:34
— Algorithm graph on page 36  
Screenshot from 2013-12-26 06:39:10
— The pair of graphs on page 19 that show that there’s still a lot of room for improvement
Happy new year!
(Ajay- You can see the wonderful report at http://www.rexeranalytics.com/ especially  the collection of links in the top right corner of the  home page that cite this survey)

Interview Christian Mladenov CEO StatAce Excellent and Hot #rstats StartUp

Here is an interview with Christian Mladenov, CEO of Statace , a hot startup in cloud based data science and statistical computing.39c1c29

Ajay Ohri (AO)- What is the difference between using R by StatAce and using R by RStudio on a R Studio server hosted on Amazon EC2 

Christian Mladenov (CM)- There are a few ways in which I think StatAce is better:

  • You do not need the technical skills to set up a server. You can instead start straight away at the click of a button.

  • You can save the full results for later reference. With an RStudio server you need to manually save and organize the text output and the graphics.

  • We are aiming to develop a visual interface for all the standard stuff. Then you will not need to know R at all.

  • We are developing features for collaboration, so that you can access and track changes to data, scripts and results in a team. With an RStudio server, you manage commits yourself, and Git is not suitable for large data files.

AO- How do you aim to differentiate yourself from other providers of R based software including Revolution, RStudio, Rapporter and even Oracle R Enterprise

CM- We aim to build a scalable, collaborative and easy to use environment. Pretty much everything else in the R ecosystem is lacking one, if not two of these. Most of the GUIs lack a visual way of doing the standard analyses. The ones that have it (e.g. Deducer) have a rather poor usability. Collaboration tools are hardly built in. RStudio has Git integration, but you need to set it up yourself, and you cannot really track large source data in Git.

Revolution Analytics have great technology, but you need to know R and you need to know how to maintain servers for large scale work. It is not very collaborative and can become quite expensive.

Rapporter is great for generating reports, but it is not very interactive – editing templates is a bit cumbersome if you just need to run a few commands. I think it wants to be the place to go to after you have finalized the development of the R code, so that you can share it.  Right now, I also do not see the scalability.

With Oracle R Enterprise you again need to know R. It is a targeted at large enterprises and I imagine it is quite expensive, considering it only works with Oracle’s database. For that you need an IT team. Screenshot from 2013-11-18 21:31:08

AO- How do you see the space for using R on a cloud?

CM- I think this is an area that has not received enough quality attention – there are some great efforts (e.g. ElasticR), but they are targeted at experienced R users. I see a few factors that facilitate the migration to the cloud:

  • Statisticians collaborate more and more, which means they need to have a place to share data, scripts and results.

  • The number of devices people use is increasing, and now frequently includes a tablet. Having things accessible through the web gives more freedom.

  • More and more data lives on servers. This is both because it is generated there (e.g. click streams) and because it is too big to fit on a user’s PC (e.g. raw DNA data). Using it where it already is prevents slow download/upload.

  • Centralizing data, scripts and results improves compliance (everybody knows where it is), reproducibility and reliability (it is easily backed up).

For me, having R to the cloud is a great opportunity.

AO-  What are some of the key technical challenges you currently face and are seeking to solve for R based cloud solutions

CM- Our main challenge is CPU use, since cloud servers typically have multiple slow cores and R is mostly single-threaded. We have yet to fully address that and are actively following the projects that aim to improve R’s interpreter – pqR, Renjin, Riposte, etc. One option is to move to bare metal servers, but then we will lose a lot of flexibility.

Another challenge is multi-server processing. This is also an area of progress where we have do not yet have a stable solution.

AO- What are some of the advantages and disadvantages of being a Europe based tech startup vis a vis a San Fransisco based tech startup

CM-In Eastern Europe at least, you can live quite cheaply, therefore you can better focus on the product and the customers. In the US you need to spend a lot of time courting investors.

Eastern Europe also has a lot of technical talent – it is not that difficult or expensive to hire experienced engineers.

The disadvantages are many, and I think they out-weigh the advantages:

  • Capital is scarce, especially after the seed stage. This means startups either have to focus on profit which limits their ability to execute a grander vision or they need to move to the US which wastes a lot of time and resources.

  • There is limited access to customers, partners, mentors and advisors. Most of the startup innovation happens in the US and its users prefer to deal with local companies.

  • The environment in Europe is not as supportive in terms of events, media coverage, and even social acceptance. In many countries founders are viewed with a bit of suspicion, and failure frequently means the end to one’s credibility. Screenshot from 2013-11-18 21:30:46

AO- What advice would you give to aspiring data scientists

CM-Use open-source. R, Julia, Octave and the others are seeing a level of innovation that the commercial solutions just cannot match. They are very flexible and versatile, and if you need something specific, you should learn some Python and do it yourself.

Keep things reproducible, or at some point you will get lost. This includes a version control system.

Be active in the community. While books are great, sharing and seeking advice will improve your skills much faster.

Focus more on “why” you do something and “what” you want to achieve. Only then get technical about “how” you want to do it. Use a good IDE that facilitates your work and allows you to do the simple things fast. You know, like StatAce :)

AO- Describe your career journey from Student to CEO

CM-During my bachelor studies I worked as a software developer and customer intelligence analyst. This gave me a lot of perspective on software and data.

After graduating I got a job where I coordinated processes and led projects. This is where I discovered the importance of listening to customers, planning well in advance, and having good data to base decisions on.

In my master studies, it was my statistics-heavy thesis that made me think “why is there not a place where I can easily use the power of R on a machine with a lot of RAM?” This is when the idea for StatAce was born.


About StatAce-

Bulgarian StatAce is the winner of betapitch | global, which was held in Berlin on 6 July (read more about it here). The  team, driven by the lack of software for low, student budgets, came up with the idea of building “Google docs for professional statisticians” and eventually took home the first prize of the startup competition.

Interview Joseph Eapen CEO RMInsights #rstats #shiny

Here is an interview with Joseph Eapen, CEO of RMInsights   an exciting data science company that have been using R for their work in providing decision support for the Entertainment Industry (do not confuse them with rminsight- which is almost the same url but without the s)

I found some of the work in applied data science cool enough to request an interview and they were gracious to respond at short notice.

Joseph Eapen

Ajay Ohri (AO)- What are the innovative steps , products services and initiatives that you have been executing at rminsights.net

Joseph Eapen (JE)- Cinema: RMI has launched Cinema Audience Measurement with focus on Audience Appreciation. As you know, film making is an expensive and time-consuming exercise; to add to it, the audience sit through a movie, with great expectation, hoping it is worth their time, effort and money. Even though Box office collections is a yardstick of movie revenues, it is still far from any audience appreciation measures. CAM through http://screenratings.org provides a scientific tool that will aid you with movie appreciation scores and help you opine to aid others in choosing a movie to watch this week.

Television: Apart from CAM, RMI focuses on Data Analytics, especially forecasting techniques using R and Shiny and has developed a TV ratings forecaster as a Service.

Data Collection: Designs Mobile based Custom Surveys using ODK (OpenDataKit –Build, Collect, Aggregate)

AO- What are some of the applications and products that you have been developing using R language and R Shiny Applications

JE-  TV Ratings Forecaster as a Service is the product we have developed using R and Shiny. It uses Professor Rob J Hyndman’s Forecast package that provides methods and tools for displaying and analyzing univariate time series forecasts including exponential smoothing; using the Holt-Winters HW approach.

The user has to simply upload a csv file containing historical TV ratings of any TV channel(s), usually 48 time-segments for each day, decide on the forecast horizon and the system predicts and plots the forecast.

TV Ratings Forecaster as a Service

AO- What is your vision for research and analytics going forward ?

JE- We are all at an interesting juncture, almost everything we do is captured at a transactional level and the so called Bigdata will be accessible for analysis. So in future, I see that we will need research to bridge gaps in this transactional data, so a large role in decision making will be played by analytics more than research (as we know it now). Both will co-exists, but data mining and analytics will take center stage.

AO- Describe your career journey from students day to CEO

JE- After I finished my engineering (Electronics & Telecommunication) in Mumbai, I went to Dubai in search for a job. Within 10 days, I saw a job vacancy in the local newspaper for a Project Engineer at a leading Market Research company – PARC (A Gallup affiliate). I applied and after 2 rounds of interview, landed the job. The company had just signed a JV with AGB Italia (an Television Audience Measurement company) to conduct peoplemeter based TAM in the Gulf region, starting with UAE. The JV was called AGB Gulf. During the expansion phase of the project I was send to Milan to test and initiate the purchase of peoplemeters for the expansion, and trained at their headquarters in Switzerland. Soon I was promoted as Head of AGB Gulf, and ran it for 10 years.. During this time I led the service through an successful audit by the Industry.

Following this I was instrumental in the launch of TGI (Single Source from BMRB) in the Gulf and LEVANT markets, again through a JV between PARC and BMRB in UK. The JV was called TGI Arabia.

This was the first 13 years of my career. Post that I joined as Head of Research & Development at MediaEdge:CIA (A WPP Company) in Dubai. After 3 years, I returned to Mumbai and joined aMap (An overnight Television Ratings agency, competing fiercely with TAM Media Research – A Nielsen/Kantar Company) as Director and promoted to CEO within 11 months.

After 3 years with aMap, I was appointed as CEO of MRUC (An industry body that issues IRS – world’s largest readership study, in India)

Post MRUC, I joined RMI as CEO… and the journey continues…

AO- What advice would you give to aspiring data scientists

JE- As much as, there will be large amount of data to mine and lot of techniques available to make sense of it all, it may get overwhelming and we should not lose track of the purpose of mining, that is, application of the learning and understanding to enhance our life and the world around us. It should not be – Exciting to Mine and Good to Know… It should be way beyond that.. totally purpose driven and put to use, aggressively

AO- What do you think are the criterion that companies should take into factor while outsourcing research and analytics (apart from cost)

JE- Apart from cost, the key factor should be turn-around speed and accountability. Not only should they be fast & meticulous, but how much they can stand behind their work and see that it serves the purpose, it was commissioned for, in the first place. So you see it goes beyond ‘just doing a good job’

About RMInsights 

We have studied human nature and behavior for more than 20 years. Our reputation for delivering relevant, timely, and visionary solutions on what people around the world think and feel is the cornerstone of the organization. We employ many of the world’s leading scientists in management, economics, psychology, and sociology. We study market conditions in local, regional, or national areas to examine potential of a product or service. We help companies understand what products people want, who will buy them, and at what price.

Joseph Eapen



Get every new post delivered to your Inbox.

Join 735 other followers

%d bloggers like this: