Interview Christian Mladenov CEO StatAce Excellent and Hot #rstats StartUp

Here is an interview with Christian Mladenov, CEO of StatAce, a hot startup in cloud-based data science and statistical computing.

Ajay Ohri (AO)- What is the difference between using R through StatAce and using R through an RStudio server hosted on Amazon EC2?

Christian Mladenov (CM)- There are a few ways in which I think StatAce is better:

  • You do not need the technical skills to set up a server. You can instead start straight away at the click of a button.

  • You can save the full results for later reference. With an RStudio server you need to manually save and organize the text output and the graphics.

  • We are aiming to develop a visual interface for all the standard stuff. Then you will not need to know R at all.

  • We are developing features for collaboration, so that you can access and track changes to data, scripts and results in a team. With an RStudio server, you manage commits yourself, and Git is not suitable for large data files.

AO- How do you aim to differentiate yourself from other providers of R-based software, including Revolution, RStudio, Rapporter and even Oracle R Enterprise?

CM- We aim to build a scalable, collaborative and easy to use environment. Pretty much everything else in the R ecosystem is lacking one, if not two, of these. Most of the GUIs lack a visual way of doing the standard analyses. The ones that have it (e.g. Deducer) have rather poor usability. Collaboration tools are hardly ever built in. RStudio has Git integration, but you need to set it up yourself, and you cannot really track large source data in Git.

Revolution Analytics have great technology, but you need to know R and you need to know how to maintain servers for large scale work. It is not very collaborative and can become quite expensive.

Rapporter is great for generating reports, but it is not very interactive – editing templates is a bit cumbersome if you just need to run a few commands. I think it wants to be the place to go to after you have finalized the development of the R code, so that you can share it.  Right now, I also do not see the scalability.

With Oracle R Enterprise you again need to know R. It is targeted at large enterprises and I imagine it is quite expensive, considering it only works with Oracle’s database. For that you need an IT team.

AO- How do you see the space for using R on a cloud?

CM- I think this is an area that has not received enough quality attention – there are some great efforts (e.g. ElasticR), but they are targeted at experienced R users. I see a few factors that facilitate the migration to the cloud:

  • Statisticians collaborate more and more, which means they need to have a place to share data, scripts and results.

  • The number of devices people use is increasing, and now frequently includes a tablet. Having things accessible through the web gives more freedom.

  • More and more data lives on servers. This is both because it is generated there (e.g. click streams) and because it is too big to fit on a user’s PC (e.g. raw DNA data). Using it where it already is prevents slow download/upload.

  • Centralizing data, scripts and results improves compliance (everybody knows where it is), reproducibility and reliability (it is easily backed up).

For me, bringing R to the cloud is a great opportunity.

AO- What are some of the key technical challenges you currently face and are seeking to solve for R-based cloud solutions?

CM- Our main challenge is CPU use, since cloud servers typically have multiple slow cores and R is mostly single-threaded. We have yet to fully address that and are actively following the projects that aim to improve R’s interpreter – pqR, Renjin, Riposte, etc. One option is to move to bare metal servers, but then we will lose a lot of flexibility.

Another challenge is multi-server processing. This is also an area of progress where we do not yet have a stable solution.

AO- What are some of the advantages and disadvantages of being a Europe-based tech startup vis-a-vis a San Francisco-based tech startup?

CM- In Eastern Europe at least, you can live quite cheaply, so you can focus better on the product and the customers. In the US you need to spend a lot of time courting investors.

Eastern Europe also has a lot of technical talent – it is not that difficult or expensive to hire experienced engineers.

The disadvantages are many, and I think they outweigh the advantages:

  • Capital is scarce, especially after the seed stage. This means startups either have to focus on profit, which limits their ability to execute a grander vision, or they need to move to the US, which wastes a lot of time and resources.

  • There is limited access to customers, partners, mentors and advisors. Most of the startup innovation happens in the US and its users prefer to deal with local companies.

  • The environment in Europe is not as supportive in terms of events, media coverage, and even social acceptance. In many countries founders are viewed with a bit of suspicion, and failure frequently means the end to one’s credibility.

AO- What advice would you give to aspiring data scientists?

CM-Use open-source. R, Julia, Octave and the others are seeing a level of innovation that the commercial solutions just cannot match. They are very flexible and versatile, and if you need something specific, you should learn some Python and do it yourself.

Keep things reproducible, or at some point you will get lost. This includes a version control system.

Be active in the community. While books are great, sharing and seeking advice will improve your skills much faster.

Focus more on “why” you do something and “what” you want to achieve. Only then get technical about “how” you want to do it. Use a good IDE that facilitates your work and allows you to do the simple things fast. You know, like StatAce 🙂

AO- Describe your career journey from Student to CEO

CM-During my bachelor studies I worked as a software developer and customer intelligence analyst. This gave me a lot of perspective on software and data.

After graduating I got a job where I coordinated processes and led projects. This is where I discovered the importance of listening to customers, planning well in advance, and having good data to base decisions on.

In my master studies, it was my statistics-heavy thesis that made me think “why is there not a place where I can easily use the power of R on a machine with a lot of RAM?” This is when the idea for StatAce was born.


About StatAce-

Bulgarian StatAce is the winner of betapitch | global, which was held in Berlin on 6 July (read more about it here). The  team, driven by the lack of software for low, student budgets, came up with the idea of building “Google docs for professional statisticians” and eventually took home the first prize of the startup competition.

Tails -an OS for Privacy

Just came across Tails.

https://tails.boum.org/

amnesia, noun:
forgetfulness; loss of long-term memory.

incognito, adjective & adverb:
(of a person) having one’s true identity concealed.


Tails is a live system that aims to preserve your privacy and anonymity. It helps you to use the Internet anonymously and circumvent censorship almost anywhere you go and on any computer but leaving no trace unless you ask it to explicitly.

It is a complete operating system designed to be used from a DVD, USB stick, or SD card independently of the computer’s original operating system. It is Free Software and based on Debian GNU/Linux.

Tails comes with several built-in applications pre-configured with security in mind: web browser, instant messaging client, email client, office suite, image and sound editor, etc.

 

Movie Review- Thor 2

This is Hollywood at its intellectual best. Natalie Portman, geeky queen of hearts (since V for Vendetta and Star Wars 1-3), reunites with beefy actor Hemsworth (he of the Star Trek cameo), fellow Oscar winner Anthony “The Hannibal” Odin Hopkins, and a full array of CGI to bring the old Norse God some real new meaning. Natalie is beginning to look a bit frail though.

A very entertaining movie that is very light on the brain. Unfortunately for Thor, Loki steals the show every time. Loki is a much better actor too, and his presence tells us this will not be the last we see of the feuding brothers.

Oh yes, Asgard is lovely in this time of the year.

Thor 2 Second Trailer

 

RapidMiner takes things to the next level

I have watched RapidMiner for quite some years, including its R extension, an interview with the founders, one of the first marketplaces for algorithms (or extensions to its statistical software), and its use in sports analytics. It has been much in the news lately.

They got funded, revamped the website, changed the name from Rapid-I to RapidMiner, and have announced that version 6 of their flagship software is coming soon.

http://www.zdnet.com/rapid-i-gets-funded-re-brands-as-rapidminer-7000022757/

 A well-kept secret of the Analytics/Data Mining world may get some of the spotlight now, with a cool $5M in its pocket.

a successful $5M Series A funding round, with participation from European firms Earlybird Venture Capital and Open Ocean Capital (the latter firm having a strong pedigree from the team behind the MySQL relational database).

It has easily been the first open source statistical tool with visual programming (something R has yet to get despite efforts by RedR, Analytic Flow et al.) and, more importantly, it has a huge stack of enterprise clients.


http://rapidminer.com/products/rapidminer-studio/

RapidMiner 6 will have brand-new templates for churn reduction, sentiment analysis, predictive maintenance and direct marketing.  A data analysis toolbox has never been more user-friendly or more powerful.

But best of all, they are getting a much easier training academy in place, and I am personally going to finally master it (even though I have played a bit with it before).

I do hope they make a MOOC (since the software is open source and free to download – how about some very easy to do self learning online tutorials)!

http://rapidminer.com/learning/training/

Introduction to Data Mining and Predictive Analytics with RapidMiner Studio and Server, December 3rd and 4th.
This course is a two-day introduction to the foundations of data mining, business analytics, and RapidMiner software. Participants will gain a complete understanding of how RapidMiner Studio and RapidMiner Server work and are used.

This course is the perfect preparation for the Image Mining training course.

Foundations of image processing, analysis and mining with the “Multimedia Mining-Image” (MUMI-Image) extension, December 5th and 6th.
This course is a two-day training on the foundations of image processing, analysis and mining with the “Multimedia Mining – Image” (MUMI-Image) extension for RapidMiner. After this training course, participants will have a complete understanding of how image mining analysis can be performed within RapidMiner Studio and Server, combining image processing techniques with the available data mining methods and data processing capabilities. Practical exercises ensure that the participants will be able to perform their own image analysis at the end of the class.


(Blogger Disclosure- RapidMiner has been a sponsor of Decisionstats.com for several years. I also like the software a lot!)

SAS Institute continues to lose training revenue as WPS clones use “language of SAS” loophole

Apparently, companies in Asia have been able to offer and advertise SAS training by cleverly inserting the phrase “language of SAS”, hiding the disclaimers in the website map maze, and reassuring students that they can get jobs even at bona fide SAS Institute licensed client locations by de facto learning “the SAS language” without paying for expensive SAS licensing.

This marks a double blow for the Institute: on one hand, WPS licensing erodes its margin through competitive discounting; on the other, the lucrative SAS Language Training (and publishing) market is decimated in emerging market economies by SAS language clones.

I write this in dismay as I was one of the original authors of this article which was even referred to in the WPS judgement

http://en.wikipedia.org/wiki/SAS_language ( the edit history is even more fascinating if you want to see it!!)

Students, if you wanna learn SAS or language of SAS – do refer to this.

http://support.sas.com/training/


Anything else is a legal grey area, compounded by the curious shyness of SAS legal to defend a venerated 40-year-old analytics brand, ceding ground throughout the world and across domains.

A simple analysis of Web logs in SAS Online Doc would prove how this is exploited even by sellers of SAS language clone software.

As the difference in SAS and WPS pricing is compounded on server licenses, companies in Asia have found the best thing to do is get a WPS server license and in fact offer it for training (or analysis) like a time shared cloud solution to multiple customers at the same time

It is an interesting thing to watch, because SAS Institute remains one of the last holdouts on the West Coast to the Stanford mafia.


 

Interview Joseph Eapen CEO RMInsights #rstats #shiny

Here is an interview with Joseph Eapen, CEO of RMInsights, an exciting data science company that has been using R in its work providing decision support for the entertainment industry (do not confuse them with rminsight, which is almost the same URL but without the s).

I found some of their work in applied data science cool enough to request an interview, and they were gracious enough to respond at short notice.

Joseph Eapen

Ajay Ohri (AO)- What are the innovative steps, products, services and initiatives that you have been executing at rminsights.net?

Joseph Eapen (JE)– Cinema: RMI has launched Cinema Audience Measurement (CAM) with a focus on Audience Appreciation. As you know, film making is an expensive and time-consuming exercise; to add to it, the audience sits through a movie with great expectations, hoping it is worth their time, effort and money. Even though box office collections are a yardstick of movie revenues, they are still far from being a measure of audience appreciation. CAM, through http://screenratings.org, provides a scientific tool that gives you movie appreciation scores and lets you share your opinion to help others choose a movie to watch this week.

Television: Apart from CAM, RMI focuses on Data Analytics, especially forecasting techniques using R and Shiny and has developed a TV ratings forecaster as a Service.

Data Collection: RMI designs mobile-based custom surveys using ODK (Open Data Kit: Build, Collect, Aggregate).

AO- What are some of the applications and products that you have been developing using the R language and R Shiny applications?

JE- TV Ratings Forecaster as a Service is the product we have developed using R and Shiny. It uses Professor Rob J Hyndman’s forecast package, which provides methods and tools for displaying and analyzing univariate time series forecasts, including exponential smoothing; we use the Holt-Winters (HW) approach.

The user simply uploads a CSV file containing historical TV ratings of any TV channel(s), usually 48 time segments for each day, and decides on the forecast horizon; the system then predicts and plots the forecast.

TV Ratings Forecaster as a Service
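
To make the Holt-Winters idea mentioned above more concrete, here is a minimal pure-Python sketch of additive Holt-Winters smoothing with a season of 48 time segments per day. It is only an illustration: it is not the forecast package's implementation and not RMI's code, and the function name `holt_winters_additive`, the default smoothing parameters, the simple initialization and the synthetic ratings are all assumptions made for this example.

```python
from typing import List, Sequence


def holt_winters_additive(
    series: Sequence[float],
    season_length: int,
    horizon: int,
    alpha: float = 0.3,   # level smoothing (assumed value)
    beta: float = 0.05,   # trend smoothing (assumed value)
    gamma: float = 0.2,   # seasonal smoothing (assumed value)
) -> List[float]:
    """Forecast `horizon` steps ahead with additive Holt-Winters smoothing."""
    m = season_length
    n = len(series)
    if n < 2 * m:
        raise ValueError("need at least two full seasons of history")

    # Simple initialization from the first two seasons (one of several
    # common choices): level = mean of season one, trend = average change
    # per step between the two seasons, seasonal = deviation from the level.
    first_mean = sum(series[:m]) / m
    second_mean = sum(series[m:2 * m]) / m
    level = first_mean
    trend = (second_mean - first_mean) / m
    seasonal = [series[i] - first_mean for i in range(m)]

    # Update level, trend and the seasonal term for each later observation.
    for t in range(m, n):
        y = series[t]
        s = seasonal[t % m]
        prev_level, prev_trend = level, trend
        level = alpha * (y - s) + (1 - alpha) * (prev_level + prev_trend)
        trend = beta * (level - prev_level) + (1 - beta) * prev_trend
        seasonal[t % m] = gamma * (y - prev_level - prev_trend) + (1 - gamma) * s

    # h-step-ahead forecast: extrapolated level plus the matching seasonal term.
    return [
        level + h * trend + seasonal[(n - 1 + h) % m]
        for h in range(1, horizon + 1)
    ]


# Synthetic example: three identical "days" of 48 segments each.
daily_pattern = [float(i % 7) for i in range(48)]
history = daily_pattern * 3
forecast = holt_winters_additive(history, season_length=48, horizon=48)
```

With real ratings, `history` would come from the uploaded file and `horizon` from the user's choice; the smoothing parameters would normally be fitted to the data rather than fixed by hand.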

AO- What is your vision for research and analytics going forward?

JE- We are all at an interesting juncture: almost everything we do is captured at a transactional level, and the so-called big data will be accessible for analysis. So in future, I see that we will need research to bridge gaps in this transactional data, and a larger role in decision making will be played by analytics than by research (as we know it now). Both will co-exist, but data mining and analytics will take center stage.

AO- Describe your career journey from student days to CEO

JE- After I finished my engineering (Electronics & Telecommunication) in Mumbai, I went to Dubai in search of a job. Within 10 days, I saw a job vacancy in the local newspaper for a Project Engineer at a leading Market Research company – PARC (A Gallup affiliate). I applied and, after 2 rounds of interviews, landed the job. The company had just signed a JV with AGB Italia (a Television Audience Measurement company) to conduct peoplemeter based TAM in the Gulf region, starting with UAE. The JV was called AGB Gulf. During the expansion phase of the project I was sent to Milan to test and initiate the purchase of peoplemeters for the expansion, and trained at their headquarters in Switzerland. Soon I was promoted to Head of AGB Gulf, and ran it for 10 years. During this time I led the service through a successful audit by the Industry.

Following this I was instrumental in the launch of TGI (Single Source from BMRB) in the Gulf and LEVANT markets, again through a JV between PARC and BMRB in UK. The JV was called TGI Arabia.

These were the first 13 years of my career. After that I joined MediaEdge:CIA (A WPP Company) in Dubai as Head of Research & Development. After 3 years, I returned to Mumbai and joined aMap (An overnight Television Ratings agency, competing fiercely with TAM Media Research – A Nielsen/Kantar Company) as Director and was promoted to CEO within 11 months.

After 3 years with aMap, I was appointed as CEO of MRUC (An industry body that issues IRS – world’s largest readership study, in India)

Post MRUC, I joined RMI as CEO… and the journey continues…

AO- What advice would you give to aspiring data scientists?

JE- There will be a large amount of data to mine and a lot of techniques available to make sense of it all, so it may get overwhelming. We should not lose track of the purpose of mining, that is, applying the learning and understanding to enhance our life and the world around us. It should not be “exciting to mine and good to know”. It should go way beyond that: totally purpose driven and put to use, aggressively.

AO- What do you think are the criteria that companies should take into account while outsourcing research and analytics (apart from cost)?

JE- Apart from cost, the key factors should be turnaround speed and accountability. The provider should not only be fast and meticulous but also stand behind its work and see that it serves the purpose it was commissioned for in the first place. So you see, it goes beyond ‘just doing a good job’.

About RMInsights 

We have studied human nature and behavior for more than 20 years. Our reputation for delivering relevant, timely, and visionary solutions on what people around the world think and feel is the cornerstone of the organization. We employ many of the world’s leading scientists in management, economics, psychology, and sociology. We study market conditions in local, regional, or national areas to examine potential of a product or service. We help companies understand what products people want, who will buy them, and at what price.

Joseph Eapen
CEO

http://www.rminsights.net/