Author: Ajay Ohri
Interview Heiko Miertzsch eoda #rstats #startups
This is an interview with Heiko Miertzsch, founder of eoda (http://www.eoda.de/en/). eoda is a cutting-edge startup that recently launched a few innovative products that made me sit up and pay attention. In this interview, Heiko talks about the startup journey and the vision of analytics.

DecisionStats (DS)- Describe the journey of your startup eoda. What made you choose R as the platform for your software and training? Name a few turning points and milestones in your corporate journey.
Heiko Miertzsch (HM)- eoda was founded in 2010 by Oliver and me. We both have a strong background in analytics and the information technology industry, so we observed the market for a while before starting the business. We saw two trends: first, a lot of new technologies and tools for data analysis were appearing, and second, open source seemed to be becoming more and more important for several reasons. To name just one: the ease of sharing experience and code in a broad and professional community. Disruptive forces seemed to be changing the market, and we just didn’t want to back the wrong horse.
From the beginning we tested R, and we were enthusiastic. We started choosing R for our projects, software development and services, and built up a training program for R. We already believed in 2010 that R had a successful future. It was more flexible than other statistical languages and more powerful in terms of functionality, you could integrate it into an existing environment, and much more.
DS- You make both Software products and services. What are the challenges in marketing both?
HM- We do even more: we provide consulting, training, individual software, software customization and services. It is pure fun for us to go to our customers and say “hey, we can help you solve your analytical problems, no matter what kind of service you want to buy, what kind of infrastructure you use, whether you want to learn about forest trees or buy a SaaS solution to predict your customers’ revenues”. In a certain way we don’t see barriers between these delivery models because we use analytics as our basis. First of all, we focus on the analytical problem of our customers, and then we find the ideal solution together with the customer.
DS- Describe your software tableR. How does it work, what is the pricing and what is the benefit to the user? Name a few real-life usage examples if available.
HM- Today the process of collecting data, analyzing it and presenting the results is characterized by a heterogeneous software environment with many tools, file formats and manual processing steps. tableR supports the entire process in one single solution: designing a questionnaire, sharing a structured format with the CAXI software, importing the data, doing the analysis and plotting the table report. The base report comes with just one click, and if you want to go into more detail you can enhance your analysis with your own R code.
tableR is in closed beta at the moment, and the open beta will start in the next few weeks.
(It is available at http://www.eoda.de/en/tableR.html)
DS- Describe your software translateR (http://www.eoda.de/en/translateR.html). How does it work, what is the pricing and what is the benefit to the user? Name a few real-life usage examples if available.
HM- Many companies have realized the advantages of the open source programming language R. translateR allows a fast and inexpensive migration to R – currently from SPSS code.
The manual migration of complex SPSS® scripts has always been tedious and error-prone. translateR helps here, and translating by hand becomes a thing of the past. The beta test of translateR will also start in the next few weeks.
DS- How do you think we can use R on the cloud for better analytics?
HM- Well, R seems to bring together the best “Data Scientists” in the world, with all their different focuses on different methods, vertical knowledge, technical experience and more. The cloud is a great workplace: it holds the data, a lot of data, and it offers a technical platform with computing power. If we succeed in bringing these two aspects together, we could provide a lot of knowledge to solve a lot of problems, with individual and global impact.
DS- What advantages and disadvantages does working on the cloud give to an R user?
HM- For R specifically, I don’t see any aspects beyond those of using the cloud in general.
DS- Startup life can be hectic. What do you do to relax?
HM- Oliver and I both have families, so eoda is our time to relax: just fun. I guess we do the same typical things as others. Oliver plays soccer and goes running. I like any kind of endurance sport and go climbing, the first to let my thoughts wander, the second to train myself to focus on a concrete target.
About-
translateR is the new service from German-based R specialist eoda, which helps users to translate SPSS® code to R automatically. translateR is developed in cooperation with the University of Kassel and financially supported by the LOEWE program of the state of Hessen. translateR will be available as a cloud service and as a desktop application.
eoda offers consulting, software development and training for analytical and statistical questions. eoda is focused on R and specializes in integrating R into existing software environments.
Big Data Shoes
The internet is a ponderful and wonderful place for serendipity
Interview Tobias Verbeke Open Analytics #rstats #startups
Tobias- I discovered the Free Software Foundation while still at university and spent wonderful evenings configuring my GNU/Linux system and reading RMS essays. For the statistics classes proprietary software was proposed, and that was obviously not an option, so I started tackling all problems using R, which was at the time (around 2000) still an underdog together with pspp (a command-line SPSS clone) and xlispstat. From that moment on, I decided that R was the hammer and all problems to be solved were nails 😉 In my early career I worked as a statistician / data miner for a general consulting company, which gave me the opportunity to bring R into Fortune 500 companies and learn what was needed to support its use in an enterprise context. In 2008 I founded Open Analytics to turn these lessons into practice, and we started building tools to support the data analysis process using R. The first big project was Architect, which started as an Eclipse-based R IDE but is increasingly evolving into an IDE for data science more generally. In parallel we started working on infrastructure to automate R-based analyses and to plug R (and therefore statistical logic) into larger business processes, and soon we had a tool suite to cover the needs of industry.
Tobias– RSB stands for the R Service Bus and is communication middleware and a work manager for R jobs. It lets you trigger R jobs and receive their results using a plethora of protocols such as RESTful web services, e-mail protocols, SFTP, folder polling etc. The idea is to enable people to push a button (or software to make a request) and receive automated R-based analysis results or reports for their data.
Tobias– RSB started when automating toxicological analyses in the pharmaceutical industry in collaboration with Philippe Lamote. Together with David Dossot, an exceptional software architect in Vancouver, we decided to cleanly separate concerns, namely to separate the integration layer (RSB) from the statistical layer (R) and, likewise, from the application layer. As a result any arbitrary R code can be run via RSB and any client application can interact with RSB as long as it can speak one of the many supported protocols. This fundamental design principle makes us different from alternative solutions where statistical logic and integration logic are always somehow interwoven, which results in maintenance and integration headaches. One of the challenges has been to keep focus on the core task of automating statistical analyses and not to deviate into features that would turn RSB into a tool for interaction with an R session, which deserves an entirely different approach.
Tobias– From a freedom perspective, cloud computing and the SaaS model are often a step backwards, but in our own practice we obviously follow our customers’ needs and offer RSB hosting from our data centers as well. Also, our other products, e.g. the R IDE Architect, are ready for the cloud and for use on servers via Architect Server. As far as R itself is concerned in relation to cloud computing, I foresee its use increasing. At Open Analytics we see an increasing demand for R-based statistical engines that power web applications living in the cloud.
Tobias– RSB 6.0 is all about large-scale production environments and strong security. It kicked off on a project where RSB was responsible for spitting out 8500 predictions per second. Such large-scale production deployments of RSB motivated the development of a series of features. First of all, RSB was made lightning fast: we achieved a full round trip from REST call to prediction in 7 ms on the mentioned use case. In order to allow for high throughput, RSB also gained a synchronous API (RSB 5.0 had an asynchronous API only). Another new feature is the availability of client-side connection pooling to the pool manager of R processes that are ready to serve RSB. Besides speed, this type of production environment also needs monitoring and resilience in case of issues. For the monitoring, we made sure that everything is in place for monitoring and remotely querying not only the RSB application itself, but also the pool of R processes managed by RServi.
(Note from Ajay- RJ is an open source library providing tools and interfaces to integrate R in Java applications. The RJ project also provides a pool for R engines that is easy to set up and manage through a web interface or JMX. One or more clients can borrow the R engines (called RServi); see http://www.walware.de/it/rj/ and https://github.com/walware/rj-servi)
Also, we now allow users to define node validation strategies to check that R nodes are still functioning properly. If not, the nodes are killed and new nodes are started and added to the pool. In terms of security, we are now able to cover a very wide spectrum of authentication and authorization. We have machines up and running using OpenID, basic HTTP authentication, LDAP, SSL client certificates etc. to serve everyone from the individual user who is happy with OpenID authentication for his RSB app to large investment banks who have very strong security requirements. The next step is to provide tighter integration with Architect, such that people can release new RSB applications without leaving the IDE.
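The node validation idea described above (check each node, kill the ones that fail, start replacements and add them to the pool) can be sketched generically. This is a minimal Python sketch under stated assumptions, not RSB or RServi code: `Node`, `validate_pool`, `is_healthy` and `start_node` are hypothetical names invented for illustration, and a real pool would manage R processes rather than plain objects.

```python
from typing import Callable, List


class Node:
    """Stand-in for one pooled worker process (hypothetical, not an RSB class)."""

    def __init__(self, node_id: int) -> None:
        self.node_id = node_id
        self.alive = True

    def kill(self) -> None:
        # A real implementation would terminate the underlying process here.
        self.alive = False


def validate_pool(
    pool: List[Node],
    is_healthy: Callable[[Node], bool],
    start_node: Callable[[], Node],
) -> List[Node]:
    """Return a pool of the same size in which every failing node is replaced."""
    refreshed = []
    for node in pool:
        if is_healthy(node):
            refreshed.append(node)
        else:
            node.kill()                     # remove the broken node
            refreshed.append(start_node())  # start a fresh one in its place
    return refreshed


if __name__ == "__main__":
    fresh_ids = iter(range(100, 200))
    pool = [Node(0), Node(1), Node(2)]
    # Pretend node 1 fails its health check.
    pool = validate_pool(pool, lambda n: n.node_id != 1, lambda: Node(next(fresh_ids)))
    print([n.node_id for n in pool])
```

Passing the health check in as a function mirrors the idea of a pluggable validation strategy: the pool logic stays the same while the definition of "functioning properly" can vary.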
Tobias– I do not feel qualified to answer such a question, since I founded a single company in Antwerp, Belgium. That being said, Belgium is great! 🙂
Tobias– Free software. Free as in beer and as in free speech!
Tobias– Open source is probably a global ecosystem and crosses oceans very easily. Dries Buytaert started off Drupal in Belgium and now operates from the US interacting with a global community. From a business perspective, there are as many open source models as there are open source companies. I noticed that the major US R companies (Revolution Analytics and RStudio) cherished the open source philosophy initially, but both drifted into models combining open source and proprietary components. At Open Analytics, there are only open source products and enterprise customers have access to exactly the same functionality as a student may have in a developing country. That being said, I don’t believe this is a matter of geography, but has to do more with the origins and different strategies of the companies.
Tobias- In a previous life the athletics track helped keep my hands off the keyboard. Currently, my children find very effective ways to achieve similar goals.
About-
Open Analytics is a consulting company specialized in statistical computing using open technologies. You can read more about it at http://www.openanalytics.eu
How cheap is cloud computing anyway?
So I really wanted to find out how cheap the cloud was, but I got confused by the 23 kinds of instances that Amazon has (http://aws.amazon.com/ec2/pricing/) and the 15 kinds of instances at https://developers.google.com/compute/pricing.
I also wondered whether there is any price collusion between them 😉
Now Amazon has spot pricing, so I can bid for prices as well (http://aws.amazon.com/ec2/purchasing-options/spot-instances/), and up to 60% off for reserved instances (http://aws.amazon.com/ec2/purchasing-options/reserved-instances/), but it charges $2 for dedicated instances (which are not dedicated but pay as you go):
- $2 per hour – An additional fee is charged once per hour in which at least one Dedicated Instance of any type is running in a Region.
Google has sustained-use discounts (it will not offer Windows on the cloud, though!)
The table below describes the discount at each usage level. These discounts apply for all instance types.
| Usage Level (% of month) | % at which incremental usage is charged | Example incremental rate (USD per hour) for an n1-standard-1 instance |
|---|---|---|
| 0%-25% | 100% of base rate | $0.07 |
| 25%-50% | 80% of base rate | $0.056 |
| 50%-75% | 60% of base rate | $0.042 |
| 75%-100% | 40% of base rate | $0.028 |
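To see what these incremental tiers mean for the average price actually paid, here is a small Python sketch. It assumes the tiers apply exactly as listed in the table, with each quarter of the month billed at its own multiplier; the function name `effective_hourly_rate` is my own and not part of any Google tool.

```python
# Tier upper bounds (fraction of month) and rate multipliers, copied from the table above.
TIERS = [(0.25, 1.00), (0.50, 0.80), (0.75, 0.60), (1.00, 0.40)]


def effective_hourly_rate(base_rate, usage_fraction):
    """Average hourly rate paid when an instance runs for usage_fraction of the month."""
    if not 0 < usage_fraction <= 1:
        raise ValueError("usage_fraction must be in (0, 1]")
    billed = 0.0  # usage weighted by the multiplier of the tier it falls in
    lower = 0.0
    for upper, multiplier in TIERS:
        # Portion of the month that falls inside this tier.
        portion = max(0.0, min(usage_fraction, upper) - lower)
        billed += portion * multiplier
        lower = upper
    return base_rate * billed / usage_fraction


if __name__ == "__main__":
    for usage in (0.25, 0.50, 0.75, 1.00):
        rate = effective_hourly_rate(0.07, usage)
        print(f"{usage:.0%} of month: ${rate:.4f} per hour")
```

Under these assumptions, an n1-standard-1 instance that runs for the whole month averages 70% of the base rate, or $0.049 per hour instead of $0.07.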
Anyway, I tried to create this simple table to help me with it. After all, hard disks are cheap; it is memory I want on the cloud!
Or maybe I am wrong and the cloud is not so cheap, or it's just too complicated for someone to build a pricing calculator that can take in prices from all providers (Amazon, Azure, Google Compute) and show us the money!
| Instance | vCPU | RAM (GiB) | $ per Hour | Type (Linux Usage) | Provider | Notes |
|---|---|---|---|---|---|---|
| t2.micro | 1 | 1 | $0.01 | General Purpose – Current Generation | Amazon (North Virginia) | Amazon also has spot instances that can lower prices |
| t2.small | 1 | 2 | $0.03 | General Purpose – Current Generation | Amazon (North Virginia) | |
| t2.medium | 2 | 4 | $0.05 | General Purpose – Current Generation | Amazon (North Virginia) | |
| m3.medium | 1 | 3.75 | $0.07 | General Purpose – Current Generation | Amazon (North Virginia) | |
| m3.large | 2 | 7.5 | $0.14 | General Purpose – Current Generation | Amazon (North Virginia) | |
| m3.xlarge | 4 | 15 | $0.28 | General Purpose – Current Generation | Amazon (North Virginia) | |
| m3.2xlarge | 8 | 30 | $0.56 | General Purpose – Current Generation | Amazon (North Virginia) | |
| c3.large | 2 | 3.75 | $0.11 | Compute Optimized – Current Generation | Amazon (North Virginia) | |
| c3.xlarge | 4 | 7.5 | $0.21 | Compute Optimized – Current Generation | Amazon (North Virginia) | |
| c3.2xlarge | 8 | 15 | $0.42 | Compute Optimized – Current Generation | Amazon (North Virginia) | |
| c3.4xlarge | 16 | 30 | $0.84 | Compute Optimized – Current Generation | Amazon (North Virginia) | |
| c3.8xlarge | 32 | 60 | $1.68 | Compute Optimized – Current Generation | Amazon (North Virginia) | |
| g2.2xlarge | 8 | 15 | $0.65 | GPU Instances – Current Generation | Amazon (North Virginia) | |
| r3.large | 2 | 15 | $0.18 | Memory Optimized – Current Generation | Amazon (North Virginia) | |
| r3.xlarge | 4 | 30.5 | $0.35 | Memory Optimized – Current Generation | Amazon (North Virginia) | |
| r3.2xlarge | 8 | 61 | $0.70 | Memory Optimized – Current Generation | Amazon (North Virginia) | |
| r3.4xlarge | 16 | 122 | $1.40 | Memory Optimized – Current Generation | Amazon (North Virginia) | |
| r3.8xlarge | 32 | 244 | $2.80 | Memory Optimized – Current Generation | Amazon (North Virginia) | |
| i2.xlarge | 4 | 30.5 | $0.85 | Storage Optimized – Current Generation | Amazon (North Virginia) | |
| i2.2xlarge | 8 | 61 | $1.71 | Storage Optimized – Current Generation | Amazon (North Virginia) | |
| i2.4xlarge | 16 | 122 | $3.41 | Storage Optimized – Current Generation | Amazon (North Virginia) | |
| i2.8xlarge | 32 | 244 | $6.82 | Storage Optimized – Current Generation | Amazon (North Virginia) | |
| hs1.8xlarge | 16 | 117 | $4.60 | Storage Optimized – Current Generation | Amazon (North Virginia) | |
| n1-standard-1 | 1 | 3.75 | $0.07 | Standard | Google -US | Google charges per minute of usage (subject to a minimum of 10 minutes) |
| n1-standard-2 | 2 | 7.5 | $0.14 | Standard | Google -US | |
| n1-standard-4 | 4 | 15 | $0.28 | Standard | Google -US | |
| n1-standard-8 | 8 | 30 | $0.56 | Standard | Google -US | |
| n1-standard-16 | 16 | 60 | $1.12 | Standard | Google -US | |
| n1-highmem-2 | 2 | 13 | $0.16 | High Memory | Google -US | |
| n1-highmem-4 | 4 | 26 | $0.33 | High Memory | Google -US | |
| n1-highmem-8 | 8 | 52 | $0.66 | High Memory | Google -US | |
| n1-highmem-16 | 16 | 104 | $1.31 | High Memory | Google -US | |
| n1-highcpu-2 | 2 | 1.8 | $0.09 | High CPU | Google -US | |
| n1-highcpu-4 | 4 | 3.6 | $0.18 | High CPU | Google -US | |
| n1-highcpu-8 | 8 | 7.2 | $0.35 | High CPU | Google -US | |
| n1-highcpu-16 | 16 | 14.4 | $0.70 | High CPU | Google -US | |
| f1-micro | 1 | 0.6 | $0.01 | Shared Core | Google -US | |
| g1-small | 1 | 1.7 | $0.04 | Shared Core | Google -US | |
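Since memory is what I care about, one way to read the table is to divide the hourly price by the RAM. The Python sketch below does that for a handful of rows copied from the table; it uses list prices only and ignores spot, reserved and sustained-use discounts. The function name `price_per_gib` is made up for this example.

```python
# (instance, vCPU, RAM in GiB, USD per hour, provider), copied from the table above.
ROWS = [
    ("m3.large", 2, 7.5, 0.14, "Amazon"),
    ("c3.large", 2, 3.75, 0.11, "Amazon"),
    ("r3.large", 2, 15, 0.18, "Amazon"),
    ("r3.8xlarge", 32, 244, 2.80, "Amazon"),
    ("n1-standard-2", 2, 7.5, 0.14, "Google"),
    ("n1-highmem-2", 2, 13, 0.16, "Google"),
    ("n1-highmem-16", 16, 104, 1.31, "Google"),
    ("n1-highcpu-2", 2, 1.8, 0.09, "Google"),
]


def price_per_gib(rows):
    """Return (instance, provider, USD per GiB-hour) tuples, cheapest memory first."""
    ranked = [(name, provider, usd / ram) for name, _vcpu, ram, usd, provider in rows]
    return sorted(ranked, key=lambda row: row[2])


if __name__ == "__main__":
    for name, provider, usd_per_gib in price_per_gib(ROWS):
        print(f"{name:15s} {provider:7s} ${usd_per_gib:.4f} per GiB-hour")
```

In this subset the memory-optimized instances come out cheapest per GiB and the high-CPU ones most expensive, which is what you would expect from their names.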
Using Windows Azure Machine Learning as a service with R #rstats
A Brief Tutorial I wrote by playing with the software at manage.windowsazure.com
Interview Louis Bajuk-Yorgan TIBCO Enterprise Runtime for R (TERR) #rstats
Here is an interview with Louis Bajuk-Yorgan from TIBCO. TIBCO, which was the leading commercial vendor of S-PLUS, the precursor of the R language, makes a commercial enterprise version of R called TIBCO Enterprise Runtime for R (TERR). Louis also presented recently at useR! 2014: http://user2014.stat.ucla.edu/abstracts/talks/54_Bajuk-Yorgan.pdf
DecisionStats(DS)- How is TERR different from Revolution Analytics or Oracle R? How is it similar?
DS- How much of R is TERR compatible with?
DS- Describe TIBCO Cloud Compute Grid. What are its applications for data science?
DS- What advantages does TIBCO’s rich history with the S project give it for the R project?
Lou- Our 20+ years of experience with S-PLUS gave us a unique knowledge of the commercial applications of the S/R language, deep experience with architecting, extending and maintaining a commercial S language engine, strong ties to the R community and a rich trove of algorithms we could apply in developing the TERR engine.
DS- Describe some benchmarks of TERR against open source R.
DS- TERR is not open source. Why is that?
DS- How is TIBCO as a company to work for, for potential data scientists?
DS- How is TIBCO giving back to the R community globally? What are its plans for the community?
DS- As a sixth-time attendee of useR!, describe the evolution of the R ecosystem as you have observed it.


