The Ohri Framework tries to create an economical alternative to proprietary data mining software by giving more value to the customer and using the open source statistical package R, with the GUI Rattle, hosted in a cloud computing environment.
It is based on the following assumptions:
1) R is relatively inefficient at processing large files compared with other software such as SAS on the same desktop configuration, because base R loads the whole dataset into memory.
2) R has a steep learning curve, hence the need for the GUI Rattle.
3) R’s greater need for computing resources is best met by an on-demand cloud computing environment, which lets R scale up to whatever processing power it needs. Mainstream data mining software vendors license servers by CPU count, which makes their products much more expensive on software costs alone.
4) Users with large datasets face data hygiene issues when transporting data. This is addressed by encrypting and compressing both the raw CSV files and the processed CSV files before transport. CSV is chosen because it is easy to use with internal data sources. PGP is recommended for both compression and encryption. Compressing the data also cuts bandwidth costs.
5) The service should be priced on a cost-plus model to encourage usage by subscribers.
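The compression half of assumption 4 can be sketched in a few lines. This is a minimal illustration, assuming Python's standard `gzip` module as a stand-in for the compression PGP applies; the encryption step would still be done with a PGP tool and is not shown. The function names and the sample columns are hypothetical.

```python
import csv
import gzip
import io


def make_sample_csv(rows: int) -> bytes:
    """Build a hypothetical CSV payload standing in for a raw data extract."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["customer_id", "region", "spend"])  # hypothetical columns
    for i in range(rows):
        writer.writerow([i, "north" if i % 2 else "south", round(i * 1.5, 2)])
    return buf.getvalue().encode("utf-8")


def compress_csv(raw: bytes) -> bytes:
    """Gzip-compress CSV bytes before sending them to the remote cloud."""
    return gzip.compress(raw)


def decompress_csv(packed: bytes) -> bytes:
    """Reverse step, run on the receiving side."""
    return gzip.decompress(packed)


if __name__ == "__main__":
    raw = make_sample_csv(10_000)
    packed = compress_csv(raw)
    assert decompress_csv(packed) == raw  # compression is lossless
    print(f"raw: {len(raw)} bytes, compressed: {len(packed)} bytes, "
          f"ratio: {len(raw) / len(packed):.1f}x")
```

Because CSV is repetitive plain text, it usually compresses well, which is where the bandwidth saving comes from; the exact ratio depends on the data.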
The Ohri Framework thus tries to reduce hardware costs (for R), software costs (for other software such as SAS and SPSS), data hygiene costs (through encryption) and bandwidth costs (through compression), to offer data mining on demand for the masses.
Part of the reason software like SPSS and SAS continues to enjoy a profitable lead is:
1) Standardized language elements (DATA steps and PROCs)
2) Ease of learning SAS and SPSS
3) Output delivery to multiple destinations
4) Input from multiple data sources
But the most important reason is the sheer efficiency of the SAS PDV (program data vector) in reading large files. If Excel could load a 300 MB file that easily, it would make a significant dent in that lead. Large files are assumed to be used mostly by the larger license holders.
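The SAS DATA step reads one observation at a time into the PDV, so memory use does not grow with file size. A rough Python analogue of that row-at-a-time pattern (an illustration of the idea only, not how SAS is implemented) is to stream a CSV through `csv.DictReader` and keep only running totals. The function name and the `spend` column are hypothetical.

```python
import csv
import io
from typing import TextIO


def streaming_mean(handle: TextIO, column: str) -> float:
    """Compute the mean of one numeric column, reading one row at a time.

    Only the current row and two running totals are held in memory,
    so memory use stays flat however large the file is.
    """
    reader = csv.DictReader(handle)
    total = 0.0
    count = 0
    for row in reader:  # one record in memory at a time
        total += float(row[column])
        count += 1
    if count == 0:
        raise ValueError("no data rows")
    return total / count


if __name__ == "__main__":
    # In practice the handle would come from open("big_extract.csv", newline=""),
    # where big_extract.csv is a hypothetical file name.
    sample = io.StringIO("id,spend\n1,10\n2,20\n3,30\n")
    print(streaming_mean(sample, "spend"))  # 20.0
```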
Cloud computing could help languages like R here. R is very good at advanced statistics, is free, and its packages are peer reviewed. It also has little-known but very good GUIs (such as Rattle). If you place the Rattle GUI in a cloud, it would use processing power on demand and output results. SAS won’t do this because it charges by CPU count, and that is an idle asset for them (reaping rewards from programming done long ago).
Thus you save on both hardware and software costs, and people pay only when they use the system. An additional cost, however, is the fixed cost of the remote application built to support the framework, including the bandwidth cost of transport. One suggestion is to use compressed and encrypted data transfers to and from the remote cloud; PGP.com would be of help here.
Charge for bandwidth, and charge for cloud hosting at cost plus a small markup. Economies of scale will follow.
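The billing rule above can be written as a small function. This is a sketch under stated assumptions: bandwidth is passed through at cost, hosting (compute hours plus an optional share of the remote application's fixed cost) is billed at cost plus a markup, and every rate, field name and the `cost_plus_bill` function itself are hypothetical, not real cloud prices.

```python
from dataclasses import dataclass
from decimal import Decimal, ROUND_HALF_UP


@dataclass
class UsageRecord:
    """One subscriber's usage for a billing period (hypothetical fields)."""
    gb_transferred: Decimal  # compressed data moved to and from the cloud
    compute_hours: Decimal   # cloud instance hours used to run R/Rattle


def cost_plus_bill(usage: UsageRecord,
                   bandwidth_rate: Decimal,
                   hourly_rate: Decimal,
                   markup: Decimal,
                   fixed_cost_share: Decimal = Decimal("0")) -> Decimal:
    """Bandwidth passed through at cost; hosting billed at cost plus a markup."""
    bandwidth_cost = usage.gb_transferred * bandwidth_rate
    # Hosting cost includes this subscriber's share of the remote application's fixed cost.
    hosting_cost = usage.compute_hours * hourly_rate + fixed_cost_share
    total = bandwidth_cost + hosting_cost * (1 + markup)
    return total.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


if __name__ == "__main__":
    usage = UsageRecord(gb_transferred=Decimal("20"), compute_hours=Decimal("5"))
    # Hypothetical rates: 0.10 per GB, 0.40 per compute hour, 10% markup.
    print(cost_plus_bill(usage, Decimal("0.10"), Decimal("0.40"), Decimal("0.10")))  # 4.20
```

`Decimal` is used instead of `float` so that currency amounts round predictably. Compressing the CSV files lowers `gb_transferred`, which is how compression feeds straight into a smaller bill.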
R’s graphical system is superior to that of SAS or SPSS, and it could also be adapted to newer graphical software such as Silverlight.
The little guy would no longer need to squeeze his budget for big computing power. I could be totally wrong here, but it may be worth a shot.
Summary:
Data encryption and compression before data transfers
Open source analytical tool in cloud
Graphical Interface for results
Billing at cost plus pricing.