The Ohri Framework – Data Mining on Demand

The Ohri Framework tries to create an economical alternative to proprietary data mining software by giving more value to the customer and using the open source statistical package R, with the Rattle GUI, hosted in a cloud computing environment.

It is based on the following assumptions:

1) R is relatively inefficient at processing larger files on the same desktop configuration compared with other software such as SAS.

2) R has a steep learning curve, hence the need for the Rattle GUI.

3) R's greater need for computing resources is best met by an on-demand cloud computing environment. This lets R scale up to whatever processing power it needs. Mainstream data mining software is charged by server CPU count and is much more expensive because of software costs alone.

4) Users with large data sets have data hygiene issues when transporting data. This is solved by encrypting and compressing both the raw CSV files and the processed CSV files before transport. CSV is used so that internal data sources can work with the files easily. PGP is recommended for both compression and encryption. Compressing the data cuts down on bandwidth costs.

5) Pricing of the service is recommended to be on a cost-plus model to encourage usage by subscribers.
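As a rough illustration of the compression half of assumption 4, here is a minimal Python sketch that packs a CSV file before transport and unpacks it on the other side. It is only a sketch under stated assumptions: the function names and the sample data are made up for this example, gzip stands in for the compression that PGP would perform, and the encryption step is deliberately left out, so a real transfer would still need PGP or a similar tool on top.

```python
import csv
import gzip
import io


def make_csv(rows):
    """Serialize rows (lists of values) to CSV bytes (hypothetical helper)."""
    buf = io.StringIO()
    csv.writer(buf).writerows(rows)
    return buf.getvalue().encode("utf-8")


def pack(csv_bytes):
    """Compress CSV bytes before upload. Encryption is NOT done here."""
    return gzip.compress(csv_bytes)


def unpack(payload):
    """Restore the original CSV bytes after download."""
    return gzip.decompress(payload)


# Made-up sample data: 10,000 rows of repetitive sales records.
rows = [["id", "region", "sales"]]
rows += [[i, "north", 100 + i % 7] for i in range(10000)]

raw = make_csv(rows)
packed = pack(raw)

# Compare the number of bytes that would travel over the wire.
print("raw bytes:", len(raw))
print("packed bytes:", len(packed))
```

Repetitive tabular data like this usually compresses well, which is where the bandwidth saving comes from.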

The Ohri Framework thus tries to cut hardware costs (for R), software costs (for other software such as SAS and SPSS), data hygiene costs (through encryption), and bandwidth costs (through compression), to give data mining on demand to the masses.

Part of the reason software like SPSS and SAS continues to enjoy a profitable lead is:

1) Standardized language elements (Data and Procs)

2) Ease of learning SAS and SPSS

3) Output Delivery to multiple sources

4) Input from multiple data sources

But the most important reason is the sheer efficiency of the SAS PDV (program data vector) in reading large files. If Excel could load a 300 MB file that easily, it would make a significant dent. Large files are assumed to be used by larger license holders.

Cloud computing could help languages like R here. R is very good at advanced statistics, is free, and its packages are peer reviewed. It also has little-known but very good GUIs (like Rattle). If you place the Rattle GUI in a cloud, it would use processing power on demand and output results. SAS won't do this because it charges by CPU count, and that is an idle asset (reaping rewards from programming done long ago).

Thus you save on hardware costs and software costs. People pay only when they use the system. But an additional cost is the fixed cost of the remote application built to support the framework, including transport bandwidth cost. One suggestion is to use compressed and encrypted data transfers to and from the remote cloud. PGP.com would be of help here.

Pay for bandwidth, plus cost and a small markup for the cloud hosting. Economies of scale will ensue.
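To make the arithmetic concrete, here is a minimal Python sketch of such a bill: bandwidth is passed through at cost, and hosting is charged at cost plus a markup. The function name, the rates, and the 10% markup are all made-up assumptions for illustration, not actual cloud prices.

```python
def cost_plus_bill(compute_hours, hourly_rate, gb_transferred, rate_per_gb, markup=0.10):
    """Return the amount billed for one job under a cost-plus model.

    All parameters are hypothetical: hourly_rate and rate_per_gb are what the
    cloud provider charges, markup is the small margin added on hosting only.
    """
    hosting_cost = compute_hours * hourly_rate
    bandwidth_cost = gb_transferred * rate_per_gb
    # Bandwidth at cost, hosting at cost plus markup.
    return round(bandwidth_cost + hosting_cost * (1 + markup), 2)


# Example: 5 hours at 0.40 per hour, 2 GB at 0.10 per GB, 10% markup.
bill = cost_plus_bill(5, 0.40, 2, 0.10, markup=0.10)
print(bill)
```

In this example the hosting cost is 2.00, the markup adds 0.20, and bandwidth adds another 0.20, so a subscriber who runs nothing pays nothing.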

R’s graphics system is superior to that of SAS or SPSS, but it could still be adapted to newer graphical software such as Silverlight.

The little guy no longer needs to squeeze himself to afford big computing power. I could be totally wrong here, but it may be worth a shot.

Summary:

Data encryption and compression before data transfers.

Open source analytical tool in the cloud.

Graphical interface for results.

Billing at cost-plus pricing.

Author: Ajay Ohri

http://about.me/ajayohri
