Rapid Miner - R Extension

Here is a new video that shows exactly how you can use Rapid Miner and R together. The advantage of using both together is that you get Rapid Miner’s GUI (including its flowchart style for data mining) with R’s statistical functionality added to it.

From http://rapid-i.com/content/view/219/1/

The web site features a video showing how easy R models and scripts can be integrated into the RapidMiner analysis processes. RapidMiner offers a new R perspective consisting of the known R console together with the great plotting facilities of R. All variables as well as R scripts can be stored in the RapidMiner Repository and used from there which helps to organize the usually large number of scripts. Furthermore, widely used modeling methods are directly integrated as RapidMiner operators as usual.

“This is a huge step for open source data analysis. RapidMiner offers a great user interface, a clear process structure and lots of ETL and analysis capabilities necessary for real-world problems. R adds a lot of flexibility and many analysis and data manipulation methods. The result is the by far most powerful data transformation and analysis solution worldwide. And this analysis power is now combined with the ease-of-use already known from RapidMiner.” states Dr. Ingo Mierswa, CEO of Rapid-I.

Visit the RCOMM 2010 and learn more about how to integrate analysis and preprocessing methods offered by R as well as how to use the new R perspective offering a full R console and access to all R plotters.

Thus Rapid Miner is one more mainstream software package (after SPSS, SAS, etc.) to add R functionality.

Growing Rapidly: Rapid Miner 4.5

The Europe-based Rapid Miner has released version 4.5 of its data mining tool (formerly known as YALE), which includes a very promising “Script” tool.

Also, Rapid Miner came first among open source data mining tools in a poll by the industry benchmark www.kdnuggets.com.

They have a brilliant video here for people who just want a quick look at the new Rapid Miner:

http://rapid-i.com/videos/rapidminer_tour_3_4_en.html

Citation:

http://rapid-i.com/content/view/147/1/

New Operators:

  • FormulaExtractor
  • Trend
  • LagSeries
  • VectorLinearRegression
  • ExampleSetMinus
  • ExampleSetIntersect
  • Partition
  • Script
  • ForwardSelection
  • NeuralNetImproved
  • KernelNaiveBayes
  • ExhaustiveSubgroupDiscovery
  • URLExampleSource
  • NonDominatedSorting

More Features:

  • The new Script operator allows for arbitrary user defined operations based on Groovy script combined with a simplified RapidMiner syntax
  • Improved the join operator and added options for left and right outer joins
  • New notification mail mechanism at the end of processes
  • Most file based data input operators now provide an option to skip error lines
  • Most file based example source operators as well as the IOObjectReader and the new URLExampleSource now accept URLs instead of a filename for the input source location
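
For readers unfamiliar with the left and right outer joins mentioned in the list above, here is a minimal sketch in plain Python. It only illustrates the general join semantics; it is not RapidMiner's implementation or operator syntax, and the function names, the sample rows and the "id" key are all made up for illustration.

```python
# Illustrative sketch only: plain-Python outer joins over lists of dicts.
# This is NOT how RapidMiner implements its join operator; all names and
# sample data below are hypothetical.

def left_outer_join(left_rows, right_rows, key):
    """Keep every left row; fill right-only columns with None when no match."""
    # Index the right-hand rows by their key value.
    index = {}
    for row in right_rows:
        index.setdefault(row[key], []).append(row)

    # Collect the columns that come from the right side (excluding the key).
    # Assumes left and right do not share non-key column names.
    right_only = []
    for row in right_rows:
        for col in row:
            if col != key and col not in right_only:
                right_only.append(col)

    joined = []
    for left in left_rows:
        matches = index.get(left[key])
        if matches:
            # One output row per matching right row.
            for right in matches:
                merged = dict(left)
                merged.update({c: right.get(c) for c in right_only})
                joined.append(merged)
        else:
            # No match: the left row survives with missing right values.
            merged = dict(left)
            merged.update({c: None for c in right_only})
            joined.append(merged)
    return joined


def right_outer_join(left_rows, right_rows, key):
    """A right outer join is a left outer join with the inputs swapped."""
    return left_outer_join(right_rows, left_rows, key)


if __name__ == "__main__":
    customers = [{"id": 1, "name": "Ann"}, {"id": 2, "name": "Bob"}]
    orders = [{"id": 1, "total": 30}]
    print(left_outer_join(customers, orders, "id"))
    print(right_outer_join(customers, orders, "id"))
```

In the left outer join, the customer without an order is kept with a missing value for the order column; in the right outer join, only rows that exist on the order side drive the output.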

Modified Ohri Framework

 

Some time back, I created a framework for data mining through on-demand cloud computing. This is the next version. It is free for all to use, with only authorship credit back to me.
 
It tries to do away with fixed server and desktop costs AND the fixed costs of software used for data mining, statistics and analytics, which carries huge annual license fees charged per CPU.

 

The modified Ohri Framework tries to mash up the following:

 

0) HTTPS rather than HTTP

1) Encryption and Compression Software for data transfer (like PGP)

2) An open source stats package like R on a cloud computer (like Amazon EC2 or RightScale with Hadoop)

3) GUI to make it easy to use (like Rattle GUI and PMML Package)

4) A Data Mining Open Source Package (like Rapid Miner or Splunk)

5) RIA Graphics (like Silverlight)

6) Secure Output to cloud computing devices (like Google Docs)

7) Billing priced at simple cost plus X% (where simple cost can be something like 0.85 cent per instance hour or more depending on usage, and X should not be more than 15%)

8) Open source sharing of all code to ensure community sandboxing
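
The cost-plus pricing in item 7 can be sketched as a few lines of Python. This is only an illustration of the arithmetic, assuming the price is simply usage times the per-instance-hour cost, marked up by X%; the function name, parameter names and the sample figures are hypothetical, and the rate is in whatever currency unit the cloud provider bills in.

```python
# Illustrative sketch of "simple cost plus X%" billing (item 7).
# All names and sample numbers are hypothetical.

MAX_MARKUP_PERCENT = 15.0  # the framework says X should not exceed 15%


def on_demand_price(instance_hours, cost_per_instance_hour, markup_percent):
    """Return the billed amount: simple cost plus a capped percentage markup."""
    if instance_hours < 0 or cost_per_instance_hour < 0:
        raise ValueError("usage and hourly cost must be non-negative")
    if not 0 <= markup_percent <= MAX_MARKUP_PERCENT:
        raise ValueError("markup must be between 0 and 15 percent")

    # Simple cost: what the cloud provider charges for the hours used.
    simple_cost = instance_hours * cost_per_instance_hour
    # Billed price: simple cost plus X percent on top.
    return simple_cost * (1 + markup_percent / 100.0)


if __name__ == "__main__":
    # Hypothetical job: 100 instance hours at 0.10 per hour, 15% markup.
    print(on_demand_price(100, 0.10, 15))
```

Because the markup is a capped percentage of actual usage, the customer's bill scales with the hours consumed and there is no fixed license component.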

 

The intention is to replace the fixed computing costs of servers and desktops with normal PCs (Ubuntu Linux) that access secure, on-demand data mining through a browser (Firefox or Internet Explorer).

The result is on-tap, on-demand mining for anyone in the world, without big license purchases or renewals (software expenses) or big hardware purchases (which become obsolete in 2-3 years).