RapidMiner launches extensions marketplace

For some time now, I had been hoping for a place where new package or algorithm developers get at least a fraction of the money that iPad or iPhone application developers get. RapidMiner has taken the lead in establishing a marketplace for extensions. Are there going to be paid extensions as well? I hope so!

This probably makes it the first “app” marketplace in open source and the second app marketplace in analytics, after salesforce.com.

It is hard work to think of new algorithms, and some of them can be really useful.

Can we hope for an #rstats marketplace where people downloading, say, ggplot3.0 at least get a prompt to donate 99 cents per download to Hadley Wickham’s Amazon wishlist? http://www.amazon.com/gp/registry/1Y65N3VFA613B

Do you think it is okay to pay 99 cents per iTunes song, but not pay a cent for open source software?

I don’t know, but I am just a capitalist born in a country that was socialist for the first 13 years of my life. Congratulations once again to RapidMiner for innovating and leading the way.


RapidMiner, Marketplace, Extensions, 30 May 2011
Rapid-I Marketplace Launched by Simon Fischer

Over the years, many of you have been developing new RapidMiner Extensions dedicated to a broad set of topics. Whereas these extensions are easy to install in RapidMiner – just download and place them in the plugins folder – the hard part is to find them in the vastness that is the Internet. Extensions made by ourselves at Rapid-I, on the other hand, are distributed by the update server, making them searchable and installable directly inside RapidMiner.

We thought that this was a bit unfair, so we decided to open up the update server to the public, and not only this, we even gave it a new look and name. The Rapid-I Marketplace is available in beta mode at http://rapidupdate.de:8180/ . You can use the Web interface to browse, comment, and rate the extensions, and you can use the update functionality in RapidMiner by going to the preferences and entering http://rapidupdate.de:8180/UpdateServer/ as the update server URL. (Once the beta test is complete, we will change the port back to 80 so we won’t have any firewall problems.)

As an Extension developer, just register with the Marketplace and drop me an email (fischer at rapid-i dot com) so I can give you permissions to upload your own extension. Upload is simple provided you use the standard RapidMiner Extension build process and will boost visibility of your extension.

Looking forward to seeing many new extensions there soon!

Disclaimer- Decisionstats is a partner of RapidMiner. I have liked the software for a long, long time, and recently agreed to partner with them, just like I did with KXEN some years back, and with Predictive Analytics Conference, and Aster Data until last year.

I still think RapidMiner is very, very good software, and a globally created software after SAP.

Here is the actual marketplace


Welcome to the Rapid-I Marketplace Public Beta Test

The Rapid-I Marketplace will soon replace the RapidMiner update server. Using this marketplace, you can share your RapidMiner extensions and make them available for download by the community of RapidMiner users. Currently, we are beta testing this server. If you want to use this server in RapidMiner, you must go to the preferences and enter http://rapidupdate.de:8180/UpdateServer for the update url. After the beta test, we will change the port back to 80, which is currently occupied by the old update server. You can test the marketplace as a user (downloading extensions) and as an Extension developer. If you want to publish your extension here, please let us know via the contact form.

Hot Downloads
The Image Processing Extension provides operators for handling image data. You can extract attributes describing colour and texture in the image, and you can apply several transformations to image data, which allows you to perform segmentation and detection of suspicious areas in image data. The extension provides many image transformation and extraction operators, ranging from Wavelet Decomposition and Hough Circle to Block Difference of Inverse Probabilities.

RapidMiner is unquestionably the world-leading open-source system for data mining. It is available as a stand-alone application for data analysis and as a data mining engine for integration into your own products. Thousands of applications of RapidMiner in more than 40 countries give their users a competitive edge.

  • Data Integration, Analytical ETL, Data Analysis, and Reporting in one single suite
  • Powerful but intuitive graphical user interface for the design of analysis processes
  • Repositories for process, data and meta data handling
  • Only solution with meta data transformation: forget trial and error and inspect results already during design time
  • Only solution which supports on-the-fly error recognition and quick fixes
  • Complete and flexible: Hundreds of data loading, data transformation, data modeling, and data visualization methods
All modeling methods and attribute evaluation methods from the Weka machine learning library are available within RapidMiner. After installing this extension you will get access to about 100 additional modelling schemes, including additional decision trees, rule learners and regression estimators. This extension combines two of the most widely used open source data mining solutions. By installing it, you can extend RapidMiner to everything that is possible with Weka while keeping the full analysis, preprocessing, and visualization power of RapidMiner.

Finally, the two most widely used data analysis solutions – RapidMiner and R – are connected. Arbitrary R models and scripts can now be directly integrated into the RapidMiner analysis processes. The new R perspective offers the known R console together with the great plotting facilities of R. All variables and R scripts can be organized in the RapidMiner Repository. Directly included online help and multi-line editing make the creation of R scripts much more comfortable.

Brief Interview with James G Kobielus

Here is a brief one-question interview with James Kobielus, Senior Analyst, Forrester.

Ajay- Describe the five most important events in predictive analytics you saw in 2010 and, in your view, the top three trends for 2011.


Five most important developments in 2010:

  • Continued emergence of enterprise-grade Hadoop solutions as the core of the future cloud-based platforms for advanced analytics
  • Development of the market for analytic solution appliances that incorporate several key features for advanced analytics: massively parallel EDW appliance, in-database analytics and data management function processing, embedded statistical libraries, prebuilt logical domain models, and integrated modeling and mining tools
  • Integration of advanced analytics into core BI platforms with user-friendly, visual, wizard-driven tools for quick, exploratory predictive modeling, forecasting, and what-if analysis by nontechnical business users
  • Convergence of predictive analytics, data mining, content analytics, and CEP in integrated tools geared to real-time social media analytics
  • Emergence of CRM and other line-of-business applications that support continuously optimized “next-best action” business processes through embedding of predictive models, orchestration engines, business rules engines, and CEP agility

Three top trends I see in the coming year, above and beyond deepening and adoption of the above-bulleted developments:

  • All-in-memory, massively parallel analytic architectures will begin to gain a foothold in complex EDW environments in support of real-time elastic analytics
  • Further crystallization of a market for general-purpose “recommendation engines” that, operating inline to EDWs, CEP environments, and BPM platforms, enable “next-best action” approaches to emerge from today’s application siloes
  • Incorporation of social network analysis functionality into a wider range of front-office business processes to enable fine-tuned behavioral-based customer segmentation to drive CRM optimization

About: http://www.forrester.com/rb/analyst/james_kobielus

James G. Kobielus
Senior Analyst, Forrester Research


James serves Business Process & Applications professionals. He is a leading expert on data warehousing, predictive analytics, data mining, and complex event processing. In addition to his core coverage areas, James contributes to Forrester’s research in business intelligence, data integration, data quality, and master data management.


James has a long history in IT research and consulting and has worked for both vendors and research firms. Most recently, he was at Current Analysis, an IT research firm, where he was a principal analyst covering topics ranging from data warehousing to data integration and the Semantic Web. Prior to that position, James was a senior technical systems analyst at Exostar (a hosted supply chain management and eBusiness hub for the aerospace and defense industry). In this capacity, James was responsible for identifying and specifying product/service requirements for federated identity, PKI, and other products. He also worked as an analyst for the Burton Group and was previously employed by LCC International, DynCorp, ADEENA, International Center for Information Technologies, and the North American Telecommunications Association. He is both well versed and experienced in product and market assessments. James is a widely published business/technology author and has spoken at many industry events.

Interview: Tasso Argyros, CTO, Aster Data Systems

Here is an interview with Tasso Argyros, the CTO and co-founder of Aster Data Systems (www.asterdata.com). Aster Data Systems makes one of the first DBMSs to tightly integrate SQL with MapReduce.


Ajay- The number of maths and science students is in major decline the world over. What would you recommend to young students who want careers in science?

[TA]- My father is a professor of Mathematics and I spent a lot of my college time studying advanced math. What I would say to new students is that Math is not a way to get a job, it’s a way to learn how to think. As such, a Math education can lead to success in any discipline that requires intellectual abilities. As long as they take the time to specialize at some point – via postgraduate education or a job where they can learn a new discipline from smart people – they won’t regret the investment.

Ajay- Describe your career in science, particularly your time at Stanford. What made you think of starting up Aster Data? How important is it for a team, rather than an individual, to begin a startup? Could you describe the startup moment when your team came together?

[TA]- While at Stanford I became very familiar with the world of startups through my advisor, David Cheriton (who was an angel investor in VMware and Google, and the founder of two successful companies). My research was about processing large amounts of data on large, low-cost computer farms. A year into my research it became obvious that this approach had huge processing power advantages and it was superior to anything else I could see in the marketplace. I then happened to meet my other two co-founders, Mayank Bawa & George Candea, who were looking at a similar technical problem from the database and reliability perspective, respectively.

I distinctly remember George walking into my office one day (I barely knew him back then) and saying “I want to talk to you about startups and the future” – the rest has become history.

Ajay- How would you describe your product, Aster nCluster Cloud Edition, to somebody who does not know anything beyond traditional server and data warehouse technologies? Could you rate it against some known vendors and give a price point: at what level of usage does the total cost of ownership of Aster Data become cheaper than, say, an Oracle, SAP, or Microsoft data warehousing solution?

[TA]- Aster allows businesses to reduce the data analytics TCO in two interesting ways. First, it has a much lower hardware cost than any traditional DW technology because of its use of commodity servers or cloud infrastructure like Amazon EC2. Secondly, Aster has implemented a lot of innovations that simplify the (previously tedious and expensive) management of the system, which includes scaling the system elastically up/down as needed – so they are not paying for capacity they don’t need at a given point in time.

But cutting costs is one side of the equation; what makes me even more excited is the ability to make a business more profitable, competitive and efficient through analyzing more data at greater depth. We have customers that have cut their costs and increased their customers and revenue by using Aster to analyze their valuable (and usually underutilized) data. If you have data – and you think you’re not taking full advantage of it – Aster can help.

Ajay- I have always had this one favourite question: when can I analyze 100 gigabytes of data using just a browser and some statistical software like R, or the advanced forecasting software that is available? Describe some of Aster Data’s work in enhancing the analytical capabilities of big data.

Can I run R (free, open source) on an on-demand basis on an Aster Data solution? How much would it cost me to crunch 100 GB of data and make segmentations and models with, say, 50 hours of processing time per month?

[TA]- One of Aster’s big innovations is to allow analytical applications like R to be embedded in the database via our SQL/MapReduce framework. We actually have customers right now that are using R to do advanced analytics over terabytes of data. 100GB is actually on the lower end of what our software can enable and as such the cost would not be significant.
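
For readers new to the idea, here is a minimal sketch of the map/reduce pattern over table rows, in plain Python. It is not Aster’s SQL/MapReduce API: the rows, column names, and function names are all hypothetical, and everything runs in one process. It only illustrates how a per-row map step and a per-key reduce step together express something like a SQL GROUP BY with a SUM.

```python
from collections import defaultdict

# Hypothetical rows standing in for a table of purchases (not real data).
ROWS = [
    {"customer": "a", "amount": 10.0},
    {"customer": "b", "amount": 5.0},
    {"customer": "a", "amount": 2.5},
]


def map_row(row):
    """Map step: emit one (key, value) pair per input row."""
    return row["customer"], row["amount"]


def reduce_values(key, values):
    """Reduce step: collapse all values collected for one key."""
    return key, sum(values)


def map_reduce(rows):
    """Group mapped values by key, then reduce each group."""
    groups = defaultdict(list)
    for row in rows:
        key, value = map_row(row)
        groups[key].append(value)
    return dict(reduce_values(key, values) for key, values in groups.items())


# Roughly: SELECT customer, SUM(amount) FROM rows GROUP BY customer
totals = map_reduce(ROWS)
```

In a real cluster the map and reduce steps would run in parallel across many machines; this sketch keeps them sequential so the data flow is easy to follow.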

Ajay- What do people at Aster Data do when not making complex software?

[TA]- A lot of Asterites love to travel around the world – we are, after all, a very diverse company. We also love coffee, Indian food as well as international and US sports like soccer, cricket, cycling, and football!

Ajay- Name some products that compete with Aster Data, and where Aster Data products are more suitable from a TCO viewpoint. Name specific areas where you would not recommend your own products.

[TA]- We go against products like Oracle Database, Teradata and IBM DB2. If you need to do analytics over 100s of GBs or terabytes of data, our price/performance ratio would be orders of magnitude better.

Ajay- How do you convince well-known and experienced VCs like Sequoia Capital to invest in a start-up? (E.g., I could do with some financing for upcoming server costs.)

[TA]- You need to convince Sequoia of three things. (a) that the market you’re going after is very large (in the billions of dollars, if you’re successful). (b) that your team is the best set of people that could ever come together to solve the particular problem you’re trying to solve. And (c) that the technology you’ve developed gives you an “unfair advantage” over incumbents or new market entrants. Most importantly, you have to smile a lot! :)


About Tasso:

Tasso (Tassos) Argyros is the CTO and co-founder of Aster Data Systems, where he is responsible for all product and engineering operations of the company. Tasso was recently recognized as one of BusinessWeek’s Best Young Tech Entrepreneurs for 2009 and was an SAP fellow at the Stanford Computer Science department. Prior to Aster, Tasso was pursuing a Ph.D. in the Stanford Distributed Systems Group with a focus on designing cluster architectures for fast, parallel data processing using large farms of commodity servers. He holds an MSc in Computer Science from Stanford University and a Diploma in Computer and Electrical Engineering from Technical University of Athens.

About Aster:

Aster Data Systems is a proven leader in high-performance database systems for data warehousing and analytics – the first DBMS to tightly integrate SQL with MapReduce – providing deep insights on data analyzed on clusters of low-cost commodity hardware.

The Aster nCluster database cost-effectively powers frontline analytic applications for companies such as MySpace, aCerno (an Akamai company), and ShareThis. Running on low-cost off-the-shelf hardware, and providing ‘hands-free’ administration, Aster enables enterprises to meet their data warehousing needs within their budget.

Aster is headquartered in San Carlos, California and is backed by Sequoia Capital, JAFCO Ventures, IVP, Cambrian Ventures, and First-Round Capital, as well as industry visionaries including David Cheriton, Rajeev Motwani and Ron Conway.