What is Radoop? Quite possibly an exciting mix of analytics and big data computing

What is Radoop?

BY ZOLTÁN PREKOPCSÁK

Hadoop is an excellent tool for analyzing large data sets, but it lacks an easy-to-use graphical interface. RapidMiner is an excellent tool for data analytics, but the amount of data it can handle is limited by the available memory, and a single machine is often not enough to run the analyses on time. In this project, we combine the strengths of both tools and provide a RapidMiner extension for editing and running ETL, data analytics and machine learning processes over Hadoop.

We have closely integrated the highly optimized data analytics capabilities of Hive and Mahout, and the user-friendly interface of RapidMiner to form a powerful and easy-to-use data analytics solution for Hadoop.

and what’s new

http://blog.radoop.eu/?p=198

Radoop 0.3 released – fully graphical big data analytics

BY ZOLTÁN PREKOPCSÁK

Today, Radoop took a major step forward with its 0.3 release. The new version of the visual big data analytics package adds full support for all major Hadoop distributions in use today: Apache Hadoop 0.20.2, 0.20.203, 1.0 and Cloudera’s Distribution including Apache Hadoop 3 (CDH3). It also adds support for large clusters by allowing the namenode, the jobtracker and the Hive server to reside on different nodes.

As Radoop’s promise is to make big data analytics easier, the 0.3 release also focuses on improving the user interface. It has an enhanced breakpointing system that lets you inspect intermediate results, and it adds dozens of quick fixes, so common process design mistakes become much easier to resolve.

There are many further improvements and fixes, so please consult the release notes for more details. Radoop is in private beta, but is heading towards a public release in Q2 2012. If you would like early access, please apply at the signup page or describe your use case by email (beta at radoop.eu).

Radoop 0.3 (15 February 2012)

Support for Apache Hadoop 0.20.2, 0.20.203, 1.0 and Cloudera’s Distribution Including Apache Hadoop 3 (CDH3) in a single release
Support for clusters with separate master nodes (namenode, jobtracker, Hive server)
Enhanced breakpointing to evaluate intermediate results
Dozens of quick fixes for the most common process design errors
Improved process design and error reporting
New welcome perspective to help in the first steps
Many bugfixes and performance improvements

Radoop 0.2.2 (6 December 2011)

More Aggregate functions and distinct option
Generate ID operator for convenience
Numerous bug fixes and improvements
Improved user interface

Radoop 0.2.1 (16 September 2011)

Set Role and Data Multiplier operators
Management panel for testing Hadoop connections
Stability improvements for Hive access
Further small bugfixes and improvements

Radoop 0.2 (26 July 2011)

Three new algorithms: Fuzzy K-Means, Canopy, and Dirichlet clustering
Three new data preprocessing operators: Normalize, Replace, and Replace Missing Values
Significant speed improvements in data transmission and interactive analytics
Increased stability and speedup for K-Means
More flexible settings for Join operations
More meaningful error messages
Other small bugfixes and improvements

Radoop 0.1 (14 June 2011)

Initial release with 26 operators for data transmission, data preprocessing, and one clustering algorithm.

Note that RapidMiner also has a great R extension, so you can combine R, a graphical interface and big data analytics in one place, which makes the work easier and more powerful than ever.

Author: Ajay Ohri
