MapReduce Analytics Apps- AsterData's Developer Express Plugin

AsterData continues to wow with its efforts to bridge MapReduce and analytics, this time with its new Developer Express plug-in for Eclipse. As any Eclipse user knows, plug-ins greatly improve the ability to write and develop code (similar to creating Android apps, if you have tried that). I did my winter internship at AsterData in San Carlos last December, and it's an amazing place with giga-level bright people.

Here are some details. (Note: I plan to play a bit more with the plug-in on my Ubuntu machine, which is currently down, and will let you know.)

http://marketplace.eclipse.org/content/aster-data-developer-express-plug-eclipse

Aster Data Developer Express provides an integrated set of tools for development of SQL and MapReduce analytics for Aster Data nCluster, a massively parallel database with an integrated analytics engine.

The Aster Data Developer Express plug-in for Eclipse enables developers to easily create new analytic application projects with the help of an intuitive set of wizards, immediately test their applications on their desktop, and push down their applications into the nCluster database with a single click.

Using Developer Express, analysts can significantly reduce the complexity and time needed to create advanced analytic applications so that they can more rapidly deliver deeper and richer analytic insights from their data.

And from the press release:

Now, any developer or analyst that is familiar with the Java programming language can complete a rich analytic application in under an hour using the simple yet powerful Aster Data Developer Express environment in Eclipse. Aster Data Developer Express delivers both rapid development and local testing of advanced analytic applications for any project, regardless of size.

The free, downloadable Aster Data Developer Express IDE now brings the power of SQL-MapReduce to any organization that is looking to build richer analytic applications that can leverage massive data volumes. Much of the MapReduce coding, including programming concepts like parallelization and distributed data analysis, is addressed by the IDE without the developer or analyst needing to have expertise in these areas. This simplification makes it much easier for developers to be successful quickly and eliminates the need for them to have any deep knowledge of the MapReduce parallel processing framework. Google first published MapReduce in 2004 for parallel processing of big data sets. Aster Data has coupled SQL with MapReduce and brought SQL-MapReduce to market, making it significantly easier for any organization to leverage the power of MapReduce. The Aster Data Developer Express IDE simplifies application development even further with an intuitive point-and-click development environment that speeds development of rich analytic applications. Applications can be validated locally on the desktop or ultimately within Aster Data nCluster, a massively parallel processing (MPP) database with a fully integrated analytics engine that is powered by MapReduce, known as a data-analytics server.
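To make the map and reduce phases mentioned above a bit more concrete, here is a minimal, single-process Python sketch of the general MapReduce model, using the classic word count. It is not Aster Data's SQL-MapReduce API: `map_reduce`, `word_mapper`, and `sum_reducer` are hypothetical names, and a real cluster would run the map and reduce calls in parallel across many machines rather than in one loop.

```python
from collections import defaultdict


def map_reduce(records, mapper, reducer):
    """Sequential simulation of MapReduce (hypothetical helper, not a real API)."""
    groups = defaultdict(list)
    # Map phase: each record independently emits zero or more (key, value) pairs.
    for record in records:
        for key, value in mapper(record):
            # Shuffle phase: gather all values for the same key together.
            groups[key].append(value)
    # Reduce phase: each key's list of values is combined independently.
    return {key: reducer(key, values) for key, values in groups.items()}


def word_mapper(line):
    # Emit (word, 1) for every whitespace-separated word in the line.
    for word in line.lower().split():
        yield word, 1


def sum_reducer(key, counts):
    # Add up the 1s emitted for this word.
    return sum(counts)


if __name__ == "__main__":
    lines = ["big data big analytics", "data analytics"]
    print(map_reduce(lines, word_mapper, sum_reducer))
```

Because no map call depends on another record and no reduce call depends on another key, both phases can be split across machines; that independence is what the framework exploits for parallelism.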

Rich analytic applications that can be easily built with Aster Data’s downloadable IDE include:

Iterative Analytics: Uncovering critical business patterns in your data requires hypothesis-driven, iterative analysis. This class of applications is defined by the exploratory navigation of massive volumes of data in a top-down, deductive manner. Aster Data's IDE makes it easy to develop and validate the algorithms and functions required to deliver these advanced analytic applications.

Prediction and Optimization: For this class of applications, the process is inductive. Rather than starting with a hypothesis, developers and analysts can easily build analytic applications that discover the trends, patterns, and outliers in data sets. Examples include propensity to churn in telecommunications, proactive product and service recommendations in retail, and pricing and retention strategies in financial services.

Ad Hoc Analysis: Examples of ad hoc analysis that can be performed include social network analysis, advanced click stream analysis, graph analysis, cluster analysis, and a wide variety of mathematical, trigonometric, and statistical functions.

“Aster Data’s IDE and SQL-MapReduce significantly eases development of advanced analytic applications on big data. We have now built over 350 analytic functions in SQL-MapReduce on Aster Data nCluster that are available for customers to purchase,” said Partha Sen, CEO and Founder of Fuzzy Logix. “Aster Data’s implementation of MapReduce with SQL-MapReduce goes beyond the capabilities of general analytic development APIs and provides us with the excellent control and flexibility needed to implement even the most complex analytic algorithms.”

Richer analytics on big data volumes is the new competitive frontier. Organizations have always generated reports to guide their decision-making. Although reports are important, they are historical sets of information generally arranged around predefined metrics and generated on a periodic basis.

Advanced analytics begins where reporting leaves off. Reporting often answers historical questions such as “what happened?” However, analytics addresses “why it happened” and, increasingly, “what will happen next?” To that end, solutions like Aster Data Developer Express ease the development of powerful ad hoc, predictive analytics and enable analysts to quickly and deeply explore terabytes to petabytes of data.

“We are in the midst of a new age in analytics. Organizations today can harness the power of big data regardless of scale or complexity,” said Don Watters, Chief Data Architect for MySpace. “Solutions like the Aster Data Developer Express visual development environment make it even easier by enabling us to automate aspects of development that currently take days, allowing us to build rich analytic applications significantly faster. Making Developer Express openly available for download opens the power of MapReduce to a broader audience, making big data analytics much faster and easier than ever before.”

“Our delivery of SQL coupled with MapReduce has clearly made it easier for customers to build highly advanced analytic applications that leverage the power of MapReduce. The visual IDE, Aster Data Developer Express, introduced earlier this year, made application development even easier and the great response we have had to it has driven us to make this open and freely available to any organization looking to build rich analytic applications,” said Tasso Argyros, Founder and CTO, Aster Data. “We are excited about today’s announcement as it allows companies of all sizes who need richer analytics to easily build powerful analytic applications and experience the power of MapReduce without having to learn any new skills.”

You can have a look here: http://www.asterdata.com/download_developer_express/

Aster Analytics and MapReduce.org

From the press release:

Aster Data Announces New Analytics Center and Launches http://www.mapreduce.org to Ease and Accelerate Adoption of MapReduce-Based Analytics

All-Star Team of Analytics Experts and MapReduce.org to Help Companies Build Next Generation Analytic Applications Using SQL-MapReduce and MapReduce Breakthroughs

Las Vegas, NV – April 12, 2010 – Gartner Business Intelligence Summit – Aster Data, a proven leader dedicated to providing the best data management and processing platform for big data volumes and analytics-intensive applications, today unveiled the Aster Analytics Center to help customers accelerate development of advanced analytic applications. Simultaneously, Aster Data also launched the first multi-author destination site for enterprise and government organizations, systems integrators, ISVs, and developers who want to build competency on the MapReduce analytics processing framework and related MapReduce frameworks. http://www.mapreduce.org offers research, education, analysis, customer use cases, key learnings, and tips for anyone interested in understanding the analytical value of MapReduce and related frameworks such as SQL-MapReduce. The new Aster Analytics Center provides product offerings, services, a world-class team, and an elite ecosystem of partners to develop and deliver data-driven applications that use SQL and MapReduce.

www.mapreduce.org is designed to be a key destination for companies who want to understand and build skills around MapReduce, SQL-MapReduce, and related MapReduce technologies. It includes content from those developing data-intensive applications with MapReduce and related MapReduce frameworks such as SQL-MapReduce, as well as insights from industry analysts, customers, and vendors who are leveraging this technology popularized by Google to build next-generation analytic applications. Any industry, enterprise organization, government agency, or expert can contribute content to this site.

Wayne Eckerson, director for TDWI Research and author of the recent article titled Launching an Analytics Practice: 10 Steps to Success, said, “Companies today need experts who can help them accelerate delivery of next-generation, data-driven applications. To run deep analytics on big data requires understanding the analytical capabilities of new database technology, including knowledge of MapReduce and parallel processing requirements.”

Today’s news includes key additions to the Aster Data team. Jonathan Goldman, director of analytics for Aster Data, is responsible for the new Aster Analytics Center, which includes product offerings such as the recently announced Aster Analytics Foundation, a suite of ready-to-use analytics functions and best practices for building advanced analytic applications that involve large data volumes and many diverse data sources. Prior to joining Aster Data he was a principal scientist at LinkedIn, where he led a team of analytics researchers to build cutting-edge products with the rich data sets LinkedIn collected. He created the popular “People You May Know” product for LinkedIn, and developed and supported computationally-intensive and targeted content throughout the site including “Who Viewed My Profile,” the “Similar Jobs” function, and “Similar Members” function, among others. Goldman earned a PhD in physics from Stanford University and a bachelor of science in physics from MIT.

These are interesting developments given the increasing focus on handling the complex, unstructured, and larger datasets involved in predictive as well as descriptive analytics and data-driven strategies.

Webinar on Thursday, January 14th

Disclaimer: I am doing my winter internship with Steve at AsterData and am helping organize this as part of my training.

Open Source Webinar with AsterData

Learn how to make money from open source databases, business intelligence, and business analytics in this webinar here.

FCC Disclaimer (even though it is one day before the rules for bloggers come into effect):

AsterData is an advertiser on this blog. See the ad on the right.

MapReduce was published by Google in 2004 as a way to crunch big data faster.

Google is neither an advertiser nor a partner on this site. They are busy with mobile phones and advertising (like the TV series Mad Men).

And yes, Sergey Brin needs to finish his PhD too.

M2009 Interview: Peter Pawlowski, AsterData

Here is an interview with Peter Pawlowski, a Member of Technical Staff (MTS) for data mining at Aster Data. I ran into Peter at the AsterData booth during M2009 and followed up with an email interview. Also included is a presentation he co-authored.


Ajay- Describe your career in science up to today.

Peter- Went to Stanford, where I got a BS & MS in Computer Science. I did some work on automated bug-finding tools while at Stanford.
(Note: that sums up the career of almost 60% of CS scientists.)

Ajay- How is life working at Aster Data? What are the challenges, and what is the great stuff?

Peter- Working at Aster is great fun, due to the sheer breadth and variety of the technical challenges. We have problems to solve in optimization, languages, networking, databases, operating systems, etc. It’s been great to think about problems end-to-end and consider the impact of a change on all aspects of the system. I worked on SQL/MR in particular, which had lots of interesting challenges: How do you define the API? How do you integrate with SQL? How do you make it run fast? How do you make it scale?
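One way to picture the "how do you integrate with SQL?" question is a partition-style function: the database splits rows by a key (as a SQL PARTITION BY clause would), orders each partition, and hands each partition to user code that emits new rows. The Python below is a hypothetical, single-machine illustration of that idea using click-stream sessionization. `run_partitioned` and `sessionize_partition` are made-up names, not the nCluster API, and the 60-second timeout is an arbitrary assumption.

```python
from itertools import groupby
from operator import itemgetter


def sessionize_partition(rows, timeout):
    """Hypothetical partition function: rows are one user's clicks, sorted by time."""
    session_id = 0
    last_ts = None
    for user, ts, url in rows:
        # Start a new session when the gap since the previous click exceeds the timeout.
        if last_ts is not None and ts - last_ts > timeout:
            session_id += 1
        last_ts = ts
        yield user, ts, url, session_id


def run_partitioned(rows, partition_key, order_key, fn, **kwargs):
    """Emulates 'PARTITION BY ... ORDER BY ...' on one machine: sort, group, apply fn."""
    ordered = sorted(rows, key=lambda r: (partition_key(r), order_key(r)))
    out = []
    for _, partition in groupby(ordered, key=partition_key):
        # Each partition is handled independently, so a cluster could spread them out.
        out.extend(fn(partition, **kwargs))
    return out


if __name__ == "__main__":
    clicks = [("ann", 0, "/home"), ("bob", 5, "/home"),
              ("ann", 30, "/cart"), ("ann", 200, "/home")]
    for row in run_partitioned(clicks, itemgetter(0), itemgetter(1),
                               sessionize_partition, timeout=60):
        print(row)
```

Here Ann's click at time 200 starts session 1 because it comes 170 seconds after her previous click, while Bob's single click stays in session 0. Since no partition's output depends on another's, a parallel engine could process each user's partition on a different node.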

Ajay- Do you think universities offer adequate preparation for in-demand skills like MapReduce, Hadoop, and business intelligence?

Peter- Probably not BI; I learned everything I know about BI while at Aster. In terms of M/R, it’d be useful to have more hands-on experience with distributed systems while at school. We read the MapReduce paper but didn’t get a chance to actually play with M/R. I think that sort of exposure would be useful. We recently made our software available to some students taking a data mining class at Stanford, and they came up with some fascinating use cases for our system, especially around the Netflix challenge dataset.

Ajay- Describe some of the recent engineering products that you have worked on at Aster.

Peter- SQL/MR is the main aspect of nCluster that I’ve worked on; its interesting challenges are described in my answer to #2.

Ajay- All BI companies claim in their marketing brochures to crunch data the fastest, at the lowest price, and at the highest quality. How would you validate your product’s performance scientifically and transparently?

Peter- I’ve found that the hardest part of judging performance is to come up with a realistic workload. There are public benchmarks out there, but they may or may not reflect the kinds of workloads that our customers want to run. Our goal is to make our customers’ experience as good as possible, so we focus on speeding up the sorts of workloads they ask about.

And here is a presentation on SlideShare.net on more of what Peter works on.