Interview Stephanie McReynolds Director Product Marketing, AsterData

Here is an interview with Stephanie McReynolds, who works as Director of Product Marketing at AsterData. I asked her a couple of questions about the new product releases from AsterData in analytics and MapReduce.

Ajay – How does the new Eclipse Plugin help people who are already working with huge datasets but are new to AsterData’s platform?

Stephanie- Aster Data Developer Express, our new SQL-MapReduce development plug-in for Eclipse, makes MapReduce applications easy to develop. With Aster Data Developer Express, developers can develop, test and deploy a complete SQL-MapReduce application in under an hour. This is a significant increase in productivity over the traditional analytic application development process for Big Data applications, which requires significant time coding applications in low-level code and testing applications on sample data.

Ajay – What are the various analytical functions that you have introduced recently? List, say, the top 10.

Stephanie- At Aster Data, we have an intense focus on making the development process easier for SQL-MapReduce applications. Aster Developer Express is a part of this initiative, as is the release of pre-defined analytic functions. We recently launched both a suite of analytic modules and a partnership program dedicated to delivering pre-defined analytic functions for the Aster Data nCluster platform. Pre-defined analytic functions delivered by Aster Data’s engineering team are delivered as modules within the Aster Data Analytic Foundation offering and include analytics in the areas of pattern matching, clustering, statistics, and text analysis– just to name a few areas. Partners like Fuzzy Logix and Cobi Systems are extending this library by delivering industry-focused analytics like Monte Carlo Simulations for Financial Services and geospatial analytics for Public Sector– to give you a few examples.

Ajay – So okay, I want to do a K-Means cluster on, say, a million rows (and say 200 columns) using the Aster method. How do I go about it using the new plug-in as well as your product?

Stephanie- The power of the Aster Data environment for analytic application development is in SQL-MapReduce. SQL is a powerful analytic query standard because it is a declarative language. MapReduce is a powerful programming framework because it can support high performance parallel processing of Big Data and extreme expressiveness, by supporting a wide variety of programming languages, including Java, C/C#/C++, .Net, Python, etc. Aster Data has taken the performance and expressiveness of MapReduce and combined it with the familiar declarativeness of SQL. This unique combination ensures that anyone who knows standard SQL can access advanced analytic functions programmed for Big Data analysis using MapReduce techniques.

kMeans is a good example of an analytic function that we pre-package for developers as part of the Aster Data Analytic Foundation. What does that mean? It means that the MapReduce portion of the development cycle has been completed for you. Each pre-packaged Aster Data function can be called using standard SQL, and executes the defined analytic in a fully parallelized manner in the Aster Data database using MapReduce techniques. The result? High performance analytics with the expressiveness of low-level languages accessed through declarative SQL.
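To make the "MapReduce portion" of such a function more concrete, here is a minimal sketch of how k-means decomposes into a map step (assign each row to its nearest centroid) and a reduce step (average the rows assigned to each centroid). It is plain single-process Python, not Aster Data's implementation or its SQL-MapReduce API; the function names `map_assign`, `reduce_mean` and `kmeans` are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two rows."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def map_assign(point: Point, centroids: List[Point]) -> Tuple[int, Point]:
    """Map step: emit (index of the nearest centroid, point)."""
    nearest = min(
        range(len(centroids)),
        key=lambda i: squared_distance(point, centroids[i]),
    )
    return nearest, point


def reduce_mean(points: List[Point]) -> Point:
    """Reduce step: column-wise mean of all points that share a key."""
    n = len(points)
    return tuple(sum(column) / n for column in zip(*points))


def kmeans(rows: List[Point], centroids: List[Point], max_iter: int = 100) -> List[Point]:
    """Repeat map + reduce until the centroids stop moving (or max_iter is hit)."""
    for _ in range(max_iter):
        # Group the mapped pairs by key; a real cluster would do this shuffle
        # across worker nodes rather than in one dictionary.
        groups: Dict[int, List[Point]] = defaultdict(list)
        for row in rows:
            key, value = map_assign(row, centroids)
            groups[key].append(value)
        # A centroid that received no rows keeps its previous position.
        new_centroids = [
            reduce_mean(groups[i]) if i in groups else centroids[i]
            for i in range(len(centroids))
        ]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids
```

Calling `kmeans(rows, initial_centroids)` with a list of equal-length numeric tuples returns the final centroids. In a parallel database the map and reduce steps would run on many nodes at once, which is the part a pre-packaged function hides from the SQL user.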

Ajay – I see an increasing focus on analytics. Is this part of your product strategy, and how do you see yourself competing with pure analytics vendors?

Stephanie – Aster Data is an infrastructure provider. Our core product is a massively parallel processing database called nCluster that performs at or beyond the capabilities of any other analytic database in the market today. We developed our analytics strategy as a response to demand from our customers who were looking beyond the price/performance wars being fought today and wanted support for richer analytics from their database provider. Aster Data analytics are delivered in nCluster to enable analytic applications that are not possible in more traditional database architectures.

Ajay – Name some recent case studies in Analytics of implementation of MR-SQL with Analytical functions

Stephanie – There are three new classes of applications that Aster Data Express and Aster Analytic Foundation support: iterative analytics, prediction and optimization, and ad hoc analysis.

Aster Data customers are uncovering critical business patterns in Big Data by performing hypothesis-driven, iterative analytics. They are interactively exploring massive volumes of data, terabytes to petabytes, in a top-down deductive manner. ComScore, an Aster Data customer that performs website experience analysis, is a good example of a customer performing this type of analysis.

Other Aster Data customers are building applications for prediction and optimization that discover trends, patterns, and outliers in data sets. Examples of these types of applications are propensity to churn in telecommunications, proactive product and service recommendations in retail, and pricing and retention strategies in financial services. Full Tilt Poker, which is using Aster Data for fraud prevention, is a good example of a customer in this space.

The final class of application that I would like to highlight is ad hoc analysis. Examples of ad hoc analysis that can be performed include social network analysis, advanced click stream analysis, graph analysis, cluster analysis and a wide variety of mathematical, trigonometry, and statistical functions. LinkedIn, whose analysts and data scientists have access to all of their customer data in Aster Data, is a good example of a customer using the system in this manner.

While Aster Data customers are using nCluster in a number of other ways, these three new classes of applications are areas in which we are seeing particularly innovative application development.

Biography-

Stephanie McReynolds is Director of Product Marketing at Aster Data, where she is an evangelist for Aster Data’s massively parallel data-analytics server product. Stephanie has over a decade of experience in product management and marketing for business intelligence, data warehouse, and complex event processing products at companies such as Oracle, PeopleSoft, and Business Objects. She holds both a master’s and an undergraduate degree from Stanford University.

MapReduce Analytics Apps- AsterData's Developer Express Plugin

AsterData continues to wow with its efforts on bridging MapReduce and analytics, with its new Developer Express plug-in for Eclipse. As any Eclipse user knows, a plug-in like that greatly improves the ability to write code or develop applications (similar to creating Android apps, if you have tried that). I did my winter internship at AsterData last December in San Carlos, and it’s an amazing place with giga-level bright people.

Here are some details (note: I plan to play a bit more with the plugin on my Ubuntu setup, which is currently down, and let you know).

http://marketplace.eclipse.org/content/aster-data-developer-express-plug-eclipse

Aster Data Developer Express provides an integrated set of tools for development of SQL and MapReduce analytics for Aster Data nCluster, a massively parallel database with an integrated analytics engine.

The Aster Data Developer Express plug-in for Eclipse enables developers to easily create new analytic application projects with the help of an intuitive set of wizards, immediately test their applications on their desktop, and push down their applications into the nCluster database with a single click.

Using Developer Express, analysts can significantly reduce the complexity and time needed to create advanced analytic applications so that they can more rapidly deliver deeper and richer analytic insights from their data.

and from the Press Release

Now, any developer or analyst that is familiar with the Java programming language can complete a rich analytic application in under an hour using the simple yet powerful Aster Data Developer Express environment in Eclipse. Aster Data Developer Express delivers both rapid development and local testing of advanced analytic applications for any project, regardless of size.

The free, downloadable Aster Data Developer Express IDE now brings the power of SQL-MapReduce to any organization that is looking to build richer analytic applications that can leverage massive data volumes. Much of the MapReduce coding, including programming concepts like parallelization and distributed data analysis, is addressed by the IDE without the developer or analyst needing to have expertise in these areas. This simplification makes it much easier for developers to be successful quickly and eliminates the need for them to have any deep knowledge of the MapReduce parallel processing framework. Google first published MapReduce in 2004 for parallel processing of big data sets. Aster Data has coupled SQL with MapReduce and brought SQL-MapReduce to market, making it significantly easier for any organization to leverage the power of MapReduce. The Aster Developer Express IDE simplifies application development even further with an intuitive point-and-click development environment that speeds development of rich analytic applications. Applications can be validated locally on the desktop or ultimately within Aster Data nCluster, a massive parallel processing (MPP) database with a fully integrated analytics engine that is powered by MapReduce—known as a data-analytics server.
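As an aside for readers new to the MapReduce model that the release refers to, here is a minimal single-process sketch of its three phases (map, shuffle, reduce) using a word count. It is plain Python for illustration only; it is not the Aster Data IDE, not SQL-MapReduce, and not Google's implementation, and the function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map: turn one input record into (key, value) pairs."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all values by key (done across nodes in a real cluster)."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: collapse all values for one key into a single result."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run the three phases in sequence on an in-memory list of lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
```

Because each `map_phase` call looks at one record and each `reduce_phase` call looks at one key, both steps can be spread over many machines; that distribution work is what the release says the IDE handles for the developer.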

Rich analytic applications that can be easily built with Aster Data’s downloadable IDE include:

Iterative Analytics: Uncovering critical business patterns in your data requires hypothesis-driven, iterative analysis.  This class of applications is defined by the exploratory navigation of massive volumes of data in a top-down, deductive manner.  Aster Data’s IDE makes this easy to develop and to validate the algorithms and functions required to deliver these advanced analytic applications.

Prediction and Optimization: For this class of applications, the process is inductive. Rather than starting with a hypothesis, developers and analysts can easily build analytic applications that discover the trends, patterns, and outliers in data sets.  Examples include propensity to churn in telecommunications, proactive product and service recommendations in retail, and pricing and retention strategies in financial services.

Ad Hoc Analysis: Examples of ad hoc analysis that can be performed include social network analysis, advanced click stream analysis, graph analysis, cluster analysis, and a wide variety of mathematical, trigonometry, and statistical functions.

“Aster Data’s IDE and SQL-MapReduce significantly eases development of advanced analytic applications on big data. We have now built over 350 analytic functions in SQL-MapReduce on Aster Data nCluster that are available for customers to purchase,” said Partha Sen, CEO and Founder of Fuzzy Logix. “Aster Data’s implementation of MapReduce with SQL-MapReduce goes beyond the capabilities of general analytic development APIs and provides us with the excellent control and flexibility needed to implement even the most complex analytic algorithms.”

Richer analytics on big data volumes is the new competitive frontier. Organizations have always generated reports to guide their decision-making. Although reports are important, they are historical sets of information generally arranged around predefined metrics and generated on a periodic basis.

Advanced analytics begins where reporting leaves off. Reporting often answers historical questions such as “what happened?” However, analytics addresses “why it happened” and, increasingly, “what will happen next?” To that end, solutions like Aster Data Developer Express ease the development of powerful ad hoc, predictive analytics and enable analysts to quickly and deeply explore terabytes to petabytes of data.

“We are in the midst of a new age in analytics. Organizations today can harness the power of big data regardless of scale or complexity,” said Don Watters, Chief Data Architect for MySpace. “Solutions like the Aster Data Developer Express visual development environment make it even easier by enabling us to automate aspects of development that currently take days, allowing us to build rich analytic applications significantly faster. Making Developer Express openly available for download opens the power of MapReduce to a broader audience, making big data analytics much faster and easier than ever before.”

“Our delivery of SQL coupled with MapReduce has clearly made it easier for customers to build highly advanced analytic applications that leverage the power of MapReduce. The visual IDE, Aster Data Developer Express, introduced earlier this year, made application development even easier and the great response we have had to it has driven us to make this open and freely available to any organization looking to build rich analytic applications,” said Tasso Argyros, Founder and CTO, Aster Data. “We are excited about today’s announcement as it allows companies of all sizes who need richer analytics to easily build powerful analytic applications and experience the power of MapReduce without having to learn any new skills.”

You can have a look here at http://www.asterdata.com/download_developer_express/

R Oracle Data Mining

Here is a new package called RODM, an interface for doing data mining on Oracle tables through R. You can read more here http://www.oracle.com/technetwork/database/options/odm/odm-r-integration-089013.html and here http://cran.fhcrc.org/web/packages/RODM/RODM.pdf . Also, there is a contest for creative use of R and ODM.

R Interface to Oracle Data Mining

The R Interface to Oracle Data Mining (R-ODM) allows R users to access the power of Oracle Data Mining’s in-database functions using the familiar R syntax. R-ODM provides a powerful environment for prototyping data analysis and data mining methodologies.

R-ODM is especially useful for:

  • Quick prototyping of vertical or domain-based applications where the Oracle Database supports the application
  • Scripting of “production” data mining methodologies
  • Customizing graphics of ODM data mining results (examples: classification, regression, anomaly detection)

The R-ODM interface allows R users to mine data using Oracle Data Mining from the R programming environment. It consists of a set of function wrappers written in source R language that pass data and parameters from the R environment to the Oracle RDBMS enterprise edition as standard user PL/SQL queries via an ODBC interface. The R-ODM interface code is a thin layer of logic and SQL that calls through an ODBC interface. R-ODM does not use or expose any Oracle product code as it is completely an external interface and not part of any Oracle product. R-ODM is similar to the example scripts (e.g., the PL/SQL demo code) that illustrate the use of Oracle Data Mining, for example, how to create Data Mining models, pass arguments, retrieve results etc.

R-ODM is packaged as a standard R source package and is distributed freely as part of the R environment’s Comprehensive R Archive Network (CRAN). For information about the R environment, R packages and CRAN, see www.r-project.org.

and

Present and win an Apple iPod Touch!
The BI, Warehousing and Analytics (BIWA) SIG is giving an Apple iPod Touch to the best new presenter. Be part of the TechCast series and get a chance to win!

Consider highlighting a creative use of R and ODM.

BIWA invites all Oracle professionals (experts, end users, managers, DBAs, developers, data analysts, ISVs, partners, etc.) to submit abstracts for 45 minute technical webcasts to our Oracle BIWA (IOUG SIG) Community in our Wednesday TechCast series. Note that the contest is limited to new presenters to encourage fresh participation by the BIWA community.

Also an interview with Oracle Data Mining head, Charlie Berger https://decisionstats.wordpress.com/2009/09/02/oracle/

Cloud Computing across LANs?

The concept of cloud computing is interesting and actually quite old. It lacked major backing till Google came along, and it is now increasingly seen as the alternative to the PC (given that other alternatives like the Tablet PC came and went).

This diagram and definition are from Wikipedia, of course:

“Cloud computing refers to computing resources being accessed which are typically owned and operated by a third-party provider on a consolidated basis in Data Center locations. Consumers of cloud computing services purchase computing capacity on-demand and are not generally concerned with the underlying technologies used to achieve the increase in server capability. There are however increasing options for developers that allow for platform services in the cloud where developers do care about the underlying technology.”

What prevents local area networks from being run as clouds beats me. Put all the apps and ALL the storage on the server. Since most PC OEMs insist on their standard 80 GB hard disk configuration, the IT team of a company has to work harder to enforce it, but once that is done, they have fewer tickets to attend to. Just put thin-shell Ubuntu PCs with OpenOffice on each local machine. This also makes compliance and productivity tracking much easier to do: just check the server logs. The bottleneck, of course, is that IT compliance teams in companies rarely seek to maximize business value, which ensures they are the first to be transferred to other teams or downsized in downturns, as a cost unit rather than a core unit.

You can also try Google Apps for enterprise for such initiatives. The software is now ready, which wasn’t the case a few years back.