Shawn kung – DECISION STATS

Here is an interview with Shawn Kung, Senior Director of Product Management at Aster Data. Shawn explains the difference between the various database technologies, Aster’s rising appeal to its unique technological approach and touches upon topics of various other interests as well to people in the BI and technology space.

Ajay -Describe your career journey from a high school student of science till today .Do you think science is a more lucrative career?

Shawn: My career journey has spanned over a decade in several Silicon Valley technology companies. In both high school and my college studies at Princeton, I had a fervent interest in math and quantitative economics. Silicon Valley drew me to companies like upstart procurement software maker Ariba and database giant Oracle. I continued my studies by returning to get a Master’s in Management Science at Stanford before going on to lead core storage systems for nearly 5 years at NetApp and subsequently Aster.

Science (whether it is math, physics, economics, or the hard engineering sciences) provides a solid foundation. It teaches you to think and test your assumptions – those are valuable skills that can lead to a both a financially lucrative and personally inspiring career.

Ajay- How would you describe the difference between Map Reduce and Hadoop and Oracle and SAS, DBMS and Teradata and Aster Data products to a class of undergraduate engineers ?

Shawn: Let’s start with the database guys – Oracle and Teradata. They focus on structured data – data that has a logical schema and is manipulated via a standards-based structured query language (SQL). Oracle tries to be everything to everyone – it does OLTP (low-latency transactions like credit card or stock trade execution apps) and some data warehousing (typically summary reporting). Oracle’s data warehouse is not known for large-scale data warehousing and is more often used for back-office reporting.

Teradata is focused on data warehousing and scales very well, but is extremely expensive – it runs on high-end custom hardware and takes a mainframe approach to data processing. This approach makes less sense as commodity hardware becomes more compute-rich and better software comes along to support large-scale MPP data warehousing.

SAS is very different – it’s not a relational database. It really offers an application platform for data analysis, specifically data mining. Unlike Oracle and Teradata which is used by SQL developers and managed by DBAs, SAS is typically run in business units by data analysts – for example a quantitative marketing analyst, a statistician/mathematician, or a savvy engineer with a data mining/math background. SAS is used to try to find patterns, understand behaviors, and offer predictive analytics that enable businesses to identify trends and make smarter decisions than their competitors.

Hadoop offers an open-source framework for large-scale data processing. MapReduce is a component of Hadoop, which also contains multiple other modules including a distributed filesystem (HDFS). MapReduce offers a programming paradigm for distributed computing (a parallel data flow processing framework).

Both Hadoop and MapReduce are catered toward the application developer or programmer. It’s not catered for enterprise data centers or IT. If you have a finite project in a line of business and want to get it done, Hadoop offers a low-cost way to do this. For example, if you want to do large-scale data munging like aggregations, transformations, manipulations of unstructured data – Hadoop offers a solution for this without compromising on the performance of your main data warehouse. Once the data munging is finished, the post-processed data set can be loaded into a database for interactive analysis or analytics. It is a great combination of big data technologies for certain use-cases.

Aster takes a very unique approach. Our Aster nCluster software offers the best of all worlds – we offer the potential for deep analytics of SAS, the low-cost scalability and parallel processing of Hadoop/MapReduce, and the structured data advantages (schema, SQL, ACID compliance and transactional integrity, indexes, etc) of a relational database like Teradata and Oracle. Often, we find complementary approaches and therefore view SAS and Hadoop/MapReduce as synergistic to a complete solution. Data warehouses like Teradata and Oracle tend to be more competitive.

Ajay- What exciting products have you launched so far and what makes them unique both from a technical developer perspective and a business owner perspective

Shawn: Aster was the first-to-market to offer In-Database MapReduce, which provides the standards and familiarity of SQL and databases with the analytic power of MapReduce. This is very unique as it offers technical developers and application programmers to write embedded procedural algorithms once, upload it, and allow business analysts or IT folks (SQL developers, DBAs, etc) to invoke these SQL-MapReduce functions forever.

It is highly polymorphic (re-usable), highly fault-tolerant, highly flexible (any language – Java, Python, Ruby, Perl, R statistical language, C# in the .NET world, etc) and natively massively parallel – all of which differentiate these SQL extensions from traditional dumb user-defined functions (UDFs).

Ajay- “I am happy with my databases and I don’t need too much diversity or experimentation in my systems”, says a CEO to you.

How do you convince him using quantitative numbers and not marketing adjectives?

Shawn: Aster has dozens of production customers including big-names like MySpace, LinkedIn, Akamai, Full Tilt Poker, comScore, and several yet-to-be-named retail and financial service accounts. We have quantified proof points that show orders of magnitude improvements in scalability, performance, and analytic insights compared to incumbent or competitor solutions. Our highly referenceable customers would be happy to discuss their positive experiences with the CEO.

But taking a step back, there’s a fundamental concept that this CEO needs to first understand. The world is changing – data growth is proliferating due to the digitization of so many applications and the emergence of unstructured data and new data types. Like the book “Competing on Analytics”, the world is shifting to a paradigm where companies that don’t take risks and push the limits on analytics will die like the dinosaurs.

IDC is projecting 10x+ growth in data over the next few years to zetabytes of aggregate data driven by digitization (Internet, digital television, RFID, etc). The data is there and in order to compete effectively and understand your customers more intimately, you need a large-scale analytics solution like the one Aster nCluster offers. If you hold off on experimentation and innovation, it will be too late by the time you realize you have a problem at hand.

Ajay- How important is work life balance for you?

Shawn: Very important. I hang out with my wife most weekends – we do a lot of outdoors activities like hiking and gardening. In Silicon Valley, it’s all too easy to get caught up in the rush of things. Taking breaks, especially during the weekend, is important to recharge and re-energize to be as productive as possible.

Ajay- Are you looking for college interns and new hires what makes aster exciting for you so you are pumped up every day to go to work?

Shawn: We’re always looking for smart, innovative, and entrepreneurial new college grads and interns, especially on the technical side. So if you are a computer science major or recent grad or graduate student, feel free to contact us for opportunities.

What makes Aster exciting is 2 things –

first, the people. Everyone is very smart and innovative so you learn a tremendous amount, which is personally gratifying and professionally useful long-term.

Second, Aster is changing the world!

Distributed systems computing focused on big data processing and analytics – these are massive game-changers that will fundamentally change the landscape in data warehousing and analytics. Traditional databases have been a oligopoly for over a generation – they haven’t been challenged and so the 1970’s based technology has stuck around. The emergence of big data and low-cost commodity hardware has created a unique opportunity to carve out a brand new market…

what gets me pumped every day is I have the ability to contribute to a pioneer that is quickly becoming Silicon Valley’s next great success story!

Biography-

Over the past decade, Shawn has led product management for some of Silicon Valley’s most successful and innovative technology companies. Most recently, he spent nearly 5 years at Network Appliance leading Core Systems storage product management, where he oversaw the development of high availability software and Storage Systems hardware products that grew in annual revenue from $200M to nearly $800M. Prior to NetApp, Shawn held senior product management and corporate strategy roles at Oracle Corporation and Ariba Inc.

Shawn holds an M.S. in Management Science and engineering from Stanford University, where he was awarded the Valentine Fellowship (endowed by Don Valentine of Sequoia Capital). He also received a B.A. with high honors from Princeton University.

About Aster

Aster Data Systems is a proven leader in high-performance database systems for data warehousing and analytics – the first DBMS to tightly integrate SQL with MapReduce – providing deep insights on data analyzed on clusters of low-cost commodity hardware. The AsternCluster database cost-effectively powers frontline analytic applications for companies such as MySpace, aCerno (an Akamai company), and ShareThis.

Running on low-cost off-the-shelf hardware, and providing ‘hands-free’ administration, Aster enables enterprises to meet their data warehousing needs within their budget. Aster is headquartered in San Carlos, California and is backed by Sequoia Capital, JAFCO Ventures, IVP, Cambrian Ventures, and First-Round Capital, as well as industry visionaries including David Cheriton and Ron Conway.