Home » Interviews » Interview Shawn Kung Sr Director Aster Data

Interview Shawn Kung Sr Director Aster Data

R in the Cloud

R for Quantitative Finance

Software

Train in R

Here is an interview with Shawn Kung, Senior Director of Product Management at Aster Data. Shawn explains the difference between the various database technologies, Aster’s rising appeal to its unique technological approach and touches upon topics of various other interests as well to people in the BI and technology space.

image001

Ajay -Describe your career journey from a high school student of science till today .Do you think science is a more lucrative career?

Shawn: My career journey has spanned over a decade in several Silicon Valley technology companies.  In both high school and my college studies at Princeton, I had a fervent interest in math and quantitative economics.  Silicon Valley drew me to companies like upstart procurement software maker Ariba and database giant Oracle.  I continued my studies by returning to get a Master’s in Management Science at Stanford before going on to lead core storage systems for nearly 5 years at NetApp and subsequently Aster.

Science (whether it is math, physics, economics, or the hard engineering sciences) provides a solid foundation.  It teaches you to think and test your assumptions – those are valuable skills that can lead to a both a financially lucrative and personally inspiring career.

Ajay- How would you describe the difference between Map Reduce and Hadoop and Oracle and SAS, DBMS and Teradata and Aster Data products to a class of undergraduate engineers ?

Shawn: Let’s start with the database guys – Oracle and Teradata.  They focus on structured data – data that has a logical schema and is manipulated via a standards-based structured query language (SQL).  Oracle tries to be everything to everyone – it does OLTP (low-latency transactions like credit card or stock trade execution apps) and some data warehousing (typically summary reporting).  Oracle’s data warehouse is not known for large-scale data warehousing and is more often used for back-office reporting.

Teradata is focused on data warehousing and scales very well, but is extremely expensive – it runs on high-end custom hardware and takes a mainframe approach to data processing.  This approach makes less sense as commodity hardware becomes more compute-rich and better software comes along to support large-scale MPP data warehousing.

SAS is very different – it’s not a relational database. It really offers an application platform for data analysis, specifically data mining.  Unlike Oracle and Teradata which is used by SQL developers and managed by DBAs, SAS is typically run in business units by data analysts – for example a quantitative marketing analyst, a statistician/mathematician, or a savvy engineer with a data mining/math background.  SAS is used to try to find patterns, understand behaviors, and offer predictive analytics that enable businesses to identify trends and make smarter decisions than their competitors.

Hadoop offers an open-source framework for large-scale data processing.  MapReduce is a component of Hadoop, which also contains multiple other modules including a distributed filesystem (HDFS).  MapReduce offers a programming paradigm for distributed computing (a parallel data flow processing framework).

Both Hadoop and MapReduce are catered toward the application developer or programmer.  It’s not catered for enterprise data centers or IT.  If you have a finite project in a line of business and want to get it done, Hadoop offers a low-cost way to do this.  For example, if you want to do large-scale data munging like aggregations, transformations, manipulations of unstructured data – Hadoop offers a solution for this without compromising on the performance of your main data warehouse.  Once the data munging is finished, the post-processed data set can be loaded into a database for interactive analysis or analytics. It is a great combination of big data technologies for certain use-cases.

Aster takes a very unique approach.  Our Aster nCluster software offers the best of all worlds – we offer the potential for deep analytics of SAS, the low-cost scalability and parallel processing of Hadoop/MapReduce, and the structured data advantages (schema, SQL, ACID compliance and transactional integrity, indexes, etc) of a relational database like Teradata and Oracle.  Often, we find complementary approaches and therefore view SAS and Hadoop/MapReduce as synergistic to a complete solution.  Data warehouses like Teradata and Oracle tend to be more competitive.

Ajay- What exciting products have you launched so far and what makes them unique both from a technical developer perspective and a business owner perspective

Shawn: Aster was the first-to-market to offer In-Database MapReduce, which provides the standards and familiarity of SQL and databases with the analytic power of MapReduce.  This is very unique as it offers technical developers and application programmers to write embedded procedural algorithms once, upload it, and allow business analysts or IT folks (SQL developers, DBAs, etc) to invoke these SQL-MapReduce functions forever.

It is highly polymorphic (re-usable), highly fault-tolerant, highly flexible (any language – Java, Python, Ruby, Perl, R statistical language, C# in the .NET world, etc) and natively massively parallel – all of which differentiate these SQL extensions from traditional dumb user-defined functions (UDFs).

Ajay- “I am happy with my databases and I don’t need too much diversity or experimentation in my systems”, says a CEO to you.

How do you convince him using quantitative numbers and not marketing adjectives?

Shawn: Aster has dozens of production customers including big-names like MySpace, LinkedIn, Akamai, Full Tilt Poker, comScore, and several yet-to-be-named retail and financial service accounts.  We have quantified proof points that show orders of magnitude improvements in scalability, performance, and analytic insights compared to incumbent or competitor solutions.  Our highly referenceable customers would be happy to discuss their positive experiences with the CEO.

But taking a step back, there’s a fundamental concept that this CEO needs to first understand.  The world is changing – data growth is proliferating due to the digitization of so many applications and the emergence of unstructured data and new data types.  Like the book “Competing on Analytics”, the world is shifting to a paradigm where companies that don’t take risks and push the limits on analytics will die like the dinosaurs.

IDC is projecting 10x+ growth in data over the next few years to zetabytes of aggregate data driven by digitization (Internet, digital television, RFID, etc).  The data is there and in order to compete effectively and understand your customers more intimately, you need a large-scale analytics solution like the one Aster nCluster offers.  If you hold off on experimentation and innovation, it will be too late by the time you realize you have a problem at hand.

Ajay- How important is work life balance for you?

Shawn: Very important.  I hang out with my wife most weekends – we do a lot of outdoors activities like hiking and gardening.  In Silicon Valley, it’s all too easy to get caught up in the rush of things.  Taking breaks, especially during the weekend, is important to recharge and re-energize to be as productive as possible.

Ajay- Are you looking for college interns and new hires what makes aster exciting for you so you are pumped up every day to go to work?

Shawn: We’re always looking for smart, innovative, and entrepreneurial new college grads and interns, especially on the technical side.  So if you are a computer science major or recent grad or graduate student, feel free to contact us for opportunities.

What makes Aster exciting is 2 things –

first, the people.  Everyone is very smart and innovative so you learn a tremendous amount, which is personally gratifying and professionally useful long-term.

Second, Aster is changing the world!

Distributed systems computing focused on big data processing and analytics – these are massive game-changers that will fundamentally change the landscape in data warehousing and analytics.  Traditional databases have been a oligopoly for over a generation – they haven’t been challenged and so the 1970’s based technology has stuck around.  The emergence of big data and low-cost commodity hardware has created a unique opportunity to carve out a brand new market…

what gets me pumped every day is I have the ability to contribute to a pioneer that is quickly becoming Silicon Valley’s next great success story!

Biography-

Over the past decade, Shawn has led product management for some of Silicon Valley’s most successful and innovative technology companies.  Most recently, he spent nearly 5 years at Network Appliance leading Core Systems storage product management, where he oversaw the development of high availability software and Storage Systems hardware products that grew in annual revenue from $200M to nearly $800M.  Prior to NetApp, Shawn held senior product management and corporate strategy roles at Oracle Corporation and Ariba Inc.

Shawn holds an M.S. in Management Science and engineering from Stanford University, where he was awarded the Valentine Fellowship (endowed by Don Valentine of Sequoia Capital).  He also received a B.A. with high honors from Princeton University.

About Aster

Aster Data Systems is a proven leader in high-performance database systems for data warehousing and analytics – the first DBMS to tightly integrate SQL with MapReduce – providing deep insights on data analyzed on clusters of low-cost commodity hardware. The AsternCluster database cost-effectively powers frontline analytic applications for companies such as MySpace, aCerno (an Akamai company), and ShareThis.

Running on low-cost off-the-shelf hardware, and providing ‘hands-free’ administration, Aster enables enterprises to meet their data warehousing needs within their budget. Aster is headquartered in San Carlos, California and is backed by Sequoia Capital, JAFCO Ventures, IVP, Cambrian Ventures, and First-Round Capital, as well as industry visionaries including David Cheriton and Ron Conway.


1 Comment

  1. James Judge says:

    what exactly is “…. a mainframe approach to data processing. …” Since the author has only been involved in this arena for the last decade mainframe computing has been out of vogue before that, where does his experience come from? Sounds like he is “parroting” someone else’s words.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Predictive Analytics- The Book

Conferences

Books

Follow

Get every new post delivered to your Inbox.

Join 735 other followers