From http://www-03.ibm.com/systems/z/solutions/cloud/smart.html: IBM, the parent of SPSS, announced a Smart Analytics Cloud.

My favorite (as of now) company in Big Data is Aster Data*. (I am partial to companies founded by Stanford alumni, having interacted with a lot of them while working with Trilogy, another company started by Stanford dropouts. There are also not too many Silicon Valley startups by us famously non-intellectual Punjus.
Q- What is the culture in Punjab? A- In Punjab the only culture is agriculture.)
Aster Data has correctly hit the marketing hammer on the nail of bigger data, and with the quantities of data expanding rapidly this is a lucrative market to get into (as pointed out by our favorite analytics journal, the NY Times).
Aster Data’s products, nCluster and nPath with MapReduce SQL, and the recent interactions with SAS Institute hold them in a nice, promising place, but with miles to go before they even rest (or start thinking of that IPO).
Aster were present at Data Mining 2009 with terrific response to their booth.
As a techie wannabe stats frat boy, I like the Aster nPath product more (Time Series), but the analytics-within-the-database claim for nCluster needs to be investigated and even tested further, especially if you need three days to get your monthly summary report.
( *and also an advertiser, sponsor to Big Data Summit as per FCC regulations)
The Data Services and Applications, with their flexibility for cloud computing, are what make this especially appealing from a product perspective, while their relatively small size (as compared to other bigger Vend-ORs) gives alliance partners more leverage in collaborating in Research and Design and maybe even co-bundling applications.
Screenshots- Courtesy -The Lovely www.asterdata.com website ( Webmasters of other websites especially IBM and Oracle’s should take note how a website can have lots of content and yet be readable)
Also I will be posting the remaining Data Mining 2009 interviews shortly (including Part 2 with Anne) and share some/all of the presentations via SlideShare embedding in WordPress.com ( post permission).
As for the Aster Data Interviews- I owe Peter Pavloski and the readers one. Coming up soon.
Submit a presentation
BI/DW & Analytics – Brain-Powered by the BIWA SIG
The IOUG is excited to partner with the BIWA SIG to present a special “conference within a conference”, Get Analytical with BIWA Training Days, on Business Intelligence and Data Warehousing, held in conjunction with COLLABORATE 10 – IOUG Forum. Those interested in Analytics, BI, Data Warehousing, Enterprise Performance Management (EPM) and OBIEE are encouraged to participate in this special forum. For full track descriptions click here.
Don’t miss your chance to attend COLLABORATE 10, April 18-22 in Las Vegas, Nevada, for free!* Submit a presentation** through the IOUG Forum by this FRIDAY, October 23 for the chance to:
So what are you waiting for? Submit your Business Intelligence, Warehousing and Analytics presentation before the deadline — FRIDAY, October 23.
*Technical session and Deep Dive speakers receive complimentary registration to the full conference; and Quick Tip speakers receive 50% off the early bird registration rate.
**All Oracle employees interested in speaking at COLLABORATE 10 should contact Lisa Stuart at lisa.stuart@oracle.com. Please do not submit papers through the official COLLABORATE 10 call for speakers until approval is received.
***Register with code BIWA2010 for discounts and BI-nefits
An interview with Carole Jesse, an experienced analytics professional in SAS, JMP, analytics and Risk Management.

Ajay- Describe your career in science from school to now.
Carole- Truthfully, my career in science started in 7th grade. Hey, I know this is further back in time than you intended the question to go! However, something significant happened that year that pretty much set me on the path that I am still on today. I discovered Algebra. Up to that point in time, I was an average student in ‘arithmetic’. Algebra introduced LETTERS into the mix with numbers, in the simplest of ways that we have all seen: ‘Solve for x in the equation x+2=5’. That was something I could get behind, AND I excelled at it immediately. Without mathematical excellence, efforts in learning science can fall apart. Mathematics is everywhere!
I spent the rest of my secondary education consuming all the math and science that I could get. By the time I entered college I had already been exposed to pre-calculus and physics and was actually surprised by those in my college Freshman courses who had not seen anti-derivatives, memorized the quotient rule, or worked an inclined plane friction problem before.
My goal as an undergraduate was to become a Veterinarian. The beauty of a pre-Vet curriculum is that it is pretty much like pre-Med, rigorous and broad in the sciences. In my first two years of undergraduate work, I was exposed to more Chemistry, more Mathematics, more Physics, along with things like Genetics, Biology, even the Plant and Animal Sciences. Although I did not stick with my pursuit of Veterinary Medicine, it laid a solid foundation that has served me very well in the strangest of places.
I consider myself a Mathematician/Statistician due to my academic degrees in those areas, first a BS in Mathematics/Physics at the University of Wisconsin followed by a MS in Statistics at Montana State University. In between the BS and MS I also dabbled briefly in Electrical Engineering at the University of Minnesota.
Since academia, it is my breadth in ALL sciences which has allowed me to be very fluid in straddling diverse industries: from High Volume Manufacturing of Consumer Products, to Nuclear Energy, to Semiconductor Manufacturing/Packaging, to Financial Services, to Health Care. I succeed at business problem solving in these industries by applying my Statistical Methods knowledge, coupled with business acumen and peripheral understanding of the technologies used. I have worked closely with scientists and engineers, and could enter THEIR world speaking THEIR language, which was an aid in getting to these solutions quickly.
I can not place enough emphasis on the importance of exposure to a broad range of sciences, and as early as possible, for anyone who wants to be involved in Advanced Analytics and Business Intelligence. As a manager, I look closely at candidates for these diverse sorts of backgrounds.
Ajay- I find the number of computer scientists and analysts to be overwhelmingly male despite this being a lucrative profession. Do you think that BI and Analytics are male dominated? How can the trend be re-shaped?
Carole- Welcome to my world! All kidding aside, yes that has been my observation as well. While I am not versed in the specifics of actual gender statistics in Computer Science and Advanced Analytics versus other fields, based on my years in and around these fields, there does appear to be a bias.
This is not due to a lack of capability or interest in these fields on the part of women. I believe it is more due to the long history of cultural norms and negative social messages that perhaps push women away from these fields. The messages can be subtle, but if you pay close attention, you will see them. Being one of 10 females in an undergraduate engineering class of 150 students has a message right there. Even though these 10 women were able to make entry to the class, the pressure of being a minority, whether gender based or otherwise, can be a powerful influencer in remaining there.
In my own experience, I have encountered frequent judgments where I was made to feel “good at math” was an unacceptable trait for a woman to have. It is important to note that these judgments have been delivered equally by men AND women. So I think until both genders develop higher expectations of women in the hard science areas, the trends will continue. It has been decades since my 7th grade introduction to algebra, but it appears the negative social messages regarding girls in math and science are still present today. Otherwise there would be no need (i.e. no market) for books like Danica McKellar’s “Math Doesn’t Suck,” and the follow-up “Kiss My Math,” both aimed at battling these negative messages at the middle school level.
As to how I have battled these cultural expectations, I developed a thick skin. I have also learned to expect excellence from myself even when a teacher, or a peer, or a boss may have had lower expectations for me than for a male counterpart. Sort of a John Mayer “Who Says” type of attitude. Who says I can’t do Math and Science. Watch me.
Ajay- How would you explain Risk Management using software to a class of graduate students in mathematics and statistics?
Carole- There are many areas of Risk Management. My specific experience has been on the Credit Risk Management and Fraud Risk Management sides in a couple of industries. For credit risk in financial services, typically there is a specific department whose role is to quantify and predict credit risk. Not just for the current portfolio, but for new products as well. Various methodologies are utilized, ranging from summarization of portfolio characteristics that have a known relationship to default to using historical data to build out predictive models for production implementation.
Key skills needed here are a good understanding of the business, solid statistical methods knowledge, and computing skills. As far as the computing/software skills needed, there are three main categories: 1) query and preparation of data, 2) model building and validation, and 3) model implementation. The actual tools will likely differ across these categories.
For example, 1) might be tackled with SAS®, Business Objects, or straight SQL;
2) requires a true modeling package or coding language like SAS®, SPSS, R, etc.; and lastly
3) is the trickiest, as implementation can have many system limitations, but SAS® or C++ are often seen at implementation.
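The three categories can be sketched end to end in a few lines. This is only an illustrative toy, not the tooling described in the interview: the portfolio values are synthetic, the single `utilization` input is a hypothetical stand-in for real portfolio characteristics, and a hand-rolled logistic regression in Python stands in for a proper modeling package.

```python
import math

# Synthetic toy portfolio: (utilization ratio, defaulted flag). All values are made up.
ACCOUNTS = [
    (0.10, 0), (0.20, 0), (0.25, 0), (0.30, 0), (0.45, 0),
    (0.55, 1), (0.60, 0), (0.70, 1), (0.80, 1), (0.90, 1),
    (None, 0),  # a record with a missing input field
]

def prepare(rows):
    """Category 1, query and preparation: drop records with missing inputs."""
    return [(x, y) for x, y in rows if x is not None]

def fit_logistic(rows, lr=0.5, epochs=2000):
    """Category 2, model building: one-variable logistic regression by gradient descent."""
    w, b = 0.0, 0.0
    n = len(rows)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in rows:
            p = 1.0 / (1.0 + math.exp(-(w * x + b)))
            grad_w += (p - y) * x
            grad_b += p - y
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b

def score(model, x):
    """Category 3, implementation: the scoring function a production system would call."""
    w, b = model
    return 1.0 / (1.0 + math.exp(-(w * x + b)))

clean = prepare(ACCOUNTS)
train, holdout = clean[::2], clean[1::2]   # crude split, so validation uses unseen records
model = fit_logistic(train)
accuracy = sum((score(model, x) >= 0.5) == bool(y) for x, y in holdout) / len(holdout)
print(f"holdout accuracy: {accuracy:.2f}")
```

In practice each function would live in a different tool, which is the point made above: the query step close to the database, the fitting step in a modeling package, and the scoring step inside whatever system makes the credit decision.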
Ajay- Describe some of your most challenging and most exciting projects over the years.
Carole- I have been very fortunate to have many challenges and good projects in every role I have been in, but as I look back today, some things that stand out the most were in ‘high tech’. By virtue of being high tech, there is no fear of technology, and it is fast-paced and ever evolving to the next generation of product.
I spent seven years in the Semiconductor industry during the 90’s at Micron Technology, Intel, and Motorola. At the beginning of that window, we left the “486 processor” world, and during that window we spanned the realm of “Pentium processors.” Moore’s Law dominated all of this. To stay competitive all of these companies embraced statistical methods to help speed up development time.
At one point, I supported a group of about 10 R&D engineers in the Design and Analysis of their process improvement and simplification experiments. This afforded me exposure to much of the leading edge research the team was working on.
I recall one project with the goal of optimizing capacitance via surface roughness of the capacitor structures. In addition to all the science involved at the manufacturing step, what made this so interesting was the difficulty in measuring capacitance at the point in the process where film roughness was introduced. All we had were surface images after this step. The semiconductor wafers had to pass through several more process steps to get to the point where capacitance could actually be measured. All of this provided challenges around the design of the experiment and the data handling and analysis.
By working closely with both the process engineer and the process technician I was able to gather the image files off the image tool that were taken from the experimental runs. I used SAS® (yes, another shameless plug for my favorite software) to process the images using Fast Fourier Transforms. Subsequently, the transformed data was correlated to the capacitance in the analysis of the experimental results. Finding the “sweet spot” for capacitance, as driven by surface roughness, provided a huge leap for this process technology team.
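To make the idea of a Fourier-based roughness measure concrete, here is a minimal sketch. It is not the SAS® analysis from the project: it uses a naive discrete Fourier transform in place of an FFT, works on a synthetic one-dimensional height profile instead of real surface images, and the `high_frequency_share` feature and its cutoff are hypothetical choices made for illustration.

```python
import cmath
import math

def dft_power(signal):
    """Naive discrete Fourier transform; returns the power at each frequency bin."""
    n = len(signal)
    return [
        abs(sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))) ** 2
        for k in range(n)
    ]

def high_frequency_share(profile, cutoff):
    """Fraction of non-DC spectral power at or above `cutoff` cycles per profile."""
    power = dft_power(profile)
    half = len(profile) // 2
    band = power[1:half + 1]            # positive frequencies only, DC term excluded
    total = sum(band)
    return sum(band[cutoff - 1:]) / total if total else 0.0

n = 64
# Synthetic profiles: a smooth surface (2 cycles) and the same surface with fine ripple (20 cycles).
smooth = [math.sin(2 * math.pi * 2 * t / n) for t in range(n)]
rough = [smooth[t] + 0.5 * math.sin(2 * math.pi * 20 * t / n) for t in range(n)]

smooth_share = high_frequency_share(smooth, cutoff=10)
rough_share = high_frequency_share(rough, cutoff=10)
print(round(smooth_share, 6), round(rough_share, 6))
```

A per-run feature of this kind is the sort of quantity that could then be correlated against the capacitance measured several process steps later.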
The challenges of today are much different than they were in the 90s. In the more recent years, I have been working with transactional data related to financial services or health care claims. The challenges manifest themselves in the sheer volume of the data. In the last decade in particular most industries have been able to put the infrastructures in place to gather and store massive amounts of data related to their businesses. The challenge of turning this data into meaningful actionable information has been as exciting as using Fast Fourier Transforms on image processing to optimize capacitance!
Currently I am working with an Oracle database where one table in the schema has 250 million records and a couple hundred fields. I refer to this as a “Pushing Tera” situation, since this one table is close to a Terabyte in size. As far as storing the data, that is not a big deal, but working with data this large or larger is the challenge.
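A quick back-of-the-envelope calculation shows why storage is the easy part and processing is not. The row count comes from the text; the other figures are assumptions for illustration: "close to a Terabyte" is taken as exactly 1 TB (decimal), "a couple hundred fields" as 200, and the 100 MB/s sequential read rate is a hypothetical number, not a measurement of any real system.

```python
ROWS = 250_000_000                   # from the text
TABLE_BYTES = 10**12                 # assumption: "close to a Terabyte" rounded to 1 TB
FIELDS = 200                         # assumption: "a couple hundred fields" read as 200
SCAN_BYTES_PER_SECOND = 100 * 10**6  # hypothetical sequential read rate (100 MB/s)

bytes_per_row = TABLE_BYTES / ROWS
bytes_per_field = bytes_per_row / FIELDS
full_scan_hours = TABLE_BYTES / SCAN_BYTES_PER_SECOND / 3600

print(bytes_per_row)               # 4000.0 bytes per record
print(bytes_per_field)             # 20.0 bytes per field on average
print(round(full_scan_hours, 1))   # 2.8 hours for one full pass at the assumed rate
```

Under these assumptions a single unindexed pass over the table takes hours, so every exploratory query has to be planned rather than simply tried.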
Different skill sets are needed here beyond those of just an analyst, data miner, or statistician. These VLDB situations have morphed me into a bit of an IT person.
Ajay- How important do you think work life balance is for people in this profession? What do you do to chill out?
Carole- I don’t think the work-life balance is any more or less important to the decision science professionals than it is to any other profession really. I have friends in many other professions like Law, Nursing, Financial Planning, etc. with the same work-life balance struggles.
We live in a busy culture that includes more and more demands placed on us professionally. Let’s face it, most of us are care-takers to someone besides ourselves. It might be a spouse, or a child, or a dog, or even an elderly parent. Therefore, a total focus on work is bound to upset the work-life balance for most of us.
My biggest struggle comes in the form of balancing the two sides of my brain. That may sound weird, but one thing you have to agree with is that all of this is pretty “Left Brained”: mathematics, statistics, business intelligence, computing, etc.
To balance this out, and tap into my Right Brain, I like to dabble in the arts to some extent. Don’t get me wrong, I am not an artist! But that doesn’t mean I can’t draw on creativity in the artistic sense. For example, this past summer I took a course on Adobe Photoshop and Illustrator at Minneapolis College of Art and Design. This provided the best of both worlds, combining software and art! In addition to learning how to remove Cindy Crawford’s mole (yes, we did this), there were some very useful projects. One of my course projects was creating my customized Twitter background. An endeavor like this provides me a ‘chilling out’ factor from the normal work world. I know of many other Left Brain leaners that do similar things, like playing a musical instrument, or painting, etc. This is another reason why I took up digital photography: more visual arts.
Volunteer work has a balancing effect too. I try to give back to the community when I can. Swinging a hammer at Habitat for Humanity, or doing record keeping for an Animal Rescue organization, are things I have participated in.
And if none of this works, I enjoy cooking for my family and friends, and plying them with wine!
Ajay- What are your views on:
A) Data Quality
Carole- I’d have to say I am for data quality! Who isn’t? But the reality is that data is dirty. That “Pushing Tera” Oracle table I mentioned earlier, well it turns out it has some issues. And it is incumbent upon me to determine the quality of that data before attempting to do anything analytical with it. One place in industry where value enhancement is needed: database administrators with business knowledge. It seems that more times than not, even if there was a business savvy DBA they may have moved on, leaving the consumers of that data (that would be me) to fend for themselves. There is some debate over which philosopher said “Know thyself.” Today’s job challenge is to “Know thy data” or perhaps “Value those that know thy data.”
B) Predictive Analytics for Fraud Monitoring
There is a huge market for analytics in fraud detection and prevention. But it is not for the faint of heart. Insiders, at least in Mortgage and Health Care, are the typical perpetrators of lucrative fraud. These insiders know how the industry processes work and they exploit this. As soon as one loophole is discovered and patched, fraudsters are looking for another loophole to exploit. This makes the task of predictive analytics different for Fraud than for other areas where underlying patterns are probably more stable. Any methodology used here must have “turn on a dime” features built in, if possible. With economic conditions as they are, fraud detection/monitoring will remain an important and challenging field.
Biography
Carole Jesse has been applying statistical methods and advanced analytics in a variety of industries for the last 20 years. Her career spans High Volume Manufacturing of Consumer Products, Nuclear Energy, Semiconductor Manufacturing/Packaging, Financial Services, and Health Care. Applications have ranged from Design and Analysis of Experiments to Credit Risk Prediction to Fraud Pattern Recognition. Carole holds a B.S. in Mathematics from the University of Wisconsin and a M.S. in Statistics from Montana State University, as well as several professional certifications. All the opinions expressed here are her own, and not those of her employers: past, present, or future. (Although her dog Angie may have had some influence.) Ms. Jesse currently lives and works in Minneapolis, Minnesota.
After discussions about my performance at grad school and WHAT exactly I DO want to work in, I drew the following curves.
Feel free to draw better circles, and I will include your reference here.
Caution- Based upon a very ordinary understanding of extraordinary technical things.
THE WORLD OF DATA

AND WHAT I WANT TO DO IN IT

ps- What do you think? Add a comment
“Build a better mousetrap, and the world will beat a path to your door.”- Emerson
Here is an interview with Shawn Kung, Senior Director of Product Management at Aster Data. Shawn explains the difference between the various database technologies, attributes Aster’s rising appeal to its unique technological approach, and touches upon various other topics of interest to people in the BI and technology space.

Ajay- Describe your career journey from a high school student of science till today. Do you think science is a more lucrative career?
Shawn: My career journey has spanned over a decade in several Silicon Valley technology companies. In both high school and my college studies at Princeton, I had a fervent interest in math and quantitative economics. Silicon Valley drew me to companies like upstart procurement software maker Ariba and database giant Oracle. I continued my studies by returning to get a Master’s in Management Science at Stanford before going on to lead core storage systems for nearly 5 years at NetApp and subsequently Aster.
Science (whether it is math, physics, economics, or the hard engineering sciences) provides a solid foundation. It teaches you to think and test your assumptions – those are valuable skills that can lead to both a financially lucrative and personally inspiring career.
Ajay- How would you describe the difference between Map Reduce and Hadoop and Oracle and SAS, DBMS and Teradata and Aster Data products to a class of undergraduate engineers ?
Shawn: Let’s start with the database guys – Oracle and Teradata. They focus on structured data – data that has a logical schema and is manipulated via a standards-based structured query language (SQL). Oracle tries to be everything to everyone – it does OLTP (low-latency transactions like credit card or stock trade execution apps) and some data warehousing (typically summary reporting). Oracle’s data warehouse is not known for large-scale data warehousing and is more often used for back-office reporting.
Teradata is focused on data warehousing and scales very well, but is extremely expensive – it runs on high-end custom hardware and takes a mainframe approach to data processing. This approach makes less sense as commodity hardware becomes more compute-rich and better software comes along to support large-scale MPP data warehousing.
SAS is very different – it’s not a relational database. It really offers an application platform for data analysis, specifically data mining. Unlike Oracle and Teradata, which are used by SQL developers and managed by DBAs, SAS is typically run in business units by data analysts – for example a quantitative marketing analyst, a statistician/mathematician, or a savvy engineer with a data mining/math background. SAS is used to try to find patterns, understand behaviors, and offer predictive analytics that enable businesses to identify trends and make smarter decisions than their competitors.
Hadoop offers an open-source framework for large-scale data processing. MapReduce is a component of Hadoop, which also contains multiple other modules including a distributed filesystem (HDFS). MapReduce offers a programming paradigm for distributed computing (a parallel data flow processing framework).
Both Hadoop and MapReduce are geared toward the application developer or programmer. They are not geared toward enterprise data centers or IT. If you have a finite project in a line of business and want to get it done, Hadoop offers a low-cost way to do this. For example, if you want to do large-scale data munging like aggregations, transformations, and manipulations of unstructured data – Hadoop offers a solution for this without compromising on the performance of your main data warehouse. Once the data munging is finished, the post-processed data set can be loaded into a database for interactive analysis or analytics. It is a great combination of big data technologies for certain use-cases.
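To make the MapReduce programming paradigm described above a little more concrete, here is a minimal single-process sketch in Python. It is not Hadoop’s API and nothing here runs in parallel; the function names (`map_phase`, `shuffle`, `reduce_phase`) and the tiny click log are invented for illustration. It only shows the data flow: a mapper emits key/value pairs, the pairs are grouped by key, and a reducer aggregates each group.

```python
from collections import defaultdict


def map_phase(records, mapper):
    """Apply the mapper to every input record and yield its (key, value) pairs."""
    for record in records:
        yield from mapper(record)


def shuffle(pairs):
    """Group values by key, as the framework would do between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups, reducer):
    """Apply the reducer to each key and its list of values."""
    return {key: reducer(key, values) for key, values in groups.items()}


def clicks_mapper(line):
    # Each hypothetical log line looks like "user,page"; emit (page, 1).
    user, page = line.split(",")
    yield page.strip(), 1


def sum_reducer(key, values):
    # Aggregate the per-page counts.
    return sum(values)


log_lines = ["alice,/home", "bob,/home", "alice,/pricing"]
page_counts = reduce_phase(shuffle(map_phase(log_lines, clicks_mapper)), sum_reducer)
print(page_counts)
```

In a real cluster the map and reduce steps would run on many machines over partitions of the data. The appeal of the paradigm is that the programmer writes only the mapper and the reducer.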
Aster takes a very unique approach. Our Aster nCluster software offers the best of all worlds – we offer the potential for deep analytics of SAS, the low-cost scalability and parallel processing of Hadoop/MapReduce, and the structured data advantages (schema, SQL, ACID compliance and transactional integrity, indexes, etc) of a relational database like Teradata and Oracle. Often, we find complementary approaches and therefore view SAS and Hadoop/MapReduce as synergistic to a complete solution. Data warehouses like Teradata and Oracle tend to be more competitive.
Ajay- What exciting products have you launched so far, and what makes them unique from both a technical developer perspective and a business owner perspective?
Shawn: Aster was the first to market with In-Database MapReduce, which provides the standards and familiarity of SQL and databases with the analytic power of MapReduce. This is very unique as it allows technical developers and application programmers to write embedded procedural algorithms once, upload them, and allow business analysts or IT folks (SQL developers, DBAs, etc) to invoke these SQL-MapReduce functions forever.
It is highly polymorphic (re-usable), highly fault-tolerant, highly flexible (any language – Java, Python, Ruby, Perl, R statistical language, C# in the .NET world, etc) and natively massively parallel – all of which differentiate these SQL extensions from traditional dumb user-defined functions (UDFs).
Ajay- “I am happy with my databases and I don’t need too much diversity or experimentation in my systems”, says a CEO to you.
How do you convince him using quantitative numbers and not marketing adjectives?
Shawn: Aster has dozens of production customers including big names like MySpace, LinkedIn, Akamai, Full Tilt Poker, comScore, and several yet-to-be-named retail and financial service accounts. We have quantified proof points that show orders of magnitude improvements in scalability, performance, and analytic insights compared to incumbent or competitor solutions. Our highly referenceable customers would be happy to discuss their positive experiences with the CEO.
But taking a step back, there’s a fundamental concept that this CEO needs to first understand. The world is changing – data growth is proliferating due to the digitization of so many applications and the emergence of unstructured data and new data types. As the book “Competing on Analytics” argues, the world is shifting to a paradigm where companies that don’t take risks and push the limits on analytics will die like the dinosaurs.
IDC is projecting 10x+ growth in data over the next few years to zettabytes of aggregate data driven by digitization (Internet, digital television, RFID, etc). The data is there and in order to compete effectively and understand your customers more intimately, you need a large-scale analytics solution like the one Aster nCluster offers. If you hold off on experimentation and innovation, it will be too late by the time you realize you have a problem at hand.
Ajay- How important is work life balance for you?
Shawn: Very important. I hang out with my wife most weekends – we do a lot of outdoor activities like hiking and gardening. In Silicon Valley, it’s all too easy to get caught up in the rush of things. Taking breaks, especially during the weekend, is important to recharge and re-energize to be as productive as possible.
Ajay- Are you looking for college interns and new hires? What makes Aster exciting for you, so that you are pumped up every day to go to work?
Shawn: We’re always looking for smart, innovative, and entrepreneurial new college grads and interns, especially on the technical side. So if you are a computer science major or recent grad or graduate student, feel free to contact us for opportunities.
What makes Aster exciting is two things.
First, the people. Everyone is very smart and innovative, so you learn a tremendous amount, which is personally gratifying and professionally useful long-term.
Second, Aster is changing the world! Distributed systems computing focused on big data processing and analytics – these are massive game-changers that will fundamentally change the landscape in data warehousing and analytics. Traditional databases have been an oligopoly for over a generation – they haven’t been challenged and so the 1970s-based technology has stuck around. The emergence of big data and low-cost commodity hardware has created a unique opportunity to carve out a brand new market.
What gets me pumped every day is that I have the ability to contribute to a pioneer that is quickly becoming Silicon Valley’s next great success story!
Biography-
Over the past decade, Shawn has led product management for some of Silicon Valley’s most successful and innovative technology companies. Most recently, he spent nearly 5 years at Network Appliance leading Core Systems storage product management, where he oversaw the development of high availability software and Storage Systems hardware products that grew in annual revenue from $200M to nearly $800M. Prior to NetApp, Shawn held senior product management and corporate strategy roles at Oracle Corporation and Ariba Inc.
Shawn holds an M.S. in Management Science and Engineering from Stanford University, where he was awarded the Valentine Fellowship (endowed by Don Valentine of Sequoia Capital). He also received a B.A. with high honors from Princeton University.
About Aster
Aster Data Systems is a proven leader in high-performance database systems for data warehousing and analytics – the first DBMS to tightly integrate SQL with MapReduce – providing deep insights on data analyzed on clusters of low-cost commodity hardware. The Aster nCluster database cost-effectively powers frontline analytic applications for companies such as MySpace, aCerno (an Akamai company), and ShareThis.
Running on low-cost off-the-shelf hardware, and providing ‘hands-free’ administration, Aster enables enterprises to meet their data warehousing needs within their budget. Aster is headquartered in San Carlos, California and is backed by Sequoia Capital, JAFCO Ventures, IVP, Cambrian Ventures, and First-Round Capital, as well as industry visionaries including David Cheriton and Ron Conway.
I was just looking at my web analytics numbers and we seem to have crossed some milestones.
Thank you everyone for your help in this. More importantly the quality of comments has been fabulous. Since I am out of ideas for the rest of the week- here is a best of posts collection.
Here are some of the most popular articles as measured by number of page views. I have personal favourites as well, but these are just the ranks as per page views and how they measure up.
Top 5 Interviews
1) Interviews with SAS Institute leaders- I have generally found great professionalism from SAS Institute people. This is surprising because, coming from an open source background, SAS is often looked upon as a big brother. I find that to be more of a perception and less of a reality as the company continues to innovate.
a) with John Sall, founder SAS Institute- This is really the biggest interview I did in terms of the person involved. To my surprise ( I wasn’t expecting John to say yes) the interview was really frank, and it came very fast. The answers seem to be written by John himself.
Quote- Quantitative fields can be fairly resistant to recession- John Sall.
http://www.decisionstats.com/2009/07/28/interview-john-sall-jmp/
b) Interview with Anne Milley, Director, Product Marketing, SAS Institute- This is a favourite because it came very soon after the NYTimes article on R etc. One of my personal opinions is that the difference between great and good leaders is often the fact that great leaders are humble enough to learn and then build on their strengths. It ran in two parts- and I was really appreciative of the in-depth answers that Anne wrote.
Quotes-
Analytics continues to be our middle name.
Customers vote with the cheque book.