M2009 Interview: Peter Pawlowski, Aster Data

Here is an interview with Peter Pawlowski, who is a Member of Technical Staff (MTS) for Data Mining at Aster Data. I ran into Peter at the Aster Data booth during M2009 and followed up with an email interview. Also included is a presentation he co-authored.


Ajay- Describe your career in science up to today.

Peter- Went to Stanford, where I got a BS & MS in Computer Science. I did some work on automated bug-finding tools while at Stanford.
(Note: that sums up the career of almost 60% of CS scientists.)

Ajay- How is life working at Aster Data? What are the challenges and the great stuff?

Peter- Working at Aster is great fun, due to the sheer breadth and variety of the technical challenges. We have problems to solve in optimization, languages, networking, databases, operating systems, etc. It's been great to think about problems end-to-end and consider the impact of a change on all aspects of the system. I worked on SQL/MR in particular, which had lots of interesting challenges: How do you define the API? How do you integrate with SQL? How do you make it run fast? How do you make it scale?

Ajay- Do you think universities offer adequate preparation for in-demand skills like MapReduce, Hadoop and business intelligence?

Peter- Probably not BI; I learned everything I know about BI while at Aster. In terms of M/R, it'd be useful to have more hands-on experience with distributed systems while at school. We read the MapReduce paper but didn't get a chance to actually play with M/R. I think that sort of exposure would be useful. We recently made our software available to some students taking a data mining class at Stanford, and they came up with some fascinating use cases for our system, especially around the Netflix challenge dataset.
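
Peter's point about reading the MapReduce paper without ever running M/R is easy to make concrete. The sketch below is the classic word-count exercise written as plain, single-machine Python: a map phase emits (word, 1) pairs, a shuffle groups them by key, and a reduce phase sums each group. It only illustrates the programming model under that simplifying assumption; it is not Aster's SQL/MR API or Hadoop, and the function names are hypothetical.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(doc: str) -> Iterator[tuple[str, int]]:
    # Map: emit (word, 1) for every lower-cased, whitespace-separated token.
    for word in doc.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    # Shuffle: group all emitted values by key, as the framework would
    # do between the map and reduce phases.
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    # Reduce: combine all values for one key (here, sum the counts).
    return key, sum(values)


def word_count(docs: list[str]) -> dict[str, int]:
    # Hypothetical driver that chains map -> shuffle -> reduce in one process.
    pairs = (pair for doc in docs for pair in map_phase(doc))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["the cat sat", "the cat ran"]))
```

Calling `word_count(["the cat sat", "the cat ran"])` returns `{'the': 2, 'cat': 2, 'sat': 1, 'ran': 1}`. On a real cluster the map and reduce calls run on different machines and the shuffle moves data over the network, which is where the "how do you make it scale?" questions Peter raises come in.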

Ajay- Describe some of the recent engineering products that you have worked with at Aster.

Peter- SQL/MR is the main aspect of nCluster that I've worked on; its interesting challenges are described in my second answer above.

Ajay- All BI companies claim in their marketing brochures to crunch data the fastest, at the lowest price and with the highest quality. How would you validate your product's performance scientifically and transparently?

Peter- I’ve found that the hardest part of judging performance is to come up with a realistic workload. There are public benchmarks out there, but they may or may not reflect the kinds of workloads that our customers want to run. Our goal is to make our customers’ experience as good as possible, so we focus on speeding up the sorts of workloads they ask about.
And here is a presentation on Slideshare.net covering more of what Peter works on.

SAS and JMP: Visual Data Discovery

While R package developers have a lot to be proud of in R's graphics packages, the truth of the matter is that the lack of a GUI, even for graphical analysis, makes it harder for users to adopt R's powerful graphics for statistical analysis. In contrast, SAS and JMP have been combined in the SAS Visual Data Discovery environment.


I really liked the GUI of JMP (which is very rich in statistical tests), and combined with the powerful desktop data handling capabilities of SAS, this is clearly an outstanding effort to create terrific graphics (see below).

Note the combination of the two: great graphics WITH a GUI. In R, the GUI that comes closest to matching JMP is R Commander, but its graphical capabilities are kept basic, as it is not meant to replace the beloved command prompt.

(Maybe an expanded plugin for graphics, or for hexbin plots, would help.)

It would be interesting to see an on-demand, EC2 cloud-hosted version of SAS Visual Data Discovery (with JMP as the front end), even as a limited six-month pilot targeted at the SMB segment. Or a Salesforce.com application that integrates Salesforce.com data with the tests and standard procedures in SAS and JMP.

Note of discontent: the JMP website is terrible. It uses a different font from the SAS website (they could at least use the same CSS), and overall it is the worst part of the otherwise elegant JMP. I hope they upgrade their website soon (they haven't done so this year, at least).

Screenshot Citation-

http://www.sas.com/technologies/analytics/statistics/datadiscovery/index.html


SAS GOOD LIFE UNDER SIEGE – NYT

There was a time when the word NYT meant the place where we read about news and politics. In 2009, the most happening news on statistical software came from the NYT (KDnuggets and the Journal of Statistical Software are not too happy about that, either).

The latest article calls SAS a software giant under siege, with its Good Life under threat.


This reminded me of an old movie poster I once saw. It's also called Under Siege.

Given that the under-siege SAS earned 2.4 billion dollars last year alone,

and the market capitalisation of the New York Times is 1.25 billion dollars,

why doesn't Dr. Goodnight buy the New York Times itself for 600 million dollars and have enough change left over for... err, a Happy Thanksgiving?

—————————————————————————————————————————————————————————————-

LIES

TRUE LIES

AND STATISTICS

Analytics and BI for small biz

I saw a story on Warren Buffett and Goldman Sachs creating a $500 million pool for small business owners.

  • The program will contribute $200 million to community colleges, universities and other institutions to provide small-business owners with practical business education.

  • Goldman Sachs repaid the $10 billion it was given last year under the taxpayer-funded Troubled Asset Relief Program, plus dividends. The firm continues to benefit from federal guarantees on about $21 billion of long-term debt.

  • Buffett, known as the “Oracle of Omaha” for his investing prowess, is the second-richest American. Berkshire, which invests in companies ranging from retailers to insurers, paid $5 billion in September 2008 to acquire preferred stock in Goldman Sachs that pays a 10 percent dividend. Berkshire, based in Omaha, Nebraska, also gained five-year warrants to buy $5 billion of common stock at $115 per share.

  • (NOTE: the current price of GS shares is $172. That's roughly a 50% profit on the $5 billion of warrants, about $2.5 billion for Mr. Buffett, but he is probably waiting for long-term capital gains tax rates to kick in before cashing in his patriotic “Buy American. I am” warrants. See his NYT op-ed at http://www.nytimes.com/2008/10/17/opinion/17buffett.html)
  • A better analysis of the above Bloomberg story was given on Bloomberg itself at http://www.bloomberg.com/apps/news?pid=20601039&sid=asjp51YPDwJU
  • A small thought: could smaller businesses gain from the efficiencies of programs like SPSS, SAS and R? Or would they be better off with customized GUIs linked to their point-of-sale (POS) data?
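
The back-of-the-envelope warrant arithmetic in the note on Mr. Buffett can be checked with a few lines of Python. The inputs ($5 billion of common stock at a $115 strike, a $172 share price) come from the post; the function name is hypothetical, and the sketch deliberately ignores taxes, dilution and the time value of the warrants.

```python
def warrant_gain(notional: float, strike: float, price: float) -> tuple[float, float]:
    """Return (gain_in_dollars, gain_as_fraction_of_notional) for warrants
    covering `notional` dollars of stock at `strike`, valued at `price`.

    Simplifying assumption: only intrinsic value is counted (no taxes,
    dilution or time value)."""
    shares = notional / strike            # number of shares the warrants cover
    gain = shares * (price - strike)      # intrinsic value if exercised today
    return gain, gain / notional


if __name__ == "__main__":
    gain, frac = warrant_gain(5e9, 115.0, 172.0)
    print(f"gain: ${gain / 1e9:.2f} billion ({frac:.1%})")
```

This prints a gain of $2.48 billion (49.6%), in line with the "roughly 50%, about $2.5 billion" estimate in the note.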

Anyway, analytics for small businesses in inventory management and sales planning could help. Joe the Plumber could do with some ETS (exponential smoothing) and regression models as well.
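
As a rough illustration of the kind of ETS model a small business could run on its own sales history, here is the simplest member of the exponential smoothing family (simple exponential smoothing, with no trend or seasonal terms) in plain Python. The smoothing constant alpha = 0.3 and the weekly sales figures are made-up assumptions for the example; a real inventory plan would fit alpha to past data and would likely need trend and seasonal components.

```python
def simple_exp_smoothing(series: list[float], alpha: float) -> float:
    """Return the one-step-ahead forecast from simple exponential smoothing.

    level_t = alpha * y_t + (1 - alpha) * level_{t-1}, initialised at the
    first observation; the forecast for the next period is the final level."""
    if not series:
        raise ValueError("series must be non-empty")
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    level = float(series[0])
    for y in series[1:]:
        level = alpha * y + (1.0 - alpha) * level
    return level


if __name__ == "__main__":
    weekly_units = [120, 135, 128, 140, 150, 145]  # hypothetical weekly sales
    forecast = simple_exp_smoothing(weekly_units, alpha=0.3)
    print(f"next week's forecast: {forecast:.1f} units")
```

With these hypothetical numbers the forecast is about 138.6 units. A larger alpha reacts faster to recent weeks, while a smaller one smooths out week-to-week noise.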

However, apart from Salesforce.com applications, this field seems to be totally vacant for analytics. What are IBM SPSS, SAS, or other stats packages doing for small businesses? Are they even developing Salesforce.com applications for their own software?

The market could be an interesting one to at least run a test in, unless you don't believe in test and control.

See below the IBM Cognos app by IBM itself and the third-party app by Pervasive for SAP integration-

Citation-

http://sites.force.com/appexchange/listingDetail?listingId=a0N300000016YGYEA2

and

http://sites.force.com/appexchange/listingDetail?listingId=a0N300000016am1EAA