M2009 Interview Peter Pawlowski AsterData

Here is an interview with Peter Pawlowski, MTS for Data Mining at Aster Data. I ran into Peter at the Aster Data booth during M2009 and followed up with an email interview. Also included is a presentation of which he was a co-author.


Ajay- Describe your career in science leading up to today.

Peter- I went to Stanford, where I got a BS and MS in Computer Science. I did some work on automated bug-finding tools while there.
( Note- that sums up the career of almost 60% of CS scientists)

Ajay- How is life working at Aster Data- what are the challenges and the great stuff?

Peter- Working at Aster is great fun, due to the sheer breadth and variety of the technical challenges. We have problems to solve in optimization, languages, networking, databases, operating systems, etc. It's been great to think about problems end-to-end and consider the impact of a change on all aspects of the system. I worked on SQL/MR in particular, which had lots of interesting challenges: how do you define the API? How do you integrate with SQL? How do you make it run fast? How do you make it scale?
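Peter's description of SQL/MR boils down to a user-defined function invoked from SQL and applied independently to partitions of the data. Here is a rough Python sketch of that execution model -- the function names and sample data are my own illustration, not Aster's actual API:

```python
from collections import defaultdict

def sqlmr_apply(rows, partition_key, fn):
    """Group rows by a partition key, then apply a user-defined
    function to each partition -- loosely the execution model
    behind a SQL/MR-style 'PARTITION BY' invocation."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[partition_key]].append(row)
    out = []
    for part in groups.values():
        out.extend(fn(part))          # fn yields zero or more output rows
    return out

# Hypothetical user-defined "reduce" function: per-user click counts.
def count_rows(partition):
    yield {"user": partition[0]["user"], "clicks": len(partition)}

clicks = [
    {"user": "a", "url": "/x"},
    {"user": "a", "url": "/y"},
    {"user": "b", "url": "/x"},
]
result = sqlmr_apply(clicks, "user", count_rows)
```

Because each partition is processed independently, the same model parallelizes naturally across the nodes of a cluster, which is what makes the "how do you make it scale?" question tractable.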

Ajay- Do you think universities offer adequate preparation for in-demand skills like MapReduce, Hadoop and Business Intelligence?

Peter- Probably not BI- I learned everything I know about BI while at Aster. In terms of M/R, it'd be useful to have more hands-on experience with distributed systems while at school. We read the MapReduce paper but didn't get a chance to actually play with M/R. I think that sort of exposure would be useful. We recently made our software available to some students taking a data mining class at Stanford, and they came up with some fascinating use cases for our system, esp. around the Netflix challenge dataset.

Ajay- Describe some of the recent engineering products that you have worked with at Aster

Peter- SQL/MR is the main aspect of nCluster that I've worked with- the interesting challenges are described above.

Ajay- All BI companies claim to crunch data the fastest, at the lowest price, at the highest quality, as per their marketing brochures- how would you validate your product's performance scientifically and transparently?

Peter- I’ve found that the hardest part of judging performance is to come up with a realistic workload. There are public benchmarks out there, but they may or may not reflect the kinds of workloads that our customers want to run. Our goal is to make our customers’ experience as good as possible, so we focus on speeding up the sorts of workloads they ask about.
And here is a presentation at Slideshare.net on more of what Peter works on.

Decisionstats Interview at Big Data Summit, AsterData

 

For a change, I got interviewed at Big Data Summit, sponsored by Aster Data. With special thanks to Tasso, Steve Wooledge and his team, and the lovely Michelle from http://www.zagcommunications.com

Note this is just the raw, unedited footage, so it can be pretty repetitive; the final edited interview will take some time. Anyway, I managed to not make a complete mess of it. Have a look.

Aster Data : Big Data Bigger Analytics Campaign

My favorite ( as of now) company in Big Data is Aster Data* ( I am partial to companies founded by Stanford alumni, having interacted with a lot of them while working with Trilogy- another company founded by Stanford dropout alumni. There are also not too many Silicon Valley startups by us famously non-intellectual Punjus.

Q- What is the culture in Punjab? A- In Punjab the only culture is agriculture)

Aster Data has correctly hit the marketing hammer on the nail of bigger data, and with the quantities of data expanding rapidly, this is a lucrative market to get into ( as pointed out by our favorite analytics journal, the NY Times).

Aster Data's products, nCluster and nPath with SQL/MapReduce, and the recent interactions with SAS Institute hold them in a nice, promising place, but with miles to go before they even rest ( or start thinking of that IPO).

Aster Data was present at Data Mining 2009 with a terrific response to their booth.

As a techie wannabe stats frat boy, I like the Aster nPath product more (Time Series), but the in-database analytics claim with nCluster needs to be investigated and even tested further. Especially if you need three days to get your monthly summary report.

( *and also an advertiser, sponsor to Big Data Summit as per FCC regulations)

The Data Services and Applications, with flexibility for cloud computing, are what make this especially appealing from a product perspective, while their relatively small size ( as compared to other bigger Vend-ORs) gives alliance partners more leverage in collaborating on Research and Design and maybe even co-bundling applications.

Screenshots- Courtesy- the lovely www.asterdata.com website ( webmasters of other websites, especially IBM's and Oracle's, should take note of how a website can have lots of content and yet be readable).

Also, I will be posting the remaining Data Mining 2009 interviews shortly (including Part 2 with Anne) and will share some/all of the presentations via SlideShare embedding in WordPress.com ( post permission).

As for the Aster Data interviews- I owe Peter Pawlowski and the readers one. Coming up soon.


SAS Data Mining 2009 Las Vegas

I am going to Las Vegas as a guest of SAS Institute for the Data Mining 2009 conference. ( Note- FCC regulations on bloggers come into effect in December, but my current policies are on the ADVERTISE page, unchanged for some months now.)

SAS Institute, the big heavyweight of analytics, showcases events in both the SAS Global Forum and the Data Mining 2009 conference, which has a virtual who's-who of partners there. This includes my friends at Aster Data and Shawn Rogers of the Beye Network, in addition to Anne Milley, Senior Product Director. Anne is a frequent speaker for SAS Institute and has shrugged off the beginning-of-the-year NY Times spat with R/open source. True to their word, they did go ahead and launch SAS/IML with the interface to R- mindful of GPL as well as open source sentiments.

While SPSS does have a data mining product, there is considerable discussion on that help list today on what direction IBM will allow the data mining product to evolve.

Charlie Berger, from Oracle Data Mining, also announced at Oracle World that he is going to launch a GUI-based data mining product for free ( or probably in a Software as a Service model)- thanks to Karl Rexer from Rexer Analytics for this tip.

While this is my first trip to Las Vegas ( a change from cold TN weather), I hope to hear new stuff on data mining, including sessions on blog and text mining and statistical usage of the same. Data mining continues to be an enduring passion for me, even though I may need a Divine Miracle for my PhD to get funded on that topic.

Also, I may have some tweets at #M2009 for you, and some video interviews/photos. Ok- watch this space.

PS- We lost to Alabama, #2 in the country, by two points because two punts were blocked by hand- it was as close as it gets.

Next week I hope to watch the South Carolina match in Orange County.


Software HIStory: Bass Institute Part 1

or How SAS Institute needs to take more seriously the competition from WPS ( a SAS language compiler, in an alliance with IBM) and from R ( open source predictive analytics with tremendous academic support), and the financial pressure from Microsoft and SAP.

Over the weekend, I ran into Jeff Bass, owner of BASS Institute. BASS Institute provided a SAS-like compiler in the 1980s, was very light compared to the then-clunky SAS ( which used multiple floppies), and sold many copies. It ran out of money when the shift to PCs happened and SAS Institute managed to get there first.

Today the shift is happening to cloud computing, and though SAS has invested $70 million in it, it still continues to SUPPORT Microsoft by NOT supporting, or even offering financial incentives for customers to use, Ubuntu Linux server and Ubuntu Linux desktop. For academic students it charges $25 per Windows license, thus helping sell many more copies of Windows Vista. Why does it not give the Ubuntu Linux version free to students? Why does SAS Institute continue to give the online doc free to people who use its language, and undercut it? More importantly, why does SAS charge LESS money for excellent software in the BI space? It has one of the best and cheapest BI offerings and the most expensive desktop software. Why does SAS Institute not support Hadoop and Map/Reduce database systems instead of focusing on Oracle and Teradata relationships and feelings?

Anyways, back to Jeff Bass- This is part 1 of the interview.

Ajay- Jeff, tell us all about the BASS Institute?

Jeff-

The BASS system has been off the market for about 20 years and is an example of old, command line, DOS based software that has been far surpassed by modern products – including SAS for the PC platform.  It was fun providing a “SAS like” language for people on PCs – running MS DOS – but I scrapped the product when PC SAS became a reasonably useable product and PC’s got enough memory and hard disk space.
 
BASS was a SAS “work alike”…it would run many (but certainly not all) SAS programs with few modifications.  It required a DOS PC with 640K of RAM and a hard disk with 1MB of available space.  We used to demo it on a Toshiba laptop with NO hard disk and only a floppy drive.  It was a true compiler that parsed the data / proc step input code and generated 8086 assembly language that went through mild optimization, and then executed.
 
I no longer have the source code…it was saved to an ancient Irwin RS-232 tape drive onto tapes that no longer exist…it is fun how technology has moved on in 20 years!  The BASS system was written in Microsoft Pascal and the code for the compiler was similar to the code that would be generated by the Unix YACC “compiler compiler” when fed the syntax of the SAS data step language.  BASS included the “DATA Step” and the most basic PROCS, like MEANS, FREQ, REG, TTEST, PRINT, SORT and others.  Parts of the system were written in 8086 assembler (I have to smile when I remember that).  If I was to recreate it today, I would probably use YACC and have it produce R source code…but that is an idea I am never likely to spend any time on.
 
We sold quite a few copies of the software and BASS Institute, Incorporated was a going concern until PC SAS became debugged and reliable.  Then there was no point in continuing it.  But I think it would be fun for someone to write a modern open source version of a SAS compiler (the data step and basic procs were developed in the public domain at NC State University before Sall and Goodnight took the company private, so as long as no copyrighted code was used in any way, an open source compiler would probably be legal).
 
I still use SAS (my company has an enterprise license), but only very rarely.  I use R more often and am a big fan of free software (sometimes called open source software, but I like the free software foundation’s distinction at fsf.org).  I appreciated your recommendation of the book “R for SAS and SPSS Users” on your website.  I bought it for my Kindle immediately upon reading about it on your website. I no longer work in the software world; I’m a reimbursement and health policy director for the biotech firm Amgen, where I have worked since 1990 or so…  I also serve on the boards of a couple of non-profit organizations in the health care field.
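Jeff's "SAS work-alike" idea- parse DATA-step-style code and emit another language- can be hinted at with a toy translator. Here is a purely illustrative Python sketch (not BASS code, and real SAS syntax is vastly richer than the single assignment statement this handles):

```python
import re

def datastep_to_r(sas_line):
    """Translate one very simple SAS-style assignment statement,
    e.g. 'bmi = weight / (height ** 2);', into an R expression.
    A toy sketch of the SAS-to-R translation Jeff mentions."""
    stmt = sas_line.strip().rstrip(";")
    m = re.match(r"(\w+)\s*=\s*(.+)", stmt)
    if not m:
        raise ValueError("unsupported statement: " + sas_line)
    target, expr = m.groups()
    expr = expr.replace("**", "^")   # SAS exponent operator -> R's
    return f"{target} <- {expr}"

print(datastep_to_r("bmi = weight / (height ** 2);"))
# -> bmi <- weight / (height ^ 2)
```

A real compiler like BASS would of course build a full parse tree ( the kind a YACC grammar produces) and run each statement over every row of the data set, but the statement-by-statement translation idea is the same.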


Ajay- Any comments on WPS?

Jeff- I’m glad WPS is out there.  I think alternatives help keep the SAS folks aware that they have to care about competition, at least a little 😉

( Note from Ajay-

You can see more on WPS at http://www.teamwpc.co.uk/home

wps

and on SAS at http://www.sas.com/ )


Interview John Sall Founder JMP/SAS Institute

Here is an interview with John Sall, co-creator of SAS, inventor of JMP, and co-founder and co-owner of SAS Institute, the largest independent business intelligence and analytics software firm. In a free-wheeling and exclusive interview, John talks of the long journey within SAS and his experiences in helping make JMP the data visualization software of choice.
JMP is perfect for anyone who wants to do exploratory data analysis and modeling in a visual and interactive way – John Sall


Ajay- Describe your early science career. How would you encourage today’s generation to take up science and math careers?

John- I was a history major in college, but I graduated into a weak job market. So I went to graduate school and discovered statistics and computer science to be very captivating. Of course, I grew up in the moon-race science generation and was always a science enthusiast.

Ajay- Archimedes leapt out of the bath shouting “Eureka” when he discovered his principle. Could you describe a “Eureka” moment while creating the SAS language when you and Jim Goodnight were working on it?

John- I think that the moments of discovery were more like “Oh, we were idiots” as we kept having to rewrite much of the product to handle emerging environments, like CMS, minicomputers, bitmap workstations, personal computers, Windows, client-server, and now the cloud. Several of the rewrites were even changing the language we implemented it in. But making the commitment to evolve led to an amazing sequence of growth that is still going on after 35 years.

Ajay- Describe the origins of JMP. What specific market segments does the latest release of JMP target?

John- JMP emerged from a recognition of two things: size and GUI. SAS’ enterprise footprint was too big a commitment for some potential users, and we needed a product to really take advantage of graphical interactivity. It was a little later that JMP started being dedicated more to the needs of engineering and science users, who are most of our current customers.

Ajay- What other non-SAS Institute software do you admire or have you worked with? Which areas is JMP best suited for? For which areas would you recommend software other than JMP to customers?

John- My favorite software was the Metrowerks CodeWarrior development environment. Sadly, it was abandoned among various Macintosh transitions, and now we are stuck with the open-source GCC and Xcode. It’s free, but it’s not as good.

JMP is perfect for anyone who wants to do exploratory data analysis and modeling in a visual and interactive way. This is something organizations of all kinds want to do. For analytics beyond what JMP can do, I recommend SAS, which has unparalleled breadth, depth and power in its analytic methods.

Ajay- I have yet to hear of a big academic push for JMP distribution in Asia. Are there any plans to distribute JMP for free or at very discounted prices in academic institutions in countries like India, China or even the rest of the USA?

John- We are increasing our investment in supporting academic institutions, but it has not been an area of strength for us. Professors seem to want the package they learned long ago, the language that is free or the spreadsheet program their business students already have. JMP’s customers do tell us that they wish the universities would train their prospective future employees in JMP, but the universities haven’t been hearing them. Fortunately, JMP is easy enough to pick up after you enter the work world. JMP does substantially discount prices for academic users.

Ajay- What are your views on tech offshoring, given the recession in the United States?

John- As you know, our products are mostly made in the USA, but we do have growing R&D operations in Pune and Beijing that have been performing very well. Even when the software is authored in the US, considerable work happens in each country to localize, customize and support our local users, and this will only increase as we become more service-oriented. In this recession, JMP has still been growing steadily.

Ajay-  What advice would you give to young graduates in this recession? How does learning JMP enhance their prospect of getting a job?

John- Quantitative fields have been fairly resistant to the recession. North Carolina State University, near the SAS campus, even has a Master of Science in Analytics < http://analytics.ncsu.edu/ > to get people job-ready. JMP experience certainly helps get jobs at our major customers.

Ajay- What does John Sall do in his free time, when not creating world-class companies or groovy statistical discovery software?

John- I lead the JMP division, which has been a fairly small part of a large software company (SAS), but JMP is becoming bigger than the whole company was when JMP was started. In my spare time, I go to meetings and travel with the Nature Conservancy <http://www.nature.org/ >, North Carolina State University <http://ncsu.edu/ >, WWF <http://wwf.org/ >, CARE <http://www.care.org/ > and several other nonprofit organizations that my wife or I work with.

Official Biography

John Sall is a co-founder and Executive Vice President of SAS, the world’s largest privately held software company. He also leads the JMP business division, which creates interactive and highly visual data analysis software for the desktop.

Sall joined Jim Goodnight and two others in 1976 to establish SAS. He designed, developed and documented many of the earliest analytical procedures for Base SAS® software and was the initial author of SAS/ETS® software and SAS/IML®. He also led the R&D effort that produced SAS/OR®, SAS/QC® and Version 6 of Base SAS.

Sall was elected a Fellow of the American Statistical Association in 1998 and has held several positions in the association’s Statistical Computing section. He serves on the board of The Nature Conservancy, reflecting his strong interest in international conservation and environmental issues. He also is a member of the North Carolina State University (NCSU) Board of Trustees. In 1997, Sall and his wife, Ginger, contributed to the founding of Cary Academy, an independent college preparatory day school for students in grades 6 through 12.

Sall received a bachelor’s degree in history from Beloit College in Beloit, WI, and a master’s degree in economics from Northern Illinois University in DeKalb, IL. He studied graduate-level statistics at NCSU, which awarded him an honorary doctorate in 2003.

About JMP-

Originally nicknamed John’s Macintosh Program, JMP is a leading software program in data visualization for statistical software. Researchers and engineers – whose jobs didn’t revolve solely around statistical analysis – needed an easy-to-use and affordable stats program. A new software product, today known as JMP®, was launched in 1989 to dynamically link statistical analysis with the graphical capabilities of Macintosh computers. Now running on all platforms, JMP continues to play an important role in modeling processes across industries as a desktop data visualization tool. It also provides a visual interface to SAS in an expanding line of solutions that includes SAS Visual BI and SAS Visual Data Discovery. Sall remains the lead architect for JMP.

Citation- http://www.sas.com/presscenter/bios/jsall.html

Ajay- I am thankful to John and his marketing communication specialist Arati for this interview. With an increasing focus on data to drive more rational decision making, SAS remains an interesting company to watch in the era of mega-vendors, and any SAS Institute deal or alliance will make potential investment bankers as well as newer customers drool. For previous interviews and coverage of SAS please use www.decisionstats.com/tag/sas