SAS, R and NYT – The Sequel

Here is a follow-up to the SAS vs. R articles by Ashlee V of the NYT.

 

The SAS Institute has borrowed a page from Sesame Street. It is now sponsoring the letter ‘R.’

Last month, I wrote an article about the rising popularity of the R programming language. The open-source software has turned into a favorite piece of technology for statisticians and other people looking to pull insights out of data.

On several levels, R represents a threat to SAS, which is the largest seller of commercial statistics software. Students at universities now learn R alongside SAS. In addition, the open-source nature of R allows the software to be tweaked at a pace that is hard for a commercial software maker to match.

All told, surging interest in the free R language could affect sales of SAS software, which can sell for thousands of dollars. Rather than running from the threat, SAS appears ready to try to understand R by adopting a more active role in its development.

You can read more at http://bits.blogs.nytimes.com/2009/02/16/sas-warms-to-open-source-one-letter-at-a-time/ or even by clicking on the Bits RSS feed in the sidebar on www.decisionstats.com

Ajay –

Note that SAS is only opening up the SAS/IML product to integrate R’s matrix language capabilities. Base SAS still does not appear to be integrated with R, and neither does the statistics module SAS/Stat (SAS Institute sells add-on modules based on functionality and prices them accordingly).

Many third-party sources like http://www.minequest.com have created interfaces from Base SAS to R; they are priced at around $50 apiece.

An additional threat to SAS’s dominance comes from the WPS software from a UK-based company, World Programming http://www.teamwpc.co.uk/home (which has an alliance with IBM). WPS can read and write the SAS language and read and write SAS datasets as well, and is priced at $660, almost one tenth of the price of SAS Institute’s licenses.

The recession is also forcing many large license holders of statistical software (like banks and financial services firms) to seek discounts and alternatives. Even so, SAS Institute remains the industry leader in analytics software after almost 35 years of dominance.

However, this is a nice first step, and it will be interesting to see follow-up steps from SAS Institute’s rivals.

We can all go on our respective open source and closed source jets now.

The jets line refers to earlier comments from Anne H. Milley, director for technology product marketing at SAS, who relegated R to a limited role.

In the article, Ms. Milley said, “I think it addresses a niche market for high-end data analysts that want free, readily available code. We have customers who build engines for aircraft. I am happy they are not using freeware when I get on a jet.”

SAS adds support to R

From the official website itself http://support.sas.com/rnd/app/studio/Rinterface2.html

R Interface Coming to SAS/IML® Studio

While readers of the New York Times may have learned about R in recent weeks, it’s not news to many at SAS.

“R is a leading language for developing new statistical methods,” said Bob Rodriguez, Senior Director of Statistical Development at SAS. “Our new PhD developers learned R in their graduate programs and are quite versed in it.”

R is a matrix-based programming language that allows you to program statistical methods reasonably quickly. It’s open source software, and many add-on packages for R have emerged, providing statisticians with convenient access to new research. Many new statistical methods are first programmed in R.

While SAS is committed to providing the new statistical methodologies that the marketplace demands and will deliver new work more quickly with a recent decoupling of the analytical product releases from Base SAS, a commercial software vendor can only put out new work so fast. And never as fast as a professor and a grad student writing an academic implementation of brand-new methodology.

Both R and SAS are here to stay, and finding ways to make them work better with each other is in the best interests of our customers.

“We know a lot of our users have both R and SAS in their tool kit, and we decided to make it easier for them to access R by making it available in the SAS environment,” said Rodriguez. “Our first interface to R will be in an upcoming version of SAS/IML Studio (currently known as SAS Stat Studio), scheduled for this summer.”

The SAS/IML Studio interface allows you to integrate R functionality with IML or SAS programs. You can also exchange data between SAS and R as data sets or matrices.

“This is just the first step,” said Radhika Kulkarni, Vice President of Advanced Analytics. “We are busy working on an R interface that can be surfaced in the SAS server or via other SAS clients. For example, users will be able to interface with R through the IML procedure, possibly as soon as the first part of 2010.”

SAS/IML Studio is distributed with SAS/IML software. Stay tuned for details on availability.

 

Note: SAS/IML, Base SAS and SAS/Stat are copyrighted products of SAS Institute.

This is a welcome step from the industry leader SAS Institute and also puts an effective stop to rumors of it being too arrogant or too conservative to change.

Perhaps no other software maker has dominated the niche in which it operates for as long as SAS has (since before I was born!) without getting into any kind of hassles. The decision to stay private also looks incredibly wise given the carnage on stock markets today (though it takes a lot of willpower from the founders to say no to the easy billions that investment bankers would have lined up for an IPO).

This decision should also help the R project greatly, as SAS support definitely means the matrix part of the R language is here to stay. However, R is not just a matrix-based programming language; it has capabilities for data mining and other statistical analysis as well. Would SAS extend SAS/Stat capabilities to R? And what does the recent decoupling of the analytical product releases from Base SAS mean (is this due to the WPS challenge)?

Either way, the consumer is the winner. Kudos, SAS Institute!

As mentioned before, Zementis is at the forefront of using cloud computing (Amazon EC2) for open source analytics. Recently I came in contact with Michael Zeller over a business problem, and Mike, being the gentleman he is, not only helped me out but also agreed to an extensive and exclusive interview!


Ajay- What are the traditional rivals to the scoring solutions you offer? How does ADAPA compare to each of them? Case study: assume I have 50,000 leads daily on a car-buying website. How would ADAPA help me in scoring the model (created, say, by KXEN, R, SAS or SPSS)? What would my approximate cost advantages be if I intend to mail, say, the top 5 deciles every day?

Michael- Some of the traditional scoring solutions used today are based on SAS, in-database scoring like Oracle, MS SQL Server, or very often even custom code.  ADAPA is able to import the models from all tools that support the PMML standard, so any of the above tools, open source or commercial, could serve as an excellent development environment.

The key differentiators for ADAPA are simple and focus on cost-effective deployment:

1) Open Standards – PMML & SOA:

Freedom to select best-of-breed development tools without being locked into a specific vendor;  integrate easily with other systems.

2) SaaS-based Cloud Computing:

Delivers a quantum leap in cost-effectiveness without compromising on scalability.

In your example, I assume that you’d be able to score your 50,000 leads in one hour using one ADAPA engine on Amazon.  Therefore, you could choose to either spend US$100,000 or more on hardware, software, maintenance, IT services, etc., write a project proposal, get it approved by management, and be ready to score your model in 6-12 months…

OR, you could use ADAPA at something around US$1-$2 per day for the scenario above and get started today!  To get my point across here, I am of course simplifying the scenario a little bit, but in essence these are your choices.

Sounds too good to be true?  We often get this response, so please feel free to contact us today [http://www.zementis.com/contact.htm] and we will be happy to show you how easy it can be to deploy predictive models with ADAPA!
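To put rough numbers on Michael's comparison, here is a back-of-the-envelope sketch in Python. The dollar figures and lead volume are the ones quoted in the interview; the function names are made up for illustration, and the sketch ignores maintenance, staff time and data transfer, so treat it as a simplification rather than a pricing model.

```python
# Figures quoted in the interview; everything else is an illustrative assumption.
UPFRONT_COST_USD = 100_000          # "US$100,000 or more" for an in-house setup
CLOUD_COST_PER_DAY_USD = (1, 2)     # "around US$1-$2 per day" for this scenario
LEADS_PER_DAY = 50_000              # leads arriving on the car-buying website


def cloud_cost(days, cost_per_day):
    """Total pay-as-you-go cost after a number of days."""
    return days * cost_per_day


def break_even_days(upfront, cost_per_day):
    """Days of cloud scoring that the upfront budget alone would pay for."""
    return upfront / cost_per_day


def leads_to_mail(leads_per_day, top_deciles):
    """Number of leads in the top N deciles of a scored daily batch."""
    return leads_per_day * top_deciles // 10


def cost_per_thousand_leads(cost_per_day, leads_per_day):
    """Scoring cost per 1,000 leads at a given daily cost."""
    return cost_per_day / leads_per_day * 1000


if __name__ == "__main__":
    low, high = CLOUD_COST_PER_DAY_USD
    print("One year of daily scoring (USD):", cloud_cost(365, low), "to", cloud_cost(365, high))
    print("Days covered by the upfront budget:", break_even_days(UPFRONT_COST_USD, high))
    print("Leads mailed per day (top 5 deciles):", leads_to_mail(LEADS_PER_DAY, 5))
    print("USD per 1,000 leads scored:", cost_per_thousand_leads(high, LEADS_PER_DAY))
```

Even at the higher daily figure, the upfront budget alone would cover decades of daily scoring, which is the point Michael is making.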

 

Ajay- The ADAPA solution seems to save money on both hardware and software costs. Please comment. Also, have you done any benchmarking tests of a traditional scoring configuration versus ADAPA?

Michael-Absolutely, the ADAPA Predictive Analytics Edition [http://www.zementis.com/predictive_analytics_edition.htm] on Amazon’s cloud computing infrastructure (Amazon EC2) eliminates the upfront investment in hardware and software.  It is a true Software as a Service (SaaS) offering on Amazon EC2 [http://www.zementis.com/howtobuy.htm] whereby users only pay for the actual machine time starting at less than US$1 per machine hour.  The ADAPA SaaS model is extremely dynamic, e.g., a user is able to select an instance type most appropriate for the job at hand (small, large, x-large) or launch one or even 100 instances within minutes.

In addition to the above savings in hardware/software, ADAPA also cuts the time-to-market for new models (priceless!) which adds to business agility, something truly critical for the current economic climate.

Regarding a benchmark comparison, it really depends on what is most important to the business.  Business agility, time-to-market, open standards for integration, or pure scoring performance?  ADAPA addresses all of the above.  At its core, it is a highly scalable scoring engine which is able to process thousands of transactions per second.  To tackle even the largest problems, it is easy to scale ADAPA via more CPUs, clustering, or parallel execution on multiple independent instances. 

Need to score lots of data once a month, which would take 100 hours on one computer?  Simply launch 10 instances and complete the job in 10 hours overnight.  No extra software licenses, no extra hardware to buy: that’s capacity truly on demand, whenever needed, and cost-effective.
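The arithmetic behind that example can be sketched in a few lines of Python. The assumptions are mine, not Zementis's: the job splits perfectly across instances, each instance is billed per started hour, and the rate is a flat US$1 per machine hour (the interview only says pricing starts at less than that).

```python
import math


def wall_clock_hours(total_machine_hours, instances):
    """Elapsed time if the job splits evenly across instances (assumed)."""
    return total_machine_hours / instances


def job_cost(total_machine_hours, instances, rate_per_hour):
    """Total bill, assuming every instance is charged per started hour."""
    hours_per_instance = math.ceil(total_machine_hours / instances)
    return instances * hours_per_instance * rate_per_hour


if __name__ == "__main__":
    for n in (1, 10):
        print(n, "instance(s):",
              wall_clock_hours(100, n), "hours,",
              job_cost(100, n, 1.0), "USD")
```

Under these assumptions the bill is the same for 1 or 10 instances, because you pay for the same 100 machine hours either way. What changes is how long you wait.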

Ajay- What has been your vision for Zementis? What exciting products are we going to see from it next?

Michael – Our vision at Zementis [http://www.zementis.com] has been to make it easier for users to leverage analytics.  The primary focus of our products is on the deployment side, i.e., how to integrate predictive models into the business process and leverage them in real-time.  The complexity of deployment and the cost associated with it have been the main hurdle for a more widespread adoption of predictive analytics. 

Adhering to open standards like the Predictive Model Markup Language (PMML) [http://www.dmg.org/] and SOA-based integration, our ADAPA engine [http://www.zementis.com/products.htm] paves the way for new use cases of predictive analytics — wherever a painless, fast production deployment of models is critical or where the cost of real-time scoring has been prohibitive to date.

We will continue to contribute to the R/PMML export package [http://www.zementis.com/pmml_exporters.htm] and extend our free PMML converter [http://www.zementis.com/pmml_converters.htm] to support the adoption of the standard.  We believe that the analytics industry will benefit from open standards and we are just beginning to grasp what data-driven decision technology can do for us.  Without giving away much of our roadmap, please stay tuned for more exciting products that will make it easier for businesses to leverage the power of predictive analytics!
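For readers who have not seen PMML before, here is a minimal sketch of what "export a model as PMML, then score it elsewhere" means. This is not ADAPA and not the R export package: the model, field names and coefficients are hypothetical, the PMML 4.4 namespace is my choice for the example, and the scorer handles only a plain linear regression table, using nothing but the Python standard library.

```python
import xml.etree.ElementTree as ET

# Hypothetical, hand-written PMML document for a tiny linear model.
PMML_DOC = """<PMML version="4.4" xmlns="http://www.dmg.org/PMML-4_4">
  <Header description="Hypothetical lead-scoring model"/>
  <DataDictionary numberOfFields="3">
    <DataField name="visits" optype="continuous" dataType="double"/>
    <DataField name="income_k" optype="continuous" dataType="double"/>
    <DataField name="score" optype="continuous" dataType="double"/>
  </DataDictionary>
  <RegressionModel modelName="lead_score" functionName="regression">
    <MiningSchema>
      <MiningField name="visits"/>
      <MiningField name="income_k"/>
      <MiningField name="score" usageType="target"/>
    </MiningSchema>
    <RegressionTable intercept="0.5">
      <NumericPredictor name="visits" coefficient="0.25"/>
      <NumericPredictor name="income_k" coefficient="0.5"/>
    </RegressionTable>
  </RegressionModel>
</PMML>"""

NS = {"pmml": "http://www.dmg.org/PMML-4_4"}


def load_linear_model(pmml_text):
    """Read the intercept and numeric coefficients from a RegressionTable."""
    root = ET.fromstring(pmml_text)
    table = root.find(".//pmml:RegressionTable", NS)
    intercept = float(table.get("intercept"))
    coefficients = {
        p.get("name"): float(p.get("coefficient"))
        for p in table.findall("pmml:NumericPredictor", NS)
    }
    return intercept, coefficients


def score(record, intercept, coefficients):
    """Apply the linear model to one record (a dict of field name to value)."""
    return intercept + sum(c * record[name] for name, c in coefficients.items())


if __name__ == "__main__":
    intercept, coefficients = load_linear_model(PMML_DOC)
    lead = {"visits": 4, "income_k": 2}
    print(score(lead, intercept, coefficients))
```

The point of the standard is that the XML above could come from any tool that exports PMML, while the scoring side never needs to know which tool built the model.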

Ajay- Any India- or Asia-specific plans for Zementis?

Michael-Zementis already serves customers in the Asia/Pacific region from its office in Hong Kong.  We expect rapid growth for predictive analytics in the region and we think our cost-effective SaaS solution on Amazon EC2 will be of great service to this market.  I could see various analytics outsourcing and consulting firms benefit from using ADAPA as their primary delivery mechanism to provide clients with predictive  models that are ready to be executed on-demand.

Ajay- What do you believe will be the biggest challenges for analytics in 2009? What are the biggest opportunities?

Michael-The biggest challenge for analytics will most likely be the reduction in technology spending in a deep, global recession.  At the same time, companies must take advantage of analytics to cut cost, optimize processes, and to become more competitive.  Therefore, the biggest opportunity for analytics will be in the SaaS field, enabling clients to employ analytics without upfront capital expenditures.

Ajay – What made you choose a career in science? Describe your journey so far. What would your advice be to young science graduates in these recessionary times?

Michael- As a physicist, my research focused on neural networks and intelligent systems.  Predictive analytics is a great way for me to stay close to science while applying such complex algorithms to solve real business problems.  Even in a recession, there is always a need for good people with the desire to excel in their profession.  When starting your career, I’d say the best way is to remain broad in expertise rather than being too specialized in one particular industry or proficient in a single analytics tool.  A good foundation in math and computer science, combined with curiosity about how to apply analytics to specific business problems, will provide opportunities, even in the current economic climate.

About Zementis

Zementis, Inc. is a software company focused on predictive analytics and advanced Enterprise Decision Management technology. We combine science and software to create superior business and industrial solutions for our clients. Our scientific expertise includes statistical algorithms, machine learning, neural networks, and intelligent systems and our scientists have a proven record in producing effective predictive models to extract hidden patterns from a variety of data types. It is complemented by our product offering ADAPA®, a decision engine framework for real-time execution of predictive models and rules. For more information please visit www.zementis.com

Ajay- If you have a lot of data (GBs and GBs) and an existing model (in SAS, SPSS or R) that you have converted to PMML, and it is time to choose between spending more money to upgrade your hardware and renewing your software licenses, then instead take a look at ADAPA from www.zementis.com and score models for as low as $1 per hour. Check it out (test and control!).

Do you have any additional questions for Michael? Use the comments page to ask.

Interview: Alan Churchill, Savian

An interview with Alan Churchill, SAS consultant and alumnus of SAS Institute.

Ajay- What are the latest trends you see in computer programming over the next year and the next three to five years?

Alan- Silverlight and Flex will be huge and will really enable much more SaaS. The current web simply needs wholesale replacement to make it more usable for business applications. These new RIAs will allow us, as developers, to take it to a whole new level. Expect a massive influx of dollars into web redesign and redevelopment.

Ajay- Tell us how you came into this field of work, and what factors made you succeed.

Alan- I got into computers in high school (this was very early computing). I loved the sense of challenge that computers offered: they were a big crossword puzzle. I succeeded because I never viewed a problem the way a typical computer person or scientist would view them. As a history guy, I took a more holistic approach to problems. Heck, if you don’t know about a particular theory, you won’t be constrained by it. If you do know it, sometimes ignore it to get the job done, even if it isn’t as pretty.

Ajay-  Most challenging and fun project you ever did (anonymous details)

Alan- I have had many, many rewarding projects. As a consultant, every job is different. However, the spare-time project I am currently working on (figuring out the layout of the SAS dataset) is perhaps my favorite due to the complexity.

Ajay- Advice to people wanting to join computer programming as a career- Positive Things, Challenges, Skill Requirements.

Alan- First of all, programming is hard so be prepared to work to be good. Never ever stop evolving and looking for the next thing: you are only as good as your last 18 months of experience.

The career is very rewarding since you are continuously facing challenges that must be overcome. Computers have no patience for mistakes, so they require a lot of patience from programmers.

Always, always, always think outside of the box. Approach problems differently. If you hit an obstacle, move around it rather than always trying to burrow through. At the end of the day, it is all about getting the job done at the speed of business, not finding a cool, nifty new algorithm: do that in your spare time.

Ajay- Would you like to visit India for work or travel?

Alan- I honestly don’t like to travel long distances. After a long corporate career flying over a million miles, travel is simply taxing to me and takes me away from what I love to do: programming. As a history major, I love various cultures and would enjoy the beauty and history that India provides but would dread the flight ;-]

Bio:

Alan Churchill has been coding in SAS for over 20 years and worked at SAS as a senior consultant for 5 1/2 years. At SAS, Alan worked on the Microsoft-SAS Alliance and helped SAS customers integrate with .NET. He is also responsible for coding the engine for SAS’s web analytics product. Currently, he is the owner of Savian, which specializes in Microsoft-SAS solutions. He lives and works in Colorado Springs, Colorado.

SAS Analytics: Google Earth and Lex Jansen’s Site

Google Earth stores its data in KML files, which are an XML-based file format. The zipped version of a KML file is the KMZ file. (It beats me why Google wanted to create a zipped file format for KML, since most KML files are extremely small.)
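To make the KML/KMZ relationship concrete, here is a small Python sketch using only the standard library. It builds a one-placemark KML document and zips it into a KMZ archive in memory. The placemark name and the (approximate) coordinates are made up for illustration, and `doc.kml` is the conventional name for the main file inside a KMZ.

```python
import io
import xml.etree.ElementTree as ET
import zipfile

KML_NS = "http://www.opengis.net/kml/2.2"


def build_kml(name, lon, lat):
    """Return a minimal KML document (as bytes) with one point placemark."""
    ET.register_namespace("", KML_NS)
    kml = ET.Element(f"{{{KML_NS}}}kml")
    placemark = ET.SubElement(kml, f"{{{KML_NS}}}Placemark")
    ET.SubElement(placemark, f"{{{KML_NS}}}name").text = name
    point = ET.SubElement(placemark, f"{{{KML_NS}}}Point")
    # KML coordinates are written as longitude,latitude,altitude.
    ET.SubElement(point, f"{{{KML_NS}}}coordinates").text = f"{lon},{lat},0"
    return ET.tostring(kml, encoding="utf-8", xml_declaration=True)


def build_kmz(kml_bytes):
    """Zip a KML document into an in-memory KMZ archive."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("doc.kml", kml_bytes)
    return buffer.getvalue()


if __name__ == "__main__":
    kml = build_kml("Hypothetical customer", -104.82, 38.83)
    kmz = build_kmz(kml)
    print(len(kml), "bytes of KML,", len(kmz), "bytes of KMZ")
```

One practical reason for the zipped format: a KMZ archive can also carry images and overlays alongside the KML, so a whole map can travel as a single file.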

To do any geo-coding analysis with Google Earth, here are two SAS papers from Lex Jansen’s terrific site.

1) Put Your Customers on the Map: Integrating SAS/GRAPH and Google Earth
(http://www2.sas.com/proceedings/forum2008/252-2008.pdf)
Daniel Kuiper, Koen Vyverman (SAS Global Forum, 2008-03)

and

2) Using SAS and Google Earth to Access and Display Air Pollution Data
(http://www2.sas.com/proceedings/forum2008/253-2008.pdf)
Joshua Drukenbrod, David Mintz (SAS Global Forum, 2008-03)

 

These two papers are great in the way they use Google Earth for geocoding analysis and visual representation. They do, however, require you to have a SAS license.

Lex Jansen’s site is generally considered the de facto place to search for analytics papers, especially those related to SAS.


Upcoming Book

The great Bob Muenchen is coming out with an updated version of his very good book, R for SAS and SPSS Users, in September 2008, to help people learn R if they have used only SAS or SPSS before. We covered the earlier edition of the book in a previous post. The book adds sections on R Commander, Rattle and JGR, as well as two chapters on graphics and one on basic statistics. The author runs the examples and walks through them, explaining each step, especially where the results differ from SAS and SPSS. Check out the new book here:

http://www.amazon.com/SAS-SPSS-Users-Statistics-Computing/dp/0387094172/ref=pd_bbs_sr_1?ie=UTF8&s=books&qid=1217456813&sr=8-1