The big big Analytics Conference

The Predictive Analytics Conference (http://www.predictiveanalyticsworld.com/) starts today at the Hotel Nikko, San Francisco. A whole who's who of analytics experts is gathering there, including SAS, SPSS, SAP, Click Forensics, Acxiom, Amazon and Google, and there is a big R user conference as well. It is really, really huge, so stay tuned for some exciting announcements.

SAS , R and NYT – The Sequel

Here is a follow-up to the SAS vs. R articles by Ashlee V of the NYT.

 

The SAS Institute has borrowed a page from Sesame Street. It is now sponsoring the letter R.

Last month, I wrote an article about the rising popularity of the R programming language. The open-source software has turned into a favorite piece of technology for statisticians and other people looking to pull insights out of data.

On several levels, R represents a threat to SAS, which is the largest seller of commercial statistics software. Students at universities now learn R alongside SAS. In addition, the open-source nature of R allows the software to be tweaked at a pace that is hard for a commercial software maker to match.

All told, surging interest in the free R language could affect sales of SAS software, which can sell for thousands of dollars. Rather than running from the threat, SAS appears ready to try to understand R by adopting a more active role in its development.

You can read more at http://bits.blogs.nytimes.com/2009/02/16/sas-warms-to-open-source-one-letter-at-a-time/ or even by clicking on the Bits RSS feed in the sidebar on www.decisionstats.com

Ajay –

Note that SAS is only opening up the SAS/IML product to integrate R's matrix language capabilities. Base SAS does not yet appear to be integrated with R, and neither does the statistics module SAS/Stat (SAS Institute sells add-on modules based on functionality and prices them accordingly).

Many third-party sources like http://www.minequest.com have created interfaces from Base SAS to R; they are priced at around $50 apiece.

An additional threat to SAS's dominance comes from the WPS software made by a UK-based company, World Programming http://www.teamwpc.co.uk/home (which has an alliance with IBM). WPS can read and write the SAS language, and read and write SAS datasets as well, and is priced at $660, almost one tenth of the price of a SAS Institute license.

The recession is also forcing many large license holders of statistical software (such as banks and financial services firms) to seek discounts and alternatives. Even so, SAS Institute remains the industry leader in analytics software after almost 35 years of dominance.

However, this is a nice first step, and it will be interesting to see the follow-up steps from SAS Institute's rivals.

We can all go on our respective open source and closed source jets now.

comments from Anne H. Milley, director for technology product marketing at SAS, who relegated R to a limited role.

In the article, Ms. Milley said, "I think it addresses a niche market for high-end data analysts that want free, readily available code. We have customers who build engines for aircraft. I am happy they are not using freeware when I get on a jet."

Modeling: R Code, Books and Documents

Here is an R equivalent of PROC GENMOD.

If the SAS code is as below:

PROC GENMOD DATA=X;
  CLASS FLH;
  MODEL BS/OCCUPANCY = distcrop distfor flh distcrop*flh / D=B LINK=LOGIT TYPE3;
RUN;

 

Then the R equivalent would be:

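# distcrop*flh expands to distcrop + flh + distcrop:flh;
# distfor is the remaining term of the SAS MODEL statement
glm(bs/occupancy ~ distcrop*flh + distfor,
    family = binomial(link = "logit"), weights = occupancy, data = X)

where flh needs to be a factor (the counterpart of the SAS CLASS statement).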
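To try the call without the original data, here is a minimal sketch of my own (it is not part of the R-help reply). It assumes a made-up data frame x standing in for the SAS data set X; the simulated values and the coefficients used to generate them are purely illustrative. The formula spells out all four terms of the SAS MODEL statement (distcrop, distfor, flh, distcrop*flh), and the second fit shows the cbind(successes, failures) form of the same events/trials model.

```r
# Simulated stand-in for the SAS data set X; every value here is invented.
set.seed(42)
n <- 60
x <- data.frame(
  distcrop  = runif(n, 0, 10),
  distfor   = runif(n, 0, 10),
  flh       = sample(c("low", "high"), n, replace = TRUE),
  occupancy = sample(5:20, n, replace = TRUE)   # number of trials per row
)
x$flh <- factor(x$flh)                           # counterpart of CLASS FLH

# Number of events per row, drawn from an arbitrary logistic model.
x$bs <- rbinom(n, size = x$occupancy,
               prob = plogis(-1 + 0.15 * x$distcrop - 0.10 * x$distfor))

# Events/trials syntax: proportion as response, trials as weights.
fit <- glm(bs / occupancy ~ distcrop + distfor + flh + distcrop:flh,
           family = binomial(link = "logit"),
           weights = occupancy, data = x)
print(summary(fit))

# Same model written as a two-column (successes, failures) response.
fit2 <- glm(cbind(bs, occupancy - bs) ~ distcrop + distfor + flh + distcrop:flh,
            family = binomial(link = "logit"), data = x)
print(all.equal(coef(fit), coef(fit2), tolerance = 1e-6))
```

The sketch does not attempt to reproduce the TYPE3 tests that the SAS code requests; it only covers the model fit itself.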

 

Credit for the GENMOD-to-glm translation goes to Peter Dalgaard of the R-help list.

Peter is also the author of the splendid standard R book.

 

Speaking of books, here is one R book I am waiting for.

 

A similarly named free document (Introduction to Statistical Modelling in R by P.M.E. Altham, Statistical Laboratory, University of Cambridge) is available here:

http://www.statslab.cam.ac.uk/~pat/redwsheets.pdf

It is a pretty nice reference document if modelling is what you do and R is what you need to explore. It is dated 5 February 2009, so it is quite up to date. You can also check Dr Altham's home page for a lot of R resources.

SAS adds support to R

From the official website itself http://support.sas.com/rnd/app/studio/Rinterface2.html

R Interface Coming to SAS/IML Studio

While readers of the New York Times may have learned about R in recent weeks, it’s not news to many at SAS.

R is a leading language for developing new statistical methods, said Bob Rodriguez, Senior Director of Statistical Development at SAS. Our new PhD developers learned R in their graduate programs and are quite versed in it.

R is a matrix-based programming language that allows you to program statistical methods reasonably quickly. It’s open source software, and many add-on packages for R have emerged, providing statisticians with convenient access to new research. Many new statistical methods are first programmed in R.

While SAS is committed to providing the new statistical methodologies that the marketplace demands and will deliver new work more quickly with a recent decoupling of the analytical product releases from Base SAS, a commercial software vendor can only put out new work so fast. And never as fast as a professor and a grad student writing an academic implementation of brand-new methodology.

Both R and SAS are here to stay, and finding ways to make them work better with each other is in the best interests of our customers.

We know a lot of our users have both R and SAS in their tool kit, and we decided to make it easier for them to access R by making it available in the SAS environment, said Rodriguez. Our first interface to R will be in an upcoming version of SAS/IML Studio (currently known as SAS Stat Studio), scheduled for this summer.

The SAS/IML Studio interface allows you to integrate R functionality with IML or SAS programs. You can also exchange data between SAS and R as data sets or matrices.

This is just the first step, said Radhika Kulkarni, Vice President of Advanced Analytics. We are busy working on an R interface that can be surfaced in the SAS server or via other SAS clients. For example, users will be able to interface with R through the IML procedure, possibly as soon as the first part of 2010.

SAS/IML Studio is distributed with SAS/IML software. Stay tuned for details on availability.

 

Note: SAS/IML, Base SAS and SAS/Stat are copyrighted products of SAS Institute.

This is a welcome step from the industry leader SAS Institute and also puts an effective stop to rumors of it being too arrogant or too conservative to change.

Perhaps no other software maker has dominated the niche in which it operates for as long as SAS has (since even before I was born!) without getting into any kind of hassles. The decision to stay private as a company also looks incredibly wise given the carnage on stock markets today (though it takes a lot of willpower from the founders to say no to the easy billions that investment bankers would have lined up for an IPO).

This decision should also help the R project greatly, as SAS support definitely means the matrix part of the R language has come to stay. However, R is not just a matrix-based programming language; it has capabilities for data mining and other statistical analysis as well. Will SAS extend SAS/Stat capabilities to R? And what does the recent decoupling of the analytical product releases from Base SAS mean (is this due to the WPS challenge)?

Either way, the consumer is the winner. Kudos, SAS Institute!

SPSS and R

I rarely use SPSS now, but in college (www.iiml.ac.in) my marketing professors made sure I was buried in it for weeks. Much later I did some ARIMA forecasting in SPSS to predict macroeconomic indicators (details coming up).

 

However, the SPSS help list (SPSSX-L@LISTSERV.UGA.EDU) is a great one, not just for staying in touch with SPSS but also with the latest statistical modeling techniques. Here is an extract from the list (www.listserv.uga.edu/archives/spssx-l.html) on using SPSS and R together:

 

Assuming version 16 or later, you need to install the R plug-in from Developer Central.  Then your R syntax can be run in the syntax window between

BEGIN PROGRAM R.

and

END PROGRAM R.

The output automatically appears in the SPSS Viewer with two cautions.  1) In version 16, R graphics are written to files and don’t appear in the Viewer.  Version 17 integrates the graphics directly.  2) When using R interactively, expression output appears in your console windows, e.g.,

summary(dta)

displays the summary statistics for a data frame, dta.  In non-interactive mode, which is what you are in when running BEGIN PROGRAM, you need to enclose the expression in a print function for it to display, e.g.,

print(summary(dta))

The documentation for the APIs to communicate between SPSS and R is installed along with the plug-in, and there are examples in the Data Management book linked on Developer Central (www.spss.com/devcentral).

You might also go through the PowerPoint article on Developer Central, "Programmability in SPSS Statistics 17", which you will find on the front page of the site.  It includes a detailed example of using the R Quantreg package in SPSS as an extension command.  There is also a download in the R section on creating an SPSS dialog box that generates an R program directly.  Look for Rboxplot – Creating an R Program from a Dialog.  This has a simple dialog box that generates code for an R boxplot along with an article that explains what is happening.
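The point in the extract about print() can be tried in plain R, without SPSS. The following is a rough analogy of my own, not SPSS plug-in code: it assumes that source() with its default settings is a fair stand-in for the non-interactive mode described above, and the tiny data frame dta holds made-up values. Each temporary script builds dta and then either evaluates summary(dta) bare or wraps it in print().

```r
# Two throwaway scripts that differ only in the final print() wrapper.
make_dta <- "dta <- data.frame(x = c(1, 2, 3, 4), y = c(2.5, 3.1, 4.8, 5.2))"

bare <- tempfile(fileext = ".R")
wrapped <- tempfile(fileext = ".R")
writeLines(c(make_dta, "summary(dta)"), bare)
writeLines(c(make_dta, "print(summary(dta))"), wrapped)

# source() runs a file non-interactively; with default arguments it does
# not auto-print the value of top-level expressions.
out_bare <- capture.output(source(bare))
out_wrapped <- capture.output(source(wrapped))

cat("lines of output without print():", length(out_bare), "\n")
cat("lines of output with print():   ", length(out_wrapped), "\n")

unlink(c(bare, wrapped))
```

Only the wrapped script should produce any captured output, which mirrors the advice to use print(summary(dta)) between BEGIN PROGRAM R and END PROGRAM R.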

 

Ajay's 2 cents: SPSS treats R as an opportunity rather than a threat, partly because SPSS is much lower-priced software and has been trying in vain to displace SAS for some time now.

SAS (the company, not the language), as the market leader, has the most to lose due to

  • its high market share (which it has maintained both by aggressively pursuing legal action and by investing huge amounts of money in hosting conferences, papers and research, and in keeping alumni and current employees happy and loyal),

and

  • premium pricing (which comes under greater pressure amid a general economic downturn among its preferred customers, especially banks and companies like Amazon, GE Money etc.),

and

  • multi-pronged competition, with tacit support from bigger players waiting on the sidelines:
      • IBM has an alliance with WPS, which is almost a de facto Base SAS clone: it can take in SAS datasets and SAS code, and output SAS code and SAS datasets, besides having its own Eclipse-based design for the Workbench,
      • Microsoft is expanding data mining capabilities in SQL Server, along with initiatives like Microsoft Azure (OS for cloud computers) and Microsoft Mesh,
      • open source players like R, KNIME and Rapid Miner are gaining commercial momentum due to better value for cost (0),

and

  • data and code portability between SAS, SPSS and R due to PMML standards, which means switching barriers are getting lowered. There are almost no switching barriers between Base SAS and WPS in my testing experience.

The coming market share battles between SAS, WPS and R will be interesting to watch for analysts and customers, that is, if the current economic crisis doesn't claim any of the companies or the clients first. Alliances as well as community networking among users and developers could be critical.

Still, innovation flows from the creative destruction of old ideas, mindsets, attitudes and, yes, even software.

Vote for the SAS-L Rookie of the Year

If you are on the SAS-L list, you can vote for the following

 

SAS-L Rookie of the Year (SASLROY)


Scott Bucher
Joe Matise
Akshaya Nathilvar
Ajay Ohri                        (This is me…..by the way)
Karma Tarap

You can vote (one vote per person please) at:
http://ires.ku.edu/~ipsr/SGF2009/saslbof.htm
Voting will end February 12th.

And, as usual, the winners will be announced at the annual SAS-L BOF, at SAS Global Forum:
When: Monday, March 23
Where: TBA
Time: 7-8 pm

PS: I wonder if the R-help list has something like this.

REvolution Computing Releases Commercial R: The Analytics Market Just Grew Better

I just downloaded REvolution Computing's latest release of REvolution R. The individual 32-bit Windows version is free, while the Enterprise version, which adds 64-bit support, is available by subscription. Tech support is included in the services contract for the software, which should help any corporation willing to take R on a trial basis.

 

From the press release:

REvolution Computing Makes High Performance ‘REvolution R’ Available For Download

New Haven, CT – January 28, 2009 – REvolution Computing, a leading provider of open source predictive analytics solutions, today announced that it has made a public version of its commercial grade REvolution R program available for download from its website. REvolution R is REvolution Computing’s distribution of the popular R statistical software, optimized for use in commercial environments.

With the latest release of REvolution R, REvolution Computing has added significant performance enhancements to the base system, which can prove to be of great value in both commercial and research settings. A key feature includes the use of powerful optimized libraries capable of boosting performance by a factor of 5 or 10 for commonly used operations. In addition, REvolution R has been put through a quality process designed to meet regulatory agency audit standards, making the subscription version reliable for use in mission critical research and production.

“In making our latest release of REvolution R available for download, REvolution Computing is providing all R users the ability to take advantage of optimized and validated software previously available only to commercial users,” said REvolution Computing CEO, Richard Schultz. “In a true commercial open source way, we have reached the point in our development that we are able to offer significant value to both sets of our community users – REvolution R for all users, and REvolution R Enterprise, with additional commercial-grade capabilities and support, available by annual subscription.”

REvolution’s commercial distribution, REvolution R Enterprise, features advanced functionality, including ParallelR, which speeds deployment across both multiprocessor workstations and clusters to enable the same codes to be used for prototyping and production. REvolution R Enterprise is functional with 64-bit platforms and Linux enterprise platforms and provides for telephone support and response guarantees.

Some background on the company, from the company itself:

 

About REvolution Computing

New Haven, Connecticut-based REvolution Computing is the leading commercial provider of software and support for the statistical computing language known as “R.” 

Our products, including REvolution R and REvolution R Enterprise, enable statisticians, scientists and others to create superior predictive models and derive meaning from large sets of mission-critical data in record time. REvolution Computing works closely with the R community to incorporate the latest developments in open source R, and with our clients to support their efforts to produce groundbreaking innovations in life sciences, financial services, defense technology and other industries where high-level analytics are crucial to success. At REvolution Computing, “We do the math.”

The product names “RPro,” “ParallelR,” “REvolution R,” and “REvolution R Enterprise,” are trademarks of REvolution Computing.

 

This basically gives the company first-mover advantage in commercial R. The timing is also fortunate, as companies across the world look to cut costs (unfortunately, labor costs are being cut faster than software costs) and to move beyond the traditional analytics software that performed oh so well in the subprime prediction market.

REvolution R is available for download on Windows and Intel MacOS X, both in 32-bit mode at http://www.revolution-computing.com/downloads/revolution-r.php