Here’s a data mining survey you may want to spend a few minutes taking part in-
For people who always wanted to try out Linux but never had the time or the energy (or the courage to risk moving to a Linux-only environment), here is a great application that lets you keep Linux alongside Windows in a dual-boot environment. You need 256 MB of RAM and 5 GB of hard disk space and you are good to go. It is a single-click download and install to try out Ubuntu Linux, and it preserves your Windows installation too.
If you wanted to try out R on Linux, then this is an easy way in for you (and me).
It also saves quite a lot of money per desktop, both on the OS and on office productivity software.
Sounds too good to be true? Well, the site is http://wubi-installer.org/
Here are some screenshots courtesy of the site itself-
Ubuntu Desktop Preview
And if you need to install your favorite applications (like R or Subversion) and don’t want to use the command line, the solution is quite simple: it is called Synaptic, and it is free and downloadable here
Two very good and very customer-centric (and open source) companies shook hands on a strategic partnership today: Knime (www.knime.org) and Zementis (www.zementis.com).
Decision Stats has been covering both companies (http://www.decisionstats.com/2009/02/knime/ and http://www.decisionstats.com/2009/02/interview-michael-zeller-ceozementis/). Both products are amazingly good, work together very well thanks to their support of the PMML standard, and lower costs considerably for the consumer.
Knime, which is available under both a free personal license and a commercial license, supports R very well thanks to PMML (an initiative of www.dmg.org).
See http://www.knime.org/blog/export-and-convert-r-models-pmml-within-knime
The following example R script learns a decision tree based on the Iris-Data and exports this as PMML and as an R model which is understood by the R Predictor node:
# load the library for learning a tree model
library(rpart)
# load the pmml export library
library(pmml)
# inside the Knime R node, R is assumed to hold the input table (here the Iris data)
# use class column as predicted column to build decision tree
dt <- rpart(class ~ ., data = R)
# export to PMML
r_pmml <- pmml(dt)
# write the PMML model to an export file
write(toString(r_pmml), file = "C:/R.pmml")
# provide the native R model at the out-port
R <- dt
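If you want to try the same idea outside Knime, here is a minimal sketch that runs in a plain R session. It is my own adaptation, not part of the Knime post: it assumes R’s built-in iris data (where the class column is called Species), writes to a temporary file instead of C:/R.pmml, and only attempts the PMML export if the optional pmml and XML packages happen to be installed.

```r
library(rpart)  # recommended package that ships with R

# Fit a classification tree on the built-in iris data
# (assumption: Species plays the role of the "class" column)
dt <- rpart(Species ~ ., data = iris)

# Hypothetical output location; any writable path would do
out_file <- file.path(tempdir(), "iris_tree.pmml")

# Export to PMML only when the optional packages are available
if (requireNamespace("pmml", quietly = TRUE) &&
    requireNamespace("XML", quietly = TRUE)) {
  r_pmml <- pmml::pmml(dt)               # convert the tree to a PMML document
  XML::saveXML(r_pmml, file = out_file)  # write the XML to disk
}

# Quick sanity check of the fitted tree on the training data
pred <- predict(dt, iris, type = "class")
accuracy <- mean(pred == iris$Species)
```

The resulting file (if written) is what a PMML consumer would read; the dt object is the native R model.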
Zementis brings the total cost of ownership, and the total pain, of creating scored models down to something close to $1 per hour, thanks to its proprietary ADAPA engine.
The Predictive Analytics Conference (http://www.predictiveanalyticsworld.com/) starts today at the Hotel Nikko, San Francisco. A whole who’s who of analytics experts is gathering there, including SAS, SPSS, SAP, Click Forensics, Acxiom, Amazon and Google, and there is a big R user conference as well. It is really huge, so stay tuned for some exciting announcements.
Here is a follow-up article to the SAS vs. R articles by Ashlee V of the NYT.
The SAS Institute has borrowed a page from Sesame Street. It is now sponsoring the letter ‘R.’
Last month, I wrote an article about the rising popularity of the R programming language. The open-source software has turned into a favorite piece of technology for statisticians and other people looking to pull insights out of data.
On several levels, R represents a threat to SAS, which is the largest seller of commercial statistics software. Students at universities now learn R alongside SAS. In addition, the open-source nature of R allows the software to be tweaked at a pace that is hard for a commercial software maker to match.
All told, surging interest in the free R language could affect sales of SAS software, which can sell for thousands of dollars. Rather than running from the threat, SAS appears ready to try to understand R by adopting a more active role in its development.
You can read more at http://bits.blogs.nytimes.com/2009/02/16/sas-warms-to-open-source-one-letter-at-a-time/ or even by clicking on the Bits RSS feed in the sidebar on www.decisionstats.com
Ajay –
Note that SAS is only opening up the SAS/IML product to integrate R’s matrix language capabilities. Base SAS still does not seem to be integrated with R, and neither is the statistics module SAS/Stat (SAS Institute sells add-on modules based on functionality and prices them accordingly).
Many third-party sources like http://www.minequest.com have created interfaces from Base SAS to R, priced at around $50 apiece.
An additional threat to SAS’s dominance comes from WPS, software from the UK-based company World Programming (http://www.teamwpc.co.uk/home), which has an alliance with IBM. WPS can read and write the SAS language as well as SAS datasets, and is priced at $660, almost one tenth of the price of SAS Institute’s licenses.
The recession is also forcing many large license holders of statistical software (such as banks and financial services firms) to seek discounts and alternatives. SAS Institute remains the industry leader in analytics software after almost 35 years of dominance.
However, this is a nice first step, and it will be interesting to see the follow-up steps from SAS Institute’s rivals.
We can all go on our respective open source and closed source jets now.
comments from Anne H. Milley, director for technology product marketing at SAS, who relegated R to a limited role.
In the article, Ms. Milley said, “I think it addresses a niche market for high-end data analysts that want free, readily available code. We have customers who build engines for aircraft. I am happy they are not using freeware when I get on a jet.”
Here is an equivalent of Proc Genmod in R .
If the SAS language code is as below-
PROC GENMOD DATA=X;
CLASS FLH;
MODEL BS/OCCUPANCY = distcrop distfor flh distcrop*flh /D=B LINK=LOGIT
TYPE3; RUN;
Then the R language equivalent would be :
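glm(bs/occupancy ~ distcrop*flh + distfor,
family=binomial(logit), weights=occupancy, data=X)
where flh needs to be a factor (distcrop*flh already expands to distcrop + flh + distcrop:flh, so distfor is the only term that has to be added separately)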
Credit to Peter Dalgaard from the R-Help List
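To see the events/trials idea in action, here is a small sketch on simulated data. The data frame X and every value in it are made up purely for illustration; only the variable names come from the SAS code above. It fits the model two ways, as a proportion with weights and as a two-column success/failure response, and both should give the same coefficients.

```r
set.seed(1)
n <- 40

# Simulated stand-in for the SAS dataset X (all values are hypothetical)
X <- data.frame(
  distcrop  = runif(n, 0, 10),
  distfor   = runif(n, 0, 10),
  flh       = factor(rep(c("a", "b"), length.out = n)),  # CLASS variable
  occupancy = sample(5:20, n, replace = TRUE)            # number of trials
)
# Number of events out of occupancy trials
X$bs <- rbinom(n, size = X$occupancy, prob = plogis(-1 + 0.2 * X$distcrop))

# Form 1: proportion as response, trials as weights
fit_prop <- glm(bs / occupancy ~ distcrop * flh + distfor,
                family = binomial(logit), weights = occupancy, data = X)

# Form 2: two-column response of successes and failures
fit_cbind <- glm(cbind(bs, occupancy - bs) ~ distcrop * flh + distfor,
                 family = binomial(logit), data = X)

# Compare the estimated coefficients from the two forms
coef_diff <- max(abs(coef(fit_prop) - coef(fit_cbind)))
```

Either form works; the weighted-proportion version simply mirrors the BS/OCCUPANCY syntax more closely. Neither call reproduces the TYPE3 tests requested in the SAS code; those need a separate step.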
Peter is also author of the splendid standard R book–
Speaking of books, here is one R book I am waiting for –
A similarly named free document (Introduction to statistical modelling in R by P.M.E. Altham, Statistical Laboratory, University of Cambridge) is available here –
http://www.statslab.cam.ac.uk/~pat/redwsheets.pdf
It is a pretty nice reference document if modelling is what you do and R is what you need to explore. It is dated 5 February 2009, so it is quite recent. You can also check Dr Altham’s home page for a lot of R resources.
From the official website itself http://support.sas.com/rnd/app/studio/Rinterface2.html
R Interface Coming to SAS/IML® Studio
While readers of the New York Times may have learned about R in recent weeks, it’s not news to many at SAS.
“R is a leading language for developing new statistical methods,” said Bob Rodriguez, Senior Director of Statistical Development at SAS. “Our new PhD developers learned R in their graduate programs and are quite versed in it.”
R is a matrix-based programming language that allows you to program statistical methods reasonably quickly. It’s open source software, and many add-on packages for R have emerged, providing statisticians with convenient access to new research. Many new statistical methods are first programmed in R.
While SAS is committed to providing the new statistical methodologies that the marketplace demands and will deliver new work more quickly with a recent decoupling of the analytical product releases from Base SAS, a commercial software vendor can only put out new work so fast. And never as fast as a professor and a grad student writing an academic implementation of brand-new methodology.
Both R and SAS are here to stay, and finding ways to make them work better with each other is in the best interests of our customers.
“We know a lot of our users have both R and SAS in their tool kit, and we decided to make it easier for them to access R by making it available in the SAS environment,” said Rodriguez. “Our first interface to R will be in an upcoming version of SAS/IML Studio (currently known as SAS Stat Studio), scheduled for this summer.”
The SAS/IML Studio interface allows you to integrate R functionality with IML or SAS programs. You can also exchange data between SAS and R as data sets or matrices.
“This is just the first step,” said Radhika Kulkarni, Vice President of Advanced Analytics. “We are busy working on an R interface that can be surfaced in the SAS server or via other SAS clients. For example, users will be able to interface with R through the IML procedure, possibly as soon as the first part of 2010.”
SAS/IML Studio is distributed with SAS/IML software. Stay tuned for details on availability.
Note: SAS/IML, Base SAS and SAS/Stat are copyrighted products of SAS Institute.
This is a welcome step from the industry leader SAS Institute, and it also puts an effective stop to rumors of it being too arrogant or too conservative to change.
Perhaps no other software maker has dominated the niche in which it operates for as long as SAS has (since even before I was born!) without getting into any kind of hassles. The decision to stay private as a company also looks incredibly wise given the carnage on stock markets today (but it requires a lot of will power from the founders to say no to the easy billions that investment bankers would have lined up for an IPO).
This decision should also help the R project greatly, as SAS support definitely means the matrix part of the R language has come to stay. However, R is not just a matrix-based programming language; it has capabilities for data mining and other statistical analysis as well. Would SAS extend SAS/Stat capabilities to R? And what does the recent decoupling of the analytical product releases from Base SAS mean (is this due to the WPS challenge)?
Either way, the consumer is the winner. Kudos, SAS Institute!