Interview Kelci Miclaus, SAS Institute Using #rstats with JMP

Here is an interview with Kelci Miclaus, a researcher working with the JMP division of the SAS Institute, in which she demonstrates examples of how the R programming language is a great hit with JMP customers who like to be flexible.

 

Ajay- How has JMP been using integration with R? What has been the feedback from customers so far? Is there a single case study you can point out where the combination of JMP and R was better than any one of them alone?

Kelci- Feedback from customers has been very positive. Some customers are using JMP to foster collaboration between SAS and R modelers within their organizations. Many are using JMP’s interactive visualization to complement their use of R. Many SAS and JMP users are using JMP’s integration with R to experiment with more bleeding-edge methods not yet available in commercial software. It can be used simply to smooth the transition with regard to sending data between the two tools, or used to build complete custom applications that take advantage of both JMP and R.

One customer has been using JMP and R together for Bayesian analysis. He uses R to create MCMC chains and has found that JMP is a great tool for preparing the data for analysis, as well as displaying the results of the MCMC simulation. For example, the Control Chart platform and the Bubble Plot platform in JMP can be used to quickly verify convergence of the algorithm. The use of both tools together can increase productivity since the results of an analysis can be achieved faster than through scripting and static graphics alone.

I, along with a few other JMP developers, have written applications that use JMP scripting to call out to R packages and perform analyses like multidimensional scaling, bootstrapping, support vector machines, and modern variable selection methods. These really show the benefit of interactive visual analysis of coupled with modern statistical algorithms. We’ve packaged these scripts as JMP add-ins and made them freely available on our JMP User Community file exchange. Customers can download them and now employ these methods as they would a regular JMP platform. We hope that our customers familiar with scripting will also begin to contribute their own add-ins so a wider audience can take advantage of these new tools.

(see http://www.decisionstats.com/jmp-and-r-rstats/)

Ajay- Are there plans to extend JMP integration with other languages like Python?

Kelci- We do have plans to integrate with other languages and are considering integrating with more based on customer requests. Python has certainly come up and we are looking into possibilities there.

 Ajay- How is R a complimentary fit to JMP’s technical capabilities?

Kelci- R has an incredible breadth of capabilities. JMP has extensive interactive, dynamic visualization intrinsic to its largely visual analysis paradigm, in addition to a strong core of statistical platforms. Since our brains are designed to visually process pictures and animated graphs more efficiently than numbers and text, this environment is all about supporting faster discovery. Of course, JMP also has a scripting language (JSL) allowing you to incorporate SAS code, R code, build analytical applications for others to leverage SAS, R and other applications for users who don’t code or who don’t want to code.

JSL is a powerful scripting language on its own. It can be used for dialog creation, automation of JMP statistical platforms, and custom graphic scripting. In other ways, JSL is very similar to the R language. It can also be used for data and matrix manipulation and to create new analysis functions. With the scripting capabilities of JMP, you can create custom applications that provide both a user interface and an interactive visual back-end to R functionality. Alternatively, you could create a dashboard using statistical and/or graphical platforms in JMP to explore the data and with the click of a button, send a portion of the data to R for further analysis.

Another JMP feature that complements R is the add-in architecture, which is similar to how R packages work. If you’ve written a cool script or analysis workflow, you can package it into a JMP add-in file and send it to your colleagues so they can easily use it.

Ajay- What is the official view on R from your organization? Do you think it is a threat, or a complimentary product or another statistical platform that coexists with your offerings?

Kelci- Most definitely, we view R as complimentary. R contributors are providing a tremendous service to practitioners, allowing them to try a wide variety of methods in the pursuit of more insight and better results. The R community as a whole is providing a valued role to the greater analytical community by focusing attention on newer methods that hold the most promise in so many application areas. Data analysts should be encouraged to use the tools available to them in order to drive discovery and JMP can help with that by providing an analytic hub that supports both SAS and R integration.

Ajay-  While you do use R, are there any plans to give back something to the R community in terms of your involvement and participation (say at useR events) or sponsoring contests.

 Kelci- We are certainly open to participating in useR groups. At Predictive Analytics World in NY last October, they didn’t have a local useR group, but they did have a Predictive Analytics Meet-up group comprised of many R users. We were happy to sponsor this. Some of us within the JMP division have joined local R user groups, myself included.  Given that some local R user groups have entertained topics like Excel and R, Python and R, databases and R, we would be happy to participate more fully here. I also hope to attend the useR! annual meeting later this year to gain more insight on how we can continue to provide tools to help both the JMP and R communities with their work.

We are also exploring options to sponsor contests and would invite participants to use their favorite tools, languages, etc. in pursuit of the best model. Statistics is about learning from data and this is how we make the world a better place.

About- Kelci Miclaus

Kelci is a research statistician developer for JMP Life Sciences at SAS Institute. She has a PhD in Statistics from North Carolina State University and has been using SAS products and R for several years. In addition to research interests in statistical genetics, clinical trials analysis, and multivariate analysis/visualization methods, Kelci works extensively with JMP, SAS, and R integration.

.

 

Summer School on Uncertainty Quantification

Scheme for sensitivity analysis
Image via Wikipedia

SAMSI/Sandia Summer School on Uncertainty Quantification – June 20-24, 2011

http://www.samsi.info/workshop/samsisandia-summer-school-uncertainty-quantification

The utilization of computer models for complex real-world processes requires addressing Uncertainty Quantification (UQ). Corresponding issues range from inaccuracies in the models to uncertainty in the parameters or intrinsic stochastic features.

This Summer school will expose students in the mathematical and statistical sciences to common challenges in developing, evaluating and using complex computer models of processes. It is essential that the next generation of researchers be trained on these fundamental issues too often absent of traditional curricula.

Participants will receive not only an overview of the fast developing field of UQ but also specific skills related to data assimilation, sensitivity analysis and the statistical analysis of rare events.

Theoretical concepts and methods will be illustrated on concrete examples and applications from both nuclear engineering and climate modeling.

The main lecturers are:
Dan Cacuci (N.C. State University): data assimilation and applications to nuclear engineering

Dan Cooley (Colorado State University): statistical analysis of rare events
This short course will introduce the current statistical practice for analyzing extreme events. Statistical practice relies on fitting distributions suggested by asymptotic theory to a subset of data considered to be extreme. Both block maximum and threshold exceedance approaches will be presented for both the univariate and multivariate cases.

Doug Nychka (NCAR): data assimilation and applications in climate modeling
Climate prediction and modeling do not incorporate geophysical data in the sequential manner as weather forecasting and comparison to data is typically based on accumulated statistics, such as averages. This arises because a climate model matches the state of the Earth’s atmosphere and ocean “on the average” and so one would not expect the detailed weather fluctuations to be similar between a model and the real system. An emerging area for climate model validation and improvement is the use of data assimilation to scrutinize the physical processes in a model using observations on shorter time scales. The idea is to find a match between the state of the climate model and observed data that is particular to the observed weather. In this way one can check whether short time physical processes such as cloud formation or dynamics of the atmosphere are consistent with what is observed.

Dongbin Xiu (Purdue University): sensitivity analysis and polynomial chaos for differential equations
This lecture will focus on numerical algorithms for stochastic simulations, with an emphasis on the methods based on generalized polynomial chaos methodology. Both the mathematical framework and the technical details will be examined, along with performance comparisons and implementation issues for practical complex systems.

The main lectures will be supplemented by discussion sessions and by presentations from UQ practitioners from both the Sandia and Los Alamos National Laboratories.

http://www.samsi.info/workshop/samsisandia-summer-school-uncertainty-quantification

Using R for Time Series in SAS

 

Time series: random data plus trend, with best...
Image via Wikipedia

 

Here is a great paper on using Time Series in R, and it specifically allows you to use just R output in Base SAS.

SAS Code

/* three methods: */

/* 1. Call R directly – Some errors are not reported to log */

x “’C:\Program Files\R\R-2.12.0\bin\r.exe’–no-save –no-restore <“”&rsourcepath\tsdiag.r””>””&rsourcepath\tsdiag.out”””;

/* include the R log in the SAS log */7data _null_;

infile “&rsourcepath\tsdiag.out”;

file log;

input;

put ’R LOG: ’ _infile_;

run;

/* include the image in the sas output.Specify a file if you are not using autogenerated html output */

ods html;

data _null_;

file print;

put “<IMG SRC=’” “&rsourcepath\plot.png” “’ border=’0’>”;

put “<IMG SRC=’” “&rsourcepath\acf.png” “’ border=’0’>”;

put “<IMG SRC=’” “&rsourcepath\pacf.png” “’ border=’0’>”;

put “<IMG SRC=’” “&rsourcepath\spect.png” “’ border=’0’>”;

put “<IMG SRC=’” “&rsourcepath\fcst.png” “’ border=’0’>”;

run;

ods html close;

The R code to create a time series plot is quite elegant though-


library(tseries)

air <- AirPassengers #Datasetname

ts.plot(air)

acf(air)

pacf(air)

plot(decompose(air))

air.fit <- arima(air,order=c(0,1,1), seasonal=list(order=c(0,1,1), period=12) #The ARIMA Model Based on PACF and ACF Graphs

tsdiag(air.fit)

library(forecast)

air.forecast <- forecast(air.fit)

plot.forecast(air.forecast)

You can download the fascinating paper from the Analytics NCSU Website http://analytics.ncsu.edu/sesug/2008/ST-146.pdf

About the Author-

Sam Croker has a MS in Statistics from the University of South Carolina and has over ten years of experience in analytics.   His research interests are in time series analysis and forecasting with focus on stream-flow analysis.  He is currently using SAS, R and other analytical tools for fraud and abuse detection in Medicare and Medicaid data. He also has experience in analyzing, modeling and forecasting in the finance, marketing, hospitality, retail and pharmaceutical industries.