Interview: Kelci Miclaus, SAS Institute, on Using #rstats with JMP

Here is an interview with Kelci Miclaus, a researcher in the JMP division of SAS Institute, in which she gives examples of how JMP customers who want extra flexibility are putting the R programming language to work alongside JMP.

 

Ajay- How has JMP been using integration with R? What has been the feedback from customers so far? Is there a single case study you can point out where the combination of JMP and R was better than either one alone?

Kelci- Feedback from customers has been very positive. Some customers are using JMP to foster collaboration between SAS and R modelers within their organizations. Many are using JMP’s interactive visualization to complement their use of R. Many SAS and JMP users are using JMP’s integration with R to experiment with more bleeding-edge methods not yet available in commercial software. The integration can be used simply to smooth the transfer of data between the two tools, or to build complete custom applications that take advantage of both JMP and R.

One customer has been using JMP and R together for Bayesian analysis. He uses R to create MCMC chains and has found that JMP is a great tool for preparing the data for analysis, as well as displaying the results of the MCMC simulation. For example, the Control Chart platform and the Bubble Plot platform in JMP can be used to quickly verify convergence of the algorithm. The use of both tools together can increase productivity since the results of an analysis can be achieved faster than through scripting and static graphics alone.
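As a toy illustration only, and not the customer’s actual workflow: the R side of such an analysis might look roughly like the sketch below, a simple random-walk Metropolis sampler whose chain could then be handed to JMP for Control Chart or Bubble Plot convergence checks. The data, the tuning constants and the use of the coda package are all assumptions made for this sketch.

# Toy example: random-walk Metropolis for the mean of a normal sample
# (flat prior, known sd = 1). Everything here is invented for illustration.
library(coda)

set.seed(1)
y <- rnorm(50, mean = 2, sd = 1)          # toy data

n_iter <- 5000
chain  <- numeric(n_iter)
chain[1] <- 0
for (i in 2:n_iter) {
  proposal  <- chain[i - 1] + rnorm(1, 0, 0.5)
  log_ratio <- sum(dnorm(y, proposal, 1, log = TRUE)) -
               sum(dnorm(y, chain[i - 1], 1, log = TRUE))
  chain[i]  <- if (log(runif(1)) < log_ratio) proposal else chain[i - 1]
}

summary(mcmc(chain))                       # quick posterior summary in R
# write.csv(data.frame(iter = 1:n_iter, mu = chain), "chain.csv")
# ...and the CSV (or JMP's R integration) hands the chain to JMP for plotting.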

I, along with a few other JMP developers, have written applications that use JMP scripting to call out to R packages and perform analyses like multidimensional scaling, bootstrapping, support vector machines, and modern variable selection methods. These really show the benefit of interactive visual analysis coupled with modern statistical algorithms. We’ve packaged these scripts as JMP add-ins and made them freely available on our JMP User Community file exchange. Customers can download them and employ these methods as they would a regular JMP platform. We hope that our customers familiar with scripting will also begin to contribute their own add-ins so a wider audience can take advantage of these new tools.

(see http://www.decisionstats.com/jmp-and-r-rstats/)
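As a small illustrative aside, and not the actual add-in code: the R side of a multidimensional scaling call might look roughly like the sketch below, using only base R and the built-in mtcars data. A JSL wrapper would send a JMP data table to R, run something like this, and pull the coordinates back into JMP for interactive plotting.

# Minimal MDS sketch in base R; mtcars stands in for a data table sent from JMP.
d   <- dist(scale(mtcars))         # pairwise distances on standardized variables
fit <- cmdscale(d, k = 2)          # classical multidimensional scaling, 2 dimensions

coords <- data.frame(Dim1 = fit[, 1], Dim2 = fit[, 2])
head(coords)                       # these coordinates would be returned to JMP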

Ajay- Are there plans to extend JMP integration with other languages like Python?

Kelci- We do have plans to integrate with other languages and are considering integrating with more based on customer requests. Python has certainly come up and we are looking into possibilities there.

Ajay- How is R a complementary fit to JMP’s technical capabilities?

Kelci- R has an incredible breadth of capabilities. JMP has extensive interactive, dynamic visualization intrinsic to its largely visual analysis paradigm, in addition to a strong core of statistical platforms. Since our brains are designed to visually process pictures and animated graphs more efficiently than numbers and text, this environment is all about supporting faster discovery. Of course, JMP also has a scripting language (JSL) that lets you incorporate SAS code and R code, and build analytical applications so that users who don’t code, or who don’t want to code, can still leverage SAS, R and other tools.

JSL is a powerful scripting language on its own. It can be used for dialog creation, automation of JMP statistical platforms, and custom graphic scripting. In many ways, JSL is similar to the R language. It can also be used for data and matrix manipulation and to create new analysis functions. With the scripting capabilities of JMP, you can create custom applications that provide both a user interface and an interactive visual back-end to R functionality. Alternatively, you could create a dashboard using statistical and/or graphical platforms in JMP to explore the data and, with the click of a button, send a portion of the data to R for further analysis.

Another JMP feature that complements R is the add-in architecture, which is similar to how R packages work. If you’ve written a cool script or analysis workflow, you can package it into a JMP add-in file and send it to your colleagues so they can easily use it.

Ajay- What is the official view on R from your organization? Do you think it is a threat, a complementary product, or another statistical platform that coexists with your offerings?

Kelci- Most definitely, we view R as complementary. R contributors are providing a tremendous service to practitioners, allowing them to try a wide variety of methods in the pursuit of more insight and better results. The R community as a whole plays a valuable role in the greater analytical community by focusing attention on the newer methods that hold the most promise in so many application areas. Data analysts should be encouraged to use the tools available to them in order to drive discovery, and JMP can help with that by providing an analytic hub that supports both SAS and R integration.

Ajay- While you do use R, are there any plans to give something back to the R community in terms of your involvement and participation (say, at useR! events) or sponsoring contests?

Kelci- We are certainly open to participating in useR groups. At Predictive Analytics World in NY last October, they didn’t have a local useR group, but they did have a Predictive Analytics Meet-up group made up of many R users, and we were happy to sponsor it. Some of us within the JMP division have joined local R user groups, myself included. Given that some local R user groups have entertained topics like Excel and R, Python and R, and databases and R, we would be happy to participate more fully here. I also hope to attend the useR! annual meeting later this year to gain more insight on how we can continue to provide tools that help both the JMP and R communities with their work.

We are also exploring options to sponsor contests and would invite participants to use their favorite tools, languages, etc. in pursuit of the best model. Statistics is about learning from data and this is how we make the world a better place.

About- Kelci Miclaus

Kelci is a research statistician developer for JMP Life Sciences at SAS Institute. She has a PhD in Statistics from North Carolina State University and has been using SAS products and R for several years. In addition to research interests in statistical genetics, clinical trials analysis, and multivariate analysis/visualization methods, Kelci works extensively with JMP, SAS, and R integration.


 

Revolution Webinar Series #Rstats

Revolution Analytics Webinar-

 

Featured Webinar
Presenter: David Champagne, CTO, Revolution Analytics
Date: Tuesday, December 20th
Time: 11:00 AM – 11:30 AM Pacific

Big Data Starts with R

Traditional IT infrastructure is simply unable to meet the demands of the new “Big Data Analytics” landscape. Many enterprises are turning to the “R” statistical programming language and Hadoop (both open source projects) as a potential solution. This webinar will introduce the statistical capabilities of R within the Hadoop ecosystem. We’ll cover:

  • An introduction to new packages developed by Revolution Analytics to facilitate interaction with the data stores HDFS and HBase, so that they can be leveraged from the R environment
  • An overview of how to write MapReduce jobs in R using Hadoop (a rough sketch of such a job follows this list)
  • Special considerations that need to be made when working with R and Hadoop
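As a rough, hypothetical sketch of the second bullet: the word-count job below is written against the open-source RHadoop packages that Revolution Analytics sponsors. The package and function names used here (rmr2, to.dfs, mapreduce, keyval, from.dfs) are assumptions for illustration; the webinar itself may demonstrate different interfaces.

# Hypothetical word count with the RHadoop 'rmr2' package.
library(rmr2)

rmr.options(backend = "local")       # run without a cluster, for experimentation

# Push a few lines of text into the backing store (HDFS on a real cluster)
input <- to.dfs(c("big data starts with r", "r and hadoop", "big data and r"))

wordcount <- mapreduce(
  input  = input,
  map    = function(k, lines) {
    words <- unlist(strsplit(lines, "\\s+"))
    keyval(words, 1)                 # emit a (word, 1) pair per word
  },
  reduce = function(word, counts) {
    keyval(word, sum(counts))        # total count per word
  }
)

result <- from.dfs(wordcount)        # a list with $key (words) and $val (counts)
print(data.frame(word = result$key, count = result$val))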

We’ll also provide additional resources that are available to people interested in integrating R and Hadoop.

 

Upcoming Webinars
Wed, Dec 14th, 11:00 AM – 11:30 AM PT
Revolution R Enterprise – 100% R and More
R users already know why the R language is the lingua franca of statisticians today: because it’s the most powerful statistical language in the world. Revolution Analytics builds on the power of open source R, and adds performance, productivity and integration features to create Revolution R Enterprise. In this webinar, author and blogger David Smith will introduce the additional capabilities of Revolution R Enterprise.
Archived Webinars
Revolution Webinar: New Features in Revolution R Enterprise 5.0 (including RevoScaleR) to Support Scalable Data Analysis
Revolution R Enterprise 5.0 is Revolution Analytics’ scalable analytics platform. At its core is Revolution Analytics’ enhanced distribution of R, the world’s most widely used project for statistical computing. In this webinar, Dr. Ranney will discuss new features and show examples of the new functionality, which extend the platform’s usability, integration and scalability.

 

Creating Pages on Google Plus for some languages

So I decided to create Pages on Google Plus for my favorite programming languages.

The pages carry short descriptions such as:

  • Python – “a programming language that lets you work more quickly and integrate your systems more effectively” – https://plus.google.com/107930407101060924456/posts
  • Structured Query Language
  • “Leading statistical language since the 1960s, especially in sociology and market research”
  • “The leading statistical language in the world”
  • “The leading statistical language since the 1970s”

 

These pages are in accordance with Google’s policies: http://www.google.com/intl/en/+/policy/pagesterm.html

The Amazing Microsoft Robotics

Amazing stuff from the makers of Kinect-

Operating systems for robots may be the future cash cow of Microsoft, while the pirates of Silicon Valley fight fascinating cloudy wars! 🙂

http://www.microsoft.com/robotics/#Product

 

Product Information

Microsoft Robotics Developer Studio 4 beta (RDS4 beta) provides a wide range of support to help make it easy to develop robot applications. RDS4 beta includes a programming model that helps make it easy to develop asynchronous, state-driven applications. RDS4 beta provides a common programming framework that can be applied to support a wide variety of robots, enabling code and skill transfer.

RDS4 beta includes a lightweight asynchronous services-oriented runtime, a set of visual authoring and simulation tools, as well as templates, tutorials, and sample code to help you get started.

Microsoft Robotics Developer Studio 4 beta Datasheet – English (PDF Format)

Product Video: View the product video on Channel 9!

This release has extensive support for the Kinect sensor hardware through the Kinect for Windows SDK, allowing developers to create Kinect-enabled robots in the Visual Simulation Environment and in real life. Along with this release comes a standardized reference spec for building a Kinect-based robot.

See how Microsoft Robotics Developer Studio 4 beta is being used to bring ideas to life in the Microsoft Robotics@Home competition.

Lightweight Asynchronous Services-Oriented Runtime

The Concurrency and Coordination Runtime (CCR) helps make it easier to handle asynchronous input and output by eliminating the conventional complexities of manual threading, locks, and semaphores. The lightweight, state-oriented Decentralized Software Services (DSS) framework enables you to create program modules that can interoperate on a robot and connected PCs by using a relatively simple, open protocol.

Visual Programming Language (VPL)


Visual Programming Language (VPL) provides a relatively simple drag-and-drop visual programming language tool that helps make it easy to create robotics applications. VPL also provides the ability to take a collection of connected blocks and reuse them as a single block elsewhere in your program. VPL is also capable of generating human-readable C#.

DSS Manifest Editor


The DSS Manifest Editor (DSSME) provides a relatively simple way to create application configuration and distribution scenarios.

DSS Log Analyzer


The DSS Log Analyzer tool allows you to view message flows across multiple DSS services. DSS Log Analyzer also allows you to inspect message details.

Visual Simulation Environment (VSE)


Visual Simulation Environment (VSE) provides the ability to simulate and test robotic applications using a 3D physics-based simulation tool. This allows developers to create robotics applications without the hardware. Sample simulation models and environments enable you to test your application in a variety of 3D virtual environments.

Google Dart a new programming language for web applications

From Google a new language for structured web applications-

http://www.dartlang.org/docs/technical-overview/index.html (a rather unstructured website, if I may add)

Dart is a new class-based programming language for creating structured web applications. Developed with the goals of simplicity, efficiency, and scalability, the Dart language combines powerful new language features with familiar language constructs into a clear, readable syntax.

Dart’s design goals:

  • Create a structured yet flexible programming language for the web.
  • Make Dart feel familiar and natural to programmers and thus easy to learn.
  • Ensure that all Dart language constructs allow high performance and fast application startup.
  • Make Dart appropriate for the full range of devices on the web—including phones, tablets, laptops, and servers.
  • Provide tools that make Dart run fast across all major modern browsers.

These design goals address the following problems currently facing web developers:

  • Small scripts often evolve into large web applications with no apparent structure—they’re hard to debug and difficult to maintain. In addition, these monolithic apps can’t be split up so that different teams can work on them independently. It’s difficult to be productive when a web application gets large.
  • Scripting languages are popular because their lightweight nature makes it easy to write code quickly. Generally, the contracts with other parts of an application are conveyed in comments rather than in the language structure itself. As a result, it’s difficult for someone other than the author to read and maintain a particular piece of code.
  • With existing languages, the developer is forced to make a choice between static and dynamic languages. Traditional static languages require heavyweight toolchains and a coding style that can feel inflexible and overly constrained.
  • Developers have not been able to create homogeneous systems that encompass both client and server, except for a few cases such as Node.js and Google Web Toolkit (GWT).
  • Different languages and formats entail context switches that are cumbersome and add complexity to the coding process.

Cloud Computing using Python

I liked the new features in PiCloud, which is a cloud-computing way to use Python. Python is increasingly popular as a computational language, and the cloud is where hardware is headed, at least as of 2011-12.

http://www.picloud.com/

The new features allow you to publish your own Python functions as URLs. Why would you want to publish a function?

  • To call your Python functions from a programming language other than Python.
  • To use PiCloud from Google AppEngine, which does not support our native client library.
  • To easily setup a scalable RPC system.

Here’s a peek at the interface:

You publish a Python function

cloud.rest.publish(your_func, 'myfunction')

We give you a URL back

https://api.picloud.com/r/2/myfunction/

You make an HTTP request using your method of choice to the URL

curl -k -u 'key:secret_key' https://api.picloud.com/r/2/myfunction/
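Since the published endpoint is just authenticated HTTP, the same call can be made from other languages as well. For example, here is a minimal R sketch using the httr package; the URL and the key:secret_key credentials are simply the placeholders from the curl example above.

# Calling the published PiCloud function from R over HTTP basic auth.
library(httr)

resp <- GET(
  "https://api.picloud.com/r/2/myfunction/",
  authenticate("key", "secret_key")   # equivalent of `curl -u key:secret_key`
)

content(resp)   # whatever the published Python function returned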

It certainly is an interesting development and I am wondering how other languages can adopt this paradigm as well.
For R, as of now http://www.cloudnumbers.com/ seems to be the only player in the cloud.
It would be exciting to see more players in the cloud statistical analytics space.

 

Interview- Top Data Mining Blogger on Earth, Sandro Saitta

(Image: “Surajustement Modèle 2”, an overfitting illustration, via Wikipedia)

If you do a Google search for “data mining blog”, one blog has come out on top for the past several years: data mining blog – Google Search http://bit.ly/kEdPlE

To honor 5 years of Sandro Saitta’s blog (yes, that’s 5 years!), we present an exclusive interview with him in which he reveals his unique sauce for cool techie blogging.

Ajay- Describe your journey as a scientist and data miner, from early experiences, to schooling, to your work/research/blogging.

Sandro- My first experience with data mining was my master’s project. I used a decision tree to predict pollen concentration for the following week using input data such as wind, temperature and rain. The fact that an algorithm can make a computer learn from experience was really amazing to me. I found it so interesting that I started a PhD in data mining. This time, the field of application was civil engineering. Civil engineers put a lot of sensors on their structures in order to understand how they behave. With all these sensors they generate a lot of data. To interpret these data, I used data mining techniques such as feature selection and clustering. I started my blog, Data Mining Research, during my PhD, to share with other researchers.
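Purely as an illustration of that kind of model, and with invented data standing in for the real meteorological measurements, a regression tree for next week’s pollen concentration could be fit in R with the rpart package roughly as in the sketch below.

# Invented data standing in for weekly weather inputs and the following
# week's pollen concentration; for illustration only.
library(rpart)

set.seed(42)
pollen_data <- data.frame(
  pollen_next_week = runif(200, 0, 100),
  wind             = runif(200, 0, 30),
  temperature      = runif(200, -5, 35),
  rain             = runif(200, 0, 50)
)

fit <- rpart(pollen_next_week ~ wind + temperature + rain, data = pollen_data)
predict(fit, newdata = pollen_data[1:5, ])   # predicted concentrations for five weeks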

I then started applying data mining to the stock market in my first job in industry. I realized the difference between image recognition, where a 99% correct classification rate is state of the art, and the stock market, where you’re happy with 55%. However, the company ambiance was not as good as I had thought, so I moved to consulting. There, I applied data mining to behavioral targeting to increase click-through rates. When you compare the number of customers who click with the ones who don’t, you really understand what class imbalance means. A few months ago, I accepted a very good opportunity at SICPA. I’m looking forward to taking on new challenges there.

Ajay- Your blog is the top-ranked blog for “data mining blog”. Could you share some tips on better blogging for analytics and technical people?

Sandro- It’s always difficult to start a blog, since at the beginning you have no reader. Writing for nobody may seem stupid, but it is not. By writing my first posts during my PhD I was reorganizing my ideas. I was expressing concepts which were not always clear to me. I thus learned a lot and also improved my English level. Of course, it’s still not perfect, but I hope most people can understand me.

Next come the readers. A few dozen each week at first. To increase this number, I then started to learn SEO (Search Engine Optimization) by reading books and blogs. I tested many techniques that increased Data Mining Research’s visibility in the blogosphere. I think SEO is interesting when you already have some content published (which means not at the very beginning of your blog). After a while, once your blog is nicely ranked, the main task is to work on the content of the blog. To be of interest, your content must stand out: original, informative or provocative, for example. I also had the chance to gain good visibility thanks to well-known people in the field like Kevin Hillstrom, Gregory Piatetsky-Shapiro, Will Dwinnell / Dean Abbott, Vincent Granville, Matthew Hurst and many others.

Ajay- What’s your favorite statistical software, and what are the various software packages you have worked with? Could you compare and contrast them as well?

Sandro- My favorite software at this point is SAS. I worked with it for two years. Once you know the language, you can perform ETL and data mining so easily. It’s also very fast compared to others. There are a lot of tools for data mining, but I cannot think of a tool that is as powerful as SAS and, at the same time, has a high-level programming language behind it.

I also worked with R and Matlab. R is very nice since you have all the up-to-date data mining algorithms implemented. However, working in memory is not always a good choice, especially for ETL. Matlab is an excellent tool for prototyping. It’s not so fast and certainly not made for ETL, but the price is low considering all the possibilities it offers for data mining. In my opinion, SAS is the best choice for ETL and a good choice for data mining. Of course, there is the price.

Ajay- What are your favorite techniques and training resources for learning the basics of data mining, for, say, statisticians or business management graduates?

Sandro- I’m the kind of guy who likes to read books. I read data mining books one after the other. The fact that the same concepts are explained differently (and by different people) helps a lot in learning a topic like data mining. Of course, nothing replaces experience in the field. You can read hundreds of books, but you will still not be a good practitioner until you really apply data mining in specific fields. My second choice after books is blogs. By reading data mining blogs, you will really see the issues and challenges in the field. It’s still not experience, but we are closer. Finally, there are web resources and networks such as KDnuggets of course, but also AnalyticBridge and LinkedIn.

Ajay- Describe your hobbies and how they help you, if at all, in your professional life.

Sandro- One of my hobbies is reading. I read a lot of books about data mining, SEO, Google as well as Sci-Fi and Fantasy. I’m a big fan of Asimov by the way. My other hobby is playing tennis. I think I simply use my hobbies as a way to find equilibrium in my life. I always try to find the best balance between work, family, friends and sport.

Ajay- What are your plans for your website for 2011-2012?

Sandro- I will continue to publish guest posts and interviews. I think it is important to let other people express themselves about data mining topics. I will not write about my current applications due to the policies of my current employer. But don’t worry, I still have a lot to write, whether it is technical or not. I will also put more emphasis on my experience with data mining, advice for data miners, tips and tricks, and of course book reviews!

Standard Disclosure of Blogging- Sandro awarded me the People’s Choice award on his blog for 2010 and carried out my interview. There is a lot of love between our respective WordPress blogs, but to reassure our puritan American readers, it is platonic and intellectual.

About- Sandro Saitta



Sandro Saitta is a Data Mining Research Engineer at SICPA Security Solutions. He is also a blogger at Data Mining Research (www.dataminingblog.com). His interests include data mining, machine learning, search engine optimization and website marketing.

You can contact Mr. Saitta at his Twitter address-

https://twitter.com/#!/dataminingblog
