The Excellent V 2.0 of R Commander #rstats

Just a few clicks with the new version of R Commander give you beautiful Markdown code (using the KMggplot plugin for R Commander).

ggplot AND markdown made easy!!

Now how cool is that! Update your R Commander package today. It is now on CRAN!

This should especially be useful to people blogging! (note I cleaned up some warnings!)
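
For anyone who wants to try this, here is a minimal sketch of pulling the current releases from CRAN. The helper name update_rcmdr is hypothetical, and I am assuming the plugin is published on CRAN as RcmdrPlugin.KMggplot2; check the CRAN listing if the install fails.

```r
# Hypothetical helper, not part of Rcmdr: installs the current CRAN
# releases of R Commander and the KMggplot plugin.
# Assumption: the plugin's CRAN name is RcmdrPlugin.KMggplot2.
rcmdr_pkgs <- c("Rcmdr", "RcmdrPlugin.KMggplot2")

update_rcmdr <- function(pkgs = rcmdr_pkgs) {
  install.packages(pkgs)  # downloads from your configured CRAN mirror
  invisible(pkgs)
}

# update_rcmdr()   # run once; needs an internet connection
# library(Rcmdr)   # starts the R Commander GUI
```

Once R Commander is running, the plugin is usually loaded from the Tools menu rather than from code.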


Iris for Big Data #rstats #bigdata

Quote of the Day-

it is impossible to be a data scientist without knowing iris 

#Anonymous #Quotes

 

Revolution Analytics has been nice enough to provide both datasets and code for analyzing Big Data in R.

http://www.revolutionanalytics.com/subscriptions/datasets/

http://packages.revolutionanalytics.com/datasets/

(The site was updated, so these are the new links.)

 

While the datasets collection is still elementary, as an R instructor I find this list extremely useful. However, I wish they would look at some other repositories and make .xdf and “tidy” csv versions. A little bit of RODBC usage should help, and so would some descriptions. Maybe they should partner with Quandl, DataMarket, or Infochimps on this initiative rather than do it alone.

 

Overall, there could be an R package for this (like a Big Data version of the famous datasets package in R).

Still, it is a nice and very useful effort.

Revolution R Datasets

More code-

http://blog.revolutionanalytics.com/2013/08/big-data-sets-for-r.html

Also a recent project made by a student of mine on Revolution Datasets and using their blog posts.

Note how much better the above project is than one that uses the mini and super clean datasets within R (like Boston).
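
To see just how mini and clean those teaching datasets are, here is a quick sketch that checks their size and missing values. It assumes the MASS package (which ships with standard R installations) is available for Boston; the sizes data frame is just an illustrative name.

```r
# iris ships with base R (package "datasets"); Boston ships with MASS,
# a recommended package included in standard R installations.
data(iris)
data(Boston, package = "MASS")

# Collect row counts, column counts and missing-value counts side by side.
sizes <- data.frame(
  dataset = c("iris", "Boston"),
  rows    = c(nrow(iris), nrow(Boston)),
  columns = c(ncol(iris), ncol(Boston)),
  missing = c(sum(is.na(iris)), sum(is.na(Boston)))
)
print(sizes)
```

iris has 150 rows and 5 columns, and Boston has 506 rows and 14 columns, which is a long way from anything you would call Big Data.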

 

Hat tip- R’s very own Mr Smith

Unrelated- for more on IRIS

 

The Wonderful ggmap package for spatial analysis in R #rstats

I really like two functions in the ggmap package. One is geocode, which turns any character string into a Google Maps query and returns its longitude and latitude.

library(ggmap)
geocode("Calgary")
geocode("Saddledome Calgary")
> geocode("Calgary")
Information from URL : http://maps.googleapis.com/maps/api/geocode/json?address=Calgary&sensor=false
Google Maps API Terms of Service : http://developers.google.com/maps/terms
        lon      lat
1 -114.0581 51.04532
> geocode("Saddledome Calgary")
Information from URL : http://maps.googleapis.com/maps/api/geocode/json?address=Saddledome+Calgary&sensor=false
Google Maps API Terms of Service : http://developers.google.com/maps/terms
        lon      lat
1 -114.0513 51.03811
 


The other is qmap, which makes a map out of the text query. We can change the level of detail using the zoom option.

One of the options that I like is, of course, watercolor, set with the maptype parameter.

qmap("Saddledome Calgary")
qmap("Saddledome Calgary",zoom=15)
qmap("Saddledome Calgary",zoom=15,maptype="watercolor")


Other useful functions for spatial analysis are get_map and ggmap, which fetch and plot the map for a query. In between, of course, you can add layers for your data.
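
Here is a rough sketch of that get_map, then layers, then ggmap workflow, using the two points returned by geocode above. The helper plot_places is hypothetical (it is not part of ggmap), and I am assuming a recent ggmap release, where Google maps need an API key registered first.

```r
# Points taken from the geocode() output shown earlier in this post.
places <- data.frame(
  name = c("Calgary", "Saddledome Calgary"),
  lon  = c(-114.0581, -114.0513),
  lat  = c(51.04532, 51.03811)
)

# Hypothetical helper (not part of ggmap): fetch a base map centred on the
# mean of the points, then draw the points as a layer on top of it.
plot_places <- function(places, zoom = 14) {
  # Assumption: recent ggmap releases need a Google API key registered
  # beforehand, e.g. ggmap::register_google(key = "YOUR_KEY").
  centre <- c(lon = mean(places$lon), lat = mean(places$lat))
  base_map <- ggmap::get_map(location = centre, zoom = zoom)
  ggmap::ggmap(base_map) +
    ggplot2::geom_point(data = places,
                        ggplot2::aes(x = lon, y = lat),
                        colour = "red", size = 3)
}

# plot_places(places)   # needs ggmap, ggplot2 and an internet connection
```

Any other ggplot2 layer can be added the same way, since ggmap returns an ordinary ggplot object.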

This is a relatively recent package, and you can try it out to see how it makes spatial analysis even easier for beginners.

See the package site, these slides, or this article in the R Journal.

Google’s dream has been lost: Rise of American Cyber Imperialism

Google was built on burying the concept of information asymmetry and spreading knowledge. Yes, the ads were there, but they were just a way to make money without being evil.

It was just a continuation of the process that the Gutenberg press and the Internet brought about. Products like Adsense and acquisitions like Youtube, Android and Blogger proved that Google not only helped you find content it helped you create content too.

But if the old desktop monopoly was thwarted, a new and more sinister monopoly has been born. With an increasingly corrupted political system manipulated by political lobbying, search engine queries are now logged and neatly transferred to the American government’s non-elected branches. Using loopholes in existing law, we now face a frightening future in which the novel 1984 is more likely to be a Google sponsored movie coming soon to a computer screen near you.

Every time you click an ad that Google shows, remember you are helping fund the NSA as well. The rise of a neo-imperialism led primarily by US / Anglo-Saxon / Western alliances shows why the Chinese and Russian governments are actually right in being skeptical about the glasnost and perestroika that the American establishment offered.

Power tends to corrupt, and absolute power corrupts absolutely. Today, Data is Power, and the biggest collector of Data has chosen to hide behind a decade-long slogan: Trust Me, I am not Evil.

Dude, Seriously!

The Galactic Empire is being built on Data ……


Some tips on creating a useful blog for beginners

1) Blog post title should be self explanatory

2) Use categories and tags for better navigation

3) Use a theme which attracts, not distracts

4) Simple language in blog writing works best

5) Useful blogs get more traffic than autobiographical blogs. Unless you are a celebrity.

6) People who enjoy writing blogs create better blogs

7) Writing a blog is like jogging. Do it every day, even when it's boring and painful. Or do it as much as your schedule permits.

Interview- Dr Eric Siegel, Author of Predictive Analytics

Here is an interview with Dr Eric Siegel, founding chair of Predictive Analytics Conference and author of the recent bestseller in analytics, Predictive Analytics.

Ajay- What has been the response to your book?

Eric- Since its launch in February, Predictive Analytics has held the #1 bestseller slot in two Amazon categories (planning & forecasting and econometric) and I have been gratified to see it receive positive reviews (http://www.predictiveanalyticsworld.com/book/press.php#reviewsbybookcritics). Amazon readers have mostly rated it 5 stars; the inevitable tail of negative reviews have almost all been from more technically inclined readers looking for a “how to” or more mathematical book. (They bought the wrong book and blame the book!) I’ve found most such readers are more than capable of understanding – after a few minute conversation – that there’s a place in the world for a book about their field written for a broader readership (I explain this here: 5 reasons the book matters to experts – http://bit.ly/103qPVa), and in fact the industry overview, new case studies, and treatment of uplift modeling is often of great interest to even senior hands-on experts.
Ajay- You lead an extremely busy life with conferences, travel and consulting. Do you plan to write another book, and on what topic?
 
Eric- It’s likely to be a long while, since Predictive Analytics achieved my goal to introduce the field, provide a broad industry overview, and cover the advanced topics that interest me most (in a conceptual manner, but with copious citations for the more technical readers to drill down further). My attention now turns back to improving and broadening the coverage of Predictive Analytics World conference agendas (www.pawcon.com).
In the meantime, I’d suggest readers check out Kaiser Fung’s new book Numbersense, as well as forthcoming books from Dean Abbott.
Ajay- How do you think PAW has positively impacted the analytics fraternity around the world?
 
Eric- The conference has been a central place to engender and catalyze positive industry movement. Predictive Analytics World covers all the bases for both expert practitioners as well as newcomers. As the universal, cross-vendor meeting place that brings together the who’s who of predictive analytics, PAW presents not only unique opportunities to gain knowledge, but the industry’s premier networking event.
ABOUT
Eric Siegel, Ph.D., founder of Predictive Analytics World and Text Analytics World, and Executive Editor of the Predictive Analytics Times, makes the how and why of predictive analytics understandable and captivating. In addition to being the author of Predictive Analytics: The Power to Predict Who Will Click, Buy, Lie, or Die, Eric is a former Columbia University professor who used to sing to his students, and a renowned speaker, educator, and leader in the field.
CONFLICT OF INTEREST DISCLOSURE –
Both Predictive Analytics Conference and Dr Eric have been supporters of this website for the past three years.

Different Forks of R- Will SAS create a new version of R too #rstats

 

A quick and dirty list….

1) Revolution R – http://www.revolutionanalytics.com/products/revolution-r.php Revolution R Community is Revolution Analytics’ free distribution of the open source R programming language — enhanced for users looking for faster performance and greater stability. It’s perfect for learning R and basic analysis

2) Oracle R Enterprise http://www.oracle.com/us/corporate/features/features-oracle-r-enterprise-498732.html

Integrates the Open-Source Statistical Environment R with Oracle Database 11g
Oracle R Enterprise allows analysts and statisticians to run existing R applications and use the R client directly against data stored in Oracle Database 11g—vastly increasing scalability, performance and security. The combination of Oracle Database 11g and R delivers an enterprise-ready, deeply integrated environment for advanced analytics. Users can also use analytical sandboxes, where they can analyze data and develop R scripts for deployment while results stay managed inside Oracle Database.

3) Tibco Enterprise Runtime for R

TERR, a key component of Spotfire Predictive Analytics, is an enterprise-grade analytic engine that TIBCO has built from the ground up to be fully compatible with the R language, leveraging our long-time expertise in the closely related S+ analytic engine. This allows customers to continue to develop in open source R, but to then integrate and deploy their R code on a commercially-supported and robust platform—without the need to rewrite their code.

Prototypes are often developed in R, but then typically re-implemented in another language for production purposes because R was not built for enterprise usage. TERR brings enterprise-class scalability and stability to the agile R-language, and enables statisticians to broadly share their analyses through TIBCO Spotfire Statistics Services or by directly embedding the TERR engine.

4) pqR - http://radfordneal.github.io/pqR/ You gotta love Radford Neal’s throwing down the gauntlet to the old sleepy heads! At JSM in Montreal, an R Core member announced that they have agreed to incorporate his changes, signalling a major departure in the way changes have been handled in R.

pqR is a new version of the R interpreter. It is based on R-2.15.0, distributed by the R Core Team (at r-project.org), but improves on it in many ways, mostly ways that speed it up, but also by implementing some new features and fixing some bugs.

One notable improvement is that pqR is able to do some numeric computations in parallel with each other, and with other operations of the interpreter, on systems with multiple processors or processor cores.

 

5) Renjin http://www.renjin.org/ Renjin is a JVM-based interpreter for the R language for statistical computing. Renjin is a new implementation of the R language and environment for the Java Virtual Machine (JVM), whose goal is to enable transparent analysis of big data sets and seamless integration with other enterprise systems such as databases and application servers.

Renjin is still under development, with a target of a version “1.0” in late 2013, but in the meantime it is being used in production for a number of our client projects, and supports most CRAN packages, including some with C/Fortran dependencies.

6) Riposte (?) https://github.com/jtalbot/riposte

Riposte, a fast interpreter and JIT for R.

Justin Talbot justintalbot@gmail.com Zach Devito

We only do development on OSX and Linux. It’s unlikely that our JIT will work on Windows.

Planned work for July-December 2013. The first three bullet points are currently in progress on the library branch. Partial work will be integrated to main by the end of July.

  • [x] Load the standard base R library without errors
    • This will require support for about 15 primitive and external functions
  • [ ] Support all R primitive operators (~200, 50 supported as of July 2013)
    • [x] The most common 40 or so will be appear as bytecodes in the Riposte VM, primarily control flow operators and a small set of common arithmetic
    • [ ] The rest will be implemented in the Riposte core library
    • [x] Implement new .Map, .Scan, or .Fold FFI functions to allow vector fusion through primitives implemented as external calls in the core library
  • [ ] Support for the 200 most common internal functions (out of ~580, 30 supported as of July 2013)

 

SAP and IBM Netezza already have specialized packages for R.

The question is whether SAS, which supports interaction with R through SAS/IML, even Base R, and JMP, is willing to go the extra mile for customers and create SAS/R. The fact that they made their products compatible with R shows they acknowledge and respect R’s appeal (contrary to the old sleepyheads who think all SAS is good and all base R is divine).

SAS/R could be the third major product for the SAS Institute, after the SAS and JMP platforms. Any takers, ladies and gentlemen?
