New RCommander with ggplot #rstats

 

R Commander, one of my favorite GUIs, has a relatively new plugin called KMggplot2. Until now Deducer was the only GUI with ggplot features, but the much lighter and more popular R Commander has long been a champion for people wanting to pick up R quickly.

 

http://cran.r-project.org/web/packages/RcmdrPlugin.KMggplot2/

RcmdrPlugin.KMggplot2: Rcmdr Plug-In for Kaplan-Meier Plot and Other Plots by Using the ggplot2 Package

 

As you can see from the screenshot, it makes ggplot even easier for R newbies and experienced folks alike.

 

This package is an R Commander plug-in for Kaplan-Meier plot and other plots by using the ggplot2 package.

Version: 0.1-0
Depends: R (≥ 2.15.0), stats, methods, grid, Rcmdr (≥ 1.8-4), ggplot2 (≥ 0.9.1)
Imports: tcltk2 (≥ 1.2-3), RColorBrewer (≥ 1.0-5), scales (≥ 0.2.1), survival (≥ 2.36-14)
Published: 2012-05-18
Author: Triad sou. and Kengo NAGASHIMA
Maintainer: Triad sou. <triadsou at gmail.com>
License: GPL-2
CRAN checks: RcmdrPlugin.KMggplot2 results
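
For anyone who hasn't met a Kaplan-Meier plot before, here is a minimal sketch of the kind of plot the plug-in is named after. It uses base graphics and the survival package (one of the imports listed above), not the plug-in's own menus or its ggplot2 output, so treat it as an illustration only. The bundled lung dataset and the colors are my own assumptions.

```r
# Kaplan-Meier sketch with the survival package.
# Assumption: the bundled `lung` dataset stands in for your own data.
library(survival)

# Estimate survival curves by sex; status == 2 marks a death in `lung`.
fit <- survfit(Surv(time, status) ~ sex, data = lung)

# Draw both curves with their confidence bands (base graphics, not ggplot2).
plot(fit, conf.int = TRUE, col = c("blue", "red"),
     xlab = "Days", ylab = "Survival probability")
legend("topright", legend = names(fit$strata),
       col = c("blue", "red"), lty = 1)
```

The plug-in builds the same kind of curve through point-and-click menus and draws it with ggplot2 instead.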

 

----------------------------------------------------------------
NEWS file for the RcmdrPlugin.KMggplot2 package
----------------------------------------------------------------

----------------------------------------------------------------

Changes in version 0.1-0 (2012-05-18)

 o Restructuring implementation approach for efficient
   maintenance.
 o Added options() for storing package specific options (e.g.,
   font size, font family, ...).
 o Added a theme: theme_simple().
 o Added a theme element: theme_rect2().
 o Added a list box for facet_xx() functions in some menus
   (Thanks to Professor Murtaza Haider).
 o Kaplan-Meier plot: added confidence intervals.
 o Box plot: added violin plots.
 o Bar chart for discrete variables: deleted dynamite plots.
 o Bar chart for discrete variables: added stacked bar charts.
 o Scatter plot matrix: added univariate plots at diagonal
   positions (ggplot2::plotmatrix).
 o Deleted the dummy data for histograms, which is large in
   size.

----------------------------------------------------------------

Changes in version 0.0-4 (2011-07-28)

 o Fixed "scale_y_continuous(formatter = "percent")" to
   "scale_y_continuous(labels = percent)" for ggplot2
   (>= 0.9.0).
 o Fixed "legend = FALSE" to "show_guide = FALSE" for
   ggplot2 (>= 0.9.0).
 o Fixed the DESCRIPTION file for ggplot2 (>= 0.9.0) dependency.

----------------------------------------------------------------

Changes in version 0.0-3 (2011-07-28; FIRST RELEASE VERSION)

 o Kaplan-Meier plot: Show no. at risk table on outside.
 o Histogram: Color coding.
 o Histogram: Density estimation.
 o Q-Q plot: Create plots based on a maximum likelihood estimate
   for the parameters of the selected theoretical distribution.
 o Q-Q plot: Create plots based on a user-specified theoretical
   distribution.
 o Box plot / Errorbar plot: Box plot.
 o Box plot / Errorbar plot: Mean plus/minus S.D.
 o Box plot / Errorbar plot: Mean plus/minus S.D. (Bar plot).
 o Box plot / Errorbar plot: 95 percent Confidence interval
   (t distribution).
 o Box plot / Errorbar plot: 95 percent Confidence interval
   (bootstrap).
 o Scatter plot: Fitting a linear regression.
 o Scatter plot: Smoothing with LOESS for small datasets or GAM
   with a cubic regression basis for large data.
 o Scatter plot matrix: Fitting a linear regression.
 o Scatter plot matrix: Smoothing with LOESS for small datasets
   or GAM with a cubic regression basis for large data.
 o Line chart: Normal line chart.
 o Line chart: Line chart with a step function.
 o Line chart: Area plot.
 o Pie chart: Pie chart.
 o Bar chart for discrete variables: Bar chart for discrete
   variables.
 o Contour plot: Color coding.
 o Contour plot: Heat map.
 o Distribution plot: Normal distribution.
 o Distribution plot: t distribution.
 o Distribution plot: Chi-square distribution.
 o Distribution plot: F distribution.
 o Distribution plot: Exponential distribution.
 o Distribution plot: Uniform distribution.
 o Distribution plot: Beta distribution.
 o Distribution plot: Cauchy distribution.
 o Distribution plot: Logistic distribution.
 o Distribution plot: Log-normal distribution.
 o Distribution plot: Gamma distribution.
 o Distribution plot: Weibull distribution.
 o Distribution plot: Binomial distribution.
 o Distribution plot: Poisson distribution.
 o Distribution plot: Geometric distribution.
 o Distribution plot: Hypergeometric distribution.
 o Distribution plot: Negative binomial distribution.
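
The 0.0-4 entries in the NEWS file track an API change in ggplot2 0.9.0: label formatting moved from a formatter string to a labels function supplied by the scales package. Here is a small sketch of the newer syntax, assuming ggplot2 and scales are installed; the mtcars data and the cylinder shares are only my illustration, not something the plug-in produces. (Later ggplot2 releases also renamed show_guide to show.legend, so I left it out here.)

```r
# Sketch of the ggplot2 (>= 0.9.0) syntax named in the 0.0-4 NEWS entries.
# Assumption: mtcars and the cylinder shares are illustrative data only.
library(ggplot2)
library(scales)  # supplies percent(), used here as a label function

# Share of cars per cylinder count: a data frame with columns cyl and Freq.
shares <- as.data.frame(prop.table(table(cyl = mtcars$cyl)))

p <- ggplot(shares, aes(x = cyl, y = Freq)) +
  geom_bar(stat = "identity") +
  scale_y_continuous(labels = percent)  # was: formatter = "percent"
print(p)
```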

Happy $100 Billion to Mark Zuckerberg Productions!

Here’s to an expected $100 billion market valuation for the latest Silicon Valley legend, Facebook: A Mark Zuckerberg Production.

Some milestones that made FB what it is:

1) Beating MySpace, Ibibo and Google’s Orkut combined

2) Smart, timely acquisitions, from FriendFeed to Instagram

3) Superb infrastructure for 900 million accounts, fast interface rollouts, and a policy of never deleting data. Some of this involved creating new technology like Cassandra. There have been no anti-trust complaints against FB’s behavior, particularly as it simply stuck to being the cleanest interface offering a social network

4) Much envied and copied features like the News Feed, app development on the FB platform, and social gaming as revenue streams

5) Replacing Google as the hot techie employer, just as Google did to Microsoft

6) An uncanny focus, including walking away from a billion dollars from Yahoo, resisting Google and Apple’s Ping, imposing design changes unilaterally, and implementing data sharing only with flexible partners and strategic investors (like Bing)

FB has made more money for more people than any other company in the past ten years. Here’s wishing it an even more interesting next ten years! With 900 million users, if they could integrate a PayPal-like system, or create an alternative to AdSense for content creators, they could create an all-new internet economy, one which is more open than the Google-dominated internet.

 

BigML meets R #rstats

I am just checking out the nice new R package created by BigML.com co-founder Justin Donaldson. The new package is named bigml, which can be a bit confusing since R already has many packages with “big” in their names (including biglm).

The bigml package is available at CRAN http://cran.r-project.org/web/packages/bigml/index.html

I just tweaked the code given at http://blog.bigml.com/2012/05/10/r-you-ready-for-bigml/ to include the SSL authentication code at http://www.brocktibert.com/blog/2012/01/19/358/

So it goes:

> library(bigml)
Loading required package: RJSONIO
Loading required package: RCurl
Loading required package: bitops
Loading required package: plyr
> setCredentials("bigml_username", "API_key")

# download the CA certificate bundle needed for authentication
download.file(url = "http://curl.haxx.se/ca/cacert.pem", destfile = "cacert.pem")

# set the curl options
# cainfo points at a certificate file (capath expects a directory)
# note: ssl.verifypeer = FALSE switches certificate verification off
curl <- getCurlHandle()
options(RCurlOptions = list(
  cainfo = system.file("CurlSSL", "cacert.pem", package = "RCurl"),
  ssl.verifypeer = FALSE))
# 'proxyserver:port' is a placeholder; use your own proxy or skip this line
curlSetOpt(.opts = list(proxy = 'proxyserver:port'), curl = curl)

> iris.model = quickModel(iris, objective_field = 'Species')

Of course there are lots of goodies added here, so read the post yourself at http://blog.bigml.com/2012/05/10/r-you-ready-for-bigml/

Incidentally, the author of this R package (bigml), Justin Donaldson, who goes by the name sudojudo at http://twitter.com/#!/sudojudo, has also recently authored two other R packages: tsne at http://cran.r-project.org/web/packages/tsne/index.html (tsne: T-distributed Stochastic Neighbor Embedding for R (t-SNE), a “pure R” implementation of the t-SNE algorithm) and a GUI toolbar at http://cran.r-project.org/web/packages/sculpt3d/index.html (sculpt3d is a GTK+ toolbar that allows for more interactive control of a dataset inside the RGL plot window. Controls for simple brushing, highlighting, labeling, and mouseMode changes are provided by point-and-click rather than through the R terminal interface).

This, along with the fact that their recently released Python bindings for bigml.com were among the top stories on Hacker News, shows that BigML.com is aiming for some traction in bringing cloud computing, better software interfaces and data mining together!

Interview BigML.com

Here is an interview with Charlie Parker, head of large-scale online algorithms at http://bigml.com

Ajay- Describe your own personal background in scientific computing, and how you came to be involved with machine learning, cloud computing and BigML.com.

Charlie- I am a machine learning Ph.D. from Oregon State University. Francisco Martin (our founder and CEO), Adam Ashenfelter (the lead developer on the tree algorithm), and I were all studying machine learning at OSU around the same time. We all went our separate ways after that.

Francisco started Strands and turned it into a 100+ million dollar company building recommender systems. Adam worked for CleverSet, a probabilistic modeling company that was eventually sold to Cisco, I believe. I worked for several years in the research labs at Eastman Kodak on data mining, text analysis, and computer vision.

When Francisco left Strands to start BigML, he brought in Justin Donaldson, who is a brilliant visualization guy from Indiana, and an ex-Googler named Jose Ortega, who is responsible for most of our data infrastructure. They pulled in Adam and me a few months later. We also have Poul Petersen, a former Strands employee, who manages our herd of servers. He is a wizard and makes everyone else’s life much easier.

Ajay- You use Clojure for the back end of BigML.com. Are there any other languages and packages you are considering? What makes Clojure such a good fit for cloud computing?

Charlie- Clojure is a great language because it offers you all of the benefits of Java (extensive libraries, cross-platform compatibility, easy integration with things like Hadoop, etc.) but has the syntactical elegance of a functional language. This makes our code base small and easy to read as well as powerful.

We’ve had occasional issues with speed, but that just means writing the occasional function or library in Java. As we build towards processing data at the Terabyte level, we’re hoping to create a framework that is language-agnostic to some extent. So if we have some great machine learning code in C, for example, we’ll use Clojure to tie everything together, but the code that does the heavy lifting will still be in C. For the API and Web layers, we use Python and Django, and Justin is a huge fan of HaXe for our visualizations.

Ajay- Current support is for Decision Trees. When can we see SVM, K Means Clustering and Logit Regression?

Charlie- Right now we’re focused on perfecting our infrastructure and giving you new ways to put data in the system, but expect to see more algorithms appearing in the next few months. We want to make sure they are as beautiful and easy to use as the trees are. Without giving too much away, the first new thing we will probably introduce is an ensemble method of some sort (such as Boosting or Bagging). Clustering is a little further away but we’ll get there soon!

Ajay- How can we use the BigML.com API from R and Python?

Charlie- We have a public github repo for the language bindings. https://github.com/bigmlcom/io Right now, there are only bash scripts but that should change very soon. The Python bindings should be there in a matter of days, and the R bindings in probably a week or two. Clojure and Java bindings should follow shortly after that. We’ll have a blog post about it each time we release a new language binding. http://blog.bigml.com/

Ajay- How can we predict large numbers of observations using a model that has been built and pruned (model scoring)?

Charlie- We are in the process of refactoring our backend right now for better support for batch prediction and model evaluation. This is something that is probably only a few weeks away. Keep your eye on our blog for updates!

Ajay- How can we export models built in BigML.com for scoring data locally?

Charlie- This is as simple as a call to our API. https://bigml.com/developers/models The call gives you a JSON object representing the tree that is roughly equivalent to a PMML-style representation.
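
To make the local scoring idea concrete, here is a rough sketch in R of walking a tree that has already been parsed into a nested list. The node layout (field, threshold, left, right, prediction) and the split values are my own hypothetical stand-ins, not BigML’s actual JSON schema, so check the developer documentation for the real field names before adapting it.

```r
# Hypothetical tree layout (NOT BigML's actual JSON schema): an internal
# node has a field, a threshold and left/right children; a leaf has only
# a prediction. The split values are illustrative.
tree <- list(
  field = "Petal.Length", threshold = 2.45,
  left  = list(prediction = "setosa"),
  right = list(
    field = "Petal.Width", threshold = 1.75,
    left  = list(prediction = "versicolor"),
    right = list(prediction = "virginica")
  )
)

# Walk the tree for one observation (a one-row data frame or named list).
score_one <- function(node, row) {
  while (is.null(node$prediction)) {
    go_left <- row[[node$field]] <= node$threshold
    node <- if (go_left) node$left else node$right
  }
  node$prediction
}

# Score every row of a data frame locally, without calling any API.
score_all <- function(tree, data) {
  vapply(seq_len(nrow(data)),
         function(i) score_one(tree, data[i, ]),
         character(1))
}

predictions <- score_all(tree, iris)
table(predicted = predictions, actual = iris$Species)
```

Once the model is on your machine as plain data, scoring is just a loop over rows, which is what makes exporting the tree useful for batch work.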

About-

You can read about Charlie Parker at http://www.linkedin.com/pub/charles-parker/11/85b/4b5 and the rest of the BigML team at https://bigml.com/team

 

Avengers Review

Avengers is the big-ticket blockbuster that heralds summer just like the groundhog denotes spring. With an ensemble cast (of superheroes and okay actors), it stars Hulk (angry green man aka Dr Bruce Banner / Mark Ruffalo), Iron Man (genius billionaire philanthropist playboy aka Tony Stark / Robert Downey Jr), Thor (an Australian-looking Chris Hemsworth), Loki (God of Mischief, played by German-looking Tom Hiddleston), Captain America, and Scarlett Johansson, Jeremy “Hurt Locker” Renner and Samuel L Jackson. You know something’s gotta give if the list of A-list stars(?) in the cast is going to be longer than a plot summary.

Well, Loki the bad guy strikes a deal with some other bad guys from a funnily named world called Assguard (parallel universe!) and tries to find a cube (which is all-powerful energy, like the cube in the first Transformers), and in return gets an army from the dark side (who look just like Cybertrons and Lord of the Rings orcs combined).

The Avengers, after much dilly-dallying, trying to emote, creating bromances and building up tension, in the end decide to give you what you came looking for: a visual feast of credible-looking CGI to counter the bad guys. The scene stealer is the Hulk. He is kind of cute for a big green guy. If you don’t know what I mean, see the movie!

This is American cinema at its most profoundly intellectual since the Die Hard series, and it’s quite entertaining, especially if you are a geeky comic book fan-boy (like me).

Summer is here and so are the super-heroes!! Unleash the popcorn.