Interviews and Reviews: More R #rstats

I was interviewed about moving on from Excel to R in Human Resources (HR), here at http://www.hrtecheurope.com/blog/?p=5345

“There is a lot of data out there and it’s stored in different formats. Spreadsheets have their uses but they’re limited in what they can do. The spreadsheet is bad when getting over 5000 or 10000 rows – it slows down. It’s just not designed for that. It was designed for much higher levels of interaction.

In the business world we really don’t need to know every row of data, we need to summarise it, we need to visualise it and put it into a powerpoint to show to colleagues or clients.”

And a more recent interview with my fellow IIML mate, and editor at Analytics India Magazine

http://analyticsindiamag.com/interview-ajay-ohri-author-r-for-business-analytics/

AIM: Which R packages do you use the most and which ones are your favorites?

AO: I use R Commander and Rattle a lot, and I use the dependent packages. I use car for regression, and forecast for time series, and many packages for specific graphs. I have not mastered ggplot though but I do use it sometimes. Overall I am waiting for Hadley Wickham to come up with an updated book to his ecosystem of packages as they are very formidable, completely comprehensive and easy to use in my opinion, so much I can get by the occasional copy and paste code.

 

A surprising review at R-bloggers.com / Intelligent Trading

http://intelligenttradingtech.blogspot.in/2012/10/book-review-r-for-business-analytics.html

The good news is that many of the large companies do not view R as a threat, but as a beneficial tool to assist their own software capabilities.

After assisting and helping R users navigate through the dense forest of various GUI interface choices (in order to get R up and running), Mr. Ohri continues to handhold users through step by step approaches (with detailed screen captures) to run R from various simple to more advanced platforms (e.g. CLOUD, EC2) in order to gather, explore, and process data, with detailed illustrations on how to use R’s powerful graphing capabilities on the back-end.

Do you want to write a review too? You can visit the site here

http://www.springer.com/statistics/book/978-1-4614-4342-1

 

Obama’s chief data scientist to keynote PAW SF

From Predictive Analytics Conference,

http://www.predictiveanalyticsworld.com/sanfrancisco/2013/



Predictive Analytics World April 14-19 in San Francisco is packed with the top predictive analytics experts, practitioners, authors and business thought leaders, including keynote speakers:

  • Rayid Ghani, Chief Data Scientist, Obama for America 2012 Campaign
  • Anthony Goldbloom, CEO, Kaggle: “The $3m Heritage Health Prize: Results and Conclusions”
  • Edward Nazarko, Client Technical Advisor, IBM: “Putting IBM Watson to Work”

R Studio and Training

I really like the design, the course structure and Hadley Wickham (in no particular order) in RStudio’s training suite, which may be new but is much better and more open. Again, I think Oracle’s training is awesome for its online features, but somebody needs to step up and create a credible R certification here. More power to R 😉

Check it out-

http://www.rstudio.com/training/

Revolution Analytics and Pricing Analytics

Cost of 1 day of Revolution Analytics Training at http://www.revolutionanalytics.com/services/training/

 

1. Intro to R

Price:  Commercial: SGD$500.00
Academic:SGD$350.00

1 Singapore dollar = 0.8197 US dollars

10% Early Bird Discount Deadline: November 13, 2012 @ 12:00PM Pacific Time
Discount code: earlybird

2. Minimalistic Sufficient R (aptly titled; you would think the pricing would be minimalistic too, but...)

http://www.revolutionanalytics.com/services/training/public/minimalist-sufficient-r.php

Price: $750

$100 Early Bird Discount Deadline: November 16, 2012 @ 12:00PM Pacific Time
Discount code: earlybird

3. Advanced R (Italian)

Price:  Commercial: €680.00
Academic: €480.00

1 euro = 1.2975 US dollars

4. Big Data Analytics with RevoScaleR

Price:  $500 with 2 month Revolution R Enterprise workstation evaluation.

$700 with 1 year subscription of Revolution R enterprise workstation ($1500 value)

10% Early Bird Discount Deadline: October 30, 2012 @ 12:00PM Pacific Time
Discount code: early

5. Revolution R Time Series Training

Price:  Commercial: S$1,200.00
Academic:S$750.00

10% Early Bird Discount Deadline: October 30, 2012 @ 12:00PM Pacific Time
Discount code: earlybird

So training is priced differently from course to course: different strokes for different folks, I guess,

BUT me hearties.

Cost of 1 year of Revolution R Enterprise = $1,000

That’s a flat rate, so Linux and Windows cost the same, and so do 32-bit and 64-bit

(see http://buy.revolutionanalytics.com/ )

(My comment: Revo should either give away the license for free to enterprises, or rationalize training costs (seriously, how can 2 days of training cost about as much as a 1-year license, when the software is definitely quite good?), or create a paid Amazon EC2 AMI so that enterprises can rent the Revolution Analytics software (like SAP HANA), or even offer it on Windows Azure if they insist on hugging Microsoft. That said, I am clearly seeing various flavors of Linux beating Windows Server to a pulp in the Big Data market. I am probably more optimistic about Windows 8 on Surface, but because of the hardware and not the software, and about Azure as an alternative to Amazon given Google’s delayed offering. I don’t even know of many instances of Windows-related HPC or HPA. /end_of_rant)

Annual Subscription (includes software license and technical support)

  • Revolution R Enterprise Single-User Workstation (64-bit Windows): $1,000.00
  • Revolution R Enterprise Single-User Workstation (32-bit Windows): $1,000.00
  • Revolution R Enterprise Single-User Workstation (64-bit Red Hat 6 Enterprise Linux): $1,000.00
  • Revolution R Enterprise Single-User Workstation (64-bit Red Hat 5 Enterprise Linux): $1,000.00
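To put the training prices and the license price on one scale, here is a rough Python sketch of the arithmetic. It uses only the commercial list prices and the exchange rates quoted above. It assumes, going by the "1 day" heading, that each listed price is for a single day of training, and it ignores the early bird discounts. The function and variable names are mine, purely for illustration.

```python
# Exchange rates as quoted in the post.
USD_PER_SGD = 0.8197
USD_PER_EUR = 1.2975

# One-year single-user workstation subscription, in US dollars.
LICENSE_USD = 1000.00

# Commercial list prices from the post: (course, amount, USD per unit of currency).
# Assumption: each price covers one day of training, per the heading above.
COURSES = [
    ("Intro to R", 500.00, USD_PER_SGD),
    ("Minimalistic Sufficient R", 750.00, 1.0),
    ("Advanced R (Italian)", 680.00, USD_PER_EUR),
    ("Big Data Analytics with RevoScaleR", 500.00, 1.0),
    ("Revolution R Time Series Training", 1200.00, USD_PER_SGD),
]


def price_in_usd(amount, usd_per_unit):
    """Convert a list price to US dollars, rounded to cents."""
    return round(amount * usd_per_unit, 2)


def share_of_license(amount, usd_per_unit, days=1):
    """Cost of `days` of training as a fraction of the one-year license price."""
    return days * price_in_usd(amount, usd_per_unit) / LICENSE_USD


for name, amount, rate in COURSES:
    usd = price_in_usd(amount, rate)
    two_days = share_of_license(amount, rate, days=2)
    print(f"{name}: ${usd:,.2f} per day; 2 days = {two_days:.0%} of a 1-year license")
```

Under these assumptions, two days of even the cheapest course come out in the same range as a full year of the software.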

 

Running R through environments in PiCloud

PiCloud had an interesting announcement: they already support non-Python things through custom environments, but R now comes pre-built in a new Base Environment.

http://blog.picloud.com/2012/10/24/new-base-environment-ubuntu-precise/

Enter Ubuntu Precise 12.04

Our latest environment is pre-configured with many of the latest libraries, making it easier than ever to move your computation to the cloud. Here are some of the notable packages:

  • NumPy 1.6.2
  • SciPy 0.11
  • Pandas 0.9.0
  • Scikits Learn 0.8.1
  • OpenCV 2.4.2
  • Java 7
  • R 2.14.1
  • Ruby 1.9.1
  • PHP 5.3.10

To use Precise, specify the environment of a job as ‘base/precise’. In Python:

cloud.call(f, _env='base/precise')
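Since the job submitted is still a Python function, one way to reach the pre-built R would be to shell out to Rscript from inside that function. The sketch below is only a guess at how that could look, not something from the PiCloud announcement: rscript_command and run_r are hypothetical helper names, and it assumes Rscript is on the PATH inside the environment. The PiCloud calls appear only in comments and are untested.

```python
import subprocess


def rscript_command(expression):
    """Build the argument list that evaluates one R expression via Rscript."""
    return ["Rscript", "-e", expression]


def run_r(expression):
    """Evaluate an R expression and return whatever R printed, as text.

    Assumes Rscript is installed and on the PATH where this function runs.
    """
    return subprocess.check_output(
        rscript_command(expression),
        universal_newlines=True,  # decode the output to text
    )


# Hypothetical PiCloud usage, following the call shown above (not run here):
#   import cloud
#   jid = cloud.call(run_r, "summary(cars)", _env="base/precise")
#   print(cloud.result(jid))
```

Keeping the command construction in its own small function makes it easy to check locally without R or a PiCloud account.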

BigML creates a marketplace for Predictive Models

BigML has created a marketplace for selling Datasets and Models. This is a first (?), as the closest thing to a market for Predictive Analytics till now was RapidMiner’s marketplace for extensions (at http://rapidupdate.de:8180/UpdateServer/faces/index.xhtml)

From http://blog.bigml.com/2012/10/25/worlds-first-predictive-marketplace/

SELL YOUR DATA

You can make your Dataset public. Mind you: the Datasets we are talking about are BigML’s fancy histograms. This means that other BigML users can look at your Dataset details and create new models based on this Dataset. But they can not see individual records or columns or use it beyond the statistical summaries of the Dataset. Your Source will remain private, so there is no possibility of anyone accessing the raw data.

SELL YOUR MODEL

Now, once you have created a great model, you can share it with the rest of the world. For free or at any price you set. Predictions are paid for in BigML Prediction Credits. The minimum price is ‘Free’ and the maximum price indicated is 100 credits.

White Box Models

Clicking on the white open lock will open up your model to the rest of the world. Anyone can now buy your model, explore it, use it to make predictions

Black Box Models

If you choose the black box setting (the black open lock icon), other BigML users will NOT be able to view or clone your model, but they will be able to use it to make predictions.

——

DOWNLOAD YOUR MODEL

BigML.com has added downloads to its models. Simply choose the format you want and you can copy/paste the code or text. There is a range of formats that they offer currently: JSON PML, PMML, Python, Ruby, Objective-C, Java, the rules of the decision tree in plain text and a Summary overview of your model. Around the corner are MS Excel downloads and R (of course!).

PUBLICIZE YOUR MODEL

There’s also an ’embed’ function, so now you can embed the little poster of your model in your blog post or website, so it is easy to share it in your own environment.

————————————————————————————————————————–

It is nice to see Models and Data getting the APPY treatment, and hopefully it will encourage other vendors like Google Prediction API etc. to spend further thought and effort on rewarding data mining individuals directly, without going through corporate intermediaries, while ensuring intellectual property safeguards.

An R package market for enterprises? For Python libraries? JMP add-ins? A market for SAS macros? Who knows what the future shall hold. But overall, this is a very positive step by the BigML.com team. The App marketplace has helped revolutionize mobile and desktop computing, and hopefully it will do the same for Business Analytics.

 

httR by Hadley #rstats

The awesome Hadley Wickham has just released the next version of the httr package. Prof Hadley is currently on leave from Rice University and working with the tremendous geeks at RStudio. New things in the httr package:

 

http://blog.rstudio.org/2012/10/14/httr-0-2/

httr, a package designed to make it easy to work with web APIs. Httr is a wrapper around RCurl, and provides:

  • functions for the most important http verbs: GET, HEAD, PATCH, PUT, DELETE and POST.
  • support for OAuth 1.0 and 2.0. Use oauth1.0_token and oauth2.0_token to get user tokens, and sign_oauth1.0 and sign_oauth2.0 to sign requests. The demos directory has six demos of using OAuth: three for 1.0 (linkedin, twitter and vimeo) and three for 2.0 (facebook, github, google).

I especially like the OAuth functionality, as I occasionally got flummoxed by existing R OAuth packages, and this should hopefully lead to awesome new social media analytics posts by the larger R blogger community. OAuth also matters because Twitter allows far more authenticated API requests than unauthenticated ones (see https://dev.twitter.com/docs/rate-limiting ):

  • Unauthenticated calls are permitted 150 requests per hour. Unauthenticated calls are measured against the public facing IP of the server or device making the request.
  • OAuth calls are permitted 350 requests per hour and are measured against the oauth_token used in the request.

Given that, some creative use cases should produce an incredible amount of cross social media analysis (not just one social media channel at a time).
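To get a feel for what the two Twitter limits mean in practice, here is a small back-of-the-envelope sketch in Python. It uses only the 150 and 350 requests-per-hour figures quoted above. The helper name and the 1,000-request workload are made up for illustration, and it simply counts whole hourly windows rather than modelling how Twitter actually resets its limits.

```python
# Hourly limits quoted from Twitter's rate-limiting page above.
UNAUTHENTICATED_PER_HOUR = 150
OAUTH_PER_HOUR = 350


def hourly_windows_needed(total_requests, limit_per_hour):
    """Number of one-hour rate-limit windows needed for total_requests calls."""
    # Integer ceiling division: a partly used window still counts as one window.
    return (total_requests + limit_per_hour - 1) // limit_per_hour


# Hypothetical workload: 1,000 API calls for one analysis.
workload = 1000
print("Unauthenticated windows:", hourly_windows_needed(workload, UNAUTHENTICATED_PER_HOUR))
print("OAuth windows:", hourly_windows_needed(workload, OAUTH_PER_HOUR))
```

The OAuth limit is also counted per token rather than per IP address, so each authenticated user gets their own allowance.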

R for Social Media Analytics ? Watch this space.. 😉