R Modeling with huge data

Here is a training course by the BI vendor Netezza that uses R's analytical capabilities. It runs R on Netezza's customized appliances.

Source-

http://www.netezza.com/userconference/pce.html#rmftfic

R Modeling for TwinFin i-Class

Objective
Learn how to use TwinFin i-Class for scaling up the R language.

Description
In this class, you’ll learn how to use R to create models using huge data and how to create R algorithms that exploit our asymmetric massively parallel (AMPP®) architecture. Netezza has seamlessly integrated with R to offload the heavy lifting of the computational processing on TwinFin i-Class. This results in higher performance and increased scalability for R. Sign up for this class to learn how to take advantage of TwinFin i-Class for your R modeling. Topics include:

  1. R CRAN package installation on TwinFin i-Class
  2. Creating models using R on TwinFin i-Class
  3. Creating R algorithms for TwinFin i-Class

Format
Hands-on classroom lecture, lab exercises, tour

Audience
Knowledgeable R users – modelers, analytic developers, data miners

Course Length
0.5 day: 12pm-4pm Wednesday, June 23 OR 8am-12pm Thursday, June 24 OR 1pm-5pm Thursday, June 24, 2010

Delivery
Enzee Universe 2010, Boston, MA

Student Prerequisites

  • Working knowledge of R and parallel computing
  • Have analytic, compute-intensive challenges
  • Understanding of data mining and analytics

How to read blogs in Indonesian and Chinese!

I just discovered the magic of Google Chrome's Translate tool- it is a one-click operation. So if you want to read blogs in any other language, install Google Chrome and tweak the settings accordingly- see the top of the screenshot below (from the excellent Indonesian R blog http://enciety.com/community/R/, also available on Twitter at @rcommunity).

Or, if you prefer your old browser, you can go to http://translate.google.com/ and copy and paste. The good thing about Chrome is that even if you don't have admin rights on the machine, it still installs just fine- and it works faster!

Also see http://mp3.baidu.com/

Graphs

Some graphs from the Official Graphs Gallery at sas.com

http://support.sas.com/sassamples/graphgallery/PROC_G3D_Graph_Types_Plots_Scatter.html

And here is the same kind of graph from R's Graph Gallery-

http://addictedtor.free.fr/graphiques/RGraphGallery.php?graph=10

Which one do you like? Sometimes graphics are about imagination and not just software.

Software- Apps and Bugs

Some time ago I wrote about a Twitter application bubble (actually it was a year ago, here at https://decisionstats.wordpress.com/2009/04/05/tweets-viruses-and-bubbles/).

The automatic Twitter follow/unfollow (or at least the automated unfollow) was used by the Twitter app Refollow.com, which is quite old, so it was a surprise when Twitter blamed the recent "0 followers, 0 following" incident on a bug that allows automated following. The automated RSS reader is used by Twitterfeed.com (among others). I accidentally created/revealed a bug in 2009 with the hashtag #rstats (which is used as a search index in Twitter's search engine) when I basically married a lot of RSS feeds pertaining to R, added #rstats to them, and sent them to an alternative Twitter handle (Rarchive). I did the same with #sas for Sascommunity (a handle I later donated, on request, back to that community, sascommunity.org). This had the temporary effect of skewing search results for these search terms for a day, until Twitter fixed it.
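For readers wondering what "marrying RSS feeds with a hashtag" looks like mechanically, here is a rough sketch in Python. It is not how Twitterfeed.com works internally (I don't know that); the sample feed, the function name `feed_to_tweets` and the 140-character limit handling are all my own assumptions for illustration. It only builds the tweet text and posts nothing.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample feed; a real setup would fetch live feeds over HTTP.
SAMPLE_RSS = """<rss version="2.0"><channel>
<title>Example R blog</title>
<item><title>Plotting with lattice</title><link>http://example.com/lattice</link></item>
<item><title>Linear models in R</title><link>http://example.com/lm</link></item>
</channel></rss>"""

TWEET_LIMIT = 140  # assumed character limit per tweet


def feed_to_tweets(rss_text, hashtag):
    """Turn each RSS item into tweet text ending with the given hashtag."""
    root = ET.fromstring(rss_text)
    tweets = []
    for item in root.iter("item"):
        title = item.findtext("title", default="").strip()
        link = item.findtext("link", default="").strip()
        suffix = f" {link} {hashtag}"
        # Trim the title so that title + link + hashtag fits the limit.
        room = max(TWEET_LIMIT - len(suffix), 0)
        tweets.append(title[:room] + suffix)
    return tweets


tweets = feed_to_tweets(SAMPLE_RSS, "#rstats")
for tweet in tweets:
    print(tweet)
```

Run this over many feeds at once and every new post carries the same hashtag, which is how a single account can end up dominating the search results for that term.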

As Twitter evolves from a well-funded startup to a business, and tries to become more structured after its chaotic flux, such bugs will continue to appear. Bugs, and especially software bugs, are meant to be fixed (or squashed). This should by no means be a reflection on the health of the software service (here, Twitter). Indeed, the bigger worry is mainstream software that has no flexibility for creative third-party applications and thinks that it is bug-free. Perfect software exists in a perfect world, and delusional perfection can be dangerous thinking, especially for software with clients (even more so for statistical software).

Which stats software are you using, and how confident are you that the bugs are being resolved openly?

The R Online WikiBook

I came across the R Programming Wikibook at http://en.wikibooks.org/wiki/R_Programming

It is surprisingly good- easy to read for a beginner, and a handy and concise reference for intermediate users. Some chapters, like the one on clustering, could do with more support from the community- see http://en.wikibooks.org/wiki/R_Programming/Clustering

But I really liked the pages on Graphics, Modeling and Maths (including Matrix)

See

http://en.wikibooks.org/wiki/R_Programming/Graphics

and http://en.wikibooks.org/wiki/R_Programming/Linear_Models

I really believe that consolidated, one-book online documentation can be achieved for R only if we follow a moderated, wiki-like structure. This could be of great use, since the online help documents for R are currently neither concise nor professional-looking (owing to the multiple formats and styles of documentation), and they rarely compare multiple packages. All this has made R books the top-selling statistics books on Amazon, but a project like R deserves at least one comprehensive yet concise online book that can be used readily without going through all the scattered documentation- a bit like an R Online Doc. This could help the next stage of the project by getting more users comfortable with it.

Any volunteers 🙂 ?

The Top Statistical Softwares (GUI)

The list of top Statistical Softwares (GUI) is continued below. You can see the earlier post here

6. R Commander- While initially aimed at being a basic statistics GUI, the tremendous popularity of R Commander and its extensions in the form of plugins have helped make it one of the most widely used GUIs. In short, if you don't know ANY R and still want to do basic descriptive stats and modeling, this will come in handy. It has an added script window for custom code for advanced users, and extensions such as those for the DoE (design of experiments) and QCC (quality control) packages show that the e-plugins are a great way to extend it. I suspect the only thing holding it back is the reluctance of Dr Fox and the rest of R Core to fully embrace the GUI as a software medium. You can read his earlier interview here- https://decisionstats.wordpress.com/2009/09/14/interview-professor-john-fox-creator-r-commander/

Technically it is possible to convert just about any package to a GUI menu in R Commander using the e-plugins.

7. SAS GUIs

Enterprise (Guide)

SAS Enterprise Guide was the higher-end (and higher-priced) answer to the Enhanced Editor's lack of menu-driven commands. It works, but many people I know are just as happy with the text editor.


Enterprise Miner is separate software and works more like Red R or SPSS Modeler does. Again, EM is one of the major DM packages out there, but the similarity in names is a bit confusing.

Even the Base SAS Enhanced Editor has some menus for importing data, querying and so on, but it is rarely mistaken for a GUI.

8. Oracle Data Miner and Knime

I like both ODM and Knime, but I find the lack of advertising or promotional support puzzling. Both packages would do well to combine technical excellence with some marketing. And since they are both free, you can check them out yourself here

Oracle Data Mining

You can download it here-(note- the Oracle Web Site itself is a bit aging 🙂 )

http://www.oracle.com/technology/products/bi/odm/odminer.html

Knime is the open source GUI which can be found here-

http://www.knime.org/introduction/features

9. RKWard

Another R GUI- it stands out for the comprehensive ways in which you can customize your code through menus, rather than writing it all out or learning the syntax by rote.

From http://sourceforge.net/apps/mediawiki/rkward/index.php?title=Main_Page

you can see it below. I recommend this GUI over other GUIs especially if you are new to R and do more data visualization which needs custom graphics.

10. Red R and JGR/Deducer

Red R and JGR/Deducer are both up-and-coming GUIs for R. While Red R is the R counterpart to Enterprise Miner, Deducer is coming up with a new GUI for ggplot, the powerful graphics package in R.

Some GUIs excluded from this list are Statistica, MATLAB and EViews(?), because I don't really work with them and thought it best to turn them over to someone who knows them better.

Hope this list of GUIs helps you- note that most of this software can be learnt within an hour or two if you have basic software and data manipulation skills, so going through the GUI list is also a fast way of adding value to your resume/knowledge base.


Learning SAS for free

A big, longstanding demand for the SAS Institute to enable better access to its OnDemand program for academics was fulfilled when SAS announced free access to it- GLOBALLY.

This is really nice, as it helps SAS get a huge pool of potential developers and programmers, and it helps students learn a valuable skill. In today's world, having SAS as a language on your resume is probably the fastest, surest way to get a job.

Also, R will have to work harder to retain academics and students/future users. The "our software is free" argument won't cut it any more.

SAS OnDemand for Academics is an online service for teaching and learning data management and analytics. Users register and access SAS software via the Web and perform processing by connecting to a hosted server at SAS. Through SAS OnDemand for Academics, users have access to multiple SAS applications such as SAS® Enterprise Guide® (which includes access to Base SAS) and SAS® Enterprise Miner™ (which includes SAS Text Miner). Additional SAS software applications will be added over time.

and

SAS is removing a potential barrier to students seeking experience using advanced data analysis to solve classroom and real-world problems. SAS OnDemand for Academics, already used at no cost by professors at some 200 colleges and universities, will be available at no cost to all students worldwide in fall 2010. SAS OnDemand for Academics quickly and easily delivers the power of SAS software to higher education.

Source-http://www.sas.com/news/preleases/ondemandforacademics-nocostSGF10.html