stats – Page 4 – DECISION STATS

For R Writers- Inside R


Hurray, I am on Inside-R:

http://www.inside-r.org/blogs/2010/11/04/r-apache-next-frontier-r-computing

That's blog post number 1 there.

Basically, Inside-R is a go-to site for tips, tricks and packages, as well as blog posts. It thus complements R-Bloggers, but also adds several other features.

It is an excellent place for beginners learning R. It is also moderated (so you won't get the flashy jhing bhang stuff, just your R).

What I really liked is the Pretty R functionality: it is nifty for color coding R code for posting in your blog, journal or article.

And when you are there, drop them a line about their excellent R support for events (like pizza and sponsorship), their nifty R packages (doSNOW, foreach, RevoScaleR, RevoDeployR), and how much open core makes them look silly.

Come on, Revolution: share the open code for the RevoScaleR package. Did you notice any sales dip when you open sourced the other packages? (Cue for David Smith to roll his eyes again.)

Anyway, all that is part of the R family fun 🙂

Do check http://www.inside-r.org/pretty-r


Using PostgreSQL and MySQL databases in R 2.12 for Windows


If you use Windows for your stats computing and your data is in a database (probably true for almost all corporate business analysts), R 2.12 has a procedural hitch for you: there are NO BINARIES for the packages used until now to read from these databases.

The Readme notes of the release say-

Packages related to many database system must be linked to the exact
version of the database system the user has installed, hence it does
not make sense to provide binaries for packages
	RMySQL, ROracle, ROracleUI, RPostgreSQL
although it is possible to install such packages from sources by
	install.packages('packagename', type='source')
after reading the manual 'R Installation and Administration'.
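
Before attempting a source build, it can help to see which of these packages are actually missing from your library. The following is a minimal sketch in base R; the helper name missing_pkgs is made up for illustration, and the commented install line simply repeats the call quoted from the Readme.

```r
# Hypothetical helper: return the subset of `pkgs` that cannot be loaded.
missing_pkgs <- function(pkgs) {
  ok <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!ok]
}

db_pkgs <- c("RMySQL", "ROracle", "RPostgreSQL", "RODBC")
to_build <- missing_pkgs(db_pkgs)
print(to_build)

# Source builds need the database client libraries and build tools described in
# 'R Installation and Administration', so the call is left commented out:
# install.packages(to_build, type = "source")
```

If to_build comes back empty, nothing needs to be compiled and the rest of this post is optional.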

So how do you connect to PostgreSQL and MySQL databases if the Windows binary is not available?

For PostgreSQL databases:

You can download or update PostgreSQL for Windows here:

http://www.postgresql.org/download/windows

Fortunately, the RpgSQL package is still available for PostgreSQL.

Using the RpgSQL package


library(RpgSQL)

#creating a connection
con <- dbConnect(pgSQL(), user = "postgres", password = "XXXX",dbname="postgres")

#writing a table from a R Dataset
dbWriteTable(con, "BOD", BOD)

# table names are lower cased unless double quoted. Here we write a Select SQL query
dbGetQuery(con, 'select * from "BOD"')

#disconnecting the connection
dbDisconnect(con)
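
The comment about double quoting deserves a small illustration: PostgreSQL folds unquoted identifiers to lower case, so a mixed-case or upper-case table name has to be double quoted in the query. The helper below is a hypothetical convenience (it is not part of RpgSQL) that builds such a query string in base R.

```r
# Hypothetical helper: wrap an identifier in double quotes, doubling any
# embedded double quote, so PostgreSQL keeps its case as written.
quote_ident <- function(x) {
  paste0('"', gsub('"', '""', x, fixed = TRUE), '"')
}

sql <- paste("select * from", quote_ident("BOD"))
print(sql)

# With an open connection, the string could then be passed on, for example:
# dbGetQuery(con, sql)
```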

You can also use the RODBC package to connect to your PostgreSQL database, but you first need to configure an ODBC data source in Windows:

Start menu > Settings > Control Panel > Administrative Tools > Data Sources (ODBC)

You should see something like this screenshot.

Coming back to R, note the name of the PostgreSQL DSN from the screenshot above. (If it is not there, click Add, scroll to the appropriate database driver (here PostgreSQL) and click Finish. Then fill in the default values for your database, or the values for a database you created yourself; see the screenshot for help with the other settings. Remember to click Test to check that the username and password work, the port is correct, and so on.)

Once the DSN is properly set up in ODBC (frightening terminology is part of databases), you can go to R and connect using the RODBC package.


#loading RODBC

library(RODBC)

#creating a database connection
# first argument: the DSN name; second argument: the username, with the
# password and database name appended as extra connection attributes

chan <- odbcConnect("PostgreSQL35W", "postgres;Password=X;Database=postgres")

#to list all table names

sqlTables(chan)

  TABLE_QUALIFIER TABLE_OWNER TABLE_NAME TABLE_TYPE REMARKS
1        postgres      public        bod      TABLE
2        postgres      public  database1      TABLE
3        postgres      public         tt      TABLE
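
RODBC also provides odbcDriverConnect(), which takes a full ODBC connection string instead of separate arguments. The sketch below only assembles such a string in base R. The helper name is hypothetical, the Database keyword is driver-specific (an assumption here that your PostgreSQL ODBC driver accepts it), and the actual connection calls are left commented out because they need the DSN configured above.

```r
# Hypothetical helper: build "KEY=value;KEY=value" from named arguments.
odbc_conn_string <- function(...) {
  parts <- list(...)
  paste(names(parts), unlist(parts), sep = "=", collapse = ";")
}

conn_str <- odbc_conn_string(DSN = "PostgreSQL35W", UID = "postgres",
                             PWD = "X", Database = "postgres")
print(conn_str)

# library(RODBC)
# chan <- odbcDriverConnect(conn_str)
# sqlTables(chan)
# odbcClose(chan)
```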

For MySQL databases the code is exactly the same, except that we first download and install the ODBC driver from http://www.mysql.com/downloads/connector/odbc/

and then configure a DSN in the same way as we did for PostgreSQL.

After that we use RODBC in pretty much the same way, changing only the username and password to the MySQL ones and the DSN name to the one created in the previous step.

channel <- odbcConnect("mysql","jasperdb;Password=XXX;Database=Test")
test2=sqlQuery(channel,"select * from jiuser")
test2
  id    username tenantId             fullname emailAddress                    password externallyDefined enabled previousPasswordChangeTime
1  1 jasperadmin        1 Jasper Administrator           NA 349AFAADD5C5A2BD477309618DC                NA      01
2  2     joe1ser        1             Joe User           NA                 4DD8128D07A                NA      01

odbcClose(channel)

While using RODBC for all databases is a welcome step, perhaps the release notes for Windows users of R need to be more substantive than the ones given for R 2.12.


JMP Genomics 5 released


Close on the heels of the launch of JMP 9 with its R integration comes the announcement that JMP Genomics 5 has been released. The product brief is available at http://jmp.com/software/genomics/pdf/103112_jmpg5_prodbrief.pdf and it has an interesting mix of features. If you want to try out the features, see http://jmp.com/software/license.shtml

Here is some of the “new” stuff I spotted in this release:

Perform enrichment analysis using functional information from Ingenuity Pathways Analysis.+
New bar chart track allows summarization of reads or intensities.
New color map track displays heat plots of information for individual subjects.
Use a variety of continuous measures for summarization.
Using a common identifier, compare list membership for up to five groups and display overlaps with Venn diagrams.
Filter or shade segments by mean intensity, with an option to display segment mean intensity and set a reference value for shading.
Adjust intensities or counts for experimental samples using paired or grouped control samples.
Screen paired DNA and RNA intensities for allele-specific expression.
Standardize using a shifting factor and perform log2 transformation after standardization.
Use kernel density information in loess and quantile normalization.
Depict partition tree information graphically for standard models with new Tree Viewer.
Predictive modeling for survival analysis with Harrell’s assessment method and integration with Cross-Validation Model Comparison.

That's right: that last item incorporates the work of our favorite professor from the R Project himself, http://biostat.mc.vanderbilt.edu/wiki/Main/FrankHarrell

Apparently Prof Frank E. Harrell was quite a SAS coder himself (see http://biostat.mc.vanderbilt.edu/wiki/Main/SasMacros)

Back to JMP Genomics 5-

The JMP software platform provides:

• New integration capabilities let R users leverage JMP’s interactive graphics to display analytic results.

• Tools for R programmers to build and package user interfaces that let them share customized R analytics with a broader audience.

• A new add-in infrastructure that simplifies the integration of external analytics into JMP.

+ For people in life sciences who like new stats software you can also download a trial version of IPA here at http://www.ingenuity.com/products/IPA/Free-Trial-Software.html


Revolution R for Linux

New software has just been released by the guys in California (@RevolutionR). If you are a Linux user and have academic credentials, you can download it for free (@Cmastication doesn't) and test it to see what the big fuss is all about (also see http://www.revolutionanalytics.com/why-revolution-r/benchmarks.php):

Revolution Analytics has just released Revolution R Enterprise 4.0.1 for Red Hat Enterprise Linux, a significant step forward in enterprise data analytics. Revolution R Enterprise 4.0.1 is built on R 2.11.1, the latest release of the open-source environment for data analysis and graphics. Also available is the initial release of our deployment server solution, RevoDeployR 1.0, designed to help you deliver R analytics via the Web. And coming soon to Linux: RevoScaleR, a new package for fast and efficient multi-core processing of large data sets.

As a registered user of the Academic version of Revolution R Enterprise for Linux, you can take advantage of these improvements by downloading and installing Revolution R Enterprise 4.0.1 today. You can install Revolution R Enterprise 4.0.1 side-by-side with your existing Revolution R Enterprise installations; there is no need to uninstall previous versions.

Download Information

The following information is all you will need to download and install the Academic Edition.

Supported Platforms:

Revolution R Enterprise Academic edition and RevoDeployR are supported on Red Hat® Enterprise Linux® 5.4 or greater (64-bit processors).

Approximately 300MB free disk space is required for a full install of Revolution R Enterprise. We recommend at least 1GB of RAM to use Revolution R Enterprise.

For the full list of system requirements for RevoDeployR, refer to the RevoDeployR™ Installation Guide for Red Hat® Enterprise Linux®.

Download Links:

You will first need to download the Revolution R Enterprise installer.

Installation Instructions for Revolution R Enterprise Academic Edition

After downloading the installer, do the following to install the software:

Log in as root if you have not already.

Change directory to the directory containing the downloaded installer.

Unpack the installer using the following command:
tar -xzf Revo-Ent-4.0.1-RHEL5-desktop.tar.gz

Change directory to the RevolutionR_4.0.1 directory created.

Run the installer by typing ./install.py and following the on-screen prompts.

Getting Started with the Revolution R Enterprise

After you have installed the software, launch Revolution R Enterprise by typing Revo64 at the shell prompt.

Documentation is available in the form of PDF documents installed as part of the Revolution R Enterprise distribution. Type Revo.home("doc") at the R prompt to locate the directory containing the manuals Getting Started with Revolution R (RevoMan.pdf) and the ParallelR User’s Guide (parRman.pdf).

Installation Instructions for RevoDeployR (and RServe)

After downloading the RevoDeployR distribution, use the following steps to install the software:

Note: These instructions are for an automatic install. For more details or for manual install instructions, refer to RevoDeployR_Installation_Instructions_for_RedHat.pdf.

Log into the operating system as root.
su -

Change directory to the directory containing the downloaded distribution for RevoDeployR and RServe.

Unzip the contents of the RevoDeployR tar file. At prompt, type:
tar -xzf deployrRedHat.tar.gz

Change directories. At the prompt, type:
cd installFiles

Launch the automated installation script and follow the on-screen prompts. At the prompt, type:
./installRedHat.sh
Note: Red Hat installs MySQL without a password.

Getting Started with RevoDeployR

After installing RevoDeployR, you will be directed to the RevoDeployR landing page. The landing page has links to documentation, the RevoDeployR management console, the API Explorer development tool, and sample code.

Support

For help installing this Academic Edition, please email support@revolutionanalytics.com

Also interesting: some benchmarks of Revolution R versus base R.

http://www.revolutionanalytics.com/why-revolution-r/benchmarks.php

R-25 Benchmarks

The simple R-benchmark-25.R test script is a quick-running survey of general R performance. The Community-developed test consists of three sets of small benchmarks, referred to in the script as Matrix Calculation, Matrix Functions, and Program Control.

R-25 Benchmarks	Base R 2.9.2	Revolution R (1-core)	Revolution R (4-core)	Speedup (4 core)
Matrix Calculation	34 sec	6.6 sec	4.4 sec	7.7x
Matrix Functions	20 sec	4.4 sec	2.1 sec	9.5x
Program Control	4.7 sec	4 sec	4.2 sec	Not Appreciable

Speedup = Slower time / Faster Time – 1

Test descriptions available at http://r.research.att.com/benchmarks

Additional Benchmarks

Revolution Analytics has created its own tests to simulate common real-world computations. Their descriptions are explained below.

Linear Algebra Computation	Base R 2.9.2	Revolution R (1-core)	Revolution R (4-core)	Speedup (4 core)
Matrix Multiply	243 sec	22 sec	5.9 sec	41x
Cholesky Factorization	23 sec	3.8 sec	1.1 sec	21x
Singular Value Decomposition	62 sec	13 sec	4.9 sec	12.6x
Principal Components Analysis	237 sec	41 sec	15.6 sec	15.2x
Linear Discriminant Analysis	142 sec	49 sec	32.0 sec	4.4x

Speedup = Slower time / Faster Time – 1
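
To make the speedup arithmetic concrete, here is a small sketch that recomputes it from the elapsed times in the Linear Algebra table (base R versus Revolution R on 4 cores). The data frame and column names are my own. As an observation only: the published Speedup column appears to track the plain ratio slower/faster more closely than the ratio minus one, so both are computed.

```r
# Elapsed times (seconds) copied from the Linear Algebra table above.
bench <- data.frame(
  test   = c("Matrix Multiply", "Cholesky Factorization",
             "Singular Value Decomposition", "Principal Components Analysis",
             "Linear Discriminant Analysis"),
  base_r = c(243, 23, 62, 237, 142),
  revo_4 = c(5.9, 1.1, 4.9, 15.6, 32.0)
)

bench$ratio         <- bench$base_r / bench$revo_4  # slower time / faster time
bench$ratio_minus_1 <- bench$ratio - 1              # the formula as printed
print(bench, digits = 3)
```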

Matrix Multiply

This routine creates a random uniform 10,000 x 5,000 matrix A, and then times the computation of the matrix product transpose(A) * A.

set.seed (1)
m <- 10000
n <- 5000
A <- matrix (runif (m*n),m,n)
system.time (B <- crossprod(A))

The system will respond with a message in this format:

User system elapsed
37.22 0.40 9.68

The “elapsed” times indicate total wall-clock time to run the timed code.
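
The full-size benchmark needs a fair amount of memory and time, so here is a scaled-down sketch of the same routine (the smaller dimensions are my choice, not Revolution's). It shows how to pull the elapsed figure out of system.time() and checks that crossprod(A) really is transpose(A) * A.

```r
set.seed(1)
m <- 200   # scaled down from 10000 so the sketch runs almost instantly
n <- 100   # scaled down from 5000
A <- matrix(runif(m * n), m, n)

timing  <- system.time(B <- crossprod(A))
elapsed <- timing[["elapsed"]]   # wall-clock seconds, the figure the tables report
print(elapsed)

# crossprod(A) computes t(A) %*% A without forming the transpose explicitly
stopifnot(isTRUE(all.equal(B, t(A) %*% A)))
```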

The table above reflects the elapsed time for this and the other benchmark tests. The test system was an INTEL® Xeon® 8-core CPU (model X55600) at 2.5 GHz with 18 GB system RAM running Windows Server 2008 operating system. For the Revolution R benchmarks, the computations were limited to 1 core and 4 cores by calling setMKLthreads(1) and setMKLthreads(4) respectively. Note that Revolution R performs very well even in single-threaded tests: this is a result of the optimized algorithms in the Intel MKL library linked to Revolution R. The slightly greater than linear speedup may be due to the greater total cache available to all CPU cores, or simply better OS CPU scheduling–no attempt was made to pin execution threads to physical cores. Consult Revolution R’s documentation to learn how to run benchmarks that use fewer cores than your hardware offers.

Cholesky Factorization

The Cholesky matrix factorization may be used to compute the solution of linear systems of equations with a symmetric positive definite coefficient matrix, to compute correlated sets of pseudo-random numbers, and other tasks. We re-use the matrix B computed in the example above:

system.time (C <- chol(B))
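
As a small illustration of the first use mentioned above (solving a linear system with a symmetric positive definite matrix), here is a sketch on a scaled-down matrix. The sizes and the right-hand side b are made up for the example, and it assumes the random matrix A has full column rank so that B is positive definite.

```r
set.seed(1)
A <- matrix(runif(200 * 50), 200, 50)
B <- crossprod(A)            # symmetric; positive definite if A has full column rank
b <- runif(50)               # hypothetical right-hand side

R <- chol(B)                 # upper triangular factor, t(R) %*% R equals B
y <- forwardsolve(t(R), b)   # solve t(R) y = b
x <- backsolve(R, y)         # solve R x = y, which gives B x = b

print(max(abs(B %*% x - b))) # residual; should be close to zero
```

Two triangular solves replace a general solve(B, b), which is the usual reason for factorizing first.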

Singular Value Decomposition with Applications

The Singular Value Decomposition (SVD) is a numerically stable and very useful matrix decomposition. The SVD is often used to compute Principal Components and Linear Discriminant Analysis.

# Singular Value Decomposition
m <- 10000
n <- 2000
A <- matrix (runif (m*n),m,n)
system.time (S <- svd (A,nu=0,nv=0))

# Principal Components Analysis
m <- 10000
n <- 2000
A <- matrix (runif (m*n),m,n)
system.time (P <- prcomp(A))

# Linear Discriminant Analysis
require("MASS")
g <- 5
k <- round (m/2)
A <- data.frame (A, fac=sample (LETTERS[1:g],m,replace=TRUE))
train <- sample(1:m, k)
system.time (L <- lda(fac ~., data=A, prior=rep(1,g)/g, subset=train))
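
The link between the SVD and Principal Components mentioned above can be checked directly on a small matrix. This is a sketch with made-up, scaled-down dimensions: the standard deviations reported by prcomp() should match the singular values of the column-centered matrix divided by sqrt(m - 1).

```r
set.seed(1)
m <- 500    # scaled down from the benchmark's 10000 x 2000
n <- 20
A <- matrix(runif(m * n), m, n)

P  <- prcomp(A)                                # centers the columns by default
Ac <- scale(A, center = TRUE, scale = FALSE)   # the same centering, done by hand
S  <- svd(Ac, nu = 0, nv = 0)                  # singular values only, as in the benchmark

# Standard deviations of the principal components, from the singular values
sdev_from_svd <- S$d / sqrt(m - 1)
print(all.equal(P$sdev, sdev_from_svd))
```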


Unbreakable Oracle Linux and Unshakable LibreOffice


Oracle announced Unbreakable Oracle Linux (the first time I have seen the word Unbreakable used in a formal software name). Hats off to good ol’ Larry's chutzpah. It is also quite a fast form of Linux for enterprises, as the stats say at http://www.oracle.com/us/technologies/linux/ubreakable-enterprise-kernel-linux-173350.html

LibreOffice is a new fork of OpenOffice, created by people who want to ensure OpenOffice remains free. It consists of efforts from everybody except Apple, Microsoft and Oracle (http://www.documentfoundation.org/supporters/), and it is a new kind of workable office productivity suite, determined to remain free. I have used it: it is a bit shaky, but I really liked the new design and will willingly test it (and auto-submit bugs). It would be interesting to see the reaction of enterprise vendors like SAS, IBM, Dell, HP and Lenovo, as their support would be critical to both Unbreakable Oracle Linux and Unshakable LibreOffice.

See more here: http://www.documentfoundation.org/download/


Better Data Visualization in WordPress.com Stats

WordPress.com Stats is the analytics software that helps bloggers on WP.com-hosted blogs. It recently underwent a design revamp.

Note how a simple change from line to histogram charts, plus added tabs, can add so much value to data.

However, WP.com really needs to add geo-coded stats (where visitors come from) and some level of campaign tracking (similar to Goals in Google Analytics).

Earlier WP Stats

Now WP Stats

Running R on Amazon EC2

On my second day of blundering about high technology, I came across http://rgrossman.com/2009/05/17/running-r-on-amazons-ec2/ which describes how to run R on Amazon EC2.

I tried it out and have added some screenshots to this tutorial to help you run R. My intention, of course, was to run an R GUI, preferably Revolution Enterprise, on Amazon EC2 and crunch, uhm, a lot of data.

Now go through the steps as follows-

0) Logging onto Amazon Console

http://aws.amazon.com/ec2/

Note that you need your Amazon ID (the same ID you use for buying books will do).

The upper tab shows whether you are in Amazon EC2. Click the upper tab to get into Amazon EC2.

2) Choosing the right AMI-

On the left margin, click Images > AMI.

Now you can search for the image.

I chose Ubuntu images (Linux images are cheaper) and typed opendata in the search box as shown below. I get two images.

You can choose whether you want a 32-bit or a 64-bit image. Rule of thumb: 64-bit images are preferable for data-intensive tasks.

Click on Launch Instance in the upper tab (near the search feature).

2) A pop up comes up, which shows the 5 step process to launch your computing.

Choose the right compute instance- As the screenshot shows- there are various compute instances and they all are at different multiples of prices or compute units.

After choosing the compute instance of your choice (extra large is highlighted)- click on continue-

3) Instance Details-

I did not choose CloudWatch monitoring as it has an extra charge, and I am just trying things out. So I simply clicked continue.

4) Add Tag Details- If you are running a lot of instances you need to create your own tags to help you manage them. Advisable if you are running many instances.

Since I am going to run just one- I clicked continue with adding just two things OS and Stats Package.

5) Create a key pair- A key pair is what you use to authenticate when you log in to the instance over SSH. Click on create new pair and name it (note the name; it will be handy in the coming steps).

After creating and downloading the key pair, you come to security groups. A security group is just a set of firewall rules that controls which connections can reach your instance. So I created a new security group.

And I added some ways to connect in the security group (like SSH using port 22).

7) Last step- Review Details and Click Launch

8) On the Left margin click on instances ( you were in Images.>AMI earlier)

It will take some 3-5 minutes to launch an instance. You can see status as pending till then.

9) Pending instance as shown by yellow light-

10) Once the instance is running -it is shown by a green light.

Click on the check box, and on upper tab go to instance actions. Click on connect-

you see a popup with instructions like these-

Open the SSH client of your choice (e.g., PuTTY, terminal).

Locate your private key file, decisionstats2.pem

Use chmod to make sure your key file isn’t publicly viewable, ssh won’t work otherwise:
chmod 400 decisionstats2.pem

Connect to your instance using instance’s public DNS [ec2-75-101-182-203.compute-1.amazonaws.com].

Example

Enter the following command line:

ssh -i decisionstats2.pem root@ec2-75-101-182-203.compute-1.amazonaws.com

IMPORTANT-

If you are connecting to an Ubuntu image, you need to change the word root in the command above to ubuntu.

12) To launch R, just type R at the terminal

If all goes well you should be able to see this-

From here you can install any custom packages, for example:

install.packages("doSNOW")

and then work in R from the command line.

13) IMPORTANT- After doing your R work, please CLOSE your instance:

Go to the left margin > Instances, check the check box of the instance you are running, and on the upper tab choose Instance Actions > Terminate.


Note there are other Amazon Machine Images as well which have R- I found this as well-

Amazon EC2 Ubuntu 8.10 intrepid AMI built by Eric Hammond; Eduardo Leoni added R, many R packages, JAGS, mysql-client and subversion.

Submitted By: Eduardo L Leoni

US East AMI ID: ami-1b9b7c72

AMI Manifest: PolMethImages/imageR64.manifest.xml

License: Public

Operating System: Linux/Uni

2) You can install Revolution R on 32 bit Ubuntu using sudo apt install revolution-r

Various versions of Revolution R are supported on different versions of Linux

see http://www.revolutionanalytics.com/why-revolution-r/which-r-is-right-for-me.php

I may be willing to try 64-bit Red Hat Enterprise Linux on EC2 AND Revolution Enterprise to see the maximum juice I can get, or you can try it using the image below. It would be interesting if there were an Amazon Machine Image for that (paid-public and private-academic).

AMI ID: ami-8ba347e2
Name:–
Description:–
Source:redhat-cloud/RHEL-5-Server/5.2/x86_64/Beta-2.6.18-92.1.1/RHEL5.2-Server-x86_64-Beta-2.6.18-92.1.1.manifest.xml
Owner:432018295444	Visibility:Public	Product Code:54DBF944
State:available	Kernel ID:aki-89a347e0	RAM Disk ID:ari-88a347e1
Image Type:machine	Architecture:x86_64	Platform:Red Hat
Root Device Type:instance-store	Root Device:–	Image Size:0 bytes
Block Devices:N/A – Instance Store
Virtualization:paravirtual

3) My ultimate goal is to run a parallel session using all cores on an EC2 instance and a R GUI (like R Commander or Rattle)

4) For the sake of running a test, I redid the parallel test I did on my 2-core laptop, but this time using 68.4 GB memory and 8 cores (26 Compute Units, brrr).

> check <- function(n) {
+   for (i in 1:1000) {
+     sme <- matrix(rnorm(100), 10, 10)
+     solve(sme)
+   }
+ }
> times <- 100
> system.time(for (j in 1:times) x <- check(j))
   user  system elapsed
  20.51    0.00   20.66

Note that this serial run was still faster than the parallel run on my laptop (2 cores, 3 GB memory), BUT the gain was smaller than it would have been had I used foreach to spread the work across all 8 cores in parallel, or used the multicore package as in http://www.cerebralmastication.com/2010/02/using-the-r-multicore-package-in-linux-with-wild-and-passionate-abandon/
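
For reference, here is a rough sketch of what spreading this test across cores could look like. It is not the code I ran on EC2: it uses mclapply() from the parallel package (bundled with R from 2.14.0 onward, where it took over the multicore functionality), the workload is scaled down, and the core count of 2 is an arbitrary assumption. With tasks this small, the overhead of forking can easily outweigh the gain.

```r
library(parallel)  # bundled with R 2.14.0 and later

# Same kind of work as check() above, scaled down; returns the last inverse
check <- function(n) {
  inv <- NULL
  for (i in 1:100) {
    sme <- matrix(rnorm(100), 10, 10)
    inv <- solve(sme)
  }
  inv
}

times <- 8
# Forking is not available on Windows, so fall back to one core there
cores <- if (.Platform$OS.type == "windows") 1L else 2L

t_serial   <- system.time(res_s <- lapply(1:times, check))
t_parallel <- system.time(res_p <- mclapply(1:times, check, mc.cores = cores))

print(c(serial = t_serial[["elapsed"]], parallel = t_parallel[["elapsed"]]))
```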

5) For some reason, on the Ubuntu 64-bit Amazon image I can't get Revolution R (even after using sudo apt), and I am still learning how to get the Enterprise Edition fired up on a 64-bit Red Hat Enterprise Linux Amazon AMI (and maybe create an all-new Machine Image 😉).

6) I was also unable to use the -X option despite having X11, as in http://rgrossman.com/2009/05/17/running-r-on-amazons-ec2/ so I was not able to see graphics.

Also, it prevents me from logging in as root and asks me to log in as ubuntu@amazon…. I also wanted to try logging in from my Windows session, but kept shuttling between my VM Player Ubuntu session and Windows.

7) Hope this was useful. I am thankful for tips from the Revolution blog, R Grossman’s blog, the creators of the Open Data AMI, Tal’s R Statistics blog and the Cerebral Mastication blog.
