Using a Linux only package in Windows #rstats

Here is some R code for using a R package that has only a tar.gz file available (used to load R packages in Linux) and no Zip file available (used to load R packages in Windows).

Step 1- Download the tar.gz file.

Step 2 Unzip it (twice) using 7zip

Step 3 Change the path variable below to your unzipped, downloaded location for the R sub folder within the package folder .

Step 4 Copy and Paste this in R

Step 5 Start using the R package in Windows (where 75% of the money and clients and businesses still are)

Caveat Emptor- No X Dependencies (ok!)

path="C:\\Users\\KUs\\Desktop\\segue\\R"
b=dir(path)
c=length(b)
for (i in 1:c){source(gsub(" ","",paste(path,"\\",b[i])))}
ls()

 

R2D2

Revolution Analytics and Pricing Analytics

Cost of 1 day of Revolution Analytics Training at http://www.revolutionanalytics.com/services/training/

 

1. Intro to R

Price:  Commercial: SGD$500.00
Academic:SGD$350.00

1 Singapore dollar = 0.8197 US dollars

10% Early Bird Discount Deadline: November 13, 2012 @ 12:00PM Pacific Time
Discount code: earlybird

2. (aptly titled Minimalistic Sufficient R…you think the ricing would be minimalistic.. but)

http://www.revolutionanalytics.com/services/training/public/minimalist-sufficient-r.php

Price: 

$750

$100 Early Bird Discount Deadline: November 16, 2012 @ 12:00PM Pacific Time
Discount code: earlybird

3.

Advanced R (Italian)

Price:  Commercial: €680.00
Academic: €480.00

1 euro = 1.2975 US dollars

4.

Big Data AnalyticS with RevoScaleR

Price:  $500 with 2 month Revolution R Enterprise workstation evaluation.

$700 with 1 year subscription of Revolution R enterprise workstation ($1500 value)

10% Early Bird Discount Deadline: October 30, 2012 @ 12:00PM Pacific Time
Discount code: early

5.

Revolution R Time Series Training

Price:  Commercial: S$1,200.00
Academic:S$750.00

10% Early Bird Discount Deadline: October 30, 2012 @ 12:00PM Pacific Time
Discount code: earlybird

so training costs differently different strokes for different folks I guess,

BUT me hearties.

Cost of 1 year of Revolution Enterprise= $1000

Thats a flat rate, so the Linux and Windows costs the same and so does the 32-bit and 64-bit

(see http://buy.revolutionanalytics.com/ )

( My comment- either Revo should give away the license for free to enterprises, rationalize training costs, seriously how can 2 days of training cost like a 1 year of license and the software is definitely quite good., or create a paid Amazon Ec 2 AMI for enterprises to rent the Revolution Analytics software (like SAP Hana ), or even on Windows Azure if they insist on hugging Microsoft, though I am clearly seeing various flavors of Linux beating Windows Server to a pulp in the Big Data market, though I am probably more optimistic on the Windows 8 on Surface but because of hardware not software/ Azure alternative to Amazon given Google’s delayed offering- I dont even know many many instance of Windows related HPC or HPA,  (/end_of_rant)

Annual Subscription
Includes software license and technical support
Price Quantity Total
Revolution R Enterprise Single-User Workstation (64-bit Windows) $1,000.00 $0.00
Revolution R Enterprise Single-User Workstation (32-bit Windows) $1,000.00 $0.00
Revolution R Enterprise Single-User Workstation (64-bit Red Hat 6 Enterprise Linux) $1,000.00 $0.00
Revolution R Enterprise Single-User Workstation (64-bit Red Hat 5 Enterprise Linux) $1,000.00 $0.00

 

Running R on Windows Azure #rstats #cloud

Here is a brief tutorial for people to run R on Windows Azure Cloud (OS=Windows in this case , but there are 4 kinds of Linux also available)

There is a free 90 day trial so you can run R for free on the cloud for free (since Google Cloud Compute is still in closed hush hush beta)

Go to https://www.windowsazure.com/en-us/pricing/free-trial/

Different kinds of Clouds

Some slides I liked on cloud computing infrastructure as offered by Amazon, IBM, Google , Windows and Oracle

 

 

Using Rapid Miner and R for Sports Analytics #rstats

Rapid Miner has been one of the oldest open source analytics software, long long before open source or even analytics was considered a fashion buzzword. The Rapid Miner software has been a pioneer in many areas (like establishing a marketplace for Rapid Miner Extensions) and the Rapid Miner -R extension was one of the most promising enablers of using R in an enterprise setting.
The following interview was taken with a manager of analytics for a sports organization. The sports organization considers analytics as a strategic differentiator , hence the name is confidential. No part of the interview has been edited or manipulated.

Ajay- Why did you choose Rapid Miner and R? What were the other software alternatives you considered and discarded?

Analyst- We considered most of the other major players in statistics/data mining or enterprise BI.  However, we found that the value proposition for an open source solution was too compelling to justify the premium pricing that the commercial solutions would have required.  The widespread adoption of R and the variety of packages and algorithms available for it, made it an easy choice.  We liked RapidMiner as a way to design structured, repeatable processes, and the ability to optimize learner parameters in a systematic way.  It also handled large data sets better than R on 32-bit Windows did.  The GUI, particularly when 5.0 was released, made it more usable than R for analysts who weren’t experienced programmers.

Ajay- What analytics do you do think Rapid Miner and R are best suited for?

 Analyst- We use RM+R mainly for sports analysis so far, rather than for more traditional business applications.  It has been quite suitable for that, and I can easily see how it would be used for other types of applications.

 Ajay- Any experiences as an enterprise customer? How was the installation process? How good is the enterprise level support?

Analyst- Rapid-I has been one of the most responsive tech companies I’ve dealt with, either in my current role or with previous employers.  They are small enough to be able to respond quickly to requests, and in more than one case, have fixed a problem, or added a small feature we needed within a matter of days.  In other cases, we have contracted with them to add larger pieces of specific functionality we needed at reasonable consulting rates.  Those features are added to the mainline product, and become fully supported through regular channels.  The longer consulting projects have typically had a turnaround of just a few weeks.

 Ajay- What challenges if any did you face in executing a pure open source analytics bundle ?

Analyst- As Rapid-I is a smaller company based in Europe, the availability of training and consulting in the USA isn’t as extensive as for the major enterprise software players, and the time zone differences sometimes slow down the communications cycle.  There were times where we were the first customer to attempt a specific integration point in our technical environment, and with no prior experiences to fall back on, we had to work with Rapid-I to figure out how to do it.  Compared to the what traditional software vendors provide, both R and RM tend to have sparse, terse, occasionally incomplete documentation.  The situation is getting better, but still lags behind what the traditional enterprise software vendors provide.

 Ajay- What are the things you can do in R ,and what are the things you prefer to do in Rapid Miner (comparison for technical synergies)

Analyst- Our experience has been that RM is superior to R at writing and maintaining structured processes, better at handling larger amounts of data, and more flexible at fine-tuning model parameters automatically.  The biggest limitation we’ve had with RM compared to R is that R has a larger library of user-contributed packages for additional data mining algorithms.  Sometimes we opted to use R because RM hadn’t yet implemented a specific algorithm.  The introduction the R extension has allowed us to combine the strengths of both tools in a very logical and productive way.

In particular, extending RapidMiner with R helped address RM’s weakness in the breadth of algorithms, because it brings the entire R ecosystem into RM (similar to how Rapid-I implemented much of the Weka library early on in RM’s development).  Further, because the R user community releases packages that implement new techniques faster than the enterprise vendors can, this helps turn a potential weakness into a potential strength.  However, R packages tend to be of varying quality, and are more prone to go stale due to lack of support/bug fixes.  This depends heavily on the package’s maintainer and its prevalence of use in the R community.  So when RapidMiner has a learner with a native implementation, it’s usually better to use it than the R equivalent.

New Amazon Instance: High I/O for NoSQL

Latest from the Amazon Cloud-

hi1.4xlarge instances come with eight virtual cores that can deliver 35 EC2 Compute Units (ECUs) of CPU performance, 60.5 GiB of RAM, and 2 TiB of storage capacity across two SSD-based storage volumes. Customers using hi1.4xlarge instances for their applications can expect over 120,000 4 KB random write IOPS, and as many as 85,000 random write IOPS (depending on active LBA span). These instances are available on a 10 Gbps network, with the ability to launch instances into cluster placement groups for low-latency, full-bisection bandwidth networking.

High I/O instances are currently available in three Availability Zones in US East (N. Virginia) and two Availability Zones in EU West (Ireland) regions. Other regions will be supported in the coming months. You can launch hi1.4xlarge instances as On Demand instances starting at $3.10/hour, and purchase them as Reserved Instances

http://aws.amazon.com/ec2/instance-types/

High I/O Instances

Instances of this family provide very high instance storage I/O performance and are ideally suited for many high performance database workloads. Example applications include NoSQL databases like Cassandra and MongoDB. High I/O instances are backed by Solid State Drives (SSD), and also provide high levels of CPU, memory and network performance.

High I/O Quadruple Extra Large Instance

60.5 GB of memory
35 EC2 Compute Units (8 virtual cores with 4.4 EC2 Compute Units each)
2 SSD-based volumes each with 1024 GB of instance storage
64-bit platform
I/O Performance: Very High (10 Gigabit Ethernet)
Storage I/O Performance: Very High*
API name: hi1.4xlarge

*Using Linux paravirtual (PV) AMIs, High I/O Quadruple Extra Large instances can deliver more than 120,000 4 KB random read IOPS and between 10,000 and 85,000 4 KB random write IOPS (depending on active logical block addressing span) to applications. For hardware virtual machines (HVM) and Windows AMIs, performance is approximately 90,000 4 KB random read IOPS and between 9,000 and 75,000 4 KB random write IOPS. The maximum sequential throughput on all AMI types (Linux PV, Linux HVM, and Windows) per second is approximately 2 GB read and 1.1 GB write.