Amazon EC2 goes Red Hat

A message from Amazing Amazon’s cloud team. This will also help #rstats users, given that Revolution Analytics offers the full versions of its software on RHEL.

—————————————————-

On-demand instances of Amazon EC2 running Red Hat Enterprise Linux (RHEL) are available for as little as $0.145 per instance hour. The offering combines the cost-effectiveness, scalability and flexibility of running in Amazon EC2 with the proven reliability of Red Hat Enterprise Linux.

Highlights of the offering include:

Support is included through subscription to AWS Premium Support with back-line support by Red Hat
Ongoing maintenance, including security patches and bug fixes, via update repositories available in all Amazon EC2 regions
Amazon EC2 running RHEL currently supports RHEL 5.5, RHEL 5.6, RHEL 6.0 and RHEL 6.1 in both 32-bit and 64-bit formats, and is available in all Regions.
Customers who already own Red Hat licenses will continue to be able to use those licenses at no additional charge.
Like all services offered by AWS, Amazon EC2 running Red Hat Enterprise Linux offers a low-cost, pay-as-you-go model with no long-term commitments and no minimum fees.

For more information, please visit the Amazon EC2 Red Hat Enterprise Linux page.

which is as follows:

Amazon EC2 Running Red Hat Enterprise Linux

Amazon EC2 running Red Hat Enterprise Linux provides a dependable platform to deploy a broad range of applications. By running RHEL on EC2, you can leverage the cost effectiveness, scalability and flexibility of Amazon EC2, the proven reliability of Red Hat Enterprise Linux, and AWS Premium Support with back-line support from Red Hat. Red Hat Enterprise Linux on EC2 is available in versions 5.5, 5.6, 6.0, and 6.1, in both 32-bit and 64-bit architectures.

Amazon EC2 running Red Hat Enterprise Linux provides seamless integration with existing Amazon EC2 features including Amazon Elastic Block Store (EBS), Amazon CloudWatch, Elastic Load Balancing, and Elastic IPs. Red Hat Enterprise Linux instances are available in multiple Availability Zones in all Regions.

Pricing

Pay only for what you use with no long-term commitments and no minimum fee.

On-Demand Instances

On-Demand Instances let you pay for compute capacity by the hour with no long-term commitments.

Region: US – N. Virginia | US – N. California | EU – Ireland | APAC – Singapore | APAC – Tokyo

Standard Instances	Red Hat Enterprise Linux
Small (Default)	$0.145 per hour
Large	$0.40 per hour
Extra Large	$0.74 per hour
Micro Instances	Red Hat Enterprise Linux
Micro	$0.08 per hour
High-Memory Instances	Red Hat Enterprise Linux
Extra Large	$0.56 per hour
Double Extra Large	$1.06 per hour
Quadruple Extra Large	$2.10 per hour
High-CPU Instances	Red Hat Enterprise Linux
Medium	$0.23 per hour
Extra Large	$0.78 per hour
Cluster Compute Instances	Red Hat Enterprise Linux
Quadruple Extra Large	$1.70 per hour
Cluster GPU Instances	Red Hat Enterprise Linux
Quadruple Extra Large	$2.20 per hour

Pricing is per instance-hour consumed for each instance type. Partial instance-hours consumed are billed as full hours.
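The billing rule above is straightforward: you pay the rate times the number of instance-hours, and any partial hour is rounded up to a full hour. It can be sketched in Python. This is a minimal illustration, not an AWS tool. The rates are copied from the table above. The mapping from table rows to API names (m1.small, m2.xlarge and so on) comes from the instance-type list further down. The function names are hypothetical. The sketch assumes every instance in a batch runs for the same length of time, and it ignores any regional price differences.

```python
import math
from decimal import Decimal

# On-demand RHEL prices in USD per instance-hour, copied from the pricing
# table above and keyed by the API names listed under "Available Instance Types".
RHEL_HOURLY_RATE = {
    "t1.micro": Decimal("0.08"),
    "m1.small": Decimal("0.145"),
    "m1.large": Decimal("0.40"),
    "m1.xlarge": Decimal("0.74"),
    "m2.xlarge": Decimal("0.56"),
    "m2.2xlarge": Decimal("1.06"),
    "m2.4xlarge": Decimal("2.10"),
    "c1.medium": Decimal("0.23"),
    "c1.xlarge": Decimal("0.78"),
    "cc1.4xlarge": Decimal("1.70"),
    "cg1.4xlarge": Decimal("2.20"),
}


def billed_hours(hours_used: float) -> int:
    """Partial instance-hours are billed as full hours, so round up."""
    if hours_used < 0:
        raise ValueError("hours_used must be non-negative")
    return math.ceil(hours_used)


def on_demand_cost(instance_type: str, hours_used: float, instances: int = 1) -> Decimal:
    """Hypothetical helper: cost of `instances` identical instances, each run for `hours_used`."""
    return RHEL_HOURLY_RATE[instance_type] * billed_hours(hours_used) * instances


if __name__ == "__main__":
    # 2.5 hours on a Small instance is billed as 3 hours: 3 x $0.145.
    print(on_demand_cost("m1.small", 2.5))  # prints 0.435
```

Under this rule, a Small instance running around the clock for a 30-day month (720 hours) would cost 720 × $0.145 = $104.40. Decimal is used instead of float so that cent amounts stay exact.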


and

Available Instance Types

Standard Instances

Instances of this family are well suited for most applications.

Small Instance – default

1.7 GB memory
1 EC2 Compute Unit (1 virtual core with 1 EC2 Compute Unit)
160 GB instance storage
32-bit platform
I/O Performance: Moderate
API name: m1.small

Large Instance

7.5 GB memory
4 EC2 Compute Units (2 virtual cores with 2 EC2 Compute Units each)
850 GB instance storage
64-bit platform
I/O Performance: High
API name: m1.large

Extra Large Instance

15 GB memory
8 EC2 Compute Units (4 virtual cores with 2 EC2 Compute Units each)
1,690 GB instance storage
64-bit platform
I/O Performance: High
API name: m1.xlarge

Micro Instances

Instances of this family provide a small amount of consistent CPU resources and allow you to burst CPU capacity when additional cycles are available. They are well suited for lower throughput applications and web sites that consume significant compute cycles periodically.

Micro Instance

613 MB memory
Up to 2 EC2 Compute Units (for short periodic bursts)
EBS storage only
32-bit or 64-bit platform
I/O Performance: Low
API name: t1.micro

High-Memory Instances

Instances of this family offer large memory sizes for high throughput applications, including database and memory caching applications.

High-Memory Extra Large Instance

17.1 GB of memory
6.5 EC2 Compute Units (2 virtual cores with 3.25 EC2 Compute Units each)
420 GB of instance storage
64-bit platform
I/O Performance: Moderate
API name: m2.xlarge

High-Memory Double Extra Large Instance

34.2 GB of memory
13 EC2 Compute Units (4 virtual cores with 3.25 EC2 Compute Units each)
850 GB of instance storage
64-bit platform
I/O Performance: High
API name: m2.2xlarge

High-Memory Quadruple Extra Large Instance

68.4 GB of memory
26 EC2 Compute Units (8 virtual cores with 3.25 EC2 Compute Units each)
1690 GB of instance storage
64-bit platform
I/O Performance: High
API name: m2.4xlarge

High-CPU Instances

Instances of this family have proportionally more CPU resources than memory (RAM) and are well suited for compute-intensive applications.

High-CPU Medium Instance

1.7 GB of memory
5 EC2 Compute Units (2 virtual cores with 2.5 EC2 Compute Units each)
350 GB of instance storage
32-bit platform
I/O Performance: Moderate
API name: c1.medium

High-CPU Extra Large Instance

7 GB of memory
20 EC2 Compute Units (8 virtual cores with 2.5 EC2 Compute Units each)
1690 GB of instance storage
64-bit platform
I/O Performance: High
API name: c1.xlarge

Cluster Compute Instances

Instances of this family provide proportionally high CPU resources with increased network performance and are well suited for High Performance Compute (HPC) applications and other demanding network-bound applications. Learn more about use of this instance type for HPC applications.

Cluster Compute Quadruple Extra Large Instance

23 GB of memory
33.5 EC2 Compute Units (2 x Intel Xeon X5570, quad-core “Nehalem” architecture)
1690 GB of instance storage
64-bit platform
I/O Performance: Very High (10 Gigabit Ethernet)
API name: cc1.4xlarge

Cluster GPU Instances

Instances of this family provide general-purpose graphics processing units (GPUs) with proportionally high CPU and increased network performance for applications benefitting from highly parallelized processing, including HPC, rendering and media processing applications. While Cluster Compute Instances provide the ability to create clusters of instances connected by a low latency, high throughput network, Cluster GPU Instances provide an additional option for applications that can benefit from the efficiency gains of the parallel computing power of GPUs over what can be achieved with traditional processors. Learn more about use of this instance type for HPC applications.

Cluster GPU Quadruple Extra Large Instance

22 GB of memory
33.5 EC2 Compute Units (2 x Intel Xeon X5570, quad-core “Nehalem” architecture)
2 x NVIDIA Tesla “Fermi” M2050 GPUs
1690 GB of instance storage
64-bit platform
I/O Performance: Very High (10 Gigabit Ethernet)
API name: cg1.4xlarge
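Memory-hungry workloads, such as in-memory R sessions, often just need an instance with enough RAM. To choose one, the specifications above can be combined with the RHEL prices from the pricing table. The sketch below is a hypothetical helper, not an AWS API. It treats 613 MB as roughly 0.613 GB and compares only memory and hourly price. It ignores CPU, instance storage, I/O performance and whether the platform is 32-bit or 64-bit.

```python
from typing import NamedTuple, Optional


class InstanceType(NamedTuple):
    api_name: str
    memory_gb: float          # from the specifications above (613 MB ~ 0.613 GB)
    rhel_usd_per_hour: float  # from the RHEL pricing table


INSTANCE_TYPES = [
    InstanceType("t1.micro", 0.613, 0.08),
    InstanceType("m1.small", 1.7, 0.145),
    InstanceType("c1.medium", 1.7, 0.23),
    InstanceType("c1.xlarge", 7.0, 0.78),
    InstanceType("m1.large", 7.5, 0.40),
    InstanceType("m1.xlarge", 15.0, 0.74),
    InstanceType("m2.xlarge", 17.1, 0.56),
    InstanceType("cg1.4xlarge", 22.0, 2.20),
    InstanceType("cc1.4xlarge", 23.0, 1.70),
    InstanceType("m2.2xlarge", 34.2, 1.06),
    InstanceType("m2.4xlarge", 68.4, 2.10),
]


def cheapest_with_memory(min_memory_gb: float) -> Optional[InstanceType]:
    """Return the lowest-priced type with at least `min_memory_gb` of RAM, or None."""
    candidates = [t for t in INSTANCE_TYPES if t.memory_gb >= min_memory_gb]
    return min(candidates, key=lambda t: t.rhel_usd_per_hour, default=None)


if __name__ == "__main__":
    for need in (1, 16, 30):
        choice = cheapest_with_memory(need)
        print(need, "GB ->", choice.api_name if choice else "none")
```

One result stands out. The High-Memory Extra Large (m2.xlarge, 17.1 GB, $0.56 per hour) is cheaper than the Standard Extra Large (m1.xlarge, 15 GB, $0.74 per hour) and has more memory. It does have fewer EC2 Compute Units (6.5 versus 8).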

Getting Started

To get started using Red Hat Enterprise Linux on Amazon EC2, perform the following steps:

Open and log into the AWS Management Console
Click on Launch Instance from the EC2 Dashboard
Select the Red Hat Enterprise Linux AMI from the QuickStart tab
Specify additional details of your instance and click Launch
Additional details can be found on each AMI’s Catalog Entry page

The AWS Management Console is an easy tool for starting and managing your instances. If you are looking for more details on launching an instance, a quick video tutorial on using Amazon EC2 with the AWS Management Console is also available.
A full list of Red Hat Enterprise Linux AMIs can be found in the AWS AMI Catalog.


Support

All customers running Red Hat Enterprise Linux on EC2 will receive access to repository updates from Red Hat. Moreover, AWS Premium Support customers can contact AWS to get access to a support structure from both Amazon and Red Hat.



About Red Hat

Red Hat, the world’s leading open source solutions provider, is headquartered in Raleigh, NC with over 50 satellite offices spanning the globe. Red Hat provides high-quality, low-cost technology with its operating system platform, Red Hat Enterprise Linux, together with applications, management and Services Oriented Architecture (SOA) solutions, including the JBoss Enterprise Middleware Suite. Red Hat also offers support, training and consulting services to its customers worldwide.

Also from Revolution Analytics, in case you want to run #rstats in the cloud and thus kill all that talk of R's RAM dependency, of R being slower than other software (just pick an instance with more RAM from the list above, to keep it simple), or of Revolution not being open enough:

http://www.revolutionanalytics.com/downloads/gpl-sources.php

GPL SOURCES

Revolution Analytics uses an Open-Core Licensing model. We provide open-source R bundled with proprietary modules from Revolution Analytics that provide additional functionality for our users. Open-source R is distributed under the GNU General Public License (version 2), and we make our software available under a commercial license.

Revolution Analytics respects the importance of open source licenses and has contributed code to the open source R project and will continue to do so. We have carefully reviewed our compliance with GPLv2 and have worked with Mark Radcliffe of DLA Piper, the outside General Legal Counsel of the Open Source Initiative, to ensure that we fully comply with the obligations of the GPLv2.

For our Revolution R distribution, we may make some minor modifications to the R sources (the ChangeLog file lists all changes made). You can download these modified sources of open-source R under the terms of the GPLv2, using either the links below or those in the email sent to you when you download a specific version of Revolution R.

Download GPL Sources

Product	Version	Platform	Modified R Sources
Revolution R Community	3.2	Windows	R 2.10.1
Revolution R Community	3.2	MacOS	R 2.10.1
Revolution R Enterprise	3.1.1	RHEL	R 2.9.2
Revolution R Enterprise	4.0	Windows	R 2.11.1
Revolution R Enterprise	4.0.1	RHEL	R 2.11.1
Revolution R Enterprise	4.1.0	Windows	R 2.11.1
Revolution R Enterprise	4.2	Windows	R 2.11.1
Revolution R Enterprise	4.2	RHEL	R 2.11.1
Revolution R Enterprise	4.3	Windows & RHEL	R 2.12.2

Citrix Webinar – Time Management for better Time Sharing

I always liked Citrix products when I was a member of the Technical Advisory Board at the University of Tennessee. I especially liked enabling SAS, R and Matlab software from ONLY a browser.

Data mining through cloud computing? Yes: the University of Tennessee’s analytics server http://analytics.utk.edu was way ahead in 2009, with all this software at one portal. No software was needed on your own PC; you simply uploaded data and worked in any of the analytics packages.

Here is a nice Citrix webinar on managing time (so you can attend more webinars! Nah. I think YouTube live-streaming events with interactive questions and answers are the way of the future, while webinars are for Baby Boomers; you can run a test-and-control experiment yourself if you are in the webinar business. It’s a web2.0inar.)

http://learn.gotomeeting.com/forms/26May11-APAC-ANZ-G2MC-WBR-L1?url=decisionstats

Standard Disclosure- I have not received any monetary or indirect compensation for promoting this webinar.

————————————————————————————————————————————————————-

Interruptions are productivity killers – between email, phone calls and back-to-back meetings, how do you find time to work on your top priorities?

Join top time-management guru Kent Curtis and learn how to stop “living in your inbox” and start prioritising tasks, messages and appointments according to what is most important.

This webinar takes the best principles from FranklinCovey’s world-class productivity training and teaches you how to apply them while using Microsoft Outlook as your scheduling tool.

Attend this interactive, one-hour webinar to:

Stay focused every day with a reliable planning system utilising Microsoft Outlook.
Control competing demands such as email, voice mail, meetings and interruptions.
Apply a planning process that gets better business results.
Reduce stress by eliminating low priority activities and distractions.
Register for the Webinar

Please forward this to colleagues who might be interested in learning more.

Kind regards,

H.R. Shiever | Managing Director – Asia Pacific

Citrix Online
A division of Citrix Systems, Inc.
http://www.citrixonline.com

Online Meetings Made Easy

GoToMeeting Corporate
Live Webinar

Title:

The New Time Management: Stay Focused Every Day with Reliable Planning

Date:

Thursday, 26 May
Time:

12 Noon Australian EST
10 AM Singapore SGT
7.30 AM India ST
Speakers:
Kent Curtis, Senior Consultant, FranklinCovey

http://learn.gotomeeting.com/forms/26May11-APAC-ANZ-G2MC-WBR-L1?url=decisionstats

Keep the Spice In… Life After the Webinar (customerthink.com)
Citrix Webinar – Recommended Architecture (edugeek.net)
Google Apps Sync for Microsoft Outlook (edugeek.net)

Interview- Top Data Mining Blogger on Earth, Sandro Saitta

If you do a Google search for “data mining blog”, one blog has come out on top for the past several years (see the Google search: http://bit.ly/kEdPlE).

To honor 5 years of Sandro Saitta’s blog (yes, that’s 5 years!), here is an exclusive interview in which he reveals his secret sauce for cool techie blogging.

Ajay- Describe your journey as a scientist and data miner, from early experiences to schooling to your work, research and blogging.

Sandro- My first experience with data mining was my master’s project. I used a decision tree to predict pollen concentration for the following week using input data such as wind, temperature and rain. The fact that an algorithm can make a computer learn from experience was really amazing to me. I found it so interesting that I started a PhD in data mining. This time, the field of application was civil engineering. Civil engineers put a lot of sensors on their structures in order to understand how they behave, and these sensors generate a lot of data. To interpret these data, I used data mining techniques such as feature selection and clustering. I started my blog, Data Mining Research, during my PhD, to share with other researchers.

I then started applying data mining to the stock market in my first job in industry. I realized the difference between image recognition, where a 99% correct classification rate is state of the art, and the stock market, where you’re happy with 55%. However, the company atmosphere was not as good as I had expected, so I moved to consulting. There, I applied data mining in behavioral targeting to increase click-through rates. When you compare the number of customers who click with the number who don’t, you really understand what class imbalance means. A few months ago, I accepted a very good opportunity at SICPA. I’m looking forward to tackling new challenges there.

Ajay- Your blog is the top-ranked blog for “data mining blog”. Could you share some tips on better blogging for analytics and technical people?

Sandro- It’s always difficult to start a blog, since at the beginning you have no readers. Writing for nobody may seem stupid, but it is not. By writing my first posts during my PhD, I was reorganizing my ideas. I was expressing concepts which were not always clear to me. I thus learned a lot and also improved my English. Of course, it’s still not perfect, but I hope most people can understand me.

Next come the readers: a few dozen each week at first. To increase this number, I started to learn SEO (Search Engine Optimization) by reading books and blogs. I tested many techniques that increased Data Mining Research’s visibility in the blogosphere. I think SEO is worth pursuing once you already have some content published (which means not at the very beginning of your blog). After a while, once your blog is nicely ranked, the main task is to work on its content. To be of interest, your content must be distinctive: original, informative or provocative, for example. I was also lucky to gain good visibility thanks to well-known people in the field like Kevin Hillstrom, Gregory Piatetsky-Shapiro, Will Dwinnell / Dean Abbott, Vincent Granville, Matthew Hurst and many others.

Ajay- What’s your favorite statistical software, and what other software have you worked with? Could you compare and contrast them as well?

Sandro- My favorite software at this point is SAS. I worked with it for two years. Once you know the language, you can perform ETL and data mining very easily. It’s also very fast compared to others. There are a lot of tools for data mining, but I cannot think of a tool that is as powerful as SAS and, at the same time, has a high-level programming language behind it.

I also worked with R and Matlab. R is very nice since you have all the up-to-date data mining algorithms implemented. However, working in memory is not always a good choice, especially for ETL. Matlab is an excellent tool for prototyping. It’s not so fast and certainly not made for ETL, but the price is low considering all the possibilities for data mining. In my view, SAS is the best choice for ETL and a good choice for data mining. Of course, there is the price.

Ajay- What are your favorite techniques and training resources for learning the basics of data mining, for, say, statisticians or business management graduates?

Sandro- I’m the kind of guy who likes to read books. I read data mining books one after the other. The fact that the same concepts are explained differently (and by different people) helps a lot in learning a topic like data mining. Of course, nothing replaces experience in the field. You can read hundreds of books and still not be a good practitioner until you really apply data mining in specific fields. My second choice after books is blogs. By reading data mining blogs, you will really see the issues and challenges in the field. It’s still not experience, but it is closer. Finally, there are web resources and networks: KDnuggets, of course, but also AnalyticBridge and LinkedIn.

Ajay- Describe your hobbies and how they help you, if at all, in your professional life.

Sandro- One of my hobbies is reading. I read a lot of books about data mining, SEO, Google as well as Sci-Fi and Fantasy. I’m a big fan of Asimov by the way. My other hobby is playing tennis. I think I simply use my hobbies as a way to find equilibrium in my life. I always try to find the best balance between work, family, friends and sport.

Ajay- What are your plans for your website for 2011-2012?

Sandro- I will continue to publish guest posts and interviews. I think it is important to let other people express themselves about data mining topics. I will not write about my current applications due to the policies of my current employer. But don’t worry, I still have a lot to write about, whether technical or not. I will also put more emphasis on my experience with data mining, advice for data miners, tips and tricks, and of course book reviews!

Standard Disclosure of Blogging- Sandro gave me the People’s Choice award on his blog for 2010 and carried out my interview. There is a lot of love between our respective WordPress blogs, but to reassure our puritan American readers, it is platonic and intellectual.

About Sandro S-

Sandro Saitta is a Data Mining Research Engineer at SICPA Security Solutions. He is also a blogger at Data Mining Research (www.dataminingblog.com). His interests include data mining, machine learning, search engine optimization and website marketing.

You can contact Mr Saitta at his Twitter address-

https://twitter.com/#!/dataminingblog

HIGHLIGHTS from REXER Survey: R gives best satisfaction (decisionstats.com)
Participate in the 2011 Rexer Data Mining Survey (r-bloggers.com)
KDNuggets Survey on R (decisionstats.com)
New book on BigData Analytics and Data mining using #Rstats with a GUI (decisionstats.com)
Solmentum: Solar Meets Data Mining (gigaom.com)
KDnuggets: R used in 1 in 4 analytics projects (revolutionanalytics.com)

2010 in review and WP-Stats

The following is an auto-generated post, courtesy of the WordPress.com stats team. Clearly they have got some stuff wrong:

1) The speedometer (the Blog-Health-o-Meter) is not defined quantitatively.

2) The busiest day numbers are plain wrong (2 views??).

3) There is still no geographic data in WordPress.com stats (unlike Google Analytics), and I can't enable Google Analytics on a WordPress.com-hosted site.

The stats helper monkeys at WordPress.com mulled over how this blog did in 2010, and here’s a high level summary of its overall blog health:

The Blog-Health-o-Meter™ reads Wow.

Crunchy numbers

The Louvre Museum has 8.5 million visitors per year. This blog was viewed about 97,000 times in 2010. If it were an exhibit at The Louvre Museum, it would take 4 days for that many people to see it.

In 2010, there were 367 new posts, growing the total archive of this blog to 1191 posts. There were 411 pictures uploaded, taking up a total of 121 MB. That’s about 1 picture per day.
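The two back-of-the-envelope figures above are easy to check. This small Python sketch uses only the numbers quoted in the summary and assumes a 365-day year. It confirms that 411 pictures works out to a little over one per day.

```python
LOUVRE_VISITORS_PER_YEAR = 8_500_000
BLOG_VIEWS_2010 = 97_000
PICTURES_UPLOADED_2010 = 411
DAYS_IN_2010 = 365

louvre_per_day = LOUVRE_VISITORS_PER_YEAR / DAYS_IN_2010  # about 23,288 visitors a day
days_to_match = BLOG_VIEWS_2010 / louvre_per_day          # about 4.2 days
pictures_per_day = PICTURES_UPLOADED_2010 / DAYS_IN_2010  # about 1.1 pictures a day

print(f"Louvre: {louvre_per_day:,.0f} visitors/day")
print(f"Days for the Louvre to match the blog: {days_to_match:.1f}")
print(f"Pictures per day: {pictures_per_day:.2f}")
```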

The busiest day of the year was September 22nd with 2 views. The most popular post that day was Top 10 Graphical User Interfaces in Statistical Software.

Where did they come from?

The top referring sites in 2010 were r-bloggers.com, reddit.com, rattle.togaware.com, twitter.com, and Google Reader.

Some visitors came searching, mostly for libre office, facebook analytics, test drive a chrome notebook, test drive a chrome notebook., and wps sas lawsuit.

Attractions in 2010

These are the posts and pages that got the most views in 2010.

Top 10 Graphical User Interfaces in Statistical Software April 2010
8 comments and 1 Like on WordPress.com,

Wealth = function (numeracy, memory recall) December 2009
1 Like on WordPress.com,

Matlab-Mathematica-R and GPU Computing September 2010
1 Like on WordPress.com,

About DecisionStats July 2008

The Top Statistical Softwares (GUI) May 2010
1 comment and 1 Like on WordPress.com,

Google Chrome Extension To Check WordPress.com Stats (techie-buzz.com)

The Year 2010

My annual traffic to this blog was almost 99,000. Add in the additional views on networking sites plus the 400-plus RSS readers, and I can say traffic was about 120,000 for 2010. Nice. Thanks for reading, and I hope it was worth your time. (This is a long post and will take almost 440 seconds to read, but the summary has already been given.)

My intent is to inform you, to give you something useful, or at least something interesting.

see below-

Month	Views (2010)
Jan	6,311
Feb	4,701
Mar	4,922
Apr	5,463
May	6,493
Jun	4,271
Jul	5,041
Aug	5,403
Sep	17,913
Oct	16,430
Nov	11,723
Dec	10,096
Total	98,767
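You can check the monthly figures above against the reported total with a few lines of Python. The sketch re-types the table (the dictionary name is illustrative) and confirms that the months add up to 98,767. It also shows that the second half of the year drew roughly twice the views of the first half, and that September was the busiest month.

```python
# Monthly page views for 2010, re-typed from the table above.
monthly_views_2010 = {
    "Jan": 6_311, "Feb": 4_701, "Mar": 4_922, "Apr": 5_463,
    "May": 6_493, "Jun": 4_271, "Jul": 5_041, "Aug": 5_403,
    "Sep": 17_913, "Oct": 16_430, "Nov": 11_723, "Dec": 10_096,
}
REPORTED_TOTAL = 98_767

views = list(monthly_views_2010.values())  # dicts preserve insertion order (Python 3.7+)
total = sum(views)
first_half, second_half = sum(views[:6]), sum(views[6:])
busiest_month = max(monthly_views_2010, key=monthly_views_2010.get)

print(f"Total {total:,} (reported {REPORTED_TOTAL:,})")
print(f"Jan-Jun {first_half:,}, Jul-Dec {second_half:,}")
print(f"Busiest month: {busiest_month}")
```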

Sandro Saitta from http://www.dataminingblog.com/ just named me for an award on his blog (but my surname is ohRi; Sandro left me without an R. What would I be without R? :) ).

Aw! I am touched. Google for “Data Mining Blog” and you will see that Sandro is the best there is in data mining writing.

“

DMR People Award 2010
There are a lot of active people in the field of data mining. You can discuss with them on forums. You can read their blogs. You can also meet them in events such as PAW or KDD. Among the people I follow on a regular basis, I have elected:

Ajay Ori

He has been very active in 2010, especially on his blog . Good work Ajay and continue sharing your experience with us!”

What did I write in 2010? Stuff.

What did you read on this blog? Well, that’s the top posts list.

2009-12-31 to Today

Title		Views
Home page		21,150
Top 10 Graphical User Interfaces in Statistical Software		6,237
Wealth = function (numeracy, memory recall)		2,014
Matlab-Mathematica-R and GPU Computing		1,946
The Top Statistical Softwares (GUI)		1,405
About DecisionStats		1,352
Using Facebook Analytics (Updated)		1,313
Test drive a Chrome notebook.		1,170
Top ten RRReasons R is bad for you ?		1,157
Libre Office		1,151
Interview Hadley Wickham R Project Data Visualization Guru		1,007
Using Red R- R with a Visual Interface		854
SAS Institute files first lawsuit against WPS- Episode 1		790
Interview Professor John Fox Creator R Commander		764
R Package Creating		754
Windows Azure vs Amazon EC2 (and Google Storage)		726
Norman Nie: R GUI and More		716
Startups for Geeks		682
Google Maps – Jet Ski across Pacific Ocean		670
Not so AWkward after all: R GUI RKWard		579
Red R 1.8- Pretty GUI		570
Parallel Programming using R in Windows		569
R is an epic fail or is it just overhyped		559
Enterprise Linux rises rapidly:New Report		537
Rapid Miner- R Extension		518
Creating a Blog Aggregator for free		504
So which software is the best analytical software? Sigh- It depends		473
Revolution R for Linux		465
John Sall sets JMP 9 free to tango with R		460

So how do people come here –

Well, I guess I owe Tal G for almost 9,000 views (incidentally, I withdrew my blog from R-Bloggers and AnalyticBridge blogs due to SEO keyword reasons and some spam I was getting; see below).

http://r-bloggers.com is still the cat’s whiskers and I read it a lot.

I still don’t know who linked my blog to a free sex movie site with 400 views, but I have a few suspects.

2009-12-31 to Today

Referrer	Views
r-bloggers.com	9,131
Reddit	3,829
rattle.togaware.com	1,500
Twitter	1,254
Google Reader	1,215
linkedin.com	717
freesexmovie.irwanaf.com	422
analyticbridge.com	341
Google	327
coolavenues.com	322
Facebook	317
kdnuggets.com	298
dataminingblog.com	278
en.wordpress.com	185
google.co.in	151
xianblog.wordpress.com	130
inside-r.org	124
decisionstats.com	119
ifreestores.com	117
bits.blogs.nytimes.com	108

–

Still reading this post? Gosh, let me sell you some advertising. It is only $100 a month (yes, it’s a recession).

Advertisers are treated on a First In, Last Out (FILO) basis.

I have been told I am obsessed with SEO, but I don’t care much for search engines apart from Google, and yes, SEO is an interesting science (they should really rename it GEO, or Google Engine Optimization).

Apparently Hadley Wickham and Donald Farmer are big keywords for me so I should be more respectful I guess.

Search Terms for 365 days ending 2010-12-31 (Summarized)

2009-12-31 to Today

Search	Views
libre office	925
facebook analytics	798
test drive a chrome notebook	467
test drive a chrome notebook.	215
r gui	203
data mining	163
wps sas lawsuit	158
wordle.net	133
wps sas	123
google maps jet ski	123
test drive chrome notebook	96
sas wps	89
sas wps lawsuit	85
chrome notebook test drive	83
decision stats	83
best statistics software	74
hadley wickham	72
google maps jetski	72
libreoffice	70
doug savage	65
hive tutorial	58
funny india	56
spss certification	52
donald farmer microsoft	51
best statistical software	49

What about outgoing links? Apparently I need to find a way to ask Google to pay me for the free advertising I gave their Chrome notebook launch. But since their search engine and browser are free to me, I guess we are even Steven.

Clicks for 365 days ending 2010-12-31 (Summarized)

2009-12-31 to Today

URL	Clicks
rattle.togaware.com	378
facebook.com/Decisionstats	355
rapid-i.com/content/view/182/196	319
services.google.com/fb/forms/cr48basic	313
red-r.org	228
decisionstats.wordpress.com/2010/05/07/the-top-statistical-softwares-gui	199
teamwpc.co.uk/products/wps	162
r4stats.com/popularity	148
r-statistics.com/2010/04/r-and-the-google-summer-of-code-2010-accepted-students-and-projects	138
socserv.mcmaster.ca/jfox/Misc/Rcmdr	138
spss.com/certification	116
learnr.wordpress.com	114
dudeofdata.com/decisionstats	108
r-project.org	107
documentfoundation.org/faq	104
goo.gl/maps/UISY	100
inside-r.org/download	96
en.wikibooks.org/wiki/R_Programming	92
nytimes.com/external/readwriteweb/2010/12/07/07readwriteweb-report-google-offering-chrome-notebook-test-11919.html	92
sourceforge.net/apps/mediawiki/rkward/index.php?title=Main_Page	92
analyticdroid.togaware.com	88
yeroon.net/ggplot2	87

So in 2010,

SAS remained top daddy in business analytics,

R made revolutionary strides in terms of new packages,

JMP launched a new version,

SPSS got integrated with Cognos,

Oracle sued Google and did build a great Data Mining GUI,

LibreOffice gave you a non-Oracle OpenOffice (or an even more open office).

2011 looks like a fun year. Party safely.

IBM SPSS 19 Now Available to the Global Academic Community via e-academy’s OnTheHub eStore (prweb.com)
ACM Data Mining Camp 3 (revolutionanalytics.com)
Accessing R from Python using RPy2 (r-bloggers.com)
Mining of Massive Data Sets (kinlane.com)
5 FeedBurner Alternatives You Should Know About (techie-buzz.com)
Uncertainty, Risk, Statistics and Data Mining (zyxo.wordpress.com)
‘Data Mining’ Gains Traction in Education (edreformer.com)
If you cut your RSS short I will ignore your post (chrisabraham.com)
Solar trends for 2011 (cleanbreak.ca)

Which software do we buy? It depends

Often I am asked by clients, friends and industry colleagues on the suitability or unsuitability of particular software for analytical needs. My answer is mostly-

It depends on-

1) The cost of a Type 1 error in the purchase decision versus the cost of a Type 2 error. (Forgive me if I mix up Type 1 and Type 2 errors; I do have some weird childhood learning disabilities which crop up now and then.)

Here I define a Type 1 error as paying more for software when equivalent functionality was available at a lower price, or buying components you do not need, like SPSS Trends (when only SPSS Base is required) or SAS/ETS, when only SAS/STAT would do.

The first kind of Type 1 error (overpaying) is, of course, possible because of free tools with GUIs like R, R Commander and Deducer (Rattle does have a $500 commercial version).

Several trends have added to the confusion. Vendors like WPS (for SAS language aficionados) offer functionality similar to Base SAS. Business analytics (read: predictive analytics) and business intelligence (read: reporting) are also converging. Together these have led to a certain brand clutter in which all software promises to do everything at widely different prices, though each product has specific strengths and weaknesses. To add to this, there are comparatively fewer independent business analytics analysts than, say, independent business intelligence analysts.

2) Type 2 error: here the cost is the opportunity cost of delayed projects, delayed business models, or lower accuracy. These are the consequences of buying lower-priced software with less functionality than you required.

To compound the magnitude of a Type 2 error, you are probably in some kind of vendor lock-in. Your software budget is exhausted because you bought too much or inappropriate software and hardware, and you could still do with some added help in business analytics. The fear of making a business-critical error is a substantial reason why open source software has to work harder at proving itself competent. This is because writing great software is not enough; we need great marketing to sell it and great customer support to sustain it.
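One way to make the Type 1 versus Type 2 trade-off concrete is a simple expected-cost comparison. For each option, take its total cost of ownership and add the probability that it falls short multiplied by the opportunity cost of that shortfall. The Python sketch below illustrates this under stated assumptions; it is not a method taken from this post. It assumes a risk-neutral buyer, and the option names, costs and probabilities are all hypothetical. Overspending (a Type 1 error) shows up as paying a higher cost of ownership than needed. Under-buying (a Type 2 error) shows up in the shortfall term.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Option:
    name: str
    tco: float          # total cost of ownership: licences, hardware, support, training
    p_shortfall: float  # probability the software lacks functionality you turn out to need


def expected_cost(option: Option, shortfall_cost: float) -> float:
    """Cost of ownership plus the probability-weighted cost of a Type 2 error."""
    return option.tco + option.p_shortfall * shortfall_cost


def best_option(options: Iterable[Option], shortfall_cost: float) -> Option:
    """Pick the option with the lowest expected cost (risk-neutral buyer)."""
    return min(options, key=lambda o: expected_cost(o, shortfall_cost))


# Hypothetical figures for illustration only.
full_suite = Option("full commercial suite", tco=100_000, p_shortfall=0.05)
lean_stack = Option("low-cost stack", tco=20_000, p_shortfall=0.30)

for shortfall_cost in (200_000, 500_000):
    choice = best_option([full_suite, lean_stack], shortfall_cost)
    print(f"shortfall cost {shortfall_cost:,}: choose {choice.name}")
```

With these made-up numbers, the low-cost stack wins when a shortfall would cost 200,000 (expected cost 80,000 versus 110,000). The full suite wins when a shortfall would cost 500,000 (125,000 versus 170,000). In other words, it depends.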

As business decisions are made within the constraints of time, information and money, I will try to create a software purchase matrix. It will be based on my knowledge of known software (and its unknown strengths and weaknesses), pricing (versus budgets), and ranges of data handling. I will add in an optimum approach based on known constraints, with flexibility for unknown operational constraints.

I will restrict this matrix to analytics software, though you could certainly extend it to other classes of enterprise software, including big data databases, infrastructure and computing.

Noted Assumptions- 1) I am vendor neutral and do not suffer from subjective bias or affection for particular software (based on conferences, books, relationships,consulting etc)

2) All software have bugs so all need customer support.

3) All software have particular advantages , strengths and weakness in terms of functionality.

4) Cost includes the total cost of ownership and the opportunity cost of business-analytics-enabled decisions.

5) All software marketing people will praise their own software, sometimes over-selling and mis-selling product bundles.

The software compared includes SPSS, KXEN, R, SAS, WPS, Revolution R, SQL Server, and various flavors and subcomponents of these. The optimized approach will take into account parallel programming, cloud computing, hardware costs, and dependent software costs.

To be continued.


Top ten RRReasons R is bad for you?


R stands for the programming language based at www.r-project.org.

R is bad for you because –

1) It is slower with bigger datasets than the SPSS and SAS languages. If you use bigger datasets, you should either consider more hardware or try (and wait on) some of the ODBC connect packages.

2) It takes more time to learn than the SAS language. Much more time, to learn how to do much more.

3) R programmers are paid less than SAS programmers. They prefer it that way. It equates the satisfaction of creating a package in development with a worldwide community with the satisfaction of using a package and earning much more money per hour.

4) It forces you to learn the exact details of what you are doing because of its object-oriented structure. Thus you either get no answer or an exact answer. Your customer pays you by the hour, not by the correct answer.

5) You cannot push a couple of buttons or refer to a list of the top ten most commonly used commands to finish the project.

6) It is free. And open to all. It is socialism expressed in code. Some of the packages are built by university professors. It is free. Free is bad. Who pays the mortgages of the software programmers if all software were free? Who pays for the Friday picnics? Who pays for the Good Night cruises?

7) It is free. Your organization will not commend you for saving them money; they will question why you did not recommend this before, and why you approved all those packages that expire in 2011. R is fReeeeee. Customers feel good while spending money. The more software budgets you approve, the higher your salary. R thReatens all that.

8) It is impossible to install a package you do not need or want. There is no one calling you on the phone to consider one more package or solution. R can make you lonely.

9) R mostly uses the command line. The command line is from the Seventies. Or the Eighties. The GUIs Rcmdr and Rattle are there, but still...

10) R forces you to learn new stuff by the month. You prefer to only earn by the month. Till the day your job gets offshored...

Written by an R user in the English language

(which fortunately was not copyrighted, otherwise we would be paying Britain for each word)


Ajay: The above post was reprinted by personal request. It was written in January 2009 and may not be entirely valid now. It is meant to be taken in good humor, not too seriously.

Amazon EC2 Running Red Hat Enterprise Linux

Available instance types: Standard, Micro, High-Memory, High-CPU, Cluster Compute and Cluster GPU Instances.

