The long tail of the internet

On a whim, I took the all-time stats of my blog posts (more than 1,000 posts) and tried to plot their distribution.

Basically, I copied and pasted all the data into a Google Docs spreadsheet and created dummy codes (like URL1, URL2, … URL500).

Next I  downloaded the….

I wasn't in the mood for downloading and uploading stuff, so I decided to use ggplot2 through Jeroen’s application at

I used the mirror server that Dataspora provides as I have had latency issues with Jeroen’s website.

I got this error while trying to connect the Dataspora app to my Google spreadsheet:

The page you have requested cannot be displayed. Another site was requesting access to your Google Account, but sent a malformed request. Please contact the site that you were trying to use when you received this message to inform them of the error. A detailed error message follows:

The site “” has not been registered.

Oh dear! Back to Jeroen’s/UCLA’s page.

I get this warning, but it still manages to log in:

This website has not registered with Google to establish a secure connection for authorization requests. We recommend that you continue the process only if you trust the following destination:

Wow, it works! That's cloud computing now, so I wonder why Google and Amazon continue to ignore rApache and Jeroen’s cloud app. Surely their Google Fusion Tables can always be improved or tweaked. Not to mention the next-gen version of R, which will have its own server.

Pretty cool screenshot (but click to see more)

I get the following pretty graph. Hadley Wickham would be ashamed of me by now.

What went wrong? Well, one page has 36,000 views. Scale is the key to graphical coherence. So I redo it: delete the home page in the Google spreadsheet, reimport, replot. (I didn't know how to modify data in the cloud app; maybe we need a cloud plyr.) Then I redo it again, as I still have a big outlier: the Top 10 Statistical GUI article (which ironically has only 5 GUIs in it, but hush, don't tell the high-quality search engine).
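The scale problem shows up even without a plot. Here is a minimal sketch in Python (not what I ran in the cloud app; decade_bins is just an illustrative name) that bins view counts by powers of ten, using the first ten counts from the data pasted at the end of this post. On a linear axis the home page alone stretches the range past 36,000, whereas power-of-ten bins keep the rest of the posts readable. In R, something like hist() on the log of the counts would presumably do the same job.

```python
from collections import Counter

def decade_bins(views):
    """Count how many pages fall in each power-of-ten bin (1-9, 10-99, ...)."""
    bins = Counter()
    for v in views:
        if v <= 0:
            continue  # skip pages with no views; they have no decade
        exponent = len(str(int(v))) - 1  # number of digits minus one
        bins[exponent] += 1
    return dict(sorted(bins.items()))

# First ten counts from the pasted data, home page included.
views = [36185, 8264, 2166, 2162, 2118, 1902, 1770, 1446, 1386, 1204]

for exponent, count in decade_bins(views).items():
    low, high = 10 ** exponent, 10 ** (exponent + 1) - 1
    print(f"{low:>6,} to {high:>6,} views: {'#' * count} ({count})")
```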

So, again. Belatedly I discover something called a layer in ggplot2.

The base graphics engine has really spoilt me into writing short functions for plots.

I give up. I rather prefer hist(). I go to my favorite GUI, Rattle, but it has some dating issues with the GTK+ DLL.

So I go to John Fox’s simple GUI. R Commander is the best GUI if you use Occam’s Razor, and I am using Occam’s Chainsaw now.

I get the analysis I want in 12 secs

Summary: ggplot2 is more complicated than the base graphics engine.

The Deducer GUI is not as simple either.

R Commander is the best GUI because it retains simplicity.

Ignore long tail of internet only at your peril

Almost two-thirds of my daily traffic of 400+ comes from old archived content. That is why Search Engine Optimization and Alerts for Keywords are CRITICAL for any poor soul trying to write a blog (which has neither journal-like prestige nor rewards).

If you make life easier for the search engine, it being a fair chap, rewards you well

Existing web traffic estimates like Comscore and Google Trends ignore this long tail
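One quick way to put a number on the long tail is to ask what share of all views comes from pages outside the top few. A minimal sketch in Python, assuming the view counts are already in a list (tail_share is a hypothetical helper, and the sample is only the first ten rows of the data pasted below, so the printed figure says nothing about the whole blog):

```python
def tail_share(views, head_size):
    """Fraction of total views contributed by pages outside the top `head_size`."""
    ranked = sorted(views, reverse=True)
    total = sum(ranked)
    if total == 0:
        return 0.0  # no traffic at all, so no tail to speak of
    return sum(ranked[head_size:]) / total

# First ten counts from the pasted data (home page and the big outlier included).
sample = [36185, 8264, 2166, 2162, 2118, 1902, 1770, 1446, 1386, 1204]
print(f"Share of views outside the top 2 pages: {tail_share(sample, 2):.1%}")
```

Run over all the pasted rows instead of the sample, the same function would show how much of the traffic sits in the tail once the home page and the outlier are set aside.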

Comments are welcome. (The data, 500 rows × 2 columns, is pasted below in case you can come up with a better analysis.)

Since SAS has ignored web analytics and Google Analytics is, hmm hmm, this could also be an area of opportunity for R developers: creating a web analytics package.
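As a starting point for anyone who wants to redo the analysis, the rows pasted below need a little parsing first. Here is a minimal sketch in Python, assuming each row is a title, an optional "More stats" label left over from the stats page, and a comma-grouped view count at the end (parse_row is a hypothetical helper, not part of any package). Rows that do not fit that shape, such as two counts fused on one line, would still need fixing by hand.

```python
import re

# Title, optional "More stats" label, then a comma-grouped count at line end.
ROW = re.compile(r"^(?P<title>.+?)(?:\s+More stats)?\s+(?P<views>\d[\d,]*)$")

def parse_row(line):
    """Return (title, views) for one pasted row, or None if it does not match."""
    match = ROW.match(line.strip())
    if match is None:
        return None
    return match.group("title"), int(match.group("views").replace(",", ""))

rows = [
    "Home page 36,185",
    "Top 10 Graphical User Interfaces in Statistical Software More stats 8,264",
    "Here comes PySpread- 85,899,345 rows and 14,316,555 columns More stats 145",
]
for row in rows:
    print(parse_row(row))
```

Anchoring the count at the end of the line is what keeps numbers inside a title (as in the PySpread row) from being mistaken for the view count.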

Home page 36,185
Top 10 Graphical User Interfaces in Statistical Software More stats 8,264
Matlab-Mathematica-R and GPU Computing More stats 2,166
Wealth = function (numeracy, memory recall) More stats 2,162
The Top Statistical Softwares (GUI) More stats 2,118
About DecisionStats More stats 1,902
Libre Office More stats 1,770
Using Facebook Analytics (Updated) More stats 1,446
Windows Azure vs Amazon EC2 (and Google Storage) More stats 1,386
Interview Hadley Wickham R Project Data Visualization Guru More stats 1,204
Test drive a Chrome notebook. More stats 1,201
Interview Professor John Fox Creator R Commander More stats 1,190
Top ten RRReasons R is bad for you ? More stats 1,178
SAS Institute files first lawsuit against WPS- Episode 1 More stats 1,131
R Package Creating More stats 1,104
Interfaces to R More stats 1,039
Using Red R- R with a Visual Interface More stats 950
Google Maps – Jet Ski across Pacific Ocean More stats 922
Norman Nie: R GUI and More More stats 851
Not so AWkward after all: R GUI RKWard More stats 805
Running R on Amazon EC2 More stats 786
Startups for Geeks More stats 785
Creating a Blog Aggregator for free More stats 749
Cloud Computing with R More stats 676
Rapid Miner- R Extension More stats 671
Parallel Programming using R in Windows More stats 664
Revolution R for Linux More stats 645
Red R 1.8- Pretty GUI More stats 638
John Sall sets JMP 9 free to tango with R More stats 601 More stats 597
Funny Images from India More stats 571
R is an epic fail or is it just overhyped More stats 568
Great article on Notepad++ and R in R Journal More stats 564
Certifications in Analytics and Business Intelligence More stats 548
R Excel :Updated More stats 542
Enterprise Linux rises rapidly:New Report More stats 537
So which software is the best analytical software? Sigh- It depends More stats 520
Funny Photo :It happens only In India More stats 518
Creating 3D Graphs with Data in R More stats 507
SPSS /PASW Certification – Free until Sept 15 More stats 497
Interview :Dr Graham Williams More stats 476
GNU PSPP- The Open Source SPSS More stats 474
Professors and Patches: For a Betterrrr R More stats 467
Running R on Amazon EC2 :Windows More stats 462
WPS response to SAS Lawsuit More stats 458
R language on the GPU More stats 450
KXEN and a Data Mining Survey More stats 449
News on R Commercial Development -Rattle- R Data Mining Tool More stats 449
WPS ( Alternative SAS Language Software) Pricing More stats 447
Kill R? Wait a sec More stats 445
SAS Institute lawsuit against WPS Episode 2 The Clone Wars More stats 442
How to be a BAD blogger? More stats 435
ROC Curve More stats 431
Bulls ,Bears ,Tigers and Asses More stats 424
Trrrouble in land of R…and Open Source Suggestions More stats 422
Interview- BI Dashboards dMINE Sanjay Patel More stats 417
Top Seven Reasons :Why Outsourcing is Bad for India More stats 408
Interviews @Decisionstats More stats 407
Running a R GUI,and parallel programming on Amazon EC2 More stats 394
Unbreakable Oracle Linux- and Unshakable-Libre Office- More stats 393
IBM SPSS 19: Marketing Analytics and RFM More stats 387
Analyzing SAS Institute-WPS Lawsuit More stats 377
Hive Tutorial: Cloud Computing More stats 377
R and Hadoop More stats 374
Graphics Presentations More stats 373
Sector/ Sphere – Faster than Hadoop/Mapreduce at Terasort More stats 370
Benchmarking GNU R: DirkE’s view and a Ninja wishlist More stats 363
Webfocus RStat: Pervasive BI using R More stats 363
Open Source Business Intelligence: Pentaho and Jaspersoft More stats 362
How to do Logistic Regression More stats 362
CommeRcial R- Integration in software More stats 359
So what’s new in R 2.12.0 More stats 357
Interview Michael J. A. Berry Data Miners, Inc More stats 356
Data Mining through the Android More stats 352
Newer version of Alternative SAS / WPS 2.4 launched More stats 350
How to Analyze Wikileaks Data – R SPARQL More stats 348
JMP 9 releasing on Oct 12 More stats 343
The R Online WikiBook More stats 340
Hadley’s tutorials on R Visualization More stats 340
Interview Tasso Argyros CTO Aster Data Systems More stats 339
Parsing XML files easily More stats 337
A Software Called Rattle More stats 335
Which software do we buy? -It depends More stats 329
Jim Goodnight on Open Source- and why he is right -sigh More stats 328
SAS/Blades/Servers/ GPU Benchmarks More stats 326
R Commander Plugins-20 and growing! More stats 324
10 iPhone Apps you can actually use ( and dont have to pay for) More stats 316
R Modeling with huge data More stats 315
The Popularity of Data Analysis Software More stats 315
Interview Donald Farmer Microsoft More stats 307
Learning SAS for free More stats 305
Comparing Base SAS and SPSS More stats 304
Towards better Statistical Interfaces More stats 302
Making NeW R More stats 301
Using Code Snippets in Revolution R More stats 300
R Apache – The next frontier of R Computing More stats 298
Using JMP 9 and R together More stats 297
Doing Time Series using a R GUI More stats 295
Amazon announces Micro Instances for cloud computing More stats 295
Top 5 Free Music Websites More stats 295
Web R- Elastic R and RevoDeploy R More stats 291
R for Stats : Updated More stats 290
Heritage Health Prize- Data Mining Contest for 3mill USD More stats 289
Google AppInventor -Android and Business Intelligence More stats 281
Top R Interviews More stats 278
An Introduction to Data Mining-online book More stats 272
Interview Jim Davis SAS Institute More stats 272
Economic: Indian Caste System -Simplification More stats 271
Rattle Re-Introduced More stats 271
KXEN – Automated Regression Modeling More stats 267
Movie Review- Inglorious Basterds More stats 267
Interview :Doug Savage ,Creator More stats 261
IPSUR – A Free R Textbook More stats 258
SAS with the GUI Enterprise Guide (Updated) More stats 256
Trying out Google Prediction API from R More stats 256
Segmenting Models : When and Why More stats 253
Using R and Excel Together More stats 253
R Oracle Data Mining More stats 253
KNIME More stats 253
Using PostgreSQL and MySQL databases in R 2.12 for Windows More stats 250
Fighting Back -The Net, Social Media, Spam, Identity Theft, Terrorism More stats 249
Libre Office (Beta) 3 Launched More stats 248
India to make own DoS -citing cyber security More stats 247
Interview Dominic Pouzin Data Applied More stats 242
R releases new version R 2.9.2 More stats 240
SAS to launch SAS/IML with R ( updated) More stats 239
Playing with Playwith- R Package for Interactive Data Visualizations More stats 234
Predictive Analytics World Conference More stats 231
Analytics and BI for small biz More stats 231
Interview Jeanne Harris Co-Author -Analytics at Work and Competing with Analytics More stats 230
Using R for Time Series in SAS More stats 228
General Electric ‘s breach of the spirit and letter of integrity More stats 227
Interview Luis Torgo Author Data Mining with R More stats 222
Browser Based Model Creation More stats 222
Interview James Dixon Pentaho More stats 221
Thoughts on WPS, SAS , R More stats 220
Choosing R for business – What to consider? More stats 220
Buying SAS Institute More stats 219
Google: Prediction API and other cool stuff More stats 218
Interview : R For Stata Users More stats 216
Viva Libre Office More stats 216
Top 10 Games on Linux -sudo update More stats 214
When China overtook India- using DEDUCER More stats 214
KDD 2009 : Demos More stats 211
Interview Dean Abbott Abbott Analytics More stats 210
Statistically Speaking More stats 203
Data Visualization using Tableau More stats 203
SAS and JMP : Visual Data Discovery More stats 203
High Performance Computing and R More stats 200
Troubleshooting Rattle Installation- Data Mining R GUI More stats 194
Google Realtime Live Updates on Egypt Yemen Tunisia Jordan.. More stats 192
New Deal in Statistical Training More stats 191
Interview Ken O Connor Business Intelligence Consultant More stats 190
Karmic Koala versus Windows 7 More stats 189
Interview Shawn Kung Sr Director Aster Data More stats 189
Pun on Putin More stats 189
Towards better analytical software More stats 188
Dryad- Microsoft’s answer to MR More stats 188
Analyzing Indian – Chinese Relationships More stats 188
LibreOffice News and Google Musings More stats 186
Special Issue of JSS on R GUIs More stats 184
Using Google Docs for Web Scraping More stats 181
Using Reshape2 for transposing datasets in R More stats 180
IBM Buys Netezza More stats 180
Libreoffice 3.3 released More stats 180
Google moving on from MapReduce: rest of world still catching up More stats 179
Linux= Who did what and how much? More stats 176
Interview Carole Jesse Experienced Analytics Professional More stats 176
HIRE ME More stats 175
Test Drive a Google Chrome Notebook: Last Two Days left More stats 174
Q&A with David Smith, Revolution Analytics. More stats 174
R , Ubuntu, RCmdr Updates More stats 173
Interview KNIME Fabian Dill More stats 173
Big Data and R: New Product Release by Revolution Analytics More stats 173
Automated Content Aggregation More stats 173
R or SAS —– R and SAS ? More stats 170
Graphs More stats 169
How to use Oracle for Data Mining More stats 169
Carolina and SAS More stats 166
Interview John Sall Founder JMP/SAS Institute More stats 165
Aster Data hires Quentin Gallivan as CEO More stats 165
Oracle for possible takeover of REvolution Computing More stats 164
The Best and Worst Graphs Ever More stats 163
Statistical Analysis with R- by John M Quick More stats 163
Growing Rapidly: Rapid Miner 4.5 More stats 161
SAP and BI on Demand More stats 161
Google Snappy More stats 161
Google Refine More stats 161
Scoring SAS and SPSS Models in the cloud More stats 159
Hey Professor, I am not a Monkey More stats 157
REVolution Computing fails to create a Revolution More stats 156
SAS Lawsuit against WPS- Application Dismissed More stats 156
KDNuggets Poll on SAS: Churn in Analytics Users More stats 154
SAS Early Days More stats 154
Interview James Taylor Decision Management Expert (Updated) More stats 151
Google Books Ngram Viewer More stats 148
Review – R for SAS and SPSS Users More stats 148
New R Journal Edition More stats 146
Here comes PySpread- 85,899,345 rows and 14,316,555 columns More stats 145
Interview Karl Rexer -Rexer Analytics More stats 144
Poem: The Extroverted Engineer More stats 144
Hearst DataMining Challenge More stats 144
This Is It More stats 142
Interview Timo Elliott SAP More stats 141
The Blind Side – Movie Review More stats 141
Data Mining Survey Results :Tools and Offshoring More stats 140
Going Deap : Algols in Python More stats 140
ADVERTISE More stats 139
Interview Jeff Bass, Bass Institute (Part 2) More stats 139
Interview Jim Harris Data Quality Expert OCDQ Blog More stats 139
Do Monkeys Pay for Sex? More stats 138
Privacy Browsing Extensions in Google Chrome More stats 137
China biggest threat to Indian Software in 5 years: Indian Tech CEO More stats 136
Software HIStory: Bass Institute Part 1 More stats 135
Grenier’s Theory for Competitiveness More stats 134
Interview Charlie Berger Oracle Data Mining More stats 134
Karmic Koala Ubuntu/Linux 9.2 Preview More stats 133
Analytics and Journals More stats 133
Using Code Editors in R More stats 132
Interview Stephanie McReynolds Director Product Marketing, AsterData More stats 132
Amcharts- Cool Charts Web Editor More stats 130
Mapreduce Book More stats 128
Interesting R competition at Reddit More stats 127
Color of Statistics More stats 127
Amazon goes free for users next month More stats 127
#3443 (deleted) More stats 127
Interview Sarah Blow – Girly Geekdom Founder More stats 126
Social Network Analysis: Using R More stats 126
Interview Thomas C. Redman Author Data Driven More stats 126
Audio Interview Anne Milley , Part 1 More stats 124
Advanced Analytics on Multi-Terabyte Datasets- Conferences More stats 123
Geek Humour More stats 123
John M. Chambers Statistical Software Award – 2011 More stats 122
My friend -The Computer More stats 120
M2009 Interview Peter Pawlowski AsterData More stats 118
R Journal Dec 2010 and R for Business Analytics More stats 118
Top ten RRReasons R is bad for you ? More stats 116
Interview Michael Zeller,CEO Zementis on PMML More stats 115
Fast R Graphics More stats 114
New Google Ad Planner More stats 114
Making Sense: Hadoop and MapReduce More stats 114
Using SAS/IML with R More stats 114
Facebook App by SAP Crystal Reports More stats 113
Whats behind that pretty SAS Blog? More stats 113
Interview Alison Bolen More stats 113
Ajay @ arts More stats 112
My latest creation More stats 112
Indian Crabs – A story More stats 112
Open Source’s worst enemy is itself not Microsoft/SAS/SAP/Oracle More stats 112
Google Cloud Print -print documents from the internet More stats 111
WPS and SAS- A rah-rah comparison More stats 110
Facebook Gmail Killer Threatens to commit Hara Kari live on AOL Techcrunch if unsucessful More stats 110
Open Source and Software Strategy More stats 109
Windows Azure and Amazon Free offer More stats 108
R for Analytics is now live More stats 108
Open Source Compiler for SAS language/ GNU -DAP More stats 107
Using Chromium /Chrome on Ubuntu Linux More stats 107
Interview John Moore CTO, Swimfish More stats 106
Nice BI Tutorials More stats 106
Creating Customized Packages in SAS Software More stats 106
Business Analytics Analyst Relations /Ethics/White Papers More stats 105
Web Crawling Automation More stats 105
The SAS-WPS Lawsuit- Preliminary Hearing More stats 105
Handling time and date in R More stats 105
KXEN Update More stats 104
MapReduce Analytics Apps- AsterData’s Developer Express Plugin More stats 104
+ 1 your website -updated More stats 103
Movie Review- Peepli Live More stats 103
Better Data Visualization in Stats More stats 102
Customizing your R software startup More stats 102
LibreOffice Beta 2 (Office Fork off Oracle) launches! More stats 102
KXEN Case Studies : Financial Sector More stats 102
Deleting Twitter, Facebook,LinkedIn- Accepting Life More stats 102
Google Street View shows Gladiators fighting More stats 101
Carole-Ann’s 2011 Predictions for Decision Management More stats 101
Amazon goes HPC and GPU: Dirk E to revise his R HPC book More stats 101
Happy Thanksgiving Id More stats 101
Interview Phil Rack WPS Consultant and Developer More stats 100
SPSS launches two more PASWs More stats 99
Interview David Smith REvolution Computing More stats 99
Data Mining with R More stats 97
Dataset too big for R ? More stats 97
How Jesus saved my Butt More stats 97
Interview Evan Levy Baseline Consulting More stats 97
The Latest GUI for R- BioR More stats 96
WPS Version 2.5.1 Released – can still run SAS language/data and R More stats 96
SAS legal falls flat against WPS again: Technical Grounds More stats 95
World Programming System:300 pounds for The power of SAS language More stats 94
KNIME and Zementis shake hands More stats 93
Interview Eric Siegel, Phd President Prediction Impact More stats 93
Interview Sarah Burnett BI Analyst,Ovum group More stats 92
Quantifying Analytics ROI More stats 92
PSPP – SPSS ‘s Open Source Counterpart More stats 91
PySpread Magic More stats 91
Interview SPSS Olivier Jouve More stats 91
Interesting Data Visualization:Friendwheels More stats 91
R on Windows HPC Server More stats 90
The declining market for Telecommunication Churn Models More stats 90
Getting Inside R More stats 90
The Big Data Summit Agenda More stats 90
Review: Clash of the Titans More stats 89
Red Hat worth 7.8 Billion now More stats 89
Movie Review : Rajneeti (Politics) More stats 89
3 Idiots: Insight to Indian Engineer Campus Life More stats 89
The Comic Water Games (aka Common Wealth Games) More stats 88
Computer Education grants from Google More stats 88
Challenges of Analyzing a dataset (with R) More stats 87
Input Data in R using the top 3 R GUI More stats 86
Complex Event Processing- SASE Language More stats 85
Interview with Anne Milley, SAS II More stats 85
Data Mining Presentation at M2009 by Dr Vincent Granville More stats 85
Brief Interview Timo Elliott More stats 85
Mapping Health Statistics at More stats 85
Amazon’s Turks More stats 84
Business Intelligence and Stat Computing: The White Man’s Last Stand More stats 84
Movie Review- Dabangg More stats 84
Movie Review: Sherlock Holmes More stats 84
SAS Data Mining 2009 Las Vegas More stats 83
Chinese Fortune Cookies More stats 83
SPSS and R More stats 83
Manjunath- A Batchmate on my mine More stats 82
Data Mining 2010:SAS Conference in Vegas More stats 81
DirkE and JD swoon about Shane’s MOM in Room 106 while writing R code More stats 81
SAS to R Challenge: Unique benchmarking More stats 81
Pentaho and R: working together More stats 81
Interview John F Moore CEO The Lab More stats 80
Ways to use both Windows and Linux together More stats 80
Brief Interview with James G Kobielus More stats 80
For R Writers- Inside R More stats 79
Using Ipod and Iphone with your Ubuntu Laptop More stats 79
Webcasts: Oracle Data Mining More stats 79
The Cloud OS is finally here or is it?: Karmic Koala More stats 79
Movie Review: Lafangey Parinday (Rouge Birds) More stats 79
SAS announcement in education initiatives More stats 78
Using R from within Python More stats 78
Event: Predictive analytics with R, PMML and ADAPA More stats 78
Interesting R and BI Web Event More stats 78
Bruno Aziza, Microsoft Global BI Lead joins PAW Keynote More stats 77
Common Analytical Tasks More stats 77
RWui :Creating R Web Interfaces on the go More stats 77
R Successor Language ‘Tea’ announced More stats 76
Learning SPSS for SAS users More stats 76
Protovis a graphical toolkit for visualization More stats 76
Interview Paul van Eikeren Inference for R More stats 75
Data Visualization: Central Banks More stats 75
Oracle Data Mining 11 G R2 More stats 75
Interview Peter J Thomas -Award Winning BI Expert More stats 75
Weak Security in Internet Databases for Statisticians More stats 74
Open Source Cartoon More stats 74
Top Ten Graphs for Business Analytics -Pie Charts (1/10) More stats 74
SAS Sentiment Analysis wins Award More stats 74
JMP Genomics 5 released More stats 74
Short Interview Jill Dyche More stats 73
Interview David Katz ,Dataspora /David Katz Consulting More stats 73
PMML 4.0 More stats 73
Ponder This: IBM Research More stats 72
PAW Videos More stats 71
PASW 13 :The preview More stats 71
Cisco SocialMiner More stats 70
Review-The Dark knight More stats 70
MapReduce Patent Granted More stats 70
Cloud Computing and GPU ( and some stats softwares) More stats 70
IBM Business Analytics Forum More stats 70
And now- The Business Analytics Summit More stats 70
Creating an Anonymous Bot More stats 69
R and SAS in Twitter Land More stats 69
Interview:Richard Schultz , CEO REvolution Computing More stats 69
China -United States -The Third Opium War More stats 68
Quick-R and More stats 68
R Node- and other Web Interfaces to R More stats 68
Life Mojo – A Health Startup More stats 68
Using Views in R and comparing functions across multiple packages More stats 68
Another R Tutorial More stats 67
Interview Karen Lopez Data Modeling Expert More stats 67
QGIS and R More stats 66
Christmas Carol: The Best Software (BI-Stats-Analytics) More stats 66
Software Lawsuits :Ergo More stats 66
STEM is cool More stats 65
Date Night More stats 65
More Advanced SAS Modeling Procs More stats 65
The Big Data Event- Why am I here? More stats 65
Interview Gary Cokins SAS Institute More stats 65
Browser based Music Creation More stats 64
Interview Steve Sarsfield Author The Data Governance Imperative More stats 63
GrapheR More stats 63
Google Web Intelligence (Beta) More stats 61
Data Mining 2009 Interviews- Terry Whitlock, BlueCross BlueShield of TN More stats 60
Audio Interviews -Dr. Colleen McCue National Security Expert More stats 60
Red R- A new beginning More stats 59
YouTube Features: Audio Swap, Mobile posts and Themes More stats 59
R for Predictive Modeling:Workshop More stats 59
KDD2009: Papers Research and Industrial More stats 58
Chapman/Hall announces new series on R More stats 58
Data Visualization and Politics More stats 58
T Shirts Design More stats 58
Jump to JMP: Using Data Analysis in a visual manner More stats 58
Aster Analytics and More stats 57
OK Cupid Data Visualization- Flow Chart to your Heart More stats 57
R for SAS and SPSS Users More stats 57
Carbon Footprints in the snow More stats 57
Summer School on Uncertainty Quantification More stats 57
High Performance Computing within R: Tutorial More stats 57
Running Stats Softwares on Clouds More stats 57
Amazing Data Visualization- UN Counter Terrorism More stats 56
Cloud MapReduce More stats 56
Statistical Features in WPS More stats 56
An R Package only for SAS Users More stats 56
R is Ready for Business™ More stats 55
A Google App for Sales- ERPLY More stats 55
Rexer Analytics Annual Data Miner Survey More stats 55
Cartoons on R More stats 55
American Decline- Why outsourcing doesnt make sense More stats 55
Friday Cartoon Series- New More stats 55
What softwares do you plan to use/learn in the next one year? More stats 54
Great App for Online Sketching More stats 54
September Roundup by Revolution More stats 54
Using Firesheep on Campus, Caltrain and beyond More stats 54
Decisionstats Interview at Big Data Summit, AsterData More stats 53
Learning Hadoop More stats 53
The White Man’s Burden-Poem More stats 53
Curt Monash on Analytics with MapReduce More stats 53
To R or Not to R : Data Mining and CRM for Free More stats 52
Algorithms and Ads: No Free Lunches and Hill Climbing More stats 52
Interview: Roger Haddad, Founder of KXEN Automated Modeling Software More stats 52
Google and Me on Privacy and Openness More stats 52 More stats 52
Why do bloggers blog ? More stats 52
Live Streaming for Free : UStream More stats 51
Light Cycle of Tron review More stats 51
Lyx Releases 2 More stats 51
Interview – Anne Milley, SAS Part 1 More stats 51
SAS News More stats 51
KXEN EMEA User Conference 2010-Success in Business Analytics More stats 51
2011 Forecast-ying More stats 51
Kill Analytics More stats 50
Social Media Analysis Toolkit More stats 50
Multi State Models More stats 50
R and Cloud Computing More stats 50
Dataists shake up R community with a rocking contest More stats 50
Interview Anne Milley JMP More stats 49
Movie Review: Between the Folds More stats 49
Jokes in Economics More stats 49
Interview Ajay Ohri with DMR More stats 49
One more Y Tube Video More stats 49
Happy Diwali /Google Music More stats 48
SPSS Directions : Rexer Survey Results More stats 48
Redlining in Internet Access and notes on Regression Models More stats 48
Poem : A Poets Life More stats 48
Predictive Analytics World More stats 48
Interview- Phil Rack More stats 48
Building KXEN Models on Ubuntu More stats 48
New Year Resolution Presentation More stats 48
Adobe gulps Omniture More stats 47
SAS Modeling Procs More stats 47
Oracle Open World/ RODM package More stats 47
KDNuggets Survey on R More stats 47
IBM and Revolution team to create new in-database R More stats 47
SAS Institute invests in R project More stats 46
Not just a Cloud More stats 46
New Version of R released: R 2.10.1 More stats 46
Review- Iron Man2 More stats 46
Online Analytics: Monte Carlo Simulation More stats 45
Predictive Forecasting in Commercial Applications More stats 45
The Race -by D.H Groberg More stats 45
SAS Scoring Accelerators More stats 45
IBM launches Smart Analytics Cloud More stats 45
Reactions to IBM -SPSS takeover. More stats 45
Zementis partners with R Analytics Vendor- Revo More stats 44
A Missing Mandelbrot Who Dun It More stats 44
Downloading your Facebook Photos More stats 44
Android Tutorial More stats 44
The Mommy Track More stats 44
My First You Tube Video: Courtesy the competiton on VOLNIGHT by Univ of Tennessee More stats 44
Born in the USA? More stats 43
Interview Eric A. King President The Modeling Agency More stats 43
Interview Augusto Albeghi (Straycat) —Founder Straysoft More stats 43
Why Cloud? More stats 43
Innovative ways of Calculus: Gifting a comic set for Christmas More stats 43
To find the best chaat or paan shop More stats 43
Google unleashes Fusion Tables More stats 42
Using SAS and C/C++ together More stats 42
Whats new in the latest version of R More stats 42
Bollywood 101 More stats 42
Who will forecast for the forecasters? More stats 42
Learning R Easily :Two GUI’s More stats 41
Harvard DropOut Writes Open Letter- His Startup has 350m users More stats 41
BI Software More stats 41
How to read blogs in Indonesian and Chinese! More stats 41
Window to a Blue Cloud: Azure Pricing More stats 41
China bans Chinese Food for Googleplex More stats 41
SAS Program for Students More stats 41
The Year 2010 More stats 40
What do you want to know in data analytics? More stats 40
America’s Data Book: Census Abstract 2011 More stats 40
Big Data Management and Advanced Analytics More stats 40
AsterData partners with Tableau More stats 40
Using R from other Software More stats 40
SAS on Fraud More stats 40

Interviews with R Community



Interview Luis Torgo Author Data Mining with R

John Fox, R Commander

Interview Dr Graham Williams RATTLE GUI

Hadley Wickham

R for SAS and SPSS Users

R for Stata Users

R Consulting

Interview David Katz ,Dataspora /David Katz Consulting

Case Study


Room: Salon 5 & 6
4:45pm – 5:05pm

Track 2: Social Data and Telecom 
Case Study: Major North American Telecom
Social Networking Data for Churn Analysis

A North American Telecom found that it had a window into social contacts – who has been calling whom on its network. This data proved to be predictive of churn. Using SQL, and GAM in R, we explored how to use this data to improve the identification of likely churners. We will present many dimensions of the lessons learned on this engagement.

Speaker: David Katz, Senior Analyst, Dataspora, and President, David Katz Consulting

Q&A with David Smith, Revolution Analytics

Inference for R

David Smith Revolution Computing

Richard Schultz Revolution Computing

Karime Chine, Elastic R

Chapman/Hall announces new series on R

R Authors get more choice and variety now-
We are pleased to announce the launch of a new series of books on R. 

Chapman & Hall/CRC: The R Series

Aims and Scope
This book series reflects the recent rapid growth in the development and 
application of R, the programming language and software environment for 
statistical computing and graphics. R is now widely used in academic research, 
education, and industry. It is constantly growing, with new versions of the 
core software released regularly and more than 2,600 packages available. It is 
difficult for the documentation to keep pace with the expansion of the 
software, and this vital book series provides a forum for the publication of 
books covering many aspects of the development and application of R.

The scope of the series is wide, covering three main threads:
• Applications of R to specific disciplines such as biology, epidemiology, 
genetics, engineering, finance, and the social sciences.
• Using R for the study of topics of statistical methodology, such as linear 
and mixed modeling, time series, Bayesian methods, and missing data.
• The development of R, including programming, building packages, and graphics.

The books will appeal to programmers and developers of R software, as well as 
applied statisticians and data analysts in many fields. The books will feature 
detailed worked examples and R code fully integrated into the text, ensuring 
their usefulness to researchers, practitioners and students.

Series Editors
John M. Chambers (Department of Statistics, Stanford University, USA)
Torsten Hothorn (Institut für Statistik, Ludwig-Maximilians-Universität, München, Germany)
Duncan Temple Lang (Department of Statistics, University of California, Davis, USA)
Hadley Wickham (Department of Statistics, Rice University, Houston, Texas, USA)

Call for Proposals
We are interested in books covering all aspects of the development and 
application of R software. If you have an idea for a book, please contact one 
of the series editors above or one of the Chapman & Hall/CRC statistics 
acquisitions editors below. Please provide brief details of topic, audience, 
aims and scope, and include an outline if possible.

We look forward to hearing from you.

Best regards,
Rob Calver
David Grubbs
John Kimmel


R Journal Dec 2010 and R for Business Analytics


I almost missed out on the R Journal for this month. Great reading, and I liked Dr Hadley’s article on the stringr package the best. A really, really useful package, and nice writing too.

(Incidentally, I just downloaded a local copy of his ggplot website. I aim to really read that one up.)

Okay, announcement time

I just signed a contract with Springer for a book on R, sometime in the first half of 2011:

"R for Business Analytics"

It is going to take more of a business analytics than a stats perspective (I am an MBA / Mech Engineer),

and the use cases would be business analytics cases. Do write to me if you need help doing some analytics in R (business use cases) or want something featured. The big focus would be on GUIs and easier analytics, using the Einsteinian principle of making things as simple as possible but no simpler.

The Year 2010

My annual traffic to this blog was almost 99,000 views. Add in additional views on networking sites plus the 400-plus RSS readers, and I can say traffic was about 120,000 for 2010. Nice. Thanks for reading, and I hope it was worth your time. (This is a long post and will take almost 440 seconds to read, but the summary has just been given.)

My intent is either to inform you, to give you something useful, or at least to give you something interesting.

see below-

Month (2010) Views
Jan 6,311
Feb 4,701
Mar 4,922
Apr 5,463
May 6,493
Jun 4,271
Jul 5,041
Aug 5,403
Sep 17,913
Oct 16,430
Nov 11,723
Dec 10,096
Total 98,767
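As a quick sanity check on the monthly figures above, here is a minimal Python sketch that re-adds them and works out how much of the year's traffic came in the last four months. The variable names are my own; only the numbers come from the stats table.

```python
# Monthly view counts for 2010, copied from the stats table above.
monthly_views = {
    "Jan": 6311, "Feb": 4701, "Mar": 4922, "Apr": 5463,
    "May": 6493, "Jun": 4271, "Jul": 5041, "Aug": 5403,
    "Sep": 17913, "Oct": 16430, "Nov": 11723, "Dec": 10096,
}

# Re-add the months to confirm the reported annual total.
total = sum(monthly_views.values())

# Traffic in the four busiest months (Sep to Dec) and its share of the year.
last_four = sum(monthly_views[m] for m in ("Sep", "Oct", "Nov", "Dec"))
share = last_four / total

print(f"Total views: {total:,}")
print(f"Sep-Dec views: {last_four:,} ({share:.1%} of the year)")
```

The months do add up to the reported total, and more than half of the year's views arrived between September and December.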



Sandro Saita just named me for an award on his blog (but my surname is OhRi; Sandro left me without an R. What would I be without R? :)) ).

Aw! I am touched. Google for “Data Mining Blog”: Sandro is the best there is in data mining writing.

“DMR People Award 2010
There are a lot of active people in the field of data mining. You can discuss with them on forums. You can read their blogs. You can also meet them in events such as PAW or KDD. Among the people I follow on a regular basis, I have elected:

Ajay Ori

He has been very active in 2010, especially on his blog. Good work Ajay and continue sharing your experience with us!”

What did I write in 2010? Stuff.

What did you read on this blog? Well, that's the top posts list.

2009-12-31 to Today

Title Views
Home page 21,150
Top 10 Graphical User Interfaces in Statistical Software 6,237
Wealth = function (numeracy, memory recall) 2,014
Matlab-Mathematica-R and GPU Computing 1,946
The Top Statistical Softwares (GUI) 1,405
About DecisionStats 1,352
Using Facebook Analytics (Updated) 1,313
Test drive a Chrome notebook. 1,170
Top ten RRReasons R is bad for you ? 1,157
Libre Office 1,151
Interview Hadley Wickham R Project Data Visualization Guru 1,007
Using Red R- R with a Visual Interface 854
SAS Institute files first lawsuit against WPS- Episode 1 790
Interview Professor John Fox Creator R Commander 764
R Package Creating 754
Windows Azure vs Amazon EC2 (and Google Storage) 726
Norman Nie: R GUI and More 716
Startups for Geeks 682
Google Maps – Jet Ski across Pacific Ocean 670
Not so AWkward after all: R GUI RKWard 579
Red R 1.8- Pretty GUI 570
Parallel Programming using R in Windows 569
R is an epic fail or is it just overhyped 559
Enterprise Linux rises rapidly:New Report 537
Rapid Miner- R Extension 518
Creating a Blog Aggregator for free 504
So which software is the best analytical software? Sigh- It depends 473
Revolution R for Linux 465
John Sall sets JMP 9 free to tango with R 460

So how do people come here?

Well, I guess I owe Tal G for almost 9,000 views. (Incidentally, I withdrew my blog from R-Bloggers and the Analyticbridge blogs, due to SEO keyword reasons and some spam I was getting; see below.) It is still the cat's whiskers and I read it a lot.

I still don't know who linked my blog to a free sex movie site with 400 views, but I have a few suspects.

2009-12-31 to Today

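Referrer Views
[referrer missing] 9,131
Reddit 3,829
[referrer missing] 1,500
Twitter 1,254
Google Reader 1,215
[referrer missing] 717
[referrer missing] 422
[referrer missing] 341
Google 327
[referrer missing] 322
Facebook 317
[referrer missing] 298
[referrer missing] 278
[referrer missing] 185
[referrer missing] 151
[referrer missing] 130
[referrer missing] 124
[referrer missing] 119
[referrer missing] 117
[referrer missing] 108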

Still reading this post? Gosh, let me sell you some advertising. It is only $100 a month (yes, it's a recession).

Advertisers are treated on a First In, Last Out (FILO) basis.

I have been told I am obsessed with SEO, but I don't care much for search engines apart from Google, and yes, SEO is an interesting science (they should really rename it GEO, or Google Engine Optimization).

Apparently Hadley Wickham and Donald Farmer are big keywords for me, so I should be more respectful, I guess.

Search Terms for 365 days ending 2010-12-31 (Summarized)

2009-12-31 to Today

Search Views
libre office 925
facebook analytics 798
test drive a chrome notebook 467
test drive a chrome notebook. 215
r gui 203
data mining 163
wps sas lawsuit 158
[search term missing] 133
wps sas 123
google maps jet ski 123
test drive chrome notebook 96
sas wps 89
sas wps lawsuit 85
chrome notebook test drive 83
decision stats 83
best statistics software 74
hadley wickham 72
google maps jetski 72
libreoffice 70
doug savage 65
hive tutorial 58
funny india 56
spss certification 52
donald farmer microsoft 51
best statistical software 49

What about outgoing links? Apparently I need to find a way to ask Google to pay me for the free advertising I gave their Chrome notebook launch. But since their search engine and browser are free to me, I guess we are even steven.

Clicks for 365 days ending 2010-12-31 (Summarized)

2009-12-31 to Today

URL Clicks
[URLs missing] 378, 355, 319, 313, 228, 199, 162, 148, 138, 138, 116, 114, 108, 107, 104, 100, 96, 92, 92, 92, 88, 87

so in 2010,

SAS remained top daddy in business analytics,

R made revolutionary strides in terms of new packages,

JMP  launched a new version,

SPSS got integrated with Cognos,

Oracle sued Google and did build a great Data Mining GUI,

Libre Office gave you a non-Oracle Open Office (or an even more open office).

2011 looks like a fun year. Have safe partying.

Who searches for this Blog?

Using WP- Stats I set about answering this question-

What search keywords lead here-

Clearly Michael Jackson is down this year

And R GUI, Data Mining is up.

How does that affect my writing, given that I get almost 250 visitors daily from search engines alone, even assuming I write nothing on this blog from now on?

It doesn't. I still write whatever code or poem comes to my mind. So it is hurtful when people misunderestimate the effort in writing and jump to conclusions (especially if I write about a company: I am not on the payroll of that company, just as when I write a poem I am not a full-time poet).

Over to xkcd

All Time (for

Search Views
libre office 818
facebook analytics 806
michael jackson history 240
wps sas lawsuit 180
r gui 168
wps sas 154
[search term missing] 118
sas wps 116
decision stats 110
sas wps lawsuit 100
google maps jet ski 94
data mining 88
doug savage 72
hive tutorial 63
spss certification 63
hadley wickham 63
google maps jetski 62
sas sues wps 60
decisionstats 58
donald farmer microsoft 45
libreoffice 44
wps statistics 44
best statistics software 42
r gui ubuntu 41
rstat 37
tamilnadu advanced technical training institute tatti 37


2009-11-24 to Today

Search Views
libre office 818
facebook analytics 781
wps sas lawsuit 170
r gui 164
wps sas 125
[search term missing] 118
sas wps 101
sas wps lawsuit 95
google maps jet ski 94
data mining 86
decision stats 82
doug savage 63
hadley wickham 63
google maps jetski 62
hive tutorial 56
donald farmer microsoft 45

Top R Interviews



Here is a list of the Top R Related Interviews I have done (in random order)-

1) John Fox , Creator of R Commander

2) Dr Graham Williams, Creator of Rattle

3) David Smith, back when he was Community Director of what was then Revolution Computing.

and his second interview

4) Robert Schultz, the first CEO of Revolution Computing (now Analytics)

5) Bob Muenchen, author of R for SAS and SPSS Users and R for Stata Users

6) Karim Chine, creator of Biocep, Cloud Computing for R

7) Paul van Eikeren, Inference for R, the first enterprise package to use R from within MS Office.

8) Hadley Wickham, creator of GGPlot and R author

That's a lot of R interviews. I need to balance them out a bit, I guess.