The long tail of the internet

On a whim, I took the all-time stats for my blog posts (more than 1,000 posts) and tried to plot their distribution.

Basically, I copied and pasted all the data into a Google Docs spreadsheet and created dummy codes (like URL1, URL2, ... URL500).

Next I  downloaded the….

I wasn't in the mood for downloading and uploading stuff, so I decided to use ggplot through Jeroen’s application at http://www.stat.ucla.edu/~jeroen/

I used the mirror server that Dataspora provides, as I have had latency issues with Jeroen’s website.

I got this error while trying to connect the Dataspora app to my Google spreadsheet:

The page you have requested cannot be displayed. Another site was requesting access to your Google Account, but sent a malformed request. Please contact the site that you were trying to use when you received this message to inform them of the error. A detailed error message follows:

The site “http://dataspora.com” has not been registered.

Oh dear! Back to Jeroen’s (UCLA’s) page.

http://rweb.stat.ucla.edu/ggplot2/

I get this warning, but it still manages to log in:

This website has not registered with Google to establish a secure connection for authorization requests. We recommend that you continue the process only if you trust the following destination:

http://rweb.stat.ucla.edu/R/googleLogin?domain=rweb.stat.ucla.edu

Wow, it works! That's cloud computing now, so I wonder why Google and Amazon continue to ignore rApache and Jeroen’s cloud app. Surely Google Fusion Tables can always be improved or tweaked. Not to mention the next-gen version of R, which will have its own server.

Pretty cool screenshot (but click to see more)

I get the following pretty graph. Hadley Wickham would be ashamed of me by now.

What went wrong? Well, one page (the home page) has 36,000 views. Scale is the key to graphical coherence. So I redo it: delete the home page row in the Google spreadsheet, reimport, replot. (I didn't know how to modify data in the cloud app; maybe we need a cloud plyr.) Then I redo it again, as I still have a big outlier: the Top 10 Statistical GUI article (which, ironically, has only 5 GUIs in it, but hush, don't tell the high-quality search engine).
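To make the scale problem concrete, here is a minimal sketch in plain Python (standard library only, not the ggplot app I actually used). The view counts are a handful taken from the table further down, and the helper names `linear_bins` and `decade_bins` are made up for illustration. It compares equal-width bins with and without the two outliers, and shows log-scale (decade) bins as an alternative to deleting rows.

```python
import math
from collections import Counter

# A handful of view counts from the table in this post,
# including the two outliers (home page and the Top 10 GUI article).
views = [36185, 8264, 2166, 2162, 2118, 1902, 1770, 1446, 1386,
         1204, 950, 645, 507, 301, 203, 101, 50, 40]


def linear_bins(values, n_bins=10):
    """Count values in n_bins equal-width bins spanning 0..max(values)."""
    width = max(values) / n_bins
    counts = [0] * n_bins
    for v in values:
        # The maximum lands exactly on the upper edge, so clamp it into the last bin.
        idx = min(int(v / width), n_bins - 1)
        counts[idx] += 1
    return counts


def decade_bins(values):
    """Count values per power of ten (1 = tens, 2 = hundreds, 3 = thousands, ...)."""
    return Counter(int(math.log10(v)) for v in values)


# With the outliers, nearly every page is squashed into the first bin.
with_outliers = linear_bins(views)

# Dropping the two largest values spreads the rest across the bins.
trimmed = sorted(views, reverse=True)[2:]
without_outliers = linear_bins(trimmed)

# A log scale keeps every row and still separates the groups.
by_decade = decade_bins(views)

print(with_outliers)
print(without_outliers)
print(sorted(by_decade.items()))
```

Deleting rows and switching to a log scale are two different fixes: the first throws data away, the second keeps the outliers visible without letting them flatten everything else.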

So, again. Belatedly, I discover something called a layer in ggplot.

The base graphics engine has really spoilt me into writing short functions for plots.

I give up. I much prefer hist(). I go to my favorite GUI, Rattle, but it has some dating issues with the GTK+ DLL.

So I go to John Fox’s simple GUI. R Commander is the best GUI if you use Occam’s Razor, and I am using Occam’s Chainsaw now.

I get the analysis I want in 12 seconds.


Summary: ggplot is more complicated than the base graphics engine.

The Deducer GUI is not as simple either.

R Commander is the best GUI because it retains simplicity.

Ignore the long tail of the internet only at your peril.

Almost two-thirds of my daily traffic of 400+ comes from old archived content. That is why search engine optimization and keyword alerts are CRITICAL for any poor soul trying to write a blog (which has neither journal-like prestige nor rewards).

If you make life easier for the search engine, it, being a fair chap, rewards you well.

Existing web traffic estimates like comScore and Google Trends ignore this long tail.

Comments are welcome. (The data, 500 rows × 2 columns, is pasted below in case you can come up with a better analysis.)
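For anyone who wants to take up that offer, here is a rough starting point in Python (standard library only). It assumes each pasted row looks like "title, optional 'More stats' link text, view count with commas", which is how the rows appear in this post. The names `parse_row` and `tail_share` are hypothetical helpers, not part of any package.

```python
import re

# Title, then an optional "More stats" link label, then the view count at the end.
ROW = re.compile(r"^(?P<title>.+?)(?:\s+More stats)?\s+(?P<views>\d[\d,]*)$")


def parse_row(line):
    """Return (title, views) for a data row, or None for headers and blank lines."""
    match = ROW.match(line.strip())
    if match is None:
        return None
    return match.group("title"), int(match.group("views").replace(",", ""))


def tail_share(rows, head=10):
    """Fraction of all views that comes from pages outside the top `head` pages."""
    counts = sorted((views for _, views in rows), reverse=True)
    total = sum(counts)
    return sum(counts[head:]) / total if total else 0.0


# A few lines copied from the table below, including the two header lines.
sample = [
    "Title",
    "Views",
    "Home page 36,185",
    "Top 10 Graphical User Interfaces in Statistical Software More stats 8,264",
    "Here comes PySpread- 85,899,345 rows and 14,316,555 columns More stats 145",
    "SAS on Fraud More stats 40",
]

rows = [parsed for parsed in map(parse_row, sample) if parsed is not None]
for title, views in rows:
    print(views, title)
print(tail_share(rows, head=1))
```

Run over the full table, `tail_share` with different values of `head` gives one simple measure of how much of the traffic sits in the long tail. Only the last number on each line is treated as the view count, so titles that themselves contain numbers are left intact.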

Since SAS has ignored web analytics and Google Analytics is, hmm hmm, this could also be an area of opportunity for R developers to create a web analytics package.

Title
Views
Home page 36,185
Top 10 Graphical User Interfaces in Statistical Software More stats 8,264
Matlab-Mathematica-R and GPU Computing More stats 2,166
Wealth = function (numeracy, memory recall) More stats 2,162
The Top Statistical Softwares (GUI) More stats 2,118
About DecisionStats More stats 1,902
Libre Office More stats 1,770
Using Facebook Analytics (Updated) More stats 1,446
Windows Azure vs Amazon EC2 (and Google Storage) More stats 1,386
Interview Hadley Wickham R Project Data Visualization Guru More stats 1,204
Test drive a Chrome notebook. More stats 1,201
Interview Professor John Fox Creator R Commander More stats 1,190
Top ten RRReasons R is bad for you ? More stats 1,178
SAS Institute files first lawsuit against WPS- Episode 1 More stats 1,131
R Package Creating More stats 1,104
Interfaces to R More stats 1,039
Using Red R- R with a Visual Interface More stats 950
Google Maps – Jet Ski across Pacific Ocean More stats 922
Norman Nie: R GUI and More More stats 851
Not so AWkward after all: R GUI RKWard More stats 805
Running R on Amazon EC2 More stats 786
Startups for Geeks More stats 785
Creating a Blog Aggregator for free More stats 749
Cloud Computing with R More stats 676
Rapid Miner- R Extension More stats 671
Parallel Programming using R in Windows More stats 664
Revolution R for Linux More stats 645
Red R 1.8- Pretty GUI More stats 638
John Sall sets JMP 9 free to tango with R More stats 601
Wordle.net More stats 597
Funny Images from India More stats 571
R is an epic fail or is it just overhyped More stats 568
Great article on Notepad++ and R in R Journal More stats 564
Certifications in Analytics and Business Intelligence More stats 548
R Excel :Updated More stats 542
Enterprise Linux rises rapidly:New Report More stats 537
So which software is the best analytical software? Sigh- It depends More stats 520
Funny Photo :It happens only In India More stats 518
Creating 3D Graphs with Data in R More stats 507
SPSS /PASW Certification – Free until Sept 15 More stats 497
Interview :Dr Graham Williams More stats 476
GNU PSPP- The Open Source SPSS More stats 474
Professors and Patches: For a Betterrrr R More stats 467
Running R on Amazon EC2 :Windows More stats 462
WPS response to SAS Lawsuit More stats 458
R language on the GPU More stats 450
KXEN and a Data Mining Survey More stats 449
News on R Commercial Development -Rattle- R Data Mining Tool More stats 449
WPS ( Alternative SAS Language Software) Pricing More stats 447
Kill R? Wait a sec More stats 445
SAS Institute lawsuit against WPS Episode 2 The Clone Wars More stats 442
How to be a BAD blogger? More stats 435
ROC Curve More stats 431
Bulls ,Bears ,Tigers and Asses More stats 424
Trrrouble in land of R…and Open Source Suggestions More stats 422
Interview- BI Dashboards dMINE Sanjay Patel More stats 417
Top Seven Reasons :Why Outsourcing is Bad for India More stats 408
Interviews @Decisionstats More stats 407
Running a R GUI,and parallel programming on Amazon EC2 More stats 394
Unbreakable Oracle Linux- and Unshakable-Libre Office- More stats 393
IBM SPSS 19: Marketing Analytics and RFM More stats 387
Analyzing SAS Institute-WPS Lawsuit More stats 377
Hive Tutorial: Cloud Computing More stats 377
R and Hadoop More stats 374
Graphics Presentations More stats 373
Sector/ Sphere – Faster than Hadoop/Mapreduce at Terasort More stats 370
Benchmarking GNU R: DirkE’s view and a Ninja wishlist More stats 363
Webfocus RStat: Pervasive BI using R More stats 363
Open Source Business Intelligence: Pentaho and Jaspersoft More stats 362
How to do Logistic Regression More stats 362
CommeRcial R- Integration in software More stats 359
So what’s new in R 2.12.0 More stats 357
Interview Michael J. A. Berry Data Miners, Inc More stats 356
Data Mining through the Android More stats 352
Newer version of Alternative SAS / WPS 2.4 launched More stats 350
How to Analyze Wikileaks Data – R SPARQL More stats 348
JMP 9 releasing on Oct 12 More stats 343
The R Online WikiBook More stats 340
Hadley’s tutorials on R Visualization More stats 340
Interview Tasso Argyros CTO Aster Data Systems More stats 339
Parsing XML files easily More stats 337
A Software Called Rattle More stats 335
Which software do we buy? -It depends More stats 329
Jim Goodnight on Open Source- and why he is right -sigh More stats 328
SAS/Blades/Servers/ GPU Benchmarks More stats 326
R Commander Plugins-20 and growing! More stats 324
10 iPhone Apps you can actually use ( and dont have to pay for) More stats 316
R Modeling with huge data More stats 315
The Popularity of Data Analysis Software More stats 315
Interview Donald Farmer Microsoft More stats 307
Learning SAS for free More stats 305
Comparing Base SAS and SPSS More stats 304
Towards better Statistical Interfaces More stats 302
Making NeW R More stats 301
Using Code Snippets in Revolution R More stats 300
R Apache – The next frontier of R Computing More stats 298
Using JMP 9 and R together More stats 297
Doing Time Series using a R GUI More stats 295
Amazon announces Micro Instances for cloud computing More stats 295
Top 5 Free Music Websites More stats 295
Web R- Elastic R and RevoDeploy R More stats 291
R for Stats : Updated More stats 290
Heritage Health Prize- Data Mining Contest for 3mill USD More stats 289
Google AppInventor -Android and Business Intelligence More stats 281
Top R Interviews More stats 278
An Introduction to Data Mining-online book More stats 272
Interview Jim Davis SAS Institute More stats 272
Economic: Indian Caste System -Simplification More stats 271
Rattle Re-Introduced More stats 271
KXEN – Automated Regression Modeling More stats 267
Movie Review- Inglorious Basterds More stats 267
Interview :Doug Savage ,Creator SavageChickens.com More stats 261
IPSUR – A Free R Textbook More stats 258
SAS with the GUI Enterprise Guide (Updated) More stats 256
Trying out Google Prediction API from R More stats 256
Segmenting Models : When and Why More stats 253
Using R and Excel Together More stats 253
R Oracle Data Mining More stats 253
KNIME More stats 253
Using PostgreSQL and MySQL databases in R 2.12 for Windows More stats 250
Fighting Back -The Net, Social Media, Spam, Identity Theft, Terrorism More stats 249
Libre Office (Beta) 3 Launched More stats 248
India to make own DoS -citing cyber security More stats 247
Interview Dominic Pouzin Data Applied More stats 242
R releases new version R 2.9.2 More stats 240
SAS to launch SAS/IML with R ( updated) More stats 239
Playing with Playwith- R Package for Interactive Data Visualizations More stats 234
Predictive Analytics World Conference More stats 231
Analytics and BI for small biz More stats 231
Interview Jeanne Harris Co-Author -Analytics at Work and Competing with Analytics More stats 230
Using R for Time Series in SAS More stats 228
General Electric ‘s breach of the spirit and letter of integrity More stats 227
Interview Luis Torgo Author Data Mining with R More stats 222
Browser Based Model Creation More stats 222
Interview James Dixon Pentaho More stats 221
Thoughts on WPS, SAS , R More stats 220
Choosing R for business – What to consider? More stats 220
Buying SAS Institute More stats 219
Google: Prediction API and other cool stuff More stats 218
Interview : R For Stata Users More stats 216
Viva Libre Office More stats 216
Top 10 Games on Linux -sudo update More stats 214
When China overtook India- using DEDUCER More stats 214
KDD 2009 : Demos More stats 211
Interview Dean Abbott Abbott Analytics More stats 210
Statistically Speaking More stats 203
Data Visualization using Tableau More stats 203
SAS and JMP : Visual Data Discovery More stats 203
High Performance Computing and R More stats 200
Troubleshooting Rattle Installation- Data Mining R GUI More stats 194
Google Realtime Live Updates on Egypt Yemen Tunisia Jordan.. More stats 192
New Deal in Statistical Training More stats 191
Interview Ken O Connor Business Intelligence Consultant More stats 190
Karmic Koala versus Windows 7 More stats 189
Interview Shawn Kung Sr Director Aster Data More stats 189
Pun on Putin More stats 189
Towards better analytical software More stats 188
Dryad- Microsoft’s answer to MR More stats 188
Analyzing Indian – Chinese Relationships More stats 188
LibreOffice News and Google Musings More stats 186
Special Issue of JSS on R GUIs More stats 184
Using Google Docs for Web Scraping More stats 181
Using Reshape2 for transposing datasets in R More stats 180
IBM Buys Netezza More stats 180
Libreoffice 3.3 released More stats 180
Google moving on from MapReduce: rest of world still catching up More stats 179
Linux= Who did what and how much? More stats 176
Interview Carole Jesse Experienced Analytics Professional More stats 176
HIRE ME More stats 175
Test Drive a Google Chrome Notebook: Last Two Days left More stats 174
Q&A with David Smith, Revolution Analytics. More stats 174
R , Ubuntu, RCmdr Updates More stats 173
Interview KNIME Fabian Dill More stats 173
Big Data and R: New Product Release by Revolution Analytics More stats 173
Automated Content Aggregation More stats 173
R or SAS —– R and SAS ? More stats 170
Graphs More stats 169
How to use Oracle for Data Mining More stats 169
Carolina and SAS More stats 166
Interview John Sall Founder JMP/SAS Institute More stats 165
Aster Data hires Quentin Gallivan as CEO More stats 165
Oracle for possible takeover of REvolution Computing More stats 164
The Best and Worst Graphs Ever More stats 163
Statistical Analysis with R- by John M Quick More stats 163
Growing Rapidly: Rapid Miner 4.5 More stats 161
SAP and BI on Demand More stats 161
Google Snappy More stats 161
Google Refine More stats 161
Scoring SAS and SPSS Models in the cloud More stats 159
Hey Professor, I am not a Monkey More stats 157
REVolution Computing fails to create a Revolution More stats 156
SAS Lawsuit against WPS- Application Dismissed More stats 156
KDNuggets Poll on SAS: Churn in Analytics Users More stats 154
SAS Early Days More stats 154
Interview James Taylor Decision Management Expert (Updated) More stats 151
Google Books Ngram Viewer More stats 148
Review – R for SAS and SPSS Users More stats 148
New R Journal Edition More stats 146
Here comes PySpread- 85,899,345 rows and 14,316,555 columns More stats 145
Interview Karl Rexer -Rexer Analytics More stats 144
Poem: The Extroverted Engineer More stats 144
Hearst DataMining Challenge More stats 144
This Is It More stats 142
Interview Timo Elliott SAP More stats 141
The Blind Side – Movie Review More stats 141
Data Mining Survey Results :Tools and Offshoring More stats 140
Going Deap : Algols in Python More stats 140
ADVERTISE More stats 139
Interview Jeff Bass, Bass Institute (Part 2) More stats 139
Interview Jim Harris Data Quality Expert OCDQ Blog More stats 139
Do Monkeys Pay for Sex? More stats 138
Privacy Browsing Extensions in Google Chrome More stats 137
China biggest threat to Indian Software in 5 years: Indian Tech CEO More stats 136
Software HIStory: Bass Institute Part 1 More stats 135
Grenier’s Theory for Competitiveness More stats 134
Interview Charlie Berger Oracle Data Mining More stats 134
Karmic Koala Ubuntu/Linux 9.2 Preview More stats 133
Analytics and Journals More stats 133
Using Code Editors in R More stats 132
Interview Stephanie McReynolds Director Product Marketing, AsterData More stats 132
Amcharts- Cool Charts Web Editor More stats 130
Mapreduce Book More stats 128
Interesting R competition at Reddit More stats 127
Color of Statistics More stats 127
Amazon goes free for users next month More stats 127
#3443 (deleted) More stats 127
Interview Sarah Blow – Girly Geekdom Founder More stats 126
Social Network Analysis: Using R More stats 126
Interview Thomas C. Redman Author Data Driven More stats 126
Audio Interview Anne Milley , Part 1 More stats 124
Advanced Analytics on Multi-Terabyte Datasets- Conferences More stats 123
Geek Humour More stats 123
John M. Chambers Statistical Software Award – 2011 More stats 122
My friend -The Computer More stats 120
M2009 Interview Peter Pawlowski AsterData More stats 118
R Journal Dec 2010 and R for Business Analytics More stats 118
Top ten RRReasons R is bad for you ? More stats 116
Interview Michael Zeller,CEO Zementis on PMML More stats 115
Fast R Graphics More stats 114
New Google Ad Planner More stats 114
Making Sense: Hadoop and MapReduce More stats 114
Using SAS/IML with R More stats 114
Facebook App by SAP Crystal Reports More stats 113
Whats behind that pretty SAS Blog? More stats 113
Interview Alison Bolen SAS.com More stats 113
Ajay @ arts More stats 112
My latest creation More stats 112
Indian Crabs – A story More stats 112
Open Source’s worst enemy is itself not Microsoft/SAS/SAP/Oracle More stats 112
Google Cloud Print -print documents from the internet More stats 111
WPS and SAS- A rah-rah comparison More stats 110
Facebook Gmail Killer Threatens to commit Hara Kari live on AOL Techcrunch if unsucessful More stats 110
Open Source and Software Strategy More stats 109
Windows Azure and Amazon Free offer More stats 108
R for Analytics is now live More stats 108
Open Source Compiler for SAS language/ GNU -DAP More stats 107
Using Chromium /Chrome on Ubuntu Linux More stats 107
Interview John Moore CTO, Swimfish More stats 106
Nice BI Tutorials More stats 106
Creating Customized Packages in SAS Software More stats 106
Business Analytics Analyst Relations /Ethics/White Papers More stats 105
Web Crawling Automation More stats 105
The SAS-WPS Lawsuit- Preliminary Hearing More stats 105
Handling time and date in R More stats 105
KXEN Update More stats 104
MapReduce Analytics Apps- AsterData’s Developer Express Plugin More stats 104
+ 1 your website -updated More stats 103
Movie Review- Peepli Live More stats 103
Better Data Visualization in WordPress.com Stats More stats 102
Customizing your R software startup More stats 102
LibreOffice Beta 2 (Office Fork off Oracle) launches! More stats 102
KXEN Case Studies : Financial Sector More stats 102
Deleting Twitter, Facebook,LinkedIn- Accepting Life More stats 102
Google Street View shows Gladiators fighting More stats 101
Carole-Ann’s 2011 Predictions for Decision Management More stats 101
Amazon goes HPC and GPU: Dirk E to revise his R HPC book More stats 101
Happy Thanksgiving Id More stats 101
Interview Phil Rack WPS Consultant and Developer More stats 100
SPSS launches two more PASWs More stats 99
Interview David Smith REvolution Computing More stats 99
Data Mining with R More stats 97
Dataset too big for R ? More stats 97
How Jesus saved my Butt More stats 97
Interview Evan Levy Baseline Consulting More stats 97
The Latest GUI for R- BioR More stats 96
WPS Version 2.5.1 Released – can still run SAS language/data and R More stats 96
SAS legal falls flat against WPS again: Technical Grounds More stats 95
World Programming System:300 pounds for The power of SAS language More stats 94
KNIME and Zementis shake hands More stats 93
Interview Eric Siegel, Phd President Prediction Impact More stats 93
Interview Sarah Burnett BI Analyst,Ovum group More stats 92
Quantifying Analytics ROI More stats 92
PSPP – SPSS ‘s Open Source Counterpart More stats 91
PySpread Magic More stats 91
Interview SPSS Olivier Jouve More stats 91
Interesting Data Visualization:Friendwheels More stats 91
R on Windows HPC Server More stats 90
The declining market for Telecommunication Churn Models More stats 90
Getting Inside R More stats 90
The Big Data Summit Agenda More stats 90
Review: Clash of the Titans More stats 89
Red Hat worth 7.8 Billion now More stats 89
Movie Review : Rajneeti (Politics) More stats 89
3 Idiots: Insight to Indian Engineer Campus Life More stats 89
The Comic Water Games (aka Common Wealth Games) More stats 88
Computer Education grants from Google More stats 88
Challenges of Analyzing a dataset (with R) More stats 87
Input Data in R using the top 3 R GUI More stats 86
Complex Event Processing- SASE Language More stats 85
Interview with Anne Milley, SAS II More stats 85
Data Mining Presentation at M2009 by Dr Vincent Granville More stats 85
Brief Interview Timo Elliott More stats 85
Mapping Health Statistics at CDC.gov More stats 85
Amazon’s Turks Mturk.com More stats 84
Business Intelligence and Stat Computing: The White Man’s Last Stand More stats 84
Movie Review- Dabangg More stats 84
Movie Review: Sherlock Holmes More stats 84
SAS Data Mining 2009 Las Vegas More stats 83
Chinese Fortune Cookies More stats 83
SPSS and R More stats 83
Manjunath- A Batchmate on my mine More stats 82
Data Mining 2010:SAS Conference in Vegas More stats 81
DirkE and JD swoon about Shane’s MOM in Room 106 while writing R code More stats 81
SAS to R Challenge: Unique benchmarking More stats 81
S A S GOOD LIFE UNDER SIEGE – NYT More stats 81
Pentaho and R: working together More stats 81
Interview John F Moore CEO The Lab More stats 80
Ways to use both Windows and Linux together More stats 80
Brief Interview with James G Kobielus More stats 80
For R Writers- Inside R More stats 79
Using Ipod and Iphone with your Ubuntu Laptop More stats 79
Webcasts: Oracle Data Mining More stats 79
The Cloud OS is finally here or is it?: Karmic Koala More stats 79
Movie Review: Lafangey Parinday (Rouge Birds) More stats 79
SAS announcement in education initiatives More stats 78
Using R from within Python More stats 78
Event: Predictive analytics with R, PMML and ADAPA More stats 78
Interesting R and BI Web Event More stats 78
Bruno Aziza, Microsoft Global BI Lead joins PAW Keynote More stats 77
Common Analytical Tasks More stats 77
RWui :Creating R Web Interfaces on the go More stats 77
R Successor Language ‘Tea’ announced More stats 76
Learning SPSS for SAS users More stats 76
Protovis a graphical toolkit for visualization More stats 76
Interview Paul van Eikeren Inference for R More stats 75
Data Visualization: Central Banks More stats 75
Oracle Data Mining 11 G R2 More stats 75
Interview Peter J Thomas -Award Winning BI Expert More stats 75
Weak Security in Internet Databases for Statisticians More stats 74
Open Source Cartoon More stats 74
Top Ten Graphs for Business Analytics -Pie Charts (1/10) More stats 74
SAS Sentiment Analysis wins Award More stats 74
JMP Genomics 5 released More stats 74
Short Interview Jill Dyche More stats 73
Interview David Katz ,Dataspora /David Katz Consulting More stats 73
PMML 4.0 More stats 73
Ponder This: IBM Research More stats 72
PAW Videos More stats 71
PASW 13 :The preview More stats 71
Cisco SocialMiner More stats 70
Review-The Dark knight More stats 70
MapReduce Patent Granted More stats 70
Cloud Computing and GPU ( and some stats softwares) More stats 70
IBM Business Analytics Forum More stats 70
And now- The Business Analytics Summit More stats 70
Creating an Anonymous Bot More stats 69
R and SAS in Twitter Land More stats 69
Interview:Richard Schultz , CEO REvolution Computing More stats 69
China -United States -The Third Opium War More stats 68
Quick-R and Statmethods.net More stats 68
R Node- and other Web Interfaces to R More stats 68
Life Mojo – A Health Startup More stats 68
Using Views in R and comparing functions across multiple packages More stats 68
Another R Tutorial More stats 67
Interview Karen Lopez Data Modeling Expert More stats 67
QGIS and R More stats 66
Christmas Carol: The Best Software (BI-Stats-Analytics) More stats 66
Software Lawsuits :Ergo More stats 66
STEM is cool More stats 65
Date Night More stats 65
More Advanced SAS Modeling Procs More stats 65
The Big Data Event- Why am I here? More stats 65
Interview Gary Cokins SAS Institute More stats 65
Browser based Music Creation More stats 64
Interview Steve Sarsfield Author The Data Governance Imperative More stats 63
GrapheR More stats 63
Google Web Intelligence (Beta) More stats 61
Data Mining 2009 Interviews- Terry Whitlock, BlueCross BlueShield of TN More stats 60
Audio Interviews -Dr. Colleen McCue National Security Expert More stats 60
Red R- A new beginning More stats 59
YouTube Features: Audio Swap, Mobile posts and Themes More stats 59
R for Predictive Modeling:Workshop More stats 59
KDD2009: Papers Research and Industrial More stats 58
Chapman/Hall announces new series on R More stats 58
Data Visualization and Politics More stats 58
T Shirts Design More stats 58
Jump to JMP: Using Data Analysis in a visual manner More stats 58
Aster Analytics and MapReduce.org More stats 57
OK Cupid Data Visualization- Flow Chart to your Heart More stats 57
R for SAS and SPSS Users More stats 57
Carbon Footprints in the snow More stats 57
Summer School on Uncertainty Quantification More stats 57
High Performance Computing within R: Tutorial More stats 57
Running Stats Softwares on Clouds More stats 57
Amazing Data Visualization- UN Counter Terrorism More stats 56
Cloud MapReduce More stats 56
Statistical Features in WPS More stats 56
An R Package only for SAS Users More stats 56
R is Ready for Business™ More stats 55
A Google App for Sales- ERPLY More stats 55
Rexer Analytics Annual Data Miner Survey More stats 55
Cartoons on R More stats 55
American Decline- Why outsourcing doesnt make sense More stats 55
Friday Cartoon Series- New More stats 55
What softwares do you plan to use/learn in the next one year? More stats 54
Great App for Online Sketching More stats 54
September Roundup by Revolution More stats 54
Using Firesheep on Campus, Caltrain and beyond More stats 54
Decisionstats Interview at Big Data Summit, AsterData More stats 53
Learning Hadoop More stats 53
The White Man’s Burden-Poem More stats 53
Curt Monash on Analytics with MapReduce More stats 53
To R or Not to R : Data Mining and CRM for Free More stats 52
Algorithms and Ads: No Free Lunches and Hill Climbing More stats 52
Interview: Roger Haddad, Founder of KXEN Automated Modeling Software More stats 52
Google and Me on Privacy and Openness More stats 52
MapReduce.org More stats 52
Why do bloggers blog ? More stats 52
Live Streaming for Free : UStream More stats 51
Light Cycle of Tron review More stats 51
Lyx Releases 2 More stats 51
Interview – Anne Milley, SAS Part 1 More stats 51
SAS News More stats 51
KXEN EMEA User Conference 2010-Success in Business Analytics More stats 51
2011 Forecast-ying More stats 51
Kill Analytics More stats 50
Social Media Analysis Toolkit More stats 50
Multi State Models More stats 50
R and Cloud Computing More stats 50
Dataists shake up R community with a rocking contest More stats 50
Interview Anne Milley JMP More stats 49
Movie Review: Between the Folds More stats 49
Jokes in Economics More stats 49
Interview Ajay Ohri Decisionstats.com with DMR More stats 49
One more Y Tube Video More stats 49
Happy Diwali /Google Music More stats 48
SPSS Directions : Rexer Survey Results More stats 48
Redlining in Internet Access and notes on Regression Models More stats 48
Poem : A Poets Life More stats 48
Predictive Analytics World More stats 48
Interview- Phil Rack More stats 48
Building KXEN Models on Ubuntu More stats 48
New Year Resolution Presentation More stats 48
Adobe gulps Omniture More stats 47
SAS Modeling Procs More stats 47
Oracle Open World/ RODM package More stats 47
KDNuggets Survey on R More stats 47
IBM and Revolution team to create new in-database R More stats 47
SAS Institute invests in R project More stats 46
Not just a Cloud More stats 46
New Version of R released: R 2.10.1 More stats 46
Review- Iron Man2 More stats 46
Online Analytics: Monte Carlo Simulation More stats 45
Predictive Forecasting in Commercial Applications More stats 45
The Race -by D.H Groberg More stats 45
SAS Scoring Accelerators More stats 45
IBM launches Smart Analytics Cloud More stats 45
Reactions to IBM -SPSS takeover. More stats 45
Zementis partners with R Analytics Vendor- Revo More stats 44
A Missing Mandelbrot Who Dun It More stats 44
Downloading your Facebook Photos More stats 44
Android Tutorial More stats 44
The Mommy Track More stats 44
My First You Tube Video: Courtesy the competiton on VOLNIGHT by Univ of Tennessee More stats 44
Born in the USA? More stats 43
Interview Eric A. King President The Modeling Agency More stats 43
Interview Augusto Albeghi (Straycat) —Founder Straysoft More stats 43
Why Cloud? More stats 43
Innovative ways of Calculus: Gifting a comic set for Christmas More stats 43
To find the best chaat or paan shop More stats 43
Google unleashes Fusion Tables More stats 42
Using SAS and C/C++ together More stats 42
Whats new in the latest version of R More stats 42
Bollywood 101 More stats 42
Who will forecast for the forecasters? More stats 42
Learning R Easily :Two GUI’s More stats 41
Harvard DropOut Writes Open Letter- His Startup has 350m users More stats 41
BI Software More stats 41
How to read blogs in Indonesian and Chinese! More stats 41
Window to a Blue Cloud: Azure Pricing More stats 41
China bans Chinese Food for Googleplex More stats 41
SAS Program for Students More stats 41
The Year 2010 More stats 40
What do you want to know in data analytics? More stats 40
America’s Data Book: Census Abstract 2011 More stats 40
Big Data Management and Advanced Analytics More stats 40
AsterData partners with Tableau More stats 40
Using R from other Software More stats 40
SAS on Fraud More stats 40

Google unleashes Fusion Tables

I just discovered Fusion Tables. There is life beyond Jeff’s amazing Amazon EC2/S3 after all!

Check out http://www.google.com/fusiontables/public/tour/index.html

Gather, visualize and share data online


  • Visualize and publish your data as maps, timelines and charts
  • Host your data tables online
  • Combine data from multiple people


Google Fusion Tables is a modern data management and publishing web application that makes it easy
to host, manage, collaborate on, visualize, and publish data tables online.

What can I do with Google Fusion Tables?

Import your own data
Upload data tables from spreadsheets or CSV files, even KML. Developers can use the Fusion Tables API to insert, update, delete and query data programmatically. You can export your data as CSV or KML too.

Visualize it instantly
See the data on a map or as a chart immediately. Use filters for more selective visualizations.

Publish your visualization on other web properties
Now that you’ve got that nice map or chart of your data, you can embed it in a web page or blog post. Or send a link by email or IM. It will always display the latest data values from your table and helps you communicate your story more easily.

Look at the Fusion Tables Example Gallery at https://sites.google.com/site/fusiontablestalks/stories

If you are worried about data.gov closing down, here's a snapshot of the Fusion Tables public datasets.


 

Save the Data


I just read about an online cause here:

http://sunlightfoundation.com/savethedata/

Some of the most important technology programs that keep Washington accountable are in danger of being eliminated. Data.gov, USASpending.gov, the IT Dashboard and other federal data transparency and government accountability programs are facing a massive budget cut, despite only being a tiny fraction of the national budget. Help save the data and make sure that Congress doesn’t leave the American people in the dark.

I wonder whether the federal government or non-profit agencies can help create a SPARQL database and, in these days of cloud computing, why a tech major cannot donate storage space to it. After all, despite the US corporate tax rate being high, US technology companies do end up paying a lower rate thanks to tax breaks and routing revenue overseas.

In the new age, data is power, and the US has led in its mission to use technology to further its own values, especially in the Middle East. The datasets should be made public and transitioned to the private sector and academia for research and redesign for data augmentation, without further straining a budget already dealing with a massive deficit, borrowing, and fighting 3 wars. Of particular interest would be datasets of campaign finances and donors, especially given the large number of retail/small donors and the internet marketing in elections, as they would also serve as an example of democracy and change. Even countries like China can create an internal dashboard for tracking corruption and expense efficiency, with restricted rights, to help with rural and urban governance.

Zementis partners with R Analytics Vendor- Revo


Just got a  PR email from Michael Zeller,CEO , Zementis annoucing Zementis (ADAPA) and Revolution  Analytics just partnered up.

Is this something substantial, or just time-sharing (http://bi.cbronline.com/news/sas-ceo-says-cep-open-source-and-cloud-bi-have-limited-appeal) or a Barney Partnership (http://www.dbms2.com/2008/05/08/database-blades-are-not-what-they-used-to-be/)?

Summary: that's cloud-based scoring of models on EC2 (Zementis) partnering with the actual modeling software in R (Revolution Analytics RevoDeployR).

See previous interviews with both Dr Zeller at https://decisionstats.com/2009/02/03/interview-michael-zeller-ceozementis/ , https://decisionstats.com/2009/05/07/interview-ron-ramos-zementis/ and https://decisionstats.com/2009/10/05/interview-michael-zellerceo-zementis-on-pmml/

and Revolution guys at https://decisionstats.com/2010/08/03/q-a-with-david-smith-revolution-analytics/

and https://decisionstats.com/2009/05/29/interview-david-smith-revolution-computing/

strategic partnership with Revolution Analytics, the leading commercial provider of software and support for the popular open source R statistics language. With this partnership, predictive models developed on Revolution R Enterprise are now accessible for real-time scoring through the ADAPA Decisioning Engine by Zementis. 

ADAPA is an extremely fast and scalable predictive platform. Models deployed in ADAPA are automatically available for execution in real-time and batch-mode as Web Services. ADAPA allows Revolution R Enterprise to leverage the Predictive Model Markup Language (PMML) for better decision management. With PMML, models built in R can be used in a wide variety of real-world scenarios without requiring laborious or expensive proprietary processes to convert them into applications capable of running on an execution system.


“By partnering with Zementis, Revolution Analytics is building an end-to-end solution for moving enterprise-level predictive R models into the execution environment,” said Jeff Erhardt, Revolution Analytics Chief Operation Officer. “With Zementis, we are eliminating the need to take R applications apart and recode, retest and redeploy them in order to obtain desirable results.”

 

Got demo? 

Yes, we do! Revolution Analytics and Zementis have put together a demo which combines the building of models in R with automatic deployment and execution in ADAPA. It uses Revolution Analytics’ RevoDeployR, a new Web Services framework that allows for data analysts working in R to publish R scripts to a server-based installation of Revolution R Enterprise.

Action Items:

  1. Try our INTERACTIVE DEMO
  2. DOWNLOAD the white paper
  3. Try the ADAPA FREE TRIAL

RevoDeployR & ADAPA allow for real-time analysis and predictions from R to be effectively used by existing Excel spreadsheets, BI dashboards and Web-based applications, all in real-time.

Predictive analytics with RevoDeployR from Revolution Analytics and ADAPA from Zementis put model building and real-time scoring into a league of their own. Seriously!

PAW Blog Partnership

Please use the following code to get a 15% discount on the 2 Day Conference Pass: AJAY11.

 

 

 

 

Predictive Analytics World announces new full-day workshops coming to San Francisco March 13-19, amounting to seven consecutive days of content.

These workshops deliver top-notch analytical and business expertise across the hottest topics.

Register now for one or more workshops, offered just before and after the full two-day Predictive Analytics World conference program (March 14-15). Early Bird registration ends on January 31st – take advantage of reduced pricing before then.

Driving Enterprise Decisions with Business Analytics – March 13, 2011
James Taylor, CEO, Decision Management Solutions
NEW – R for Predictive Modeling: A Hands-On Introduction – March 13, 2011
Max Kuhn, Director, Nonclinical Statistics, Pfizer
The Best and Worst of Predictive Analytics: Predictive Modeling Methods and Common Data Mining Mistakes – March 16, 2011
John Elder, Ph.D., CEO and Founder, Elder Research, Inc.
Hands-On Predictive Analytics – March 17, 2011
Dean Abbott, President, Abbott Analytics
NEW – Net Lift Models: Optimizing the Impact of Your Marketing – March 18-19, 2011
Kim Larsen, VP of Analytical Insights, Market Share Partners

Download the Conference Preview or view the Predictive Analytics World Agenda online

Make savings now with the early bird rate. Receive $200 off your registration rate for Predictive Analytics World – San Francisco (March 14-15), plus $100 off each workshop for which you register.

Register now before Early Bird Price expires on January 31st!

Additional savings of $200 on the two-day conference pass when you register a colleague at the same time.

 

Choosing R for business – What to consider?


Additional features in R over other analytical packages-

1) Source code is provided, which allows a completely custom solution and embedding in a particular application. Open source code has the advantage that it is extensively peer-reviewed in journals and scientific literature. This means bugs will be found, shared and corrected transparently.

2) A wide range of training material, in the form of books, is available for the R analytical platform.

3) Among the best data visualization tools in analytical software (apart from the latest version of Tableau Software). The extensive data visualization available in R takes the form of a variety of customizable graphs, as well as animation. The principal reason third-party software vendors initially started creating interfaces to R is that the graphical library of packages in R is more advanced, and is rapidly gaining more features by the day.

4) No upfront license cost, for academics and businesses alike, and thus budget-friendly for both small and large analytical teams.

5) Flexible programming for your data environment. This includes having packages that ensure compatibility with Java, Python and C++.

 

6) Easy migration from other analytical platforms to the R platform. It is relatively easy for a user of another platform to migrate to R, and there is no danger of vendor lock-in, thanks to the GPL licensing of the source code and the open community.

Statistics are numbers that tell (descriptive), advise (prescriptive) or forecast (predictive). Analytics is a decision-making aid. Analysis on which no decision is to be made or is being considered can be classified as purely statistical and non-analytical. Thus the ease of making a correct decision separates a good analytical platform from a not-so-good one. The distinction is likely to be disputed by people of either background, but business analysis requires more emphasis on how practical or actionable the results are and less emphasis on the statistical metrics in a particular data analysis task. I believe one clear reason business analytics differs from statistical analysis is the cost of perfect information (data costs in the real world) and the opportunity cost of delayed and distorted decision-making.

Specific to the following domains, R has the following costs and benefits:

  • Business Analytics
    • R is free per license and for download
    • It is one of the few analytical platforms that work on Mac OS
    • Its results are credibly established both in journals like the Journal of Statistical Software and in the work of the analytical teams at LinkedIn, Google and Facebook
    • It has open source code for customization as per the GPL
    • It also has the flexible option of commercial vendors like Revolution Analytics, who support 64-bit Windows as well as bigger datasets
    • It has interfaces from almost all other analytical software, including SAS, SPSS, JMP, Oracle Data Mining and RapidMiner. Existing license holders can thus invoke and use R from within these software packages
    • Huge library of packages for regression, time series, finance and modeling
    • High quality data visualization packages
  • Data Mining
    • R as a computing platform is well suited to the needs of data mining, as it has a vast array of packages covering standard regression, decision trees, association rules, cluster analysis, machine learning and neural networks, as well as exotic specialized algorithms like those based on chaos models.
    • Flexibility in tweaking a standard algorithm by seeing the source code
    • The Rattle GUI remains the standard GUI for data miners using R. It was created and developed in Australia.
  • Business Dashboards and Reporting
    • Business dashboards and reporting are an essential piece of business intelligence and decision-making systems in organizations. R offers data visualization through ggplot2, and GUIs like Deducer and Red-R can help even non-R users create a metrics dashboard.
    • For online dashboards, R has packages like Rweb, Rserve and rApache, which in combination with data visualization packages offer powerful dashboard capabilities.
    • R can be combined with MS Excel using the RExcel package, which makes R capabilities available within Excel. Thus an MS Excel user with no knowledge of R can use the GUI within the RExcel plug-in to access powerful graphical and statistical capabilities.

Additional factors to consider in your R installation-

There are some more choices awaiting you now:
1) Licensing choices- Academic version, free version or enterprise version of R.

2) Operating system choices- Which operating system to choose: Unix, Windows or Mac OS?

3) Operating system sub-choice- 32-bit or 64-bit.

4) Hardware choices- Cost-benefit trade-offs for additional hardware for R. Choices between local, cluster and cloud computing.

5) Interface choices- Command line versus GUI? Which GUI to choose as the default start-up option?

6) Software component choices- Which packages to install? There are almost 3000 packages; some of them are complementary, some depend on each other, and almost all are free.

7) Additional software choices- Which additional software do you need to achieve maximum accuracy, robustness and speed of computing, and how can existing legacy software and hardware be used for the best complementary results with R?

1) Licensing Choices-
You can choose between two kinds of R installations. One is free and open source, from http://r-project.org. The other is commercial and is offered by several vendors, including Revolution Analytics.

Commercial Vendors of R Language Products-
1) Revolution Analytics http://www.revolutionanalytics.com/
2) XL Solutions- http://www.experience-rplus.com/
3) Information Builders – WebFOCUS RStat (Rattle GUI) http://www.informationbuilders.com/products/webfocus/PredictiveModeling.html
4) Blue Reference- Inference for R http://inferenceforr.com/default.aspx

  2. Choosing Operating System
      1. Windows

 

Windows remains the most widely used operating system on this planet. If you are experienced in Windows-based computing and are active on analytical projects, it would not make sense for you to move to other operating systems. This is also based on the fact that compatibility problems are minimal for Microsoft Windows and the help is extensively documented. However, there may be some R packages that do not function well under Windows. If that happens, a multiple operating system setup is your next option.

        1. Enterprise R from Revolution Analytics- Enterprise R from Revolution Analytics has a complete R development environment for Windows, including the use of code snippets to make programming faster. Revolution is also expected to make a GUI available by 2011. Revolution Analytics claims several enhancements for its version of R, including the use of optimized libraries for faster performance.
      2. MacOS

 

The reason for choosing MacOS remains its considerable appeal in aesthetically designed software, but MacOS is not a standard operating system for enterprise systems or for statistical computing. However, open source R claims to be quite optimized for it and can be used by existing Mac users. There seem to be no commercially available versions of R for this operating system as of now.

      3. Linux

 

        1. Ubuntu
        2. Red Hat Enterprise Linux
        3. Other versions of Linux

 

Linux is considered a preferred operating system by R users because it has the same open source credentials, is a much better fit for all R packages, and is customizable for big data analytics.

Ubuntu Linux is recommended for people making the transition to Linux for the first time. Ubuntu Linux had a marketing agreement with Revolution Analytics for an earlier version of Ubuntu, and many R packages can be installed in a straightforward way because Ubuntu/Debian packages are available. Red Hat Enterprise Linux is officially supported by Revolution Analytics for its enterprise module. Another popular version of Linux is openSUSE.

      4. Multiple operating systems-
        1. Virtualization vs Dual Boot-

 

You can also choose between running a virtual machine (for example with VMware Player) that is dedicated to R-based computing, or choosing an operating system at the startup or booting of your computer (dual boot). A software program called Wubi helps with installing Ubuntu Linux alongside Windows.

  3. 64 bit vs 32 bit – Given a choice between 32 bit and 64 bit versions of the same operating system, such as Ubuntu Linux, the 64 bit version may speed up processing (by a factor of up to roughly 2 for some workloads) and also removes the roughly 4 GB memory ceiling that comes with 32 bit addressing. However, you need to check whether your current hardware can support 64 bit operating systems, and if so, you may want to ask your Information Technology manager to upgrade at least some operating systems in your analytics work environment to 64 bit.
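
One concrete difference between the two is addressable memory, which matters because R holds the data it analyzes in RAM. The arithmetic can be sketched as follows. This is a rough illustration in Python, not a benchmark; the function names are made up for this example, and 8 bytes per value assumes double-precision storage.

```python
# Rough arithmetic on addressable memory; illustrative only, not a benchmark.
BYTES_PER_DOUBLE = 8  # assumption: each numeric value is an 8-byte double


def addressable_bytes(pointer_bits: int) -> int:
    """Upper bound on the bytes a process can address with pointers of this width."""
    return 2 ** pointer_bits


def max_doubles(pointer_bits: int) -> int:
    """How many 8-byte values would fit if the whole address space held data."""
    return addressable_bytes(pointer_bits) // BYTES_PER_DOUBLE


if __name__ == "__main__":
    gib = 2 ** 30
    print("32 bit address space in GiB:", addressable_bytes(32) / gib)
    print("8-byte values that fit in it:", max_doubles(32))
```

In practice the usable figure on a 32 bit system is lower still, since the operating system reserves part of the address space for itself.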

 

  4. Hardware choices- At the time of writing this book, the dominant computing paradigm is workstation computing followed by server-client computing. However, with the introduction of cloud computing, netbooks and tablet PCs, hardware choices are much more flexible in 2011 than just a couple of years back.

Hardware costs are a significant cost in an analytics environment, and hardware also depreciates remarkably over a short period of time. You may thus examine your legacy hardware and your future analytical computing needs, and accordingly decide between the various hardware options available for R.
Unlike other analytical software, which can charge by number of processors, price server licenses higher than workstation licenses, and price grid computing extremely high where it is available at all, R is well suited to all kinds of hardware environments with flexible costs. Because R is memory intensive (it limits the size of data analyzed to the RAM size of the machine unless special formats and/or chunking are used), the hardware needed depends on the size of the datasets used and the number of concurrent users analyzing them. Thus the defining issue is not R but the size of the data being analyzed.
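
To make the idea of chunking concrete, here is a minimal sketch of computing a column mean while holding only a small chunk of rows in memory at a time. It is written in Python purely for illustration and is not how any particular R package does it. The function name, the `url` and `views` column names and the sample data are all hypothetical.

```python
import csv
import io


def chunked_column_mean(lines, column, chunk_size=1000):
    """Mean of one numeric CSV column, holding at most chunk_size values in memory."""
    reader = csv.DictReader(lines)
    total = 0.0
    count = 0
    chunk = []
    for row in reader:
        chunk.append(float(row[column]))
        if len(chunk) == chunk_size:
            # Reduce the full chunk to two running numbers, then discard it.
            total += sum(chunk)
            count += len(chunk)
            chunk = []
    # Fold in the final, possibly partial, chunk.
    total += sum(chunk)
    count += len(chunk)
    if count == 0:
        raise ValueError("no data rows found")
    return total / count


if __name__ == "__main__":
    # Hypothetical sample data standing in for a file too large for RAM.
    sample = io.StringIO("url,views\nURL1,10\nURL2,20\nURL3,60\n")
    print(chunked_column_mean(sample, "views", chunk_size=2))
```

The same function accepts an open file handle in place of the in-memory sample, so the full dataset never has to be loaded at once. Only statistics that can be built up from running totals work this simply; others need more care.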

    1. Local Computing- This denotes the case where the software is installed locally. For big data, the data to be analyzed would be stored in the form of databases.
      1. Server version- Revolution Analytics has differential pricing for server-client versions, but the open source version is free and the same for server or workstation use.
      2. Workstation
    2. Cloud Computing- Cloud computing is defined as the delivery of data, processing and systems via remote computers. It is similar to server-client computing, but the remote server (also called the cloud) has flexible computing in terms of number of processors, memory and data storage. Cloud computing in the form of a public cloud enables people to do analytical tasks on massive datasets without investing in permanent hardware or software, as most public clouds are priced on a pay-per-usage basis. The biggest cloud computing provider is Amazon, and many other vendors provide services on top of it. Google is also entering cloud data storage (Google Storage), as well as offering machine learning in the form of an API (Google Prediction API).
      1. Amazon
      2. Google
    3. Cluster-Grid Computing/Parallel processing- In order to build a cluster, you would need the Rmpi and snow packages, among other packages that help with parallel processing.
    4. How much resources
      1. RAM-Hard Disk-Processors- for workstation computing
      2. Instances or API calls for cloud computing
  5. Interface Choices
    1. Command Line
    2. GUI
    3. Web Interfaces
  6. Software Component Choices
    1. R dependencies
    2. Packages to install
    3. Recommended Packages
  7. Additional software choices
    1. Additional legacy software
    2. Optimizing your R based computing
    3. Code Editors
      1. Code Analyzers
      2. Libraries to speed up R
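
For the "How much resources" item in the outline above, a back-of-the-envelope estimate of the RAM a dataset needs can be sketched as follows. It is written in Python for illustration only. The function names are hypothetical, every column is assumed to be numeric and stored as an 8-byte double, and the headroom multiplier of 3 is an assumed rule of thumb for working copies made during analysis, not a measured figure.

```python
BYTES_PER_NUMERIC = 8  # assumption: every value is an 8-byte double


def dataset_bytes(rows: int, numeric_columns: int) -> int:
    """Raw in-memory size of a purely numeric table."""
    return rows * numeric_columns * BYTES_PER_NUMERIC


def fits_in_ram(rows: int, numeric_columns: int, ram_gib: float,
                headroom: float = 3.0) -> bool:
    """True if the table, times an assumed headroom multiplier, fits in ram_gib.

    headroom is a hypothetical rule of thumb covering temporary copies
    created while analyzing the data; adjust it to your own workload.
    """
    needed = dataset_bytes(rows, numeric_columns) * headroom
    return needed <= ram_gib * 2 ** 30


if __name__ == "__main__":
    rows, cols = 10_000_000, 20
    print("raw size in bytes:", dataset_bytes(rows, cols))
    print("fits in 4 GiB:", fits_in_ram(rows, cols, 4))
    print("fits in 8 GiB:", fits_in_ram(rows, cols, 8))
```

Text and factor columns take different amounts of space, so treat the result as a starting point for sizing a workstation or cloud instance rather than a precise figure.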

citation- R Development Core Team (2010). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0, URL http://www.R-project.org.

(Note- this is a draft in progress)