Tag Archives: hadley wickham
The long tail of the internet
On a whim, I took the all-time stats of my blog posts (more than 1,000 posts) and tried to plot their distribution.
Basically, I copied and pasted all the data into a Google Docs spreadsheet and created dummy codes (like URL1, URL2, … URL500).
Next I downloaded the….
I wasn’t in the mood for downloading and uploading stuff, so I decided to use ggplot through Jeroen’s application at http://www.stat.ucla.edu/~jeroen/
I used the mirror server that Dataspora provides as I have had latency issues with Jeroen’s website.
I got this error while trying to connect the Dataspora App to my Google spreadsheet
The page you have requested cannot be displayed. Another site was requesting access to your Google Account, but sent a malformed request. Please contact the site that you were trying to use when you received this message to inform them of the error. A detailed error message follows:
The site “http://dataspora.com” has not been registered.
Oh dear! Back to Jeroen’s UCLA page.
http://rweb.stat.ucla.edu/ggplot2/
I get this warning, but it still manages to log in:
This website has not registered with Google to establish a secure connection for authorization requests. We recommend that you continue the process only if you trust the following destination:
http://rweb.stat.ucla.edu/R/googleLogin?domain=rweb.stat.ucla.edu
Wow, it works! That’s cloud computing now, so I wonder why Google and Amazon continue to ignore rApache and Jeroen’s cloud app. Surely Google Fusion Tables can always be improved or tweaked. Not to mention the next-gen version of R, which will have its own server.
Pretty cool screenshot (but click to see more)
I get the following pretty graph. Hadley Wickham would be ashamed of me by now.
What went wrong? Well, one page has 36,000 views. Scale is the key to graphical coherence, so I redo it: delete the home page in the Google spreadsheet, reimport, and replot. (I didn’t know how to modify data in the cloud app; maybe we need a cloud plyr.) Then I redo it again, as I have another big outlier: the Top 10 Statistical GUI article, which ironically has only 5 GUIs in it (but hush, don’t tell the high-quality search engine).
So again, belatedly, I discover something called a layer in ggplot.
The base graphics engine has really spoilt me into writing short functions for plots.
I give up. I prefer hist(). I go to my favorite GUI, Rattle, but it has some dating issues with the GTK+ DLL.
So I go to John Fox’s simple GUI. R Commander is the best GUI if you use Occam’s Razor, and I am using Occam’s Chainsaw now.
I get the analysis I want in 12 secs.
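For anyone who hits the same scale problem, here is a minimal Python sketch (not the hist() call used above, and not part of the original analysis) that buckets view counts by power of ten, so that a single 36,000-view page no longer flattens everything else. The function name and the sample numbers are made up for illustration.

```python
from collections import Counter

def log10_histogram(views):
    """Count posts per power-of-ten bucket of views (1-9, 10-99, 100-999, ...)."""
    buckets = Counter()
    for v in views:
        if v <= 0:
            continue  # a post with zero views has no power-of-ten bucket
        # Number of digits minus one equals floor(log10(v)) for positive integers.
        buckets[len(str(int(v))) - 1] += 1
    return dict(sorted(buckets.items()))

# Hypothetical view counts: one very popular page plus a long tail.
sample_views = [36000, 4200, 950, 400, 120, 85, 60, 33, 12, 9, 7, 3, 1]
hist = log10_histogram(sample_views)

# Print a small text histogram, one row per bucket.
for exponent, count in hist.items():
    low, high = 10 ** exponent, 10 ** (exponent + 1) - 1
    print(f"{low:>6}-{high:<6} {'#' * count} ({count})")
```

With buckets like these the outlier gets a row of its own instead of stretching the axis, which is roughly what a log-scaled axis would do in a plot.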
Summary: ggplot is more complicated than the base graphics engine.
The Deducer GUI is not as simple either.
R Commander is the best GUI because it retains simplicity.
Ignore the long tail of the internet only at your peril.
Almost two-thirds of my daily traffic of 400+ comes from old archived content. That is why search engine optimization and keyword alerts are CRITICAL for any poor soul trying to write a blog (which has neither journal-like prestige nor rewards).
If you make life easier for the search engine, it being a fair chap, rewards you well.
Existing web traffic estimates like Comscore and Google Trends ignore this long tail.
Comments are welcome. (The data, 500 rows × 2 columns, is pasted below if you can come up with a better analysis.)
Since SAS has ignored web analytics and Google Analytics is, hmm hmm, this could be an area of opportunity for R developers to create a web analytics package.
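The "two-thirds from the archive" kind of figure can be checked with a few lines of code. Below is a rough Python sketch, assuming you have one view count per post; `tail_share`, `head_size`, and the sample numbers are illustrative assumptions, not the blog's real data.

```python
def tail_share(views, head_size):
    """Fraction of total views earned by posts outside the top `head_size` posts."""
    ranked = sorted(views, reverse=True)
    total = sum(ranked)
    if total == 0:
        return 0.0  # no traffic at all, so no meaningful share
    return sum(ranked[head_size:]) / total

# Hypothetical data: 3 popular posts and 60 archived posts with 10 views each.
views = [150, 100, 50] + [10] * 60
share = tail_share(views, head_size=3)
print(f"tail share: {share:.2f}")
```

Where to draw the line between "head" and "tail" is a judgment call; trying a few values of `head_size` shows how sensitive the share is to that choice.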
Related Articles
- Cloud Computing May Decrease Your API Call Limit (programmableweb.com)
- Book: ggplot2 by Hadley Wickham (r-bloggers.com)
- Google Instant Search: What does this mean for advertisers? (wpromote.com)
- 2 Fun and Useful Google Spreadsheet Tricks (searchenginejournal.com)
- R Graphs Resources (decisionstats.com)
- The Importance of the Long Tail with Keywords and Phrases (businessbloggingtips.com)
- As Google Retools its Search Engine, Content Farms Lose Traction (xconomy.com)
Interviews with R Community
Authors
Interview Luis Torgo Author Data Mining with R
http://decisionstats.com/2011/01/12/interview-luis-torgo-author-data-mining-with-r/
John Fox, R Commander
http://decisionstats.com/2009/09/14/interview-professor-john-fox-creator-r-commander/
Interview Dr Graham Williams RATTLE GUI
http://decisionstats.com/2009/01/13/interview-dr-graham-williams/
Hadley Wickham
http://decisionstats.com/2010/01/12/interview-hadley-wickham-r-project-data-visualization-guru/
R for SAS and SPSS Users

http://decisionstats.com/2009/01/21/r-for-sas-and-spss-users-2/
R for Stata Users

http://decisionstats.com/2010/06/29/interview-r-for-stata-users/
R Consulting
Interview David Katz, Dataspora / David Katz Consulting
http://decisionstats.com/2011/02/11/interview-david-katz-dataspora-david-katz-consulting/
Case Study
(http://www.predictiveanalyticsworld.com/sanfrancisco/2011/agenda.php#day2-16a)
Room: Salon 5 & 6
4:45pm – 5:05pm
Track 2: Social Data and Telecom 
Case Study: Major North American Telecom
Social Networking Data for Churn Analysis
A North American Telecom found that it had a window into social contacts – who has been calling whom on its network. This data proved to be predictive of churn. Using SQL, and GAM in R, we explored how to use this data to improve the identification of likely churners. We will present many dimensions of the lessons learned on this engagement.
Speaker: David Katz, Senior Analyst, Dataspora, and President, David Katz Consulting
Q&A with David Smith, Revolution Analytics
http://decisionstats.com/2010/08/03/q-a-with-david-smith-revolution-analytics/
Inference for R
http://decisionstats.com/2009/06/04/inference-for-r/
David Smith Revolution Computing
http://decisionstats.com/2009/05/29/interview-david-smith-revolution-computing/
Richard Schultz Revolution Computing
http://decisionstats.com/2009/01/31/interviewrichard-schultz-ceo-revolution-computing/
Karim Chine, Elastic R
http://decisionstats.com/2009/06/21/interview-karim-chine-biocep-cloud-computing-with-r/
Related Articles
- Revolution Analytics CTO on Data Science (revolutionanalytics.com)
- Revolution in the News (revolutionanalytics.com)
- 7 Data Blogs To Explore (readwriteweb.com)
- Finally! A practical R book on Data Mining: “Data Mining With R, Learning with Case Studies,” by Luis Torgo (r-bloggers.com)
Chapman/Hall announces new series on R
R authors get more choice and variety now: http://www.mail-archive.com/r-help@r-project.org/msg122965.html
We are pleased to announce the launch of a new series of books on R.
Chapman & Hall/CRC: The R Series
Aims and Scope
This book series reflects the recent rapid growth in the development and application of R, the programming language and software environment for statistical computing and graphics. R is now widely used in academic research, education, and industry. It is constantly growing, with new versions of the core software released regularly and more than 2,600 packages available. It is difficult for the documentation to keep pace with the expansion of the software, and this vital book series provides a forum for the publication of books covering many aspects of the development and application of R.
The scope of the series is wide, covering three main threads:
• Applications of R to specific disciplines such as biology, epidemiology, genetics, engineering, finance, and the social sciences.
• Using R for the study of topics of statistical methodology, such as linear and mixed modeling, time series, Bayesian methods, and missing data.
• The development of R, including programming, building packages, and graphics.
The books will appeal to programmers and developers of R software, as well as applied statisticians and data analysts in many fields. The books will feature detailed worked examples and R code fully integrated into the text, ensuring their usefulness to researchers, practitioners and students.
Series Editors
John M. Chambers (Department of Statistics, Stanford University, USA; j...@stat.stanford.edu)
Torsten Hothorn (Institut für Statistik, Ludwig-Maximilians-Universität, München, Germany; torsten.hoth...@stat.uni-muenchen.de)
Duncan Temple Lang (Department of Statistics, University of California, Davis, USA; dun...@wald.ucdavis.edu)
Hadley Wickham (Department of Statistics, Rice University, Houston, Texas, USA; had...@rice.edu)
Call for Proposals
We are interested in books covering all aspects of the development and application of R software. If you have an idea for a book, please contact one of the series editors above or one of the Chapman & Hall/CRC statistics acquisitions editors below. Please provide brief details of topic, audience, aims and scope, and include an outline if possible. We look forward to hearing from you.
Best regards,
Rob Calver (rob.cal...@informa.com)
David Grubbs (david.gru...@taylorandfrancis.com)
John Kimmel (john.kim...@taylorandfrancis.com)
Related Articles
- Call for proposals for writing a book about R (via Chapman & Hall/CRC) (r-statistics.com)
R Journal Dec 2010 and R for Business Analytics
I almost missed out on the R Journal for this month. Great reading,
and I liked Dr Hadley’s article on the stringr package the best. A really, really useful package, and nice writing too.
http://journal.r-project.org/archive/2010-2/RJournal_2010-2_Wickham.pdf
(Incidentally, I just downloaded a local copy of his ggplot website from http://had.co.nz/ggplot2/ggplot-static.zip and I aim to really read that one up.)
Okay, announcement time.
I just signed a contract with Springer for a book on R, due sometime in the first half of 2011:
“R for Business Analytics”
It’s going to take more of a business analytics than a stats perspective (I am an MBA/mechanical engineer),
and the use cases would be business analytics cases. Do write to me if you need help doing some analytics in R (business use cases) or want something featured. The big focus would be on GUIs and easier analytics, using the Einsteinian principle of making things as simple as possible but no simpler.
Related Articles
- Analysis of Facebook status updates (revolutionanalytics.com)
- Winners of 2010 ggplot2 case study competition (revolutionanalytics.com)
- Springer launches new service tool for usage and trends (teleread.com)
- High Impact Analytics Introduces First Real-Time, Software Business Analytics For Small- to Mid-Size Walmart and Sam’s Club Suppliers (prweb.com)
- Top 5 Methods to Choose a Thesis Topic (psipsychologytutor.org)
- Modern football journalism… (scissorskick.wordpress.com)
The Year 2010
My annual traffic to this blog was almost 99,000. Add in additional views on networking sites plus the 400-plus RSS readers, and I can say traffic was about 120,000 for 2010. Nice. Thanks for reading, and I hope it was worth your time. (This is a long post and will take almost 440 secs to read, but the summary is given up front.)
My intent is either to inform you, give you something useful, or at least something interesting.
See below:
| Year | Jan | Feb | Mar | Apr | May | Jun | Jul | Aug | Sep | Oct | Nov | Dec | Total |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2010 | 6,311 | 4,701 | 4,922 | 5,463 | 6,493 | 4,271 | 5,041 | 5,403 | 17,913 | 16,430 | 11,723 | 10,096 | 98,767 |
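As a quick sanity check on the monthly figures in the table above, here is a small Python sketch that re-adds them and works out how much of the year's traffic arrived in the last four months. The variable names are my own; the numbers are copied from the table.

```python
# Monthly views for 2010, copied from the table above.
monthly_views_2010 = {
    "Jan": 6311, "Feb": 4701, "Mar": 4922, "Apr": 5463,
    "May": 6493, "Jun": 4271, "Jul": 5041, "Aug": 5403,
    "Sep": 17913, "Oct": 16430, "Nov": 11723, "Dec": 10096,
}

total = sum(monthly_views_2010.values())
last_four = sum(monthly_views_2010[m] for m in ("Sep", "Oct", "Nov", "Dec"))
last_four_pct = round(100 * last_four / total, 1)

print(f"total views: {total}")
print(f"Sep-Dec views: {last_four} ({last_four_pct}% of the year)")
```

The sum matches the 98,767 total in the table, and September to December account for a bit more than half of the year.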
Sandro Saita from http://www.dataminingblog.com/ just named me for an award on his blog (but my surname is OhRi; Sandro left me without an R. What would I be without R?).
Aw! I am touched. Google for “Data Mining Blog” and Sandro is the best there is in data mining writing.
“
DMR People Award 2010
There are a lot of active people in the field of data mining. You can discuss with them on forums. You can read their blogs. You can also meet them in events such as PAW or KDD. Among the people I follow on a regular basis, I have elected: Ajay Ori
He has been very active in 2010, especially on his blog. Good work Ajay and continue sharing your experience with us!”
What did I write in 2010? Stuff.
What did you read on this blog? Well, that’s the top posts list.
2009-12-31 to Today
So how do people come here?
Well, I guess I owe Tal G for more than 9,000 views. (Incidentally, I withdrew my blog from R-bloggers and the AnalyticBridge blogs due to SEO keyword reasons and some spam I was getting; see below.)
http://r-bloggers.com is still the cat’s whiskers, and I read it a lot.
I still don’t know who linked my blog to a free sex movie site with 400-plus views, but I have a few suspects.
2009-12-31 to Today
| Referrer | Views |
|---|---|
| r-bloggers.com | 9,131 |
| (referrer name missing) | 3,829 |
| rattle.togaware.com | 1,500 |
| (referrer name missing) | 1,254 |
| (referrer name missing) | 1,215 |
| linkedin.com | 717 |
| freesexmovie.irwanaf.com | 422 |
| analyticbridge.com | 341 |
| (referrer name missing) | 327 |
| coolavenues.com | 322 |
| (referrer name missing) | 317 |
| kdnuggets.com | 298 |
| dataminingblog.com | 278 |
| (referrer name missing) | 185 |
| google.co.in | 151 |
| (referrer name missing) | 130 |
| inside-r.org | 124 |
| decisionstats.com | 119 |
| ifreestores.com | 117 |
| bits.blogs.nytimes.com | 108 |
Still reading this post? Gosh, let me sell you some advertising. It is only $100 a month (yes, it’s a recession).
Advertisers are treated on a first in, last out (FILO) basis.
I have been told I am obsessed with SEO, but I don’t care much for search engines apart from Google, and yes, SEO is an interesting science (they should really rename it GEO, or Google Engine Optimization).
Apparently Hadley Wickham and Donald Farmer are big keywords for me, so I should be more respectful, I guess.
Search Terms for 365 days ending 2010-12-31 (Summarized)
2009-12-31 to Today
| Search | Views |
|---|---|
| libre office | 925 |
| facebook analytics | 798 |
| test drive a chrome notebook | 467 |
| test drive a chrome notebook. | 215 |
| r gui | 203 |
| data mining | 163 |
| wps sas lawsuit | 158 |
| wordle.net | 133 |
| wps sas | 123 |
| google maps jet ski | 123 |
| test drive chrome notebook | 96 |
| sas wps | 89 |
| sas wps lawsuit | 85 |
| chrome notebook test drive | 83 |
| decision stats | 83 |
| best statistics software | 74 |
| hadley wickham | 72 |
| google maps jetski | 72 |
| libreoffice | 70 |
| doug savage | 65 |
| hive tutorial | 58 |
| funny india | 56 |
| spss certification | 52 |
| donald farmer microsoft | 51 |
| best statistical software | 49 |
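Several rows in the search terms table above are really the same query typed differently ("wps sas" and "sas wps", or the Chrome notebook query with and without a trailing full stop). Here is a hedged Python sketch of one way to fold such variants together: lowercase the term, strip trailing punctuation, and ignore word order. The helper names are hypothetical, and sorting the words is a blunt instrument that would also merge queries where order genuinely matters.

```python
from collections import defaultdict

def normalize(term):
    """Lowercase, strip surrounding spaces and full stops, and sort the words."""
    words = term.lower().strip(" .").split()
    return " ".join(sorted(words))

def merge_variants(term_views):
    """Sum views over all spellings that normalize to the same key."""
    merged = defaultdict(int)
    for term, views in term_views:
        merged[normalize(term)] += views
    return dict(merged)

# A few rows copied from the search terms table above.
rows = [
    ("test drive a chrome notebook", 467),
    ("test drive a chrome notebook.", 215),
    ("wps sas lawsuit", 158),
    ("wps sas", 123),
    ("sas wps", 89),
    ("sas wps lawsuit", 85),
]
merged = merge_variants(rows)
for key, views in sorted(merged.items(), key=lambda kv: -kv[1]):
    print(f"{views:>5}  {key}")
```

Merged this way, the Chrome notebook query alone adds up to 682 views from just these two rows.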
What about outgoing links? Apparently I need to find a way to ask Google to pay me for the free advertising I gave their Chrome notebook launch. But since their search engine and browser are free to me, I guess we are even-steven.
Clicks for 365 days ending 2010-12-31 (Summarized)
2009-12-31 to Today
So in 2010,
SAS remained top daddy in business analytics,
R made revolutionary strides in terms of new packages,
JMP launched a new version,
SPSS got integrated with Cognos,
Oracle sued Google and did build a great data mining GUI,
LibreOffice gave you a non-Oracle OpenOffice (or an even more open office).
2011 looks like a fun year. Have safe partying.
Related Articles
- IBM SPSS 19 Now Available to the Global Academic Community via e-academy’s OnTheHub eStore (prweb.com)
- ACM Data Mining Camp 3 (revolutionanalytics.com)
- Accessing R from Python using RPy2 (r-bloggers.com)
- Mining of Massive Data Sets (kinlane.com)
- 5 FeedBurner Alternatives You Should Know About (techie-buzz.com)
- Uncertainty, Risk, Statistics and Data Mining (zyxo.wordpress.com)
- ‘Data Mining’ Gains Traction in Education (edreformer.com)
- If you cut your RSS short I will ignore your post (chrisabraham.com)
- Solar trends for 2011 (cleanbreak.ca)
Who searches for this Blog?
Using WP-Stats, I set about answering this question:
What search keywords lead here?
Clearly Michael Jackson is down this year,
and R GUI and Data Mining are up.
How does that affect my writing, given that I get almost 250 visitors daily from search engines alone, even assuming I write nothing on this blog from now on?
It doesn’t. I still write whatever code or poem comes to my mind. So it is hurtful when people underestimate the effort in writing and jump to conclusions (especially if I write about a company: I am not on the payroll of that company, just as, if I write a poem, I am not a full-time poet).
Over to xkcd
All Time (for Decisionstats.Wordpress.com)
| Search | Views |
|---|---|
| libre office | 818 |
| facebook analytics | 806 |
| michael jackson history | 240 |
| wps sas lawsuit | 180 |
| r gui | 168 |
| wps sas | 154 |
| wordle.net | 118 |
| sas wps | 116 |
| decision stats | 110 |
| sas wps lawsuit | 100 |
| google maps jet ski | 94 |
| data mining | 88 |
| doug savage | 72 |
| hive tutorial | 63 |
| spss certification | 63 |
| hadley wickham | 63 |
| google maps jetski | 62 |
| sas sues wps | 60 |
| decisionstats | 58 |
| donald farmer microsoft | 45 |
| libreoffice | 44 |
| wps statistics | 44 |
| best statistics software | 42 |
| r gui ubuntu | 41 |
| rstat | 37 |
| tamilnadu advanced technical training institute tatti | 37 |
YTD
2009-11-24 to Today
| Search | Views |
|---|---|
| libre office | 818 |
| facebook analytics | 781 |
| wps sas lawsuit | 170 |
| r gui | 164 |
| wps sas | 125 |
| wordle.net | 118 |
| sas wps | 101 |
| sas wps lawsuit | 95 |
| google maps jet ski | 94 |
| data mining | 86 |
| decision stats | 82 |
| doug savage | 63 |
| hadley wickham | 63 |
| google maps jetski | 62 |
| hive tutorial | 56 |
| donald farmer microsoft | 45 |
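The "down this year" and "up this year" reading of the two tables above can be made a little more concrete. Here is a minimal Python sketch that takes a handful of rows from the All Time and YTD tables and computes what fraction of each term's all-time views arrived in the YTD window. A term missing from the YTD list is reported as unknown rather than zero, since the list only shows top terms. The function name is my own.

```python
# A few rows copied from the All Time and YTD tables above.
all_time = {
    "libre office": 818,
    "facebook analytics": 806,
    "michael jackson history": 240,
    "r gui": 168,
    "data mining": 88,
}
ytd = {
    "libre office": 818,
    "facebook analytics": 781,
    "r gui": 164,
    "data mining": 86,
}

def ytd_fraction(all_time_views, ytd_views):
    """Share of all-time views that fall in the YTD window; None if not listed."""
    result = {}
    for term, total in all_time_views.items():
        if term in ytd_views:
            result[term] = ytd_views[term] / total
        else:
            result[term] = None  # absent from the YTD top list, count unknown
    return result

fractions = ytd_fraction(all_time, ytd)
for term, frac in fractions.items():
    label = "not in YTD list" if frac is None else f"{frac:.1%} of all-time views"
    print(f"{term}: {label}")
```

On these rows, nearly all of the "r gui" and "data mining" views come from the YTD window, while "michael jackson history" does not show up in the YTD list at all.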
Related Articles
- Ways of Optimizing Blog in Search Engines (cash-bandit.com)
- SearchCap: The Day In Search, November 23, 2010 (searchengineland.com)
- Do You Want Increased Income? Chango Could Be The Answer (wassupblog.com)
- Domain.com Announces Industry’s First Natively Integrated Browser Domain Search (prweb.com)
- Why Keyword Research Matters and Link Building Doesn’t (danlew.com)
- Six rules for producing optimised web content (econsultancy.com)
- Consumer Watch Dog Group Files Complaint with the FTC Regarding Data Mining, Profiling Algorithms – Privacy With Health Information At Risk With Insurer and Employer Usage (ducknetweb.blogspot.com)
- Find the Question to your Yahoo Answers! (seomoz.org)
Top R Interviews
Here is a list of the top R-related interviews I have done (in random order):
1) John Fox, creator of R Commander
http://decisionstats.com/2009/09/14/interview-professor-john-fox-creator-r-commander/
2) Dr Graham Williams, creator of Rattle
http://decisionstats.com/2009/01/13/interview-dr-graham-williams/
3) David Smith, back when he was community director of what was then Revolution Computing.
http://decisionstats.com/2009/05/29/interview-david-smith-revolution-computing/
and his second interview
http://decisionstats.com/2010/08/03/q-a-with-david-smith-revolution-analytics/
4) Richard Schultz, the first CEO of Revolution Computing (now Revolution Analytics)
http://decisionstats.com/2009/01/31/interviewrichard-schultz-ceo-revolution-computing/
5) Bob Muenchen, author of R for SAS and SPSS Users AND R for Stata Users
http://decisionstats.com/2010/06/29/interview-r-for-stata-users/
http://decisionstats.com/2008/10/16/r-for-sas-and-spss-users/
6) Karim Chine, creator of Biocep, cloud computing for R
http://decisionstats.com/2009/06/21/interview-karim-chine-biocep-cloud-computing-with-r/
7) Paul van Eikeren, Inference for R, the first enterprise package to use R from within MS Office.
http://decisionstats.com/2009/06/04/inference-for-r/
8) Hadley Wickham, creator of ggplot and R author
http://decisionstats.com/2010/01/12/interview-hadley-wickham-r-project-data-visualization-guru/
That’s a lot of R interviews. I need to balance them out a bit, I guess.
Related Articles
- R is Hot (revolutionanalytics.com)
- The R-Files: Hadley Wickham (r-bloggers.com)