library(XML)
# Note: I can also break the url string and use the paste command to modify this url with parameters
url="http://stats.espncricinfo.com/ci/engine/stats/index.html?class=1;team=6;template=results;type=batting"
tables=readHTMLTable(url)
tables$"Overall figures"
# Now see this- since I only got 50 results in each page, I look at the url of the next page
table1=tables$"Overall figures"
url="http://stats.espncricinfo.com/ci/engine/stats/index.html?class=1;page=2;team=6;template=results;type=batting"
tables=readHTMLTable(url)
table2=tables$"Overall figures"
# Now I need to join these two tables vertically
table3=rbind(table1,table2)
Note: I can also automate the web scraping.
Now that the data is within R, we can use something like Deducer to visualize it.
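The page-by-page scraping above can be automated; here is a minimal sketch (the helper names `espn_url` and `scrape_pages` are my own, and the "Overall figures" table name is taken from the calls above):

```r
# library(XML) is assumed already loaded, as above, for readHTMLTable()

# Build the ESPNcricinfo URL for a given results page
espn_url <- function(page) {
  paste0("http://stats.espncricinfo.com/ci/engine/stats/index.html",
         "?class=1;page=", page, ";team=6;template=results;type=batting")
}

# Fetch each page's "Overall figures" table and stack them vertically
scrape_pages <- function(pages) {
  do.call(rbind, lapply(pages, function(p) {
    readHTMLTable(espn_url(p))$"Overall figures"
  }))
}

# table3 <- scrape_pages(1:2)
```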
We’ve released Shiny 0.3.0, and it’s available on CRAN now. Glimmer will be updated with the latest version of Shiny some time later today.
To update your installation of Shiny, run:
install.packages("shiny")
Highlights of the new version include:
* Some bugs were fixed in `reactivePrint()` and `reactiveText()`, so that they have slightly different rules for collecting the output. Please be aware that some changes to your apps’ text output are possible. The help pages for these functions explain the behavior.
* New `runGitHub()` function, which can run apps directly from a repository on GitHub.
* New `runUrl()` function, which can run apps stored as zip or tar files on a remote web server.
* New `isolate()` function, which allows you to access reactive values (from input) without making the function dependent on them.
* Improved scheduling of evaluation of reactive functions, which should reduce the number of “extra” times a reactive function is called.
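As a minimal sketch of what `isolate()` buys you, using the `reactiveText()` API mentioned above (the input names and the server function below are my own illustration, not from the release notes):

```r
# Re-runs only when input$go changes; input$n is read inside isolate(),
# so changing n alone does NOT re-trigger this output.
server <- function(input, output) {
  output$result <- shiny::reactiveText(function() {
    input$go                       # reactive dependency: re-runs on button press
    n <- shiny::isolate(input$n)   # read the value without taking a dependency
    paste("n was", n, "when go was clicked")
  })
}
```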
For some time now, I have been hoping for a place where new package or algorithm developers get at least a fraction of the money that iPad or iPhone application developers get. RapidMiner has taken the lead in establishing a marketplace for extensions. Are there going to be paid extensions as well? I hope so!
This probably makes it the first “app” marketplace in open source and the second app marketplace in analytics after salesforce.com
It is hard work to think of new algorithms, and some of them can really be useful.
Can we hope for an #rstats marketplace where people downloading, say, ggplot3.0 at least get a prompt to donate 99 cents per download to Hadley Wickham’s Amazon wishlist? http://www.amazon.com/gp/registry/1Y65N3VFA613B
Do you think it is okay to pay 99 cents per iTunes song, but not pay a cent for open source software?
I don’t know, but I am just a capitalist born in a country that was socialist for the first 13 years of my life. Congratulations once again to RapidMiner for innovating and leading the way.
Over the years, many of you have been developing new RapidMiner Extensions dedicated to a broad set of topics. Whereas these extensions are easy to install in RapidMiner – just download and place them in the plugins folder – the hard part is to find them in the vastness that is the Internet. Extensions made by ourselves at Rapid-I, on the other hand, are distributed by the update server making them searchable and installable directly inside RapidMiner.
We thought that this was a bit unfair, so we decided to open up the update server to the public, and not only that, we even gave it a new look and name. The Rapid-I Marketplace is available in beta mode at http://rapidupdate.de:8180/ . You can use the Web interface to browse, comment, and rate the extensions, and you can use the update functionality in RapidMiner by going to the preferences and entering http://rapidupdate.de:8180/UpdateServer/ as the update server URL. (Once the beta test is complete, we will change the port back to 80 so we won’t have any firewall problems.)
As an Extension developer, just register with the Marketplace and drop me an email (fischer at rapid-i dot com) so I can give you permissions to upload your own extension. Upload is simple provided you use the standard RapidMiner Extension build process and will boost visibility of your extension.
Looking forward to seeing many new extensions there soon!
Disclaimer: Decisionstats is a partner of RapidMiner. I have liked the software for a long, long time, and recently agreed to partner with them, just as I did with KXEN some years back, and with the Predictive Analytics Conference, and Aster Data until last year.
I still think RapidMiner is a very, very good software, and a globally created software after SAP.
Welcome to the Rapid-I Marketplace Public Beta Test
The Rapid-I Marketplace will soon replace the RapidMiner update server. Using this marketplace, you can share your RapidMiner extensions and make them available for download by the community of RapidMiner users. Currently, we are beta testing this server. If you want to use this server in RapidMiner, you must go to the preferences and enter http://rapidupdate.de:8180/UpdateServer as the update URL. After the beta test, we will change the port back to 80, which is currently occupied by the old update server. You can test the marketplace as a user (downloading extensions) and as an extension developer. If you want to publish your extension here, please let us know via the contact form.
On a whim, I took the all-time stats of my blog posts (more than 1000 posts) and tried to plot their distribution.
Basically I copied and pasted all the data into a Google Docs spreadsheet, and I created dummy codes (like URL1, URL2 …. URL 500).
Next I downloaded the….
I wasn’t in the mood for downloading and uploading stuff, so I decided to use ggplot via Jeroen’s application at http://www.stat.ucla.edu/~jeroen/
I used the mirror server that Dataspora provides as I have had latency issues with Jeroen’s website.
I got this error while trying to connect the Dataspora app to my Google spreadsheet:
The page you have requested cannot be displayed. Another site was requesting access to your Google Account, but sent a malformed request. Please contact the site that you were trying to use when you received this message to inform them of the error. A detailed error message follows:
This website has not registered with Google to establish a secure connection for authorization requests. We recommend that you continue the process only if you trust the following destination:
Wow, it works! That’s cloud computing. Now I wonder why Google and Amazon continue to ignore rApache and Jeroen’s cloud app. Surely their Google Fusion Tables can always be improved or tweaked, not to mention the next-gen version of R, which will have its own server.
Pretty cool screenshot (but click to see more)
I get the following pretty graph. Hadley Wickham would be ashamed of me by now.
What went wrong? Well, one page has 36,000 views, and scale is the key to graphical coherence. So I redo it: delete the home page row in the Google spreadsheet, re-import, re-plot. (I didn’t know how to modify data in the cloud app; maybe we need a cloud plyr.) Then I redo it again, as I have another big outlier: the Top 10 Statistical GUIs article, which ironically has only 5 GUIs in it (but hush, don’t tell the high-quality search engine).
So, belatedly, I discover something called a layer in ggplot.
The base graphics engine has really spoiled me into writing short functions for plots.
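The base-vs-ggplot contrast can be sketched on made-up page-view numbers (the data below is my own illustration, not the blog’s actual 500-row stats):

```r
# Made-up page-view counts, including one home-page-sized outlier
views <- c(36000, 900, 850, 400, 300, 120, 80, 40, 10, 5)

# Base graphics: one short call
hist(views)

# ggplot2: the same histogram needs a data frame plus a layered spec
# library(ggplot2)
# ggplot(data.frame(views = views), aes(x = views)) + geom_histogram(bins = 10)
```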
I give up; I rather prefer hist(). I go to my favorite GUI, Rattle, but it has some issues with the GTK+ DLL.
So I go to John Fox’s simple GUI, R Commander. It is the best GUI if you use Occam’s Razor, and I am using Occam’s Chainsaw now.
I get the analysis I want in 12 seconds.
Summary: ggplot is more complicated than the base graphics engine.
The Deducer GUI is not as simple either.
R Commander is the best GUI because it retains simplicity.
Ignore the long tail of the internet only at your peril.
Almost two-thirds of my daily traffic of 400+ comes from old archived content. That is why Search Engine Optimization and alerts for keywords are critical for any poor soul trying to write on a blog (which has neither journal-like prestige nor rewards).
If you make life easier for the search engine, it, being a fair chap, rewards you well.
Existing web traffic estimates like Comscore and Google Trends ignore this long tail
Comments are welcome. (Data is pasted below, 500 rows × 2 columns, if you can come up with a better analysis.)
Since SAS has ignored web analytics and Google Analytics is hmm hmm, this could be an area of opportunity for R developers as well to create a web analytics package.
and followed it up with how HE analyzed the post announcing the non-analysis.
“If you have not visited the site in a week or so you will have missed my previous post on analyzing WikiLeaks data, which from the traffic and 35 Comments and 255 Reactions was at least somewhat controversial. Given this rare spotlight I thought it would be fun to use the infochimps API to map out the geo-location of everyone that visited the blog post over the last few days. Unfortunately, after nearly two years with the same web hosting service, only today did I realize that I was not capturing daily log files for my domain”
Anyways, non-American users of the R Project can analyze the WikiLeaks data using the R SPARQL package. I would advise American friends not to use this approach or attempt to analyze any of the data, because technically the data is still classified and its possession is illegal (which is the reason Federal employees and organizations receiving federal funds have been advised not to use this or any WikiLeaks dataset).
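A hedged sketch of what querying from R with the SPARQL package could look like; note that the endpoint URL and the predicate below are placeholders of my own invention, not a real WikiLeaks service:

```r
# library(SPARQL)  # assumed installed; provides SPARQL(url, query)

endpoint <- "http://example.org/sparql"   # hypothetical endpoint, not real

query <- "
SELECT ?cable ?date
WHERE { ?cable <http://example.org/ns#date> ?date }
LIMIT 10
"

# Uncomment to run against a live endpoint:
# res <- SPARQL(url = endpoint, query = query)
# head(res$results)   # the results come back as a data frame
```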
In May 2009, the Obama administration started putting raw government data on the Web. It started with 47 data sets. Today, there are more than 270,000 government data sets, spanning every imaginable category from public health to foreign aid.