library(XML)
# Note: I can also break the url string and use the paste command to modify this url with parameters
url="http://stats.espncricinfo.com/ci/engine/stats/index.html?class=1;team=6;template=results;type=batting"
tables=readHTMLTable(url)
tables$"Overall figures"
# Now see this- since I only got 50 results in each page, I look at the url of the next page
table1=tables$"Overall figures"
url="http://stats.espncricinfo.com/ci/engine/stats/index.html?class=1;page=2;team=6;template=results;type=batting"
tables=readHTMLTable(url)
table2=tables$"Overall figures"
# Now I need to join these two tables vertically
table3=rbind(table1,table2)
Note: I can also automate the web scraping.
Now that the data is within R, we can use something like Deducer to visualize it.
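The page-by-page scraping above can be automated; here is a minimal sketch (the helper names `espn_url` and `scrape_pages` are my own, and the "Overall figures" table name is taken from the calls above):

```r
# library(XML) is assumed already loaded, as above, for readHTMLTable()

# Build the ESPNcricinfo URL for a given results page
espn_url <- function(page) {
  paste0("http://stats.espncricinfo.com/ci/engine/stats/index.html",
         "?class=1;page=", page, ";team=6;template=results;type=batting")
}

# Fetch each page's "Overall figures" table and stack them vertically
scrape_pages <- function(pages) {
  do.call(rbind, lapply(pages, function(p) {
    readHTMLTable(espn_url(p))$"Overall figures"
  }))
}

# table3 <- scrape_pages(1:2)
```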
We’ve released Shiny 0.3.0, and it’s available on CRAN now. Glimmer will be updated with the latest version of Shiny some time later today.
To update your installation of Shiny, run:
install.packages("shiny")
Highlights of the new version include:
* Some bugs were fixed in `reactivePrint()` and `reactiveText()`, so that they have slightly different rules for collecting the output. Please be aware that some changes to your apps’ text output are possible. The help pages for these functions explain the behavior.
* New `runGitHub()` function, which can run apps directly from a repository on GitHub.
* New `runUrl()` function, which can run apps stored as zip or tar files on a remote web server.
* New `isolate()` function, which allows you to access reactive values (from input) without making the function dependent on them.
* Improved scheduling of evaluation of reactive functions, which should reduce the number of “extra” times a reactive function is called.
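As a minimal sketch of what `isolate()` buys you, using the `reactiveText()` API mentioned above (the input names and the server function below are my own illustration, not from the release notes):

```r
# Re-runs only when input$go changes; input$n is read inside isolate(),
# so changing n alone does NOT re-trigger this output.
server <- function(input, output) {
  output$result <- shiny::reactiveText(function() {
    input$go                       # reactive dependency: re-runs on button press
    n <- shiny::isolate(input$n)   # read the value without taking a dependency
    paste("n was", n, "when go was clicked")
  })
}
```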
For some time now, I have been hoping for a place where new package or algorithm developers get at least a fraction of the money that iPad or iPhone application developers get. RapidMiner has taken the lead in establishing a marketplace for extensions. Are there going to be paid extensions as well? I hope so!
This probably makes it the first “app” marketplace in open source and the second app marketplace in analytics after salesforce.com
It is hard work to think of new algorithms, and some of them can really be useful.
Can we hope for an #rstats marketplace where people downloading, say, ggplot3.0 at least get a prompt to donate 99 cents per download to Hadley Wickham’s Amazon wishlist? http://www.amazon.com/gp/registry/1Y65N3VFA613B
Do you think it is okay to pay 99 cents per iTunes song, but not pay a cent for open source software?
I don’t know, but I am just a capitalist born in a country that was socialist for the first 13 years of my life. Congratulations once again to RapidMiner for innovating and leading the way.
Over the years, many of you have been developing new RapidMiner Extensions dedicated to a broad set of topics. Whereas these extensions are easy to install in RapidMiner – just download and place them in the plugins folder – the hard part is to find them in the vastness that is the Internet. Extensions made by ourselves at Rapid-I, on the other hand, are distributed by the update server making them searchable and installable directly inside RapidMiner.
We thought that this was a bit unfair, so we decided to open up the update server to the public, and not only that, we even gave it a new look and name. The Rapid-I Marketplace is available in beta mode at http://rapidupdate.de:8180/ . You can use the Web interface to browse, comment, and rate the extensions, and you can use the update functionality in RapidMiner by going to the preferences and entering http://rapidupdate.de:8180/UpdateServer/ as the update server URL. (Once the beta test is complete, we will change the port back to 80 so we won’t have any firewall problems.)
As an Extension developer, just register with the Marketplace and drop me an email (fischer at rapid-i dot com) so I can give you permissions to upload your own extension. Upload is simple provided you use the standard RapidMiner Extension build process and will boost visibility of your extension.
Looking forward to seeing many new extensions there soon!
Disclaimer: Decisionstats is a partner of RapidMiner. I have liked the software for a long, long time, and recently agreed to partner with them, just as I did with KXEN some years back, and with the Predictive Analytics Conference, and Aster Data until last year.
I still think RapidMiner is a very, very good software, and a globally created software after SAP.
Welcome to the Rapid-I Marketplace Public Beta Test
The Rapid-I Marketplace will soon replace the RapidMiner update server. Using this marketplace, you can share your RapidMiner extensions and make them available for download by the community of RapidMiner users. Currently, we are beta testing this server. If you want to use this server in RapidMiner, you must go to the preferences and enter http://rapidupdate.de:8180/UpdateServer as the update URL. After the beta test, we will change the port back to 80, which is currently occupied by the old update server. You can test the marketplace as a user (downloading extensions) and as an extension developer. If you want to publish your extension here, please let us know via the contact form.
On a whim, I took the all-time stats of my blog posts (more than 1000 posts) and tried to plot their distribution.
Basically I copied and pasted all the data into a Google Docs spreadsheet, and I created dummy codes (like URL1, URL2 …. URL 500).
Next I downloaded the….
I wasn’t in the mood for downloading and uploading stuff, so I decided to use ggplot via Jeroen’s application at http://www.stat.ucla.edu/~jeroen/
I used the mirror server that Dataspora provides as I have had latency issues with Jeroen’s website.
I got this error while trying to connect the Dataspora app to my Google spreadsheet:
The page you have requested cannot be displayed. Another site was requesting access to your Google Account, but sent a malformed request. Please contact the site that you were trying to use when you received this message to inform them of the error. A detailed error message follows:
This website has not registered with Google to establish a secure connection for authorization requests. We recommend that you continue the process only if you trust the following destination:
Wow, it works! That’s cloud computing. Now I wonder why Google and Amazon continue to ignore rApache and Jeroen’s cloud app. Surely their Google Fusion Tables can always be improved or tweaked, not to mention the next-gen version of R, which will have its own server.
Pretty cool screenshot (but click to see more)
I get the following pretty graph. Hadley Wickham would be ashamed of me by now.
What went wrong? Well, one page has 36,000 views, and scale is the key to graphical coherence. So I redo it: delete the home page row in the Google spreadsheet, re-import, re-plot. (I didn’t know how to modify data in the cloud app; maybe we need a cloud plyr.) Then I redo it again, as I have another big outlier: the Top 10 Statistical GUIs article, which ironically has only 5 GUIs in it (but hush, don’t tell the high-quality search engine).
So, belatedly, I discover something called a layer in ggplot.
The base graphics engine has really spoiled me into writing short functions for plots.
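The base-vs-ggplot contrast can be sketched on made-up page-view numbers (the data below is my own illustration, not the blog’s actual 500-row stats):

```r
# Made-up page-view counts, including one home-page-sized outlier
views <- c(36000, 900, 850, 400, 300, 120, 80, 40, 10, 5)

# Base graphics: one short call
hist(views)

# ggplot2: the same histogram needs a data frame plus a layered spec
# library(ggplot2)
# ggplot(data.frame(views = views), aes(x = views)) + geom_histogram(bins = 10)
```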
I give up; I rather prefer hist(). I go to my favorite GUI, Rattle, but it has some issues with the GTK+ DLL.
So I go to John Fox’s simple GUI, R Commander. It is the best GUI if you use Occam’s Razor, and I am using Occam’s Chainsaw now.
I get the analysis I want in 12 seconds.
Summary: ggplot is more complicated than the base graphics engine.
The Deducer GUI is not as simple either.
R Commander is the best GUI because it retains simplicity.
Ignore the long tail of the internet only at your peril.
Almost two-thirds of my daily traffic of 400+ comes from old archived content. That is why Search Engine Optimization and alerts for keywords are critical for any poor soul trying to write on a blog (which has neither journal-like prestige nor rewards).
If you make life easier for the search engine, it, being a fair chap, rewards you well.
Existing web traffic estimates like Comscore and Google Trends ignore this long tail
Comments are welcome. (Data is pasted below, 500 rows × 2 columns, if you can come up with a better analysis.)
Since SAS has ignored web analytics and Google Analytics is hmm hmm, this could be an area of opportunity for R developers as well to create a web analytics package.
and followed it up with how HE analyzed the post announcing the non-analysis.
“If you have not visited the site in a week or so you will have missed my previous post on analyzing WikiLeaks data, which from the traffic and 35 Comments and 255 Reactions was at least somewhat controversial. Given this rare spotlight I thought it would be fun to use the infochimps API to map out the geo-location of everyone that visited the blog post over the last few days. Unfortunately, after nearly two years with the same web hosting service, only today did I realize that I was not capturing daily log files for my domain”
Anyways, non-American users of the R Project can analyze the WikiLeaks data using the R SPARQL package. I would advise American friends not to use this approach or attempt to analyze any of the data, because technically the data is still classified and its possession is illegal (which is the reason Federal employees and organizations receiving federal funds have been advised not to use this or any WikiLeaks dataset).
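A hedged sketch of what querying from R with the SPARQL package could look like; note that the endpoint URL and the predicate below are placeholders of my own invention, not a real WikiLeaks service:

```r
# library(SPARQL)  # assumed installed; provides SPARQL(url, query)

endpoint <- "http://example.org/sparql"   # hypothetical endpoint, not real

query <- "
SELECT ?cable ?date
WHERE { ?cable <http://example.org/ns#date> ?date }
LIMIT 10
"

# Uncomment to run against a live endpoint:
# res <- SPARQL(url = endpoint, query = query)
# head(res$results)   # the results come back as a data frame
```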
In May 2009, the Obama administration started putting raw government data on the Web. It started with 47 data sets. Today, there are more than 270,000 government data sets, spanning every imaginable category from public health to foreign aid.