Tag Archives: ggplot2
I was interviewed about moving on from Excel to R in Human Resources (HR); the interview is at http://www.hrtecheurope.com/blog/?p=5345
“There is a lot of data out there and it’s stored in different formats. Spreadsheets have their uses but they’re limited in what they can do. The spreadsheet is bad when getting over 5000 or 10000 rows – it slows down. It’s just not designed for that. It was designed for much higher levels of interaction.
In the business world we really don’t need to know every row of data, we need to summarise it, we need to visualise it and put it into a powerpoint to show to colleagues or clients.”
And a more recent interview with my fellow IIML mate, and editor at Analytics India Magazine
AIM: Which R packages do you use the most and which ones are your favorites?
AO: I use R Commander and Rattle a lot, and I use the dependent packages. I use car for regression, and forecast for time series, and many packages for specific graphs. I have not mastered ggplot though but I do use it sometimes. Overall I am waiting for Hadley Wickham to come up with an updated book to his ecosystem of packages as they are very formidable, completely comprehensive and easy to use in my opinion, so much I can get by the occasional copy and paste code.
A surprising review at R-Bloggers.com / Intelligent Trading
The good news is that many of the large companies do not view R as a threat, but as a beneficial tool to assist their own software capabilities.
After assisting and helping R users navigate through the dense forest of various GUI interface choices (in order to get R up and running), Mr. Ohri continues to handhold users through step by step approaches (with detailed screen captures) to run R from various simple to more advanced platforms (e.g. CLOUD, EC2) in order to gather, explore, and process data, with detailed illustrations on how to use R’s powerful graphing capabilities on the back-end.
Do you want to write a review too? You can visit the site here
- What does R do? Bring people together, of course! (r-bloggers.com)
- Book Review: R for Business Analytics, A Ohri (r-bloggers.com)
The noted Diamonds dataset in the ggplot2 package of R is actually culled from the website http://www.diamondse.info/diamond-prices.asp
However it has ~55000 diamonds, while the whole Diamonds search engine has almost ten times that number. Using iMacros – a Google Chrome Plugin, we can scrape that data (or almost any data). The iMacros chrome plugin is available at https://chrome.google.com/webstore/detail/cplklnmnlbnpmjogncfgfijoopmnlemp while notes on coding are at http://wiki.imacros.net
iMacros makes coding as easy as recording a macro: the code is automatically generated for whatever actions you perform. You can set parameters to extract only specific parts of the website, and the code can be run in a loop (9999 times!)
Here is the iMacros code. Note that you need to navigate to the website http://www.diamondse.info/diamond-prices.asp before running it.
VERSION BUILD=5100505 RECORDER=CR
SET !EXTRACT_TEST_POPUP NO
SET !ERRORIGNORE YES
TAG POS=6 TYPE=TABLE ATTR=TXT:* EXTRACT=TXT
TAG POS=1 TYPE=DIV ATTR=CLASS:paginate_enabled_next
SAVEAS TYPE=EXTRACT FOLDER=* FILE=test+3
And voila: all the diamonds you need to analyze!
The resulting data can be read as a standard delimited file and munged in SAS or R.
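As a rough illustration of that reading step, here is a minimal sketch in Python (rather than the SAS or R mentioned above). It assumes the saved extract is comma-delimited text with optionally quoted fields and a header row; the column names and sample rows are made up for illustration and are not taken from the actual site.

```python
import csv
import io


def read_extract(text):
    """Parse delimited extract text into a list of rows.

    Assumption: fields are comma-separated and may be double-quoted.
    Blank lines between scraped pages are skipped.
    """
    reader = csv.reader(io.StringIO(text))
    return [
        [cell.strip() for cell in row]
        for row in reader
        if any(cell.strip() for cell in row)
    ]


def to_records(rows):
    """Turn rows into dicts keyed by the first (header) row.

    Rows whose length does not match the header are dropped.
    """
    header, *body = rows
    return [dict(zip(header, row)) for row in body if len(row) == len(header)]


# Hypothetical sample data, standing in for the saved extract file.
sample = (
    "Shape,Carat,Price\n"
    '"Round","0.31","$544"\n'
    "\n"
    '"Princess","1.02","$4,210"\n'
)

records = to_records(read_extract(sample))
for record in records:
    print(record)
```

For a real file, the text could be obtained with `open(path, newline="").read()`. If the extract turns out to use a different delimiter, `csv.reader` accepts a `delimiter` argument.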
More on IMacros from
Automate your web browser. Record and replay repetitious work
If you encounter any problems with iMacros for Chrome, please let us know in our Chrome user forum at http://forum.iopus.com/viewforum.php?f=21 Our forum is also the best place for new feature suggestions ---- iMacros was designed to automate the most repetitious tasks on the web. If there’s an activity you have to do repeatedly, just record it in iMacros. The next time you need to do it, the entire macro will run at the click of a button! With iMacros, you can quickly and easily fill out web forms, remember passwords, create a webmail notifier, and more. You can keep the macros on your computer for your own use, use them within bookmark sync / Xmarks or share them with others by embedding them on your homepage, blog, company Intranet or any social bookmarking service as bookmarklet. The uses are limited only by your imagination! Popular uses are as web macro recorder, form filler on steroids and highly-secure password manager (256-bit AES encryption).
My favorite GUI (or one of them), R Commander, has a relatively new plugin called KMggplot2. Until now Deducer was the only GUI with ggplot features, but the much lighter and more popular R Commander has long been a champion for people wanting to pick up R quickly.
RcmdrPlugin.KMggplot2: Rcmdr Plug-In for Kaplan-Meier Plot and Other Plots by Using the ggplot2 Package
As you can see from the screenshot, it makes ggplot even easier for R newbies and experienced folks alike.
This package is an R Commander plug-in for Kaplan-Meier plot and other plots by using the ggplot2 package.
|Depends:||R (≥ 2.15.0), stats, methods, grid, Rcmdr (≥ 1.8-4), ggplot2 (≥ 0.9.1)|
|Imports:||tcltk2 (≥ 1.2-3), RColorBrewer (≥ 1.0-5), scales (≥ 0.2.1), survival (≥ 2.36-14)|
|Author:||Triad sou. and Kengo NAGASHIMA|
|Maintainer:||Triad sou. <triadsou at gmail.com>|
|CRAN checks:||RcmdrPlugin.KMggplot2 results|
----------------------------------------------------------------
NEWS file for the RcmdrPlugin.KMggplot2 package
----------------------------------------------------------------

----------------------------------------------------------------
Changes in version 0.1-0 (2012-05-18)

 o Restructuring implementation approach for efficient maintenance.
 o Added options() for storing package specific options (e.g., font size, font family, ...).
 o Added a theme: theme_simple().
 o Added a theme element: theme_rect2().
 o Added a list box for facet_xx() functions in some menus (Thanks to Professor Murtaza Haider).
 o Kaplan-Meier plot: added confidence intervals.
 o Box plot: added violin plots.
 o Bar chart for discrete variables: deleted dynamite plots.
 o Bar chart for discrete variables: added stacked bar charts.
 o Scatter plot matrix: added univariate plots at diagonal positions (ggplot2::plotmatrix).
 o Deleted the dummy data for histograms, which is large in size.

----------------------------------------------------------------
Changes in version 0.0-4 (2011-07-28)

 o Fixed "scale_y_continuous(formatter = "percent")" to "scale_y_continuous(labels = percent)" for ggplot2 (>= 0.9.0).
 o Fixed "legend = FALSE" to "show_guide = FALSE" for ggplot2 (>= 0.9.0).
 o Fixed the DESCRIPTION file for ggplot2 (>= 0.9.0) dependency.

----------------------------------------------------------------
Changes in version 0.0-3 (2011-07-28; FIRST RELEASE VERSION)

 o Kaplan-Meier plot: Show no. at risk table on outside.
 o Histogram: Color coding.
 o Histogram: Density estimation.
 o Q-Q plot: Create plots based on a maximum likelihood estimate for the parameters of the selected theoretical distribution.
 o Q-Q plot: Create plots based on a user-specified theoretical distribution.
 o Box plot / Errorbar plot: Box plot.
 o Box plot / Errorbar plot: Mean plus/minus S.D.
 o Box plot / Errorbar plot: Mean plus/minus S.D. (Bar plot).
 o Box plot / Errorbar plot: 95 percent Confidence interval (t distribution).
 o Box plot / Errorbar plot: 95 percent Confidence interval (bootstrap).
 o Scatter plot: Fitting a linear regression.
 o Scatter plot: Smoothing with LOESS for small datasets or GAM with a cubic regression basis for large data.
 o Scatter plot matrix: Fitting a linear regression.
 o Scatter plot matrix: Smoothing with LOESS for small datasets or GAM with a cubic regression basis for large data.
 o Line chart: Normal line chart.
 o Line chart: Line char with a step function.
 o Line chart: Area plot.
 o Pie chart: Pie chart.
 o Bar chart for discrete variables: Bar chart for discrete variables.
 o Contour plot: Color coding.
 o Contour plot: Heat map.
 o Distribution plot: Normal distribution.
 o Distribution plot: t distribution.
 o Distribution plot: Chi-square distribution.
 o Distribution plot: F distribution.
 o Distribution plot: Exponential distribution.
 o Distribution plot: Uniform distribution.
 o Distribution plot: Beta distribution.
 o Distribution plot: Cauchy distribution.
 o Distribution plot: Logistic distribution.
 o Distribution plot: Log-normal distribution.
 o Distribution plot: Gamma distribution.
 o Distribution plot: Weibull distribution.
 o Distribution plot: Binomial distribution.
 o Distribution plot: Poisson distribution.
 o Distribution plot: Geometric distribution.
 o Distribution plot: Hypergeometric distribution.
 o Distribution plot: Negative binomial distribution.
Data from the ESPN Cricinfo website is available from the STATSGURU website.
The url is of the form-
If you break down this URL to get more statistics on cricket, you can choose the following parameters.
6 = India, 7 = Pakistan and 8 = Sri Lanka
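To show how such team codes might be plugged into a query URL, here is a small Python sketch. The team codes come from the list above. The base URL and the parameter name `team` are hypothetical placeholders, since the actual Statsguru URL is not reproduced here. Substitute the real values before use.

```python
from urllib.parse import urlencode

# Team codes as listed above.
TEAM_CODES = {"India": 6, "Pakistan": 7, "Sri Lanka": 8}

# Hypothetical placeholder: NOT the real Statsguru address.
BASE_URL = "https://example.com/statsguru"


def build_team_url(team_name, base_url=BASE_URL, param_name="team"):
    """Build a query URL for one team.

    param_name is an assumed query parameter name, used for illustration only.
    """
    code = TEAM_CODES[team_name]
    return f"{base_url}?{urlencode({param_name: code})}"


for name in TEAM_CODES:
    print(name, build_team_url(name))
```

Keeping the codes in one dictionary means more teams can be added without touching the URL-building logic.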
However, ESPN has unleashed its API (including both free and premium tiers) for developers at http://developer.espn.com/docs,
and especially for these sports: http://developer.espn.com/docs/headlines#parameters
|/sports||News across all sports/sections|
|/sports/baseball/mlb||Major League Baseball (MLB)|
|/sports/basketball/mens-college-basketball||NCAA Men’s College Basketball|
|/sports/basketball/nba||National Basketball Association (NBA)|
|/sports/basketball/wnba||Women’s National Basketball Association (WNBA)|
|/sports/basketball/womens-college-basketball||NCAA Women’s College Basketball|
|/sports/football/college-football||NCAA College Football|
|/sports/football/nfl||National Football League (NFL)|
|/sports/hockey/nhl||National Hockey League (NHL)|
|/sports/mma||Mixed Martial Arts|
|/sports/soccer||Professional soccer (US focus)|
I wonder when this can be enabled for cricket as well (including free, academic, premium and partner APIs).
(Note that you can use the R packages XML, RCurl and rjson, among others, to get data from the web.)
Plotting is best done using ggplot2 (http://had.co.nz/ggplot2/) or d3.js (http://mbostock.github.com/d3/), and the current state of cricket graphics could surely use a change: they are mostly a single radial plot of shots played/runs scored or a combined barplot/line graph.
Revolution Analytics just launched a roadmap detailing their product plan for 2011.
In particular I am excited about the new GUI coming up, the Hadoop packages, the new K-means and data sort/merge using RevoScaleR for bigger datasets, and the option to offer support for community packages like ggplot2, titled “More value in Community Version”.