Top ten business analytics graphs: Bar Charts (3/10)

Bar Charts and Histograms: Bar charts are one of the most widely used types of business charts. The ever-popular histogram is closely related. It looks like a bar chart, but its bars show the frequencies of a numeric variable grouped into intervals rather than quantities for separate categories.

Basically, a bar chart shows rectangular bars with lengths proportional to the quantities being described. It helps to compare relative quantities across categories.

The barplot() command is used for making bar plots, while hist() is used for histograms. You can also use the plot() command with type="h" to draw histogram-like vertical lines. The official R manual also suggests that dot plots using dotchart() are a reasonable substitute for bar plots.
A very simple, easy-to-understand tutorial for basic bar plots is at http://msenux.redwoods.edu/math/R/barplot.php
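
As a minimal sketch of a basic bar plot, the snippet below draws one bar per category from a named vector. The sales figures and the names sales and mids are made up for illustration and are not from any real data set.

```r
# Hypothetical quarterly sales figures (illustrative values only)
sales <- c(Q1 = 120, Q2 = 150, Q3 = 90, Q4 = 180)

# barplot() draws one bar per element; bar labels come from the vector's names.
# The return value holds the x positions of the bar midpoints.
mids <- barplot(sales, main = "Sales by quarter", ylab = "Units sold")
```

The bar lengths are proportional to the values, so Q4 is drawn twice as tall as Q3.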

The differences between the main functions that can be used for these charts are shown below, using the built-in VADeaths data set:

> VADeaths
      Rural Male Rural Female Urban Male Urban Female
50-54       11.7          8.7       15.4          8.4
55-59       18.1         11.7       24.3         13.6
60-64       26.9         20.3       37.0         19.3
65-69       41.0         30.9       54.6         35.1
70-74       66.0         54.3       71.1         50.0

> plot(VADeaths, type="h")


> dotchart(VADeaths)

> barplot(VADeaths)

> hist(VADeaths)

From the R documentation for plot:
plot(x, y, ...)

Arguments

x the coordinates of points in the plot. Alternatively, a single plotting structure, function or any R object with a plot method can be provided.
y the y coordinates of points in the plot, optional if x is an appropriate structure.
... Arguments to be passed to methods, such as graphical parameters (see par). Many methods will accept the following arguments:
type
what type of plot should be drawn. Possible types are

  • "p" for points,
  • "l" for lines,
  • "b" for both,
  • "c" for the lines part alone of "b",
  • "o" for both 'overplotted',
  • "h" for 'histogram' like (or 'high-density') vertical lines,
  • "s" for stair steps,
  • "S" for other steps, see 'Details' below,
  • "n" for no plotting.
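
To see what type = "h" does in practice, here is a small sketch that counts hypothetical observations with table() and draws one vertical line per distinct value. The data and the names rolls, tab, values and counts are illustrative assumptions.

```r
# Hypothetical raw observations (illustrative values only)
rolls <- c(1, 2, 2, 3, 3, 3, 4, 4, 5)

# Count how often each distinct value occurs
tab <- table(rolls)
values <- as.integer(names(tab))
counts <- as.vector(tab)

# type = "h" draws one vertical line per (x, y) pair, from the axis up to y
plot(values, counts, type = "h", lwd = 4, ylim = c(0, max(counts)),
     xlab = "Value", ylab = "Frequency")
```

Unlike hist(), this does no binning: the heights are whatever y values you pass in.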

From http://stat.ethz.ch/R-manual/R-patched/library/graphics/html/hist.html
hist(x, ...)

## Default S3 method:
hist(x, breaks = "Sturges",
freq = NULL, probability = !freq,
include.lowest = TRUE, right = TRUE,
density = NULL, angle = 45, col = NULL, border = NULL,
main = paste("Histogram of" , xname),
xlim = range(breaks), ylim = NULL,
xlab = xname, ylab,
axes = TRUE, plot = TRUE, labels = FALSE,
nclass = NULL, warn.unused = TRUE, ...)

Details

The definition of histogram differs by source (with country-specific biases). R’s default with equi-spaced breaks (also the default) is to plot the counts in the cells defined by breaks. Thus the height of a rectangle is proportional to the number of points falling into the cell, as is the area provided the breaks are equally-spaced.
The default with non-equi-spaced breaks is to give a plot of area one, in which the area of the rectangles is the fraction of the data points falling in the cells.
If right = TRUE (default), the histogram cells are intervals of the form (a, b], i.e., they include their right-hand endpoint, but not their left one, with the exception of the first cell when include.lowest is TRUE.
For right = FALSE, the intervals are of the form [a, b), and include.lowest means ‘include highest’.
A numerical tolerance of 1e-7 times the median bin size is applied when counting entries on the edges of bins. This is not included in the reported breaks nor (as from R 2.11.0) in the calculation of density.
The default for breaks is "Sturges": see nclass.Sturges. Other names for which algorithms are supplied are "Scott" and "FD" / "Freedman-Diaconis" (with corresponding functions nclass.scott and nclass.FD). Case is ignored and partial matching is used. Alternatively, a function can be supplied which will compute the intended number of breaks as a function of x.
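
The effect of right on which cell a boundary value falls into can be checked without drawing anything, by passing plot = FALSE and inspecting the counts. This is a small sketch with made-up data; the names x, h_right and h_left are illustrative.

```r
# Hypothetical data with values sitting exactly on the break at 3
x <- c(1, 2, 2, 3, 4, 4, 4, 5)

# right = TRUE (default): cells are [1, 3] and (3, 5], so 3 goes into the first cell
h_right <- hist(x, breaks = c(1, 3, 5), plot = FALSE)

# right = FALSE: cells are [1, 3) and [3, 5], so 3 goes into the second cell
h_left <- hist(x, breaks = c(1, 3, 5), right = FALSE, plot = FALSE)

h_right$counts  # 4 4
h_left$counts   # 3 5
```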

Arguments

x a vector of values for which the histogram is desired.
breaks one of:

  • a vector giving the breakpoints between histogram cells,
  • a single number giving the number of cells for the histogram,
  • a character string naming an algorithm to compute the number of cells (see ‘Details’),
  • a function to compute the number of cells.

In the last three cases the number is a suggestion only.

freq logical; if TRUE, the histogram graphic is a representation of frequencies, the counts component of the result; if FALSE, probability densities, component density, are plotted (so that the histogram has a total area of one). Defaults to TRUE if and only if breaks are equidistant (and probability is not specified).
probability an alias for !freq, for S compatibility.
include.lowest logical; if TRUE, an x[i] equal to the breaks value will be included in the first (or last, for right = FALSE) bar. This will be ignored (with a warning) unless breaks is a vector.
right logical; if TRUE, the histogram cells are right-closed (left open) intervals.
density the density of shading lines, in lines per inch. The default value of NULL means that no shading lines are drawn. Non-positive values of density also inhibit the drawing of shading lines.
angle the slope of shading lines, given as an angle in degrees (counter-clockwise).
col a colour to be used to fill the bars. The default of NULL yields unfilled bars.
border the color of the border around the bars. The default is to use the standard foreground color.
main, xlab, ylab these arguments to title have useful defaults here.
xlim, ylim the range of x and y values with sensible defaults. Note that xlim is not used to define the histogram (breaks), but only for plotting (when plot = TRUE).
axes logical. If TRUE (default), axes are drawn if the plot is drawn.
plot logical. If TRUE (default), a histogram is plotted, otherwise a list of breaks and counts is returned. In the latter case, a warning is used if (typically graphical) arguments are specified that only apply to the plot = TRUE case.
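
As a rough illustration of the freq argument, the sketch below plots simulated data as a density histogram and checks that the bars have a total area of one. The simulated sample and the names x, h and area are assumptions made for this example.

```r
set.seed(42)
x <- rnorm(200)  # simulated data, for illustration only

# freq = FALSE plots densities instead of counts;
# breaks = 10 is only a suggestion, so the actual number of cells may differ
h <- hist(x, breaks = 10, freq = FALSE, col = "grey",
          main = "Density histogram")

# Total area of the bars: density times cell width, summed over all cells
area <- sum(h$density * diff(h$breaks))
```

The returned object also carries the counts component, so sum(h$counts) gives back the number of observations.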
barplot {graphics} R Documentation

http://stat.ethz.ch/R-manual/R-patched/library/graphics/html/barplot.html

Bar Plots

Description

Creates a bar plot with vertical or horizontal bars.

Usage

barplot(height, ...)

## Default S3 method:
barplot(height, width = 1, space = NULL,
names.arg = NULL, legend.text = NULL, beside = FALSE,
horiz = FALSE, density = NULL, angle = 45,
col = NULL, border = par("fg"),
main = NULL, sub = NULL, xlab = NULL, ylab = NULL,
xlim = NULL, ylim = NULL, xpd = TRUE, log = "",
axes = TRUE, axisnames = TRUE,
cex.axis = par("cex.axis"), cex.names = par("cex.axis"),
inside = TRUE, plot = TRUE, axis.lty = 0, offset = 0,
add = FALSE, args.legend = NULL, ...)

Arguments

height either a vector or matrix of values describing the bars which make up the plot. If height is a vector, the plot consists of a sequence of rectangular bars with heights given by the values in the vector. If height is a matrix and beside is FALSE then each bar of the plot corresponds to a column of height, with the values in the column giving the heights of stacked sub-bars making up the bar. If height is a matrix and beside is TRUE, then the values in each column are juxtaposed rather than stacked.
width optional vector of bar widths. Re-cycled to length the number of bars drawn. Specifying a single value will have no visible effect unless xlim is specified.
space the amount of space (as a fraction of the average bar width) left before each bar. May be given as a single number or one number per bar. If height is a matrix and beside is TRUE, space may be specified by two numbers, where the first is the space between bars in the same group, and the second the space between the groups. If not given explicitly, it defaults to c(0,1) if height is a matrix and beside is TRUE, and to 0.2 otherwise.
names.arg a vector of names to be plotted below each bar or group of bars. If this argument is omitted, then the names are taken from the names attribute of height if this is a vector, or the column names if it is a matrix.
legend.text a vector of text used to construct a legend for the plot, or a logical indicating whether a legend should be included. This is only useful when height is a matrix. In that case given legend labels should correspond to the rows of height; if legend.text is true, the row names of height will be used as labels if they are non-null.
beside a logical value. If FALSE, the columns of height are portrayed as stacked bars, and if TRUE the columns are portrayed as juxtaposed bars.
horiz a logical value. If FALSE, the bars are drawn vertically with the first bar to the left. If TRUE, the bars are drawn horizontally with the first at the bottom.
density a vector giving the density of shading lines, in lines per inch, for the bars or bar components. The default value of NULL means that no shading lines are drawn. Non-positive values of density also inhibit the drawing of shading lines.
angle the slope of shading lines, given as an angle in degrees (counter-clockwise), for the bars or bar components.
col a vector of colors for the bars or bar components. By default, grey is used if height is a vector, and a gamma-corrected grey palette if height is a matrix.
border the color to be used for the border of the bars. Use border = NA to omit borders. If there are shading lines, border = TRUE means use the same colour for the border as for the shading lines.
main, sub overall and sub title for the plot.
xlab a label for the x axis.
ylab a label for the y axis.
xlim limits for the x axis.
ylim limits for the y axis.
xpd logical. Should bars be allowed to go outside region?
log string specifying if axis scales should be logarithmic; see plot.default.
axes logical. If TRUE, a vertical (or horizontal, if horiz is true) axis is drawn.
axisnames logical. If TRUE, and if there are names.arg (see above), the other axis is drawn (with lty=0) and labeled.
cex.axis expansion factor for numeric axis labels.
cex.names expansion factor for axis names (bar labels).
inside logical. If TRUE, the lines which divide adjacent (non-stacked!) bars will be drawn. Only applies when space = 0 (which it partly is when beside = TRUE).
plot logical. If FALSE, nothing is plotted.
axis.lty the graphics parameter lty applied to the axis and tick marks of the categorical (default horizontal) axis. Note that by default the axis is suppressed.
offset a vector indicating how much the bars should be shifted relative to the x axis.
add logical specifying if bars should be added to an already existing plot; defaults to FALSE.
args.legend list of additional arguments to pass to legend(); names of the list are used as argument names. Only used if legend.text is supplied.
... arguments to be passed to/from other methods. For the default method these can include further arguments (such as axes, asp and main) and graphical parameters (see par) which are passed to plot.window(), title() and axis.
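
To make the beside and legend.text arguments concrete, here is a short sketch that draws the VADeaths matrix first as stacked bars and then as grouped bars. The titles and the names mp_stacked and mp_beside are illustrative choices.

```r
# Stacked: one bar per column of VADeaths, with age groups stacked inside each bar
mp_stacked <- barplot(VADeaths, legend.text = TRUE,
                      main = "Death rates in Virginia (stacked)")

# Grouped: the values in each column are drawn side by side instead
mp_beside <- barplot(VADeaths, beside = TRUE, legend.text = TRUE,
                     args.legend = list(x = "topleft"),
                     main = "Death rates in Virginia (grouped)")
```

With legend.text = TRUE the legend labels are taken from the row names of the matrix, here the five age groups.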

Details

This is a generic function, it currently only has a default method. A formula interface may be added eventually.

Value

A numeric vector (or matrix, when beside = TRUE), say mp, giving the coordinates of all the bar midpoints drawn, useful for adding to the graph.
If beside is true, use colMeans(mp) for the midpoints of each group of bars.
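
One way to use the returned midpoints is to write each value above its bar. The sketch below does this for the grouped VADeaths plot; the ylim, the offset of 2 and the text size are arbitrary choices for illustration.

```r
# Grouped bar plot; mp holds the x coordinate of every bar's midpoint
mp <- barplot(VADeaths, beside = TRUE, ylim = c(0, 80))

# Write each value just above its bar, using the midpoints as x positions
text(mp, VADeaths + 2, labels = VADeaths, cex = 0.6)

# One midpoint per group of bars (one per column of VADeaths)
group_mid <- colMeans(mp)
```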

Author: Ajay Ohri

http://about.me/ajayohri
