Top Funny Charts

I have recently become a Quora addict, and it is easy to see why it is such a great site. If you can, say hello to me there at

http://www.quora.com/Ajay-Ohri

My latest favorite question-

What are the most hilarious pie charts?

https://www.quora.com/Pie-Charts/What-are-the-most-hilarious-pie-charts

I am only showing you some of the answers; you can see the rest yourself.

 

 

Worst Chart Ever- Confusing PIE chart as English Test

The IELTS is used to test whether non-native speakers understand English properly.

Imagine many Indian and Chinese smart engineers answering this question.

From-

http://www.ielts-blog.com/recent-ielts-exams/ielts-test-in-the-uk-spain-march-2010-academic-module/

Writing test

Writing Task 1 (a report)
Three pie charts about young Australians secondary school leavers in years 1980, 1990 and 2000. Each pie showed the proportion of school leavers that continued studying, were employed or unemployed. Write a report to a university lecturer describing the pie charts below.
IELTS Academic Writing Task 1 Pie Charts

Top Ten Graphs for Business Analytics - Pie Charts (1/10)

I have not really been posting or writing anything worthwhile on the website for some time, as I am still busy writing "R for Business Analytics", which I hope to get out before year end. However, while doing research for that, I came across many types of graphs, and what struck me is that the actual usage of some kinds of graphs is very different in business analytics as compared to statistical computing.

The criteria for the top ten graphs are as follows-

1) Usage- The order in which they appear is not strictly in terms of desirability but of actual frequency of usage. So a frequently used graph like a box plot would be recommended above, say, a violin plot.

2) Adequacy- Data visualization paradigms change over time, but the need remains to convey the maximum information accurately in a minimum of space, without overwhelming the reader or creating misleading perceptions of the data.

3) Ease of creation- A simpler graph created by a single function is preferable to an elaborate graph that takes 4-5 lines of code.

4) Aesthetics- Aesthetics is relative, and in addition studies have shown that visual perception varies across cultures and geographies. However, beauty is universally appreciated, and a pretty graph is often preferred over a not so pretty one. Here, being pretty means visual appeal that does not compromise perceptual inference from graphical analysis.

 

So when do we use a bar chart versus a line graph versus a pie chart? When is a mosaic plot more handy, and when should histograms be used with density plots? The list tries to capture most of these practicalities.
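As a quick illustration of the histogram-with-density question, here is a minimal sketch in base R. The data is simulated purely for illustration, and the variable name sales is my own assumption, not something from a real dataset.

```r
# Simulated data for illustration only; not from any real dataset.
set.seed(42)
sales <- rnorm(500, mean = 100, sd = 15)

# freq = FALSE puts the histogram on the density scale,
# so the density curve drawn next lines up with the bars.
hist(sales, freq = FALSE, breaks = 20,
     main = "Histogram with density overlay", xlab = "Sales (simulated)")

# Kernel density estimate drawn on top of the histogram.
dens <- density(sales)
lines(dens, lwd = 2)
```

The histogram shows where the counts fall, while the density curve smooths out the effect of the bin choice, so the two together are handy when the bin width is a judgment call.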

Let me elaborate on some specific graphs-

1) Pie Chart- The pie chart is not really used much in statistical computing, and indeed it is considered a misleading example of data visualization, especially the skewed or two dimensional charts. However, when it comes to evaluating market share at a particular instant, a pie chart is simple to understand. At most two pie charts are needed for comparing two different snapshots, but three or more pie charts on the same data at different points of time is definitely a bad case.

In R you can create a pie chart by just using pie(dataset$variable), where the variable holds non-negative numeric values. For a categorical variable, tabulate it first, as in pie(table(dataset$variable)).
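Here is a minimal sketch of that call on a categorical variable. The data frame and the brand column are hypothetical, made up only to show the mechanics.

```r
# Hypothetical data: one row per customer, 'brand' is the brand they bought.
dataset <- data.frame(
  brand = c("A", "A", "B", "C", "A", "B", "C", "C", "C", "A")
)

# pie() needs non-negative numbers, so count the categories first.
counts <- table(dataset$brand)

# Percentage share of each brand, rounded for the labels.
share <- round(100 * prop.table(counts))

# Label each slice with its name and percentage share.
pie(counts, labels = paste0(names(counts), " (", share, "%)"),
    main = "Market share (made-up data)")
```

Printing the percentage on each slice helps a little with the main weakness of pie charts, which is that angles are hard to compare by eye.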

As per the official documentation, pie charts are not recommended at all.

http://stat.ethz.ch/R-manual/R-patched/library/graphics/html/pie.html

Pie charts are a very bad way of displaying information. The eye is good at judging linear measures and bad at judging relative areas. A bar chart or dot chart is a preferable way of displaying this type of data.

Cleveland (1985), page 264: “Data that can be shown by pie charts always can be shown by a dot chart. This means that judgements of position along a common scale can be made instead of the less accurate angle judgements.” This statement is based on the empirical investigations of Cleveland and McGill as well as investigations by perceptual psychologists.
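To see what the documentation is suggesting, here is a minimal sketch of the same kind of data drawn as a dot chart and as a bar chart in base R. The market share numbers are made up for illustration.

```r
# Made-up market shares in percent; they add up to 100.
share <- c(Alpha = 38, Beta = 27, Gamma = 21, Delta = 14)

# dotchart() draws the first element at the bottom, so sorting in
# ascending order puts the largest share at the top.
share <- sort(share)

dotchart(share, xlim = c(0, max(share)), pch = 19,
         xlab = "Market share (%)", main = "Dot chart instead of a pie")

# A horizontal bar chart gives the same position-on-a-common-scale reading.
barplot(share, horiz = TRUE, las = 1, xlab = "Market share (%)",
        main = "Bar chart instead of a pie")
```

In both plots the reader compares positions along one axis, which is the judgement Cleveland describes as more accurate than comparing angles.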

—-

Despite this, pie charts are frequently used, because the one metric they do convey readily is market share, and market share remains an important analytical metric for business.

The pie3D() function in the plotrix package provides 3D exploded pie charts. An exploded pie chart remains a very commonly used (or misused) chart.

From http://lilt.ilstu.edu/jpda/charts/chart%20tips/Chartstip%202.htm#Rules

we see some rules for using Pie charts.

 

  1. Avoid using pie charts.
  2. Use pie charts only for data that add up to some meaningful total.
  3. Never ever use three-dimensional pie charts; they are even worse than two-dimensional pies.
  4. Avoid forcing comparisons across more than one pie chart

 

From the R Graph Gallery (a slightly outdated but still very comprehensive graphical repository)

http://addictedtor.free.fr/graphiques/RGraphGallery.php?graph=4

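# Gray background for the whole plotting device
par(bg="gray")
# 24 equal slices, one rainbow color each, make a color wheel
pie(rep(1,24), col=rainbow(24), radius=0.9)
# Main title in italics (font.main=3), then a small italic x-axis label
title(main="Color Wheel", cex.main=1.4, font.main=3)
title(xlab="(test)", cex.lab=0.8, font.lab=3)
(Note that adding a grey background is quite easy in the base graphics device as well, without using an advanced graphics package.)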

 

HIGHLIGHTS from REXER Survey: R gives best satisfaction

Simple graph showing hierarchical clustering. ...
Image via Wikipedia

A Summary report from Rexer Analytics Annual Survey

 

HIGHLIGHTS from the 4th Annual Data Miner Survey (2010):

 

•   FIELDS & GOALS: Data miners work in a diverse set of fields.  CRM / Marketing has been the #1 field in each of the past four years.  Fittingly, “improving the understanding of customers”, “retaining customers” and other CRM goals are also the goals identified by the most data miners surveyed.

 

•   ALGORITHMS: Decision trees, regression, and cluster analysis continue to form a triad of core algorithms for most data miners.  However, a wide variety of algorithms are being used.  This year, for the first time, the survey asked about Ensemble Models, and 22% of data miners report using them.
A third of data miners currently use text mining and another third plan to in the future.

 

•   MODELS: About one-third of data miners typically build final models with 10 or fewer variables, while about 28% generally construct models with more than 45 variables.

 

•   TOOLS: After a steady rise across the past few years, the open source data mining software R overtook other tools to become the tool used by more data miners (43%) than any other.  STATISTICA, which has also been climbing in the rankings, is selected as the primary data mining tool by the most data miners (18%).  Data miners report using an average of 4.6 software tools overall.  STATISTICA, IBM SPSS Modeler, and R received the strongest satisfaction ratings in both 2010 and 2009.

 

•   TECHNOLOGY: Data Mining most often occurs on a desktop or laptop computer, and frequently the data is stored locally.  Model scoring typically happens using the same software used to develop models.  STATISTICA users are more likely than other tool users to deploy models using PMML.

 

•   CHALLENGES: As in previous years, dirty data, explaining data mining to others, and difficult access to data are the top challenges data miners face.  This year data miners also shared best practices for overcoming these challenges.  The best practices are available online.

 

•   FUTURE: Data miners are optimistic about continued growth in the number of projects they will be conducting, and growth in data mining adoption is the number one “future trend” identified.  There is room to improve:  only 13% of data miners rate their company’s analytic capabilities as “excellent” and only 8% rate their data quality as “very strong”.

 

Please contact us if you have any questions about the attached report or this annual research program.  The 5th Annual Data Miner Survey will be launching next month.  We will email you an invitation to participate.

 

Information about Rexer Analytics is available at www.RexerAnalytics.com. Rexer Analytics continues its impressive journey; see http://www.rexeranalytics.com/Clients.html

My only thought- since most data miners are using multiple tools, including free tools as well as paid software, perhaps a pie chart of market share by revenue and volume would be handy.

Some ideas on comparing diverse data mining projects by data size or complexity would also be welcome.

 

Common Analytical Tasks

WorldWarII-DeathsByCountry-Barchart
Image via Wikipedia

 

Some common analytical tasks from the diary of the glamorous life of a business analyst-

1) removing duplicates from a dataset based on certain key values/variables
2) merging two datasets based on a common key/variable/s
3) creating a subset based on a conditional value of a variable
4) creating a subset based on a conditional value of a time-date variable
5) changing format from one date time variable to another
6) computing means grouped or classified at a level of aggregation
7) creating a new variable based on if then condition
8) creating a macro to run same program with different parameters
9) creating a logistic regression model and scoring a dataset
10) transforming variables
11) checking roc curves of model
12) splitting a dataset for a random sample (repeatable with random seed)
13) creating a cross tab of all variables in a dataset with one response variable
14) creating bins or ranks from a certain variable value
15) graphically examining cross tabs
16) creating histograms
17) creating density plots with plot(density())
18) creating a pie chart
19) creating a line graph, creating a bar graph
20) creating a bubble chart
21) running a goal seek kind of simulation/optimization
22) creating a tabular report for multiple metrics grouped for one time/variable
23) creating a basic time series forecast
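
A few of the tasks above can be sketched in base R as follows. This is only a rough illustration: the customers and orders data frames and all their column names are hypothetical, and each task has several other reasonable ways of being done.

```r
# Hypothetical toy data; column names are made up for illustration.
customers <- data.frame(
  id     = c(1, 2, 2, 3, 4, 5),
  region = c("N", "S", "S", "N", "E", "S"),
  spend  = c(120, 80, 80, 200, 50, 150)
)
orders <- data.frame(id = c(1, 2, 3, 5), n_orders = c(3, 1, 7, 2))

# 1) Remove duplicates based on a key variable.
customers <- customers[!duplicated(customers$id), ]

# 2) Merge two datasets on a common key
#    (all.x = TRUE keeps customers that have no matching order row).
merged <- merge(customers, orders, by = "id", all.x = TRUE)

# 3) Create a subset based on a conditional value of a variable.
big <- subset(merged, spend > 100)

# 5) Change a date from one format to another.
d <- as.Date("25/12/2010", format = "%d/%m/%Y")
d_iso <- format(d, "%Y-%m-%d")

# 6) Means grouped at a level of aggregation.
by_region <- aggregate(spend ~ region, data = merged, FUN = mean)

# 7) Create a new variable based on an if-then condition.
merged$segment <- ifelse(merged$spend > 100, "high", "low")

# 14) Create bins from a variable's values.
merged$spend_bin <- cut(merged$spend, breaks = c(0, 100, 200),
                        labels = c("up to 100", "over 100"))
```

Each step is a single function call, which is the "ease of creation" point in practice: most of the everyday list is one-liners once the data is in a data frame.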

and some case studies I could think of-

 

As the Director, Analytics you have to examine current marketing efficiency as well as help optimize sales force efficiency across various channels. In addition, you have to examine multiple sales channels, including inbound telephone, outgoing direct mail and internet email campaigns. The data warehouse is an RDBMS, but it has multiple data quality issues that need to be checked. You also need to submit your estimates for next year’s annual marketing budget so as to maximize sales return on investment.

As the Director, Risk you have to examine the overdue mortgages book that your predecessor left you. You need to optimize collections and minimize fraud and write-offs, and your efforts would be measured in maximizing profits from your department.

As a social media consultant you have been asked to maximize social media analytics and social media exposure for your client. You need to create a mechanism to report on particular brand keywords, automated triggers for unusual web activity, and statistical analysis of the website analytics metrics. Above all, it needs to be set up in an automated reporting dashboard.

As a consultant to a telecommunication company you are asked to monitor churn and review the existing churn models. You also need to optimize advertising spend on various channels. The problem is that there are a large number of promotions always going on, some of the data is incorrectly coded, and there are interaction effects between the various promotions.

As a modeller you need to do the following-
1) Check ROC and H-L curves for the existing model
2) Divide the dataset into random splits of 40:60
3) Create multiple aggregated variables from the basic variables
4) Run the regression again and again
5) Evaluate the statistical robustness and fit of the model
6) Display results graphically
All these steps can be broken down into small pieces of code, which I am putting together a list of.
Are there any common data analysis tasks or case studies that you think I am missing? Let me know.
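
Here is a rough sketch of some of the modeller steps in base R: a repeatable 40:60 random split, a logistic regression, scoring, and an ROC curve. Everything in it is an assumption for illustration. The data is simulated, the names churn, usage and tenure are made up, and fitting on the 60% part while scoring the 40% part is my own choice. The H-L check is left out because it needs more than a few lines.

```r
set.seed(2010)  # fixed seed so the random split is repeatable

# Simulated data: binary response 'churn' driven by two made-up predictors.
n <- 1000
dat <- data.frame(usage = rnorm(n), tenure = rnorm(n))
dat$churn <- rbinom(n, 1, plogis(-0.5 + 1.2 * dat$usage - 0.8 * dat$tenure))

# 40:60 random split.
idx <- sample(n, size = round(0.4 * n))
part_a <- dat[idx, ]   # 40% of the rows
part_b <- dat[-idx, ]  # remaining 60%

# Fit a logistic regression on the 60% part, then score the 40% part.
fit <- glm(churn ~ usage + tenure, data = part_b, family = binomial)
score <- predict(fit, newdata = part_a, type = "response")

# ROC curve by hand: sort by score and accumulate hits and false alarms.
ord <- order(score, decreasing = TRUE)
tpr <- cumsum(part_a$churn[ord] == 1) / sum(part_a$churn == 1)
fpr <- cumsum(part_a$churn[ord] == 0) / sum(part_a$churn == 0)
plot(c(0, fpr), c(0, tpr), type = "l", xlab = "False positive rate",
     ylab = "True positive rate", main = "ROC curve (simulated data)")
abline(0, 1, lty = 2)  # diagonal = a model no better than chance

# AUC: probability that a random positive outscores a random negative
# (ties count as one half).
pos <- score[part_a$churn == 1]
neg <- score[part_a$churn == 0]
auc <- mean(outer(pos, neg, ">") + 0.5 * outer(pos, neg, "=="))
```

Changing the seed and rerunning gives a quick feel for how much the fit and the AUC move around between splits, which is a cheap first look at the robustness step.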

 

 

 

An Introduction to Data Mining-online book

I was reading David Smith’s blog http://blog.revolutionanalytics.com/

where he mentioned this interview of Norman Nie, at TDWI

http://tdwi.org/Articles/2010/11/17/R-101.aspx?Page=2

where I saw this link (it's great if you want to study Data Mining, btw)

http://www.kdnuggets.com/education/usa-canada.html

and I c/liked the U Toronto link

http://chem-eng.utoronto.ca/~datamining/

Best of all- I really liked this online book created by Professor S. Sayad

It's succinct and beautiful, and describes all of the Data Mining you want to read in one Map (actually 4 images painstakingly assembled with perfection)

The best thing is that in the original map even the sub-items are clickable for specifics. For example, Pie Chart and Stacked Column Chart are not in one simple drop-down like Charts, but are organized by the kind of variables that lead to these charts. To see that, you would need to go to the site itself; see http://chem-eng.utoronto.ca/~datamining/dmc/categorical_variables.htm

vs

http://chem-eng.utoronto.ca/~datamining/dmc/categorical_numerical.htm

Again- there is no mention of the data visualization software used to create the images, but I think I can take a hint from the Software page, which lists the software used-

Software

See it for yourself- online book (c) Professor S. Sayad

Really good DIY tutorial

http://chem-eng.utoronto.ca/~datamining/dmc/data_mining_map.htm

Business Intelligence and Stat Computing: The White Man's Last Stand

Unknown White Male
Image via Wikipedia

Name an industry in which top level executives are mostly white males, new recruits are mostly male (white or Indian/Chinese), and women are primarily shunted into public relations, social media or marketing.

Statistical Computing and Business Intelligence are the white man’s last stand to preserve an exclusive club with a hail-fellow-well-met, let’s-catch-up-after-drinks culture. Newer startups are the exception in the business intelligence world, but a whiter face helps (so does being an Indian or Chinese male) to attract a mostly male, mostly white venture capital industry.

I have earlier talked about technology being totally dominated by Asian males at the grad student level, and about ASA membership barely representing minorities like blacks and, yes, women. This post, however, is about corporate culture in the traditional BI world.

If you are connected to the BI or Stat Computing world, who would you rather hire, and who have you actually hired, given identical resumes?

White Male or White Female or Brown Indian Male/Female or Yellow Male/Female or Black Male or Black Female

How many Black Grad Assistants do you see in tech corridors? (Nah- it is easier to get a hard working Chinese/Indian who smiles and does a great job at $12/hour.)

How many non-Asian, non-white authors do you see in technology, and how does that compare to the pie chart below?


racist image Pictures, Images and Photos

Note: 2010 Census numbers aren't available for STEM, and I was unable to find the ethnic background of employees at various technology companies because, though these numbers are collected for legal purposes, they are not publicly shared.

Any technology company which has more than 40% women, or more than 10% blacks, would be fairly representative of the US population. Anecdotal evidence suggests European employment for minorities is worse (especially for Asians) but better for women.

Any data sources to support or refute these hypotheses are welcome for purposes of scientific inquiry.

