Broad Guidelines for Graphs

Here are some broad guidelines for graphs from EIA.gov, so you can say these are official graphical guidelines of the US Government.

They can be really useful for sites planning to get into the Tableau Software / NYT / Guardian infographic mode, or even for communities of blogs that regularly need to display graphical plots, particularly since communication, statistics and design are different areas of expertise, usually handled by different people.

Energy Information Administration Standard

I am reproducing an example from EIA's guidelines for graphs:
http://www.eia.gov/about/eia_standards.cfm#Standard25

Energy Information Administration Standard 2009-25

Title: Statistical Graphs
Superseded Version: Standard 2002-25
Purpose: To ensure the utility (usefulness to intended users) and objectivity (accuracy, clarity, completeness, and lack of bias) of energy information presented in statistical graphs.
Applicability: All EIA information products.
Required Actions:

  1. Graphs should be used to show and compare changes, trends and/or relationships, and to assist users in visualizing the conclusions drawn from the data represented.
  2. A graph should contain sufficient …

The Comic Water Games (aka Common Wealth Games)

We in Delhi, India are a tough people. With summer temperatures of 46 degrees Celsius (about 115 degrees Fahrenheit), winter temperatures of 2-3 degrees Celsius (just above freezing), high pollution levels and the worst traffic jams (and the highest number of cars per capita), there is very little that intimidates the average Delhiite.

But the return of the British Empire is scaring us, and it is called the Commonwealth Games. The Commonwealth is a group of countries that used to be colonized by Britain in her colonial days (the USA is not a member though, as they probably kicked way too much British butt while gaining independence).

And every 4 years they hold the Commonwealth Games (read: games for the non-US English-speaking world). So when our commie neighbours, the Chinese, went and got themselves an Olympics, we decided to get ourselves these Games too. Big deal: national pride, rising economic power and all that.

So far the Games have meant the following: lots of roads dug up, lots of stadiums in various degrees of preparation, a total cost of 2 billion USD, and rampant allegations of corruption due to the tenfold increase in budget, including rather suspicious-looking documents procured by our local press (yes, the Indian press is free, as India is a democracy).

And add divine grace. Delhi has had its wettest monsoon since 1978, it is raining cats and dogs in September, and we now have a mini dengue and malaria epidemic. Four countries have declared the living quarters for athletes uninhabitable, some have walked out, and the inevitable terrorists injured two Taiwanese tourists this weekend (in a semi-ironic email they said they were prepared as the government was prepared; it isn't).

Today a bridge collapsed-

http://www.nytimes.com/2010/09/22/sports/22iht-GAMES.html?_r=1&hp

On Tuesday afternoon, a bridge next to Jawaharlal Nehru Stadium, the main Games venue, fell apart. The footbridge collapsed into three pieces, taking several workers with it and uprooting one side of the arch that supported it.

A police officer at the scene said that 27 people had been injured, four of them seriously, in the collapse.

“This will not affect the Games,” said Raj Kumar Chauhan, a Delhi minister for development, who spoke on the scene. “We can put the bridge up again, or make a new one.”

and

http://www.nytimes.com/2010/09/20/world/asia/20india.html?ref=sports

“We really need to learn how to plan,” said Vrinda Walavalkar, a public relations executive who is not connected to the Games.

“Maybe we feel we have so many lifetimes to achieve things” that it does not matter if it gets done this time, she said.

Mr. Gupta, the shopkeeper, found a metaphor in Hindu wedding tradition.

The groom’s party, known as the barat, traditionally marches to the bride’s house on horseback with his friends and family, he explained. When the barat appears, the bride has to come to the door, he said.

“If the bride is not ready, you patch her up and try to hide all her defects,” Mr. Gupta said, and then you send her outside.

————————————————————————————————————–

To some this may be shocking. To the average Delhiite battling traffic and rain, this is one more episode in the chaotic capital. As a small solace, Delhi still has the best and cheapest street food in this part of the world, with golgappas, tikki and chaat. If only you can beat the rain to get them!

Also see http://en.wikipedia.org/wiki/Delhi if you would like to know more.

GNU PSPP- The Open Source SPSS

If you are an SPSS user (for statistics, not data mining) you can also try out GNU PSPP, which is the open source equivalent and quite eerily impressive in performance. It is available at http://www.gnu.org/software/pspp/ or http://pspp.awardspace.com/ and you can also read more at http://en.wikipedia.org/wiki/PSPP

PSPP is a program for statistical analysis of sampled data. It is a Free replacement for the proprietary program SPSS, and appears very similar to it with a few exceptions.

The most important of these exceptions are, that there are no “time bombs”; your copy of PSPP will not “expire” or deliberately stop working in the future. Neither are there any artificial limits on the number of cases or variables which you can use. There are no additional packages to purchase in order to get “advanced” functions; all functionality that PSPP currently supports is in the core package.

PSPP can perform descriptive statistics, T-tests, linear regression and non-parametric tests. Its backend is designed to perform its analyses as fast as possible, regardless of the size of the input data. You can use PSPP with its graphical interface or the more traditional syntax commands.

A brief list of some of the features of PSPP follows:

  • Supports over 1 billion cases.
  • Supports over 1 billion variables.
  • Syntax and data files are compatible with SPSS.
  • Choice of terminal or graphical user interface.
  • Choice of text, postscript or html output formats.
  • Inter-operates with Gnumeric, OpenOffice.Org and other free software.
  • Easy data import from spreadsheets, text files and database sources.
  • Fast statistical procedures, even on very large data sets.
  • No license fees.
  • No expiration period.
  • No unethical “end user license agreements”.
  • Fully indexed user manual.
  • Free Software; licensed under GPLv3 or later.
  • Cross platform; Runs on many different computers and many different operating systems.

PSPP is particularly aimed at statisticians, social scientists and students requiring fast convenient analysis of sampled data.

and

Features

This software provides a basic set of capabilities: frequencies, cross-tabs, comparison of means (T-tests and one-way ANOVA); linear regression, reliability (Cronbach’s Alpha, not failure or Weibull), and re-ordering data, non-parametric tests, factor analysis and more.

At the user’s choice, statistical output and graphics are done in ascii, pdf, postscript or html formats. A limited range of statistical graphs can be produced, such as histograms, pie-charts and np-charts.

PSPP can import Gnumeric, OpenDocument and Excel spreadsheets, Postgres databases, comma-separated values and ASCII files. It can export files in the SPSS ‘portable’ and ‘system’ file formats and to ASCII files. Some of the libraries used by PSPP can be accessed programmatically; PSPP-Perl provides an interface to the libraries used by PSPP.

Origins

The PSPP project (originally called “Fiasco”) is a free, open-source alternative to the proprietary statistics package SPSS. SPSS is closed-source and includes a restrictive licence and digital rights management. The author of PSPP considered this ethically unacceptable, and decided to write a program which might with time become functionally identical to SPSS, except that there would be no licence expiry, and everyone would be permitted to copy, modify and share the program.

Release history

  • 0.7.5 June 2010 http://pspp.awardspace.com/
  • 0.6.2 October 2009
  • 0.6.1 October 2008
  • 0.6.0 June 2008
  • 0.4.0.1 August 2007
  • 0.4.0 August 2005
  • 0.3.0 April 2004
  • 0.2.4 January 2000
  • 0.1.0 August 1998

Third Party Reviews

In the book “SPSS For Dummies”, the author discusses PSPP under the heading of “Ten Useful Things You Can Find on the Internet” [1]. In 2006, the South African Statistical Association presented a conference which included an analysis of how PSPP can be used as a free replacement for SPSS [2].

Citation-

Please send FSF & GNU inquiries to gnu@gnu.org. There are also other ways to contact the FSF. Please send broken links and other corrections (or suggestions) to bug-gnu-pspp@gnu.org.

Copyright © 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007 Free Software Foundation, Inc., 51 Franklin St – Suite 330, Boston, MA 02110, USA – Verbatim copying and distribution of this entire article are permitted worldwide, without royalty, in any medium, provided this notice, and the copyright notice, are preserved.

Fighting Back -The Net, Social Media, Spam, Identity Theft, Terrorism

Recently some influential bloggers got nailed by the TSA for leaking airline security directives that were issued after the failed Christmas Day attack. While the First Amendment is a much admired piece of legislation, a blogger’s right to blog cannot be greater than his desire to see his fellow citizens safe.

As someone who is brown, male and single (and thus automatically a TSA curiosity), I travel to places like New York, San Francisco, Austin, Atlanta, Ohio and Las Vegas for both personal and professional work, so some of the following may be purely personal experiences.

1) Airport checks: susceptibility to social engineering. One of the biggest drawbacks that airlines have had in the past is that staff would rarely glance at a photo ID if it was an American driver's license, but would do a proper job if it was a foreign passport. Unfortunately the second generation of Arab/Asian exiles that are prone to internet-based clerics have American-issued passports as well as licenses. In addition they go to college and play soccer with actual citizens of foreign countries who can motivate or guide them. A look at the number of Arab and Asian students in the university system who are not vetted by the TSA would reveal the magnitude of the problem.

I flew from Knoxville, Tennessee to Las Vegas some months back on a college ID card; on my way back I went through Washington, and was also hospitalized. Thus using a Vol Card, an Indian driving license and an American social security card, I managed to travel across almost the whole landscape. In addition I passed through enough transit airports to switch my destination. Sometimes I am so good I scare myself.

In order to catch a thief, the TSA needs to think like a thief rather than waste time and precious agents on just another liberal blogger. Have a contest open to all members of the public, and especially hackers, social media spammers, identity thieves- most of whom are starving people who need money AND respect. Say here is our system- and our processes. Break it to win a million dollars but share the solution with us in private.

2) Some elements of social media should be reviewed for a secure online identity. Twitter has a system for authenticating prominent people; that should be rolled out for all users of Facebook, Twitter and LinkedIn. The costs should be subsidized by the airlines given the bailouts they received in 2004, or the airlines should simply give an equity stake, as the banks and the car companies did, to ensure there is no cutting of corners to make profits.

3) Analyzing chatter. While the NSA and the TSA and CIA and the AAA etc. monitor the internet for data and specifically terror-linked chatter, these cases point to the fact that they need to adopt faster ways of crunching data (MapReduce for fighting terror is maybe not a bad idea after all). Companies like SAS, SPSS and Revolution Computing can then collaborate with the data-gathering companies by embedding analytical solutions.

What is more important? Catching people who are defaulting on their mortgages (that can wait for a quarter and you can still catch them with more penal interest)

or catching people who are defaulting on their conscience (within 2 days of writing that email, tweet or Facebook post)? Think of it as creating a big new system of online parking tickets; you can even create a lucrative online health insurance market by asking people to seek compulsory identity theft protection and insurance.
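
For readers who have not met MapReduce, the idea mentioned in point 3 can be sketched in a few lines of plain Python. This is only a toy illustration on a single machine, not how any agency or vendor does it: the watch list `KEYWORDS`, the function names and the sample messages are all made up for the example, and a real deployment would run the map and reduce steps in parallel across many machines.

```python
from collections import defaultdict
from itertools import chain

# Hypothetical watch list, invented purely for illustration.
KEYWORDS = {"delay", "refund", "outage"}


def map_phase(message):
    """Emit a (keyword, 1) pair for every watched keyword in one message."""
    words = message.lower().split()
    return [(word, 1) for word in words if word in KEYWORDS]


def shuffle(pairs):
    """Group emitted values by key, as a framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Collapse all values emitted for one key into a single count."""
    return key, sum(values)


def keyword_counts(messages):
    """Run map, shuffle and reduce over an iterable of messages."""
    mapped = chain.from_iterable(map_phase(m) for m in messages)
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())


if __name__ == "__main__":
    # Made-up sample messages.
    sample = ["refund delay", "Delay again", "nothing here"]
    print(keyword_counts(sample))
```

The point of the split is that each `map_phase` call looks at one message only, so the work can be spread over as many machines as there are messages; only the small per-keyword totals need to be brought together at the end.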

4) Spam and identity theft go hand in hand, and so far have been dismissed by financial authorities as just another operating loss that shaves a few basis points. But when terrorists who are trained to blow up people get a sweet fake identity, they can use it to cause catastrophic losses in terms of market capitalization. If the onus for fraudulent transactions is placed firmly on the financial organizations, including hefty fines, they will move much faster at eliminating these thefts.

5) Modifying customer-facing interfaces. All American financial institutions have to abide by the Fair Credit Lending Act and the Usury Act and the Patriot Act (?). Since what they report is more or less the same, the interfaces of forms can be redesigned or guidelines issued so that they are easy to read. A lot of fraud is caused by the fine print phenomenon. Fine print can be fine in quality, not just in font size. Design on the web needs to be monitored so that operations and risk forms have the same importance as marketing brochures. (A sarcastic example below on image credits using just color and font size.)

6) Kill all the terrorists.

That's how they did it in my native state of Punjab in India.

7) Point 6 may be an analytical overreaction. With the social media tools that the new Govt is rolling out, citizens can play more prominent roles in tracking suspicious activities. Use your Android or iPhone to tweet to a secure govt website about anything suspicious. The techies there would have installed MapReduce and a data mining solution to separate the signal from the noise in the chatter and get to the point of impact faster. Rather than wait for Daddy to call.

Disclaimer- The author knows no government sources and no terrorists. Some of his insights are personal given his father helped fight terrorists trained in Pakistan for 2 decades while in India. These are purely personal views only and all trademarks are acknowledged etc etc.

(and yes, United Airlines kept me for 4 hours at an airport; that has no correlation to this story)

Image credit ( or how credit card companies charge fees)-

The Great Game- How social media changes the Intelligence Industry

Since time immemorial, countries and corporations have used spies to displace existing equilibria in the balance of power or market share dynamics. An integral part of that was technology. From the pox-infested rugs given to natives, to the plague rats, to the smuggling of the secrets of silk and gunpowder from China to the West, to the latest research in cloud seeding by China and melting glaciers by India, technology espionage has been an integral part of keeping up with each other.

For the first time in history, technology has evolved to the point where tools for communicating securely and storing data have become cheap, to the point of needing just a small iPhone 3GS with applications for secure transmission. From an analytical perspective, the need for separating signal from noise and the criticality of mapping chatter to events (like Major Hasan’s online activities) have created an opportunity for social media as well as a headache for the people involved. With citizen journalism, foreign relations offices and ambassadors with their bully pulpits have been brought down to defending news leaked by Twitter (Iran), YouTube (Thailand/Burma/Tibet) and blogs (Russia/Georgia). The use of botnets and dark clouds to create disruptions, as well as to hack into accounts for enhancing favourable noise and reducing unfavourable signals, has only increased. Blogs have the potential to influence customer behavior as they are seen as more credible than public relations, which is mostly public and rarely about relations.

Techniques like sentiment analysis, social network analysis, text mining and correlation of keywords to triggers remain active research points.
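
To make two of those techniques concrete, here is a deliberately naive sketch of lexicon-based sentiment scoring and keyword-to-trigger matching in Python. Everything in it is an assumption for illustration: the word lists `POSITIVE` and `NEGATIVE`, the function names and the sample sentences are invented, and real systems rely on much larger lexicons or trained models rather than simple word counts.

```python
import re

# Tiny hypothetical lexicons; real systems use far larger, curated lists.
POSITIVE = {"good", "great", "love", "credible"}
NEGATIVE = {"bad", "awful", "hate", "spam"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def sentiment_score(text):
    """Return (positive hits - negative hits) / token count, 0.0 if empty."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    return (pos - neg) / len(tokens)


def triggered(text, triggers):
    """Return the sorted trigger keywords that appear in the text."""
    return sorted(set(tokenize(text)) & set(triggers))


if __name__ == "__main__":
    # Made-up sample inputs.
    print(sentiment_score("I love this great blog"))
    print(triggered("Outage and delay, outage again", {"outage", "delay", "refund"}))
```

A score above zero means more positive than negative words were found; it says nothing about sarcasm or context, which is exactly why this area is still active research.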

The United States remains a leader, as you can only think creatively out of the box if you are permitted to behave accordingly out of the box. The remaining countries are torn between a mix of admiration, envy and plain old copycat techniques. The rising importance of communities that act more tribally than the hitherto loyal technology user lists is the reason almost all major corporates actively seek to cultivate social media communities. The market for blogs and Twitter in China or Iran or Russia will have an impact on those governments' efforts to manage their growth as per their national strategic interests. Just like the title of an old and quaint novel, the “Brave New World” of social media, and its convergence with increasing amounts of text data generated on customers or citizens, is evolving into creating new boundaries and space for itself. A fascinating Great Game in itself.