Graphs in Statistical Analysis

One of the seminal papers establishing the importance of data visualization (as it is now called) was the 1973 paper by F J Anscombe in http://www.sjsu.edu/faculty/gerstman/StatPrimer/anscombe1973.pdf

It has probably the most elegant introduction to an advanced statistical analysis paper that I have ever seen-

1. Usefulness of graphs

Most textbooks on statistical methods, and most statistical computer programs, pay too little attention to graphs. Few of us escape being indoctrinated with these notions:

(1) numerical calculations are exact, but graphs are rough;

(2) for any particular kind of statistical data there is just one set of calculations constituting a correct statistical analysis;

(3) performing intricate calculations is virtuous, whereas actually looking at the data is cheating.

A computer should make both calculations and graphs. Both sorts of output should be studied; each will contribute to understanding.

Of course the dataset makes it very very interesting for people who dont like graphical analysis too much.

From http://en.wikipedia.org/wiki/Anscombe%27s_quartet

 The x values are the same for the first three datasets.

Anscombe’s Quartet
I II III IV
x y x y x y x y
10.0 8.04 10.0 9.14 10.0 7.46 8.0 6.58
8.0 6.95 8.0 8.14 8.0 6.77 8.0 5.76
13.0 7.58 13.0 8.74 13.0 12.74 8.0 7.71
9.0 8.81 9.0 8.77 9.0 7.11 8.0 8.84
11.0 8.33 11.0 9.26 11.0 7.81 8.0 8.47
14.0 9.96 14.0 8.10 14.0 8.84 8.0 7.04
6.0 7.24 6.0 6.13 6.0 6.08 8.0 5.25
4.0 4.26 4.0 3.10 4.0 5.39 19.0 12.50
12.0 10.84 12.0 9.13 12.0 8.15 8.0 5.56
7.0 4.82 7.0 7.26 7.0 6.42 8.0 7.91
5.0 5.68 5.0 4.74 5.0 5.73 8.0 6.89

For all four datasets:

Property Value
Mean of x in each case 9 exact
Variance of x in each case 11 exact
Mean of y in each case 7.50 (to 2 decimal places)
Variance of y in each case 4.122 or 4.127 (to 3 d.p.)
Correlation between x and y in each case 0.816 (to 3 d.p.)
Linear regression line in each case y = 3.00 + 0.500x (to 2 d.p. and 3 d.p. resp.)
But see the graphical analysis –
While R has always been great in emphasizing graphical analysis, thanks in part due to work by H Wickham and others, SAS products and  language has also modified its approach at http://www.sas.com/technologies/analytics/statistics/datadiscovery/
 SAS Visual Data Discovery combines top-selling SAS products (Base SASSAS/STAT® and SAS/GRAPH®), along with two interfaces (SAS® Enterprise Guide® for guided tasks and batch analysis and JMP® software for discovery and exploratory analysis).
 and  ODS Statistical Graphs at
While ODS Statistical graphs is still not as smooth as say R’s GGPLOT2 http://tinyurl.com/ggplot2-book, it still is a progressive step
Pretty graphs make for better decisions too !

 

 

Data Documentation Initiative

Here is a nice initiative in standardizing data documentation for social sciences (which can be quite a relief to legions of analysts)

http://www.ddialliance.org/what

 

 

 

 

Benefits of DDI

The DDI facilitates:

  • Interoperability. Codebooks marked up using the DDI specification can be exchanged and transported seamlessly, and applications can be written to work with these homogeneous documents.
  • Richer content. The DDI was designed to encourage the use of a comprehensive set of elements to describe social science datasets as completely and as thoroughly as possible, thereby providing the potential data analyst with broader knowledge about a given collection.
  • Single document – multiple purposes. A DDI codebook contains all of the information necessary to produce several different types of output, including, for example, a traditional social science codebook, a bibliographic record, or SAS/SPSS/Stata data definition statements. Thus, the document may be repurposed for different needs and applications. Changes made to the core document will be passed along to any output generated.
  • On-line subsetting and analysis. Because the DDI markup extends down to the variable level and provides a standard uniform structure and content for variables, DDI documents are easily imported into on-line analysis systems, rendering datasets more readily usable for a wider audience.
  • Precision in searching. Since each of the elements in a DDI-compliant codebook is tagged in a specific way, field-specific searches across documents and studies are enabled. For example, a library of DDI codebooks could be searched to identify datasets covering protest demonstrations during the 1960s in specific states or countries.
Also see-
  1. http://www.ddialliance.org/Specification/DDI-Codebook/2.1/DTD/Documentation/DDI2-1-tree.html
  2. http://www.ddialliance.org/Specification/DDI-Lifecycle/3.1/

 

Interview Zach Goldberg, Google Prediction API

Here is an interview with Zach Goldberg, who is the product manager of Google Prediction API, the next generation machine learning analytics-as-an-api service state of the art cloud computing model building browser app.
Ajay- Describe your journey in science and technology from high school to your current job at Google.

Zach- First, thanks so much for the opportunity to do this interview Ajay!  My personal journey started in college where I worked at a startup named Invite Media.   From there I transferred to the Associate Product Manager (APM) program at Google.  The APM program is a two year rotational program.  I did my first year working in display advertising.  After that I rotated to work on the Prediction API.

Ajay- How does the Google Prediction API help an average business analytics customer who is already using enterprise software , servers to generate his business forecasts. How does Google Prediction API fit in or complement other APIs in the Google API suite.

Zach- The Google Prediction API is a cloud based machine learning API.  We offer the ability for anybody to sign up and within a few minutes have their data uploaded to the cloud, a model built and an API to make predictions from anywhere. Traditionally the task of implementing predictive analytics inside an application required a fair amount of domain knowledge; you had to know a fair bit about machine learning to make it work.  With the Google Prediction API you only need to know how to use an online REST API to get started.

You can learn more about how we help businesses by watching our video and going to our project website.

Ajay-  What are the additional use cases of Google Prediction API that you think traditional enterprise software in business analytics ignore, or are not so strong on.  What use cases would you suggest NOT using Google Prediction API for an enterprise.

Zach- We are living in a world that is changing rapidly thanks to technology.  Storing, accessing, and managing information is much easier and more affordable than it was even a few years ago.  That creates exciting opportunities for companies, and we hope the Prediction API will help them derive value from their data.

The Prediction API focuses on providing predictive solutions to two types of problems: regression and classification. Businesses facing problems where there is sufficient data to describe an underlying pattern in either of these two areas can expect to derive value from using the Prediction API.

Ajay- What are your separate incentives to teach about Google APIs  to academic or researchers in universities globally.

Zach- I’d refer you to our university relations page

Google thrives on academic curiosity. While we do significant in-house research and engineering, we also maintain strong relations with leading academic institutions world-wide pursuing research in areas of common interest. As part of our mission to build the most advanced and usable methods for information access, we support university research, technological innovation and the teaching and learning experience through a variety of programs.

Ajay- What is the biggest challenge you face while communicating about Google Prediction API to traditional users of enterprise software.

Zach- Businesses often expect that implementing predictive analytics is going to be very expensive and require a lot of resources.  Many have already begun investing heavily in this area.  Quite often we’re faced with surprise, and even skepticism, when they see the simplicity of the Google Prediction API.  We work really hard to provide a very powerful solution and take care of the complexity of building high quality models behind the scenes so businesses can focus more on building their business and less on machine learning.

 

 

Secure Browsing from Mobile and PC ( Tor ,PeerNet, WasteAgain)

While Tor remains the tool of choice with pseudo-techie hacker wannabes , there is enough juice and smoke and mirrors on the market to confuse your average Joe.

For a secure browsing experience on Mobile – do NOT use either Apple or Windows OS

Use Android  and this app called Orbot in particular

Installing Tor with a QR code

Orbot is easy to install by simply scanning the following QR code with your Android Barcode scanner.

Android QR code

Installing Tor from the Android Market

Orbot is available in the Android Market.

ENTER PEERNET

If you have a Dell PC, well just use PeerNet to configure and set up your own network around the neighbourhood. This is particularly applicable if you are in country that is both repressive and not so technologically advanced. Wont work in China or USA.

http://support.dell.com/support/edocs/network/p70008/EN/vista_7/peernet.htm

What is a peer network?

A peer network is a network in which one computer can connect directly to another computer. This capability is accomplished by enabling access point (AP) functionality on one of the computers. Other computers can then connect to this computer in the same way that they would connect to a physical AP. If Internet Connection Sharing is enabled on the computer that has the AP functionality, computers that connect to that computer have Internet connectivity as well.

A basic peer network, which requires no networking knowledge or experience to set up, should meet the needs of most home users and small businesses. By default, a basic peer network is configured with the strongest available security (see How do I set up a basic peer network?).

For users who are familiar with wireless networking technology, advanced configuration features are available to do the following:

Change security settings (see How do I configure my peer network?)
Choose which method (push button or PIN) computers with Wi-Fi Protected Setup™ capability can join your peer network (see How do I allow peer devices to join my peer network using Wi-Fi Protected Setup technology?)
Change the DHCP Server IP address (see How do I configure my peer network?).
Change the channel on which to operate your peer network (see How do I configure my peer network?)

 If you are really really in a need for secure browsing (like you are maybe a big hot shot in the tech world), I suggest go over to VMWare

http://www.vmware.com/products/player/

create a seperate Linux (Ubuntu for ease) virtual disc, then download the Tor Browser Bundle from

https://www.torproject.org/projects/torbrowser.html.en for surfing and a Peernet (above) or  a prepaid one time use disposable mobile pre-paid wireless card. It is also quite easy to delete your virtual disc in times of emergencies (but it is best to use encryption even when in Ubuntu https://help.ubuntu.com/community/EncryptedHome)

IRC chat is less secure than you think it is thanks to BOT  Trawlers- so I am hoping someone in the open source community updates Waste Again for encrypted chats http://wasteagain.sourceforge.net/

What is “WASTE again”?

“WASTE again” enables you to create a decentralized and secure private mesh network using an unsecure network, such as the internet. Once the public encryption keys are exchanged, sending messages, creating groupchats and transferring files is easy and secure.

Creating a mesh

To create a mesh you need at least two computers with “WASTE again” installed. During installation, a unique pair of public and private keys for each computer is being generated. Before the first connection can be established, you need to exchange these public keys. These keys enable “WASTE again” to authenticate every connection to other “WASTE again” clients.

After exchanging the keys, you simply type in the computers IP address to connect to. If that computer is located behind a firewall or a NAT-router, you have to create a portmap first to enable incoming connections.

At least one computer in your mesh has to be able to accept incoming connections, making it a “public node”. If no direct connection between two firewalled computers can be made, “WASTE again” automatically routes your traffic through one or more of the available public nodes.

Every new node simply has to exchange keys with one of the connected nodes and then connect to it. All the other nodes will exchange their keys automatically over the mesh.

Statistics on Social Media

Some official statistics on social media from the owners themselves

1) Facebook-

http://www.facebook.com/press/info.php?statistics

Date -17 Nov 2011

Statistics

People on Facebook

Creating Pages on Google Plus for some languages

So I decided to create Pages on Google Plus for my favorite programming languages.

a programming language that lets you work more quickly and integrate your systems more effectively

Add to circles

  –  Comment  –  Share

Ajay Ohri

Ajay Ohri's profile photo

Ajay Ohri  –
Ajay Ohri shared a Google+ page with you.
Structured Query Language
Leading statistical language since 1960’s especially in sociology and market research
The leading statistical language in the world
The leading statistical language since 1970’s

Python

https://plus.google.com/107930407101060924456/posts

 

These are in accordance with Google’s Policies http://www.google.com/intl/en/+/policy/pagesterm.html  Continue reading “Creating Pages on Google Plus for some languages”

Google Plus Hangouts gets Enterprise Level Upgrade

Check out the new Google Plus Hangout with Extras

http://www.google.com/support/plus/bin/answer.py?hl=en&answer=1289346&ctx=go&hl=en

About Hangouts with Extras

Hangouts with Extras is a simple and easy way to connect and collaborate with your colleagues in real time. With Hangouts with Extras you can:

Connect with multiple people simultaneously: With group video chat and web conferencing you can connect with multiple people around the world at the same time.

Share your screen: Ever look at something that you couldn’t quite put into words? Well, with screen sharing you give other people the ability to view what’s on your computer screen. You can choose an open window screen on your computer and give everyone in your meeting the ability to look at it. Learn More

Collaborate in real time: You can meet, share notes, and even work on documents at the same time.Learn More

For enterprises- you can throw out your video conferencing software and collaboration tools and get a new mobile app for free.

Small drawbacks in the Google Plus- lack of integration with Youtube (it is one way integration from youtube to hangouts but not the other way round fixed), lack of a whiteboard  for sketches- (like again a shortcut to a google doc 🙂 ) or even bundling the record from your web cam to record your desktop.

Ultimately enterprises want to know how they can use this stuff for e-learning modules or webcasts.

—————–END—————————-

Alternative uses-

Check out NY Met Museum with Friends (thanks to Google Art Project)

Play Linkin Park Playlist (100 videos) with Friends btw. great graphic redesign of Youtube icons!! Now if we could only convince the Google Docs to get more integrated with Open Office or LibreOffice templates

or even set up a DJ table session using Google Hangouts. with Extras of course.

But as it stands it may be good to go for webcasts !!