Movie Review: Rajneeti (Politics)

When one of the oldest epic stories in the world, the Mahabharata (a Sanskrit epic), is mashed together with some inspired Michael Corleone-like scenes from The Godfather, the result is another heartland drama from Prakash Jha. With an ensemble cast ranging from Naseeruddin Shah to Nana Patekar, Manoj Bajpai and Ajay Devgan, plus some inspired acting from Arjun Rampal, Katrina Kaif and Ranbir Kapoor, this one is worth a dekko at a theatre near you. With twists and turns, very little music, and action galore, with dramatic scenes subtly mixed in like a chutney flavor, Rajneeti shows politics at its worst and Bollywood movie making at its best.

And yes, Katrina Kaif does look like Sonia Gandhi, but the story is clearly influenced by Karna and Duryodhana from the Mahabharata, and if you are a movie buff you will snicker at the Godfather scenes (like the breaking of the jaw at the hospital, the dead horse, er, body in the bed, etc.). This one is the latest hit here in India.

SAS Early Days

From Anthony Barr, creator of the SAS language, at

http://www.barrsystems.com/about_us/the_company/sas_history.asp

and http://en.wikipedia.org/wiki/SAS_(software)#Early_history_of_SAS

A fascinating Proc-by-Proc read of who created what in those days. Quite easily, some of the best work was coded in the 1970s by Sall, Goodnight, Barr et al.

SAS Related History

SAS Beginnings talk at NCSU April 21, 2010

Sept 1962 – May 1963 Began assistantship with North Carolina State University Computing Center. I was assigned to work with the Statistics Department.

Created a general analysis of variance program controlled by an analysis of variance language similar to the notation of Kendall. The program was written in IBM 1410 assembler. Dr. A. Grandage, author of IBM 650 analysis of variance programs, advised on analysis of variance calculations. “Statistical programs for the IBM 650 – Part I,” Communications of the ACM, Volume 2, Issue 8.

June – Aug 1963 Summer fellowship in Physical Oceanography, Woods Hole Oceanographic Institute
Sept 1963 – May 1964 Resumed assistantship with North Carolina State Computing Center. Wrote multiple regression program with a compiler that generated machine code for transforming data. Dr. A. Grandage advised on the Doolittle procedure for inverting matrices.
June 1964 – May 1966 Employed with IBM Federal Systems Division at the Pentagon, Washington, DC.

I was assigned to work with the National Military Command Center, the information processing branch of the Joint Chiefs of Staff.

Project: Rewrite and enhance the Formatted File System, a generalized database management system for retrieval and report writing.

Implemented three of the five major components: retrieval, sorting, and file update.

Innovated the idea of a uniform Lexical Analyzer for all languages in the system with a uniform method of handling all error messages within the system.

With the experience in this environment, I saw the power of the self-defining file for providing overall structure to the information processing world.

It became obvious that I could put statistical procedures in the same formatted file framework. At the same time, manuals for PL/1 appeared in the IBM library. The lexical design of PL/1 was an improvement over that used in the Formatted File System.

June 1966 I was recruited by North Carolina State University Statistics Department to rewrite analysis of variance and regression programs for the IBM 360.

I saw this as an opportunity to develop the Statistical Analysis System (SAS).

I wrote the analysis of variance program while independently developing the SAS software for inputting and transforming data.

Sept 1966 Presented conceptual ideas of SAS to members of the Committee on Statistical Software of the University Statisticians of Southeast Experiment Station (USSERS). The meeting was held in Athens, GA. Individuals present:

Frank Verlinden, North Carolina State University

Anthony J. Barr, North Carolina State University

Walt Drapula, Mississippi State University

Jim Fortson, University of Georgia

January 1968 Jim Goodnight and I cooperated in putting his regression program into SAS.

This procedure was invaluable to pharmaceutical and agricultural scientists in analysis of experiments with missing data.

Barr:
Developed language for describing regression and analysis of variance model, and preprocessor for creating dummy variables

Goodnight:
Developed regression and statistical routines that made practical the analysis of variance methodology within the regression framework

August 1972 Release of 1972 version of SAS. This was the first release to achieve wide distribution. SAS was now recognized as a major system in statistical computing.

Credits for SAS 72 as described in SAS 76 Users Guide:

Anthony J. Barr
Language translator; data management and supervisor; ANOVA, DUNCAN, FACTOR, GUTTMAN, INBREED, LATTICE, NESTED, PLAN, PRINT, RANK, SORT, SPEARMAN

James H. Goodnight
CANCORR, CORR, DISCRIM, MEANS, PLOT, PROBIT, REGR, RSQUARE, RQUE, STANDARD, STEPWISE.

Jolayne W. Service
“A User’s Guide to the Statistical Analysis System”

Carroll G. Perkins
HARVEY, HIST, PRTPCH: A Guide to the Supplementary Procedures Library for the Statistical Analysis System

37,000 total lines of code with distribution:

  • Barr ………………….65%
  • Goodnight …………..32%
  • Others…………………3%

I had developed and implemented the language, data management, and interface to operating system.

June 1973 – May 1976 I rewrote the internals of SAS: Data Management, report writing and the compiler.

John Sall joined us in 1973 (approx.).

June 1976 Release of 1976 version of SAS.

The 76 version was a functionally complete system for statistical computing and business data analysis.

I wrote the systems portion of the software.

Credits in the SAS 1976 manual:

Anthony J. Barr
Language translator; data management and supervisor; GUTTMAN, NESTED, PRINT, SORT

James H. Goodnight
ANOVA, CLUSTER, DISCRIM, GLM, MEANS, NEIGHBOR, NLIN, PROBIT, RSQUARE, STANDARD, STEPWISE, TTEST, VARCOMP

John P. Sall
AUTOREG, BMDP, CONTENTS, CORR, DUNCAN, EDITOR, FACTOR, FREQ, MATRIX, OPTIONS, PLAN, RANK, SAS72, SCORE, SPECTRA, SYSREG, function library

Jane T. Helwig
“A User’s Guide to SAS 76”

Carroll G. Perkins (consultant)
CONVERT, SCATTER

67,000 total lines of code with distribution:

  • Barr ……………………35%
  • Goodnight …………….18%
  • Sall……………………..43%

June 1976 SAS Institute, Inc. was incorporated.

Principals and percentage of ownership:

  • Anthony J. Barr ……..40%
  • James H. Goodnight ..35%
  • John Sall ……………..17%
  • Jane Helwig ……………8%

January 1979 I resigned from SAS Institute.

Copyright © 2006 Anthony J. Barr

Certifications in Analytics and Business Intelligence

I sometimes get a chat message on Twitter/Facebook asking for help on some specific data issue. More often than not it is something like: how do I get started in BI/BA/data stuff? So here is a list of certifications which I think are quite nice as starting points, or even CV multipliers.

1) Google’s Certifications

http://www.google.com/intl/en/adwords/professionals/

2) SAS Certifications

Quite well established and easily one of the best structured certification programs in the industry.

http://support.sas.com/certify/index.html

3) SPSS

The SPSS certification program began last year, and it helps provide a valuable skill set for both your practice and your resume. It is also useful to have a second skill set apart from SAS in terms of statistical software.

http://www.spss.com/certification/

At this point I would like you to pause and think whether the above certifications are useful or cost-effective for you, as they are broadly general qualifications in statistical platforms as well as in applying them to web analytics (a key area for business analytics).

For more specialized certifications, here are some more:

1) Microsoft SQL Server

http://www.microsoft.com/learning/en/us/certification/cert-sql-server.aspx

2) TDWI Certification

http://tdwi.org/pages/certification/index.aspx

3) IBM

Not sure how up to date these are, so caveat emptor!

http://www.redbooks.ibm.com/abstracts/sg245747.html

Per the abstract, this one is for those who are knowledgeable about IBM’s Business Intelligence solutions and the fundamental concepts of DB2 Universal Database, and who are capable of performing the intermediate and advanced skills required to design, develop, and support Business Intelligence applications.

Also IBM Cognos Certifications

http://www-01.ibm.com/software/data/education/cognos-cert.html

4) MicroStrategy

http://www.microstrategy.com/education/Certification/

5) Oracle

Includes the all-new Sun certifications as well.

http://certification.oracle.com/

and http://blogs.oracle.com/certification/

6) SAP Certifications

http://www.sap.com/services/education/certification/index.epx

7) Cloudera’s Hadoop Certification

http://www.cloudera.com/developers/learn-hadoop/hadoop-certification/

These are some Business Intelligence and Business Analytics related certifications that I assembled in a list. Many other programs were either too specific to software development or did not have a certification for general usage (like many R trainings or company-tool-specific trainings). Please feel free to add any suggestions.

Data Mining through the Android

Here is something interesting (I probably have to ask someone, or wait for Android to come to India, to try this personally).

It uses Android app development (which is quite easy if you have Linux) and basically runs R from the cloud using the Rattle GUI. Fire away at the data while watching a movie or just on the go!

See this-

http://analyticdroid.togaware.com/

Question: How useful do you think it will be to do this? Would you like to run R on your mobile?
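For the curious, the round trip could look something like the sketch below: a mobile client packages an R expression as JSON and posts it to a remote R server, which runs the code and returns the result. The analyticdroid site does not document a public API, so the endpoint URL and payload field names here are purely illustrative assumptions, not a real client.

```python
import json
import urllib.request

# Hypothetical endpoint: assumed for illustration only; the real demo
# may use a completely different URL and wire format.
RATTLE_SERVER = "http://analyticdroid.togaware.com/run"

def build_r_job(r_code, output_format="json"):
    """Package an R expression as a JSON job for a remote R server."""
    return json.dumps({"code": r_code, "format": output_format}).encode("utf-8")

def submit_r_job(r_code):
    """Send the job and return the server's response body (needs network)."""
    req = urllib.request.Request(
        RATTLE_SERVER,
        data=build_r_job(r_code),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode("utf-8")

# The payload the phone would send for a simple summary request:
payload = json.loads(build_r_job("summary(iris)"))
```

The point of the sketch is that the phone never runs R itself; it only ships small text payloads back and forth, which is why this works even on modest hardware.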

Interesting R and BI Web Event

An interesting webinar from Revolution, the vanguard of corporate R things, mixing R analytics and BI dashboards. Me thinks an alliance with a BI dashboard maker could also help the Revo guys, as BI and Analytics are two similar yet different markets. It could also help if you are a newbie to BI but know enough analytics/stats.

SUPERCHARGE BI AND DASHBOARDS WITH PREDICTIVE ANALYTICS

FREE WEBINAR WEDNESDAY, JUNE 2

Presenters:
David Smith, vice president of Marketing, Revolution Analytics
Steve Miller, president, OpenBI, LLC
Andrew Lampitt, senior director, Technology Alliances, Jaspersoft

Audience:
BI implementors seeking to integrate predictive analytics into BI dashboards;
R users and developers seeking to distribute advanced analytics to business users;
Business users seeking to improve their BI outcomes.

R Modeling with huge data

Here is a training course by BI vendor Netezza which uses R's analytical capabilities, running R inside Netezza's customized appliances.

Source-

http://www.netezza.com/userconference/pce.html#rmftfic

R Modeling for TwinFin i-Class

Objective
Learn how to use TwinFin i-Class for scaling up the R language.

Description
In this class, you’ll learn how to use R to create models using huge data and how to create R algorithms that exploit our asymmetric massively parallel (AMPP®) architecture. Netezza has seamlessly integrated with R to offload the heavy lifting of the computational processing on TwinFin i-Class. This results in higher performance and increased scalability for R. Sign up for this class to learn how to take advantage of TwinFin i-Class for your R modeling. Topics include:

  1. R CRAN package installation on TwinFin i-Class
  2. Creating models using R on TwinFin i-Class
  3. Creating R algorithms for TwinFin i-Class
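The appliance-side details are proprietary, but the general idea behind offloading R's heavy lifting is the classic split-apply-combine pattern: compute partial results in parallel, close to the data, then combine them into the final answer. Here is a toy sketch of that pattern; the worker pool merely stands in for Netezza's parallel processing units, and nothing below is Netezza-specific.

```python
from concurrent.futures import ThreadPoolExecutor

def partial_stats(chunk):
    """Per-partition work that would run close to the data:
    return (sum, count) for one slice."""
    return sum(chunk), len(chunk)

def parallel_mean(data, n_partitions=4):
    """Split the data, compute partials in parallel, combine into a mean."""
    size = max(1, len(data) // n_partitions)
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    with ThreadPoolExecutor(max_workers=n_partitions) as pool:
        partials = list(pool.map(partial_stats, chunks))
    total = sum(s for s, _ in partials)
    count = sum(n for _, n in partials)
    return total / count

mean = parallel_mean(list(range(1, 101)))  # 50.5
```

Because only the small (sum, count) partials travel between workers and the combiner, the pattern scales to data far larger than any single node's memory, which is exactly the pitch of in-appliance R.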

Format
Hands-on classroom lecture, lab exercises, tour

Audience
Knowledgeable R users – modelers, analytic developers, data miners

Course Length
0.5 day: 12pm-4pm Wednesday, June 23 OR 8am-12pm Thursday, June 24 OR 1pm-5pm Thursday, June 24, 2010

Delivery
Enzee Universe 2010, Boston, MA

Student Prerequisites

  • Working knowledge of R and parallel computing
  • Have analytic, compute-intensive challenges
  • Understanding of data mining and analytics

Google: Prediction API and other cool stuff

Google just announced its tools BigQuery and Prediction API for use with its new cloud storage service, Google Storage. With this, the computing cycle seems to have come full circle: from mainframe to desktop/servers to cloud. The Prediction API seems interesting, but it, and the other services, are quite clearly dependent on market as well as developer enthusiasm. Me thinks Google knows a thing or two about Big Data, and this one looks like a revenue-positive product from Google (unless they get REST-less and let it languish like other great ideas, like Docs, Wave, etc.)

Also interesting could be applications from R, as well as SAS and SPSS, that start using this remote data cloud/server farm 😉

With storage, querying, and prediction analysis, Google is definitely in the Infrastructure as a Service business, but success with these services will be crucial to establish its name in the formidably lucrative business analytics and business intelligence fields.

http://code.google.com/apis/predict/

http://code.google.com/apis/bigquery/

http://code.google.com/apis/storage/
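As a taste of what calling such a service might look like, here is a sketch that assembles a prediction request against a model trained on a Google Storage object. The Prediction API is still in limited preview, so the endpoint shape and field names below are assumptions based on the announced REST style, not a tested client.

```python
import json

def build_predict_request(bucket, obj, features):
    """Assemble (endpoint, JSON body) for a prediction call against a
    model trained on gs://bucket/obj. The URL shape and field names
    here are assumptions for illustration, not official documentation."""
    endpoint = (
        "https://www.googleapis.com/prediction/v1/training/"
        f"{bucket}%2F{obj}/predict"
    )
    body = json.dumps({"data": {"input": {"csvInstance": features}}})
    return endpoint, body

# A hypothetical call: predict from a model trained on gs://mybucket/sales.csv
endpoint, body = build_predict_request("mybucket", "sales.csv", [3.5, "north"])
```

The appeal for R/SAS/SPSS users is that training happens server-side against data already sitting in Google Storage; the client only ships feature vectors like the one above and reads back a label or score.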