Politics in Analytics

Observers of American Electoral politics ,including the current Presidential Campaign would be struck by the sophisticated degree of analytics being involved. This includes the following –

1) Segmentation of likely percentage Response Rates (vote yes(1) , vote opponent (0)

based on

history of voting

response to stimuli (experience vs change)

ethnicity (black, white ,latino)

income groups (<40,000 USD ,>100000 USD)

education (college educated)

gender (male,female)

geography (rural ,urban,college town)

union affliation

and even coffee (latte drinkers etc ūüôā )

What is striking is that most of these variables like race, gender cannot be used for marketing anything else like credit cards, or financial services on charges of discrimination.

What could be really interesting is if they add credit bureau variables and create logistic models (and not just segmentation). Maybe by 2016, there will be a different category of analytics called Quantitative Political analytics.

Another note – What is similar between Ralph Nader , Chaos Theory and Butterfly effect.

Chaos theory states that future results can vary a lot based on slight changes in initial differences.

Butterfly Effect uses this to say a small event like butterfly fluttering in china can cause a big event like typhoon in the US.

Ralph Nader entering the race in 2000 got 90000 votes in Florida, mostly to be siphoned away from Al Gore , who lost Florida and the elections by less than 1000 votes.

Al Gore was against Iraq war since the beginning and had he been President maybe the world would have been greener and no war in Iraq. maybe. No offense meant to anyone.

Howard Dean screaming or Bill Clinton calling Obama as similar to Jesse Jackson’s wins (which is analytically and quantitatively true) or Hillary crying in Iowa , can be similar butterfly effects.No offense meant to anyone.

10 Steps to create a Model

Models are used to predict specific events in specific time frames upon stimuli –

1) How many people will buy credit cards if we mail them

2)How many people will buy credit cards if we call them

3) How many people are more likely to buy a credit card if we target only 50 % of total population

Modeling consists of the following –

1) Select a pre-event . Eg Previous credit card campaign

2) Divide data into building, validation and third sample.Sampling should be random.

3) Remove outliers with a capping (maximum,minimum) and invalid /missing values with central tendencies (mean,median)

4) Choose method of modeling based on data-

Regression

Segmentation

Time Series

5) Choose software of modeling dependent on resource availability (budget,people,softwares) _ you can do it in sas, spss ,excel with different amounts of skill sets ,data handling and complexity.

6) For regression modeling (most commonly used) ,do iterative tests in the building sample to remove multi collinearity (like variance inflation factor ) ,and statistical measures to test goodness of model (concordance, p value of individual variables, gain or lift across deciles).

7) Score the validation sample with the same equation variables, check for lift  ,and signs and parameter estimates , they should not change radically

8) Cross validate on third sample

9) Additionally go for transformation of variables (x,1/x,x^2,logx,e^x)  to get incremental lift

10) Go for binning of variables….do the variables behave in same manner across deciles as their sign in¬† regression equation.

p-s. search wikipedia for any statistical definition that you are not familiar with

pps – This will be followed by ways to present a model, which is a different ball game all together

Comparing Base SAS and SPSS

Comparing Base SAS and SPSS is an age old question between analytics professionals as both of these are one of the longest running statistical softwares in the world.

While Base SAS is on version 9 + and has greatly improved it’s visual appeal to counter SPSS’s click and get results interface, SPSS has moved beyond version 15.0 + and started adding modules like SAS has done.

Here I will be comparing specific SAS and SPSS components like SAS ETS with SPSS Trends, and SAS Base /Stat with SPSS Base.

Base SAS is almost 1.75 times as expensive in upfront cost for a single installation than SPSS.

SAS ETS is better than SPSS Trends for time series analysis for bad data, but SPSS Trends can easily make huge numbers of time series analysis than SAS ETS.

SAS is more tougher to learn than the point and click interface of SPSS.

SPSS Documentation is much better and give better clarity on algorithms used for statistical procedures.

Base SAS is much more powerful for crunching huge numbers of data (like sorting or splicing data),

for data that is smaller than say 100 mb, the difference is not much between SAS and SPSS.

SPSS is a perpetual license, while SAS has year on year license. This eventually makes it 2-3 times more expensive.

Modeling is easier done in SPSS but SAS can provide more control thanks to command line interface/advanced editor coding.The SAS Enterprise is not as good a visual interface as the SPSS.

For a startup analytics body, the best installation for both SAS and SPSS is network licenses preferably over a Linux network. You should ideally have a mix of both SAS and SPSS to optimize both costs and analytical flexibility.

Other Comparisons with Base SAS (a SAS Institute Copyrighted Product ) can be found at http://www.ats.ucla.edu/stat/technicalreports/

or by searching packages at http://finzi.psych.upenn.edu/search.html


10 SAS Procs:Learning SAS Easily

when-i-was-beginning-in-analytics-and-learning-sas.doc

When I was beginning in analytics and learning SAS, one of my team members pulled me aside and said man-just learn 10 types of procs and you have learnt it.You can download actual cut and paste code from the link above. 

 Proc Freq  ,Proc Datasets  

Proc Means, Proc Print

  Proc Contents, Proc Univariate

Proc Summary   ,Proc Sort

  Additional РMerging data pulled from multiple locations

Sort all datasets by the key (here var1) before merging  

 

Proc Rank Proc Tabulate  

 p.s It proved to be more complex but learning different  kinds of procs is  an easy way for a SAS beginner to  learn SAS faster and clear SAS skills interviews

Analytics Training Websites

This post deals with books , certifications and tutorial for data miners and analysts from beginners to intermediate level.Excel Modeling

Regression –
http://academic.brooklyn.cuny.edu/economic/friedman/Regressnxce.html

Optimization-
http://office.microsoft.com/en-us/help/HA011245951033.aspx

Free Tutorials from SAS

http://www.sas.com/apps/elearning/elearning_courses.jsp?cat=Free+Tutoria…

SAS certification

http://support.sas.com/certify/

SPSS tutorials

http://calcnet.mth.cmich.edu/org/spss/toc.htm

http://www.ats.ucla.edu/STAT/spss/webbooks/reg/default.htm

Using Batch Files to Automate

Batch files may be old fashioned but can save you a lot of time in repeated tasks.Here is an example of a batch file that has

1) Some Global variables (Inputs)

set SALES_GRP=s1
set ID=f2
set SALES_DATE=nov0107

#BATCH FILE COMMENT using hash order_date format is monddyy

2) Used notepad to open a file for making changes

call c:\software\Notepad.exe C:\folder1\input1.sas
call c:\software\Notepad.exe C:\folder2\input2.sas

3)Created a directory to store results

call mkdir C:\folder2\%SALES_GRP%\%ID%\%SALES_DATE%

4) Executed another batch file or code using relevant software (eg SAS.exe for SAS Code)

call C:\software\sas8.exe C:\folder1\input1.sas

call C:\software\sas8.exe C:\folder2\input2.sas


5) Changes active folder

call cd C:\folder1

This can particularly help you schedule SAS code running at specific times using Windows Scheduler

A Simple SAS Code to Automate Reports

Organizations use an increasing amount of reports. Countless hours are spent manipulating spreadsheets one after another, into and out of MS Access databases and finally emailing it to a select audience of people.

The following  tries to present a simple and integrated approach to report generation and presentation. This involves the use of Macros, for repeatability and flexibility.You can copy and paste the code directly to your SAS editor and make relevant changes.

Steps in preparing a report

  • Import data from data sources (spreadsheets, databases mostly .Here assuming BASE SAS ,only csv files are considered)
  • Manipulating data for analysis (in the case below the only data manipulation done is removal of duplicates. This section can be changed as needed by user for
    • Frequency counts using Proc Freq.
    • Mean, maximum and minimum counts using Proc Means.
    • Customized reports using Proc Report Procedure.)
  • Presenting and formatting data
  • Output data in the form of a report /spreadsheet.
  • Emailing internal customers of the report.

———————————————————————————————————

Using BASE SAS, the following can be coded as:

     /*PROGRAM TO AUTOMATE REPORTING */

    /*DECLARING THE PATH OF THE INPUT/OUTPUT FOLDER */

%let pathfile = ‘X:\Auto\AUG06-AUTOS\’ ;

     /*CREATING LIBRARY NAME */

libname auto &pathfile;

run;

   /*TO CONSERVE SPACE*/

options compress=yes;

   /*TO MAKE LOG READABLE */

options macrogen symbolgen;

/* STARTING MACRO FOR REPEATABLE EXECUTION OF REPORT*/

%macro impmth(mth,num,emailid);

 /*MANIPULATING VARIABLES TO REDUCE CHANGES */

data _null_ ;

call symput(‘filepath’,”‘”||&pathfile||&mth||’.csv’||”‘” );

call symput(‘unqpath’,”‘”||&pathfile||”unq”||&mth||’.csv’||”‘” );

call symput(‘unqxls’,”‘”||&pathfile||”unq”||&mth||’.xls’||”‘” );

run;

/*IMPORT*/

/*IMPORTING DATA FROM CSV FILES STORED IN FOLDER DECLARED ABOVE*/

PROC IMPORT OUT= auto.&num

DATAFILE= &filepath

DBMS=CSV REPLACE;

GETNAMES=YES;

DATAROW=2;

quit;

/*CREATING VARIABLE FOR DISTINGUISHING INPUT*/

data auto.&num;

set auto.&num;

&num =1;

run;

———————————————————————–

/*DATA MANIPULATION HERE REMOVING DUPLICATES*/

proc sort data=auto.&num nodupkey;

by REFAGREEMENTID;

run;

———————————————————————–

/*EXPORT*/

/*EXPORTING DATA TO A CSV FILE*/

PROC EXPORT DATA= AUTO.&num

OUTFILE= &unqpath

DBMS=CSV REPLACE;

RUN;

/*EXPORTING DATA TO A XLS FILE*/

ODS HTML FILE=&unqxls;

proc print data=auto.&num;

run;

ODS HTML CLOSE;

/*EMAILING THE RESULTS*/

filename outbox email &emailid;

data _null_ ;

file outbox

to=(&emailid)/* Overrides value in filename statement */

cc=(&emailid)

subject=’File from’

attach=(&unqxls)

;

put ‘Hi,’;

put ‘This is an integrated approach to automating reports’;

put ‘It works great!’;

run;

%mend;

/*CALLING THE MACRO*/

/*HERE ‘Ahmedabad’ IS THE INPUT FILE WHILE Ahd IS THE OUTPUT FILE*/

/*HERE ‘aohri@hotmail.com’ IS THE EMAIL ID TO BE MAILED THE REPORT*/

%impmth(‘Ahmedabad’,Ahd,’aohri@hotmail.com’);

Note- This was written on version 8 of SAS, so let me know any issues.

Ajay 

%d bloggers like this: