Models are used to predict specific events within specific time frames in response to a stimulus, for example:
1) How many people will buy credit cards if we mail them?
2) How many people will buy credit cards if we call them?
3) How many people are likely to buy a credit card if we target only 50% of the total population?
Modeling consists of the following steps:
1) Select a prior event to build the model on, e.g. a previous credit card campaign.
2) Divide the data into a building sample, a validation sample and a third sample. Sampling should be random.
3) Treat outliers by capping them at a maximum and a minimum value, and replace invalid or missing values with a measure of central tendency (mean, median).
4) Choose the modeling method based on the data:
Regression
Segmentation
Time Series
5) Choose the modeling software depending on resource availability (budget, people, software). You can do it in SAS, SPSS or Excel, which differ in the skill sets they require, their data handling and their complexity.
6) For regression modeling (the most commonly used method), run iterative tests on the building sample to remove multicollinearity (using a measure such as the variance inflation factor), and use statistical measures to test the goodness of the model (concordance, p-values of individual variables, gain or lift across deciles).
7) Score the validation sample using the same equation and variables, then check the lift as well as the signs and parameter estimates. They should not change radically.
8) Cross-validate on the third sample.
9) Additionally, try transformations of the variables (x, 1/x, x^2, log x, e^x) to get incremental lift.
10) Try binning the variables. Check whether each variable behaves across deciles in the same manner as its sign in the regression equation suggests.
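The data preparation and lift checks in steps 2, 3 and 7 can be sketched in plain Python. This is only an illustration, not the SAS, SPSS or Excel workflow itself. The function names (three_way_split, cap_and_impute, decile_lift), the 60/20/20 split, the 1st and 99th percentile caps and the use of None for missing values are all assumptions made for the example.

```python
import random
import statistics


def three_way_split(rows, fractions=(0.6, 0.2, 0.2), seed=42):
    """Randomly split rows into building, validation and third samples.

    The 60/20/20 default is an assumption, not a rule from the text.
    """
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # random sampling, reproducible via seed
    n_build = int(len(shuffled) * fractions[0])
    n_valid = int(len(shuffled) * fractions[1])
    building = shuffled[:n_build]
    validation = shuffled[n_build:n_build + n_valid]
    third = shuffled[n_build + n_valid:]  # whatever remains
    return building, validation, third


def cap_and_impute(values, lower_pct=1, upper_pct=99):
    """Replace missing values (None) with the median, then cap the extremes."""
    present = sorted(v for v in values if v is not None)
    median = statistics.median(present)
    # Simple index-based percentile cutoffs taken from the non-missing values.
    low = present[int((len(present) - 1) * lower_pct / 100)]
    high = present[int((len(present) - 1) * upper_pct / 100)]
    filled = [median if v is None else v for v in values]
    return [min(max(v, low), high) for v in filled]


def decile_lift(scores, outcomes, n_bins=10):
    """Lift per decile: response rate in the decile divided by the overall rate.

    Assumes at least n_bins records and at least one positive outcome.
    """
    ranked = sorted(zip(scores, outcomes), key=lambda pair: pair[0], reverse=True)
    overall_rate = sum(outcomes) / len(outcomes)
    lifts = []
    for i in range(n_bins):
        start = i * len(ranked) // n_bins
        end = (i + 1) * len(ranked) // n_bins
        bin_outcomes = [outcome for _, outcome in ranked[start:end]]
        lifts.append((sum(bin_outcomes) / len(bin_outcomes)) / overall_rate)
    return lifts
```

A model that ranks well should show lift above 1 in the top deciles and below 1 in the bottom ones. Running decile_lift on the building sample and again on the validation sample gives a quick way to see whether the lift holds up.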
P.S. Search Wikipedia for any statistical definition that you are not familiar with.
P.P.S. This will be followed by ways to present a model, which is a different ball game altogether.