Interview: Eric A. King, President, The Modeling Agency

Here is an interview with Eric King, President, The Modeling Agency.


Ajay- Describe your career journey. What interested you in science? How do you think we can help more young people get interested in science?

Eric- I was a classic underachiever in school. I was bright, but generally uninterested in academics, and focused on… well, other things at the time. However, I had always excelled in math and science, and actually paid attention in those classes.

I was a high school junior when my school acquired its first computers: Apple IIs. There were no formal computer courses, so instead of study hall, I would go to the lab and tinker. Sure, I would join a few other geeks (well before it was cool to be such) for a few primitive games, but would spend the majority of my time reading about the Basic programming language and coding graphic designs, math formulas and simple games.

I loved it so much that I had decided to pursue computer science as a college major before my senior year and it went into my yearbook entry. Fortunately, my relatively high SAT scores offset my poor high school GPA and squeaked me into the University of Pittsburgh’s trial-by-fire summer program. It was the first time I really felt I had to perform (or else) and had to work hard to overcome poor study habits — but rose to the occasion with room to spare.

I’m glad I did not realize at the time that Pitt was #9 in the nation for computer science. I did have a hint though when I realized the extremely high attrition rate. In the end, our freshman class of 240 graduated 36. I did make it through the freshman year that trimmed the first half of the original group, but was a casualty my sophomore year when I fell short of a passing grade in a core CS course that was only offered annually. I repeated it the following year and graduated with extra credits – to include a directed study in table tennis (no kidding).

I loved the programming assignments but loathed the tests. After slogging through the program and graduating, I took a three month break. I figured it would be my last opportunity to be free of responsibility for that period of time possibly until retirement – and so far, I’m right.

Then my cousin, who graduated with me, told me about a neural computing software tools company in Pittsburgh called NeuralWare. I had always been intrigued by “artificial intelligence”, but they were seeking a technical support representative. I had realized in my junior year that I did not want to code or remain on the technical side for a living, but rather to go into business development, project management, business management and entrepreneurship. Yet, after having survived the majority of the attrition, I did want to complete my technical degree, then seek the business angle.

A short while later, NeuralWare contacted me again to start up their sales operation (a role previously filled by a co-founder). This was the start I was seeking: cutting my teeth in business for highly technical products. I participated in numerous training sessions for neural computing and related technologies and loved it. The notion that the computer could leverage mathematics that emulated the basic learning function of the brain, or treat a formula like a gene (split it, mutate it, test it and progress toward the most fitting solution), was beyond exciting to me. So much so that I’ve not left the technology in the 19 years since.

Drawing others to science, I believe, is more a matter of nature than nurture. I am the father of twin boys who couldn’t have greater differences in interests, personalities and talents. In that spirit, I believe that science should be made readily available, involve both theory and practice, and be presented in a manner that motivates those who are drawn to science to excel. But I don’t believe science can be effectively pushed on those whose inherent interests and passion lie elsewhere (reference the character Neil Perry in Dead Poets Society).

Ajay- Describe the path that The Modeling Agency has traveled. What is your vision for its future?

Eric- The Modeling Agency (TMA) was established as a highly structured formal network of senior-level consultants in January of 2000. TMA’s initial vision (and sustained slogan) was to “provide guidance and results to those who are data-rich, yet information-poor.” I still have not encountered an organization that holds a larger bench of senior-level data mining consultants and trainers. And to be senior-level, TMA consultants must be far more than technically steeped in data mining. TMA’s senior consulting staff are business consultants first – not rushing to analyze data, but assessing an organization’s environment and designing a fitting solution to resources that support stated objectives.

There are three primary divisions to TMA: training, consulting and solutions. Each division is part of an overarching business and technology maturation process. For example, training generates technology advocates for data mining, which encourages consulting engagements; these at times lead to productizable vertical-market services that create solutions, allowing other organizations to capitalize on the risk that pioneering organizations have undertaken and to springboard on the return realized by implementations within their vertical, which in turn leads to new discoveries and innovations that feed back into training.

Beyond further developing the brand of TMA’s quickly emerging niche (described later), our future vision involves developing two specific types of vendor partnerships to allow TMA to redirect the substantial margins enjoyed by its clients through the application of predictive modeling into a residual stream of income to accelerate the growth of TMA itself. While this operation is confidential, we will be pleased to tell our future clients that we do indeed apply our services for the benefit of our own business.

Ajay- Describe the challenges and opportunities in modeling arising from recent innovations, e.g., social network analysis software and the increasing amount of customer text data available on social media.

Eric- Please allow me to shift the focus of this question slightly. So many organizations are still making their way down the Business Intelligence chain to applying predictive modeling on standard operational data that social network analysis and customer text analytics remain more of a research endeavor, in my opinion. As a practical applications company, TMA focuses its experience on pragmatically applying its business problem-solving creativity to operational and transactional data enriched by demographic and psychographic attributes. I feel that the areas of social media and social network analysis are not yet mature enough to be formalized as established practice on TMA’s menu of service offerings.

Having said that, the greatest challenges in predictive modeling are no longer in applying the methodological tactics, but rather in the comprehensive assessment, strategic problem design, project definition, results interpretation and ROI calculation. Popular data mining software is now highly effective at automating the tactical model building process – many packages running numerous methods in parallel and selecting the best performer.

So, the challenges that remain today are in tackling the tails of the process as mentioned above. This is where TMA’s expertise is focused and where our niche is quickly emerging: guiding organizations to establish their own internal predictive analytics operation.

Ajay- Given the increasing consolidation among business intelligence, data mining and analytics vendors, which are the vendors that you have worked with, and what are their relative merits?

Eric- TMA has established formal partnerships with several popular data mining tool vendors and services companies. Despite these alliances, TMA remains vendor neutral and method agnostic for clients that approach TMA directly. Having said that, I will make a general statement that there is notable merit for the organizations that recognize that they must ensure their client’s success in the full implementation cycle of data mining – not just provide a great tool that addresses the center.

In fact, it was one of TMA’s earliest partners who saw the value in teaming with TMA to support the ends of the data mining process (assessment, business understanding, project definition and design, results interpretation, implementation) while their solution addressed the middle (data preparation and modeling). They recognized that as great as their tool was, it was still hitting the shelf soon after the sale. They realized that their clients were building very good models that answered the wrong questions, or were uninterpretable and incapable of implementation.

TMA soon recognized that these excellent tools combined with TMA’s strategic data mining mentorship and counsel provided the capability for organizations to essentially establish their own internal predictive analytic practice with existing business practitioners – not requiring senior statisticians or PhDs. This has become a popular and fast growing service, for which TMA’s large bench of senior-level data mining consultants is perfectly suited to fulfill.

And the best candidates for this service are those organizations who have attempted pilots or projects but fell short of their objectives. And while the acquisition of SPSS (who licenses a reputable predictive analytics tool, “PASW”) by IBM (the gold standard for IT and BI services and solutions) may be the closest competition that TMA may encounter, TMA enjoys a substantial head start and foothold with its numerous formal alliances, vendor neutrality and sizable client list specific to predictive modeling. TMA is quickly becoming the standard to turn to for progressive organizations that realize internalizing predictive analytics is not just a matter of when rather than whether, but that it is within their grasp with TMA’s guidance and the right tool(s).

Ajay- What do people at The Modeling Agency do for fun?

Eric- Our interests are as diverse as we are geographically dispersed. One of our senior consultants is a talented and fairly established tango dancer. He’s always willing to travel for assignments, as he’s anxious to tap into that city’s tango circuit. Another consultant is an avid runner, entering marathons and charity races. One common thread that most of us share is our dedication to parenting. We all love trips and time with our children. In fact, I’m writing this on a return trip from Disney World on the Auto Train with my five-year-old twin boys – a trip I know I’ll recall fondly through my remaining years.

Bio

Eric A. King is President and Founder of The Modeling Agency (TMA), a US-based company started in January 2000 that provides training, consulting, solutions and a popular introductory webinar in predictive modeling “for those who are data-rich, yet information-poor.” King holds a BS in computer science from the University of Pittsburgh and has over 19 years of experience specifically in data mining, business development and project management. Prior to TMA, King worked for NeuralWare, a neural network tools company, and American Heuristics Corporation, an artificial intelligence consulting firm. He may be reached at eric@the-modeling-agency.com or (281) 667-4200 x210.

Modeling Visualization Macros

Here is a nice SAS macro from Wensui's blog at http://statcompute.spaces.live.com/blog/

It's particularly useful for modelling chaps. I have seen a version of this macro some time back which also plotted the curves, but this one is quite nice too.

SAS MACRO TO CALCULATE GAINS CHART WITH KS

%macro ks(data = , score = , y = );

options nocenter mprint nodate;

data _tmp1;
  set &data;
  where &score ~= . and &y in (1, 0);
  random = ranuni(1);
  keep &score &y random;
run;

proc sort data = _tmp1 sortsize = max;
  by descending &score random;
run;

data _tmp2;
  set _tmp1;
  by descending &score random;
  i + 1;
run;

proc rank data = _tmp2 out = _tmp3 groups = 10;
  var i;
run;

proc sql noprint;
create table
  _tmp4 as
select
  i + 1       as decile,
  count(*)    as cnt,
  sum(&y)     as bad_cnt,
  min(&score) as min_scr format = 8.2,
  max(&score) as max_scr format = 8.2
from
  _tmp3
group by
  i;

select
  sum(cnt) into :cnt
from
  _tmp4;

select
  sum(bad_cnt) into :bad_cnt
from
  _tmp4;    
quit;

data _tmp5;
  set _tmp4;
  retain cum_cnt cum_bcnt cum_gcnt;
  cum_cnt  + cnt;
  cum_bcnt + bad_cnt;
  cum_gcnt + (cnt - bad_cnt);
  cum_pct  = cum_cnt  / &cnt;
  cum_bpct = cum_bcnt / &bad_cnt;
  cum_gpct = cum_gcnt / (&cnt - &bad_cnt);
  ks       = (max(cum_bpct, cum_gpct) - min(cum_bpct, cum_gpct)) * 100;

  format cum_bpct percent9.2 cum_gpct percent9.2
         ks       6.2;
  
  label decile    = 'DECILE'
        cnt       = '#FREQ'
        bad_cnt   = '#BAD'
        min_scr   = 'MIN SCORE'
        max_scr   = 'MAX SCORE'
        cum_gpct  = 'CUM GOOD%'
        cum_bpct  = 'CUM BAD%'
        ks        = 'KS';
run;

title "%upcase(&score) KS";
proc print data  = _tmp5 label noobs;
  var decile cnt bad_cnt min_scr max_scr cum_bpct cum_gpct ks;
run;    
title;

proc datasets library = work nolist;
  delete _: / memtype = data;
run;
quit;

%mend ks;    

data test;
  do i = 1 to 1000;
    score = ranuni(1);
    if score * 2 + rannor(1) * 0.3 > 1.5 then y = 1;
    else y = 0;
    output;
  end;
run;

%ks(data = test, score = score, y = y);

/*
SCORE KS              
                                MIN         MAX
DECILE    #FREQ    #BAD       SCORE       SCORE     CUM BAD%    CUM GOOD%        KS
   1       100      87         0.91        1.00      34.25%        1.74%      32.51
   2       100      78         0.80        0.91      64.96%        4.69%      60.27
   3       100      49         0.69        0.80      84.25%       11.53%      72.72
   4       100      25         0.61        0.69      94.09%       21.58%      72.51
   5       100      11         0.51        0.60      98.43%       33.51%      64.91
   6       100       3         0.40        0.51      99.61%       46.51%      53.09
   7       100       1         0.32        0.40     100.00%       59.79%      40.21
   8       100       0         0.20        0.31     100.00%       73.19%      26.81
   9       100       0         0.11        0.19     100.00%       86.60%      13.40
  10       100       0         0.00        0.10     100.00%      100.00%       0.00
*/
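
For readers who prefer R, here is a rough sketch of the same decile gains table and KS calculation. It is only a sketch: the score and 0/1 target y below are simulated much like the SAS test data above, and the column names are assumptions.

# Rough R sketch of the gains/KS table above (simulated score and 0/1 target y)
set.seed(1)
test <- data.frame(score = runif(1000))
test$y <- as.integer(test$score * 2 + rnorm(1000, sd = 0.3) > 1.5)

# Rank by descending score and cut into 10 equal-sized deciles
test <- test[order(-test$score), ]
test$decile <- rep(1:10, each = nrow(test) / 10)

# Cumulative bad% and good% per decile; KS is the maximum separation
tab <- aggregate(data.frame(cnt = 1, bad = test$y),
                 by = list(decile = test$decile), FUN = sum)
tab$cum_bad_pct  <- cumsum(tab$bad) / sum(tab$bad)
tab$cum_good_pct <- cumsum(tab$cnt - tab$bad) / (sum(tab$cnt) - sum(tab$bad))
tab$ks <- 100 * abs(tab$cum_bad_pct - tab$cum_good_pct)
print(tab)
max(tab$ks)   # the KS statistic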


Here is another example of a SAS Macro for ROC Curve  and this one comes from http://www2.sas.com/proceedings/sugi22/POSTERS/PAPER219.PDF

APPENDIX A
Macro
/***************************************************************/;
/* MACRO PURPOSE: CREATE AN ROC DATASET AND PLOT */;
/* */;
/* VARIABLES INTERPRETATION */;
/* */;
/* DATAIN INPUT SAS DATA SET */;
/* LOWLIM MACRO VARIABLE LOWER LIMIT FOR CUTOFF */;
/* UPLIM MACRO VARIABLE UPPER LIMIT FOR CUTOFF */;
/* NINC MACRO VARIABLE NUMBER OF INCREMENTS */;
/* I LOOP INDEX */;
/* OD OPTICAL DENSITY */;
/* CUTOFF CUTOFF FOR TEST */;
/* STATE STATE OF NATURE */;
/* TEST QUALITATIVE RESULT WITH CUTOFF */;
/* */;
/* DATE WRITTEN BY */;
/* */;
/* 09-25-96 A. STEAD */;
/***************************************************************/;
%MACRO ROC(DATAIN,LOWLIM,UPLIM,NINC=20);
OPTIONS MTRACE MPRINT;
DATA ROC;
SET &DATAIN;
LOWLIM = &LOWLIM; UPLIM = &UPLIM; NINC = &NINC;
DO I = 1 TO NINC+1;
CUTOFF = LOWLIM + (I-1)*((UPLIM-LOWLIM)/NINC);
IF OD > CUTOFF THEN TEST="R"; ELSE TEST="N";
OUTPUT;
END;
DROP I;
RUN;
PROC PRINT;
RUN;
PROC SORT; BY CUTOFF;
RUN;
PROC FREQ; BY CUTOFF;
TABLE TEST*STATE / OUT=PCTS1 OUTPCT NOPRINT;
RUN;
DATA TRUEPOS; SET PCTS1; IF STATE="P" AND TEST="R";
TP_RATE = PCT_COL; DROP PCT_COL;
RUN;
DATA FALSEPOS; SET PCTS1; IF STATE="N" AND TEST="R";
FP_RATE = PCT_COL; DROP PCT_COL;
RUN;
DATA ROC; MERGE TRUEPOS FALSEPOS; BY CUTOFF;
IF TP_RATE = . THEN TP_RATE=0.0;
IF FP_RATE = . THEN FP_RATE=0.0;
RUN;
PROC PRINT;
RUN;
PROC GPLOT DATA=ROC;
PLOT TP_RATE*FP_RATE=CUTOFF;
RUN;
%MEND;

VERSION 9.2 of SAS has a macro called %ROCPLOT http://support.sas.com/kb/25/018.html

SPSS also supports ROC curves, and there is a nice document on that here:

http://www.childrensmercy.org/stats/ask/roc.asp

Here are some examples from R with the package ROCR from

http://rocr.bioinf.mpi-sb.mpg.de/

 


Using ROCR’s 3 commands to produce a simple ROC plot:
pred <- prediction(predictions, labels)
perf <- performance(pred, measure = "tpr", x.measure = "fpr")
plot(perf, col=rainbow(10))

The graphics in the R package are outstanding.
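
Here is a fuller, self-contained sketch of the same flow, using the ROCR.simple demo data that ships with the ROCR package (the plotting choices are just illustrative):

# Sketch of a complete ROCR session using the package's bundled demo data
library(ROCR)
data(ROCR.simple)   # list with $predictions (scores) and $labels (0/1 outcomes)

pred <- prediction(ROCR.simple$predictions, ROCR.simple$labels)
perf <- performance(pred, measure = "tpr", x.measure = "fpr")
plot(perf, col = rainbow(10))       # ROC curve
abline(a = 0, b = 1, lty = 2)       # reference diagonal

auc <- performance(pred, measure = "auc")
unlist(auc@y.values)                # area under the ROC curve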

Citation:

Tobias Sing, Oliver Sander, Niko Beerenwinkel, Thomas Lengauer.
ROCR: visualizing classifier performance in R.
Bioinformatics 21(20):3940-3941 (2005).

 

Ten ways to build a wrong scoring model

 

Some ways to build a wrong scoring model are listed below. The author doesn't take any guarantee if your modeling team is using one of these and still getting a correct model.

1) Over-fit the model to the sample. Over-fitting can be checked by taking another random sample, applying the scoring equation, and comparing predicted conversion rates against actual conversion rates. An over-fit model does not rank-order: deciles with a lower average probability may show equal or more conversions than deciles with higher probability scores (see the R sketch after this list).

2) Choose non-random samples for building and validating the scoring equation. Read over-fitting above.

3) Use multicollinearity (http://en.wikipedia.org/wiki/Multicollinearity) without business judgment to remove variables which may make business sense. This usually happens a few years after you studied, and then forgot, multicollinearity.

If you don't know the difference between multicollinearity and heteroskedasticity (http://en.wikipedia.org/wiki/Heteroskedasticity), this could be the real deal breaker for you.

4) Using legacy code for running scoring, usually with stepwise forward and backward regression. This usually happens on Fridays and when in a hurry to make models.

5) Ignoring the signs or magnitudes of parameter estimates (that is, the output weight of each variable in the equation).

6) Not knowing the difference between Type 1 and Type 2 errors, especially when rejecting variables based on p-value. (Not knowing what a p-value is means you may kindly stop reading and click the YouTube video in the right margin.)

7) Excessive zeal in removing variables. Why? Ask yourself this question every time you remove a variable.

8) Using the wrong causal event (like mailings for loans) to predict the future with a scoring model (for mailings of deposit accounts), or using the right causal event in the wrong environment (a rapid decline or rise in sales due to factors not present in the model, like a competitor entering or going out of business, oil prices, or credit shocks; sob, sob, sigh).

9) Over-fitting.

10) Learning about creating models from blogs and not reading and refreshing your old statistics textbooks.
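
As referenced in point 1, here is a rough R sketch of the rank-ordering check on a holdout sample. The data and variables are invented for illustration; the point is simply the decile-level comparison of predicted versus actual conversion.

# Over-fitting / rank-ordering check on a holdout sample (hypothetical data)
set.seed(42)
n  <- 5000
df <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
df$y <- rbinom(n, 1, plogis(-1 + 0.8 * df$x1 - 0.5 * df$x2))

# Build on one random sample, validate on another
idx     <- sample(n, n / 2)
build   <- df[idx, ]
holdout <- df[-idx, ]
fit     <- glm(y ~ x1 + x2, data = build, family = binomial)

# Score the holdout, cut into deciles, compare predicted vs actual conversion
holdout$p      <- predict(fit, holdout, type = "response")
holdout$decile <- cut(rank(-holdout$p), breaks = 10, labels = 1:10)
aggregate(cbind(p, y) ~ decile, data = holdout, FUN = mean)   # p = predicted rate, y = actual rate
# A well-behaved model has both columns falling steadily from decile 1 to decile 10.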

Modeling: R Code, Books and Documents

Here is an equivalent of Proc Genmod in R.

If the SAS language code is as below:

PROC GENMOD DATA=X;
CLASS FLH;
MODEL BS/OCCUPANCY = distcrop distfor flh distcrop*flh /D=B LINK=LOGIT
TYPE3; RUN;

 

Then the R language equivalent would be :

glm(bs/occupancy ~ distcrop*flh+distcrop,
   family=binomial(logit), weights=occupancy)
where flh needs to be a factor
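
As a fuller sketch, with an entirely hypothetical data frame, and with distfor added so the formula mirrors the GENMOD MODEL statement above, the call might look like this:

# Hypothetical data mirroring the SAS data set X: bs successes out of occupancy trials
set.seed(1)
n <- 40
dat <- data.frame(
  occupancy = sample(10:30, n, replace = TRUE),
  distcrop  = runif(n, 0, 3),
  distfor   = runif(n, 0, 3),
  flh       = factor(sample(c("a", "b"), n, replace = TRUE))  # CLASS variable => factor
)
dat$bs <- rbinom(n, dat$occupancy, plogis(-1 + 0.5 * dat$distcrop))

# Binomial GLM on the proportion bs/occupancy, weighted by the number of trials,
# with the same effects as the GENMOD MODEL statement
fit <- glm(bs / occupancy ~ distcrop + distfor + flh + distcrop:flh,
           family = binomial(link = "logit"),
           weights = occupancy, data = dat)
summary(fit)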

 

Credit to Peter Dalgaard from the R-Help List 

Peter is also author of the splendid standard R book

 

Speaking of books, here is one R book I am looking for / waiting for.

 

A similarly named free document (Introduction to Statistical Modelling in R by P.M.E. Altham, Statistical Laboratory, University of Cambridge) is available here:

http://www.statslab.cam.ac.uk/~pat/redwsheets.pdf

It is a pretty nice reference document if modelling is what you do and R is what you need to explore. It was dated 5 February 2009, so it is quite updated and new. You can also check Dr Altham's home page for a lot of R resources.

As mentioned before, Zementis is at the forefront of using Cloud Computing (Amazon EC2) for open source analytics. Recently I came in contact with Michael Zeller about a business problem, and Mike, being the gentleman he is, not only helped me out but also agreed to an extensive and exclusive interview (!).


Ajay- What are the traditional rivals to the scoring solutions offered by you? How does ADAPA compare to each of them? Case study: assume I have 50,000 leads daily on a car-buying website. How would ADAPA help me in scoring the model (created, say, by KXEN, R, SAS, or SPSS)? What would my approximate cost advantages be if I intend to mail, say, the top 5 deciles every day?

Michael- Some of the traditional scoring solutions used today are based on SAS, in-database scoring like Oracle, MS SQL Server, or very often even custom code.  ADAPA is able to import the models from all tools that support the PMML standard, so any of the above tools, open source or commercial, could serve as an excellent development environment.

The key differentiators for ADAPA are simple and focus on cost-effective deployment:

1) Open Standards – PMML & SOA:

Freedom to select best-of-breed development tools without being locked into a specific vendor;  integrate easily with other systems.

2) SaaS-based Cloud Computing:

Delivers a quantum leap in cost-effectiveness without compromising on scalability.

In your example, I assume that you’d be able to score your 50,000 leads in one hour using one ADAPA engine on Amazon.  Therefore, you could choose to either spend US$100,000 or more on hardware, software, maintenance, IT services, etc., write a project proposal, get it approved by management, and be ready to score your model in 6-12 months

OR, you could use ADAPA at something around US$1-$2 per day for the scenario above and get started today!  To get my point across here, I am of course simplifying the scenario a little bit, but in essence these are your choices.

Sounds too good to be true?  We often get this response, so please feel free to contact us today [http://www.zementis.com/contact.htm] and we will be happy to show you how easy it can be to deploy predictive models with ADAPA!

 

Ajay- The ADAPA solution seems to save money on both hardware and software costs. Please comment. Also, have you done any benchmarking tests comparing a traditional scoring configuration against ADAPA?

Michael-Absolutely, the ADAPA Predictive Analytics Edition [http://www.zementis.com/predictive_analytics_edition.htm] on Amazon’s cloud computing infrastructure (Amazon EC2) eliminates the upfront investment in hardware and software.  It is a true Software as a Service (SaaS) offering on Amazon EC2 [http://www.zementis.com/howtobuy.htm] whereby users only pay for the actual machine time starting at less than US$1 per machine hour.  The ADAPA SaaS model is extremely dynamic, e.g., a user is able to select an instance type most appropriate for the job at hand (small, large, x-large) or launch one or even 100 instances within minutes.

In addition to the above savings in hardware/software, ADAPA also cuts the time-to-market for new models (priceless!) which adds to business agility, something truly critical for the current economic climate.

Regarding a benchmark comparison, it really depends on what is most important to the business.  Business agility, time-to-market, open standards for integration, or pure scoring performance?  ADAPA addresses all of the above.  At its core, it is a highly scalable scoring engine which is able to process thousands of transactions per second.  To tackle even the largest problems, it is easy to scale ADAPA via more CPUs, clustering, or parallel execution on multiple independent instances. 

Need to score lots of data once a month which would take 100 hours on one computer?  Simply launch 10 instances and complete the job in 10 hours over night.  No extra software licenses, no extra hardware to buy — that’s capacity truly on-demand, whenever needed, and cost-effective.

Ajay- What has been your vision for Zementis? What exciting products are we going to see from it next?

Michael – Our vision at Zementis [http://www.zementis.com] has been to make it easier for users to leverage analytics.  The primary focus of our products is on the deployment side, i.e., how to integrate predictive models into the business process and leverage them in real-time.  The complexity of deployment and the cost associated with it has been the main hurdle for a more widespread adoption of predictive analytics. 

Adhering to open standards like the Predictive Model Markup Language (PMML) [http://www.dmg.org/] and SOA-based integration, our ADAPA engine [http://www.zementis.com/products.htm] paves the way for new use cases of predictive analytics — wherever a painless, fast production deployment of models is critical or where the cost of real-time scoring has been prohibitive to date.

We will continue to contribute to the R/PMML export package [http://www.zementis.com/pmml_exporters.htm] and extend our free PMML converter [http://www.zementis.com/pmml_converters.htm] to support the adoption of the standard.  We believe that the analytics industry will benefit from open standards and we are just beginning to grasp what data-driven decision technology can do for us.  Without giving away much of our roadmap, please stay tuned for more exciting products that will make it easier for businesses to leverage the power of predictive analytics!
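
To make the PMML route concrete, here is a hedged R sketch of exporting a fitted model with the pmml package mentioned above; the data and file name are invented, and the resulting XML file is what a PMML-consuming engine such as ADAPA would then execute.

# Export an R model to PMML (hypothetical data and file name)
library(pmml)   # R/PMML export package
library(XML)    # for saveXML()

set.seed(1)
dat <- data.frame(x1 = rnorm(200), x2 = rnorm(200))
dat$y <- rbinom(200, 1, plogis(0.7 * dat$x1 - 0.4 * dat$x2))

fit <- glm(y ~ x1 + x2, data = dat, family = binomial)

# Convert the fitted model to PMML and write it to disk for upload to the scoring engine
saveXML(pmml(fit), file = "logit_model.pmml")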

Ajay- Any India- or Asia-specific plans for Zementis?

Michael-Zementis already serves customers in the Asia/Pacific region from its office in Hong Kong.  We expect rapid growth for predictive analytics in the region and we think our cost-effective SaaS solution on Amazon EC2 will be of great service to this market.  I could see various analytics outsourcing and consulting firms benefit from using ADAPA as their primary delivery mechanism to provide clients with predictive  models that are ready to be executed on-demand.

Ajay- What do you believe will be the biggest challenges for analytics in 2009? What are the biggest opportunities?

Michael-The biggest challenge for analytics will most likely be the reduction in technology spending in a deep, global recession.  At the same time, companies must take advantage of analytics to cut cost, optimize processes, and to become more competitive.  Therefore, the biggest opportunity for analytics will be in the SaaS field, enabling clients to employ analytics without upfront capital expenditures.

Ajay- What made you choose a career in science? Describe your journey so far. What would your advice be to young science graduates in these recessionary times?

Michael- As a physicist, my research focused on neural networks and intelligent systems.  Predictive analytics is a great way for me to stay close to science while applying such complex algorithms to solve real business problems.  Even in a recession, there is always a need for good people with the desire to excel in their profession.  Starting your career, I'd say the best way is to remain broad in expertise rather than being too specialized in one particular industry or proficient in a single analytics tool.  A good foundation of math and computer science, combined with curiosity about how to apply analytics to specific business problems, will provide opportunities, even in the current economic climate.

About Zementis

Zementis, Inc. is a software company focused on predictive analytics and advanced Enterprise Decision Management technology. We combine science and software to create superior business and industrial solutions for our clients. Our scientific expertise includes statistical algorithms, machine learning, neural networks, and intelligent systems, and our scientists have a proven record in producing effective predictive models to extract hidden patterns from a variety of data types. It is complemented by our product offering ADAPA, a decision engine framework for real-time execution of predictive models and rules. For more information please visit www.zementis.com.

Ajay- If you have a lot of data (GBs and GBs) and an existing model (in SAS, SPSS, or R) which you converted to PMML, and it is time for you to choose between spending more money to upgrade your hardware or renew your software licenses, then instead take a look at ADAPA from www.zementis.com and score models for as low as $1 per hour. Check it out (test and control!!).

Do you have any additional queries from Michael ? Use the comments page to ask.

Segmenting Models : When and Why

Creating segmented models in SAS is quite easy with BY-group processing. It is less easy in other software, but that is understandable, given that

the generic rules of segmentation are:

1) records within a segment have statistically similar characteristics, and

2) different segments have statistically different characteristics.

This means that just using Proc freq to check the response rate against each independent variable is not a good way to check the level of difference. Proc univariate with the plot option and BY-group processing is actually a better way to test, because it combines means and median analysis (measures of central value) with box plots, normal distributions and standard deviations (measures of dispersion).

Proc freq with a cross tab is incredibly powerful for deciding whether to create a model in the first place, but fine-tuning decisions on segments is better done with Proc univariate. The SAS equivalent for clustering of course remains Proc Fastclus and family, which will be dealt with in a separate post.
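
For those working outside SAS, a rough R analogue of BY-group processing, sketched here with invented data and segment names, is to summarize by segment and then fit one model per segment:

# Segment-wise checks and per-segment models in R (hypothetical data)
set.seed(7)
dat <- data.frame(
  segment = sample(c("A", "B", "C"), 3000, replace = TRUE),
  x       = rnorm(3000)
)
dat$y <- rbinom(3000, 1, plogis(-1 + ifelse(dat$segment == "A", 1.5, 0.5) * dat$x))

# Are segments really different? Compare dispersion of x and response rate by segment
by(dat$x, dat$segment, summary)
aggregate(y ~ segment, data = dat, FUN = mean)

# One logistic model per segment, the analogue of a BY statement in SAS
models <- lapply(split(dat, dat$segment),
                 function(d) glm(y ~ x, data = d, family = binomial))
lapply(models, coef)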

(Note: a lovely image that explains the above is on Dr Ariel Shamir's home page (he is a research expert on visual succinct representation of information, from Israel, land of the brave and intelligent).

A picture is truly worth a thousand words (or posts!).)

How to do Logistic Regression

Logistic regression is a widely used technique in database marketing for creating scoring models and in risk classification. It helps develop propensity-to-buy and propensity-to-default scores (and even propensity-to-fraud scores).

This is more of a practical approach to building the model than a theory-based approach. (I was never good at the theory 😉)

If you need to do logistic regression using SPSS, a very good tutorial is available here:

http://www2.chass.ncsu.edu/garson/PA765/logistic.htm

(Note -Copyright 1998, 2008 by G. David Garson.
Last update 5/21/08.)

For SAS a very good tutorial is here –

SAS Annotated Output
Ordered Logistic Regression. UCLA: Academic Technology Services, Statistical Consulting Group.

from http://www.ats.ucla.edu/stat/sas/output/sas_ologit_output.htm (accessed July 23, 2007).

For R, the documentation for lrm (logistic regression modelling in Frank Harrell's Design package) is here; note that base R's glm with a binomial family also fits logistic regression, as sketched after the lrm signature below.
http://lib.stat.cmu.edu/S/Harrell/help/Design/html/lrm.html

lrm(formula, data, subset, na.action=na.delete, method="lrm.fit", model=FALSE, x=FALSE, y=FALSE, linear.predictors=TRUE, se.fit=FALSE, penalty=0, penalty.matrix, tol=1e-7, strata.penalty=0, var.penalty=c('simple','sandwich'), weights, normwt, ...)
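
If you only need plain logistic regression, base R's glm with a binomial family does the job. Here is a minimal sketch with invented data and column names:

# Logistic regression in base R (hypothetical propensity-to-buy example)
set.seed(123)
dat <- data.frame(income      = rnorm(1000, mean = 50, sd = 10),
                  prior_buyer = rbinom(1000, 1, 0.3))
dat$buy <- rbinom(1000, 1, plogis(-4 + 0.05 * dat$income + 1.2 * dat$prior_buyer))

fit <- glm(buy ~ income + prior_buyer, data = dat,
           family = binomial(link = "logit"))
summary(fit)                                   # coefficients are on the log-odds scale
dat$score <- predict(fit, type = "response")   # propensity-to-buy scores in [0, 1]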

For linear models in R –
http://datamining.togaware.com/survivor/Linear_Model0.html

If you want to work with R and do not have time to learn it, an extremely good option is to use the GUI Rattle and look at this book:

http://datamining.togaware.com/survivor/Contents.html