Ten ways to build a wrong scoring model

 

Below are some ways to build a wrong scoring model. The author offers no guarantee if your modeling team uses one of these and still gets a correct model.

1) Overfit the model to the sample. You can check for overfitting by drawing a fresh random sample, applying the scoring equation to it, and comparing predicted conversion rates with actual conversion rates. An overfit model does not rank order: deciles with lower average predicted probability may show equal or more conversions than deciles with higher probability scores.

2) Choose non-random samples for building and validating the scoring equation. See overfitting above.

3) Use multicollinearity (http://en.wikipedia.org/wiki/Multicollinearity) without business judgment to remove variables that may make business sense. This usually happens a few years after you studied, and forgot, multicollinearity.

If you don’t know the difference between multicollinearity and heteroskedasticity (http://en.wikipedia.org/wiki/Heteroskedasticity), this could be the real deal breaker for you.

4) Using legacy code for running scoring, usually with forward and backward stepwise regression. This usually happens on Fridays and when in a hurry to make models.

5) Ignoring the signs or magnitudes of parameter estimates (that’s the output, or the weight of each variable in the equation).

6) Not knowing the difference between Type 1 and Type 2 errors, especially when rejecting variables based on p-value. (If you don’t know what a p-value is, you may kindly stop reading and click the YouTube video in the right margin.)

7) Excessive zeal in removing variables. Why? Ask yourself this question every time you remove a variable.

8) Using the wrong causal event (like mailings for loans) to predict the future with a scoring model (for mailings of deposit accounts), or using the right causal event in the wrong environment (a rapid decline or rise in sales due to factors not present in the model, like competitor entry or exit, oil prices, or credit shocks, sob sob sigh).

9) Over fitting

10) Learning about creating models from blogs and not reading and refreshing your old statistics textbooks.
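The decile check described in point 1 can be sketched in R roughly as follows. The data here is simulated, and the variable names (x1, x2, converted) and the 50/50 split are assumptions for illustration, not part of any real scoring project.

```r
set.seed(42)

# Simulated data (an assumption for illustration):
# x1 drives conversion, x2 is pure noise
n <- 20000
x1 <- rnorm(n)
x2 <- rnorm(n)
converted <- rbinom(n, 1, plogis(-2 + 1.2 * x1))
dat <- data.frame(converted, x1, x2)

# Random split into a build sample and a validation sample
build_idx <- sample(n, n / 2)
build <- dat[build_idx, ]
valid <- dat[-build_idx, ]

# Fit the scoring equation on the build sample only
model <- glm(converted ~ x1 + x2, data = build, family = binomial())

# Score the validation sample and cut it into deciles (10 = highest scores)
valid$score <- predict(model, newdata = valid, type = "response")
valid$decile <- cut(rank(valid$score, ties.method = "first"),
                    breaks = 10, labels = FALSE)

# Average predicted probability versus actual conversion rate, by decile
decile_table <- aggregate(cbind(predicted = score, actual = converted) ~ decile,
                          data = valid, FUN = mean)
print(decile_table)
```

If the actual conversion rate does not rise with the decile, or drifts far from the predicted rate, the model deserves a second look.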

Weathering the Stormy Economy

Here is a conference you may want to visit. At first glance it may look like one of those self-help “free” webinars but it is a very relevant topic with a great speaker. Plus it is on the web.

 

Free Seminar Hosted by SAP Business Objects
Thursday, March 12, 2009
11 a.m. PST / 2 p.m. EST
Robin Fray Carey, CEO of Social Media Today will discuss the best ideas gathered from MyVenturePad.com, SMT’s online community for growth companies. Plus, two fast-growing companies, Fresh Direct, and The Life is good Company, will share their practical recommendations on how to manage business and IT priorities in these challenging times. Register today.
http://events.businessobjects.com/forms/Q109/ideas/?source=SMtoday1

Social Media Today builds Wordframe-based communities like Smart Data Collective (for data, BI, and analytics people), Best of the Blogs (for progressive bloggers), Energy Collective (for green energy enthusiasts, thinkers, and researchers), Social Media Today (for understanding and leveraging social media and networks), and My VenturePad (for entrepreneurs).

These communities basically work as online newspapers by aggregating and moderating the RSS feeds of thousands of bloggers (for some sites) and their sites. I have written on Wordframe’s concept of content driven communities and Ning’s concept of community driven content earlier.

Disclaimer: I have worked as an evangelist for SMT and was once awarded Blogger of the Week (for my article on R).

For other conferences, you may also want to see AnalyticBridge’s page on conferences.

http://www.analyticbridge.com/group/conferences/forum/topics/best-ideas-for-weathering-the

Disclaimer: I have been awarded Member of the Month twice by them.

I like the third-party apps of Ning better than the old outdated format and themes. One Ning application can actually serve as a competitor to Wordframe: the RSS application (see the feed on my page). Wordframe has capabilities for even category-level filters, so my Analytics category feed goes to Smart Data, my Internet category feed gets published on Social Media (when I am lucky), and my attempts at poetry go to Best of the Blogs.

The Decision Stats group (on LinkedIn) also has a group on AnalyticBridge.

But why join so many communities and go to webinars? Because knowledge is useful, productive, and fun, and I have a personal motto of learning one new thing a day.

Where do you get the time? Just sleep one hour less and devote that hour purely to your own learning. 8 hours to the boss, 4-5 hours to the family.

1 hour to yourself?

Sounds reasonable, eh 🙂

So try this one –

http://events.businessobjects.com/forms/Q109/ideas/?source=SMtoday1

Dataset too big for R?

In case you have a dataset too big to fit in memory in R, there is a package called biglm.

You install it like this:

install.packages("biglm", dependencies = TRUE)
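Once the package is installed, the idea is to fit the model on one chunk of data and then feed it further chunks. Here is a minimal sketch, assuming the built-in mtcars data stands in for a file too large to load at once (the two-way split and the formula are only for illustration).

```r
library(biglm)

# Pretend mtcars arrives in two chunks (an assumption for illustration)
chunk1 <- mtcars[1:16, ]
chunk2 <- mtcars[17:32, ]

# Fit on the first chunk, then update the fit with the next chunk;
# only one chunk needs to be in memory at a time
fit <- biglm(mpg ~ wt + hp, data = chunk1)
fit <- update(fit, chunk2)

# The coefficients should agree with an ordinary lm() on the full data
print(coef(fit))
print(coef(lm(mpg ~ wt + hp, data = mtcars)))
```

With a really large file you would read each chunk from disk or a database in a loop and call update() once per chunk.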

 

 

  Information on package ‘biglm’

Description:

Package:       biglm
Type:          Package
Title:         bounded memory linear and generalized linear models
Version:       0.6
Date:          2005-09-24
Author:        Thomas Lumley
Maintainer:    Thomas Lumley <tlumley@u.washington.edu>
Description:   Regression for data too large to fit in memory
License:       GPL
Suggests:      RSQLite, RODBC
Enhances:      leaps
Packaged:      Tue Feb 24 10:47:44 2009; tlumley
Built:         R 2.8.1; i386-pc-mingw32; 2009-02-24 21:35:12; windows

Index:

bigglm                  Bounded memory linear regression
biglm                   Bounded memory linear regression
predict.bigglm          Predictions from a biglm/bigglm

And in case you are the statistical kind of chap who wants to know what’s IN the code for these functions:

function (formula, data, family = gaussian(), ...)
UseMethod("bigglm", data)
<environment: namespace:biglm>

 

R tip of the day: if you want to know what an R function, say procmeans, does, all you need to do is type procmeans (without parentheses) at the command prompt, and R will print the code inside it.

If it gives an error, most probably you need to

1) Install

and 2) Load the package containing the function

Which are conveniently here

[image]

Credit source: http://www.nabble.com/R-f13819.html
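The tip above can be tried at the console with a function that ships with R. This is only a sketch: sd is used because it is always available, and biglm is just an example of a package that may or may not be installed on your machine.

```r
# Typing a function's name without parentheses prints its source
print(sd)

# deparse() returns the same source as a character vector
src <- deparse(sd)

# Before calling a function from an add-on package, check that the
# package is installed; if not, install it and then load it
pkg <- "biglm"
installed <- requireNamespace(pkg, quietly = TRUE)
if (!installed) {
  message("Run install.packages(\"", pkg, "\") and then library(", pkg, ")")
}
```

Some functions print only a one-line UseMethod() call; in that case the real work is done by a method for a particular class of object.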

An R Package only for SAS Users

Dear All,

I am doing some research into creating an R package for SAS language users.

The name of the beta package is “Anne”, but I am open to suggestions for the name.

The basic idea is to let SAS language users (especially Windows SAS language users) try out R without getting overwhelmed by its powerful matrix-level capabilities or its command line interface.

Creating new functions is quite easy as the following code shows.

The first R code for the “Anne 1.0” Package is

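# procunivariate: summary statistics, loosely like PROC UNIVARIATE
procunivariate <- function(x) summary(x)

# procimportcsv: read a CSV file with a header row, using the id column as row names
procimportcsv <- function(x) read.table(x, header = TRUE,
                           sep = ",", row.names = "id", na.strings = "   ")

# libname: point R at a working directory, loosely like a LIBNAME statement
libname <- function(x) setwd(x)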

 

Note: I am tweaking the code as we speak and will try to add one proc per week.

But how do you put functions in an R package?

This is how to create an R package (to be continued).

Note: SAS here refers to the SAS language.
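To give a feel for how such wrappers would be used, here is a small sketch. The functions are defined in plain name <- function(...) form, and the CSV contents, the column names (age, income) and the temporary file are made up for illustration.

```r
# The three wrappers, defined as ordinary R functions
procunivariate <- function(x) summary(x)
procimportcsv <- function(x) read.table(x, header = TRUE, sep = ",",
                                        row.names = "id", na.strings = "   ")
libname <- function(x) setwd(x)

# Write a tiny CSV file to a temporary location (made-up data)
csv_path <- tempfile(fileext = ".csv")
writeLines(c("id,age,income",
             "a,25,30000",
             "b,40,52000",
             "c,31,41000"), csv_path)

# Import the file, then summarise one column
customers <- procimportcsv(csv_path)
print(procunivariate(customers$age))

# libname() changes the working directory; setwd() returns the previous
# directory, so it can be restored afterwards
old_dir <- libname(tempdir())
libname(old_dir)
```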

 

Learning SAS for SPSS Users

SAS Publishing just came out with a nice and nifty 28-page PDF document, “Coming To SAS FROM SPSS – A programming approach”. It’s a nice read, very useful for people curious or willing to try SAS after learning SPSS, and very well written by Susan J Slaughter and Lora D, who have written “The Little SAS Book”, one of the most popular SAS handbooks ever written.

 

You can download it or simply read it at

http://support.sas.com/publishing/bbu/companion_site/62272.pdf

SPSS of course has a very nice menu-driven interface, while SAS programmers generally prefer the scripting way of writing code, although SAS does have menus in various products.