Ten ways to build a wrong scoring model

Some ways to build a wrong scoring model are listed below. The author offers no guarantee if your modeling team is using one of these and still getting a correct model.

1) Overfit the model to the sample. You can check for overfitting by drawing a fresh random sample, scoring it with the equation, and comparing predicted conversion rates with actual conversion rates. An overfit model does not rank order: deciles with lower average probability may show equal or more conversions than deciles with higher probability scores.

2) Choose non-random samples for building and validating the scoring equation. See overfitting above.

3) Use multicollinearity (http://en.wikipedia.org/wiki/Multicollinearity) without business judgment to remove variables which may make business sense. This usually happens a few years after you studied and forgot multicollinearity.

If you don’t know the difference between multicollinearity and heteroskedasticity (http://en.wikipedia.org/wiki/Heteroskedasticity), this could be the real deal breaker for you.

4) Using legacy code for scoring, usually with stepwise forward and backward regression. This usually happens on Fridays and when in a hurry to make models.

5) Ignoring the signs or magnitudes of parameter estimates (that’s the output, or the weight of each variable in the equation).

6) Not knowing the difference between Type 1 and Type 2 errors, especially when rejecting variables based on p-value. (Not knowing what a p-value is means you may kindly stop reading and click the YouTube video in the right margin.)

7) Excessive zeal in removing variables. Why? Ask yourself this question every time you remove a variable.

8) Using the wrong causal event (like mailings for loans) to predict the future with a scoring model (for mailings of deposit accounts), or using the right causal event in the wrong environment (a rapid decline or rise in sales due to factors not present in the model, like competitor entry or going out of business, oil prices, credit shocks, sob sob sigh).

9) Overfitting.

10) Learning about creating models from blogs and not reading and refreshing your old statistics textbooks.
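The check described in point 1 can be sketched in R. This is only a rough illustration on simulated data: the variable names, the sample size and the single predictor are my assumptions, not anything from a real scoring project. It builds the equation on one random half, scores the other half, and compares predicted and actual conversion rates by decile.

```r
# Hypothetical illustration on simulated data (all names are made up).
set.seed(42)
n <- 10000
x <- rnorm(n)
converted <- rbinom(n, 1, plogis(-2 + 1.2 * x))  # simulated conversions
dat <- data.frame(x = x, converted = converted)

# Build on one random half, validate on the other half.
build_idx <- sample(n, n / 2)
build <- dat[build_idx, ]
valid <- dat[-build_idx, ]

fit <- glm(converted ~ x, family = binomial(), data = build)
valid$score <- predict(fit, newdata = valid, type = "response")

# Decile 1 holds the highest scores, decile 10 the lowest.
score_rank <- rank(valid$score, ties.method = "first")
valid$decile <- 11 - ceiling(10 * score_rank / nrow(valid))

# Average predicted probability and actual conversion rate per decile.
decile_table <- data.frame(
  decile    = 1:10,
  predicted = as.vector(tapply(valid$score, valid$decile, mean)),
  actual    = as.vector(tapply(valid$converted, valid$decile, mean))
)
print(decile_table)
```

If the actual column does not fall broadly in step with the predicted column as you move from decile 1 to decile 10, the model is not rank ordering on fresh data, and overfitting is a likely suspect.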

Guns and No Glory

From now on, here is the policy on comments and posts. It is in response to the comments on my “A Farewell to Guns” post, in which I projected my Gandhian non-violence too far in suggesting the remote possibility of a ban on guns based on the Alabama and Germany events.

  1. No more political posts (including on India) or poems (including funny) will be imposed on this blog or unsuspecting readers.*
  2. I use an offline blogging system called Windows Live Writer and do not go online to approve comments (that requires me to log in to WordPress and do it manually). All comments are read via email settings and feedback is incorporated. Comments sometimes get deleted within 7 days because of auto settings (like me not logging in to WordPress for 7 days), not because I am ignoring anything.
  3. You didn’t like the “Guns” article? You have the right to say so on the comments page. You didn’t like the R article or the package? Comments page, please.
  4. Read the “Fine Print” page. I use a professional analytics tracking system which I pay for every month. It tracks IP address, IP provider, organization, location, time, and country, with integrated Google Maps that lets me see from which block the material entered the net. Writing offensive comments from your work computer is not a great idea, not with the angry one. Not if you are……
  5. I never use the analytics system for individuals unless the comments are ghastly. Then I delete the comments and don’t use the analytics system for individuals.
  6. Akismet catches spam and has caught multiple attempts at malware linking and spamming. It will continue to do so.
  7. Anonymous comments are not anonymous as explained above.

A blog on Decision Stats should quote political statements only when accompanied by statistics, or better still, nothing but the statistics. So it will be.

*They will be posted on www.iwannacrib.com

Weathering the Stormy Economy

Here is a conference you may want to visit. At first glance it may look like one of those self-help “free” webinars but it is a very relevant topic with a great speaker. Plus it is on the web.

Free Seminar Hosted by SAP Business Objects
Thursday, March 12, 2009
11 a.m. PST / 2 p.m. EST
Robin Fray Carey, CEO of Social Media Today will discuss the best ideas gathered from MyVenturePad.com, SMT’s online community for growth companies. Plus, two fast-growing companies, Fresh Direct, and The Life is good Company, will share their practical recommendations on how to manage business and IT priorities in these challenging times. Register today.
http://events.businessobjects.com/forms/Q109/ideas/?source=SMtoday1

Social Media Today builds Wordframe-based communities like Smart Data Collective (for data, BI, and analytics people), Best of the Blogs (for progressive bloggers), Energy Collective (for green energy enthusiasts, thinkers, and researchers), Social Media Today (for understanding and leveraging social media and networks), and My VenturePad (for entrepreneurs).

These communities basically work as online newspapers by aggregating and moderating the RSS feeds of thousands of bloggers (for some sites) and their sites. I have written on Wordframe’s concept of content driven communities and Ning’s concept of community driven content earlier.

Disclaimer: I have worked as an evangelist for SMT and have been awarded Blogger of the Week once (for my article on R).

For other conferences you may also want to see AnalyticBridge’s page on conferences.

http://www.analyticbridge.com/group/conferences/forum/topics/best-ideas-for-weathering-the

Disclaimer: I have been awarded Member of the Month twice by them.

I like the third-party apps of Ning better than the old outdated format and themes. One Ning application can actually serve as a competitor to Wordframe: the RSS application (see the feed on my page). Wordframe even has category-level filters, so the Analytics category feed goes to Smart Data, the Internet category feed gets published on Social Media (when I am lucky), and my attempts at poetry go to Best of the Blogs.

The Decision Stats group (on LinkedIn) also has a group on AnalyticBridge.

But why join so many communities and go to webinars? Because knowledge is useful and productive and fun, and I have a personal motto of learning one new thing a day.

Where do you get the time? Just sleep one hour less and devote that one hour purely to your own learning. 8 hours to the boss, 4-5 hours to the family.

1 hour to yourself?

Sounds reasonable, eh  🙂

So try this one –

http://events.businessobjects.com/forms/Q109/ideas/?source=SMtoday1

LinkedIn Tools: Getting a job and contract

Here is a great tool by LinkedIn to do the following: get a J O B

Where is it located?

Look on the LinkedIn page, in the footer area

Look in the row called Tools

Click on Jobs Insider

You get the webpage below:

image

Download and follow the instructions for your browser. There is no download for Chrome users.

But Firefox is good enough.

Download it here

http://www.linkedin.com/static?key=jobsinsider_download&trk=hb_ft_jobsins

And enjoy LinkedIn’s tool, which is more useful than all the Facebook applications put together….

Dataset too big for R?

In case you have a dataset too big to fit in memory in R, there is a package called biglm.

You install it like this:

install.packages("biglm", dependencies = TRUE)

  Information on package ‘biglm’

Description:

Package:       biglm
Type:          Package
Title:         bounded memory linear and generalized linear models
Version:       0.6
Date:          2005-09-24
Author:        Thomas Lumley
Maintainer:    Thomas Lumley <tlumley@u.washington.edu>
Description:   Regression for data too large to fit in memory
License:       GPL
Suggests:      RSQLite, RODBC
Enhances:      leaps
Packaged:      Tue Feb 24 10:47:44 2009; tlumley
Built:         R 2.8.1; i386-pc-mingw32; 2009-02-24 21:35:12; windows

Index:

bigglm                  Bounded memory linear regression
biglm                   Bounded memory linear regression
predict.bigglm          Predictions from a biglm/bigglm
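As a rough sketch of how biglm is meant to be used: you fit on a first chunk of data and then feed in further chunks with update(), so the whole dataset never has to sit in memory at once. The simulated chunks and the make_chunk() helper below are my own stand-ins; in real life each chunk would come from a file or a database query. The code assumes biglm is installed and does nothing otherwise.

```r
# Sketch only: assumes the biglm package is installed.
# make_chunk() is a hypothetical helper that simulates one chunk of data.
if (requireNamespace("biglm", quietly = TRUE)) {
  set.seed(1)
  make_chunk <- function(n) {
    x <- rnorm(n)
    data.frame(x = x, y = 3 + 2 * x + rnorm(n))
  }
  chunks <- lapply(rep(1000, 5), make_chunk)

  # Fit on the first chunk, then update the fit one chunk at a time.
  big_fit <- biglm::biglm(y ~ x, data = chunks[[1]])
  for (chunk in chunks[-1]) {
    big_fit <- update(big_fit, chunk)
  }
  print(coef(big_fit))
}
```

The estimates should land close to the intercept of 3 and slope of 2 used in the simulation.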

And in case you are the statistical kind of chap who wants to know what’s IN the code for these functions:

function (formula, data, family = gaussian(), ...)
UseMethod("bigglm", data)
<environment: namespace:biglm>

R tip of the day: if you want to know what an R function, say procmeans, does, all you need to do is type procmeans (without brackets) at the command prompt, and it will show you the code inside.

If it gives an error, most probably you need to

1) Install

and 2) Load the package containing the function
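Here is a small sketch of the tip. Since procmeans is only a made-up name, I use the base function sd as a stand-in, and biglm as the example package for the install and load steps; both choices are mine.

```r
# At the prompt, typing sd on its own prints the source; print() does the
# same thing inside a script.
print(sd)

# The source can also be captured as text.
sd_source <- deparse(sd)

# "Object not found" usually means the package holding the function is
# not installed or not loaded. Using biglm as the example package:
pkg <- "biglm"
is_installed <- requireNamespace(pkg, quietly = TRUE)
if (!is_installed) {
  message("1) Install it: install.packages(\"", pkg, "\")")
  message("2) Load it:    library(", pkg, ")")
} else {
  library(pkg, character.only = TRUE)  # step 2: load
  print(get("biglm"))                  # the source is now visible
}
```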

Which are conveniently here

image

Credit source: http://www.nabble.com/R-f13819.html

Award : Analyticbridge.com

Blog post on http://www.analyticbridge.com/group/memberofthemonth

Ajay Ohri has been selected as our Member of the Month for the second time. Most recently, Ajay recruited great new members, posted numerous interesting messages on his blog, added many applications and feeds to his profile, and contributed in many other ways to make AnalyticBridge better.
To be eligible for the Member of the Month award, a candidate must ….

More details here

http://www.analyticbridge.com/group/memberofthemonth

P.S. I am still waiting for the SAS Rookie Slumdog of the Year Award.

An R Package only for SAS Users

Dear All,

I am doing some research into creating an R package for SAS language users.

The name of the beta package is “Anne”, but I am open to suggestions for the name.

The basic idea is to let SAS language users (especially Windows SAS language users) get a feel for R without getting overwhelmed by its powerful matrix-level capabilities or its command line interface.

Creating new functions is quite easy as the following code shows.

The first R code for the “Anne 1.0” Package is

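# PROC UNIVARIATE style summary of a variable
procunivariate <- function(x) summary(x)

# Import a CSV file that has an "id" column to use as row names
procimportcsv <- function(x) read.table(x, header = TRUE,
                           sep = ",", row.names = "id", na.strings = "   ")

# LIBNAME style shortcut: point R at a working folder
libname <- function(x) setwd(x)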
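A quick way to try the three wrappers is sketched below. It assumes they are defined as ordinary R functions (name <- function(x) ...), and the sales.csv file with its id and amount columns is a made-up example written to a temporary folder.

```r
# The three wrappers, written as ordinary function definitions.
procunivariate <- function(x) summary(x)
procimportcsv <- function(x) read.table(x, header = TRUE, sep = ",",
                                        row.names = "id", na.strings = "   ")
libname <- function(x) setwd(x)

# Hypothetical data file with an "id" column, written to a temp folder.
work_dir <- tempdir()
write.csv(data.frame(id = 1:5, amount = c(10, 20, 30, 40, 50)),
          file.path(work_dir, "sales.csv"), row.names = FALSE)

old_dir <- libname(work_dir)          # setwd() returns the previous folder
sales <- procimportcsv("sales.csv")   # read the file from the "library"
print(procunivariate(sales$amount))   # summary statistics for one variable
setwd(old_dir)                        # put the working directory back
```

Note that libname() here only changes the working directory; it is a loose analogy to a SAS library, not an equivalent.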

Note: I am tweaking the code as we speak and will try to add one proc per week.

But how do you put functions in an R package?

This is how to create an R package (to be continued).

Note: SAS here refers to the SAS language.
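Until the follow-up post, here is one possible starting point, offered as a sketch and not as the final method: base R ships with package.skeleton() in the utils package, which writes a bare package folder from functions you have already defined. The temporary folder and the choice of which functions to include are my assumptions.

```r
# Illustrative only: two of the wrappers, defined as ordinary functions.
procunivariate <- function(x) summary(x)
libname <- function(x) setwd(x)

pkg_parent <- tempdir()   # hypothetical place to create the skeleton
here <- environment()     # the environment holding the functions above

utils::package.skeleton(
  name = "Anne",
  list = c("procunivariate", "libname"),
  environment = here,
  path = pkg_parent,
  force = TRUE            # overwrite an earlier skeleton of the same name
)

# The skeleton contains DESCRIPTION, NAMESPACE, R/ and man/ to fill in.
print(list.files(file.path(pkg_parent, "Anne")))
```

The generated DESCRIPTION and help files are only templates and still need editing before the package can be built and installed.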