More Ways to get a Scoring Model wrong

I got the following answer from a LinkedIn group in response to my Ten Ways to get a Scoring Model Wrong.

  1. Typo
  2. Refuse to use central tendency to patch missing values. Instead, assign highest response rate because WOE says so
  3. Marketing people tell me to force the variable into the model
  4. Selection bias
  5. Forgot to segment
  6. Solely rely on data to segment without consulting the biz side
  7. Just delete observations with missing values, OK, without studying geometrical boundaries
  8. Using oversampling, but refuse to weight it back. That boosts lift, right? Let us do 50-50
  9. Insist random sampling is sufficient, while stratified sampling is critical
  10. Binning too much, or too little
  11. Selecting variables without repeated sampling
  12. Forgot to exclude numeric customer id from the candidate variables. And it pops... Well, both Unica and Kxen accepted it, so I see no problem
  13. When the same variable is sourced by different vendors, did not look up the scales under the same name. Just combine them
  14. Well, SAS Enterprise Miner gave me this model yesterday
  15. The binary variable is statistically significant, but there are only 27 event=1 cases out of ~1mm, since only 27 customers made a purchase.
  16. Well, I only have 250 events=1. But I think I can use exact logistic to make it up, all right? I got a PhD in Statistics. Trust me, my professor is OK with it. I just called her.
  17. Build two-stage model without Heckman adjustment
  18. Use global mean over the WHOLE customer base to replace missing values on a much smaller universe/subset. So 22% of a high-net-worth client group ends up with a net worth of only 225K
  19. I just spent the past two days boosting R-square. Now it is 92. Great.
  20. Forgot to set descending option in proc logistic in SAS
  21. I think we should hold out missing values when conducting EDA.
  22. No proper separation of treatment and control
  23. Treat business entities and individuals as equal and mix them in the same universe
  24. Running clustering without validation
  25. Running discriminant model without validation. So correct classification rate on development is 89% and that on validation is... 35%. (No wonder you finished it in two hours and came here to ask me for a raise)
  26. Disregard link function in multinomial models
  27. I think this is a better variable: xnew=y*y*y*. It is the top variable dominating others.
  28. Use standardized coefficient to calculate relative importance, because many people are doing it and marketing loves it.
  29. I tried Google Analytics last Friday. It recommends this variable: click stream density over Thanksgiving weekend, on my web portal, on this item
  30. Let us treat this matrix as unary so we can apply Euclidean, since that runs faster and has a lot of optimal properties. It makes our life easier
  31. Let us use score from that model to boost this model and use score from this model to boost it back. Is that what they call neural nets, Jia?
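
To make item 8 concrete: a model built on a 50-50 oversampled file scores far too high until the probabilities are mapped back to the population event rate. Below is a minimal sketch of the usual prior correction. The function name, the 50% sample rate and the 2% population rate are my own illustrative assumptions, not anything from the original thread.

```python
def correct_oversampled_probability(p_sample, sample_rate, population_rate):
    """Map a probability from a model fit on an oversampled file back to
    the population scale (prior correction).

    p_sample        -- probability produced by the oversampled model
    sample_rate     -- event rate in the oversampled development file
    population_rate -- event rate in the real population
    """
    # How much events and non-events were over- or under-represented.
    event_ratio = population_rate / sample_rate
    nonevent_ratio = (1.0 - population_rate) / (1.0 - sample_rate)
    numerator = p_sample * event_ratio
    return numerator / (numerator + (1.0 - p_sample) * nonevent_ratio)


# Hypothetical numbers: 50-50 development sample, 2% true response rate.
raw_scores = [0.10, 0.50, 0.90]
corrected_scores = [
    correct_oversampled_probability(p, sample_rate=0.5, population_rate=0.02)
    for p in raw_scores
]
```

The rank order of the scores does not change, so lift by decile looks the same; what changes is the predicted response rate you quote to the business.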

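Item 18 is just as easy to show. Here is a rough sketch of filling missing values with the mean of the record's own segment rather than the global mean. The field names (`segment`, `net_worth`) and the toy records are hypothetical, made up only for illustration.

```python
from statistics import mean


def impute_by_segment(records, segment_key, value_key):
    """Return copies of the records with missing values (None) replaced by
    the mean of the observed values in the same segment."""
    observed = {}
    for rec in records:
        if rec[value_key] is not None:
            observed.setdefault(rec[segment_key], []).append(rec[value_key])
    segment_means = {seg: mean(vals) for seg, vals in observed.items()}

    filled = []
    for rec in records:
        new_rec = dict(rec)
        if new_rec[value_key] is None:
            # Stays None if the segment has no observed values at all.
            new_rec[value_key] = segment_means.get(new_rec[segment_key])
        filled.append(new_rec)
    return filled


# Hypothetical toy data: two high-net-worth clients with known values,
# one with a missing value, and two mass-market customers.
customers = [
    {"segment": "high_net_worth", "net_worth": 4_000_000},
    {"segment": "high_net_worth", "net_worth": 6_000_000},
    {"segment": "high_net_worth", "net_worth": None},
    {"segment": "mass_market", "net_worth": 100_000},
    {"segment": "mass_market", "net_worth": 200_000},
]
imputed = impute_by_segment(customers, "segment", "net_worth")
```

With these toy numbers the global mean of the observed values would be 2,575,000, well below either observed high-net-worth client, while the segment mean keeps the filled value inside the group it belongs to.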

31 Ways to get a model wrong – and Hats off to a fellow mate in suffering -Jia

Coming up – One Way to get a scoring model correct

Author: Ajay Ohri

2 thoughts on “More Ways to get a Scoring Model wrong”

  1. Hello Ajay

    Many thanks for this list. A fine mixture of “haha”, “umm.. I did this wrong, did I” and many “yes, yes”. However, I must admit I did not understand point 30. What is a unary matrix? English is not my native tongue, I have no mathematics book in English available, and web search results confused me. Can you point me to an explanation?

    kind regards,

