Metrics and Tools for Social Media Analysis

 

An interesting question put up on a LinkedIn group here by James Wright is:

What kind of metrics are currently out there for social media marketing?

Some responses were –

  • number of "friends" on Facebook, MySpace, and LinkedIn
  • number of blog subscribers
  • number of comments on blog posts
  • number of links to blog posts
  • number of brand-related tweets
  • where your leads are coming from
  • a Google Alert for phrases that are important to your business
  • referrals from embedded content/re-tweets where they originated (use www.tr.im)
  • ROI
  • a follow-up email or sign-up page that asks "How did you hear about us?"
  • cost per lead
  • cost per engagement
  • cost per action
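The three cost metrics at the end of the list are simple ratios of spend to outcomes. A minimal sketch, with all campaign numbers invented for illustration:

```python
# Hypothetical campaign numbers, for illustration only.
spend = 5000.0          # total campaign spend in dollars
leads = 250             # leads generated
engagements = 4000      # comments, retweets, clicks, etc.
actions = 60            # sign-ups or purchases

cost_per_lead = spend / leads
cost_per_engagement = spend / engagements
cost_per_action = spend / actions

print(f"cost per lead:       ${cost_per_lead:.2f}")
print(f"cost per engagement: ${cost_per_engagement:.2f}")
print(f"cost per action:     ${cost_per_action:.2f}")
```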

 

Some recommended tools were –

 

(Ajay – Let me know if you know any more tools or metrics for social media measurement.)

KXEN releases Social Network Analysis tool

KXEN, the automated model-building software company, has added one more innovation by becoming one of the first major data mining and analytics vendors to release a tool for Social Network Analysis.

From the Press Release


San Francisco, Paris, London, March 24th, 2009

New Social Network Analysis Module Strengthens KXEN Automated Data Mining

Leverages connections between people to boost marketing campaign results and profitability

Sales, marketing and customer retention campaigns are set to become smarter, more effective and more profitable thanks to a new social network analysis module from data mining automation vendor KXEN. By exploiting the connections between customers of telcos, banks, retailers and others, KXEN’s new KSN module has shown more than 15% lift improvement in campaign results.

KSN identifies the otherwise hidden links – call records or bank transfers, for instance – between friends, families, co-workers and other communities and extracts significant social metrics, pinpointing who is the best connected and who plays the most important role in any group. In this way it reveals valuable new customer intelligence that – when added to existing customer information – can significantly strengthen user organizations’ customer acquisition, retention, cross-sell and up-sell campaigns.
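KXEN's internals are not public, but the "best connected" metric described here is essentially degree centrality on a who-contacts-whom graph. A minimal sketch, with toy call records and names invented for illustration:

```python
from collections import defaultdict

# Toy call-detail records: (caller, callee) pairs, invented for illustration.
calls = [
    ("ann", "bob"), ("ann", "carol"), ("ann", "dan"),
    ("bob", "carol"), ("dan", "eve"),
]

# Build an undirected "who talks to whom" graph.
neighbors = defaultdict(set)
for a, b in calls:
    neighbors[a].add(b)
    neighbors[b].add(a)

# Degree centrality: the share of other people each person is linked to.
n = len(neighbors)
centrality = {p: len(nbrs) / (n - 1) for p, nbrs in neighbors.items()}

best_connected = max(centrality, key=centrality.get)
print(best_connected, centrality[best_connected])
```

Here "ann" comes out best connected; in a real deployment the same idea runs over millions of call records or bank transfers.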

Using KSN, companies can increase the accuracy and precision of their campaigns by leveraging the many more customer attributes that the module reveals, allowing them to better predict when customers may be about to churn to another provider, close an account, or buy a new product. A feature unique to KXEN allows the analysis of multiple networks and their evolution over time, exposing specific patterns of behavior like rotational churn, fraud and identity theft.

"Traditional marketing relies on models based solely on customer-vendor interaction and assumes customers act independently of each other," says KXEN’s founder and CEO Roger Haddad. "But the new social network technology in KSN recognizes that customers do indeed interact with each other, and exploits that knowledge to drive up the effectiveness and completeness of marketing and sales campaigns."

KSN integrates with and enriches existing data mining environments or may be deployed entirely standalone. Exploiting viral marketing thinking, it eliminates the normally tedious and labor intensive aspects of social network analysis. The module provides as many social network maps as users want, recognizing that individuals may belong to many different networks across their business, family and social lives.

KSN, shipping from today, complements the existing KXEN Text Coder module which allows organizations to include plain text data into their analytics activities. Together the two modules, along with KXEN’s core software, are behind the company’s Data Fusion approach which combines structured and unstructured data from multiple sources to generate fast accurate results, thus maximizing organizations’ return on analytics investments.

Please click here to learn more about the KXEN Social Network Analysis module.

About KXEN

KXEN, The Data Mining Automation Company™ delivers next-generation Customer Lifecycle Analytics to enterprises that depend on analytics as a competitive advantage. KXEN’s Data Mining Automation Solution drives significant improvements in customer acquisition, retention, cross-sell and risk applications. Its solution integrates predictive analytics into strategic business processes, allowing customers to drive greater value into their business. Find out more by visiting www.kxen.com.

I found the statement by the CEO quite interesting –

 

"Traditional marketing relies on models based solely on customer-vendor interaction and assumes customers act independently of each other," says KXEN’s founder and CEO Roger Haddad. "But the new social network technology in KSN recognizes that customers do indeed interact with each other, and exploits that knowledge to drive up the effectiveness and completeness of marketing and sales campaigns."

 

I wonder if other analytics vendors are creating/releasing products like these.

Chrome Experiments

Here are some nice data visualization methods showcased at http://www.chromeexperiments.com/

 

I created one using Social Collider, searching for @smartdataco, and generated this data map.

The site (which goes by the tagline "Not your mother’s JavaScript") was created by Google, creator of the Chrome browser.

 

In light of these deeply held beliefs, we created this site to showcase cool experiments for both JavaScript and web browsers.

These experiments were created by designers and programmers from around the world. Their work is making the web faster, more fun, and more open – the same spirit in which we built Google Chrome.

Here is an experiment called Canopy available at http://www.chromeexperiments.com/detail/canopy/

It generates fractals.

 

Another, more useful, experiment is Social Collider, which enables you to search Twitter for specific words and create a data map from the results.

SAS Global Conference 2009

The resources for SAS Global Conference are now online at

http://support.sas.com/resources/papers/proceedings09/TOC.html

The SAS Global Conference starts next week, running from March 22 to March 25 in Washington, D.C. It is one of the oldest and most renowned community conferences for any statistical software. Ever.

Here is a link to the SAS2009 Ballot Results, in which users were polled on which features they like or dislike and want added to SAS Institute‘s suite of products, and indeed to the SAS language itself.

 

http://support.sas.com/resources/papers/proceedings09/Ballot09.pdf

I really liked the blog as well as the YouTube video here – http://blogs.sas.com/sgf/

 

Citation:

SAS Institute Inc. 2009. Proceedings of the SAS® Global Forum 2009 Conference. Cary, NC: SAS Institute Inc.

Business Intelligence and The Heisenberg Principle

The Heisenberg Principle states that accuracy and certainty in knowing one quantity (say, the position of a particle) has to be traded off against certainty about another quantity (like its momentum). I was drawn to an application of this while in an email exchange with Edith Ohri, a leading data mining practitioner in Israel who has her own customized GT solution. Edith said that it seems impossible to have data that is both accurate (data quality) and easy to view across organizations (data transparency). More often than not, the metrics we measure are the metrics we are forced to measure due to data adequacy and data quality issues.

Now, there exists a tradeoff in the price of perfect information in managerial economics, but is it really true that the Business Intelligence we deploy is more often than not constrained by simple things like input data and historic database tables? And that, more often than not, data quality is the critical constraint that determines the speed and efficacy of deployment?

I personally find that much more of the time in database projects goes into data measurement, aggregation, massaging outliers, and missing value assumptions than into the “high value” activities like insight generation and business issue resolution.

Is it really true? Is the analysis easy, and the data what is tough?

What do you think of the uncertainty inherent in data quality and data transparency?

Modeling Visualization Macros

Here is a nice SAS Macro from Wensui’s blog at http://statcompute.spaces.live.com/blog/

It’s particularly useful for modelling folks. I have seen a version of this macro some time back which also plotted curves, but this one is quite nice too.

SAS MACRO TO CALCULATE GAINS CHART WITH KS

%macro ks(data = , score = , y = );

options nocenter mprint nodate;

data _tmp1;
  set &data;
  where &score ~= . and &y in (1, 0);
  random = ranuni(1);
  keep &score &y random;
run;

proc sort data = _tmp1 sortsize = max;
  by descending &score random;
run;

data _tmp2;
  set _tmp1;
  by descending &score random;
  i + 1;
run;

proc rank data = _tmp2 out = _tmp3 groups = 10;
  var i;
run;

proc sql noprint;
create table
  _tmp4 as
select
  i + 1       as decile,
  count(*)    as cnt,
  sum(&y)     as bad_cnt,
  min(&score) as min_scr format = 8.2,
  max(&score) as max_scr format = 8.2
from
  _tmp3
group by
  i;

select
  sum(cnt) into :cnt
from
  _tmp4;

select
  sum(bad_cnt) into :bad_cnt
from
  _tmp4;    
quit;

data _tmp5;
  set _tmp4;
  retain cum_cnt cum_bcnt cum_gcnt;
  cum_cnt  + cnt;
  cum_bcnt + bad_cnt;
  cum_gcnt + (cnt - bad_cnt);
  cum_pct  = cum_cnt  / &cnt;
  cum_bpct = cum_bcnt / &bad_cnt;
  cum_gpct = cum_gcnt / (&cnt - &bad_cnt);
  ks       = (max(cum_bpct, cum_gpct) - min(cum_bpct, cum_gpct)) * 100;

  format cum_bpct percent9.2 cum_gpct percent9.2
         ks       6.2;

  label decile    = 'DECILE'
        cnt       = '#FREQ'
        bad_cnt   = '#BAD'
        min_scr   = 'MIN SCORE'
        max_scr   = 'MAX SCORE'
        cum_gpct  = 'CUM GOOD%'
        cum_bpct  = 'CUM BAD%'
        ks        = 'KS';
run;

title "%upcase(&score) KS";
proc print data  = _tmp5 label noobs;
  var decile cnt bad_cnt min_scr max_scr cum_bpct cum_gpct ks;
run;    
title;

proc datasets library = work nolist;
  delete _: / memtype = data;
run;
quit;

%mend ks;    

data test;
  do i = 1 to 1000;
    score = ranuni(1);
    if score * 2 + rannor(1) * 0.3 > 1.5 then y = 1;
    else y = 0;
    output;
  end;
run;

%ks(data = test, score = score, y = y);

/*
SCORE KS              
                                MIN         MAX
DECILE    #FREQ    #BAD       SCORE       SCORE     CUM BAD%    CUM GOOD%        KS
   1       100      87         0.91        1.00      34.25%        1.74%      32.51
   2       100      78         0.80        0.91      64.96%        4.69%      60.27
   3       100      49         0.69        0.80      84.25%       11.53%      72.72
   4       100      25         0.61        0.69      94.09%       21.58%      72.51
   5       100      11         0.51        0.60      98.43%       33.51%      64.91
   6       100       3         0.40        0.51      99.61%       46.51%      53.09
   7       100       1         0.32        0.40     100.00%       59.79%      40.21
   8       100       0         0.20        0.31     100.00%       73.19%      26.81
   9       100       0         0.11        0.19     100.00%       86.60%      13.40
  10       100       0         0.00        0.10     100.00%      100.00%       0.00
*/
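The macro's logic translates directly to other languages. This is my own re-implementation sketch in Python (not from Wensui's blog): rank the population by descending score, split it into deciles, and take KS as the largest gap between cumulative bad% and cumulative good%. The simulated data mimics the SAS test step above, though the random draws differ.

```python
import random

# Simulate 1000 scored records, roughly mirroring the SAS test data step.
random.seed(1)
data = []
for _ in range(1000):
    score = random.random()
    y = 1 if score * 2 + random.gauss(0, 0.3) > 1.5 else 0
    data.append((score, y))

# Sort by descending score, then walk the deciles accumulating bads/goods.
data.sort(key=lambda r: r[0], reverse=True)
total = len(data)
bads = sum(y for _, y in data)
goods = total - bads

cum_b = cum_g = 0
ks = 0.0
for decile in range(10):
    chunk = data[decile * total // 10:(decile + 1) * total // 10]
    cum_b += sum(y for _, y in chunk)
    cum_g += sum(1 - y for _, y in chunk)
    gap = abs(cum_b / bads - cum_g / goods) * 100
    ks = max(ks, gap)  # KS = max separation across deciles

print(f"KS = {ks:.2f}")
```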


Here is another example of a SAS macro, this one for an ROC curve, and it comes from http://www2.sas.com/proceedings/sugi22/POSTERS/PAPER219.PDF

APPENDIX A
Macro
/***************************************************************/;
/* MACRO PURPOSE: CREATE AN ROC DATASET AND PLOT */;
/* */;
/* VARIABLES INTERPRETATION */;
/* */;
/* DATAIN INPUT SAS DATA SET */;
/* LOWLIM MACRO VARIABLE LOWER LIMIT FOR CUTOFF */;
/* UPLIM MACRO VARIABLE UPPER LIMIT FOR CUTOFF */;
/* NINC MACRO VARIABLE NUMBER OF INCREMENTS */;
/* I LOOP INDEX */;
/* OD OPTICAL DENSITY */;
/* CUTOFF CUTOFF FOR TEST */;
/* STATE STATE OF NATURE */;
/* TEST QUALITATIVE RESULT WITH CUTOFF */;
/* */;
/* DATE WRITTEN BY */;
/* */;
/* 09-25-96 A. STEAD */;
/***************************************************************/;
%MACRO ROC(DATAIN,LOWLIM,UPLIM,NINC=20);
OPTIONS MTRACE MPRINT;
DATA ROC;
SET &DATAIN;
LOWLIM = &LOWLIM; UPLIM = &UPLIM; NINC = &NINC;
DO I = 1 TO NINC+1;
CUTOFF = LOWLIM + (I-1)*((UPLIM-LOWLIM)/NINC);
IF OD > CUTOFF THEN TEST="R"; ELSE TEST="N";
OUTPUT;
END;
DROP I;
RUN;
PROC PRINT;
RUN;
PROC SORT; BY CUTOFF;
RUN;
PROC FREQ; BY CUTOFF;
TABLE TEST*STATE / OUT=PCTS1 OUTPCT NOPRINT;
RUN;
DATA TRUEPOS; SET PCTS1; IF STATE="P" AND TEST="R";
TP_RATE = PCT_COL; DROP PCT_COL;
RUN;
DATA FALSEPOS; SET PCTS1; IF STATE="N" AND TEST="R";
FP_RATE = PCT_COL; DROP PCT_COL;
RUN;
DATA ROC; MERGE TRUEPOS FALSEPOS; BY CUTOFF;
IF TP_RATE = . THEN TP_RATE=0.0;
IF FP_RATE = . THEN FP_RATE=0.0;
RUN;
PROC PRINT;
RUN;
PROC GPLOT DATA=ROC;
PLOT TP_RATE*FP_RATE=CUTOFF;
RUN;
%MEND;
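The macro above sweeps a cutoff between LOWLIM and UPLIM and tabulates the true-positive and false-positive rates at each step. The same idea in a short Python sketch, with the scores and labels invented for illustration:

```python
# Sweep a cutoff over the score range and record (FPR, TPR) pairs,
# mirroring the SAS macro's NINC increments. Data are made up.
scores = [0.1, 0.2, 0.35, 0.4, 0.55, 0.6, 0.8, 0.9]
labels = [0,   0,   0,    1,   0,    1,   1,   1  ]  # 1 = positive state

def roc_points(scores, labels, ninc=20):
    lo, hi = min(scores), max(scores)
    pos = sum(labels)
    neg = len(labels) - pos
    points = []
    for i in range(ninc + 1):
        cutoff = lo + i * (hi - lo) / ninc
        tp = sum(1 for s, y in zip(scores, labels) if s > cutoff and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s > cutoff and y == 0)
        points.append((fp / neg, tp / pos))  # (FPR, TPR) at this cutoff
    return points

pts = roc_points(scores, labels)
# Low cutoffs flag nearly everyone (high FPR, high TPR); high cutoffs
# flag almost no one, so the curve slides down toward (0, 0).
```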

VERSION 9.2 of SAS has a macro called %ROCPLOT http://support.sas.com/kb/25/018.html

SPSS also supports ROC curves, and there is a nice document on that here:

http://www.childrensmercy.org/stats/ask/roc.asp

Here are some examples from R with the package ROCR from

http://rocr.bioinf.mpi-sb.mpg.de/

 


Using ROCR’s three commands to produce a simple ROC plot (after loading the package and its bundled example data):
library(ROCR)
data(ROCR.simple)
pred <- prediction(ROCR.simple$predictions, ROCR.simple$labels)
perf <- performance(pred, measure = "tpr", x.measure = "fpr")
plot(perf, col = rainbow(10))

The graphics in the R package are outstanding.

Citation:

Tobias Sing, Oliver Sander, Niko Beerenwinkel, Thomas Lengauer.
ROCR: visualizing classifier performance in R.
Bioinformatics 21(20):3940-3941 (2005).

 

More Ways to get a Scoring Model wrong

I got the following answer from a LinkedIn group discussion at http://www.linkedin.com/groupAnswers?viewQuestionAndAnswers=&gid=53432&discussionID=1946379&commentID=2213879&goback=.mgr_false_0_DATE.mgr_true_1_DATE.mid_1066685320#commentID_2213879

 

on my Ten Ways to get a Scoring Model Wrong.

  1.  Typo 
  2. Refuse to use central tendency to patch missing values. Instead, assign highest response rate because WOE says so 
  3. Marketing people tell me to force the variable into the model 
  4.  Selection bias 
  5.  Forgot to segment 
  6. Solely rely on data to segment without consulting the biz side 
  7.  Just delete observations with missing values, OK, without studying geometrical boundaries 
  8.  Using oversampling, but refuse to weight it back. That boosts lift, right? Let us do 50-50 
  9. Insist random sampling is sufficient, while stratified sampling is critical 
  10. Binning too much, or too little 
  11. Selecting variables without repeated sampling 
  12. Forgot to exclude numeric customer ID from the candidate variables. AND, it pops… Well, both Unica and KXEN accepted it, so I see no problem 
  13. When the same variable is sourced by different vendors, did not look up the scales under the same name. Just combine them 
  14.  Well, SAS Enterprise Miner gave me this model yesterday 
  15. The binary variable is statistically significant, but there are only 27 events=1 out of ~1mm, since only 27 made some purchases. 
  16. Well, I only have 250 events=1. But I think I can use exact logistic to make it up, all right? I got a PhD in Statistics. Trust me, my professor is OK with it. I just called her. 
  17.  Build two-stage model without Heckman adjustment 
  18. Use the global mean over the WHOLE customer base to replace missing values on a much smaller universe/subset. So 22% of a high net worth client group ends up worth only 225K 
  19. I just spent the past two days boosting R-square. Now it is 92. Great. 
  20. Forgot to set descending option in proc logistic in SAS 
  21. I think we should hold out missing values when conducting EDA. 
  22. Without proper separation of treatment and control 
  23. Treat business entities and individuals as equal and mix them in the same universe
  24. Running clustering without validation 
  25. Running discriminant model without validation. So correct classification rate on development is 89% and that over validation is …35%.(no wonder you finished it in two hours and came here to ask me for a raise) 
  26. Disregard the link function in multinomial models 
  27. I think this is a better variable: xnew=y*y*y*. It is the top variable dominating others. 
  28. Use standardized coefficient to calculate relative importance, because many people are doing and marketing loves it. 
  29. I tried Google Analytics last Friday. It recommends this variable: click stream density over Thanksgiving weekend, on my web portal, on this item 
  30.  Let us treat this matrix as unary so we can apply Euclidean, since that runs faster and has a lot of optimal properties. It makes our life easier 
  31. Let us use score from that model to boost this model and use score from this model to boost it back. Is that what they call neural nets, Jia? 

Enough?
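Item 8 (oversample to 50-50 but "refuse to weight it back") has a straightforward fix: correct the scored probabilities for the sampling priors before using them. A sketch of that prior correction, with all the rates invented for illustration:

```python
def correct_for_oversampling(p_sample, rate_true, rate_sample):
    """Adjust a probability scored on an oversampled (e.g. 50-50)
    development file back to the true population event rate."""
    w1 = rate_true / rate_sample              # weight for events
    w0 = (1 - rate_true) / (1 - rate_sample)  # weight for non-events
    return p_sample * w1 / (p_sample * w1 + (1 - p_sample) * w0)

# A 50-50 development sample, but only 2% of the real population responds:
p = correct_for_oversampling(0.80, rate_true=0.02, rate_sample=0.50)
print(round(p, 4))  # the "80%" score is far lower in the real population
```

When the sample rate equals the population rate, the correction is a no-op, which is a handy sanity check.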

 

31 ways to get a model wrong – and hats off to a fellow mate in suffering, Jia.

Coming up – One Way to get a scoring model correct