Learning R for SAS and SPSS Users

So you decided to cut down on your Statistical software expenses and decided to get R.

but the problem is you know SAS /SPSS and you need to learn R fast enough to justify switching over …….

the ideal book for you is  http://oit.utk.edu/scc/RforSAS&SPSSusers.pdf

Thanks to the guys who pointed me here. Its a really easy book, you have the SAS Syntax, the corresponding SPSS Syntax and the R Syntax.

 That’s useful for learners in R who got projects to execute, and need to learn either SPSS or R or even switch from SPSS to SAS.

The Nastiness Index

Reading about the latest back and forth and then back again rounds between not that conservative Republicans, conservative Republicans, liberal Democrats, not so liberal Democrats makes you wonder if there is some way of quantifying the daily dose of political news. If we can quantify the level of terror threat in color coding, can we create a blue and red nastiness index for quantifying the political debates in a country.

The Nastiness Index can be used to decribe political equilibrium in a country. eg in USA if the Index shows high blue , and less red it means the Democrats are pummeling the Republicans. If the Index shows low blue and low red, it means all is quiet. If it shows high red, and low blue , it means the opposition is down and the administration is up.Stock markets can use this index to quantify the level of political activity, inactivity and stability of elected governments.

If the nastiness index shows high blue and high red, it means its election time folks.

Model Presentation

Presenting a model is different from making a model, as the end audience is non technical and business minded. These are some thumb rules I use for making model presentation templates

1) Model Lift- How good is the model vs current effort.This is best shown by lift curves or KS statistics where you plot % Responders on X Axis and % Population on Y axis. Maximum separation between goods and bads is the KS statistic.

2) Model Robustness- What facts back up statistical validity of model output/equation ? Is there a way to test the model without executing it fully?

3) Model Assumptions- This deals with historical assumptions like which event is the model based on, data assumptions for validation and missing value treatment, capping of outliers.

The best way to convince business audiences is splitting the dataset into three random samples of 60 %,30 % and 10 % for model building, validation and testing again.

Then rerun the model equation on another random sample ,using a different seed in the RANUNI function. The KS should be similar and so should be the stats.

Ultimately models get validated or battered in the marketplace. A 1 % difference in response rates can make or lose hundreds of thousands of dollars especially in mass marketing or credit modeling. Business perspective and buy in is thus essential and so is continuous model performance feedback to avoid deterioration of  model, as it will eventually deteriorate over a period of time.

Politics in Analytics

Observers of American Electoral politics ,including the current Presidential Campaign would be struck by the sophisticated degree of analytics being involved. This includes the following –

1) Segmentation of likely percentage Response Rates (vote yes(1) , vote opponent (0)

based on

history of voting

response to stimuli (experience vs change)

ethnicity (black, white ,latino)

income groups (<40,000 USD ,>100000 USD)

education (college educated)

gender (male,female)

geography (rural ,urban,college town)

union affliation

and even coffee (latte drinkers etc 🙂 )

What is striking is that most of these variables like race, gender cannot be used for marketing anything else like credit cards, or financial services on charges of discrimination.

What could be really interesting is if they add credit bureau variables and create logistic models (and not just segmentation). Maybe by 2016, there will be a different category of analytics called Quantitative Political analytics.

Another note – What is similar between Ralph Nader , Chaos Theory and Butterfly effect.

Chaos theory states that future results can vary a lot based on slight changes in initial differences.

Butterfly Effect uses this to say a small event like butterfly fluttering in china can cause a big event like typhoon in the US.

Ralph Nader entering the race in 2000 got 90000 votes in Florida, mostly to be siphoned away from Al Gore , who lost Florida and the elections by less than 1000 votes.

Al Gore was against Iraq war since the beginning and had he been President maybe the world would have been greener and no war in Iraq. maybe. No offense meant to anyone.

Howard Dean screaming or Bill Clinton calling Obama as similar to Jesse Jackson’s wins (which is analytically and quantitatively true) or Hillary crying in Iowa , can be similar butterfly effects.No offense meant to anyone.