1) removing duplicates from a dataset based on certain key values/variables
2) merging two datasets based on one or more common key variables
3) creating a subset based on a conditional value of a variable
4) creating a subset based on a conditional value of a time-date variable
5) converting a date-time variable from one format to another
6) computing means grouped or classified at a given level of aggregation
7) creating a new variable based on an if-then condition
8) creating a macro to run the same program with different parameters
9) creating a logistic regression model and scoring a dataset
10) transforming variables
11) checking the ROC curves of a model
12) splitting a dataset for a random sample (repeatable with random seed)
13) creating a cross tab of all variables in a dataset with one response variable
14) creating bins or ranks from a certain variable value
15) graphically examining cross tabs
16) creating histograms
17) creating a density plot, as with plot(density()) in R
18) creating a pie chart
19) creating a line graph, creating a bar graph
20) creating a bubble chart
21) running a goal-seek style simulation/optimization
22) creating a tabular report for multiple metrics grouped by one time period or variable
23) creating a basic time series forecast
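A few of the items above (1, 2, 3, 6 and 7) can be sketched in a handful of lines. This is only an illustration in plain Python, not a recommendation of a tool: the records, the field names (id, region, spend, segment, tier) and the helper functions are all made up for the example, and any statistical package has its own built-in equivalents.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical toy data: each dict stands in for one row of a dataset.
customers = [
    {"id": 1, "region": "east", "spend": 120.0},
    {"id": 2, "region": "west", "spend": 80.0},
    {"id": 2, "region": "west", "spend": 80.0},  # duplicate on the key "id"
    {"id": 3, "region": "east", "spend": 200.0},
]
segments = [
    {"id": 1, "segment": "retail"},
    {"id": 3, "segment": "corporate"},
]


def drop_duplicates(rows, keys):
    """Item 1: keep the first row seen for each combination of key values."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(row[k] for k in keys)
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def inner_merge(left, right, key):
    """Item 2: inner join on one common key (keys in `right` assumed unique)."""
    lookup = {row[key]: row for row in right}
    return [{**row, **lookup[row[key]]} for row in left if row[key] in lookup]


def group_means(rows, by, value):
    """Item 6: mean of `value` within each level of `by`."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[by]].append(row[value])
    return {level: mean(values) for level, values in groups.items()}


deduped = drop_duplicates(customers, ["id"])
merged = inner_merge(deduped, segments, "id")

# Item 3: subset based on a conditional value of a variable.
big_spenders = [row for row in deduped if row["spend"] > 100]

# Item 7: new variable based on an if-then condition.
flagged = [{**row, "tier": "high" if row["spend"] > 100 else "low"} for row in deduped]

means_by_region = group_means(deduped, "region", "spend")
```

The threshold of 100 and the "high"/"low" labels are arbitrary; the point is only that each checklist item maps to a small, testable step.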
And some case studies I could think of:
As the Director, Risk, you have to examine the overdue mortgages book that your predecessor left you. You need to optimize collections and minimize fraud and write-offs, and your efforts will be measured by how far they maximize profits from your department.
As a social media consultant, you have been asked to make the most of social media analytics and maximize social media exposure for your client. You need to create a mechanism to report on particular brand keywords, automated triggers for unusual web activity, and statistical analysis of the website analytics metrics. Above all, it needs to be set up in an automated reporting dashboard.
As a consultant to a telecommunication company, you are asked to monitor churn and review the existing churn models. You also need to optimize advertising spend across various channels. The problem is that a large number of promotions are always running, and either some of the data is incorrectly coded or there are interaction effects between the various promotions.
As a modeller you need to do the following:
1) Check ROC curves and Hosmer-Lemeshow (H-L) results for the existing model
2) Divide the dataset into random splits of 40:60
3) Create multiple aggregated variables from the basic variables
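As a rough sketch of the three modeller steps, again in plain Python: a repeatable 40:60 split using a fixed random seed, the area under the ROC curve computed by comparing every positive score against every negative score, and a few aggregated variables built from basic ones. The function names, the default seed and the output field names (total, average, maximum) are assumptions made for illustration, and the Hosmer-Lemeshow check is not covered here.

```python
import random


def split_40_60(rows, seed=42):
    """Shuffle a copy with a fixed seed, then cut at 40% so the split is repeatable."""
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    cut = int(round(0.4 * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


def roc_auc(labels, scores):
    """Area under the ROC curve: the share of (positive, negative) pairs in
    which the positive case has the higher score; ties count as half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def add_aggregates(row, fields):
    """Add total, average and maximum of the chosen basic variables to a row."""
    values = [row[f] for f in fields]
    return {
        **row,
        "total": sum(values),
        "average": sum(values) / len(values),
        "maximum": max(values),
    }


# Usage with made-up values.
small, large = split_40_60(range(10), seed=7)
auc = roc_auc([1, 0, 1, 0], [0.9, 0.2, 0.4, 0.6])
enriched = add_aggregates({"m1": 10, "m2": 20, "m3": 30}, ["m1", "m2", "m3"])
```

The pairwise AUC is quadratic in the number of rows, so it only suits small samples; for a real churn book a rank-based calculation or a library routine would be the practical choice.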