PSPP – SPSS's Open-Source Counterpart


A new website hosts Windows installers for PSPP – try it if you are dedicated to either SPSS or free statistical computing.

http://pspp.awardspace.com/

This page is intended to give a stable starting point for downloading the PSPP-for-Windows setup from free mirrors.

Highlights of the current PSPP-for-Windows setup
PSPP info:

Current version: Master version = 0.7.6
Release date: See filenames
Information about PSPP: http://www.gnu.org/software/pspp
PSPP Manual: PDF or HTML
(current version will be installed on your PC by the installer package)
Package info:

Windows version: Windows XP and newer
Package size: 15 MB
Size on disk: 34 MB
Technical: MinGW-based
Cross-compiled on openSUSE 11.3

Downloads:
There are issues with the latest build: some users report crashes on their systems, while on other systems it works fine.

Two installers are provided for each version: a multi-user installer (administrator privileges required; this is the recommended version) and a single-user installer (no administrator privileges required).

Version                            Multi-user installer        Single-user installer
0.7.6-g38ba1e-blp-build20101116    PSPP-Master-2010-11-16      PSPP-Master-single-user-2010-11-16
0.7.5-g805e7e-blp-build20100908    PSPP-Master-2010-09-08      PSPP-Master-single-user-2010-09-08
0.7.5-g7803d3-blp-build20100820    PSPP-Master-2010-08-20      PSPP-Master-single-user-2010-08-20
0.7.5-g333ac4-blp-build20100727    PSPP-Master-2010-07-27      PSPP-Master-single-user-2010-07-27

 

Sources can be found here.

Also see http://en.wikipedia.org/wiki/PSPP

At the user’s choice, statistical output and graphics are produced in ASCII, PDF, PostScript or HTML formats. A limited range of statistical graphs can be produced, such as histograms, pie charts and np-charts.

PSPP can import Gnumeric, OpenDocument and Excel spreadsheets, Postgres databases, comma-separated values and ASCII files. It can export files in the SPSS ‘portable’ and ‘system’ file formats and to ASCII files. Some of the libraries used by PSPP can be accessed programmatically; PSPP-Perl provides an interface to the libraries used by PSPP.

and

http://www.gnu.org/software/pspp/

A brief list of some of the features of PSPP follows:

  • Supports over 1 billion cases.
  • Supports over 1 billion variables.
  • Syntax and data files are compatible with SPSS.
  • Choice of terminal or graphical user interface.
  • Choice of text, PostScript or HTML output formats.
  • Inter-operates with Gnumeric, OpenOffice.org and other free software.
  • Easy data import from spreadsheets, text files and database sources.
  • Fast statistical procedures, even on very large data sets.
  • No license fees.
  • No expiration period.
  • No unethical “end user license agreements”.
  • Fully indexed user manual.
  • Free Software; licensed under GPLv3 or later.
  • Cross-platform; runs on many different computers and many different operating systems.

 

An Open-Source Compiler for the SAS Language: GNU Dap


I am still testing this out.

But if you know a bit about make and compiling things in Ubuntu, check out

http://www.gnu.org/software/dap/

I loved the humorous introduction

Dap is a small statistics and graphics package based on C. Version 3.0 and later of Dap can read SBS programs (based on the utterly famous, industry standard statistics system with similar initials – you know the one I mean)! The user wishing to perform basic statistical analyses is now freed from learning and using C syntax for straightforward tasks, while retaining access to the C-style graphics and statistics features provided by the original implementation. Dap provides core methods of data management, analysis, and graphics that are commonly used in statistical consulting practice (univariate statistics, correlations and regression, ANOVA, categorical data analysis, logistic regression, and nonparametric analyses).

Anyone familiar with the basic syntax of C programs can learn to use the C-style features of Dap quickly and easily from the manual and the examples contained in it; advanced features of C are not necessary, although they are available. (The manual contains a brief introduction to the C syntax needed for Dap.) Because Dap processes files one line at a time, rather than reading entire files into memory, it can be, and has been, used on data sets that have very many lines and/or very many variables.

I wrote Dap to use in my statistical consulting practice because the aforementioned utterly famous, industry standard statistics system is (or at least was) not available on GNU/Linux and costs a bundle every year under a lease arrangement. And now you can run programs written for that system directly on Dap! I was generally happy with that system, except for the graphics, which are all but impossible to use,  but there were a number of clumsy constructs left over from its ancient origins.

http://www.gnu.org/software/dap/#Sample output

  • Unbalanced ANOVA
  • Crossed, nested ANOVA
  • Random model, unbalanced
  • Mixed model, balanced
  • Mixed model, unbalanced
  • Split plot
  • Latin square
  • Missing treatment combinations
  • Linear regression
  • Linear regression, model building
  • Ordinal cross-classification
  • Stratified 2×2 tables
  • Loglinear models
  • Logit  model for linear-by-linear association
  • Logistic regression

    Sounds too good to be true – GNU Dap joins WPS Workbench and Dulles Open’s Carolina as the third SAS-language compiler (besides the now-defunct BASS software). See http://en.wikipedia.org/wiki/SAS_language#Controversy

     

    Also see http://en.wikipedia.org/wiki/DAP_(software)

    Dap was written to be a free replacement for SAS, but users are assumed to have a basic familiarity with the C programming language in order to permit greater flexibility. Unlike R it has been designed to be used on large data sets.

    It has been designed to cope with very large data sets, even when the size of the data exceeds the size of the computer’s memory.
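    Dap itself is written in C, but its line-at-a-time design is worth a sketch. The same streaming idea (shown here in illustrative Python, using Welford's algorithm for a running mean and variance) keeps memory use constant no matter how long the file is; this is not Dap code, just the principle behind it.

```python
# Sketch of line-at-a-time processing: Welford's streaming algorithm
# keeps a running mean and variance without ever holding the whole
# file in memory. Works identically on a list or an open file handle.

def stream_stats(lines):
    n = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the mean
    for line in lines:
        x = float(line.strip())
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
    variance = m2 / (n - 1) if n > 1 else 0.0
    return n, mean, variance

# One pass, constant memory, regardless of input length:
n, mean, var = stream_stats(["1", "2", "3", "4"])
```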

    OkCupid Data Visualization – Flow Chart to Your Heart

    Quite appropriate on Valentine’s Day: OkCupid remains quite innovative in how it uses data (in this case, questionnaire data).

    Interview: David Katz, Dataspora / David Katz Consulting

    Here is an interview with David Katz, founder of David Katz Consulting (http://www.davidkatzconsulting.com/) and an analyst at the noted firm http://dataspora.com/. He is a featured speaker at Predictive Analytics World (http://www.predictiveanalyticsworld.com/sanfrancisco/2011/speakers.php#katz).

    Ajay- Describe your background working with analytics. How can we make analytics and science more attractive career options for young students?

    David- I had an interest in math from an early age, spurred by reading lots of science fiction with mathematicians and scientists in leading roles. I was fortunate to be at Harry and David (Fruit of the Month Club) when they were in the forefront of applying multivariate statistics to the challenge of targeting catalogs and other snail-mail offerings. Later I had the opportunity to expand these techniques to the retail sphere with Williams-Sonoma, who grew their retail business with the support of their catalog mailings. Since they had several catalog titles and product lines, cross-selling presented additional analytic challenges, and with the growth of the internet there was still another channel to consider, with its own dynamics.

    After helping to found Abacus Direct Marketing, I became an independent consultant, which provided a lot of variety in applying statistics and data mining in a variety of settings from health care to telecom to credit marketing and education.

    Students should be exposed to the many roles that analytics plays in modern life, and to the excitement of finding meaningful and useful patterns in the vast profusion of data that is now available.

    Ajay-  Describe your most challenging project in 3 decades of experience in this field.

    David- Hard to choose just one, but the educational field has been particularly interesting. Partnering with Olympic Behavior Labs, we’ve developed systems to help identify students who are most at-risk for dropping out of school to help target interventions that could prevent dropout and promote success.

    Ajay- What do you think are the top 5 trends in analytics for 2011?

    David- Big Data, Privacy concerns, quick response to consumer needs, integration of testing and analysis into business processes, social networking data.

    Ajay- Do you think techniques like RFM and LTV are adequately utilized by organizations? How can they be propagated further?

    David- Organizations vary amazingly in how sophisticated or unsophisticated they are in analytics. A key factor in success as a consultant is to understand where each client is on this continuum and how well that serves their needs.

    Ajay- What are the various software packages you have worked with in this field? Name your favorite per category.

    David- I started out using COBOL (that dates me!), then concentrated on SAS for many years. More recently R is my favorite because of its coverage, currency, programming model and debugging capabilities.

    Ajay- Independent consulting can be a strenuous job. What do you do to unwind?

    David- Cycling, yoga, meditation, hiking and guitar.

    Biography-

    David Katz, Senior Analyst, Dataspora, and President, David Katz Consulting.

    David Katz has been in the forefront of applying statistical models and database technology to marketing problems since 1980. He holds a Master’s Degree in Mathematics from the University of California, Berkeley. He is one of the founders of Abacus Direct Marketing and was previously the Director of Database Development for Williams-Sonoma.

    He is the founder and President of David Katz Consulting, specializing in sophisticated statistical services for a variety of applications, with a special focus on the Direct Marketing Industry. David Katz has an extensive background that includes experience in all aspects of direct marketing from data mining, to strategy, to test design and implementation. In addition, he consults on a variety of data mining and statistical applications from public health to collections analysis. He has partnered with consulting firms such as Ernst and Young, Prediction Impact, and most recently on this project with Dataspora.

    For more on David’s session at Predictive Analytics World, San Francisco, see http://www.predictiveanalyticsworld.com/sanfrancisco/2011/agenda.php#day2-16a

    Room: Salon 5 & 6
    4:45pm – 5:05pm

    Track 2: Social Data and Telecom 
    Case Study: Major North American Telecom
    Social Networking Data for Churn Analysis

    A North American Telecom found that it had a window into social contacts – who has been calling whom on its network. This data proved to be predictive of churn. Using SQL, and GAM in R, we explored how to use this data to improve the identification of likely churners. We will present many dimensions of the lessons learned on this engagement.

    Speaker: David Katz, Senior Analyst, Dataspora, and President, David Katz Consulting

    Exhibit Hours
    Monday, March 14th:10:00am to 7:30pm

    Tuesday, March 15th:9:45am to 4:30pm

    R Commander Plugins – 20 and Growing!

    R Commander Extensions: Enhancing a Statistical Graphical User Interface by extending menus to statistical packages

    R Commander ( see paper by Prof J Fox at http://www.jstatsoft.org/v14/i09/paper ) is a well known and established graphical user interface to the R analytical environment.
    While the original GUI was created for a basic statistics course, the enabling of extensions (or plug-ins  http://www.r-project.org/doc/Rnews/Rnews_2007-3.pdf ) has greatly enhanced the possible use and scope of this software. Here we give a list of all known R Commander Plugins and their uses along with brief comments.

    1. DoE – http://cran.r-project.org/web/packages/RcmdrPlugin.DoE/RcmdrPlugin.DoE.pdf
    2. doex
    3. EHESampling
    4. epack- http://cran.r-project.org/web/packages/RcmdrPlugin.epack/RcmdrPlugin.epack.pdf
    5. Export- http://cran.r-project.org/web/packages/RcmdrPlugin.Export/RcmdrPlugin.Export.pdf
    6. FactoMineR
    7. HH
    8. IPSUR
    9. MAc- http://cran.r-project.org/web/packages/RcmdrPlugin.MAc/RcmdrPlugin.MAc.pdf
    10. MAd
    11. orloca
    12. PT
    13. qcc- http://cran.r-project.org/web/packages/RcmdrPlugin.qcc/RcmdrPlugin.qcc.pdf and http://cran.r-project.org/web/packages/qcc/qcc.pdf
    14. qual
    15. SensoMineR
    16. SLC
    17. sos
    18. survival-http://cran.r-project.org/web/packages/RcmdrPlugin.survival/RcmdrPlugin.survival.pdf
    19. SurvivalT
    20. Teaching Demos

    Note the naming convention for the above plugins: always the prefix “RcmdrPlugin.” followed by the name above.
    Also, a plugin must already be installed locally to be visible in R Commander’s load-plugin list, and R Commander loads the plugin only after restarting. Hence it is advisable to load all R Commander plugins at the beginning of the analysis session.

    However, the notable plugins are
    1) DoE for Design of Experiments-
    Full factorial designs, orthogonal main effects designs, regular and non-regular 2-level fractional
    factorial designs, central composite and Box-Behnken designs, latin hypercube samples, and simple D-optimal designs can currently be generated from the GUI. Extensions to cover further latin hypercube designs as well as more advanced D-optimal designs (with blocking) are planned for the future.
    2) Survival- This package provides an R Commander plug-in for the survival package, with dialogs for Cox models, parametric survival regression models, estimation of survival curves, and testing for differences in survival curves, along with data-management facilities and a variety of tests, diagnostics and graphs.
    3) qcc -GUI for  Shewhart quality control charts for continuous, attribute and count data. Cusum and EWMA charts. Operating characteristic curves. Process capability analysis. Pareto chart and cause-and-effect chart. Multivariate control charts
    4) epack – an Rcmdr plug-in based on the time-series functions. It also depends on packages such as tseries, abind, MASS, xts and forecast. It covers GARCH and the following models: ARIMA, GARCH and Holt-Winters.
    5)Export- The package helps users to graphically export Rcmdr output to LaTeX or HTML code,
    via xtable() or Hmisc::latex(). The plug-in was originally intended to facilitate exporting Rcmdr
    output to formats other than ASCII text and to provide R novices with an easy-to-use,
    easy-to-access reference on exporting R objects to formats suited for printed output. The
    package documentation contains several pointers on creating reports, either by using
    conventional word processors or LaTeX/LyX.
    6) MAc- This is an R-Commander plug-in for the MAc package (Meta-Analysis with
    Correlations). This package enables the user to conduct a meta-analysis in a menu-driven,
    graphical user interface environment (e.g., SPSS), while having the full statistical capabilities of
    R and the MAc package. The MAc package itself contains a variety of useful functions for
    conducting a research synthesis with correlational data. One of the unique features of the MAc
    package is in its integration of user-friendly functions to complete the majority of statistical steps
    involved in a meta-analysis with correlations. It uses recommended procedures as described in
    The Handbook of Research Synthesis and Meta-Analysis (Cooper, Hedges, & Valentine, 2009).
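    To make the MAc description concrete, here is a minimal sketch of the standard fixed-effect pooling a meta-analysis of correlations performs (the Hedges-Olkin approach described in the handbook cited above): Fisher z-transform each correlation, weight by n - 3, and back-transform the weighted mean. The numbers are made up for illustration, and MAc itself offers far more than this; illustrative Python rather than R is used here.

```python
# Fixed-effect pooling of correlations: Fisher z-transform each r,
# weight by n - 3 (the inverse of the z variance), average, then
# back-transform. Study data below is invented for illustration.
import math

def pooled_correlation(studies):
    """studies: list of (r, n) pairs, one per study."""
    num = den = 0.0
    for r, n in studies:
        z = math.atanh(r)        # Fisher z-transform
        w = n - 3                # inverse-variance weight
        num += w * z
        den += w
    return math.tanh(num / den)  # back-transform the weighted mean z

r_pooled = pooled_correlation([(0.30, 50), (0.45, 100), (0.20, 30)])
```

    Larger studies pull the pooled estimate toward their own correlations, which is exactly the behavior the inverse-variance weights are meant to produce.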

    A help query for ??RcmdrPlugin reveals the following information, which can be quite overwhelming given that almost 20 plugins are now available:

    RcmdrPlugin.DoE::DoEGlossary
    Glossary for DoE terminology as used in
    RcmdrPlugin.DoE
    RcmdrPlugin.DoE::Menu.linearModelDesign
    RcmdrPlugin.DoE Linear Model Dialog for
    experimental data
    RcmdrPlugin.DoE::Menu.rsm
    RcmdrPlugin.DoE response surface model Dialog
    for experimental data
    RcmdrPlugin.DoE::RcmdrPlugin.DoE-package
    R-Commander plugin package that implements
    design of experiments facilities from packages
    DoE.base, FrF2 and DoE.wrapper into the
    R-Commander
    RcmdrPlugin.DoE::RcmdrPlugin.DoEUndocumentedFunctions
    Functions used in menus
    RcmdrPlugin.doex::ranblockAnova
    Internal RcmdrPlugin.doex objects
    RcmdrPlugin.doex::RcmdrPlugin.doex-package
    Install the DOEX Rcmdr Plug-In
    RcmdrPlugin.EHESsampling::OpenSampling1
    Internal functions for menu system of
    RcmdrPlugin.EHESsampling
    RcmdrPlugin.EHESsampling::RcmdrPlugin.EHESsampling-package
    Help with EHES sampling
    RcmdrPlugin.Export::RcmdrPlugin.Export-package
    Graphically export objects to LaTeX or HTML
    RcmdrPlugin.FactoMineR::defmacro
    Internal RcmdrPlugin.FactoMineR objects
    RcmdrPlugin.FactoMineR::RcmdrPlugin.FactoMineR
    Graphical User Interface for FactoMineR
    RcmdrPlugin.IPSUR::IPSUR-package
    An IPSUR Plugin for the R Commander
    RcmdrPlugin.MAc::RcmdrPlugin.MAc-package
    Meta-Analysis with Correlations (MAc) Rcmdr
    Plug-in
    RcmdrPlugin.MAd::RcmdrPlugin.MAd-package
    Meta-Analysis with Mean Differences (MAd) Rcmdr
    Plug-in
    RcmdrPlugin.orloca::activeDataSetLocaP
    RcmdrPlugin.orloca: A GUI for orloca-package
    (internal functions)
    RcmdrPlugin.orloca::RcmdrPlugin.orloca-package
    RcmdrPlugin.orloca: A GUI for orloca-package
    RcmdrPlugin.orloca::RcmdrPlugin.orloca.es
    RcmdrPlugin.orloca.es: Una interfaz grafica
    para el paquete orloca
    RcmdrPlugin.qcc::RcmdrPlugin.qcc-package
    Install the Demos Rcmdr Plug-In
    RcmdrPlugin.qual::xbara
    Internal RcmdrPlugin.qual objects
    RcmdrPlugin.qual::RcmdrPlugin.qual-package
    Install the quality Rcmdr Plug-In
    RcmdrPlugin.SensoMineR::defmacro
    Internal RcmdrPlugin.SensoMineR objects
    RcmdrPlugin.SensoMineR::RcmdrPlugin.SensoMineR
    Graphical User Interface for SensoMineR
    RcmdrPlugin.SLC::Rcmdr.help.RcmdrPlugin.SLC
    RcmdrPlugin.SLC: A GUI for slc-package
    (internal functions)
    RcmdrPlugin.SLC::RcmdrPlugin.SLC-package
    RcmdrPlugin.SLC: A GUI for SLC R package
    RcmdrPlugin.sos::RcmdrPlugin.sos-package
    Efficiently search R Help pages
    RcmdrPlugin.steepness::Rcmdr.help.RcmdrPlugin.steepness
    RcmdrPlugin.steepness: A GUI for
    steepness-package (internal functions)
    RcmdrPlugin.steepness::RcmdrPlugin.steepness
    RcmdrPlugin.steepness: A GUI for steepness R
    package
    RcmdrPlugin.survival::allVarsClusters
    Internal RcmdrPlugin.survival Objects
    RcmdrPlugin.survival::RcmdrPlugin.survival-package
    Rcmdr Plug-In Package for the survival Package
    RcmdrPlugin.TeachingDemos::RcmdrPlugin.TeachingDemos-package
    Install the Demos Rcmdr Plug-In

     

    Common Analytical Tasks


     

    Some common analytical tasks from the diary of the glamorous life of a business analyst:

    1) removing duplicates from a dataset based on certain key values/variables
    2) merging two datasets based on a common key/variable/s
    3) creating a subset based on a conditional value of a variable
    4) creating a subset based on a conditional value of a time-date variable
    5) changing format from one date time variable to another
    6) computing means grouped or classified at a level of aggregation
    7) creating a new variable based on an if-then condition
    8) creating a macro to run the same program with different parameters
    9) creating a logistic regression model and scoring a dataset
    10) transforming variables
    11) checking roc curves of model
    12) splitting a dataset for a random sample (repeatable with random seed)
    13) creating a cross tab of all variables in a dataset with one response variable
    14) creating bins or ranks from a certain variable value
    15) graphically examine cross tabs
    16) histograms
    17) plot(density())
    18) creating a pie chart
    19) creating a line graph, creating a bar graph
    20) creating a bubbles chart
    21) running a goal seek kind of simulation/optimization
    22) creating a tabular report for multiple metrics grouped for one time/variable
    23) creating a basic time series forecast

    and some case studies I could think of-

     

    As the Director, Analytics you have to examine current marketing efficiency as well as help optimize sales force efficiency across various channels. In addition you have to examine multiple sales channels including inbound telephone, outgoing direct mail, internet email campaigns. The datawarehouse is an RDBMS but it has multiple data quality issues to be checked for. In addition you need to submit your budget estimates for next year’s annual marketing budget to maximize sales return on investment.

    As the Director, Risk you have to examine the overdue mortgages book that your predecessor left you. You need to optimize collections and minimize fraud and write-offs, and your efforts would be measured in maximizing profits from your department.

    As a social media consultant you have been asked to maximize social media analytics and social media exposure for your client. You need to create a mechanism to report particular brand keywords, as well as automated triggers between unusual web activity and statistical analysis of the website analytics metrics. Above all, it needs to be set up in an automated reporting dashboard.

    As a consultant to a telecommunication company you are asked to monitor churn and review the existing churn models. Also you need to maximize advertising spend on various channels. The problem is there are a large number of promotions always going on, some of the data is either incorrectly coded or there are interaction effects between the various promotions.

    As a modeller you need to do the following-
    1) Check ROC and H-L curves for existing model
    2) Divide dataset in random splits of 40:60
    3) Create multiple aggregated variables from the basic variables

    4) run regression again and again
    5) evaluate statistical robustness and fit of model
    6) display results graphically
    All these steps can be broken down into little pieces of code – something I am putting together a list of.
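    Steps 1 and 2 can be sketched with nothing but the standard library: a seeded, repeatable 40:60 split, and ROC AUC computed from model scores via its rank (Mann-Whitney) interpretation. This is an illustrative sketch under invented data, not production code; in practice a library routine would do both.

```python
# Step 2: a repeatable 40:60 random split driven by a fixed seed.
# Step 1 (ROC): AUC as the probability that a random positive case
# scores higher than a random negative one (ties count half).
import random

def split_40_60(data, seed=42):
    rng = random.Random(seed)      # same seed -> same split every time
    shuffled = data[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * 0.4)
    return shuffled[:cut], shuffled[cut:]

def roc_auc(labels, scores):
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

train, test = split_40_60(list(range(10)))
auc = roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.35, 0.1])
```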
    Are there any common data analysis tasks that you think I am missing, or any common case studies? Let me know.