Building a Regression Model in R – Use #Rstats

One of the most commonly used uses of Statistical Software is building models, and that too logistic regression models for propensity in marketing of goods and services.

 

If building a model is what you do-here is a brief easy essay on  how to build a model in R.

1) Packages to be used-

For smaller datasets

use these

  1. CAR Package http://cran.r-project.org/web/packages/car/index.html
  2. GVLMA Package http://cran.r-project.org/web/packages/gvlma/index.html
  3. ROCR Package http://rocr.bioinf.mpi-sb.mpg.de/
  4. Relaimpo Package
  5. DAAG package
  6. MASS package
  7. Bootstrap package
  8. Leaps package

Also see

http://cran.r-project.org/web/packages/rms/index.html or RMS package

rms works with almost any regression model, but it was especially written to work with binary or ordinal logistic regression, Cox regression, accelerated failure time models, ordinary linear models, the Buckley-James model, generalized least squares for serially or spatially correlated observations, generalized linear models, and quantile regression.

For bigger datasets also see Biglm http://cran.r-project.org/web/packages/biglm/index.html and RevoScaleR packages.

http://www.revolutionanalytics.com/products/enterprise-big-data.php

2) Syntax

  1. outp=lm(y~x1+x2+xn,data=dataset) Model Eq
  2. summary(outp) Model Summary
  3. par(mfrow=c(2,2)) + plot(outp) Model Graphs
  4. vif(outp) MultiCollinearity
  5. gvlma(outp) Heteroscedasticity using GVLMA package
  6. outlierTest (outp) for Outliers
  7. predicted(outp) Scoring dataset with scores
  8. anova(outp)
  9. > predict(lm.result,data.frame(conc = newconc), level = 0.9, interval = “confidence”)

 

For a Reference Card -Cheat Sheet see

http://cran.r-project.org/doc/contrib/Ricci-refcard-regression.pdf

3) Also read-

http://cran.r-project.org/web/views/Econometrics.html

http://cran.r-project.org/web/views/Robust.html

 

Analytics 2011 Conference

From http://www.sas.com/events/analytics/us/

The Analytics 2011 Conference Series combines the power of SAS’s M2010 Data Mining Conference and F2010 Business Forecasting Conference into one conference covering the latest trends and techniques in the field of analytics. Analytics 2011 Conference Series brings the brightest minds in the field of analytics together with hundreds of analytics practitioners. Join us as these leading conferences change names and locations. At Analytics 2011, you’ll learn through a series of case studies, technical presentations and hands-on training. If you are in the field of analytics, this is one conference you can’t afford to miss.

Conference Details

October 24-25, 2011
Grande Lakes Resort
Orlando, FL

Analytics 2011 topic areas include:

Browser Based Model Creation

Here are some fabulous applications at http://yeroon.net – if you are in the field of data and / or analytics you should try and dekko this site- it is created by UCLA’s department of statistics.

You can create stockplots ( something similar to based to Yahoo and Google finance which I have covered earlier)

or create ggplot visualizations

or create a linear model

Just using a browser to upload the dataset and thats all the hard/soft ware you need .

Note the background uses R. It would be interesting if companies like Revolution R, SAS and SPSS can do in this browser based computing ( maybe charge like Amazon Ec2 apis)

Kudos and credits to http://www.stat.ucla.edu/~jeroen/