More Advanced SAS Modeling Procs

A special thanks to Peter Flom for suggesting the following:


The models fit by PROC NLMIXED can be viewed as generalizations of the random coefficient models fit by the MIXED procedure. This generalization allows the random coefficients to enter the model nonlinearly, whereas in PROC MIXED they enter linearly. With PROC MIXED you can perform both maximum likelihood and restricted maximum likelihood (REML) estimation, whereas PROC NLMIXED implements only maximum likelihood. This is because the analog to the REML method in PROC NLMIXED would involve a high-dimensional integral over all of the fixed-effects parameters, and this integral is typically not available in closed form. Finally, PROC MIXED assumes the data to be normally distributed, whereas PROC NLMIXED enables you to analyze data that are normal, binomial, or Poisson, or that have any likelihood programmable with SAS statements.
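To make the idea of a random effect entering nonlinearly more concrete, here is a minimal numeric sketch in Python (not SAS code). It evaluates the marginal likelihood of one cluster in a logistic model with a normal random intercept. The function names, the data, the parameter values, and the simple midpoint integration rule are all assumptions chosen for illustration; the sketch does not reproduce how PROC NLMIXED itself carries out the integration.

```python
import math

def normal_pdf(u, variance):
    """Density of a N(0, variance) random effect at u."""
    return math.exp(-u * u / (2.0 * variance)) / math.sqrt(2.0 * math.pi * variance)

def conditional_likelihood(ys, xs, b0, b1, u):
    """Bernoulli likelihood of one cluster given its random intercept u.

    The random intercept enters through the inverse logit, i.e. nonlinearly.
    """
    lik = 1.0
    for y, x in zip(ys, xs):
        p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x + u)))
        lik *= p if y == 1 else 1.0 - p
    return lik

def marginal_likelihood(ys, xs, b0, b1, variance, n_points=2000, width=8.0):
    """Integrate the random intercept out with a midpoint rule.

    The grid covers +/- `width` standard deviations of the random effect.
    """
    sd = math.sqrt(variance)
    step = 2.0 * width * sd / n_points
    total = 0.0
    for i in range(n_points):
        u = -width * sd + (i + 0.5) * step
        total += conditional_likelihood(ys, xs, b0, b1, u) * normal_pdf(u, variance) * step
    return total

# Hypothetical cluster: three binary outcomes with one covariate.
lik = marginal_likelihood([1, 0, 1], [0.0, 1.0, 2.0], b0=-0.5, b1=0.8, variance=1.5)
print(lik)
```

Maximum likelihood estimation would multiply such marginal likelihoods over all clusters and maximize the result over the fixed effects and the variance. There is only a one-dimensional integral here because the model has a single random effect.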

6) Proc Glimmix

PROC GLIMMIX fits statistical models to data with correlations or nonconstant variability and where the response is not necessarily normally distributed. These generalized linear mixed models (GLMM), like linear mixed models, assume normal (Gaussian) random effects. Conditional on these random effects, data can have any distribution in the exponential family. The binary, binomial, Poisson, and negative binomial distributions, for example, are discrete members of this family. The normal, beta, gamma, and chi-square distributions are representatives of the continuous distributions in this family.

Some PROC GLIMMIX features are:

  • Flexible covariance structures for random effects and correlated errors
  • Programmable link and variance functions
  • Bias-adjusted empirical covariance estimators
  • Univariate and multivariate low-rank smoothing
  • Joint modeling for multivariate data

Besides including performance enhancements and various fixes, the production release of the GLIMMIX procedure provides numerous additional features. These include:

  • ODS statistical graphics to display LS-means and confidence limits
  • Analysis of Means
  • Odds ratios
  • Custom hypotheses concerning LS-means with the LSMESTIMATE statement
  • New multiplicity adjustments
  • Beta regression


Ordinary least squares regression models the relationship between one or more covariates X and the conditional mean of the response variable Y given X=x. Quantile regression extends the regression model to conditional quantiles of the response variable, such as the 90th percentile. Quantile regression is particularly useful when the rate of change in the conditional quantile, expressed by the regression coefficients, depends on the quantile. The main advantage of quantile regression over least squares regression is its flexibility for modeling data with heterogeneous conditional distributions. Data of this type occur in many fields, including biomedicine, econometrics, and ecology.
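The contrast between modeling a mean and modeling a quantile can be sketched numerically in Python (not SAS code). The example below fits an intercept-only model by minimizing the check (pinball) loss, which is the standard objective behind quantile regression. The data and function names are made up for illustration, and the brute-force search over the data points stands in for the estimation algorithms a real procedure would use.

```python
def check_loss(residual, tau):
    """Check (pinball) loss for one residual at quantile level tau."""
    return residual * (tau - (1 if residual < 0 else 0))

def total_check_loss(values, candidate, tau):
    """Sum of check losses when every observation is predicted by `candidate`."""
    return sum(check_loss(v - candidate, tau) for v in values)

def best_constant(values, tau):
    """Intercept-only quantile fit.

    A minimizer of the total check loss always exists among the data
    points, so searching over them is enough for this small sketch.
    """
    return min(sorted(values), key=lambda c: total_check_loss(values, c, tau))

# Made-up response values with one very large observation.
y = [1, 2, 3, 4, 100]

mean_fit = sum(y) / len(y)          # what least squares would estimate
median_fit = best_constant(y, 0.5)  # conditional median (tau = 0.5)
upper_fit = best_constant(y, 0.9)   # 90th percentile (tau = 0.9)

print(mean_fit, median_fit, upper_fit)
```

The large observation pulls the least squares fit far from the bulk of the data, while the median fit is unaffected and the upper quantile fit tracks the tail directly. With covariates, the same loss is minimized over regression coefficients instead of a single constant.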

Some PROC QUANTREG features are:

  • Implements the simplex, interior point, and smoothing algorithms for estimation
  • Provides three methods to compute confidence intervals for the regression quantile parameter: sparsity, rank, and resampling
  • Provides two methods to compute the covariance and correlation matrices of the estimated parameters: an asymptotic method and a bootstrap method
  • Provides two tests for the regression parameter estimates: the Wald test and a likelihood ratio test
  • Uses robust multivariate location and scale estimates for leverage point detection
  • Multithreaded for parallel computing when multiple processors are available

4) Proc Catmod

Categorical data with more than two factors are referred to as multi-dimensional distributions. PROC CATMOD is used to analyze such data. It can also be used to analyze one- and two-way data structures; however, it is an effective means of approaching more complex data structures.

PROC CATMOD uses a different technique for categorical analysis than the ‘Pearson type’ chi-square. The analysis is based on a transformation of the cell probabilities, called the response function. The exact form of the response function depends on the data type and is normally motivated by theoretical considerations. SAS offers many different forms of response functions and even allows users to specify their own; however, the most common (and the default) is the generalized logit. This function is defined as:

Generalized Logit = LOG(p_i / p_k),
where p_i is the ith cell probability and p_k is the last (kth) cell probability. The ratio p_i / p_k is the odds of the ith category relative to the last, so its log is just a comparison of the ith category to the last on a log scale. The logit can be rewritten as:
Generalized Logit = LOG(p_i) - LOG(p_k).
Note that if there are k categories, there are only k-1 response functions, since the kth one, LOG(p_k / p_k), is always zero.
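
As a quick check of the arithmetic, the generalized logits can be computed by hand. The following is a small Python sketch (not SAS code) with hypothetical cell probabilities; the function name is an assumption made for illustration.

```python
import math

def generalized_logits(probs):
    """Return the k-1 generalized logits, using the last category as reference."""
    if any(p <= 0 for p in probs):
        raise ValueError("all cell probabilities must be positive")
    ref = probs[-1]
    # LOG(p_i) - LOG(p_k) for every category except the last one.
    return [math.log(p) - math.log(ref) for p in probs[:-1]]

# Hypothetical cell probabilities for a three-category response.
probs = [0.5, 0.3, 0.2]
logits = generalized_logits(probs)
print(logits)
```

Three categories give two response functions: the log of 0.5/0.2 and the log of 0.3/0.2. The third would be the log of 0.2/0.2, which is zero and therefore carries no information.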
