SAS Fun: Sudoko

Here is a SAS program to help you beat others at Sudoko, and impress people. It was written by a chap named Ryan Howard in 2006, and I am thankful for him in allowing me in sharing this.You can let us know if you find a puzzle it could not solve , or if you tweak the program a bit. The code is pasted below.

Have fun !

And the SAS paper on this was at SAS Global Forum 2007- the resulting
paper, “SAS and Sudoku”, was written by Richard DeVenezia, John Gerlach,
Larry Hoyle, Talbot Katz and Rick Langston, and can be viewed at
http://www2.sas.com/proceedings/forum2007/011-2007.pdf.

(p.s. I haven’t tested this on WPS , they still dont have the SAS Macro language ,but let me know if you have any equivalent in SPSS or R 🙂   )

*=============================================================================;
* sudoku.sas                                                                  ;
* Written by: Ryan Howard                                                     ;
* Date: Sept. 2006                                                            ;
*-----------------------------------------------------------------------------;
* Summary: This program solves sudoku puzzles consisting of a 9X9 matrix.     ;
*-----------------------------------------------------------------------------;
* Upgrade Ideas:  1. Add a GUI to collect the input numbers and display output;
*                 2. Expand logic to work for 16X16 matrices                  ;
*=============================================================================;

title;
options nodate nonumber;

data _null_;

    *-----------------------------------------------------------------------------;
    * input  inital values for each cell from puzzle                              ;
    *-----------------------------------------------------------------------------;

    _1111=9; _1112=.; _1113=.;   _1211=.; _1212=.; _1213=.;   _1311=1; _1312=.; _1313=.;
    _1121=5; _1122=.; _1123=.;   _1221=.; _1222=6; _1223=.;   _1321=.; _1322=4; _1323=2;
    _1131=.; _1132=.; _1133=.;   _1231=7; _1232=1; _1233=.;   _1331=5; _1332=.; _1333=.;

    _2111=.; _2112=.; _2113=2;   _2211=.; _2212=.; _2213=.;   _2311=.; _2312=1; _2313=.;
    _2121=.; _2122=3; _2123=.;   _2221=.; _2222=.; _2223=.;   _2321=2; _2322=9; _2323=.;
    _2131=.; _2132=7; _2133=.;   _2231=.; _2232=.; _2233=6;   _2331=.; _2332=.; _2333=3;

    _3111=.; _3112=2; _3113=.;   _3211=.; _3212=.; _3213=8;   _3311=.; _3312=.; _3313=.;
    _3121=.; _3122=.; _3123=4;   _3221=5; _3222=.; _3223=.;   _3321=.; _3322=.; _3323=.;
    _3131=.; _3132=.; _3133=.;   _3231=.; _3232=3; _3233=.;   _3331=8; _3332=.; _3333=9;

    %macro printmatrix;
    *----------------------------------------------------------------;
    * print the result matrix                                        ;
    *----------------------------------------------------------------;
       *---------------------------------------------------;
       * Assign column positions for printing matrix       ;
       *---------------------------------------------------;
       c1=1;
       c2=10;
       c3=20;
 Continue reading "SAS Fun: Sudoko"

SAS Modeling Procs

Well, so you want to be a SAS Modeler. Or atleast get a job as a junior one , and then learn on the job (we all did). Here are some SAS Procs you need to brush up on-

1) Proc Reg – Continuous Regression.

2) Proc Logistic –Logistic Regression.

3) Proc Probit –Categorical regressors also included in this.

4) Proc GLM –General Linear Models based on OLS. PROC GLM handles models relating one or several continuous dependent variables to one or several independent variables. The independent variables may be either classification variables, which divide the observations into discrete groups, or continuous variables.Proc GLM is the preferred procedure for doing univariate analysis of variance , multivariate analysis of variance , and most types of regression. :Note there is a Proc Anova also.

5) Proc Mixed –The PROC MIXED was specifically designed to fit mixed effect models. It can model random and mixed effect data.PROC MIXED has three options for the method of estimation. They are: ML (Maximum Likelihood), REML (Restricted or Residual maximum likelihood, which is the default method) and MIVQUE0 (Minimum Variance Quadratic Unbiased Estimation). ML and REML are based on a maximum likelihood estimation approach. They require the assumption that the distribution of the dependent variable (error term and the random effects) is normal. ML is just the regular maximum likelihood method,that is, the parameter estimates that it produces are such values of the model parameters that maximize the likelihood function. REML method is a variant of maximum likelihood estimation; REML estimators are obtained not from maximizing the whole likelihood function, but only that part that is invariant to the fixed effects part of the linear model. In other words, if y = Xb + Zu + e, where Xb is the fixed effects part, Zu is the random effects part and e is the error term, then the REML estimates are obtained by maximizing the likelihood function of K’y, where K is a full rank matrix with columns orthogonal to the columns of the X matrix, that is, K’X = 0. I

6) Proc Genmod-PROC GENMOD uses a class statement for specifying categorical (classification) variables, so indicator variables do not have to be constructed in advance, as is the case with, for example, PROC LOGISTIC. Interactions can be fitted by specifying, for example, age*sex. The response variable or the explanatory variable can be character  while PROC LOGISTIC requires explanatory variables to be numeric.

7) Proc Corr-CORR procedure computes correlation coefficients between variables. It can also produce covariances.

8) Proc Anova-PROC ANOVA handles only balanced ANOVA designs

Required reading http://en.wikipedia.org/wiki/Regression_analysis

SAS Online Doc

Additional Reading-

http://www.pauldickman.com/teaching/sas/genmod_logistic.php

http://www.psych.yorku.ca/lab/sas/sasanova.htm

For beginners interested in software

1) For web development , get  into http://www.wordpress.org and its a pretty easy software to start making websites on.

You can maybe spend say 10 $ a month so that you can buy some server space on http://www.bluehost.com and tinker with his own website /blog in the meantime.

For learning language CSS ,PhP and HTML are the way to go.

2) If you knows some languages already, try  to make a Facebook application , and then play with Google’s open social API,or game, as that will get his interest besides giving him a skill thats useful. Ipod developer’s kit is another hot area to experiment.

3)For designing software solutions I would recommend the Microsoft Certification program. Try  to learn 1 language like Visual Basic or into .Net programming.These platforms will still be useful in coming years.

4) For statistical/business software try  to learn a language called R, which is good for data mining (www.r-project.org) . its quite easy to learn and has a good graphical user interface too.

5)For software careers it is best to learn multiple types of softwares to hedge your bets.

For sustaining interest, you  can join and network with fellow programmers using bulletin boards especially boards on http://www.google.com for google code and microsoft developers area..

6) You can also download Ubuntu linux (www.ubuntu.com) , which is a free Linux based Operation System (like Windows) and  be more familiar in it. Also add openoffice from http://www.openoffice.org This gives you perspectives on open source software.

7)I recommend him getting summer internship in a software startup (especially any software company in Silicon valley or Bangalore) ,and with established companies (like http://www.google.com , http://www.facebook.com,www.infosys.com

Software developers are the un-sung heroes of today’s modern world!!!

All the best !

Easy Ways to Secure Network Data

 Easy ways to secure network data without letting your IT team into fooling you in more servers  or certifications than you need.

1 User login passwords can be cracked and even the encryption will eventually need a password too. Most people use rather easy to crack passwords anyways.

2  you can use or even insist on the password feature within office documents , and within zip documents, and within outlook pst files.

The actual practicality is that people rarely keep track of multiple document passwords, and once a password is known /guessed , it compromises the whole system ..say for an ex employee,keyboard loggers, other ways to read data directly from the hard disk etc.

That cant happen for encryption.

So I would first implement a strong password policy , which is the first step for any company. This means using special characters, characters,numbers and automatic changing of passwords after 1 month.

3 Also laptops should have desk locks provided and compulsory before going away from the desk.

4 The next layer is encryption for data using private key/public keys and for login to the desktop/laptop .An inexpensive encryption solution is to use PGP (Pretty Good Privacy ) for encryption. You can also have open source free encryption softwares .

5 Another layer is have closed circuit cameras or motion trigger alarms in the office activated after say 6 pm or after office hours.

6 Implement multiple solutions using a test control approach on various PCs and then evaluate usage for 1 month before deciding with the big contract.

7 ISO 27001 or BS7799 and certifications help make clients comfortable, but do not enhance data security in any special way given the huge costs.

8 Have training videos for social networking used by hackers or people breaking in to system. Eg. Calling Board numbers for cell phone numbers

9 Try and eliminate as much paper as possible. Printouts, faxes etc. A compnay I know replaced all paper with blue paper just to impress clients. Same principles applied when guards were checking senior management bags. No searches etc.

This is also good for environment too (Use that for impressing clients !), and its better to buy bigger monitors or have an encrypted wireless lan than  have tonnes  of paper too.

All systems can and will be broken given time and resources to deviants. Using these steps reduces the ease and probability of laptop loss escalated to data loss in wrong hands.

Comparing Big SpreadSheet A to Big SpreadSheet B

Many organizations have pre-fixed formats for their reporting needs.  These formats or Management Information Reports are updated at monthly and quarterly intervals at exactly the same format. However when the spreadsheets become big, analysis becomes tedious in comparing two big spreadsheets due to the sheer number of cells involved.

Using SAS , we can automate this process almost instantly.

We will use proc import to import data from the spreadsheets in such a manner that top row imported consist column headings (sas dataset variables).Note both spreadsheets are exactly in same format.

We will then use proc compare to compare these two datasets.

We can then use the integrated approach to automated reporting in SAS (See Archives- Category Analytics) to further reduce this to a simple batch process.

The relevant codes are –

%let pathfile = “C:\Documents and Settings\” ;
run;

/*CREATING LIBRARY NAME */

libname auto &pathfile;

run;

/*TO CONSERVE SPACE*/

options compress=yes;

/*TO MAKE LOG READABLE */

options macrogen symbolgen;

PROC IMPORT OUT= auto.TEST1
DATAFILE= “C:\Documents and Settings\excel1-full.xls”
DBMS=EXCEL2000 REPLACE;
SHEET=”‘Sales$'”;

/*SPECIFYING WORKSHEET FOR MULTIPLE SHEETS */
GETNAMES=YES;

/*TO TAKE VARIABLE NAMES FROM TOP ROW */

   RANGE=”A4:AB2000″;

/*SPECIFYING RANGE OF CELLS  IN SPREADSHEET TO BE READ */

RUN;

PROC IMPORT OUT= auto.TEST2
DATAFILE= “C:\Documents and Settings\excel2-full.xls”
DBMS=EXCEL2000 REPLACE;
SHEET=”‘Sales$'”;

/*SPECIFYING WORKSHEET FOR MULTIPLE SHEETS */
GETNAMES=YES;

/*TO TAKE VARIABLE NAMES FROM TOP ROW */

   RANGE=”A4:AB2000″;

/*SPECIFYING RANGE OF CELLS  IN SPREADSHEET TO BE READ */

RUN;

/* COMPARING THE TWO SPREADSHEETS */

proc compare base=auto.test1 compare=auto.test2;
var

/*SPECIFYING WHAT VARIABLES TO BE COMPARED */
Applications

Approvals

Disbursals

30dayplus

60dayplus

90dayplus

;
with Branch;

/*SPECIFYING VARIABLE FOR COMPARISON

FOR SAME BRANCH IN THIS CASE */
 run;
The output will simply compare and compute the cell by cell difference.

You can then use ods to ouput this in another big spreadsheet 🙂

This is particularly relevant in telecommunications and banks, where they need to compare a lot of metrics across timely intervals.