A Simple SAS Code to Automate Reports

Organizations produce an increasing number of reports. Countless hours are spent manipulating spreadsheets one after another, moving data into and out of MS Access databases, and finally emailing the results to a select audience.

The following presents a simple, integrated approach to report generation and presentation. It uses macros for repeatability and flexibility. You can copy and paste the code directly into your SAS editor and make the relevant changes.

Steps in preparing a report

  • Import data from data sources (mostly spreadsheets and databases; here, assuming Base SAS only, csv files are considered)
  • Manipulate data for analysis (in the code below the only manipulation is removal of duplicates; this section can be changed as needed by the user, for example:
    • Frequency counts using Proc Freq.
    • Mean, maximum and minimum statistics using Proc Means.
    • Customized reports using the Proc Report procedure.)
  • Present and format the data.
  • Output the data in the form of a report/spreadsheet.
  • Email the report to internal customers.

———————————————————————————————————

Using BASE SAS, the following can be coded as:

     /*PROGRAM TO AUTOMATE REPORTING */

    /*DECLARING THE PATH OF THE INPUT/OUTPUT FOLDER */

%let pathfile = 'X:\Auto\AUG06-AUTOS\' ;

     /*CREATING LIBRARY NAME */

libname auto &pathfile;

run;

   /*TO CONSERVE SPACE*/

options compress=yes;

   /*TO MAKE LOG READABLE */

options macrogen symbolgen;

/* STARTING MACRO FOR REPEATABLE EXECUTION OF REPORT*/

%macro impmth(mth,num,emailid);

 /*MANIPULATING VARIABLES TO REDUCE CHANGES */

data _null_ ;

call symput('filepath',"'"||&pathfile||&mth||'.csv'||"'" );

call symput('unqpath',"'"||&pathfile||"unq"||&mth||'.csv'||"'" );

call symput('unqxls',"'"||&pathfile||"unq"||&mth||'.xls'||"'" );

run;

/*IMPORT*/

/*IMPORTING DATA FROM CSV FILES STORED IN FOLDER DECLARED ABOVE*/

PROC IMPORT OUT= auto.&num

DATAFILE= &filepath

DBMS=CSV REPLACE;

GETNAMES=YES;

DATAROW=2;

run;

/*CREATING VARIABLE FOR DISTINGUISHING INPUT*/

data auto.&num.;

set auto.&num.;

&num = 1;

run;

———————————————————————–

/*DATA MANIPULATION HERE REMOVING DUPLICATES*/

proc sort data=auto.&num nodupkey;

by REFAGREEMENTID;

run;

———————————————————————–

/*EXPORT*/

/*EXPORTING DATA TO A CSV FILE*/

PROC EXPORT DATA= AUTO.&num

OUTFILE= &unqpath

DBMS=CSV REPLACE;

RUN;

/*EXPORTING DATA TO A XLS FILE*/

ODS HTML FILE=&unqxls;

proc print data=auto.&num.;

run;

ODS HTML CLOSE;

/*EMAILING THE RESULTS*/

filename outbox email &emailid;

data _null_ ;

file outbox

to=(&emailid)/* Overrides value in filename statement */

cc=(&emailid)

subject='File from'

attach=(&unqxls)

;

put 'Hi,';

put 'This is an integrated approach to automating reports';

put 'It works great!';

run;

%mend;

/*CALLING THE MACRO*/

/*HERE 'Ahmedabad' IS THE INPUT CSV FILE NAME WHILE Ahd IS THE OUTPUT DATASET NAME*/

/*HERE 'aohri@hotmail.com' IS THE EMAIL ID TO BE MAILED THE REPORT*/

%impmth('Ahmedabad',Ahd,'aohri@hotmail.com');
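Because the logic is wrapped in a macro, running the report for another city's file is just another call; the city names and dataset names below are hypothetical placeholders:

/*HYPOTHETICAL ADDITIONAL CALLS, ONE PER INPUT CSV FILE*/

%impmth('Mumbai',Mum,'aohri@hotmail.com');

%impmth('Delhi',Del,'aohri@hotmail.com');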

Note: This was written on version 8 of SAS, so let me know of any issues.

Ajay 

Basic Text Mining: 3 Simple Paths


Text mining, in which you search alphanumeric data for meaningful patterns, is more complex than crunching plain numeric variables. The reason is that the human eye can scan only a few hundred rows of data before tiring, and analytics software algorithms need to be properly programmed or they will miss the relevant text. An example: how many Punjabis live in Delhi (stats needed)? Suppose you have a dataset of all the names in Delhi and want to send an SMS contest (a marketing decision) on Lohri (a Punjabi-specific festival).

Text manipulation can be done with the TRIM and LOWER functions in Excel and the corresponding functions in SAS. For mining, use one of the following options:

1) SAS Basic Text Mining - Using Only Base SAS

In SAS you can use the INDEXW function for text mining.

As per the SAS OnlineDoc:

INDEXW(source, excerpt)

Arguments

source
specifies the character expression to search.
excerpt
specifies the string of characters to search for in the character expression. SAS removes the leading and trailing blanks from excerpt.

The INDEXW function searches source, from left to right, for the first occurrence of excerpt and returns the position in source of the substring's first character. If the substring is not found in source, INDEXW returns a value of 0. If there are multiple occurrences of the string, INDEXW returns only the position of the first occurrence.
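As a minimal sketch of the Lohri example above, the step below uses INDEXW to flag names containing the word 'Singh'. The dataset delhi_names, the variable name and the search word are hypothetical placeholders:

/*HYPOTHETICAL SKETCH: FLAG NAMES THAT CONTAIN THE WORD 'SINGH'*/

data flagged;
set delhi_names;
/*INDEXW returns the starting position of the word, or 0 if it is absent*/
if indexw(upcase(name),'SINGH') > 0 then punjabi_flag = 1;
else punjabi_flag = 0;
run;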

2) MS EXCEL

You can use MS Excel for text mining too. I recommend Office 2007 simply because it can handle more rows.

The function to use in Excel is SEARCH, which returns the starting position of one text string within another and is not case sensitive.


3) MS ACCESS

In MS Access you can use LIKE queries to create a different table or append a value to certain columns. Example:

Some problems can't be solved with simple comparisons, e.g. "His name begins with Mc or Mac." In a case like this, wildcards are required; in SQL they are represented with the % sign and the LIKE keyword (note that Access's own query designer uses * as its default wildcard character).

e.g.

SELECT au_lname, city

FROM authors

WHERE au_lname LIKE 'Mc%' OR au_lname LIKE 'Mac%'

UPDATE: The above post is now obsolete; there are easier and better ways to do text mining, including Weka and R.

The importance of measuring Error

Most reporting systems tend to focus on the positive: sales targeted, sales achieved, etc.

For quality reasons, it is important to measure variance or deviation from the mean. This can also be interpreted as measuring error for any planning, forecasting or performance measurement exercise.

It helps bring focus by aiming to reduce the difference between planned/forecasted and actual performance, for example by looking at the mean absolute percentage error (MAPE). It is also important to have estimates that increasingly move from qualitative metrics or scale variables to quantitative inputs for forecasting or planning.
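A minimal SAS sketch of the mean absolute percentage error calculation is below; the dataset plan_vs_actual and its variables actual and forecast are hypothetical, and actual is assumed to be non-zero:

/*HYPOTHETICAL SKETCH: MEAN ABSOLUTE PERCENTAGE ERROR (MAPE)*/

data _null_;
set plan_vs_actual end=last;
ape + abs(actual - forecast)/abs(actual); /*running sum of absolute percentage errors*/
if last then do;
   mape = 100*ape/_n_; /*average over all observations, as a percentage*/
   put 'MAPE (%) = ' mape 8.2;
end;
run;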

The forecasts should be realistic and based on data. Unfortunately, the forecasted pleasure or displeasure of the boss or end audience receiving the estimate tends to be the biggest practical input in planning exercises.

That is why analytics is sometimes a corporate cultural thing as well.

1) Forecast honestly

2) Use as many quantitative inputs as possible

3) Plan scenario analysis with varying probabilities

4) Measure error from forecast and actual in a feedback loop

5) Aim at reducing error.

This is what Six Sigma would call:

Define

Measure

Analyze

Improve

Control

This applies to decision making and planning at most organizations, where data and information too often end up in floating spreadsheets or with gatekeepers and old-timers.

What is Analytics ?

Database mining and analytics are defined as using the power of hidden information locked in databases to reveal consumer and product insights, trends and patterns for future tactical and operational strategy. A key differentiator between analytics and market research is that analytics relies on data which is existing within a database while market research generally involves collection, collation and tabulation of the data.

Business intelligence is defined as the seamless dissemination of information throughout the organization and is a broader term, which involves and includes analytics as well as reporting systems.

The field of data analytics is vast and it comprises the following types.

Types of Analytics

  1. Reporting or Descriptive Analytics - Each organization relies on a series of management information systems (commonly called MIS) to gather the current state of business as well as any emerging trend. This typically involves sales, finance, customer and competitor data, which is presented within spreadsheets and presentations. Reports tend to be either regular (like monthly and quarterly) or ad-hoc (for special investigative analysis). This is known as descriptive analytics simply because it describes the data which is present. While reporting or descriptive analytics is often the starting point in analytics careers, a proper grounding in this domain is necessary both to build an eye for detail in dealing with large amounts of data and to polish presentation skills for conveying insights from the data.

  2. Modeling or Predictive Analytics - Predictive analytics refers to the art and science of using statistical tests, hypotheses and methods to build predictive recommendations. These recommendations can range from which type of customer to call by phone for a credit card or insurance offer, to which type of mobile scheme to offer a cell phone customer by short message (SMS), to which kinds of customers are likely to default on the loans they have taken. Predictive analytics includes techniques like segmentation and regression modeling. It is generally considered high value, and a background in statistics helps in preparing for predictive analytics careers.

  3. Data-Driven Strategy - This is also called test-control or champion-challenger testing. It is done by segmenting the data population into test (on which a new strategy, called the challenger strategy, is tried) and control (which uses the existing strategy, called the champion strategy); see the sketch below. Building association rules that describe which parts of the product or customer data cluster or correlate together is also part of analytics.
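A minimal SAS sketch of such a random test/control split follows; the dataset customers and the random seed are hypothetical:

/*HYPOTHETICAL SKETCH: RANDOM TEST (CHALLENGER) / CONTROL (CHAMPION) SPLIT*/

data test control;
set customers;
if ranuni(12345) < 0.5 then output test; /*challenger strategy group*/
else output control;                     /*champion strategy group*/
run;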

Basic Domains within Analytics

Data-driven analytics by definition thrives in industries that have large amounts of data and high-volume transactions, which need systematic and scientific analysis to cut costs and grow sales. The following domains offer employment opportunities to both newcomers and experienced analytics professionals, whether in domestic firms, captive outsourcing firms or third-party business process outsourcing companies.

  1. Retail Sales Analytics

Retail sales analytics deals with handling vast amounts of point-of-sale data, inventory data, payment data and promotional data to help increase sales in retail stores, especially in organized retail. The use of RFIDs, electronic payment and bar-code scanning helps capture the data better and store it in vast databases. An example of this is the famous Thursday baby diaper-beer sales phenomenon. A big retailer found that on Thursday evenings sales of beer and baby diapers were highly correlated. It then found that this was due to young couples preparing for the weekend by buying supplies of diapers and beer. Thus, by placing diapers and beer closer together, sales could be boosted. This is an example of market basket analytics, in which a large amount of data is scrutinized to see which products sell well together (a minimal SAS sketch of such a two-item check appears after this list). Wal-Mart, the American retail giant, established a competitive edge over its rivals by proactively using data-driven analytics to cut costs and thus offer goods cheaper than others. Another example of a big retailer is Target, which has its own captive back-end analytics in India.

In India, since Reliance Retail, Future Group and Walmart-Bharti have started setting up shop, this is a sector that is bound to grow even within the domestic market, as these high-volume retailers need data-driven decisions to squeeze the maximum from their retail stores.

  2. Financial Services Analytics

Financial services use analytics extensively. This is because they are in a very competitive field, have millions of customers and a lot of transactions. It is extremely important for them to store data for billing purposes and to recover the money they lend out as well as the deposits they collect. An incremental gain of a few basis points (one hundredth of 1 percent is a basis point) in profitability can lead to millions of dollars in aggregate profits. Within financial services analytics the broad sub-categories are:

    1. Risk and Credit Analytics - Risk and credit functions measure the ability of a customer to pay back loans or other debt owed. Delinquent customers are those that have fallen behind in paying back debt as per the agreed schedule. Debt can be fixed-installment, like the EMIs of a personal loan, or revolving, as in the variable amounts that can be paid against a credit card outstanding, down to the minimum balance. Debt can also be secured (against houses, consumer durables, two-wheelers or automobiles as collateral) or unsecured (as in personal loans or credit card debt with no collateral backing). A risk analyst develops scorecards that help measure the risk-worthiness of both new and existing customers. As financial service instruments are priced against risk, the riskier the customer, the more they are charged in terms of interest rate. This has to be balanced with the total repaying ability of the customer, including sources of income and current leverage. In addition, the income of customers, especially in India, is changing rapidly and there is also undeclared income ('black money').

Doing this analysis for millions of customers is what makes risk and credit analytics one of the hottest sectors to be in, as credit analysts are in demand with all banks, outsourcing companies and finance companies. ICICI has a big analytics unit (called the Business Intelligence Unit) and Citigroup has both domestic analytics (in Chennai) and international analytics centers (in Bangalore).

    2. Marketing Analytics - Marketing analytics helps in customer acquisition and retention. It does so by helping choose more responsive customers and by selling through a wide variety of channels like call centers, direct mail, SMS to mobiles, and email. It is marketing analytics that helps bring in new customers by giving inputs to the marketing team and feedback to sales and distribution channels.

    3. Collections - Collections analytics focuses on recovery from delinquent customers using optimized efforts like telephone calls, direct mail, email, or visits. Its aim is to maximize recovery at minimum cost.

    4. Fraud - Fraud analytics seeks to build in triggers or automated alarms if there is any unusual trend or behavior in spending by the customer, especially on credit cards.

    5. Pricing - Pricing analytics tries to arrive at the most optimized price, adequately compensating for risk as well as the competition. Pricing analytics is a vast field and is also a part of financial services analytics, especially in products like insurance.

  3. Telecommunications - Telecom analytics covers the fields of marketing analytics defined above, but an important part is also attrition modeling or churn analytics. It also analyzes the wide variety of pricing schemes and options and the customer response to them. In addition, it includes delinquency analytics as well.

  4. Pharmaceutical or Clinical Analytics - Clinical trials depend on testing and control groups of thousands of patients on new drugs. Clinical trial analytics focuses on the large number of variables that may or may not affect the drug response.

  5. Supply Chain Analytics - Supply chain analytics comprises inventory optimization, tracking turn-around time, multiple reports, and minimizing distribution costs.

  6. Transportation Analytics - Transportation analytics, while covered more extensively in the field of operations research, seeks to minimize route length or fuel costs, or to optimize the pricing of fares.

  7. Online or Website Analytics - Website analytics focuses on analyzing which sources traffic to a website comes from, and on how to retain visitors for a longer time or get them to purchase more goods. It also involves a bit of search engine optimization to make sure the website ranks well in search engine results.
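As a minimal sketch of the market basket idea in the retail example above, a simple two-item check can be run as a cross-tabulation; the transaction-level dataset baskets and its 0/1 indicator variables diapers and beer are hypothetical:

/*HYPOTHETICAL SKETCH: TWO-ITEM MARKET BASKET CHECK*/

proc freq data=baskets;
tables diapers*beer / chisq; /*cross-tabulation with a chi-square test of association*/
run;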

Profits from Closed Customers


Closed Customers aren’t really closed; they stay on in your database.

Database marketing can help you win revenue even after a customer relationship is discontinued. This is illustrated by the following example. A prominent global financial services giant, with nearly 100 years of history, faced a unique problem while operating in India. While it had been one of the earliest entrants in the Indian credit card industry, it had rapidly been losing market share to newer, nimbler and more aggressive local competitors.

Indian customers have one of the lowest levels of debt worldwide, due to a cultural aversion to debt and the lack of competition in the pre-1990 era. The credit card receivables business in India was also a loss-making operation as of 2006, because of rampant competition and discounting on annual fees and charges. Lending in India is complicated because the credit bureau CIBIL was only in its nascent stages, and the declared income and actual income of people varied due to the tax laws and the 'black money' economy.

However, the average receivables per card had been steadily increasing in India, and there was potential to make huge profits once Indian customers became comfortable with revolving a balance and paying finance charges. The credit card division also had a culture of conservative lending only to prime customers with a good track record. On the other hand, the company's personal loan business was making great strides in both revenue and profitability growth due to aggressive selling to both prime and sub-prime customers. As a result of this the company had built up a database of 3 million customers, out of which nearly 2 million had paid off their loans.

To improve the profitability of the credit card division and offer its customers a more value-added portfolio of financial services, the company embarked on a data mining project of cross-selling to its closed personal loan customers. After extensive tests and research based on selective tele-calling to its customer database, the company arrived at the following analytical findings:

1) Customers who had paid back their loans on time were good credit customers. These customers had also increased their income since the time they closed their personal loan.

2) People who had closed their personal loans were targeted for re-churning by the personal loans business. However, if a customer did not take another personal loan within 6 months of closing their loan, they were unlikely to ever take one. Thus, calling these customers again for a personal loan would be unprofitable, since the incremental expenses were not justified by the incremental revenues.

3) People who bounced cheques but paid off their entire loan were bad credit risks, especially for a revolving line of credit such as a credit card.

4) People who were called by the credit card division had better brand recall if they had an earlier relationship with the personal loans division. Since they had paid off their loans on time, their experiences with the company as a whole were very positive. This goodwill toward the company's brand helped trigger higher response ratios (almost 20% of such people took the credit card, compared to only 5% for the general population).

5) Due to regulatory reasons, both the credit card division and the personal loan division had to maintain an arm's length distance. In order to do so, the credit card division decided on a transfer price of 600 rupees plus 1% of average receivables, paid to the personal loan division. This helped track the profitability of the exercise better.

As a result of the exercise the company managed to sell an extra fifty thousand credit cards. The program was such a success that it was adopted worldwide. The personal loan division earned tens of millions of rupees from its closed customer database, and the credit card division managed to increase its market share slightly.
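As a rough back-of-the-envelope check using the figures above: the 600-rupee transfer price alone on fifty thousand cards works out to 50,000 x 600 = 30 million rupees, before adding the 1% of average receivables, which is consistent with the personal loan division earning tens of millions of rupees.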

Thus mining its own database of customers helped the company achieve the following:

a) Increase profitability
b) Improve brand recall and enhance the existing relationship
c) Cut down on marketing costs by targeting more responsive customers
d) Improve the lifetime value of revenues earned from each customer

This article builds an argument for using internal data at a customer level to decrease marketing costs and enhance brand recall.

11 Analytics Softwares

1) R

R is open source, is similar to S, and you can find ample support groups online. The only downside is that R does not have a good GUI (yet). It does have a rudimentary data GUI.

http://www.r-project.org/

2) WEKA

http://www.cs.waikato.ac.nz/ml/weka/index_downloading.html

3) Minitab

http://www.minitab.org

4) E Views

http://www.eviews.com

5) For Bayesian Inference

http://www.mrc-bsu.cam.ac.uk/bugs

6) Matlab

http://www.mathworks.com

7) Jim LeSage's econometrics toolbox

http://www.spatial-econometrics.com

8) Office 2007 and Excel 2007 (www.microsoft.com) 🙂

Office 2007 and Excel 2007 now support up to 1,048,576 rows. If you were going to make the upgrade anyhow, it might save you a few bucks over an expensive statistics package. The statistics plugin for Office 2007 is also a lot better than in previous versions, and runs almost as many analyses as SPSS.

9) JMP

http://www.sas.com/apps/demosdownloads/jmptrial_PROD_7.0_sysdep.jsp?packageID=000415&jmpflag=Y

10) www.sas.com

11) www.spss.com

5 Graphing Softwares for the Web

Here is a list of software apart from Adobe Flash and Microsoft Silverlight:

1) PHP offers the ability to dynamically create Shockwave Flash files.
In addition, you can try the tool named PHP/SWF Charts.
Link:
http://www.maani.us/charts/index.php

2) Another library for PHP is Open Flash Chart.
An article illustrating how to use it is here:
http://www.linux.com/feature/121823

3) If you want a nice lightweight charting facility for the web with a simple API, you could try the Google Charts API, which is free to use and incorporate into your own applications:
http://code.google.com/apis/chart/

4) Crystal Xcelsius

Industry-leading interactive data visualization.

Create interactive Excel dashboards, business presentations and visual calculators from ordinary spreadsheets, then integrate them into PowerPoint, Word, PDF and the Web.

5) Some other tools

  • http://www.flashactionscript.org
  • http://www.flashkit.com
  • http://www.fusioncharts.com/Default.asp
  • http://www.spreadsheetconverter.com/excel-web.htm