Doing RFM Analysis in R


RFM is a method used for analyzing customer behavior and defining market segments. It is commonly used in database marketing and direct marketing and has received particular attention in retail.


RFM stands for


  • Recency – How recently did the customer purchase?
  • Frequency – How often do they purchase?
  • Monetary Value – How much do they spend?

To create an RFM analysis, one creates categories for each attribute. For instance, the Recency attribute might be broken into three categories: customers with purchases within the last 90 days; between 91 and 365 days; and longer than 365 days. Such categories may be arrived at by applying business rules, or using a data mining technique, such as CHAID, to find meaningful breaks.
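As a minimal sketch of the business-rule approach described above, the three Recency categories can be built with cut and explicit breaks. The days_since_purchase values and the category labels are made up for illustration.

```r
# Hypothetical days-since-last-purchase values for six customers
days_since_purchase <- c(5, 90, 91, 200, 365, 800)

# Business-rule breaks from the text: up to 90 days, 91 to 365 days, over 365 days.
# cut() uses right-closed intervals by default, so 90 falls in the first
# category and 365 in the second.
recency_category <- cut(
  days_since_purchase,
  breaks = c(-Inf, 90, 365, Inf),
  labels = c("0-90 days", "91-365 days", "over 365 days")
)

print(data.frame(days_since_purchase, recency_category))
```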

Source: http://en.wikipedia.org/wiki/RFM

If you are new to RFM or need more step-by-step help, please read this post:

https://decisionstats.com/2010/10/03/ibm-spss-19-marketing-analytics-and-rfm/

Here is the R code. Note that for direct marketing you also need to compute Monetization based on response rates (relative to the offer date).



## Creating random sales data in the format CustomerId (unique to each customer), Sales Value, sales.dates

set.seed(42) # fixed seed so the simulated data is reproducible

sales=data.frame(sample(1000:1999,replace=TRUE,size=10000),abs(round(rnorm(10000,28,13))))

names(sales)=c("CustomerId","Sales Value")

## Generating random sale dates spread over 700 days from 2010-01-01

sales.dates <- as.Date("2010/1/1") + 700*sort(stats::runif(10000))

sales=cbind(sales,sales.dates)

str(sales)

## Recency = number of days between today and each sale

sales$recency=round(as.numeric(difftime(Sys.Date(),sales[,3],units="days")))

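library(gdata) # rename.vars() lives in gdata, one of the packages the old gregmisc bundle was split into

## If you have existing sales data you just need to shape it in this format

sales=rename.vars(sales, from="Sales Value", to="Purchase.Value") # Renaming the variable; the result must be assigned back to take effect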

## Creating Total Sales (Monetization), Frequency and Recency (days since the last purchase) for each customer

salesM=aggregate(sales[,2],list(sales$CustomerId),sum)

names(salesM)=c("CustomerId","Monetization")

salesF=aggregate(sales[,2],list(sales$CustomerId),length)

names(salesF)=c("CustomerId","Frequency")

salesR=aggregate(sales[,4],list(sales$CustomerId),min)

names(salesR)=c("CustomerId","Recency")

##Merging R,F,M

test1=merge(salesF,salesR,"CustomerId")

salesRFM=merge(salesM,test1,"CustomerId")

##Creating R,F,M levels 

salesRFM$rankR=cut(salesRFM$Recency, 5, labels=FALSE) # rankR 1 is most recent while rankR 5 is least recent

salesRFM$rankF=cut(salesRFM$Frequency, 5, labels=FALSE) # rankF 1 is least frequent while rankF 5 is most frequent

salesRFM$rankM=cut(salesRFM$Monetization, 5, labels=FALSE) # rankM 1 is lowest sales while rankM 5 is highest sales

##Looking at RFM tables
table(salesRFM[,5:6])
table(salesRFM[,6:7])
table(salesRFM[,5:7])
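One common follow-up, sketched here as an assumption rather than as part of the code above, is to combine the three ranks into a single RFM cell. Because rankR runs in the opposite direction to rankF and rankM (1 is best for recency, 5 is best for the other two), it is flipped first. The rfm data frame and the scoreR and RFM.cell names are hypothetical.

```r
# Hypothetical ranks for four customers, using the same direction as the
# code above: rankR 1 = most recent, rankF/rankM 5 = best.
rfm <- data.frame(
  CustomerId = c(1001, 1002, 1003, 1004),
  rankR = c(1, 5, 3, 1),
  rankF = c(5, 1, 3, 2),
  rankM = c(5, 1, 4, 2)
)

# Flip recency so that 5 is the best value on every dimension
rfm$scoreR <- 6 - rfm$rankR

# Concatenate into a three-digit RFM cell, e.g. "555" for the best customers
rfm$RFM.cell <- paste0(rfm$scoreR, rfm$rankF, rfm$rankM)

# List customers from best to worst cell
print(rfm[order(rfm$RFM.cell, decreasing = TRUE), ])
```

With the real data, the same two lines could be applied to salesRFM after the rank columns have been created.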


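Note: you can also derive the breaks with the quantile function instead of letting cut choose them. Given a number of bins, cut creates intervals of equal width, so the groups can hold very different numbers of customers; quantile-based breaks give groups of roughly equal size. You can also use other methods for finding breaks for the categories.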
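Here is a minimal sketch of the difference described in the note, using simulated spend values rather than the sales data above. The quintile_rank helper is a hypothetical name; it ranks the values first so that ties cannot produce duplicate quantile breaks, which is one way (not the only one) to get equal-sized groups.

```r
set.seed(1)  # arbitrary seed so the illustration is reproducible
monetization <- round(rexp(100, rate = 1/300))  # hypothetical skewed spend for 100 customers

# Equal-width bins (what cut(x, 5) does): skewed data tends to pile up in the low bins
equal_width <- cut(monetization, 5, labels = FALSE)

# Equal-count bins: rank first so ties cannot produce duplicate breaks,
# then cut the ranks at their quintiles
quintile_rank <- function(x) {
  r <- rank(x, ties.method = "first")
  cut(r, breaks = quantile(r, probs = seq(0, 1, by = 0.2)),
      include.lowest = TRUE, labels = FALSE)
}
equal_count <- quintile_rank(monetization)

print(table(equal_width))
print(table(equal_count))  # 20 customers in each of the 5 groups
```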

 

Interview: David Katz, Dataspora / David Katz Consulting

Here is an interview with David Katz, founder of David Katz Consulting (http://www.davidkatzconsulting.com/) and an analyst at the noted firm Dataspora (http://dataspora.com/). He is a featured speaker at Predictive Analytics World (http://www.predictiveanalyticsworld.com/sanfrancisco/2011/speakers.php#katz).

Ajay- Describe your background working with analytics. How can we make analytics and science more attractive career options for young students?

David- I had an interest in math from an early age, spurred by reading lots of science fiction with mathematicians and scientists in leading roles. I was fortunate to be at Harry and David (Fruit of the Month Club) when they were in the forefront of applying multivariate statistics to the challenge of targeting catalogs and other snail-mail offerings. Later I had the opportunity to expand these techniques to the retail sphere with Williams-Sonoma, who grew their retail business with the support of their catalog mailings. Since they had several catalog titles and product lines, cross-selling presented additional analytic challenges, and with the growth of the internet there was still another channel to consider, with its own dynamics.

After helping to found Abacus Direct Marketing, I became an independent consultant, which provided a lot of variety in applying statistics and data mining in a variety of settings from health care to telecom to credit marketing and education.

Students should be exposed to the many roles that analytics plays in modern life, and to the excitement of finding meaningful and useful patterns in the vast profusion of data that is now available.

Ajay-  Describe your most challenging project in 3 decades of experience in this field.

David- Hard to choose just one, but the educational field has been particularly interesting. Partnering with Olympic Behavior Labs, we’ve developed systems to help identify students who are most at-risk for dropping out of school to help target interventions that could prevent dropout and promote success.

Ajay- What do you think are the top 5 trends in analytics for 2011?

David- Big Data, Privacy concerns, quick response to consumer needs, integration of testing and analysis into business processes, social networking data.

Ajay- Do you think techniques like RFM and LTV are adequately utilized by organizations? How can they be propagated further?

David- Organizations vary amazingly in how sophisticated or unsophisticated they are in analytics. A key factor in success as a consultant is to understand where each client is on this continuum and how well that serves their needs.

Ajay- Which software tools have you worked with in this field, and which is your favorite in each category?

David- I started out using COBOL (that dates me!) then concentrated on SAS for many years. More recently R is my favorite because of its coverage, currency and programming model, and its debugging capabilities.

Ajay- Independent consulting can be a strenuous job. What do you do to unwind?

David- Cycling, yoga, meditation, hiking and guitar.

Biography-

David Katz, Senior Analyst, Dataspora, and President, David Katz Consulting.

David Katz has been in the forefront of applying statistical models and database technology to marketing problems since 1980. He holds a Master’s Degree in Mathematics from the University of California, Berkeley. He is one of the founders of Abacus Direct Marketing and was previously the Director of Database Development for Williams-Sonoma.

He is the founder and President of David Katz Consulting, specializing in sophisticated statistical services for a variety of applications, with a special focus on the Direct Marketing Industry. David Katz has an extensive background that includes experience in all aspects of direct marketing from data mining, to strategy, to test design and implementation. In addition, he consults on a variety of data mining and statistical applications from public health to collections analysis. He has partnered with consulting firms such as Ernst and Young, Prediction Impact, and most recently on this project with Dataspora.

For more on David’s session at Predictive Analytics World, San Francisco, see http://www.predictiveanalyticsworld.com/sanfrancisco/2011/agenda.php#day2-16a

Room: Salon 5 & 6
4:45pm – 5:05pm

Track 2: Social Data and Telecom 
Case Study: Major North American Telecom
Social Networking Data for Churn Analysis

A North American Telecom found that it had a window into social contacts – who has been calling whom on its network. This data proved to be predictive of churn. Using SQL, and GAM in R, we explored how to use this data to improve the identification of likely churners. We will present many dimensions of the lessons learned on this engagement.

Speaker: David Katz, Senior Analyst, Dataspora, and President, David Katz Consulting

Exhibit Hours
Monday, March 14th: 10:00am to 7:30pm

Tuesday, March 15th: 9:45am to 4:30pm

IBM SPSS 19: Marketing Analytics and RFM

What is RFM Analysis?

Recency, Frequency, Monetization (RFM) is basically a technique to classify your entire customer list. You may be a retail player with thousands of customers or an enterprise software seller with only two dozen customers.

RFM Analysis can help you cut through the list and focus on the customers that really drive your profit.

As per Wikipedia

http://en.wikipedia.org/wiki/RFM

RFM is a method used for analyzing customer behavior and defining market segments. It is commonly used in database marketing and direct marketing and has received particular attention in retail.

RFM stands for

  • Recency – How recently has the customer purchased?
  • Frequency – How often do they purchase?
  • Monetary Value – How much do they spend?

To create an RFM analysis, one creates categories for each attribute. For instance, the Recency attribute might be broken into three categories: customers with purchases within the last 90 days; between 91 and 365 days; and longer than 365 days. Such categories may be arrived at by applying business rules, or using a data mining technique, such as CHAID, to find meaningful breaks.

—————————————————————————————————-

Even if you don't know what RFM is or how to do it, see below for an easy way.

I just got myself an evaluation copy of a fully loaded IBM SPSS 19 module and ran some RFM Analysis on some data. The way the recent version of SPSS is designed makes it very useful even to non-statistical users, and extremely useful to a business or marketing user.

Here are some screenshots to describe the features.

1) A simple dashboard to show functionality (with room for improvement in visual appeal)
2) Simple, intuitive design for inputting data
3) Some options in creating marketing scorecards
4) Easy-to-understand features for a business audience rather than pseudo-techie jargon
5) Note the clean design of the GUI in specifying data input type
6) Again, multiple options to export results in a very user-friendly manner, with options to customize the business report
7) Graphical output conveniently pasted inside a Word document rather than a jumble of images. Auto-generated options for customized standard graphs.
8) An attractive heatmap to represent monetization for customers. Note the effect that a scale of color shades has in the visual representation of data.
9) Comparative plots placed side by side with easy-to-understand explanations (in the output Word doc, not shown here)
10) Auto-generated scores attached to the data table to enhance usage.

Note that here I am evaluating not only RFM as a marketing technique (which is well known) but also the GUI of IBM SPSS 19 Marketing Analytics. It is simple, yet powerful in turning what used to be purely statistical software for nerds into a beautiful, easy-to-implement tool for business users.

So what else can you do in Marketing Analytics with SPSS 19?

IBM SPSS Direct Marketing

The Direct Marketing add-on option allows organizations to ensure their marketing programs are as effective as possible, through techniques specifically designed for direct marketing, including:

• RFM Analysis. This technique identifies existing customers who are most likely to respond to a new offer.

• Cluster Analysis. This is an exploratory tool designed to reveal natural groupings (or clusters) within your data. For example, it can identify different groups of customers based on various demographic and purchasing characteristics.

• Prospect Profiles. This technique uses results from a previous or test campaign to create descriptive profiles. You can use the profiles to target specific groups of contacts in future campaigns.

• Postal Code Response Rates. This technique uses results from a previous campaign to calculate postal code response rates. Those rates can be used to target specific postal codes in future campaigns.

• Propensity to Purchase. This technique uses results from a test mailing or previous campaign to generate propensity scores. The scores indicate which contacts are most likely to respond.

• Control Package Test. This technique compares marketing campaigns to see if there is a significant difference in effectiveness for different packages or offers.
