Doing RFM Analysis in R


RFM is a method used for analyzing customer behavior and defining market segments. It is commonly used in database marketing and direct marketing and has received particular attention in retail.


RFM stands for


  • Recency – How recently did the customer purchase?
  • Frequency – How often do they purchase?
  • Monetary Value – How much do they spend?

To create an RFM analysis, one creates categories for each attribute. For instance, the Recency attribute might be broken into three categories: customers with purchases within the last 90 days; between 91 and 365 days; and longer than 365 days. Such categories may be arrived at by applying business rules, or using a data mining technique, such as CHAID, to find meaningful breaks.

from-http://en.wikipedia.org/wiki/RFM

If you are new to RFM or need more step by step help, please read here

https://decisionstats.com/2010/10/03/ibm-spss-19-marketing-analytics-and-rfm/

and here is R code- note for direct marketing you need to compute Monetization based on response rates (based on offer date) as well



##Creating Random Sales Data of the format CustomerId (unique to each customer), Sales.Date,Purchase.Value

sales=data.frame(sample(1000:1999,replace=T,size=10000),abs(round(rnorm(10000,28,13))))

names(sales)=c("CustomerId","Sales Value")

sales.dates <- as.Date("2010/1/1") + 700*sort(stats::runif(10000))

#generating random dates

sales=cbind(sales,sales.dates)

str(sales)

sales$recency=round(as.numeric(difftime(Sys.Date(),sales[,3],units="days")) )

library(gregmisc)

##if you have existing sales data you need to just shape it in this format

rename.vars(sales, from="Sales Value", to="Purchase.Value")#Renaming Variable Names

## Creating Total Sales(Monetization),Frequency, Last Purchase date for each customer

salesM=aggregate(sales[,2],list(sales$CustomerId),sum)

names(salesM)=c("CustomerId","Monetization")

salesF=aggregate(sales[,2],list(sales$CustomerId),length)

names(salesF)=c("CustomerId","Frequency")

salesR=aggregate(sales[,4],list(sales$CustomerId),min)

names(salesR)=c("CustomerId","Recency")

##Merging R,F,M

test1=merge(salesF,salesR,"CustomerId")

salesRFM=merge(salesM,test1,"CustomerId")

##Creating R,F,M levels 

salesRFM$rankR=cut(salesRFM$Recency, 5,labels=F) #rankR 1 is very recent while rankR 5 is least recent

salesRFM$rankF=cut(salesRFM$Frequency, 5,labels=F)#rankF 1 is least frequent while rankF 5 is most frequent

salesRFM$rankM=cut(salesRFM$Monetization, 5,labels=F)#rankM 1 is lowest sales while rankM 5 is highest sales

##Looking at RFM tables
table(salesRFM[,5:6])
table(salesRFM[,6:7])
table(salesRFM[,5:7])

Code Highlighted by Pretty R at inside-R.org

Note-you can also use quantile function instead of cut function. This changes cut to equal length instead of equal interval. or  see other methods for finding breaks for categories.

 

Unknown's avatar

Author: Ajay Ohri

http://about.me/ajayohri

Leave a comment