Ryan M. Maloney

Random Thoughts on Analytics, Programming, and Tech

One of the most well documented methods for customer level modeling and segmentation is the RFM (Recency Frequency Monetary) model. RFM is essentially a method for businesses to segment customers off of three attributes:

1. When their last purchase was (Recency)
2. How frequently they purchase from you, within a set timeframe (Frequency)
3. How much money they spend with you (Monetary)

RFM is convenient because it is both easily explained and very easy to implement. You could hypothetically obtain this information about your customer database with one basic SQL query:

select 
    customer_id, 
    sum(total_order_amt) as spend, 
    count(distinct order_nbr) as num_orders, 
    max(order_date) as recent_order, 
    ( sysdate - max(order_date) ) as days_since_recent 
from orders
group by customer_id;

A query similar to above will return a useful, customer level view of purchasing activity for the population of customers you are looking at:

CUSTOMER_ID    SPEND    NUM_ORDERS    RECENT_ORDER    DAYS_SINCE_RECENT
12345          $220     4             21-Nov-2016     1

With this base table of customer level attributes, we have a very useful resource for identifying our best (and worst) customers. So how do we do that? The answer depends quite a bit on the nature of your business or client itself. There are many approaches to “scoring” customers based on their RFM attributes, so I’d encourage you to search around for one that best fits your business and/or client. For simplicity’s sake, you could score each customer in a quintile for each attribute, and then take a deep dive at the customers in the top (and bottom) quintiles for each and all of your RFM attributes. Getting the quintiles for these customers is pretty easy using proc rank in SAS (below), but there are a myriad of ways you can generate quintiles for recency, spend, and purchase frequency.

More about quintiles and PROC RANK in SAS

The SAS snippet below creates 5 bins (using the groups=5) option for spend, num_orders, and recent order. It creates a table called customer_rfm_rank that has our quintiles appended to the customer data. These ranks are created using the ranks statement on the third line and will be called q_spend, q_orders, and so on.

Once we have that we, can easily cut our database up by total “score”.. for instance selecting all customers in the top and bottom quintiles across all attributes.

proc rank data=customer_rfm out=customer_rfm_rank groups=5;
    var spend    num_orders    recent_order;
    ranks q_spend q_orders q_recent;
run;

Once we have our customers split into quintiles, we can break out our customer file by quintile and ask some informative questions:

    What categories are they buying?
    What platforms are they on?
    What marketing channels are they coming through?

We can also aggregate the customer level attributes to get a quick read on the purchase activity for this segment of customers. We can pass our customer level RFM data into a CSV file, then feed it to R and get this information with two lines of code:

    rfm<-read.csv("customer_rfm.csv", header=TRUE)
    summary(rfm)

The snippet above will provide us the mean, median, and interquartile range for spend, orders, and the days since the customer's recent purchase. You can take this a little further by subsetting for only your best (quintile 5) and worst (quintile 1) customers, then using these quintiles to construct some useful cohorts for marketing campaigns. Perhaps your company is having a retention problem with customers who were previously highly engaged. In this case, we'd segment for customers that score high on spend or orders, but low on days since recent purchase. With those segments in hand, you could then test a variety of marketing campaigns to try to reactivate those customers.

Maybe your business is very top heavy; and by that I mean it derives a majority of its revenue from its top 5-10% of customers. In that case, you could feed your customer file to an RFM model and easily select customers who score in the top quintile (555) for all categories. You could then take that list and pull additional information, such as purchase categories - to potentially use in marketing campaigns or identify upsell or cross sell opportunities. The best part about all of this is the model is easily explainable to senior management; we're just selecting customers that score the highest in terms of spend, orders, or order frequency.

Again, we're just getting started here, but as you can see, segmenting your customer file with an RFM model is an intuitive way to learn about your best (and worst) customers. Better yet, these concepts are easily explained to senior leadership teams that might not have a statistical or technical background. While it's easy to get starting using RFM segmentation, it's also not a one size fits all solution. It's also very important to come up with a ranking/segmentation methodology that's consistent with your business model. Think of it more as a starting point for further segmentation and analysis on your customer file. Having said all of that, its very easy to implement and customize, which makes it a useful addition to most e-commerce analytics departments.

Leave a Reply

Your email address will not be published. Required fields are marked *