In the first part of our series on “Finding Value in Transaction Data” we explored a problem encountered by many organisations – how to identify and extract value from the ever-growing amount of data they hold.
I proposed a three-step approach (the 3D approach) to realising value from a variety of large data sets:
- Determination – scour data sources to establish if and where there might be value
- Development – build models for the decision areas where value was identified, using data that is predictive of the predetermined outcome
- Deployment – implement and run the developed models
In this post, we will explore the first step of the 3D approach, “Determination”.
How to eat an elephant…
A question we frequently hear from clients is, “I have loads of data; I know there must be value in it, but I don’t know where to start!”
A common mistake is for managers to assume that the first steps in tackling this problem should produce a tangible outcome or product.
The problem is actually much bigger than that. It is more prudent to start with an exploratory exercise that determines what value there is, for what purpose and in what area. This is the “determination phase”.
The determination phase involves the identification of valuable data within the data set(s).
The determination phase can be divided into the following steps:
1. Incorporating the data
Today, credit granters, customer managers and marketers have access to a plethora of data sources, both internal and external. The first step in the “Determination” phase is deciding which data sources to explore, bearing in mind that the data must also be available in a usable state once a solution is deployed. Types of data that might be available include:
- credit bureau,
- customer demographic,
- internal behavioural,
- transactional (e.g. retail purchases, mobile telemetry),
- geo-data,
- store data,
- social-media data, to name a few.
The selected data should be sourced and then linked across sources, typically by keys such as customer ID, customer number or store code. Linking the data in this way is critical to determining where it might add value.
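As a minimal sketch of this linking step (the file names, keys and columns below are hypothetical, and Python with pandas is assumed as the tooling), the joins could look like this:

```python
import pandas as pd

# Hypothetical extracts from three sources, each sharing a linking key.
customers = pd.read_csv("customer_demographics.csv")   # customer_id, age, region, ...
bureau = pd.read_csv("credit_bureau_extract.csv")      # customer_id, bureau_score, ...
transactions = pd.read_csv("retail_transactions.csv")  # customer_id, store_code, txn_date, amount, ...

# Left-join so every customer is retained even where a source has no matching record.
linked = (
    customers
    .merge(bureau, on="customer_id", how="left")
    .merge(transactions, on="customer_id", how="left")
)
```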
Data sets comprise both structured and unstructured data. Structured data typically consists of fixed fields that can easily be grouped, analysed and modelled on. Unstructured data is everything else, e.g. free text such as Twitter tweets, and a different sort of analytics is required to assess it.
Data cleansing is also essential in this step of the process. This involves identifying valid (clean) data, adjusting data to make it usable, and understanding the universe of data to be analysed.
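Continuing the hypothetical sketch above, a light-touch cleansing pass might look something like this (the rules and column names are purely illustrative):

```python
import pandas as pd

# Parse dates, discard records that cannot be placed in time or linked to a customer,
# and remove duplicates before any aggregation takes place.
linked["txn_date"] = pd.to_datetime(linked["txn_date"], errors="coerce")

clean = (
    linked
    .dropna(subset=["customer_id", "txn_date"])
    .query("amount > 0")        # e.g. exclude refunds/corrections from this exercise
    .drop_duplicates()
)
```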
Once the data has been linked, whether within a relational database or in a single flat file, and has been cleaned, aggregation can take place.
2. Aggregation
Aggregation is essential for analysing transactional or behavioural data where trends need to be measured. Raw data typically lists single events, which may have a degree of value on their own. However, single events become more valuable when they are grouped and measured in relation to other events or over a period of time.
Aggregation is what credit bureaux have been doing for years; the same should be done with the other transactional and behavioural data sources described above.
Examples of aggregated fields across various industries:
- Number of SMS’s sent in the last month
- Cleaning products purchased as a percentage of all purchases this month
- Highest till-slip value in the last six months
- Average monthly spend in the last three months
- Minimum value of products viewed online
Within the different transactional sets – such as credit card, fashion purchases, mobile data, and e-commerce data – a variety of aggregation is possible. The key is to follow a methodical approach through event classification. For example, in transactional fashion retail:
- Categories (high/med/low fashion, men/women/children, clothing/apparel/other, premium/average/sale pricing, till-slip value, number of shopping events, store name)
- Time periods (1d, 7d, 1m, 3m, 6m, 12m)
- Metrics (number, average, maximum, minimum, worst, percentage)
Aggregation will often combine two or three of these dimensions.
Aggregation can be coded in a tool such as R, SAS or MSSQL. Ultimately, an automated aggregation process will be required when a solution goes live.
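To make this concrete, a few of the example fields above could be derived with a pandas sketch along these lines, continuing from the cleaned frame in step 1 (the observation date, category values and column names are assumptions, and each transaction row is taken to represent one till slip):

```python
import pandas as pd

txns = clean                             # cleaned, linked transaction-level data from step 1
obs_date = pd.Timestamp("2015-06-30")    # hypothetical observation point

# Time period: restrict to the last three months before the observation point.
last_3m = txns[txns["txn_date"].between(obs_date - pd.DateOffset(months=3), obs_date)]

# Metrics per customer: combine a time period with average, maximum and count metrics.
agg_3m = last_3m.groupby("customer_id").agg(
    avg_monthly_spend_3m=("amount", lambda s: s.sum() / 3),
    max_tillslip_3m=("amount", "max"),
    n_shopping_events_3m=("txn_date", "nunique"),
)

# Category-based aggregate: cleaning products as a share of all spend this month.
this_month = txns[txns["txn_date"].dt.to_period("M") == obs_date.to_period("M")]
total_spend = this_month.groupby("customer_id")["amount"].sum()
cleaning_spend = (
    this_month[this_month["category"] == "cleaning"].groupby("customer_id")["amount"].sum()
)
cleaning_share_1m = (cleaning_spend / total_spend).fillna(0.0).rename("cleaning_spend_share_1m")
```

Naming each field after its category, metric and time period keeps an otherwise long list of aggregates navigable.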
3. Identifying the target areas of value
Another important step in Determination is to identify target areas of value. A long list of outcome target areas can be identified with very little additional analytical effort. Areas of interest include:
- Probability of attrition/churn (in the next 1m/2m/3m)
- Probability of missing a payment/rolling (in the next 1m)
- Probability of missing three payments (in the next 6m/12m)
- Probability of increasing spend (e.g. by 20%/50%)
- Propensity to take up a cross-sell or up-sell offer (1m/2m)
- Propensity to increase wallet-share (i.e. spend as percentage of spend at competitors)
- Propensity to make an (insurance) claim (1m/2m/3m)
- Propensity to migrate to a high value segment/cluster (1m/2m/3m)
Once these areas of value have been identified and the time period set, you’ll be ready to aggregate your data.
Note: Your aggregated/observational data may have to be a few months old to allow sufficient time between observation and outcome.
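As a final hedged sketch (the churn definition, dates and column names are assumptions rather than a prescription), a target such as “attrition in the next three months” could be flagged like this, with the observation point deliberately set far enough in the past for the outcome window to have elapsed:

```python
import pandas as pd

obs_date = pd.Timestamp("2015-06-30")             # observation point, a few months in the past
outcome_end = obs_date + pd.DateOffset(months=3)  # three-month outcome window

txns = clean  # cleaned, linked transaction-level data from the earlier steps

# Customers active on or before the observation date form the modelling population.
population = txns.loc[txns["txn_date"] <= obs_date, "customer_id"].unique()

# Customers with any purchase during the outcome window are treated as retained.
active_after = txns.loc[
    (txns["txn_date"] > obs_date) & (txns["txn_date"] <= outcome_end), "customer_id"
].unique()

# Binary target: 1 = churned (no activity in the outcome window), 0 = retained.
target = pd.DataFrame({"customer_id": population})
target["churned_3m"] = (~target["customer_id"].isin(active_after)).astype(int)
```

A target frame like this can then be joined back to the aggregated observation-date fields from step 2 when the Development phase begins.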