# Cluster Analysis

**Quick definition:** Cluster analysis is a form of exploratory data analysis in which observations are divided into groups that share common characteristics. Those groups are compared and contrasted with other groups to derive information about the observations.

**Key takeaways:**

- Cluster analysis allows organizations to better understand their customers by identifying individuals with similar traits, which can inform how the organization communicates with those customers.
- There are five main clustering approaches. The most common are K-means clustering and hierarchical clustering. The approach an organization takes depends on what is being analyzed and why.
- To ensure accurate cluster analysis, choose helpful variables (behavior, geography, demographics, etc.) to evaluate the observations, cluster the observations into the right number of groups, and create clusters with high intra-cluster similarity and low inter-cluster similarity.

The following questions were answered in an interview with John Bates, the director of product management for Predictive Marketing Solutions and Analytics Premium for Adobe Marketing Cloud.

- What is cluster analysis?
- What is the purpose of clustering?
- What are the different types of clustering?
- What are the characteristics of a good cluster analysis?
- What are the disadvantages of cluster analysis, and how can companies avoid problems?
- How do you perform cluster analysis?
- What do you do with the results of a cluster analysis?
- Why is cluster analysis important for business strategy?
- How do you make sure your cluster analysis is accurate?
- How often do organizations update clusters?

## What is a cluster analysis?

Cluster analysis is a type of unsupervised classification, meaning it doesn’t have any predefined classes, definitions, or expectations up front. It’s a statistical data mining technique used to cluster observations similar to each other but unlike other groups of observations.

An individual sorting out the chocolates from a sampler box is a good metaphor for understanding clustering. The person may have preferences for certain types of chocolate.

When they sift through their box, there are lots of ways they can group that chocolate. They can group it by milk chocolate vs. dark chocolate, nuts vs. no nuts, fruit filling, nougat, etc.

The process of separating pieces of candy into piles of similar candy based on those characteristics is clustering. We do it all the time.

## What is the purpose of clustering?

The general purpose of cluster analysis in marketing is to construct groups or clusters while ensuring that the observations are as similar as possible within a group.

Ultimately, the purpose depends on the application. In marketing, clustering helps marketers discover distinct groups of customers in their customer base. They then use this knowledge to develop targeted marketing campaigns.

For example, clustering may help an insurance company identify groups of motor insurance policyholders with a high average claim cost.

The purpose behind clustering depends on how a company intends to use it, which is largely informed by the industry, the business unit, and what the company is trying to accomplish.

## What are the different types of clustering?

**There are five different major clustering approaches:**

- **Partitioning algorithms**
- **Hierarchy algorithms**
- **Density-based algorithms**
- **Grid-based algorithms**
- **Model-based algorithms**

The most common clustering approaches are partitioning and hierarchy algorithms.

The main difference between the two is that partitioning algorithms create various partitions of the data and then evaluate them by some criterion, while hierarchy-based algorithms build a hierarchical decomposition of the data based on a criterion.

K-means clustering is probably the most common partitioning algorithm. It’s generally used when the number of classes is fixed in advance. An analyst tells the algorithm how many clusters they want to divide the observations into.

Then each cluster is represented by the center of the cluster, or the mean. It's an efficient option, but it does have some weaknesses. It’s only applicable when the mean is defined and the number of clusters is determined in advance.

It also doesn't deal well with outliers, so if there are observations that are very different from the rest, K-means isn’t the best option.
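The K-means process described above can be sketched in a few lines. This is a minimal illustration using scikit-learn on made-up synthetic data (the point values and cluster count are assumptions chosen just to show the mechanics):

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic 2-D observations: two loose groups around (0, 0) and (5, 5).
rng = np.random.default_rng(42)
points = np.vstack([
    rng.normal(loc=0.0, scale=0.5, size=(50, 2)),
    rng.normal(loc=5.0, scale=0.5, size=(50, 2)),
])

# The analyst fixes the number of clusters in advance (k=2 here).
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(points)

# Each cluster is represented by its center, i.e. the mean of its members.
print(km.cluster_centers_)  # two 2-D centers, near (0, 0) and (5, 5)
print(km.labels_[:5])       # cluster assignment for the first five points
```

Note that `n_clusters` must be supplied up front, which is exactly the limitation described above.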

Another type of algorithm is called expectation maximization (EM). EM is a type of partitioning algorithm, but it's model-based. It works similarly to K-means.

However, instead of assigning each observation outright to the cluster with the nearest mean, EM computes probabilities of cluster membership, that is, the likelihood that a single observation falls into each particular cluster.

It uses probability distributions to calculate that number.

The great thing about EM is that it's not mutually exclusive. A customer can have the probability of being associated with multiple clusters.

They will typically get assigned to the one with the highest probability, but they may also have a lot of characteristics or traits with another cluster.
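That soft-assignment behavior is easy to see with a Gaussian mixture model, one common model-based implementation of EM. The sketch below uses scikit-learn's `GaussianMixture` on invented one-dimensional data; the "customer spend" framing and all values are illustrative assumptions:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(7)
# Two overlapping groups of customers along one variable (e.g. spend).
data = np.concatenate([
    rng.normal(10.0, 2.0, 200),
    rng.normal(20.0, 2.0, 200),
]).reshape(-1, 1)

gm = GaussianMixture(n_components=2, random_state=0).fit(data)

# Soft assignment: each observation gets a probability per cluster.
probs = gm.predict_proba([[15.0]])[0]  # a customer between the two groups
print(probs)           # probabilities sum to 1; neither is near 0 or 1
print(probs.argmax())  # the cluster this customer would typically join
```

Unlike K-means, the output is a probability per cluster rather than a single hard label, so a customer can meaningfully "belong a little" to several clusters.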

The purpose of hierarchical clustering is to create a hierarchy of groups. This can either be done with an agglomerative process, which starts with each observation in its own cluster and then pairs up similar observations in multiple levels, or a divisive process.

This starts with all the observations in a single cluster and then breaks them into different groups.

A hierarchical clustering is like a data visualization tree. You can see how people start together and then divide out based on different criteria. Hierarchical clustering is great for letting the end user see those relationships.
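The agglomerative process above can be sketched with SciPy, which also produces the merge tree that `scipy.cluster.hierarchy.dendrogram` draws. The data here is synthetic and the choice of Ward's criterion is one common option, not something the source prescribes:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(1)
# A small data set: hierarchical clustering suits smaller samples.
points = np.vstack([
    rng.normal(0.0, 0.3, size=(10, 2)),
    rng.normal(3.0, 0.3, size=(10, 2)),
    rng.normal(6.0, 0.3, size=(10, 2)),
])

# Agglomerative: start with every observation in its own cluster,
# then repeatedly merge the two closest clusters (Ward's criterion here).
merges = linkage(points, method="ward")

# Cut the tree into three groups; dendrogram(merges) would draw the
# full tree of merges for the end user.
labels = fcluster(merges, t=3, criterion="maxclust")
print(sorted(set(labels)))  # three groups
```

Cutting the same tree at a different height yields a different number of groups, which is why the hierarchy is useful for exploring relationships at several levels.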

## What are the characteristics of a good cluster analysis?

A good clustering method will produce high-quality clusters, which means there is high similarity between observations in a single cluster and low similarity between observations in different clusters.

The quality of the clustering result depends on both the similarity measure used by the method and its implementation. The quality is also measured by the method’s ability to discover some or all hidden patterns that may exist within the data.

A lot of this is evaluated using what’s called a “distance.” Clustering algorithms use a distance measure or metric to determine how to separate observations in the different groups.

The most common one is called Euclidean distance, the straight-line distance between two points, such as between an observation and the center of a cluster, but there are many options.

A distance measure often shows how close an observation is to the cluster's mean, or average value, and identifies the cluster's shape.
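The distance-to-mean idea can be made concrete in a few lines. The cluster centers and observation below are hypothetical values chosen for illustration:

```python
import numpy as np

# Two hypothetical cluster centers (means) from a prior clustering run.
centers = np.array([[0.0, 0.0], [5.0, 5.0]])
observation = np.array([4.0, 4.5])

# Euclidean distance: straight-line distance to each cluster mean.
distances = np.linalg.norm(centers - observation, axis=1)
print(distances)           # distance to each center
print(distances.argmin())  # → 1: the observation is closer to (5, 5)
```

Swapping `np.linalg.norm` for another metric (Manhattan, cosine, and so on) changes how "close" is judged, and therefore which cluster an observation lands in.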

## What are the disadvantages of cluster analysis, and how can companies avoid problems?

Cluster analysis in marketing is an exploratory technique. It's not about making predictions.

In the case of expectation maximization, given the algorithm, it might look at the probability distribution of the data and the probability of assignment to a cluster. That said, it's not making any predictions regarding what those people are likely to do next.

All EM is really doing is helping make sense of data across lots of different variables for a given observation. On their own, people can only look at a couple of variables simultaneously and see patterns.

These models are helpful for evaluating lots of data to identify those patterns and then group people who are similar to one another across those traits.

The advantages are that it helps in exploration. It helps inform strategy—how a company might think about their marketing campaigns or make business decisions—but it’s not the end.

Cluster analysis also looks only at known customers. When a new customer begins to interact with a business and the business does not have all the necessary data yet, the customer is an unknown quantity.

They haven't been authenticated, so the company has very little information about them (for instance, where the customer lives). A cluster analysis is static to the assignment at the time and only pertains to the data that’s put into it.

It’s important to regularly re-evaluate clustering and re-apply analysis. If new data comes in, it should be incorporated into the analysis. It’s important never to get too fixated on individual cluster assignments.

Allow clusters to be fluid. And remember to evaluate how customers may move between clusters based on certain interactions they have with the business.

## How do you perform cluster analysis?

The first step of cluster analysis is usually to choose the analysis method, which will depend on the size of the data and the types of variables.

Hierarchical clustering, for example, is appropriate for small data sets, while K-means clustering is more appropriate for moderately large data sets and when the number of clusters is known in advance.

Large data sets usually require a mixture of different types of variables, and they generally require a two-step procedure.

After you decide on what method of analysis to use, start the process by choosing the cases to subdivide into homogeneous groups or clusters. Those cases, or observations, can be any subject, person, or thing you want to analyze.

Next, choose the variables to include. There could be 1,000 variables, or even 10,000 or 25,000. The number and types of variables chosen will determine what type of algorithm should be used.

Then decide whether to standardize those variables in some way, so that every variable contributes equally to the distance or similarity between the cases. However, the analysis can be run with both standardized and unstandardized variables.
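Standardization matters most when variables sit on very different scales. The sketch below uses scikit-learn's `StandardScaler`; the variables (annual spend and visit counts) and their values are made up for illustration:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Two variables on very different scales: annual spend in dollars and
# number of visits. Unstandardized, spend dominates any distance measure.
X = np.array([
    [52000.0, 3.0],
    [48000.0, 30.0],
    [51000.0, 12.0],
    [47000.0, 22.0],
])

scaled = StandardScaler().fit_transform(X)

# After standardizing, each column has mean 0 and unit variance,
# so both variables contribute comparably to distances between cases.
print(scaled.mean(axis=0).round(6))  # ~[0, 0]
print(scaled.std(axis=0).round(6))   # ~[1, 1]
```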

Each analysis method has a different approach. For K-means clustering, select the number of clusters, then the algorithm iteratively estimates the cluster means and assigns each case to the cluster for which its distance to the cluster mean is the smallest.

For hierarchical clustering, choose a statistic that quantifies how far apart or similar two cases are.

Next, select a method for forming the groups. Finally, determine how many clusters are needed to represent the data by looking at how similar the clusters are and where the tree splits.

## What do you do with the results of a cluster analysis?

Depending on the clustering method, there's usually an associated visualization. That's very common for investigating the results. In the case of K-means, it’s common to use an X, Y axis that shows the distance of groups of observations.

By using that type of visualization, those groupings become very clear. In the case of hierarchical clustering, a visualization called a dendrogram is used, which shows the splits in the cut tree.

## Why is cluster analysis important for business strategy?

Cluster analysis can benefit a company in multiple ways, including how they market their products.

It can affect whom they market those products to, what retention and sales strategies might be employed, and how they might evaluate prospective customers.

They can cluster current customers and determine their lifetime value relative to their propensity for attrition, and that can inform how they communicate with different customers and how to identify new high-value customers.

## How do you make sure your cluster analysis is accurate?

When looking at the accuracy of a cluster, there are three important factors: cluster tendency, number of clusters, and clustering quality.

Before evaluating cluster performance, make sure the data set you’re working with has clustering tendency, which means that it doesn’t contain uniformly distributed points.

For example, it doesn’t benefit the analysis to choose a variable like “species,” because every observation will be the same. There are statistical methods for assessing clustering tendency.
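One well-known statistical method for assessing clustering tendency is the Hopkins statistic, which compares nearest-neighbor distances in the real data against those from uniformly random points. The helper below is a rough sketch of that idea (the function name, sampling scheme, and data are all illustrative, not a library API):

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def hopkins(X, n_samples=50, seed=0):
    """Rough Hopkins statistic sketch: ~0.5 for uniformly distributed
    data, approaching 1 for data with strong clustering tendency."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    nn = NearestNeighbors(n_neighbors=2).fit(X)

    # Distances from sampled real points to their nearest real neighbor
    # (column 0 is the zero self-distance, so take column 1).
    sample = X[rng.choice(n, n_samples, replace=False)]
    w = nn.kneighbors(sample, n_neighbors=2)[0][:, 1]

    # Distances from uniform random points to their nearest real point.
    uniform = rng.uniform(X.min(axis=0), X.max(axis=0), size=(n_samples, d))
    u = nn.kneighbors(uniform, n_neighbors=1)[0][:, 0]

    return u.sum() / (u.sum() + w.sum())

rng = np.random.default_rng(3)
clustered = np.vstack([rng.normal(0, 0.2, (100, 2)),
                       rng.normal(5, 0.2, (100, 2))])
uniform_data = rng.uniform(0, 5, (200, 2))

print(hopkins(clustered))     # well above 0.5: clustering tendency
print(hopkins(uniform_data))  # near 0.5: little tendency to cluster
```

A score near 0.5 suggests the data is close to uniformly distributed, so clustering it is unlikely to produce meaningful groups.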

Number of clusters is a required parameter for K-means clustering, but it’s useful for evaluating accuracy in other methods as well. By identifying how many clusters a team intends to work with, they can group observations in the best way to derive helpful insights.

Too few clusters means putting together observations that aren’t similar enough to take action, while too many clusters will divide your observations up too much to be useful.
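A common way to find that balance is to score several candidate cluster counts with the silhouette coefficient, which rewards tight, well-separated clusters. The sketch below uses scikit-learn on synthetic data with three obvious groups (the data and the 2–6 search range are assumptions for illustration):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(5)
# Synthetic data with three well-separated groups.
X = np.vstack([rng.normal(c, 0.3, (40, 2)) for c in (0.0, 4.0, 8.0)])

# Score a range of candidate cluster counts; higher silhouette is better.
scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)

best_k = max(scores, key=scores.get)
print(best_k)  # → 3 for this synthetic data
```

Both too-small and too-large values of `k` score worse here: merged clusters lower intra-cluster similarity, and over-split clusters lower inter-cluster separation.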

Clustering quality looks at the level of similarity within a cluster and among separate clusters.

There are multiple methods to ensure high clustering quality, including the adjusted Rand index, the Fowlkes-Mallows score, mutual information-based scores, and homogeneity and completeness scores.
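The quality scores named above are all available in scikit-learn. Note they are external measures: they compare a clustering result against known reference labels (for example, from a labeled validation sample), an assumption in the toy example below, where the labels themselves are invented:

```python
from sklearn.metrics import (adjusted_rand_score,
                             fowlkes_mallows_score,
                             adjusted_mutual_info_score,
                             homogeneity_score,
                             completeness_score)

# Hypothetical reference labels vs. a clustering with one misassignment.
reference = [0, 0, 0, 1, 1, 1, 2, 2, 2]
predicted = [0, 0, 1, 1, 1, 1, 2, 2, 2]

print(adjusted_rand_score(reference, predicted))
print(fowlkes_mallows_score(reference, predicted))
print(adjusted_mutual_info_score(reference, predicted))
print(homogeneity_score(reference, predicted))
print(completeness_score(reference, predicted))
# Each score is 1.0 for a perfect match and lower for this imperfect one.
```

When no reference labels exist, internal measures such as the silhouette coefficient serve the same role using only the data and the cluster assignments.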

## How often do organizations update clusters?

It often depends on the use case. A high-tech retailer like Best Buy might use clusters at the highest level to align the entire enterprise on personas.

Every employee, from those in the call centers to the individuals in the stores themselves, can look at every customer and classify them into the cluster or persona they most align with.

The company won’t change those clusters very often because they inform a higher-level strategy across the entire business.

But then, within certain departments, you might have micro clusters. Given one of those higher-level clusters, companies may want to cluster individuals more often because they are moving through different life cycle stages of the sales process.

Once they’ve clustered their customers, the cluster becomes stale, so companies might re-cluster those individuals depending on how long the sales cycle is.