Segment finder

Introduction

Identifying segments (groups of respondents who share similar attitudes, needs, and preferences) is a powerful way to guide strategy. A strong segmentation solution groups respondents who are alike within each segment and different between segments. Researchers often explore 3- to 6-segment solutions, as these typically strike the right balance between simplicity and strategic value.

To evaluate the usefulness of a potential segmentation solution, researchers compare how the segments answered other questions in the survey (e.g., via filters or cross-tabulation). This helps uncover actionable differences that can inform targeting strategies, product design, messaging, and more.
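As a rough illustration of cross-tabulation, the sketch below computes, for each segment, the percentage of respondents giving each answer to another survey question. The function name and the inputs (parallel lists of segment labels and answers, one entry per respondent) are hypothetical. In Discover you would use the built-in filters or crosstabs instead.

```python
from collections import Counter, defaultdict


def crosstab_percent(segments, answers):
    """Column percentages: the share of each segment giving each answer.

    `segments` and `answers` are parallel lists with one entry per respondent
    (hypothetical inputs, e.g. segment labels and answers to another question).
    """
    counts = defaultdict(Counter)
    for segment, answer in zip(segments, answers):
        counts[segment][answer] += 1
    table = {}
    for segment, answer_counts in counts.items():
        total = sum(answer_counts.values())
        table[segment] = {a: 100.0 * n / total for a, n in answer_counts.items()}
    return table


segments = ["Seg 1", "Seg 1", "Seg 2", "Seg 2", "Seg 2"]
answers = ["Yes", "No", "Yes", "Yes", "No"]
# Seg 1 splits 50/50; about 66.7% of Seg 2 answered "Yes".
print(crosstab_percent(segments, answers))
```

Large gaps between segments on a question like this are the kind of actionable difference the segmentation is meant to reveal.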

MaxDiff and CBC data are especially well-suited for segmentation because they’re based on choice rather than ratings:

  • No scale bias: Unlike rating scales, choice data isn’t affected by individual differences in how people use scales (e.g., always rating high or low).
  • More meaningful discrimination: Respondents can’t say everything is equally important—CBC and MaxDiff force trade-offs, encouraging thoughtful responses and reducing straightlining behavior.

Latent class

Segment finder uses latent class MNL analysis to uncover meaningful segments, or groups, of respondents who share similar preferences based on their choices in a MaxDiff or CBC exercise. Before running Segment finder, we recommend you clean your data of inconsistent or fraudulent respondents.

Unlike HB analysis, which estimates a unique set of utilities for each respondent, latent class analysis identifies patterns in the data and groups respondents who tend to make similar choices. These groups, or segments, help you understand how different groups of people evaluate your products, features, or messages.

Latent class analysis estimates two key things:

  • Part-worth utilities for each segment, summarizing how that group values the items in the exercise.
  • Segment probabilities for each respondent, indicating the likelihood that they belong to each segment.

Once segments are identified, they’re saved as categorical variables. You can use them to segment your MaxDiff or CBC results or to create crosstabs with other survey responses. (Note: Segment-level latent class utilities are not saved. We find the HB utilities more useful for describing each segment’s preferences and for making predictions with the market simulator.)

How it works

Latent class analysis is an iterative estimation process that uncovers patterns in respondent choices. Here's a simplified overview of how it works:

  1. Start with random utility estimates for each segment.
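  2. Using the MNL equation, calculate how likely each respondent’s choices are under each segment’s utilities, and convert these likelihoods (together with the current segment sizes) into the probability that the respondent belongs to each segment.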
  3. Re-estimate utilities for each segment using aggregate MNL, weighted by the respondent probabilities from the previous step.
  4. Repeat steps 2–3 until the model stabilizes and improvements in fit stop (known as convergence).
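The four steps above can be sketched in Python as a small expectation-maximization (EM) routine. This is a hypothetical illustration, not Segment finder’s actual code. The data format (one list of `(alts, chosen_index)` tasks per respondent, where each alternative is a dummy-coded feature vector), the function names, and the settings `lr` and `grad_steps` are all assumptions. Step 3 is approximated with a few gradient-ascent steps on the weighted MNL log-likelihood rather than a full solve, and the scale constraint from the technical note is left out.

```python
import math
import random


def mnl_log_probs(beta, alts):
    """Log MNL choice probabilities for one task; alts is a list of feature vectors."""
    utils = [sum(b * x for b, x in zip(beta, alt)) for alt in alts]
    m = max(utils)
    log_denom = m + math.log(sum(math.exp(u - m) for u in utils))
    return [u - log_denom for u in utils]


def respondent_loglik(beta, tasks):
    """Log-likelihood of one respondent's choices under one segment's utilities."""
    return sum(mnl_log_probs(beta, alts)[choice] for alts, choice in tasks)


def latent_class_mnl(data, n_segments, n_params, max_iter=200, tol=1e-6,
                     lr=0.5, grad_steps=20, seed=0):
    """EM sketch. data holds one list of (alts, chosen_index) tasks per respondent.

    Returns segment utilities, segment sizes, membership probabilities from the
    last E-step, and the log-likelihood from that E-step.
    """
    rng = random.Random(seed)
    # Step 1: random starting utilities for each segment, equal segment sizes.
    betas = [[rng.uniform(-1, 1) for _ in range(n_params)] for _ in range(n_segments)]
    shares = [1.0 / n_segments] * n_segments
    weights, total_ll, prev_ll = [], -math.inf, -math.inf
    for _ in range(max_iter):
        # Step 2: membership probabilities from the MNL likelihoods (Bayes' rule).
        weights, total_ll = [], 0.0
        for tasks in data:
            logs = [math.log(max(shares[s], 1e-12)) + respondent_loglik(betas[s], tasks)
                    for s in range(n_segments)]
            m = max(logs)
            denom = sum(math.exp(v - m) for v in logs)
            total_ll += m + math.log(denom)
            weights.append([math.exp(v - m) / denom for v in logs])
        # Step 4: stop once the log-likelihood improves by less than tol.
        if total_ll - prev_ll < tol:
            break
        prev_ll = total_ll
        # Step 3: new segment sizes, then a weighted aggregate MNL per segment,
        # approximated by a few gradient-ascent steps.
        shares = [sum(w[s] for w in weights) / len(data) for s in range(n_segments)]
        for s in range(n_segments):
            beta = betas[s]
            mass = max(sum(w[s] * len(t) for w, t in zip(weights, data)), 1e-12)
            for _ in range(grad_steps):
                grad = [0.0] * n_params
                for w, tasks in zip(weights, data):
                    for alts, choice in tasks:
                        probs = [math.exp(lp) for lp in mnl_log_probs(beta, alts)]
                        for k in range(n_params):
                            expected = sum(p * alt[k] for p, alt in zip(probs, alts))
                            grad[k] += w[s] * (alts[choice][k] - expected)
                beta = [b + lr * g / mass for b, g in zip(beta, grad)]
            betas[s] = beta
    return betas, shares, weights, total_ll
```

For example, with two alternatives coded `[1, 0]` and `[0, 1]`, data in which half the respondents always choose the first alternative and half always choose the second should come back as two segments, with membership probabilities close to 1.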

Each respondent ends up with a probability of belonging to each segment. If the segmentation structure fits the data well, the highest of these probabilities is often close to 1, indicating a strong fit to a particular group.

Technical note: Segment finder uses a scale-constrained version of the latent class MNL (LC-MNL) model. This prevents solutions in which two segments appear to differ only in response error (scale) rather than in true preference. As part of step 3 above, we normalize the scale of each class to match the weighted average scale across all classes. While this constraint may result in slightly lower model fit than the traditional (unconstrained) LC-MNL model, the benefit of clearer, more meaningful segments often outweighs the trade-off.
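One way to picture this normalization is sketched below. How Segment finder measures a class’s scale is not stated here, so this sketch assumes the scale is proxied by the standard deviation of the class’s utilities and that classes are weighted by their estimated sizes; the function name is hypothetical. Because multiplying a utility vector by a constant changes only its MNL scale, two classes that differ only in scale become identical after rescaling.

```python
import statistics


def normalize_scale(betas, shares):
    """Rescale each class's utilities to the size-weighted average scale.

    Assumption: a class's "scale" is proxied by the population standard
    deviation of its utilities; classes are weighted by their shares.
    """
    scales = [statistics.pstdev(beta) for beta in betas]
    target = sum(w * s for w, s in zip(shares, scales)) / sum(shares)
    # Classes with zero spread carry no preference signal and are left as-is.
    return [[b * target / s for b in beta] if s > 0 else list(beta)
            for beta, s in zip(betas, scales)]


# Two classes with the same preference order but different scale.
print(normalize_scale([[1.0, -1.0], [2.0, -2.0]], [0.5, 0.5]))
```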

Segmentation is as much an art as a science. A statistically “strong” solution should also make sense from a strategic or managerial perspective. Choosing the number of segments is often subjective, though we typically recommend no more than 5 or 6. A measure of fit that can help guide this choice is described below.

Segment finder automatically calculates all solutions starting from two segments up to your selected maximum.

For a deeper dive into latent class analysis, read our technical paper on the topic.

Using Segment finder

To run Segment finder:

  1. Navigate to the Analysis section of Discover.
  2. Select the Segment finder item under the exercise (CBC or MaxDiff) you'd like to analyze.
  3. In the right panel, set the Maximum number of segments.
  4. Click the Run button to begin analysis.

Once Segment finder finishes running, each solution appears in its own tab at the top of the screen.

At this point, each of the solutions is also available as an option for segmentation in the other areas of exercise analysis.

Segment solutions results

Each solution includes a set of tables to help you understand how respondents were grouped and to evaluate whether the solution is meaningful and actionable.

Respondent distribution

While each respondent has a probability of belonging to every segment, for simplicity, they are assigned to the segment where their probability is highest. The count and percentage reflect these discrete assignments.
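The discrete assignment and the resulting counts and percentages can be sketched as follows. Segments are numbered from 0 in this illustration, and the function names and membership probabilities are hypothetical.

```python
from collections import Counter


def assign_segments(membership_probs):
    """Index of the most probable segment per respondent (ties go to the lower index)."""
    return [max(range(len(p)), key=p.__getitem__) for p in membership_probs]


def respondent_distribution(assignments, n_segments):
    """(segment, count, percent) rows computed from the discrete assignments."""
    counts = Counter(assignments)
    n = len(assignments)
    return [(s, counts[s], 100.0 * counts[s] / n) for s in range(n_segments)]


probs = [[0.9, 0.1], [0.2, 0.8], [0.6, 0.4], [0.3, 0.7]]
print(respondent_distribution(assign_segments(probs), 2))
# [(0, 2, 50.0), (1, 2, 50.0)]
```

Note that the third respondent (0.6 vs. 0.4) counts fully toward segment 0 even though their membership is far less certain than the first respondent’s.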


Greatest utility differentiators

This table highlights the utility scores that vary the most between segments, helping you interpret what distinguishes the groups. These values are the average individual-level utility scores, estimated by HB, within each segment.
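The exact ranking rule behind this table isn’t specified here. The sketch below assumes items are ranked by the spread (maximum minus minimum) of their segment-average HB utilities. The inputs (one row of individual utilities per respondent, plus each respondent’s assigned segment) and the function names are hypothetical.

```python
def segment_means(utilities, assignments):
    """Average individual-level utilities within each (non-empty) segment."""
    sums, counts = {}, {}
    for row, seg in zip(utilities, assignments):
        if seg not in sums:
            sums[seg] = [0.0] * len(row)
            counts[seg] = 0
        counts[seg] += 1
        sums[seg] = [a + u for a, u in zip(sums[seg], row)]
    return {seg: [a / counts[seg] for a in sums[seg]] for seg in sums}


def top_differentiators(means_by_segment, item_names, top_n=3):
    """Items whose segment averages are furthest apart (max minus min)."""
    spreads = []
    for j, name in enumerate(item_names):
        values = [means[j] for means in means_by_segment.values()]
        spreads.append((max(values) - min(values), name))
    spreads.sort(key=lambda pair: pair[0], reverse=True)
    return [name for _, name in spreads[:top_n]]


utilities = [[2.0, 1.0, 0.0], [4.0, 1.0, 0.0], [-2.0, 1.5, 0.1], [-4.0, 0.5, -0.1]]
means = segment_means(utilities, [0, 0, 1, 1])
print(top_differentiators(means, ["Price", "Brand", "Color"], top_n=1))
# ['Price']
```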


Full utilities breakdown

This table displays the average HB utility scores for each segment for all attributes and levels (or items) in the exercise, presented in the order you specified them.


Analysis details

More about model fit

The final tab, Analysis details, provides information about how well each segmentation solution fits the data. Like the RLH score used in HB estimation, latent class analysis generates a model fit statistic that indicates how well the utilities for each segment predict respondents’ choices. 
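For reference, RLH (root likelihood) is the geometric mean of the predicted probabilities of the alternatives respondents actually chose, so it can be computed from a log-likelihood as in this minimal sketch (the function name is hypothetical). For tasks with k alternatives, a model that predicts at random scores 1/k, which gives a baseline for comparison.

```python
import math


def root_likelihood(chosen_probs):
    """Geometric mean of the predicted probabilities of the chosen alternatives."""
    log_likelihood = sum(math.log(p) for p in chosen_probs)
    return math.exp(log_likelihood / len(chosen_probs))


# Four tasks with four alternatives each, so chance level would be 0.25.
print(root_likelihood([0.5, 0.6, 0.4, 0.7]))
```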

However, model fit tends to improve simply by adding more segments. For example, if differences in price sensitivity largely drive a two-group solution, you might improve the fit slightly by further splitting the groups based on preferences for another attribute. But beyond a certain point, additional segments offer only marginal gains in fit and minimal differentiation in preferences, and they can create overly fragmented solutions that aren’t helpful for strategic decisions.

Researchers have proposed several ways to describe the fit of segmentation solutions more accurately, based on the idea that an additional segment should be justified by a sufficiently large increase in fit. To evaluate model fit more objectively, Segment finder uses the Bayesian Information Criterion (BIC). BIC accounts for model complexity by penalizing solutions with more segments (and therefore more parameters), providing a more balanced view of fit than measures that reward only improved accuracy.
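The standard BIC formula is −2 × log-likelihood + k × ln(n), where k is the number of estimated parameters and n is the sample size. The sketch below applies it to latent class solutions under two assumptions: k counts each segment’s utilities plus (S − 1) free segment sizes (the scale constraint may change the effective count), and n is the number of respondents (Segment finder’s exact choice isn’t stated here). The log-likelihood values are hypothetical.

```python
import math


def latent_class_bic(log_likelihood, n_segments, n_params_per_segment, n_respondents):
    """BIC = -2 * LL + k * ln(n); lower values indicate better fit.

    Assumptions: k counts every segment's utilities plus (S - 1) free segment
    sizes, and n is the number of respondents.
    """
    k = n_segments * n_params_per_segment + (n_segments - 1)
    return -2.0 * log_likelihood + k * math.log(n_respondents)


# Hypothetical log-likelihoods for 2-, 3-, and 4-segment solutions.
solutions = {2: -5200.0, 3: -5050.0, 4: -5030.0}
bics = {s: latent_class_bic(ll, s, 12, 400) for s, ll in solutions.items()}
best = min(bics, key=bics.get)
# best == 3: the 4-segment solution has a higher log-likelihood, but its
# extra parameters cost more than the fit it gains.
```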

Model fit (BIC)

BIC (Bayesian Information Criterion) scores are displayed in a chart for each solution. Since BIC is a relative measure, there’s no absolute "good" or "bad" score. A two-group solution with a worse BIC might be more useful than a three-group solution with a better BIC, particularly if its segments are more distinct and easier to interpret.


Note: Lower BIC values indicate better fit. To reduce the risk of misinterpretation (since higher values are often assumed to be better), the chart is inverted with the zero axis at the top.

Number of replications

Segmentation algorithms can sometimes land on suboptimal solutions due to an unlucky starting point. To reduce the likelihood of this, Segment finder runs ten replications from different starting points for each N-group solution and returns the best result (the one with the best fit).
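The replication idea can be sketched generically: fit the same model from several random starting points and keep the result with the highest log-likelihood (for a fixed number of segments, this is also the lowest BIC). The `fit` callable and the function names are hypothetical stand-ins for a single latent class run.

```python
import random


def best_of_replications(fit, n_replications=10, seed=0):
    """Fit from several random starting points and keep the best-fitting result.

    `fit` is a hypothetical callable that takes a random.Random instance (used
    to draw starting values) and returns (log_likelihood, solution).
    """
    master = random.Random(seed)
    best = None
    for _ in range(n_replications):
        rng = random.Random(master.randrange(2**32))  # independent start per run
        log_likelihood, solution = fit(rng)
        if best is None or log_likelihood > best[0]:
            best = (log_likelihood, solution)
    return best


def toy_fit(rng):
    # Hypothetical fit whose outcome depends on its start: negative starts
    # get stuck in a worse local optimum.
    start = rng.uniform(-1.0, 1.0)
    return (-120.0, "local optimum") if start < 0 else (-100.0, "global optimum")


print(best_of_replications(toy_fit))
```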