Segmenting on Choice and Survey Data: A Comparison

Keith Chrzan

Last updated: 04 Jun 2025

Sometimes we want to segment on the basis of both a choice model and other, more general survey questions. The gold standard for this kind of segmentation, as Tom Eagle has argued persuasively (Eagle 2013; Eagle and Magidson 2019), is a single latent class model that combines a scale-adjusted latent class multinomial logit (MNL) model of the MaxDiff choices with a latent class clustering model of the general survey questions.
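
The following is a simplified sketch of one way to write what "combines" means. It is not the exact specification in the cited papers. Respondent i's likelihood mixes over segments s with shares π_s. Within a segment, the MaxDiff choices and the K survey answers z_ik are assumed conditionally independent. The symbols are as follows: β_s are the segment's item utilities; scale classes c, with shares ρ_c and scale factors λ_c, absorb differences in choice consistency; A_it is the set of items shown in task t; b_it is the item picked as best; and f_k is a distribution suited to survey item k (for example, categorical). Only best choices are written out. Worst choices are commonly added as extra terms with the utilities negated. All notation is introduced here for illustration.

```latex
L_i = \sum_{s=1}^{S} \pi_s
      \left[ \sum_{c=1}^{C} \rho_c \prod_{t=1}^{T_i}
      \frac{\exp\left(\lambda_c \beta_{s,b_{it}}\right)}{\sum_{j \in A_{it}} \exp\left(\lambda_c \beta_{sj}\right)} \right]
      \prod_{k=1}^{K} f_k\left(z_{ik} \mid \theta_{sk}\right)
```

The same segment index s and share π_s govern both the choice part and the survey part. As a result, each respondent's segment membership is informed by both data sources at once.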

The Sequential Alternative: A Common but Flawed Approach

This gold standard analysis is pretty complex, so many folks instead run a theoretically less appropriate "sequential" model. First they run a hierarchical Bayesian (HB) mixed logit model to get respondent-level MaxDiff utility estimates. Then they combine those respondent-level utilities with the other survey variables and run a cluster analysis on the joint data set. This alternative is inappropriate for several reasons (Eagle 2013; Eagle and Magidson 2019; Lyon 2019).
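
Below is a minimal sketch of the sequential pipeline just described. It assumes the HB utilities have already been estimated (HB estimation is not shown). The cluster method isn't specified here, so k-means with a simple deterministic start is an assumption. All function names are hypothetical. The sketch illustrates the approach being cautioned against and is not a recommendation.

```python
from statistics import mean, pstdev


def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length rows."""
    return sum((x - y) ** 2 for x, y in zip(p, q))


def standardize_columns(rows):
    """Z-score every column so utilities and survey items share a common scale."""
    stats = [(mean(col), pstdev(col) or 1.0) for col in zip(*rows)]  # constant column: divide by 1
    return [[(x - m) / s for x, (m, s) in zip(row, stats)] for row in rows]


def kmeans(points, k, max_iter=100):
    """Plain Lloyd's k-means; returns one segment label per row."""
    # Deterministic farthest-first start (an assumption): first row, then the row
    # farthest from all centers chosen so far
    centers = [list(points[0])]
    while len(centers) < k:
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centers))
        centers.append(list(far))

    labels = None
    for _ in range(max_iter):
        # Assignment step: each respondent goes to its nearest center
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each center to the mean of its members (keep it if a segment empties)
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = [mean(col) for col in zip(*members)]
    return labels


def sequential_segmentation(utilities, survey, k):
    """Hypothetical helper: append survey items to each respondent's HB utilities, then cluster."""
    joint = [list(u) + list(s) for u, s in zip(utilities, survey)]
    return kmeans(standardize_columns(joint), k)
```

The HB utilities enter the clustering only as point estimates. Their uncertainty is ignored, and the utilities and survey items are simply placed on a common scale by standardization.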

But how bad is it, really? To get a feel for how far the sequential model can stray from the gold standard, I recently ran a test. I dug up a segmentation study in which we had segmented on both a MaxDiff choice model and other survey variables, and I ran both the simultaneous and the sequential models on its data.

Measuring Segmentation Agreement with the Adjusted Rand Index

A standard statistic for comparing segmentation solutions is the Adjusted Rand Index (ARI). ARI usually ranges from 0.0 (the agreement you'd expect from chance alone) to 1.0 (perfect agreement). It can dip slightly below zero when agreement is worse than chance. When I compared the sequential and simultaneous analyses for the 2- through 10-segment solutions, I got these ARIs:
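
For readers who want to reproduce this kind of comparison, here is a minimal pure-Python sketch of the ARI. It uses the standard chance-corrected formula and works from the crosstab counts of two segment assignments. The function name `adjusted_rand_index` is hypothetical. scikit-learn's `sklearn.metrics.adjusted_rand_score` computes the same statistic.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Chance-corrected agreement between two segment assignments of the same respondents."""
    if len(labels_a) != len(labels_b) or len(labels_a) < 2:
        raise ValueError("need two labelings of the same (2+) respondents")
    # Pairs of respondents placed together within each crosstab cell, row, and column
    together_both = sum(comb(n, 2) for n in Counter(zip(labels_a, labels_b)).values())
    together_a = sum(comb(n, 2) for n in Counter(labels_a).values())
    together_b = sum(comb(n, 2) for n in Counter(labels_b).values())
    total_pairs = comb(len(labels_a), 2)

    expected = together_a * together_b / total_pairs  # pair agreement expected by chance
    maximum = (together_a + together_b) / 2
    if maximum == expected:
        return 1.0  # degenerate case, e.g. both solutions put everyone in one segment
    return (together_both - expected) / (maximum - expected)
```

Segment labels are arbitrary, so relabeling does not change the result: `[0, 0, 1, 1]` vs `[5, 5, 3, 3]` gives 1.0. `[0, 0, 1, 1]` vs `[0, 1, 0, 1]` gives -0.5, which is an example of worse-than-chance agreement.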

Figure 1: ARI by number of segments

The Results: Sequential and Simultaneous Methods Produce Different Segments

These results demonstrate that the simultaneous and sequential methods do NOT identify the same segmentation solution. To get a feel for what an ARI of 0.26 looks like, here's the crosstab of the sequential and simultaneous models' segment assignments for the 6-segment solution (percentages sum to 100 across the whole table). Green highlighting marks the best mapping of segments from one solution to the other.

Figure 2: Sequential – Simultaneous Crosstabulation

These two solutions do not classify respondents the same way: only 54% of respondents fall into the same segment in both solutions. The solutions identify different segments, of different sizes, and (though not shown here) with different interpretations.
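
A best mapping between segments, and the share of respondents it classifies the same way, can be computed from any two label vectors. The sketch below assumes both solutions have the same number of segments. It brute-forces every one-to-one mapping, which is cheap for 6 segments (720 mappings). For much larger solutions, `scipy.optimize.linear_sum_assignment(table, maximize=True)` solves the same assignment problem efficiently. The function name is hypothetical.

```python
from collections import Counter
from itertools import permutations


def best_matched_agreement(labels_a, labels_b):
    """Return (mapping, share) for the one-to-one segment mapping that maximizes agreement."""
    segs_a = sorted(set(labels_a))
    segs_b = sorted(set(labels_b))
    if len(segs_a) != len(segs_b):
        raise ValueError("this sketch assumes both solutions have the same number of segments")
    # Crosstab counts: respondents in segment a of solution A and segment b of solution B
    table = Counter(zip(labels_a, labels_b))

    best_mapping, best_matched = None, -1
    # Try every one-to-one pairing of A's segments with B's segments
    for perm in permutations(segs_b):
        matched = sum(table[(a, b)] for a, b in zip(segs_a, perm))
        if matched > best_matched:
            best_mapping, best_matched = dict(zip(segs_a, perm)), matched
    return best_mapping, best_matched / len(labels_a)
```

The returned share corresponds to the summed highlighted (diagonal-after-mapping) cells of a crosstab whose percentages sum to 100 across the whole table.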

Conclusion: Why the Gold Standard Simultaneous Approach Matters

Clearly the sequential approach is not a viable substitute for the gold standard simultaneous approach. So, the answer to "How bad is it really?" is "Pretty bad." If you intend to segment on both a choice model and other survey variables, you really should build the gold standard joint model, which simultaneously estimates a latent class choice model and a latent class clustering model.

Of course, you can contact our analytical consulting division for help with this at keith@sawthooth.com.

References

Eagle, T. (2013). "Segmenting Choice and Non-Choice Data Simultaneously." 2013 Sawtooth Software Conference Proceedings, 231–250.

Eagle, T. and J. Magidson (2019). "Segmenting choice and non-choice data simultaneously: Part Deux." 2019 Sawtooth Software Conference Proceedings, 247–280.

Lyon, D.W. (2019). "Comments on 'Segmenting choice and non-choice data simultaneously: Part Deux.'" 2019 Sawtooth Software Conference Proceedings, 281–288.