Selecting the Number of Segments in Latent Class MNL

Last updated: 18 Jan 2021
President, Sawtooth Software

We often model MaxDiff and CBC data with Latent Class MNL to find groups of respondents who tend to choose similarly. CAIC, BIC, and AIC are different (but related) measures of fit for Latent Class MNL. They are comparable across segmentation solutions because they penalize fit for increased complexity (the number of groups and parameters). They are highly correlated and often (but not always) agree. With these fit statistics, lower is better. There isn't consensus regarding which is most reliable, and although these Latent Class fit statistics aren’t foolproof for pointing to the “true” solution, they tend to be useful.
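
For readers who want to see how the penalty works, here is a minimal sketch of the standard textbook forms of these three statistics. It is not any particular software's implementation. The function name `penalized_fit` and its arguments are hypothetical, and the parameter count assumes each group gets its own set of utilities plus the free group-size parameters. What counts as the sample size (respondents versus choice tasks) differs across tools, so it is left as an input.

```python
import math

def penalized_fit(log_likelihood, n_groups, n_params_per_group, n_obs):
    """Return AIC, BIC, and CAIC for one latent class solution (lower is better)."""
    # Assumed parameter count: one set of utilities per group,
    # plus (n_groups - 1) free group-size parameters.
    k = n_groups * n_params_per_group + (n_groups - 1)
    deviance = -2.0 * log_likelihood
    return {
        "AIC": deviance + 2 * k,
        "BIC": deviance + k * math.log(n_obs),
        "CAIC": deviance + k * (math.log(n_obs) + 1),
    }
```

All three start from the same -2 x log-likelihood and differ only in how heavily they charge for each added parameter, which is why they tend to move together but can point to different numbers of groups.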

What we sometimes see with Latent Class MNL for MaxDiff or CBC is that the fit statistics improve (get lower) as we fit more groups and then turn back and start to get worse (get higher) again, such as:

[Chart: CAIC, BIC, or AIC by number of groups]

In this case, I would tend to look closely at the 3- to 6-group solutions to see which is best from a practical/strategic standpoint (clarity for the organization, target segments that are large enough and different enough to matter, discrimination across other important targeting/profiling variables, etc.). Although the 5-group solution is “optimal” in terms of the fit statistic, the 3-, 4-, or 6-group solutions might be better for the client’s purposes. All solutions in the range of 3 to 6 groups seem to be about optimal in terms of penalized fit and thus are worth a look.
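
The idea of shortlisting every solution that is "about optimal" can be sketched in a few lines. This is only an illustration under assumptions of my own: the helper `near_optimal` is hypothetical, the 1% tolerance is an arbitrary choice rather than a rule, and the fit values are made up to mimic the pattern described above.

```python
def near_optimal(fit_by_groups, tolerance=0.01):
    """Return group counts whose fit statistic is within `tolerance`
    (as a share of the best value) of the lowest fit statistic."""
    best = min(fit_by_groups.values())
    cutoff = best + tolerance * abs(best)
    return sorted(g for g, fit in fit_by_groups.items() if fit <= cutoff)

# Made-up values for illustration only: fit improves through 5 groups, then worsens.
example_fit = {2: 10500, 3: 10120, 4: 10080, 5: 10050, 6: 10110, 7: 10300}
print(near_optimal(example_fit))  # [3, 4, 5, 6]
```

The shortlist is only a starting point. Choosing among those solutions still comes down to the practical and strategic criteria listed above.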

Often what I’ll see is that these fit statistics continue to get better (lower) the more groups we add, without doubling back. But too many groups can make a solution hard to use for practical/strategic purposes. Although it is a bit subjective, I like to look down the chart to see if there seems to be an “elbow,” such as at 3 groups in the chart below:

[Chart: CAIC, BIC, or AIC by number of groups]

The 3-group solution seems to be the point after which additional complexity is rewarded only slowly in terms of the fit statistic. However, I would probably look at the 4- and 5-group solutions as well to see if they were promising from a practical/strategic standpoint.
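
Eyeballing the chart is usually enough, but one rough way to flag a candidate elbow is to look for the group count where the improvement slows down the most (the largest second difference). This is a heuristic I am swapping in for illustration, not a formal test. The function `find_elbow` is hypothetical, it assumes the group counts are consecutive, and the values are made up.

```python
def find_elbow(fit_by_groups):
    """Return the group count where the drop in the fit statistic slows the most.

    Returns None if fewer than three solutions are supplied.
    """
    groups = sorted(fit_by_groups)
    elbow, biggest_bend = None, float("-inf")
    for prev, cur, nxt in zip(groups, groups[1:], groups[2:]):
        gain_into = fit_by_groups[prev] - fit_by_groups[cur]   # improvement from adding `cur`
        gain_after = fit_by_groups[cur] - fit_by_groups[nxt]   # improvement from the next group
        bend = gain_into - gain_after
        if bend > biggest_bend:
            elbow, biggest_bend = cur, bend
    return elbow

# Made-up values for illustration only: fit keeps improving, but slowly after 3 groups.
example_fit = {1: 12000, 2: 11000, 3: 10300, 4: 10200, 5: 10120, 6: 10060, 7: 10010}
print(find_elbow(example_fit))  # 3
```

A flagged elbow is a suggestion, not an answer, so the neighboring solutions are still worth reviewing.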

Most clients cannot deal with more than about six segments if they are looking to develop a segmentation strategy to communicate throughout the organization. So there tends to be both judgment and science involved in conducting useful segmentation studies, with back-and-forth as you work with clients to decide on the right segmentation solution.