Hierarchical Bayes (HB) estimates high-quality individual-level utilities for CBC and MaxDiff (Best-Worst Scaling) despite sparse data for each individual. HB’s goal is not to maximize fit to the individual-level choices (measured by the RLH, or Root Likelihood, statistic). Doing so would risk overfitting, and the resulting utilities could do poorly at predicting new choices (such as holdouts or real market choices) outside the data used in utility estimation.
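For reference, RLH is the geometric mean of the probabilities the estimated model assigns to the alternatives each respondent actually chose, so it ranges from chance level (1/k for tasks with k alternatives) up to 1.0. A minimal sketch, using hypothetical probabilities rather than output from any particular estimation run:

```python
import math

def rlh(chosen_probs):
    """Root Likelihood: geometric mean of the probabilities the model
    assigned to the alternatives the respondent actually chose."""
    return math.prod(chosen_probs) ** (1.0 / len(chosen_probs))

# Hypothetical logit probabilities of the chosen alternative across 5 tasks:
print(round(rlh([0.6, 0.5, 0.7, 0.4, 0.55]), 3))  # -> 0.541

# A model that only guesses among 3 alternatives scores about 1/3:
print(round(rlh([1 / 3] * 10), 3))  # -> 0.333
```

A useful benchmark: an RLH near 1/k means the utilities predict the estimation tasks no better than chance.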

HB finds an appropriate compromise between fitting the choices at the individual level and finding utilities for the individual that are likely to be drawn from the population of respondents. Imagine a market where respondents have differences in tastes (heterogeneity of preferences). If the consideration given to fitting the respondent’s individual choices is too low, then the utilities look like small deviations from the average set of utilities for the population, and the predictive validity of the utilities (to new choices outside the estimation data) will likely suffer. On the other hand, if the emphasis on fitting individual choices is too high, then the predictive validity will suffer as well due to overfitting to sparse data.

HB’s default setting for prior variance informs the algorithm regarding the balance it should strike between fitting the lower-level model (the individual choices) vs. the upper-level model (the population characteristics). The default prior variance is 1.0, but we can manipulate it to illustrate how predictive validity for held out choices (new choices not included in the utility estimation) can be affected.
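The role of the prior variance can be illustrated with a deliberately simplified normal-normal shrinkage model (this is not HB's actual MCMC machinery, just the same precision-weighting intuition): the individual-level estimate is a weighted average of the individual's own data and the population mean, with the prior variance controlling the weights. All numbers below are hypothetical:

```python
def shrunken_estimate(individual_mean, pop_mean, prior_var, noise_var, n_tasks):
    """Posterior mean in a toy normal-normal model: a precision-weighted
    compromise between the individual's own data and the population mean.
    Small prior_var pulls estimates toward the population mean; large
    prior_var lets the individual's data dominate."""
    prior_precision = 1.0 / prior_var
    data_precision = n_tasks / noise_var
    w = data_precision / (data_precision + prior_precision)
    return w * individual_mean + (1 - w) * pop_mean

# Hypothetical respondent whose individual-only utility estimate is 2.0,
# with a population mean of 0.5, observed over 10 noisy tasks:
for pv in (0.05, 1.0, 100.0):
    print(pv, round(shrunken_estimate(2.0, 0.5, pv, noise_var=5.0, n_tasks=10), 3))
```

With a tiny prior variance (0.05) the estimate stays close to the population mean (0.636); with a huge one (100) it sits almost on the individual's raw estimate (1.993); the middle setting lands in between (1.5).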

We examined a recent CBC survey involving 4 attributes and 12 choice tasks per respondent. We randomly selected 2 choice tasks per respondent to hold out for validation and estimated utilities using the remaining 10 tasks. Below we report the fit to the 10 estimation tasks (RLH) and the hit rate for the 2 held-out tasks, under different degrees of emphasis on the lower- versus upper-level models.
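The holdout hit rate can be sketched as a first-choice rule: for each held-out task, predict the alternative whose summed part-worths are highest, and count a hit when that prediction matches the actual choice. The part-worths and tasks below are hypothetical, not from the survey described above:

```python
def hit_rate(part_worths, holdout_tasks):
    """Share of holdout tasks where the alternative with the highest
    total utility matches the alternative actually chosen."""
    hits = 0
    for alternatives, chosen in holdout_tasks:
        # Each alternative is a list of attribute-level indices; its total
        # utility is the sum of the corresponding part-worths.
        totals = [sum(part_worths[level] for level in alt) for alt in alternatives]
        hits += totals.index(max(totals)) == chosen
    return hits / len(holdout_tasks)

# Hypothetical part-worths keyed by attribute-level index, and two
# held-out tasks of two alternatives each:
utils = {0: 0.8, 1: -0.3, 2: 0.5, 3: -1.0}
tasks = [
    ([[0, 2], [1, 3]], 0),  # predicts alt 0 (1.3 vs -1.3); alt 0 chosen -> hit
    ([[0, 3], [1, 2]], 0),  # predicts alt 1 (-0.2 vs 0.2); alt 0 chosen -> miss
]
print(hit_rate(utils, tasks))  # -> 0.5
```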

Estimation approach | RLH (Internal Fit) | Holdout Hit Rate (Predictive Validity)
---|---|---
Default HB settings: compromise between fitting the individual-level and upper-level models (prior variance = 1) | 0.55 | 58%
Emphasize fit at the individual level (prior variance = 100) | 0.81 | 55%
Emphasize fit to the upper-level model (prior variance = 0.05) | 0.34 | 52%

We see that obtaining the highest RLH (internal fit of the model) doesn’t lead to the highest holdout hit rate. A more appropriate compromise between fitting the individual choices and the upper-level model leads to higher predictive validity.

Practical take-away? Sometimes researchers think that a reduced RLH is a bad thing (e.g., due to imposing utility constraints or fitting a linear term rather than a part-worth function). It may not be. Even though the internal fit to the choices used in utility estimation has gone down, the predictive validity for new choices not included in the model may have increased. Conversely, researchers sometimes assume that increasing the RLH (e.g., by adding interaction effects or increasing the prior variance) is a good thing. That also may not be the case. Holdout validation can help researchers gauge the practical effect of such changes on predictive validity.