Improving the Accuracy and Stability of Models with Hierarchical Bayes Regression

Regression analysis is widely used in marketing research for quantifying the relationship between predictor variables and an outcome. The predictor variables are termed independent variables and the outcome the dependent variable. As an example, in customer satisfaction modeling, the independent variables can be respondents' evaluations of brands on different aspects such as quality, performance, and service after the sale. The dependent variable is usually a measure of overall satisfaction with the brand or likelihood of purchasing that brand again.

Multiple regression models take the general form:

                Y = b0 + b1X1 + b2X2 + . . . + bnXn

        Y        = dependent variable
        b0       = constant
        b1...bn  = regression weights (betas)
        X1...Xn  = independent variables

The goal of the model is to minimize the difference between the predicted and actual values of the dependent variable. The degree of fit is termed R². An R² of zero implies that the predictor variables provide no information to predict the dependent variable, and a value of 1.0 implies perfect fit.
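The model above can be sketched in a few lines. This is a minimal illustration with numpy on synthetic data (the predictor values, weights, and noise level are invented for the example), fitting the betas by least squares and computing R² as one minus the ratio of residual to total sum of squares:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: two independent variables with known weights (hypothetical values).
n = 200
X = rng.normal(size=(n, 2))                     # X1, X2
y = 3.0 + 1.5 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.5, size=n)

# Add a column of ones for the constant b0, then solve by least squares.
A = np.column_stack([np.ones(n), X])
betas, *_ = np.linalg.lstsq(A, y, rcond=None)   # [b0, b1, b2]

# R-squared: 1 - (residual sum of squares / total sum of squares).
y_hat = A @ betas
r2 = 1 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)
print(betas.round(2), round(r2, 2))
```

With little noise relative to the signal, the recovered betas sit close to the true weights and R² is near 1; with pure noise as the outcome, R² would be near 0.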

In marketing research we often apply regression analysis to pooled observations from individuals who differ in the relationship between the independent and dependent variables. Consider people's opinions toward anchovies on pizza. Anchovies are generally either liked or despised. It is rare to find an individual who is lukewarm about anchovies. The distribution of individuals with respect to anchovy preference is not a normal bell-shaped curve, but perhaps has two "humps," reflecting the mixture of two very different populations.

Consider a hypothetical satisfaction study for pizza in which respondents tasted four different pizzas (some with anchovies and some without) and then rated each pizza on an overall desirability scale. To analyze the data, we apply a regression model to predict respondents' satisfaction for pizza based on whether it had anchovies or not. Let's assume that the independent variable (X1) indicating whether a pizza had anchovies or not was dummy coded (0=no anchovies; 1=has anchovies). Further assume that half of the population loves anchovies and their true beta weight b1 (the increase in satisfaction due to the presence of anchovies) is +10 (plus or minus some error). The other half of the population despises anchovies, b1 = -10, again plus or minus some error.

When we pool the data and estimate b1, we discover that b1 is close to but not significantly different from 0, and the R² is also near zero. (Both values would be exactly zero if respondents answered without noise and all used the rating scale in the same way.) Without any additional information, we'd be tempted to report that anchovies do not affect people's satisfaction with pizza whatsoever. And we would be dead wrong. The aggregate regression model has ignored heterogeneity and simply tried to describe the "average" respondent. Moreover, because aggregate regression cannot distinguish between (confounds) heterogeneity and noise, the estimate of b1 is not as precise as it could be.
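The pooled result can be reproduced with a small simulation. The respondent count, baseline satisfaction of 50, and noise level below are assumptions added for the example; the true betas of +10 and -10 come from the scenario above:

```python
import numpy as np

rng = np.random.default_rng(1)

# 200 respondents, each rating 4 pizzas; X1 = 1 if the pizza has anchovies.
n_resp, n_obs = 200, 4
has_anchovy = rng.integers(0, 2, size=(n_resp, n_obs)).astype(float)

# Half the population loves anchovies (true b1 = +10), half despises them (-10).
true_b1 = np.where(np.arange(n_resp) < n_resp // 2, 10.0, -10.0)

base = 50.0  # hypothetical baseline satisfaction
y = base + true_b1[:, None] * has_anchovy + rng.normal(scale=2.0, size=(n_resp, n_obs))

# Pool all observations and fit one aggregate regression.
X = np.column_stack([np.ones(n_resp * n_obs), has_anchovy.ravel()])
betas, *_ = np.linalg.lstsq(X, y.ravel(), rcond=None)
y_hat = X @ betas
r2 = 1 - np.sum((y.ravel() - y_hat) ** 2) / np.sum((y.ravel() - y.ravel().mean()) ** 2)

print(f"pooled b1 = {betas[1]:.2f}, R^2 = {r2:.3f}")  # b1 near 0, R^2 near 0
```

The heterogeneity lands in the residual term, so the aggregate model sees only noise around the "average" respondent.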

Hierarchical Bayes (HB) can deal much better with this situation. HB "borrows" information from other respondents to compute relatively stable individual-level results when respondents provide multiple observations (in our example, respondents evaluated multiple pizzas). One can even estimate useful individual-level models with more independent variables than a respondent has given observations, an impossible feat for traditional regression.

By estimating betas separately for each individual rather than just for the average of all people, HB separates heterogeneity (signal) from noise. The use of HB for this problem would reveal that anchovies significantly affect people's satisfaction with pizza. For HB, the average R² (the result of R² measured at the individual level and then averaged across respondents) is significantly greater than 0. If we submitted the individual-level betas to a cluster analysis, we'd learn that there were two distinct types of people with opposite opinions. We'd note that the mean value for b1 was near zero. But, because HB has been able to separate the heterogeneity from the noise, the average estimate of b1 is more precise, and closer to zero than with aggregate regression.
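A full HB run fits a multilevel model by MCMC, which is beyond a short listing. The shrinkage idea behind it, though, can be sketched with a one-step empirical-Bayes approximation under a normal population assumption: estimate each respondent's b1 from their own ratings, then pull the estimates toward the population mean in proportion to how noisy they are. The respondent count, design (two anchovy and two plain pizzas each), and noise level are assumptions for the example:

```python
import numpy as np

rng = np.random.default_rng(2)

# Pizza scenario: 200 respondents, each rating 2 plain and 2 anchovy pizzas (dummy coded).
n_resp = 200
x = np.array([0.0, 0.0, 1.0, 1.0])                       # design shared by everyone
true_b1 = np.where(np.arange(n_resp) < n_resp // 2, 10.0, -10.0)
y = 50.0 + true_b1[:, None] * x + rng.normal(scale=2.0, size=(n_resp, 4))

# Step 1: per-respondent slope (mean anchovy rating minus mean plain rating).
b_hat = y[:, x == 1].mean(axis=1) - y[:, x == 0].mean(axis=1)

# Step 2: empirical-Bayes shrinkage toward the population mean, a one-level
# analogue of HB "borrowing" across respondents.  sigma2 is the sampling
# variance of each b_hat (noise_var/2 per group mean, two groups); tau2 is
# the estimated between-respondent variance.
sigma2 = 2.0 ** 2
tau2 = max(b_hat.var() - sigma2, 0.0)
shrink = tau2 / (tau2 + sigma2)
b_eb = b_hat.mean() + shrink * (b_hat - b_hat.mean())

# The individual estimates separate into the two true populations ...
lovers, haters = b_eb[:n_resp // 2], b_eb[n_resp // 2:]
print(f"lovers mean = {lovers.mean():.1f}, haters mean = {haters.mean():.1f}")
# ... while the average across everyone stays near zero.
print(f"grand mean = {b_eb.mean():.2f}")
```

Because the between-respondent variance is large relative to the sampling noise, the shrinkage factor stays close to 1 and each respondent's own data dominates, which is why the two populations remain clearly separated.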

Those readers attuned to the assumptions of HB will point out that this hypothetical situation is at odds with HB's assumption that respondent betas conform to a normal distribution. The beta weights are indeed tempered by this assumption, but the observations provided by each individual still strongly influence the individual-level betas. We've analyzed a synthetic data set conforming to the pizza example with HB and observed that it deals well with this problem. Respondents are separated into their respective populations, the individual estimates of beta closely fit the true betas, and estimates of aggregate betas are measured more precisely than under aggregate regression.

It is worth noting that Latent Class methods are also useful in dealing with heterogeneous populations. For this simple pizza example, a two-group Latent Class solution would be entirely appropriate. However, Latent Class solutions are subject to local minima, they typically do not achieve proper individual-level estimates and, like cluster analysis, the analyst must decide how many groups (classes) are appropriate.

While the pizza example above is a very simple illustration, the principles are important and relevant to more complicated regression problems in marketing research. The major take-aways are as follows:

  • If respondents provide multiple observations, HB can be used to estimate individual-level betas.
  • HB can distinguish between the heterogeneity and noise that aggregate regression modeling confounds. This results in more precise estimates of average betas than under aggregate regression.
  • The individual-level beta weights can be used to segment respondents, using methods such as cluster analysis, neural networks, CHAID or AID, or banner points (filters) such as in cross-tabs.
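As a small illustration of the segmentation idea in the last bullet, the snippet below clusters hypothetical individual-level b1 estimates (invented to resemble the pizza example's two populations) with a minimal hand-rolled 2-means loop standing in for a full cluster analysis package:

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical individual-level b1 estimates, as HB might produce for the
# pizza example: two populations centered near +10 and -10.
betas = np.concatenate([rng.normal(10, 1.5, 100), rng.normal(-10, 1.5, 100)])

# Minimal 2-means clustering on the betas.
centers = np.array([betas.min(), betas.max()])
for _ in range(20):
    labels = np.abs(betas[:, None] - centers).argmin(axis=1)
    centers = np.array([betas[labels == k].mean() for k in (0, 1)])

print(f"segment centers: {np.sort(centers).round(1)}")  # near -10 and +10
```

The recovered segment centers land near the true population betas, giving the two targetable groups that the aggregate mean of roughly zero would hide.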

Another problematic issue that often derails multiple regression models is lack of independence (collinearity) among the independent variables. Consider a customer satisfaction study in which respondents evaluate multiple brands on various product-related features (independent variables) and then provide an overall evaluation of the brand (dependent variable). The goal of such a study might be to derive the weight (importance) each feature has in driving overall satisfaction. If some of the attributes have overlap in meaning for many of the respondents, such as "reliability" and "durability," regression modeling will have a difficult time distinguishing the relative weight of these two related items. As a result, collinearity leads to unstable estimates of beta weights. HB significantly reduces this problem by distinguishing heterogeneity from noise and by leveraging information from respondents whose ratings reflect better discrimination between "reliability" and "durability" to improve the measurement for respondents who did not make such distinctions. The result is more precise estimates of both individual and aggregate beta weights.
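The instability itself is easy to demonstrate. In this sketch (the correlation levels, sample sizes, and true weights are invented for the illustration), "durability" overlaps with "reliability" to a controllable degree, and we measure how much the estimated weight for reliability wanders across repeated samples:

```python
import numpy as np

rng = np.random.default_rng(4)

def beta_spread(rho, n_sims=500, n=100):
    """Std. dev. of the estimated 'reliability' weight across repeated samples."""
    b1_estimates = []
    for _ in range(n_sims):
        reliability = rng.normal(size=n)
        # durability overlaps with reliability to degree rho
        durability = rho * reliability + np.sqrt(1 - rho ** 2) * rng.normal(size=n)
        y = 1.0 * reliability + 1.0 * durability + rng.normal(size=n)
        X = np.column_stack([np.ones(n), reliability, durability])
        betas, *_ = np.linalg.lstsq(X, y, rcond=None)
        b1_estimates.append(betas[1])
    return np.std(b1_estimates)

low, high = beta_spread(rho=0.1), beta_spread(rho=0.98)
print(f"spread of b1: rho=0.1 -> {low:.2f}, rho=0.98 -> {high:.2f}")
```

With nearly independent predictors the estimated weight is tightly concentrated; with heavily overlapping ones it swings widely from sample to sample, which is exactly the instability that HB's pooling across respondents helps dampen.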

Marketers should be more concerned with profiling and targeting individuals and segments rather than the market average. HB methods support this strategy and are very valuable for problems that have traditionally been analyzed using aggregate regression. Whether the researcher's interest lies in achieving aggregate- or individual-level estimates of beta, for studies in which respondents provide multiple observations, HB usually beats aggregate regression.