Glossary of Terms

This brief glossary provides definitions for some of the terms used at Sawtooth Software. Many of the glossary entries below have been taken from the much more complete glossary (62 pages' worth!) provided in the 3rd edition of the "Getting Started with Conjoint Analysis" book. We are grateful to Tom Miller of Research Publishers, LLC for allowing us to reprint and distribute a portion of that copyrighted material here.

Alternative-Specific Constant

Also known as ASC. In discrete choice or choice-based conjoint models, each product concept (alternative) often is associated with, say, a specific (and fixed) brand. In addition to the brands, other attributes, such as colors, speeds, and prices, may be included in the choice task. The Alternative-Specific Constant is the remaining utility associated with a product alternative after accounting for all other attributes except the concept label, or in this case, the brand. In this context, it is therefore the brand intercept, or utility for brand. Of course, alternative-specific constants may reflect things other than brand. Transportation options (e.g., bus, car, train, bike) are another common example.

ASC

(see Alternative-Specific Constant)

CHO file

The older raw data file format for CBC, named studyname.cho. Old-timers at the Sawtooth Software conference refer to CHO files, which simply means a file containing the respondent number, the attribute levels seen by the respondent in each CBC task (the experimental design), and the choices the respondent made. CHO files are submitted to HB (hierarchical Bayes) analysis to compute part-worth utilities, or preference scores attached to each attribute and level.

Compensatory model/behavior

A fundamental assumption made by most conjoint analysis techniques that the total value of a product is simply equal to the sum of the values of its separate parts (the sum of the part-worth utilities across its relevant attribute levels). The theory of Additivity suggests that respondents apply a Compensatory Rule when considering which product to buy/choose, wherein bad characteristics can be overcome by the presence of other very good characteristics (since the value of good characteristics can compensate for bad characteristics). This simple view of buyer behavior is certainly flawed, but the simplification permits us to build generally useful models of consumer behavior that involve many attributes and levels, using a manageable number of questions asked of each respondent.

In truth, many respondents apply Noncompensatory rules, such as rejecting any product that has a particular Unacceptable color or brand, or that exceeds a certain price, irrespective of how many other excellent qualities it has. Another example of a noncompensatory rule is when respondents choose any product that reflects their favorite brand label, irrespective of how many other bad characteristics it may have.

Even though the additive rule seems simplistic and restrictive, it actually is flexible enough to represent basic (yet common) manifestations of noncompensatory behavior. For example, if a respondent has must-avoid levels and if large enough negative utility weights are given to these levels, then no combination of good product characteristics added together can compensate for a must-avoid level under the additive rule.
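
To see how this works in practice, here is a minimal Python sketch (with entirely hypothetical part-worth values) of an additive utility calculation in which one "must-avoid" level is given a very large negative part-worth, so no combination of attractive levels can compensate for it:

    # Hypothetical part-worth utilities for a simple two-attribute product.
    # The very large negative value for Brand C represents a "must-avoid" level.
    part_worths = {
        "brand": {"Brand A": 0.6, "Brand B": 0.1, "Brand C": -100.0},
        "price": {"$100": 0.8, "$150": 0.0, "$250": -0.8},
    }

    def total_utility(product):
        """Additive (compensatory) rule: sum the part-worths of the product's levels."""
        return sum(part_worths[attr][level] for attr, level in product.items())

    print(total_utility({"brand": "Brand C", "price": "$100"}))  # -99.2: effectively never chosen
    print(total_utility({"brand": "Brand B", "price": "$250"}))  # -0.7: still beats the must-avoid brand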

Dominated concepts/Dominance

When a product alternative or conjoint card is clearly superior in utility to competing alternatives, this is termed dominance. A high degree of dominance within a conjoint or discrete choice questionnaire usually is not desirable. A choice of a dominant alternative is less informative for refining respondent preferences than a more thoughtful trade-off among alternatives wherein there isn't an obvious choice. Utility Balance refers to the extent to which the alternatives in a choice task are similar in terms of their overall preference.

Hit rates

A measure of the ability of conjoint analysis to predict individual responses to holdout profiles. For example, a respondent may have completed eighteen choice tasks in a choice-based conjoint experiment, followed by another choice task that is held out for validation purposes (not used in the estimation of part-worth utilities). Using the part-worth utilities developed from the first eighteen choice tasks, one predicts responses to the holdout choice task. If the prediction matches the respondent's choice, a hit is recorded for this respondent. If not, a miss is recorded. The hit rate across the sample is the percent of correctly predicted holdout responses using the model.

Hit rates for holdout choice tasks involving three or four product alternatives usually range from 55 to 75 percent. The more alternatives per choice task, the lower the hit rate. Many researchers use hit rates to compare conjoint methods or different models using the same conjoint method. But this measure of success is typically not as meaningful to management as the accuracy of share predictions from market simulators, measured in terms of Mean Absolute Error or Mean Squared Error. Successful conjoint models feature both high hit rates and excellent share prediction accuracy.
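
As a simple illustration (not Sawtooth Software's own code), the hit rate is just the share of holdout choices predicted correctly; the choice vectors below are made up:

    # Hypothetical predicted vs. actual holdout choices (one holdout task per respondent).
    predicted_choice = [1, 3, 2, 2, 1, 3, 1, 2]
    actual_choice    = [1, 3, 2, 1, 1, 2, 1, 2]

    hits = sum(p == a for p, a in zip(predicted_choice, actual_choice))
    hit_rate = hits / len(actual_choice)
    print(f"Hit rate: {hit_rate:.0%}")  # 75%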

Holdout tasks

Refers to conjoint or discrete choice (CBC) questions not used to estimate part-worth utilities, but held out separately to assess the quality or performance of the estimated part-worths. If the responses to held-out questions can be predicted accurately using estimated part-worths, it lends greater credibility to the model. Assessing the quality of part-worth estimation using holdout questions is more indicative of internal reliability than of predictive validity. True validity tests usually require real-world sales or choice data rather than holdout conjoint questions asked during the same survey as the other conjoint tasks.

Holdout respondents

Respondent data that are not included in building a model but are used to validate the model. For example, 500 Calibration Respondents could be used to build a predictive model of choice (such as for a Choice-Based Conjoint). That model could be used to predict the responses that a different set of 500 holdout respondents made to a single Version of a CBC questionnaire (see definition of Version). The use of holdout respondents to validate models is often referred to as Out-of-Sample Validation. Typically, the characteristics of the calibration respondents and holdout respondents are matched, resulting from a Split-Sample Study where respondents are randomly assigned to be either Calibration Respondents or Holdout Respondents.

IIA

Independence of Irrelevant Alternatives, commonly known as the Red Bus/Blue Bus problem. A property of the logit model, in which the ratio of any two product alternatives' shares is constant, irrespective of changes to (or introductions of) other product alternatives. As an illustration of the IIA property, consider only two available beverages in a market: Coke and milk. Further assume that Coke captures 80 percent of the market and milk 20 percent (i.e., Coke's share is four times that of milk).

Assume Pepsi, a new competitor, enters the market and captures 50 percent share. According to the IIA property, Pepsi would take share proportionally (at a constant substitution rate) from Coke and milk, such that the resulting shares would be Coke 40 percent, milk 10 percent, and Pepsi 50 percent. That is, after the introduction of Pepsi, the ratio of Coke's to milk's shares would still be four to one, but milk's share would be cut from 20 percent to 10 percent. While the IIA property makes logit models very efficient computationally, most researchers regard it as quite unrealistic. One would expect that the introduction of Pepsi would take share principally from Coke, leaving milk's share essentially unchanged.

A common illustration of the same principle involves highly substitutable options for getting to work (a red bus and a blue bus) versus other dissimilar alternatives like driving, biking, and walking; hence, the "red bus/blue bus problem."

The good news for market researchers is that when using individual-level estimation of part-worths and the logit simulation model, the IIA trouble is greatly reduced. Within each individual, the IIA property still holds. But, when simulated choices for each individual are accumulated across the population, the overall results reflect more realistic substitution patterns among similar products rather than strictly following the IIA property. Some researchers deal with IIA troubles by using different model formulations such as nested logit or models that incorporate availability effects.
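
For concreteness, the following Python sketch reproduces the Coke/milk/Pepsi arithmetic under the logit (share-of-preference) rule; the utilities are chosen only so the starting shares come out to 80/20 and do not come from any real study:

    import math

    def logit_shares(utilities):
        """Logit rule: each alternative's share from its utility."""
        expu = [math.exp(u) for u in utilities]
        total = sum(expu)
        return [e / total for e in expu]

    u_coke, u_milk = math.log(0.8), math.log(0.2)
    print(logit_shares([u_coke, u_milk]))            # [0.8, 0.2], a 4:1 ratio

    # Introduce Pepsi with a utility that captures 50 percent of the new market.
    u_pepsi = math.log(0.8 + 0.2)                    # = 0 by construction
    coke, milk, pepsi = logit_shares([u_coke, u_milk, u_pepsi])
    print(coke, milk, pepsi)                         # 0.40, 0.10, 0.50
    print(coke / milk)                               # still 4:1 -- the IIA property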

Incentive alignment

The condition in which respondents perceive that they will be better off making choices within questionnaires that align with their true preferences. As an example, respondents may be told that they will be entered into a drawing in which the winner receives a product that most closely matches the choices that he or she has made in a conjoint exercise. Incentive-aligned conjoint analysis attempts to reduce the bias that can occur if survey respondents provide less realistic responses because they are not making real purchases with real money.

Interaction effect

Typical conjoint analysis models assume that the utility of a product alternative is equal to the sum of the values of its independent parts. However, there are situations in which the levels from two attributes combine to create something considerably better or worse than their independent values might suggest. Such a case is termed an interaction. For example, if we are studying automobiles, the combination of convertible with the color red may produce a synergistic effect upon utility that is not explainable by the preferences for the separate values of models and colors. Also, if limousine is combined with the color red, that combination is considerably worse than might be expected from the separate utility scores for red and limousine. Interaction effects are parameters that are estimated in addition to the main attribute level effects (main effects).
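
A minimal Python sketch of the idea, with hypothetical part-worths: an interaction term is simply extra utility added on top of the main effects for particular level combinations.

    # Hypothetical main-effect part-worths plus interaction terms for model x color.
    main_effects = {
        "model": {"convertible": 0.5, "sedan": 0.0, "limousine": -0.2},
        "color": {"red": 0.3, "white": 0.0},
    }
    interactions = {
        ("convertible", "red"): 0.6,   # synergy beyond the main effects
        ("limousine", "red"): -0.8,    # clash beyond the main effects
    }

    def utility(model, color):
        return (main_effects["model"][model]
                + main_effects["color"][color]
                + interactions.get((model, color), 0.0))

    print(utility("convertible", "red"))  # 0.5 + 0.3 + 0.6 = 1.4
    print(utility("limousine", "red"))    # -0.2 + 0.3 - 0.8 = -0.7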

Level overlap/Minimal overlap

Level overlap refers to whether an attribute level (such as a particular brand, speed, or color) repeats across alternatives within the same choice task in CBC. For example, with three levels of brand, if three product alternatives (concepts) are shown per choice task, each brand can be represented exactly once per choice task. This would reflect no level overlap with respect to brand. If four product alternatives were displayed per choice set, one of the brands would need to be repeated, causing some level overlap. Repeating a level within a choice task only after all levels within the same attribute have been used is termed Minimal Overlap. Minimal overlap strategies are most efficient with respect to Main Effects estimation, assuming respondents apply compensatory (simple Additivity) decision rules. However, many (if not most) respondents do not use simple Additivity rules to answer conjoint questionnaires. Over the last few years, there has been increased emphasis on including some level overlap in CBC tasks, as the results are often superior to minimal overlap designs. Some degree of level overlap also is beneficial when estimating Interaction Effects.

MAE

(Mean Absolute Error) is a measure of fit between predicted and actual values. In conjoint analysis, MAE is most commonly used for comparing predicted shares of preference to actual shares (either actual market choices or choice shares within surveys). For example, consider four product concepts with predicted shares of preference and actual observed choices as shown in the table below:

Product                                  Predicted Share of     Actual Choice      Absolute Error
                                         Preference (Percent)   Share (Percent)    (Percent)
Discount Offering (Brand A, Red, $100)            20                  20           |20 − 20| = 0
Medium Offering (Brand B, Red, $150)              25                  30           |25 − 30| = 5
Medium Offering (Brand C, Green, $175)            18                  20           |18 − 20| = 2
Premium Offering (Brand D, Gold, $250)            37                  30           |37 − 30| = 7
Sum                                              100                 100                      14

Computing the mean absolute error in this problem yields MAE = 14 / 4 = 3.5. The smaller the MAE, the better the predictions align with actual choice shares. Researchers typically try to improve the predictions of their models and lower the MAE. Improvements can be made by specifying better models, using better estimation techniques, deleting bad data, increasing the sample size, and tuning the exponent or scale factor in the estimation procedure. The magnitude of the MAE depends upon the number of product concepts involved in the predictions. With four products in a market scenario, the average size of the shares is 25 percent. With ten products in a market scenario, the average size of the shares is 10 percent. It follows that MAE can be expected to be smaller as the number of product concepts increases.

To adjust for the number-of-products effect upon MAE, some researchers prefer to divide MAE by the average magnitude of choice shares, leading to a quantity called the mean absolute percentage error (MAPE). For the example above, there are four products in the market, so the average size of shares is 25 percent. Computing the MAPE yields MAPE = 3.5 / 25 = 0.14, or 14 percent. The MAPE indicates the average percentage difference between actual and predicted values.
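
The same arithmetic can be reproduced in a few lines of Python, using the shares from the table above:

    # Predicted vs. actual shares from the table above (in percent).
    predicted = [20, 25, 18, 37]
    actual    = [20, 30, 20, 30]

    abs_errors = [abs(p - a) for p, a in zip(predicted, actual)]  # [0, 5, 2, 7]
    mae = sum(abs_errors) / len(abs_errors)                       # 14 / 4 = 3.5

    average_share = sum(actual) / len(actual)                     # 100 / 4 = 25
    mape = mae / average_share                                    # 3.5 / 25 = 0.14, i.e., 14%
    print(mae, mape)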

Main effects

The independent preference or utility for the attribute levels, holding all other attributes constant. For example, consider the following attributes, levels, and main effect part-worth utilities:

Attribute Level Utility
Brand Coke 0.20
  Pepsi 0.00
  7-Up -0.20
Price $2.50 0.40
  $2.75 0.10
  $3.00 -0.50

With main effects, the effect (utility) of brand is estimated independent of price (and vice versa). We interpret the part-worths above to indicate that, holding price constant, Coke is preferred to Pepsi, and Pepsi is preferred to 7-Up. The part-worth utilities for price reflect the average price sensitivity across all brands (holding brand constant). Main effects ignore the possibility of interactions between attributes. If interactions exist (and are not accounted for), main effect estimates may be biased.

MaxDiff: Express

With an Express MaxDiff study, each respondent is shown a small subset of a large number of items, typically three to four times each. Each respondent sees a different subset of items. Express MaxDiff relies upon the magic of HB analysis to fill in the blanks and supply utilities for the items missing from a given respondent's experiment (essentially imputing them based on the population means and covariances). For a comparison with Sparse MaxDiff, see A Parameter Recovery Experiment for Two Methods of MaxDiff with Many Items (2015).

MaxDiff: Sparse

With a Sparse MaxDiff study, each respondent typically sees all the items in the study, but fewer than the recommended number of times (i.e., perhaps just once). When compared head-to-head in a study based on live human preferences, Sparse MaxDiff has a modest edge over Express MaxDiff. For a comparison with Express MaxDiff, see A Parameter Recovery Experiment for Two Methods of MaxDiff with Many Items (2015).

Non-compensatory decision rules

(see Compensatory model)

Out-of-sample validation

(see Within-sample validation/Out-of-sample validation)

Part-worth

The utility associated with a particular level of an attribute in a multiattribute conjoint analysis model. The total utility for the product is made up of the part-worths of its separate attributes (components). Sometimes researchers have referred to part-worths somewhat incorrectly as utilities.

More technically, utilities refer to the total desirability of the product alternative, and part-worths refer to the component of desirability derived from the separate attribute levels for that product.

Partial Profile

A partial profile involves the presentation of a subset of the attributes in a product concept. Choice-based conjoint (CBC) can employ partial profiles, wherein only a subset of the attributes are shown in each choice task, but typically across multiple choice tasks each respondent will be exposed to all the attributes and levels within the study.

Red bus/blue bus problem

(see IIA)

Reversals

When an estimated part-worth utility or utility coefficient defies rational expectations or order. For example, consider the following part-worth utilities for price:

Attribute Level Part-Worth
Price     $10   1.50
          $15   1.63
          $20   0.75

We would expect that lower prices should be preferred to higher prices (all else held constant). However, the part-worths for $10 and $15 reflect a reversal—the data suggest that $15 is preferred to $10. As another example, if we fit a linear coefficient to price, we should expect a negative sign (utility is negatively correlated with price). An estimated price coefficient with a positive sign would also be considered a reversal.

Reversals are often seen within individual respondents and are usually due to random error (lack of precision of the part-worth estimates) due to limited information. Reversals can be detrimental to individual-level predictions of behavior. But reversals due to random error at the individual level are usually of little consequence to accurate predictions of shares of preference for groups of respondents (given adequate sample size). Reversals can be eliminated by using constrained estimation, often referred to as Utility Constraints.
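
As a small illustration (not part of any Sawtooth Software procedure), here is one way a reversal in a price attribute could be flagged in Python, using the part-worths from the table above:

    # Price part-worths from the example above, ordered from lowest to highest price.
    price_partworths = {"$10": 1.50, "$15": 1.63, "$20": 0.75}

    values = list(price_partworths.values())
    # With lower prices expected to be preferred, utilities should fall as price rises.
    reversals = [(lo, hi) for lo, hi in zip(values, values[1:]) if hi > lo]
    print(reversals)  # [(1.5, 1.63)]: the data suggest $15 is preferred to $10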

RLH (Root Likelihood)

A measure of fit between utility estimates (preference scores) and choices made by respondents to CBC, ACBC, MBC, and MaxDiff questionnaires. Using utility estimates, one can apply the logit rule to predict the likelihood that respondents would make the choices that they actually made within each choice task. Root likelihood (RLH) is the geometric mean of estimated probabilities associated with the alternatives actually chosen by respondents—that is, the nth root of the likelihood.

For example, consider a choice task with alternatives A, B, and C with utilities for a respondent of 1.9, 0.1, and -0.6, respectively. Under the logit rule, the likelihood of the respondent selecting each alternative is computed by taking the antilog of the alternatives' utilities and normalizing the results to sum to 1.0, as shown in the table below:

Alternative   Utility   Antilog (e^utility)   Likelihood
A                1.9            6.69             0.80
B                0.1            1.11             0.13
C               -0.6            0.55             0.07
Total                           8.35             1.00

Suppose that this respondent selected Alternative A as best in the questionnaire. The likelihood of observing that response, according to the utilities, is 0.80.

If a respondent answered three choice tasks in the questionnaire and the estimated utilities suggest that the respondent would have made those three specific choices with probabilities 0.8, 0.5, and 0.6, then the joint probability or likelihood of observing those three specific choices, assuming independence, is 0.80 × 0.50 × 0.60 = 0.24.

The greater the number of choice tasks, the closer the likelihood approaches zero. Researchers find it more meaningful to report the geometric mean of likelihoods across choice tasks by taking the nth root of the likelihood (the root likelihood, RLH), where n is the number of choice tasks. For this example with three tasks, the RLH is the cube root of 0.24, or 0.621.

The magnitude of the RLH is directly affected by the number of alternatives available within each choice task. If there are four alternatives in each choice task, then the null likelihood (the predictability of the responses using uninformative utilities) is 1/4 = 0.25. Therefore, a fit only slightly larger than 0.25 in a four-alternative choice study would not be very impressive.
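
The RLH arithmetic from the example above can be reproduced with a short Python sketch (math.prod requires Python 3.8 or later):

    import math

    def logit_probabilities(utilities):
        """Probability of choosing each alternative under the logit rule."""
        expu = [math.exp(u) for u in utilities]
        total = sum(expu)
        return [e / total for e in expu]

    # Alternatives A, B, C with utilities 1.9, 0.1, and -0.6; the respondent chose A.
    probs = logit_probabilities([1.9, 0.1, -0.6])
    print(round(probs[0], 2))                    # approximately 0.80

    # Likelihoods of the choices actually made across three tasks.
    chosen_probs = [0.80, 0.50, 0.60]
    likelihood = math.prod(chosen_probs)         # 0.24
    rlh = likelihood ** (1 / len(chosen_probs))  # cube root of 0.24, about 0.621
    print(likelihood, round(rlh, 3))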

RMSE (Root Mean Square Error)

A summary measure of error between predicted vs. actual values. Consider the following predictions and actual values:

Prediction   Actual   Absolute Difference   Squared Difference
2            3                 1                     1
4            2                 2                     4
5            5                 0                     0
3            5                 2                     4
             Sum:              5                     9
             Average:          1.25                  2.25

The mean squared error (MSE) is 2.25, summarizing the squared differences between predictions and actual values. The square root of the MSE (RMSE) is Sqrt(2.25) = 1.5. RMSE is quite an intuitive measure of error, since it summarizes roughly how far, on average, the predictions are from the actual values. Note that the mean absolute error (1.25) and the RMSE (1.50) are intuitively quite similar, but not mathematically identical. The RMSE tends to penalize individual predictions that are particularly bad, since it involves squaring the difference between predicted and actual values.
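
The same calculation in Python, using the values from the table above:

    # Predictions and actual values from the table above.
    predicted = [2, 4, 5, 3]
    actual    = [3, 2, 5, 5]

    sq_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]  # [1, 4, 0, 4]
    mse = sum(sq_errors) / len(sq_errors)                          # 9 / 4 = 2.25
    rmse = mse ** 0.5                                              # 1.5
    mae = sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)  # 1.25
    print(mse, rmse, mae)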

Scale factor

The relative size or spread of the part-worth utilities from conjoint analysis, especially choice-based conjoint analysis. Consider the following part-worth utilities:

Respondent #1 Respondent #2
-0.5 -1.0
0.0 0.0
0.5 1.0

The difference in utility between the best and worst levels for Respondent #1 is 1.0, and the corresponding spread for Respondent #2 is 2.0. Respondent #2's relative scale factor is twice Respondent #1's. With logit analysis, the scale factor reflects the certainty associated with an individual's or a group's choices to the questions used in developing part-worth utilities. The scale factor also has a large impact on resulting shares of preference from the logit or randomized first-choice simulation methods. Respondents (or groups) with larger scale factors will have more accentuated (extreme) share predictions than those with smaller scale factors. One can change the scale factor for individuals or groups by tuning the exponent. The exponent is applied as a simple multiplicative factor to the part-worth utilities prior to predicting choice.
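
A minimal Python sketch of the idea (the function name and utilities are illustrative, not the market simulator's actual implementation): multiplying utilities by a larger exponent before applying the logit rule yields more accentuated shares.

    import math

    def shares_of_preference(utilities, exponent=1.0):
        """Logit shares after multiplying each utility by the exponent (scale tuning)."""
        expu = [math.exp(exponent * u) for u in utilities]
        total = sum(expu)
        return [e / total for e in expu]

    utilities = [-0.5, 0.0, 0.5]
    print(shares_of_preference(utilities, exponent=1.0))  # roughly 0.19, 0.31, 0.51
    print(shares_of_preference(utilities, exponent=2.0))  # roughly 0.09, 0.24, 0.67 (more extreme)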

Split-sample study

Randomly dividing respondents into two or more different groups who each receive different versions of a questionnaire. This is often done to test two slightly different conjoint methods, for example. Oftentimes, some of the respondents are randomly split off into a holdout sample group.

TURF

TURF stands for Total Unduplicated Reach and Frequency, a search optimization procedure for finding the minimum number of flavors, brands, colors, cable stations, or magazines that will satisfy or cover (Reach) the maximum number of people the greatest number of times (Frequency). A classic example is for an advertiser to choose just three magazines that will reach the largest number of readers within the target market. TURF is often used as an extension of the MaxDiff methodology, though it may be used with general pick-any data or rating scales.
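
A toy Python sketch of the reach portion of a TURF search (frequency is ignored for brevity, and the acceptance data are made up):

    from itertools import combinations

    # Hypothetical pick-any data: 1 means the respondent would read that magazine.
    acceptance = {
        "Magazine A": [1, 0, 1, 0, 1, 0],
        "Magazine B": [1, 1, 0, 0, 0, 0],
        "Magazine C": [0, 0, 1, 1, 0, 1],
        "Magazine D": [0, 1, 0, 1, 0, 0],
    }
    n_respondents = 6

    def reach(items):
        """Fraction of respondents reached by at least one item in the set."""
        covered = [any(acceptance[i][r] for i in items) for r in range(n_respondents)]
        return sum(covered) / n_respondents

    # Exhaustively evaluate every pair of magazines and keep the highest-reach pair.
    best = max(combinations(acceptance, 2), key=reach)
    print(best, reach(best))  # ('Magazine A', 'Magazine C') reaches about 83 percent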

Utility

An economic concept that, when used in the context of conjoint analysis, refers to a buyer's liking for (or the desirability of) a product alternative. Researchers often refer to the values for individual levels as utilities, but this is not technically correct. Utility most correctly refers to the preference for an overall product concept, whereas the components of utility associated with that product's attributes are called part-worths.

Utility balance

(see related discussion under Dominated Concepts/Dominance)

Versions (Blocks)

When there are many more conjoint questions in the total design than any one respondent can evaluate, the questions can be divided into carefully constructed blocks, or subsets of the total design plan. For example, consider a conjoint study with 100 unique conjoint questions. The researcher, realizing that any one respondent does not have the time or ability to evaluate all 100 questions, might divide the questions into five blocks of twenty questions. Each respondent is randomly assigned to receive one of the five blocks (sometimes called questionnaire versions). Ideally, each block of conjoint questions reflects a high degree of level balance (each attribute level occurs nearly an equal number of times). Blocking is often used when the method of interviewing favors managing only a limited number of questionnaire versions (such as with a paper-and-pencil format) and the estimation method involves some type of aggregation or data sharing across respondents. With computerized conjoint methods, it is often useful to give each individual a unique set of carefully constructed conjoint questions.

Within-sample validation/Out-of-sample validation

If researchers use part-worth utilities for a given set of respondents to predict answers given by those same respondents, this is called within-sample validation. To use part-worth utilities for a given set of respondents to predict answers given by a different set of respondents is considered out-of-sample validation. Out-of-sample validation is generally a higher standard for assessing the quality of predictive results, though it typically requires larger sample sizes to carry out adequately.

Zero-Centered Diffs

In a market simulator, if you are processing individual-level utilities, the displayed utilities are rescaled using a method called Zero-Centered Diffs. This method rescales utilities at the individual level so that the sum, across attributes, of the utility differences between the best and worst levels of each attribute (main effects) equals the number of attributes times 100. Here is how you calculate them by hand (a short code sketch follows these steps):

  1. Within each attribute, compute the mean utility. Within each attribute, subtract the mean utility from each utility (this zero-centers the utilities within each attribute...which often doesn't have to be done since they are often already zero-centered in their raw form).
  2. Then, for each attribute compute the difference between best and worst utilities. Sum those across attributes.
  3. Take 100 x #attributes and divide it by the sum achieved in step 2. This is a single multiplier that you use in step 4.
  4. Multiply all utilities from step 1 by the multiplier. Now, the average difference between best and worst utilities per attribute is 100 utility points.
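
For concreteness, here is a small Python sketch of the four steps, applied to one respondent's hypothetical raw utilities:

    # Hypothetical raw utilities for one respondent.
    raw = {
        "brand": {"Coke": 0.6, "Pepsi": 0.1, "7-Up": -0.7},
        "price": {"$2.50": 0.5, "$2.75": 0.1, "$3.00": -0.6},
    }

    # Step 1: zero-center the utilities within each attribute.
    centered = {}
    for attr, levels in raw.items():
        mean = sum(levels.values()) / len(levels)
        centered[attr] = {lvl: u - mean for lvl, u in levels.items()}

    # Step 2: sum the best-minus-worst differences across attributes.
    total_diff = sum(max(lvls.values()) - min(lvls.values()) for lvls in centered.values())

    # Step 3: a single multiplier so the average best-worst difference becomes 100 points.
    multiplier = 100 * len(centered) / total_diff

    # Step 4: rescale every utility by the multiplier.
    diffs = {attr: {lvl: u * multiplier for lvl, u in lvls.items()}
             for attr, lvls in centered.items()}
    print(diffs)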