Anchored MaxDiff Scores


The "Scores" tab reports the average scores for the items across respondents, along with the 95% confidence interval for each score.


With Anchored scaling, the Anchor utility usually represents the utility threshold (boundary) between important and not-important items, items preferred over the status quo and those not preferred, or buy and no-buy items.


Zero-Anchored Interval Scale: These scores set the Anchor item equal to zero and the range of scores equal to 100 for each respondent. Negative scores fall below the anchor (i.e., the important/not important threshold) and positive scores fall above it. This scaling method has the advantage that each respondent receives equal weighting toward population means, making it more appropriate than the other scales for comparing individuals or groups of respondents on the item scores. However, interval-quality scales do not allow us to make ratio-quality judgments, such as saying that an item with a score of 2 is twice as important/preferred as an item with a score of 1.
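The transformation described above (Anchor fixed at zero, score range rescaled to 100 per respondent) can be sketched as follows. This is a minimal illustration, not the software's actual implementation; `raw` is assumed to hold one respondent's raw (logit) scores with the Anchor already at zero.

```python
# A minimal sketch of zero-anchored interval scaling (illustrative only).
# `raw` is one respondent's raw logit scores, with the Anchor item at 0.
def zero_anchored(raw):
    """Rescale so the Anchor stays at 0 and the score range equals 100."""
    span = max(raw) - min(raw)            # this respondent's raw score range
    return [100.0 * u / span for u in raw]

scores = zero_anchored([-1.0, 0.0, 1.5, 3.0])
# → [-25.0, 0.0, 37.5, 75.0]; the range is 100 and the Anchor remains 0
```

Because every respondent is rescaled to the same 0-to-100 range, no single respondent's scores dominate the population means.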


Probability Scale (Anchor=100): These are ratio-scaled scores where the Anchor item is set equal to 100 for each respondent. All scores are positive; those below 100 fall below the anchor threshold of utility (i.e., the important/not important threshold). These data reflect a ratio-quality scale, allowing one to conclude (for example) that an item with a score of 10 is twice as important/preferred as an item with a score of 5. The ratio differences are directly tied to the probabilities of choice as reflected in the context of the MaxDiff questionnaire. One disadvantage of this scale is that respondents do not receive equal weighting when computing population means. Respondents who believe all items fall below the threshold have a maximum score of 100, whereas respondents who believe some items exceed the Anchor utility have a maximum possible score of 100 times the number of items shown per set.
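One common logit-based transformation consistent with the properties above (a hedged sketch, not necessarily the exact formula used by the software) converts a raw score u into the choice probability exp(u) / (exp(u) + k - 1), where k is the number of items shown per set, then rescales so the Anchor (raw score 0) equals 100:

```python
import math

# Illustrative probability scaling (assumed formula, not the product's code).
def prob_scale(raw, k):
    """Map raw logit score u to exp(u)/(exp(u)+k-1), then rescale so the
    Anchor item (raw score 0) equals exactly 100."""
    def p(u):
        return math.exp(u) / (math.exp(u) + k - 1)
    anchor_p = p(0.0)                      # equals 1/k
    return [100.0 * p(u) / anchor_p for u in raw]

scores = prob_scale([-1.0, 0.0, 2.0], k=4)
# The Anchor (raw 0.0) maps to 100; scores approach k*100 as u grows large
```

Note that under this formula the ceiling for any item is k * 100, which matches the stated maximum of 100 times the number of items shown per set.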


Probability of Choice vs. Anchor (Anchor=50): These are probabilities (ranging from 0 to 100) that reflect the likelihood that an item would be selected as "best" compared to the Anchor item. These data reflect a ratio-quality scale. A score of 50 indicates the item is equal in utility to the Anchor item. A score of 90 means the item has a 90% likelihood of being selected instead of the Anchor item; a score of 10 means a 10% likelihood. If the anchor is a buy/no buy threshold, then the score represents the likelihood of purchase. The main disadvantage of this scale is that the ratio scaling is only consistent with the probabilities of choice expressed by respondents in the questionnaire when 2 items are shown per set (the method of paired comparisons). When this scale is used for MaxDiff questionnaires that have shown more than 2 items per set, the ratio differences between items will be somewhat more accentuated than the original choice data justify. A secondary disadvantage is that respondents do not receive equal weighting when computing population means: some respondents have a wider range of scores than others.
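Under the standard logit model, the paired-comparison probability of choosing an item over the Anchor is exp(u) / (exp(u) + exp(u_anchor)). Since the Anchor's raw score is zero, this reduces to the logistic function of the item's raw score; the sketch below (illustrative, not the product's code) shows the idea:

```python
import math

# Illustrative paired-comparison probability vs. the Anchor (logit model).
def prob_vs_anchor(raw):
    """Probability (0-100) that an item beats the Anchor in a paired choice.
    With the Anchor's raw score at 0, this is the logistic function of u."""
    return [100.0 / (1.0 + math.exp(-u)) for u in raw]

scores = prob_vs_anchor([-2.2, 0.0, 2.2])
# A raw score of 0 (equal utility to the Anchor) gives exactly 50
```

A raw score of about 2.2 yields a score near 90, matching the 90%-likelihood example above.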


Raw Scores: These are the scores directly resulting from the HB estimation and are logit-scaled (an interval-quality scale). The Anchor item receives a score of zero, and the other items are scaled with respect to the Anchor item. Items preferred over the Anchor threshold are positive; items not preferred are negative. These interval-quality scores do not allow us to make ratio-quality judgments, such as saying that an item with a raw score of 2 is twice as important/preferred as an item with a score of 1. Raw Scores also have the disadvantage that some respondents may be weighted significantly more than others in calculating the population means, because their scores have a much larger range than other respondents' scores.


The 95% confidence interval provides an indication of how much certainty we have regarding our estimate of the item's score. The interpretation is this: if we were to repeat the experiment many times (drawing a new random sample in each case), the population's true mean would fall within the computed confidence interval in 95% of the experiments. In other words, we are 95% confident that the true mean for the population falls within the 95% confidence interval (again, assuming unbiased, random samples). The 95% confidence interval is computed by taking the item's mean, plus or minus 1.96 times its standard error. The standard error for each score is computed by dividing its standard deviation by the square root of the sample size.
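The computation described above (mean ± 1.96 × standard deviation / √n) can be written out directly; the data below are made-up example scores for one item across five respondents:

```python
import math

def conf_interval_95(scores):
    """95% confidence interval for a mean: mean +/- 1.96 * sd / sqrt(n)."""
    n = len(scores)
    mean = sum(scores) / n
    # sample standard deviation (n - 1 in the denominator)
    sd = math.sqrt(sum((x - mean) ** 2 for x in scores) / (n - 1))
    se = sd / math.sqrt(n)                 # standard error of the mean
    return mean - 1.96 * se, mean + 1.96 * se

lo, hi = conf_interval_95([48.0, 52.0, 50.0, 51.0, 49.0])
# Mean is 50.0; the interval is roughly (48.6, 51.4)
```

With larger samples the standard error shrinks, so the confidence interval narrows around the estimated mean.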