MaxDiff scores

Introduction

After collecting responses to your MaxDiff exercise, MaxDiff scores (importance scores) are estimated using Hierarchical Bayesian (HB) utility estimation. These scores provide powerful insights by ranking items and revealing their relative importance or preference.

Compared to traditional ranking questions, MaxDiff scores offer greater discrimination and avoid the scale use bias often seen in rating scales.

Interpreting scores

MaxDiff functions like a "beauty contest" among items. As respondents choose their most and least preferred items, Discover calculates preference scores for each item.

Each item receives a positive, ratio-scaled score, and the scores sum to 100 across all items. Because the scores are ratio scaled, an item with a score of 10 is twice as preferred or important as one with a score of 5.

You can view these scores in the Summary tab of the exercise analysis area. Results are often visualized by listing items from highest to lowest scores for the overall sample or for specific respondent subgroups.

Max Diff Scores Summary Charts in Discover

Confidence intervals

For additional insight, you can display 95% confidence intervals alongside MaxDiff scores. 

Assuming your respondents are a representative random sample of the population, you can be 95% confident that the true population score falls within this interval.

You can also use the 95% confidence intervals to determine if one item is preferred over another. If the intervals for two items do not overlap, you can be at least 95% confident that one item is preferred to the other. 
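
Discover calculates these intervals for you, but the sketch below shows how comparable 95% intervals could be computed by hand from the Individual Scores-Rescaled download, using a normal approximation. The file name and item column names are hypothetical placeholders; adjust them to match your export.

    # Minimal sketch: 95% confidence intervals from rescaled individual scores.
    # Assumes one row per respondent and one column per item; a normal
    # approximation (mean +/- 1.96 * standard error) is used.
    import pandas as pd

    scores = pd.read_excel("individual_scores_rescaled.xlsx")       # hypothetical file name
    item_cols = [c for c in scores.columns if c.startswith("Item")]  # hypothetical columns

    for item in item_cols:
        mean = scores[item].mean()
        se = scores[item].std(ddof=1) / len(scores[item]) ** 0.5
        low, high = mean - 1.96 * se, mean + 1.96 * se
        print(f"{item}: {mean:.1f} (95% CI {low:.1f} to {high:.1f})")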

Segmenting

Segmentation enables comparisons of MaxDiff results among respondent groups based on survey questions or variables.

For example, if your MaxDiff exercise compares music artists, you can segment the results by the respondents' locations (asked in another question) to compare preferences between North America and Europe.

To apply segmentation:

  1. Click the Segmentation dropdown.
  2. Select a question or variable from the menu.
  3. The segmentation icon will turn green to confirm it's active.

To remove segmentation or apply a different one:

  1. Click the Segmentation dropdown.
  2. Select No segmentation or choose another option.
Segmenting Dropdown in Max Diff Analysis
 

When segmentation is active, the results will update to show scores for each group. Respondents who do not have a defined value for the selected variable will be excluded from the segmented results.

Segmenting Max Diff Results

Missing items behavior

This setting is only available when a survey includes a Relevant items MaxDiff. When items are excluded (missing) from a respondent's MaxDiff design, you must specify how the utility estimation procedure should handle those missing items. There are two options:

  1. Missing at random
    Select this option when items are randomly excluded from a respondent’s list (or when this assumption otherwise makes more sense for your data). Since there’s no systematic reason for those items to be missing, they shouldn’t be assumed to have better or worse scores than the items a respondent saw.
    • In this case, the respondent provides no information about the missing items, and the population’s mean preference should remain unaffected.
    • If using Hierarchical Bayes (HB) estimation, HB will still estimate a utility value for the missing item. This estimate is based on population-level mean preferences and covariances. As a result, if most respondents rate the item highly, a respondent who didn’t see it will receive an imputed utility estimate similar to the average respondent’s preference.
  2. Inferior
    Choose this option if items are excluded because the respondent previously indicated they were less important or worse than the included items. Informing the utility estimation model about this distinction prevents bias.
    • Discover uses data augmentation to guide the estimation process.
    • A new reference item, known as a threshold item, is added to the design matrix, along with one augmented task for each item in the source list. The threshold item acts as a benchmark for both dropped and included items.
    • Each item is compared to the threshold item. If the respondent saw the item, it is marked as "best" in the task. If the respondent did not see it, the threshold item is marked as "best" in the task.
    • This method encourages dropped items to have lower utility than included ones without forcing them toward extreme negative values; their utility estimates tend to be relatively low on the logit scale, but not excessively so. Additionally, because the respondent excluded these items, the pooled population estimates are also influenced, reflecting that the respondent views dropped items as relatively less preferred (see the sketch after this list).
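
To make the data augmentation concrete, the sketch below builds threshold-item tasks for a single hypothetical respondent. It is an illustration of the idea only, not Discover's actual implementation; the item names and data structures are made up.

    # Illustrative sketch: threshold-item tasks under the "Inferior" assumption.
    # Each source-list item is paired with a synthetic threshold item; seen items
    # beat the threshold, unseen (dropped) items lose to it.
    all_items = ["Brand A", "Brand B", "Brand C", "Brand D"]
    items_seen = {"Brand A", "Brand C"}      # items in this respondent's dynamic list
    THRESHOLD = "threshold"                  # synthetic reference item

    augmented_tasks = []
    for item in all_items:
        best = item if item in items_seen else THRESHOLD
        augmented_tasks.append({"alternatives": [item, THRESHOLD], "best": best})

    for task in augmented_tasks:
        print(task)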

By correctly specifying how missing items are treated, you improve the validity of your MaxDiff analysis and reduce bias in your results.

  • Example 1: A random subset of 30 items is chosen from the 50 total items in the study to show each respondent (often referred to as “Express MaxDiff”). In this case, the correct choice is to assume the missing items are missing at random. Only respondents who have seen an item should contribute information about its relative preference.
  • Example 2: Only the movies respondents have seen are carried into their MaxDiff questions. If the researcher believes that only respondents who have seen a movie are in a position to judge whether it is best or worst, the correct choice is to assume missing at random. However, if the researcher believes that respondents usually gain enough information about the movie from reviews and word of mouth to judge whether they would like it without seeing it, then the correct choice is to assume missing inferior.
  • Example 3: Respondents are asked to rate a long list of brands, and only the top 8 brands are carried forward into the MaxDiff questions. The researcher assumes that missing items for a respondent are inferior to included items, so the correct choice is missing inferior.

For some research situations, it is very clear which missing item assumption makes the most sense. In other cases, it’s not so clear. When in doubt, run utility estimation both ways to compare the results.

Downloads

In the upper right corner of the settings panel, you'll see the download menu (Download Icon). This menu provides six available file downloads:

  • Scores (.xlsx)
  • Individual Scores-Rescaled (.xlsx)
  • Individual Scores-Raw (.xlsx)
  • Counts (.xlsx)
  • Design & choices (.xlsx)
  • Charts (.png)

Each download offers useful insights for analyzing your MaxDiff results.

Scores

The Scores file summarizes the MaxDiff utility scores. These scores are rescaled and match the table viewable in Discover.

Max Diff Scores Summary Table

When relevant items MaxDiff is used, an additional tab is included that reports the percentage of times each item was carried forward into the respondents’ MaxDiff questions.

Reviewing how often each item was shown in respondents’ MaxDiff aids your interpretation of the data. For example, if only 2% of respondents saw an item in their MaxDiff questions, then: a) if missing at random is assumed, those 2% of respondents dictate the average relative preference for the item for the entire sample; b) if missing inferior is assumed, the remaining 98% of respondents are treated as though they had seen the item in their MaxDiff questions and rejected it as “worst.”

Individual MaxDiff scores

Individual MaxDiff scores can be downloaded in two formats: Raw and Rescaled

Both formats include a column labeled MaxDiff_Fit (RLH). This fit statistic, based on root likelihood (RLH), indicates how well a respondent’s estimated utility scores predict their actual choices. Technically, RLH is the geometric mean of the choice probabilities that the respondent’s estimated utilities assign to the selections the respondent actually made during the exercise.
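
As a rough sketch of the calculation, the example below computes RLH for one hypothetical respondent from raw (logit-scale) utilities and recorded best/worst picks. The treatment of "worst" picks here (negated utilities over the full set of shown items) is a simplification, and all numbers are made up.

    # Minimal sketch of an RLH calculation for one respondent.
    import math

    utilities = {"A": 1.2, "B": 0.3, "C": -0.4, "D": -1.1}   # hypothetical raw utilities
    tasks = [
        {"shown": ["A", "B", "C", "D"], "best": "A", "worst": "D"},
        {"shown": ["A", "B", "C", "D"], "best": "A", "worst": "C"},
    ]

    def choice_prob(chosen, shown, utils, worst=False):
        # Logit probability of the observed pick; "worst" picks use negated utilities.
        sign = -1.0 if worst else 1.0
        num = math.exp(sign * utils[chosen])
        den = sum(math.exp(sign * utils[i]) for i in shown)
        return num / den

    probs = []
    for t in tasks:
        probs.append(choice_prob(t["best"], t["shown"], utilities))
        probs.append(choice_prob(t["worst"], t["shown"], utilities, worst=True))

    rlh = math.prod(probs) ** (1 / len(probs))   # geometric mean of choice probabilities
    print(f"RLH = {rlh:.3f}")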

Following the RLH score is a Model fit relative quality column, which categorizes each respondent as Good, Moderate, or Poor, provided each item was shown at least twice to that respondent. This rating accounts for the difficulty of the choice tasks—predicting one answer from two options is easier than from five. 

These thresholds have been determined by simulating respondents who answer randomly to variously sized MaxDiff exercises and looking for cutoffs that maximize the detection of random respondents while minimizing false positives.
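
For intuition about why the thresholds depend on task size, note that a purely random respondent picks each answer with probability 1/k when k items are shown per task, so the geometric mean of those probabilities (the chance-level RLH) is also 1/k. The snippet below simply prints that baseline for a few task sizes; the actual cutoffs used by Discover come from the simulations described above.

    # Chance-level RLH baseline for a purely random respondent.
    for k in (2, 3, 4, 5):
        print(f"{k} items per task: chance-level RLH = {1 / k:.2f}")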

When relevant items MaxDiff is used, a setting labeled Missing items export as blank fields appears before the file is downloaded. By default, this setting is turned off, which means that the utility scores calculated for missing items are included in the file. If enabled, blanks are exported instead, replacing whatever utility value had been imputed for the missing items for respondents whose dynamic lists did not include them.

Raw scores

Raw scores are the utility weights from a multinomial logit (MNL) regression model, typically estimated with Hierarchical Bayes (HB). These scores are centered around zero, meaning their average value is 0, unless anchored scaling is applied in MaxDiff. With anchored scaling, scores are positive for items preferred more than the anchor point and negative for those preferred less.

Although not as intuitive as rescaled scores, raw scores are valuable for advanced researchers who want to predict choice probabilities from the utilities using the logit equation.
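
As an example of what that looks like in practice, the sketch below applies the logit (softmax) equation to a few hypothetical raw utilities to predict choice probabilities among the items; utilities and item names are made up.

    # Minimal sketch: predicted choice shares from raw utilities via the logit equation.
    import math

    raw_utils = {"Item A": 0.9, "Item B": 0.1, "Item C": -0.6}   # hypothetical raw utilities
    denom = sum(math.exp(u) for u in raw_utils.values())
    for item, u in raw_utils.items():
        print(f"{item}: predicted choice probability {math.exp(u) / denom:.2%}")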

Rescaled scores

Rescaled scores are always positive, sum to 100, and follow a ratio scale. A score of 10 means the item is twice as preferred as an item with a score of 5.

Rescaled scores are more intuitive for most audiences to understand. For details on how the raw HB scores are converted to probabilities that add up to 100, see Appendix K of the CBC/HB manual, specifically the section titled "A Suggested Rescaling Procedure."
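
As a rough illustration of that idea (not Discover's exact procedure), one rescaling in this spirit converts each raw utility to the probability of the item being chosen from a task of the size used in the exercise, then normalizes those probabilities to sum to 100; the utilities and task size below are hypothetical.

    # Rough sketch: rescaling raw HB utilities to positive scores that sum to 100.
    import math

    raw_utils = [1.0, 0.2, -0.3, -0.9]   # hypothetical raw utilities for one respondent
    k = 4                                # items shown per MaxDiff task

    probs = [math.exp(u) / (math.exp(u) + (k - 1)) for u in raw_utils]
    total = sum(probs)
    rescaled = [100 * p / total for p in probs]
    print([round(s, 1) for s in rescaled])   # positive scores summing to 100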

Individual Max Diff Scores Table

Counts

The Counts download provides detailed information on how often each item appeared in the MaxDiff exercise and how respondents reacted to those items. The data includes:

  • Number of times each item was shown
  • Number of times each item was selected as “best” or “worst”
  • Count proportions, which express the likelihood that an item was chosen when shown
MaxDiff Counts Analysis Table

Count proportions

Count proportions indicate the likelihood of an item being selected when presented. The data includes columns that report these probabilities:

  • Best count probability — The likelihood that respondents selected the item as “best” when it appeared in a MaxDiff set.
  • Worst count probability — The likelihood that respondents selected the item as “worst.”

Since these are probabilities, count proportions are restricted to values between 0 and 1. For example, if four items are shown in each task, the chance probability of selecting any given item is 25%. Therefore, an item with a "best" count proportion of 50% is chosen as "best" at twice the chance level (assuming four items per task).

To obtain a quick summary score that is highly correlated with Hierarchical Bayes (HB) results, subtract an item's "worst" count proportion from its "best" count proportion.
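
The sketch below shows how count proportions and the best-minus-worst summary could be computed from Counts-style data; the column names and numbers are hypothetical placeholders rather than the exact layout of the Counts export.

    # Minimal sketch: count proportions and best-minus-worst summary scores.
    import pandas as pd

    counts = pd.DataFrame({
        "item":        ["A", "B", "C"],
        "times_shown": [120, 120, 118],
        "best_count":  [60, 30, 12],
        "worst_count": [10, 25, 55],
    })

    counts["best_prop"] = counts["best_count"] / counts["times_shown"]
    counts["worst_prop"] = counts["worst_count"] / counts["times_shown"]
    counts["best_minus_worst"] = counts["best_prop"] - counts["worst_prop"]
    print(counts)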

Chart

The Chart download exports a .png image identical to the Discover Scores chart. This option is useful for quickly sharing or presenting MaxDiff results without having to recreate the visual in other software.

Design & choices

The Design & choices (.xlsx) file shows the specific design each respondent saw and the concepts they selected during the exercise.

The file is formatted for use in Sawtooth’s desktop software tools. If the exercise uses a dynamic list or includes anchoring, you can include additional tasks representing this information. These extras are designed to work seamlessly with the desktop tools and may include instructional notes directly in the file.