Fit Statistic and Identifying Random Responders
MaxDiff displays an average "fit statistic" (RLH) to the screen during HB estimation and also writes to file the individualized fit statistic along with each respondent's scores on the items. This fit statistic ranges from a low of 0 to a high of 1 and characterizes the internal consistency for each respondent.
It is natural to ask what minimum fit statistic distinguishes more thoughtful responses from purely random data. The fit statistic can help you know with a high degree of confidence if a respondent has provided purely random answers and should be discarded.
The magnitude of the fit statistic depends on how many items were shown in each set. If four items are shown in each set, then any random set of scores should predict a respondent's answers correctly with 25% likelihood (fit statistic=.25). If two items are shown per set (paired comparison approach), then the expected likelihood given random data is 50% (fit statistic=.5). In general, the chance-level fit statistic is 1/c, where c is the number of items displayed per set. We hope that real respondents perform considerably better than chance. However, because the score estimation algorithm used in MaxDiff (HB) attempts to fit each respondent's choices, the fit we actually observe, even from random data, is almost always above the chance rate.
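The chance rate described above can be illustrated with a short sketch. It assumes that RLH (root likelihood) is the geometric mean of the probabilities the model assigned to the choices a respondent actually made, with each recorded choice contributing one probability. The function names `rlh` and `chance_fit` are hypothetical and are not part of the MaxDiff software.

```python
import math

def rlh(choice_probabilities):
    """Geometric mean of the predicted probabilities of the chosen items.

    Assumption: this is how the fit statistic is formed; the names here
    are illustrative, not the software's own.
    """
    if not choice_probabilities:
        raise ValueError("need at least one choice probability")
    log_sum = sum(math.log(p) for p in choice_probabilities)
    return math.exp(log_sum / len(choice_probabilities))

def chance_fit(items_per_set):
    """Fit expected from uninformative scores: 1/c for c items per set."""
    return 1.0 / items_per_set

# Uninformative scores give every item in a 4-item set probability 0.25,
# so the geometric mean over any number of choices is also 0.25.
uniform = [chance_fit(4)] * 12
print(round(rlh(uniform), 3))  # 0.25
```

Because the geometric mean is pulled down sharply by any poorly predicted choice, a respondent needs consistently well-predicted answers to score far above 1/c.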
The table below displays minimum fit statistics that correctly classify about 80% of random responders while discarding very few (~1%) real responders. In developing this table, we assumed a 20-item standard MaxDiff study (not anchored, and not using a constructed list of relevant items) in which respondents are shown each item either three or four times across all sets. Sparser designs than this make it more difficult to distinguish between random and good responders. Our results should generalize to other standard MaxDiff studies with more or fewer items, as long as respondents see each item either three or four times. For more information on identifying random responders using HB RLH cutoffs, see Chrzan and Halversen, 2020.
Suggested Minimum Fit Statistic to Identify Random Responders with 80% Correct Classification

| Items per Set | Suggested Minimum Fit, Each Item Shown Four Times to Each Respondent | Suggested Minimum Fit, Each Item Shown Three Times to Each Respondent |
|---|---|---|
| 2 | .556 | .579 |
| 3 | .416 | .445 |
| 4 | .312 | .336 |
| 5 | .254 | .269 |
| 6 | .218 | .228 |
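As a minimal sketch of applying these cutoffs, the values from the table above can be stored in a lookup and compared against each respondent's fit statistic. The function name, the respondent IDs, and the example fit values below are hypothetical; only the cutoff values come from the table.

```python
# Cutoffs copied from the table above,
# keyed by (items per set, times each item is shown to each respondent).
SUGGESTED_MIN_FIT = {
    (2, 4): 0.556, (2, 3): 0.579,
    (3, 4): 0.416, (3, 3): 0.445,
    (4, 4): 0.312, (4, 3): 0.336,
    (5, 4): 0.254, (5, 3): 0.269,
    (6, 4): 0.218, (6, 3): 0.228,
}

def flag_random_responders(fit_by_respondent, items_per_set, times_shown):
    """Return the sorted IDs of respondents whose fit falls below the cutoff."""
    key = (items_per_set, times_shown)
    if key not in SUGGESTED_MIN_FIT:
        raise KeyError(f"no suggested cutoff for design {key}")
    cutoff = SUGGESTED_MIN_FIT[key]
    return sorted(rid for rid, fit in fit_by_respondent.items() if fit < cutoff)

# Hypothetical fit statistics for three respondents in a study with
# four items per set, each item shown four times.
fits = {"r1": 0.30, "r2": 0.52, "r3": 0.312}
print(flag_random_responders(fits, items_per_set=4, times_shown=4))  # ['r1']
```

A respondent whose fit exactly equals the cutoff is kept in this sketch, since the table gives a suggested minimum.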
Chrzan and Halversen (2020) conducted a meta-analysis across multiple commercial MaxDiff data sets to which they purposefully added truly random respondents. They found that the cutoff rules in the table above identified truly random respondents with nearly 80% accuracy. There is only about a 20% likelihood that a random responder achieves a fit statistic better than these cutoff values. In other words, if a respondent truly is a random responder, the cutoff values in the table above will identify them for exclusion about 80% of the time. This follows the traditional statistical practice of limiting Type 1 error. Chrzan and Halversen also examined how many real responders would be classified as random responders (Type 2 error) under the RLH thresholds in the table above, finding that fewer than 2% of real respondents would be incorrectly classified as random responders.
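The two error rates discussed above can be computed for any data set in which the random responders are known, for example because they were added deliberately. This is only a sketch of the bookkeeping, not the authors' analysis; the function name and the fit values are hypothetical.

```python
def classification_rates(random_fits, real_fits, cutoff):
    """Return (share of random responders flagged, share of real responders flagged).

    A respondent is flagged when the fit statistic is below the cutoff.
    """
    if not random_fits or not real_fits:
        raise ValueError("both groups need at least one respondent")
    detected = sum(1 for fit in random_fits if fit < cutoff)
    false_flags = sum(1 for fit in real_fits if fit < cutoff)
    return detected / len(random_fits), false_flags / len(real_fits)

# Hypothetical fit statistics, four items per set, cutoff .312.
random_fits = [0.26, 0.28, 0.30, 0.31, 0.35]
real_fits = [0.45, 0.60, 0.30, 0.70]
detection_rate, false_flag_rate = classification_rates(random_fits, real_fits, 0.312)
print(detection_rate, false_flag_rate)  # 0.8 0.25
```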
Technical Notes: We simulated 1000 respondents answering randomly to each questionnaire, then estimated the scores using HB with prior variance=1 and d.f.=5. Both "best" and "worst" answers were simulated for each respondent (except in the case of two items per set). For each simulated data set, we sorted the respondents' fit statistics from highest to lowest and recorded the 95th percentile fit (the point below which 95% of the data fell). If you ask only for "bests," the amount of information for each respondent is reduced, so the Suggested Minimum Fit would be higher. The table above was created using standard MaxDiff, not anchored or evoked set MaxDiff, which would lead to different fit norms for identifying "bad" respondents. If using anchored MaxDiff, you can estimate the results using HB without the anchor, so that you may use the recommendations above. If using evoked set MaxDiff, you could follow our procedure of simulating 1000 random responders and estimating HB to find the RLH cutoff value for identifying random responders.
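The two ends of this procedure that do not require HB can be sketched as follows: generating random "best" and "worst" answers for a design, and reading a cutoff from the resulting fit statistics. The HB estimation step in between is not shown and would be run in the estimation software. The function names, the nearest-rank percentile rule, and the example values are assumptions made for illustration; the percentile is left as a parameter.

```python
import math
import random

def simulate_random_answers(sets, rng):
    """Pick a random best and a different random worst item in each set.

    sets: list of lists of item IDs shown together (hypothetical design format).
    """
    answers = []
    for shown in sets:
        best, worst = rng.sample(shown, 2)  # two distinct items, uniformly at random
        answers.append((best, worst))
    return answers

def cutoff_from_simulated_fits(fits, percentile):
    """Nearest-rank percentile of the simulated random responders' fit statistics."""
    if not fits:
        raise ValueError("need at least one fit statistic")
    if not 0 < percentile <= 100:
        raise ValueError("percentile must be in (0, 100]")
    ordered = sorted(fits)
    rank = math.ceil(percentile * len(ordered) / 100)
    return ordered[rank - 1]

rng = random.Random(2020)  # fixed seed so the simulation is repeatable
design = [[1, 2, 3, 4], [5, 6, 7, 8], [1, 5, 2, 6]]
answers = simulate_random_answers(design, rng)

# After estimating HB on the simulated answers, pass the fit statistics here.
example_fits = [0.10, 0.20, 0.30, 0.40, 0.50, 0.60, 0.70, 0.80, 0.90, 1.00]
print(cutoff_from_simulated_fits(example_fits, 80))  # 0.8
```

With two items per set, the sampled pair is simply the two items shown, so the "worst" answer carries no extra information.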
Reference:
Chrzan, Keith and Cameron Halversen (2020), "Diagnostics for Random Respondents," 2020 Sawtooth Software European Conference.