Design status

Introduction

A design is the combination of tasks and concepts that respondents see in a CBC exercise. To reduce order bias and improve utility estimation, Discover generates multiple versions of a design so respondents receive different sets of questions. For example, a 30-version design produces 30 unique sets of tasks assigned to respondents in rotation. Some respondents may see the same version, but strong balance is maintained both within and across versions.
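As a rough illustration of assignment "in rotation", the sketch below hands out versions round-robin by respondent order. The function name `assign_version` and the modulo scheme are assumptions made for illustration; this article does not describe Discover's actual assignment logic.

```python
def assign_version(respondent_index: int, n_versions: int = 30) -> int:
    """Return a 1-based design version for a 0-based respondent index.

    Hypothetical round-robin rule: respondent 0 gets version 1,
    respondent 29 gets version 30, respondent 30 wraps back to version 1.
    """
    if n_versions < 1:
        raise ValueError("n_versions must be at least 1")
    return respondent_index % n_versions + 1


# With 65 respondents and 30 versions, versions 1-5 are each used three
# times and the remaining versions twice.
assignments = [assign_version(i) for i in range(65)]
```

Under this scheme, no version is used more than one time more often than any other, which is one simple way to keep exposure balanced across versions.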

A valid design is required before you can publish your survey. You can view and generate the design from the Design tab of each CBC exercise.

A successfully generated design status is shown

Design statuses

  • Successfully generated: The design is ready for data collection.
  • Warning: The design is usable but may result in poor utility estimation. Notes are included with suggestions for improvement.
  • Unusable: The design cannot produce reliable utilities. Address the listed issues and regenerate.
  • Failed: The design could not be generated, typically due to too many prohibitions. Try generating again. If it continues to fail, contact support.
  • Out of date: The design no longer matches the current exercise settings. Regenerate to reflect the latest changes.

Generating the design

Click Generate in the Design tab. By default, Discover spends approximately 3 seconds per version, so with the default of 30 versions, generation and testing take about 90 seconds.

Note that test surveys don't require a pre-generated design. If no design has been generated, Discover creates one on the fly when the test respondent reaches the CBC questions. If a design has been generated, the test pulls from those versions.

How the designer works

The designer distributes attribute levels across tasks with three goals:

  • Each level appears the same number of times.
  • Levels from different attributes appear together equally often.
  • Tasks and concepts are shuffled to balance level positions and avoid repeats in back-to-back tasks.
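The first two goals can be checked by counting. The sketch below is a minimal illustration on a hypothetical toy design, not Discover's designer: each concept is a tuple of level indices (one per attribute), and the helper names `one_way_counts`, `two_way_counts`, and `spread` are invented for this example.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy design: two attributes with two levels each,
# every combination shown exactly once.
concepts = [(0, 0), (0, 1), (1, 0), (1, 1)]


def one_way_counts(concepts):
    """Count how often each (attribute, level) appears."""
    counts = Counter()
    for concept in concepts:
        for attr, level in enumerate(concept):
            counts[(attr, level)] += 1
    return counts


def two_way_counts(concepts):
    """Count how often levels of two different attributes appear together."""
    counts = Counter()
    for concept in concepts:
        labeled = list(enumerate(concept))  # [(attr, level), ...]
        for a, b in combinations(labeled, 2):
            counts[(a, b)] += 1
    return counts


def spread(counts):
    """Gap between the most and least frequent entry; 0 means perfect balance."""
    return max(counts.values()) - min(counts.values())


one_way = one_way_counts(concepts)
two_way = two_way_counts(concepts)
```

A spread of 0 for both counters corresponds to perfect one-way and two-way balance. Combinations that never appear are not counted as zero here, so a fuller check would enumerate all expected pairs first.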

A perfect balance isn't always achievable, but near-perfect designs still perform well and yield excellent results.

Generation settings

Click the settings icon next to Generate for additional options.

Estimated percent answering “none”

Available when the exercise includes a "none" option. Enter the percentage of responses you expect to be "none." Because "none" responses provide little information about attribute preferences, a higher estimated percentage results in lower precision and higher recommended sample sizes. Defaults to 15%.
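To see why a higher "none" estimate raises the recommended sample size, consider a back-of-envelope approximation in which "none" responses carry no information about attribute levels at all. This is a simplifying assumption for illustration only; Discover's recommendation comes from simulated respondent data, and the function name below is hypothetical.

```python
def sample_size_inflation(pct_none: float) -> float:
    """Rough factor by which sample size grows to offset "none" answers.

    Assumes (as a simplification) that only the non-"none" share of
    responses informs attribute preferences.
    """
    if not 0 <= pct_none < 100:
        raise ValueError("pct_none must be in the range [0, 100)")
    informative_share = 1 - pct_none / 100
    return 1 / informative_share


default_factor = sample_size_inflation(15)   # the 15% default
higher_factor = sample_size_inflation(30)
```

Under this approximation, the default of 15% implies roughly 18% more respondents than a study with no "none" answers, and the factor grows as the estimate increases.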

Design generation duration

Controls how long the designer searches for an efficient design. Design quality is evaluated across several factors, including one-way and two-way level balance, level overlap, and positional balance within and across versions. The designer works in stages of multiple passes, relabeling or swapping attribute levels to improve quality. If improvement falls below a breakout threshold for a stage, the designer stops.

Three duration options are available:

  • Default: Completes multiple stages of 20 passes, stopping when improvement falls below 0.01 (1%). Near-optimal for small to modest exercises without prohibitions.
  • Extended: Each stage performs 100 passes, with a 0.001 (0.1%) breakout threshold. Can produce designs 1–3% more efficient than default for complex studies or those with prohibitions.
  • Maximum: Each stage performs 250 passes, with a 0.00001 (0.001%) breakout threshold. Similar improvements to extended but with more opportunity for refinement.
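The stage-and-breakout logic described above can be sketched as a simple improvement loop. Everything here is illustrative: the pass counts and thresholds come from the list above, but `run_stages`, the relative-improvement formula, and the toy bit-flipping objective are assumptions standing in for Discover's actual relabeling and swapping moves and its quality measure.

```python
import random

PRESETS = {
    "default": {"passes_per_stage": 20, "breakout": 0.01},
    "extended": {"passes_per_stage": 100, "breakout": 0.001},
    "maximum": {"passes_per_stage": 250, "breakout": 0.00001},
}


def run_stages(score, propose, state, preset="default", seed=1, max_stages=1000):
    """Improve `state` in stages; stop when a stage's relative gain is too small."""
    rng = random.Random(seed)
    cfg = PRESETS[preset]
    current = score(state)
    stages = 0
    while stages < max_stages:
        stage_start = current
        for _ in range(cfg["passes_per_stage"]):
            candidate = propose(state, rng)
            candidate_score = score(candidate)
            if candidate_score > current:  # keep only improvements
                state, current = candidate, candidate_score
        stages += 1
        gain = current - stage_start
        if stage_start > 0:
            relative_gain = gain / stage_start
        else:
            relative_gain = float("inf") if gain > 0 else 0.0
        if relative_gain < cfg["breakout"]:
            break
    return state, current, stages


# Toy stand-ins: quality is 1 + number of ones, a move flips one random bit.
def toy_score(bits):
    return 1 + sum(bits)


def toy_propose(bits, rng):
    i = rng.randrange(len(bits))
    return bits[:i] + [1 - bits[i]] + bits[i + 1:]


final_state, final_score, stages_run = run_stages(toy_score, toy_propose, [0] * 8)
```

A smaller breakout threshold and more passes per stage make the loop run longer before stopping, which mirrors the trade-off between the three duration options.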

Design seed

Sets the random starting point used to initialize the design algorithm. Changing the seed produces a different design, though overall quality typically remains similar. Defaults to 1.

Design testing

After versions are generated, Discover automatically tests the design to evaluate whether the exercise will provide enough information to estimate utilities with adequate precision.

Design report

Once testing is complete, a quality report is created, including a recommended sample size range and an assessment of design quality.

Discover recommends a sample size range based on standard errors from pooled logit (MNL) analysis of simulated respondent data. The lower end represents the minimum needed for all levels (excluding the “none” option) to have standard errors of 0.05 or lower. The upper end represents the preferred sample size, where all standard errors are 0.03 or lower.

Several formulas exist in the literature for recommending sample sizes. In our experience, the standard error guideline of 0.05 (minimum) to 0.03 (preferred) for main effects, a rule of thumb attributed to Bryan Orme at Sawtooth, works well in practice because it accounts for design strength, sample size, and the number of tasks each respondent completes.
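To get a feel for how the two thresholds translate into a range, the sketch below uses the common approximation that standard errors shrink in proportion to 1/sqrt(n). That scaling rule, the function name, and the example figures are assumptions for illustration; Discover derives its recommendation from simulated respondent data rather than from this shortcut.

```python
import math


def required_sample_size(se_at_n: float, n: int, target_se: float) -> int:
    """Estimate the sample size needed to bring a standard error down to target_se.

    Assumes SE is proportional to 1 / sqrt(n), so
    n_required = n * (se_at_n / target_se) ** 2.
    """
    if se_at_n <= 0 or target_se <= 0 or n <= 0:
        raise ValueError("all inputs must be positive")
    raw = n * (se_at_n / target_se) ** 2
    # Round first so floating-point noise does not push an exact value up by one.
    return math.ceil(round(raw, 9))


# Hypothetical example: the largest standard error is 0.06 at 300 respondents.
minimum_n = required_sample_size(0.06, 300, 0.05)    # lower end of the range
preferred_n = required_sample_size(0.06, 300, 0.03)  # upper end of the range
```

Because the relationship is quadratic, moving from the 0.05 threshold to the 0.03 threshold multiplies the required sample by about 2.8, which is why the upper end of the range is so much larger than the lower end.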

If your study aims to compare segments rather than describe the overall population, aim for around 200 or more respondents per segment.

1-way level metrics

How often each level appears across all versions, as well as the average frequency per version. For accurate individual-level HB estimation, each level should appear about 6 times per respondent. A warning is triggered if the average falls below 4.
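A quick way to anticipate this metric before generating a design is to divide the number of concepts a respondent sees by the number of levels in an attribute. The sketch below assumes perfect one-way balance and that every concept shows exactly one level of the attribute; the function names and example numbers are hypothetical.

```python
def avg_level_exposures(n_tasks: int, n_concepts: int, n_levels: int) -> float:
    """Average times each level of one attribute appears per respondent."""
    return n_tasks * n_concepts / n_levels


def exposure_status(avg: float, warn_below: float = 4) -> str:
    """Return "warning" when the average falls below the warning threshold."""
    return "warning" if avg < warn_below else "ok"


# 10 tasks x 3 concepts with a 5-level attribute: 6 exposures per level.
comfortable = avg_level_exposures(10, 3, 5)
# 8 tasks x 3 concepts with a 7-level attribute: about 3.4 exposures per level.
sparse = avg_level_exposures(8, 3, 7)
```

In this example, the first configuration reaches the suggested 6 appearances per level, while the second falls below 4 and would call for more tasks, more concepts per task, or fewer levels.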

1 Way Level Metrics table

Standard errors

The precision of utility estimates at the recommended sample size. You can enter a custom sample size to test standard errors using the settings icon at the top right of the table.

Standard errors should be:

  • Roughly equivalent within each attribute.
  • 0.05 or lower for main effects (attributes taken one at a time).
  • 0.1 or lower for nested attribute levels.

Standard errors table
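If you copy standard errors out of the report, the guidelines above can be checked with a few lines of code. This is a minimal sketch with made-up numbers: the attribute names, the values, and the `spread_ratio` measure of "roughly equivalent" are assumptions for illustration, not part of Discover.

```python
def check_standard_errors(se_by_attribute, limit=0.05):
    """Summarize main-effect standard errors for each attribute.

    `spread_ratio` is the largest SE divided by the smallest within an
    attribute; values near 1 mean the SEs are roughly equivalent.
    """
    report = {}
    for attribute, ses in se_by_attribute.items():
        report[attribute] = {
            "max_se": max(ses),
            "within_limit": max(ses) <= limit,
            "spread_ratio": max(ses) / min(ses),
        }
    return report


# Hypothetical standard errors copied from a design report.
example = {
    "brand": [0.031, 0.030, 0.032],
    "price": [0.029, 0.052, 0.030],
}
report = check_standard_errors(example)
```

Here the hypothetical "price" attribute would stand out on both counts: one level exceeds 0.05, and its standard errors are far less even than those for "brand". For nested attribute levels, the same function could be called with `limit=0.1`.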

2-way frequency balance

How often each level of one attribute appears with each level of another within the same concept. Along the diagonal, one-way frequencies are repeated. With no prohibitions, two-way frequencies should be fairly balanced. With prohibitions, perfect one-way and two-way balance cannot both be achieved.

2 Way Frequency Balance table

1-way frequencies per version

How often each attribute level appears within each individual version, confirming balance within versions.

1 Way Frequencies Per Version table
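The per-version view amounts to grouping level counts by version. The sketch below shows the idea on hypothetical data for a single two-level attribute; the row layout and helper names are assumptions, not an export format documented here.

```python
from collections import Counter, defaultdict

# Hypothetical rows: (version, level shown) for one attribute.
rows = [
    (1, "A"), (1, "B"), (1, "A"), (1, "B"),
    (2, "A"), (2, "A"), (2, "A"), (2, "B"),
]


def frequencies_per_version(rows):
    """Build {version: Counter({level: count})}."""
    table = defaultdict(Counter)
    for version, level in rows:
        table[version][level] += 1
    return table


def version_spread(table):
    """Gap between the most and least frequent level within each version."""
    return {v: max(c.values()) - min(c.values()) for v, c in table.items()}


table = frequencies_per_version(rows)
spreads = version_spread(table)
```

In this toy data, version 1 is perfectly balanced while version 2 shows level "A" three times and level "B" once. A level that never appears in a version is not counted as zero by this sketch, so a fuller check would start from the complete list of levels.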

What makes a good design

Discover tests your design and verifies that utility effects will be estimable before allowing you to field. That said, some designs are more efficient than others. Following these principles will produce a good design:

  • A reasonable number of attributes and levels per attribute.
  • A reasonable number of tasks per respondent.
  • A reasonable number of concepts per task.
  • No or very few prohibitions between attributes.

Discover warns you when a design is starting to degrade and blocks fielding only when a design is deficient. Warnings and errors most commonly arise from too many prohibitions or too few tasks. Reducing prohibitions or increasing tasks per respondent resolves most issues.