Lighthouse Studio

Testing the CBC Design

Why Use Test Design

Testing your CBC design before fielding your study lets you gauge whether your CBC questionnaire, together with your planned sample size, will collect enough data to estimate utilities with adequate precision for your target population.

How to Use Test Design

To Test the Design:

1. Open the CBC Exercise Settings window.

2. Select the Design tab.

3. Click Test Design... to open the CBC Test Design dialog.

4. Check the Advanced Test checkbox.

5. Specify how many final respondents you plan to use in analysis (after any data cleaning).

6. Click Test.

Understanding Test Design Results

As you interpret the resulting Test Design report, keep these rules of thumb regarding the CBC questionnaire and sample size in mind:

Each level within an attribute should appear approximately an equal number of times. (Deviations of more than about 10% in raw level occurrences could be cause for concern; check them against the more thorough rule of thumb directly below.)

The standard errors for each attribute level should be about 0.05 or less.

We also recommend keeping the following rule of thumb in mind: for good HB estimation (the default utility estimation approach), each attribute level should appear about 6 times for each respondent (across that respondent's choice tasks times concepts per task). The Test Design report doesn't report on this, but you can figure it out easily by taking the attribute with the largest number of levels and considering how many (tasks x concepts per task) each respondent receives.

For example, if you create a survey with 12 choice tasks and specify that 3 concepts will be displayed per task, each respondent sees a total of 12 x 3 = 36 concepts. If the attribute with the most levels has 6 levels, each of those levels is shown, on average, 6 times (36/6 = 6), so you could expect good HB estimation. If the attribute with the most levels contained 8 levels instead, each level would be displayed only 4.5 times on average (36/8 = 4.5), meaning that for good HB estimation you should either increase the number of choice tasks or increase the number of concepts displayed per task.
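This arithmetic is easy to script. The following minimal Python sketch (the helper is ours, not a Lighthouse Studio feature) reproduces the calculation:

def exposures_per_level(tasks, concepts_per_task, num_levels):
    """Expected number of times each level of one attribute is shown to a
    respondent, assuming the level-balanced 'random' design strategies."""
    return tasks * concepts_per_task / num_levels

print(exposures_per_level(12, 3, 6))  # 6.0 -> about 6x per level, good for HB
print(exposures_per_level(12, 3, 8))  # 4.5 -> add tasks or concepts per task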

 


The CBC Experimental Design

 

In CBC, a design refers to the attribute and level combinations shown to respondents.  The design is saved to a design file that you upload to the server.  Optimally efficient CBC designs estimate all part-worths with maximum precision, meaning the standard errors of the estimates are as small as possible given the total number of observations (respondents x tasks).

 

CBC's "random" design strategies (Balanced Overlap, Complete Enumeration, and Shortcut) are not really "random," but are carefully controlled for level balance and independence of the attributes and generally result in very efficient designs.  These designs are not optimally efficient, but are nearly so.  However, there are conditions that can result in inefficient designs (e.g., those involving extreme prohibitions).  Sometimes, a design can be so inefficient as to defy all attempts to compute reasonable part-worth utilities and the Test Design report will give you warnings if that's the case.  We have heard of entire data sets with hundreds of respondents going to waste because the user neglected to test the design and pay attention to warning messages.

 

Therefore, it is imperative to test your design whenever any of the following conditions exist:

 

any prohibitions are included

sample size (respondents x tasks) is abnormally small (typical studies have hundreds of respondents and 8-15 choice tasks per respondent)

 

The Test Design option simulates respondents with your requested sample size completing the CBC tasks and reports the standard errors (from a logit run) along with D-efficiency.  

 

Our Test Design approach assumes aggregate logit analysis and no prior information about respondent utilities (utilities of zero), though most CBC users eventually employ individual-level estimation via HB.  That said, CBC's design strategies can produce designs that are efficient at both the aggregate and individual levels.  When you are planning HB analysis (the default approach) we recommend that you include enough choice tasks and concepts for each respondent such that each attribute level appears at least 6x for each respondent.  (The Test Design report doesn't address this, but you can easily do the math in your head to figure out how many times each attribute level is expected to be shown to each respondent.)

 

Prohibitions are often the culprit when it comes to unacceptable design efficiency.  If this is the case, try reducing the number of prohibitions.  If your study seems to require them, consider whether alternative-specific designs or conditional pricing could meet the study's goals without traditional prohibitions, allowing you to field an efficient experimental design.

 


Testing the Efficiency of Your Design

 

When you choose Test Design... from the CBC exercise Design tab, CBC automatically tests the design and displays the results within the results window.  We recommend you use the default Advanced Test option so that CBC automatically generates simulated respondents (default n=300) appropriate for advanced design testing.

 

Note: if you plan to collect fewer than 300 respondents, make sure to change the default of 300 to the number of respondents applicable to your study!

 


The Frequency Test

 

When you generate a design or run Test Design, a preliminary Frequency report is shown.  When generating a design, this is the only data included in the results.  When running Test Design, it is the first of two tabs.

 

CBC Design Test

Date/Time:           2/6/2023 12:11 PM
Exercise Name:       CBCgolfexercise
None Alternative:    Traditional
Random Seed:         1
Generation Method:   Balanced Overlap
Version Count:       300
Tasks per version:   12
Concepts per task:   3.00


One-way Frequency Balance

Attribute   Level   Frequency   Label
1           1       2699        High-Flyer Pro, by Smith and Forester
1           2       2700        Magnum Force, by Durango
1           3       2700        Eclipse+, by Golfers, Inc.
1           4       2701        Long Shot, by Performance Plus
2           1       3600        Drives 5 yards farther than the average ball
2           2       3600        Drives 10 yards farther than the average ball
2           3       3600        Drives 15 yards farther than the average ball
3           1       2699        $4.99 for package of 3 balls
3           2       2701        $6.99 for package of 3 balls
3           3       2700        $8.99 for package of 3 balls
3           4       2700        $10.99 for package of 3 balls


In the One-Way Frequency area of the report, the number of times each level occurs across versions of the design is counted and reported under the column titled "Frequency".  Optimally efficient designs show the levels within each attribute an equal number of times.  Designs do not need perfect balance to be quite acceptable in practice; variation of up to about 10% in frequencies within the same attribute is usually tolerable.
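If you want to verify the balance yourself, here is a minimal Python sketch (a hypothetical helper, not a Lighthouse Studio API) that applies the ~10% rule of thumb to the frequencies above:

def max_imbalance(frequencies):
    """Largest deviation from the mean frequency, as a fraction of the mean."""
    mean = sum(frequencies) / len(frequencies)
    return max(abs(f - mean) for f in frequencies) / mean

attribute_1 = [2699, 2700, 2700, 2701]      # brand frequencies from the report
print(f"{max_imbalance(attribute_1):.4%}")  # 0.0370% -- far below the ~10% level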

 

Two-Way Frequency Balance

 

 

 

 

 

 

 

 

 

Att/Level

1/1

1/2

1/3

1/4

2/1

2/2

2/3

3/1

3/2

3/3

3/4

1/1

2699

 

 

 

 

 

 

 

 

 

 

1/2

0

2700

 

 

 

 

 

 

 

 

 

1/3

0

0

2700

 

 

 

 

 

 

 

 

1/4

0

0

0

2701

 

 

 

 

 

 

 

2/1

899

901

899

901

3600

 

 

 

 

 

 

2/2

900

899

901

900

0

3600

 

 

 

 

 

2/3

900

900

900

900

0

0

3600

 

 

 

 

3/1

675

673

675

676

899

901

899

2699

 

 

 

3/2

676

675

674

676

900

901

900

0

2701

 

 

3/3

674

676

675

675

900

899

901

0

0

2700

 

3/4

674

676

676

674

901

899

900

0

0

0

2700

 

 

 

 

 

 

 

 

 

 

 

 

This counting test reports how balanced the design is in terms of frequencies.

To assess how precisely this design can estimate utilities given your expected sample size, we recommend you refer to the Logit Efficiencies test.

 

We also report the Two-Way Frequencies, which describe how often each level of one attribute appears with each level of every other attribute within the same concept.  Along the main diagonal we repeat the one-way frequencies.  If there are no prohibitions, the frequencies of joint level occurrences (between any two attributes) should be close to balanced.  As with one-way frequencies, the two-way frequencies between any two attributes don't have to be perfectly balanced to be quite acceptable in practice.
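The counting logic behind this table is straightforward. Here is a minimal Python sketch (our illustration, not Lighthouse Studio code) that builds two-way counts from a design stored as a list of concepts, each a tuple of level indices:

from collections import Counter
from itertools import combinations

def two_way_frequencies(concepts):
    """Count how often each pair of levels from different attributes
    appears together within the same concept."""
    counts = Counter()
    for concept in concepts:
        for (a1, l1), (a2, l2) in combinations(enumerate(concept, start=1), 2):
            counts[(f"{a1}/{l1}", f"{a2}/{l2}")] += 1
    return counts

# Two toy concepts: (brand 1, distance 2, price 3) and (brand 2, distance 2, price 1)
print(two_way_frequencies([(1, 2, 3), (2, 2, 1)]))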

 

Specifying prohibitions creates a conflict between one-way and two-way frequency balance: you cannot achieve perfect balance on both between two attributes when a prohibition is involved.

 

You should run Test Design (described below) to assess more accurately the precision of your design given your expected sample size.  This is especially true if you have specified any prohibitions in your design, as they can have adverse effects on the precision of utility estimates for your attribute levels.

 


The Aggregate Logit (Standard Errors and D-Efficiency) Test

 

When you run Test Design, the results are on two tabs.  The first tab (shown above) is the Frequencies tab.  The second is Standard Errors.

 

On the Standard Errors tab of your Advanced Test, we report the precision of the utilities for your attribute levels under aggregate logit estimation, given your experimental design and expected sample size.  Test Design is useful for both standard and complex designs that include interactions or alternative-specific effects.  It also reports a widely accepted measure of design efficiency called D-efficiency, which summarizes the overall relative precision of the design.

 

Test Design assumes an aggregate logit (MNL) run using your CBC design and the number of respondents you requested.  It reports the standard errors (the precision) for your attributes and levels (by default, only main effects are considered, though you can use the dialog to specify interaction effects).  Sample results are shown below:

 

Logit Efficiencies

Using main effects only
Respondent Count: 300

Attribute   Level   Std. Error   Label
1           1       0.03477      High-Flyer Pro, by Smith and Forester
1           2       0.03466      Magnum Force, by Durango
1           3       0.03474      Eclipse+, by Golfers, Inc.
1           4       0.03473      Long Shot, by Performance Plus
2           1       0.02766      Drives 5 yards farther than the average ball
2           2       0.02776      Drives 10 yards farther than the average ball
2           3       0.02767      Drives 15 yards farther than the average ball
3           1       0.03486      $4.99 for package of 3 balls
3           2       0.03469      $6.99 for package of 3 balls
3           3       0.03467      $8.99 for package of 3 balls
3           4       0.03469      $10.99 for package of 3 balls
None                0.04668


A general guideline is to achieve standard errors of 0.05 or smaller for main effect utilities and 0.10 or smaller for interaction effects or alternative-specific effects.

 

 

 

 

The strength of design for this model is 1097.10152421289.

(The ratio of strengths of design for two designs reflects the D-Efficiency of one design relative to the other.)
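For example, suppose you tested a second version of this design that included prohibitions and it reported a lower strength of design. A small Python sketch of the comparison (the prohibited design's value is hypothetical):

def relative_d_efficiency(strength_a, strength_b):
    """D-efficiency of design A relative to design B (ratio of strengths)."""
    return strength_a / strength_b

strength_prohibited = 950.0   # hypothetical design with prohibitions
strength_reported = 1097.10   # strength of design from the report above
print(f"{relative_d_efficiency(strength_prohibited, strength_reported):.1%}")
# -> 86.6%: the prohibited design is about 87% as efficient as the reported one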

 

If the standard error for any level is reported as a series of asterisks (*********), the utility effect is not estimable and you should not field this study.  The problem is usually due to prohibitions you have specified; change the prohibitions so that your design can estimate the utilities with an adequate degree of precision.

 

The beginning of the report shows that the analysis is using 300 simulated respondents.

 

Next, the standard errors from the logit report based on the simulated n=300 responses are shown, reflecting the precision we obtain for each estimated utility.  Lower error means greater precision, and we like to see standard errors of about 0.05 or less for each attribute level.  This design included no prohibitions, so the standard errors are quite uniform within each attribute.  If we had included prohibitions, some levels might have been estimated with much lower precision than others within the same attribute.

 

For our simulated data above, the levels within the three-level attribute all have standard errors around 0.028, and the levels within the four-level attributes around 0.035.  We obtain less precision for the four-level attributes because each of their levels appears fewer times in the design than the levels of the three-level attribute.  The standard error of the None utility is also shown, but you may ignore it; it does not need to achieve any specific precision.

 

Suggested guidelines are:

 

Standard errors within each attribute should be roughly equivalent

Standard errors for main effects should be no larger than about 0.05

Standard errors for interaction effects should be no larger than about 0.10

Standard errors for alternative-specific effects (an advanced type of design) should be no larger than about 0.10

 

These criteria are rules of thumb based on our experience with many different data sets and our opinions regarding minimum sample sizes and minimum acceptable precision.  Ideally, we prefer standard errors from this test of less than 0.025 and 0.05 for main effects and interaction effects, respectively.  These simulated data (300 respondents with 12 tasks each) almost meet that higher standard for this particular attribute list and set of effects.
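These thresholds are easy to check mechanically. The Python sketch below (an assumed helper, not a Lighthouse Studio feature) applies both the standard and the ideal ceilings to the three-level attribute from the report:

def flag_high_standard_errors(std_errors, effect_type="main", ideal=False):
    """Return (label, SE) pairs whose standard error exceeds the ceiling."""
    ceilings = {("main", False): 0.05, ("main", True): 0.025,
                ("interaction", False): 0.10, ("interaction", True): 0.05}
    limit = ceilings[(effect_type, ideal)]
    return [(label, se) for label, se in std_errors if se > limit]

attribute_2 = [("5 yards", 0.02766), ("10 yards", 0.02776), ("15 yards", 0.02767)]
print(flag_high_standard_errors(attribute_2))              # [] -> all within 0.05
print(flag_high_standard_errors(attribute_2, ideal=True))  # all just exceed 0.025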

 

Additional Details on Standard Errors Report

 

The estimated standard errors assume no prior information (no prior knowledge of utilities), which is the usual and safest approach in creating CBC designs.  Technically, the utility balance among the concepts within tasks affects overall design efficiency, so respondents' preferences would need to be known to fully assess the efficiency of a design.  However, most researchers prefer to create designs that are efficient with respect to uninformative (zero) part-worth utility values, and that is the approach we take.
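To make the zero-utilities assumption concrete, the sketch below shows one way standard errors can be computed for a design under an aggregate MNL model with all true utilities set to zero. The function names, the effects coding, and the toy design are our own illustration of the general technique, not Sawtooth Software's implementation:

import numpy as np

def effects_code(level, num_levels):
    """Effects-code a 0-based level into num_levels - 1 columns
    (the last level is the negative sum of the others)."""
    row = np.zeros(num_levels - 1)
    if level < num_levels - 1:
        row[level] = 1.0
    else:
        row[:] = -1.0
    return row

def null_logit_standard_errors(tasks, num_levels, n_respondents):
    """tasks: one questionnaire version -- a list of tasks, each a list of
    concepts, each concept a tuple of 0-based level indices per attribute.
    Returns main-effect standard errors assuming zero true utilities."""
    k = sum(n - 1 for n in num_levels)
    info = np.zeros((k, k))
    for task in tasks:
        X = np.array([np.concatenate([effects_code(l, n)
                                      for l, n in zip(concept, num_levels)])
                      for concept in task])
        p = np.full(len(task), 1.0 / len(task))  # uniform choice probabilities
        W = np.diag(p) - np.outer(p, p)          # MNL information weight matrix
        info += X.T @ W @ X
    cov = np.linalg.inv(n_respondents * info)    # every respondent sees this version
    return np.sqrt(np.diag(cov))

# Toy example: 2 attributes with 2 levels each, 2 tasks of 2 concepts, n=300.
tasks = [[(0, 0), (1, 1)], [(0, 1), (1, 0)]]
print(null_logit_standard_errors(tasks, [2, 2], 300))  # ~[0.0408, 0.0408]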

 

Test Design simulates respondents interacting with your questionnaire, for as many respondents as you plan to interview.  The test is run with respect to a given model specification (main effects plus any optional first-order interactions that you can specify).  

 

To perform Test Design, you need to supply some information:

 

Number of Respondents

% None (if applicable to your questionnaire)

Included Interaction Effects (if any; by default we assume none)

 

With this information, CBC simulates choice data for your questionnaire (assuming prior utilities of zero for the attribute levels and a prior utility for the None that targets the % None you requested).
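Under those assumptions the None prior has a closed form: with k zero-utility concepts per task, the None share is exp(u) / (k + exp(u)), which can be inverted for u. The derivation and helper below are ours, not a documented Sawtooth Software formula:

import math

def none_prior_utility(pct_none, concepts_per_task):
    """Prior None utility u solving exp(u) / (k + exp(u)) = pct_none."""
    return math.log(pct_none * concepts_per_task / (1.0 - pct_none))

u = none_prior_utility(0.20, 3)          # target 20% None, 3 concepts per task
share = math.exp(u) / (3 + math.exp(u))  # check the implied None share
print(f"u = {u:.4f}, implied None share = {share:.0%}")  # u = -0.2877, 20%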

 

Simulated respondents are assigned to the versions of your questionnaire (the first respondent receives the first version, the second respondent the second version, etc.).  If you are simulating more respondents than versions of the questionnaire, once all versions have been assigned, the next respondent starts again with the first version.
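The assignment rule is a simple round-robin, sketched here in Python:

def version_for(respondent, version_count):
    """1-based version assigned to the 1-based respondent index."""
    return (respondent - 1) % version_count + 1

# With 300 versions, respondent 301 wraps back around to version 1:
print(version_for(1, 300), version_for(300, 300), version_for(301, 300))  # 1 300 1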

 

Note that when testing design efficiency we report only the standard errors of the estimates; the utility values themselves (the effects) are zero and therefore not of interest.  Details regarding the logit report may be found in the section entitled Estimating Utilities with Logit.

 

Additional detail on D-Efficiency and Efficiency for Specific Parameters is available in the next two topics.
