Sawtooth Solutions, Fall 1998
Deadline for Early Conference Registration: November 30, 1998!
You should have received a brochure announcing our next research conference, to be held in San Diego, CA, February 2-5, 1999. To avoid the late fee, you must register by November 30, 1998. After that date, registration will be $750, instead of $650, and the optional tutorials will cost $25 more.
We hope you've taken a look at the preliminary conference agenda, either on the Web (http://www.sawtoothsoftware.com) or in the conference brochure. This program has something for nearly everybody.
Achieving Individual-Level Predictions from CBC Data: Comparing ICE and Hierarchical Bayes
This article is adapted from a presentation given at the 1998 Advanced Research Techniques Forum by Joel Huber, Duke University, co-authored by Richard Johnson (Sawtooth Software) and Neeraj Arora (Virginia Tech).
Choice-Based Conjoint data have traditionally been analyzed in the aggregate. However, several methods have been developed recently that recognize individual differences among respondents and permit modeling at the segment or individual level. We'll discuss three of these: Hierarchical Bayes, Latent Class, and an extended application of Latent Class called ICE (Individual Choice Estimation).
Why should we worry about heterogeneity in respondent preferences? Marketers know that people are unique. Market simulators based on average preferences can result in incorrect managerial decisions. The distortions are particularly apparent in product line applications where highly similar products are expected to take share from each other. An aggregate logit model is particularly poor at reflecting differential substitution effects because it assumes that each alternative takes from all other alternatives in proportion to their market share. The three models we'll discuss have different ways of avoiding this problem.
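To see the problem concretely, here is a minimal sketch (hypothetical utilities; any numbers would do) of how an aggregate logit simulator reallocates share when a near-identical product is added:

    import numpy as np

    def logit_shares(utilities):
        # Multinomial logit: share of preference for each product
        e = np.exp(np.asarray(utilities, dtype=float))
        return e / e.sum()

    print(logit_shares([1.0, 0.5, 0.0]))
    # -> about [0.51, 0.31, 0.19] for products A, B, C

    # Add a near-identical twin of product A (same utility). Aggregate
    # logit draws the twin's share from B and C in proportion to their
    # shares, rather than taking it almost entirely from A:
    print(logit_shares([1.0, 0.5, 0.0, 1.0]))
    # -> about [0.34, 0.20, 0.12, 0.34]; the B-to-C ratio is unchanged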
Defining the Models
Hierarchical Bayes (HB) methods derive individual part worths by combining information on the distribution of part worths across respondents with the specific choices of each individual. The posterior distribution for each individual is estimated through a computationally intensive method called Gibbs Sampling, which produces estimates of each respondent's part worths and their standard errors.
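In outline (a sketch of the standard hierarchical specification; details of the implementation may differ), the model combines a population distribution of part worths with an individual-level logit choice model:

    \beta_i \sim N(\mu, \Sigma)   % part worths across respondents
    \Pr(i \text{ chooses } k) = \frac{\exp(x_k'\beta_i)}{\sum_j \exp(x_j'\beta_i)}

Gibbs Sampling alternates between updating the population parameters given the current individual part worths and updating each respondent's part worths given his or her own choices; the spread of the retained draws supplies the standard errors.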
Latent Class analysis (LC) assumes that respondents can be clustered into homogeneous segments, and that differences among segments account adequately for underlying differences among individuals. Segment part worths are estimated so as to maximize the likelihood of the respondent data. Each respondent's probability of belonging to each segment is also estimated, so one can compute expected individual part worths as probability-weighted combinations of the segment part worths.
ICE is a new product from Sawtooth Software that estimates individual part worths from experimental choice data. Like the approach just described, ICE estimates individual part worths as weighted combinations of segment part worths from LC. However, ICE does not constrain those weights to be positive, thereby allowing much greater differentiation of individual values from the segments.
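As a toy numerical illustration (hypothetical part worths; the actual ICE procedure fits the weights to each respondent's own choices), here is the difference between the two weighting schemes:

    import numpy as np

    # Hypothetical part worths for three Latent Class segments
    # (one row per segment, one column per attribute level):
    segments = np.array([[ 1.0,  0.0, -1.0],
                         [-0.5,  1.0, -0.5],
                         [ 0.0, -1.0,  1.0]])

    # LC: weights are membership probabilities (nonnegative, summing to
    # 1), so each individual estimate is an interpolation of the segments.
    lc_weights = np.array([0.7, 0.2, 0.1])
    print(lc_weights @ segments)      # [ 0.6  0.1 -0.7], inside the segment range

    # ICE: weights may be negative, so individual estimates can move
    # well outside the range spanned by the segment part worths.
    ice_weights = np.array([1.3, 0.1, -0.4])
    print(ice_weights @ segments)     # [ 1.25  0.5  -1.75]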
Strengths and Weaknesses
Hierarchical Bayes provides very flexible output. Results can be used to estimate ratios and profits. The researcher may choose among many possible population distributions. HB also reveals uncertainty about each respondent's utilities. The principal difficulties of the approach are that it requires a good deal of expertise to execute properly, current software is not very user-friendly, and run times can be very long.
Latent Class has an elegant theoretical foundation and the segment solutions are often managerially useful. One can later estimate each individual's expected part worths as a weighted combination of the various segments' part worths, where the weights are the probabilities of belonging to each segment. However, since those weights are probabilities and therefore positive, individual part worth estimates don't differ as much from one another as the segment part worths do, so they fail to capture the full richness of individual differences. LC is also vulnerable to local optima, so it is prudent to make many runs from different starting points.
ICE is pragmatic: it estimates the individual part worths that best fit each respondent's choices. It is very fast, taking only a few minutes given an LC solution as a starting point. Its main shortcomings are that it lacks the strong theoretical basis of LC and that, like LC, its results depend on the number and composition of the segments used.
Which Models Work Better in Practice?
We examined three data sets: a simulation study (synthetic data), a laboratory study (MBAs as respondents), and a field study (actual consumers). We measured performance based on correlations with known utilities, correct prediction of first choices, and accurate prediction of aggregate market shares.
Simulation Study: An artificial data set was generated from known individual part worths, conforming to HB's typical assumptions: part worths distributed multivariate normal across respondents, with response errors following the extreme value distribution. As expected (given that the data set was produced according to its assumptions), HB performed best. Using five LC segments as a basis, ICE did nearly as well, achieving about 90% of HB's performance. Latent Class did least well: constraining the weights hurt individual predictions.
Laboratory Study: MBA students completed 30 customized choice tasks, plus twelve additional holdout choice tasks. ICE and HB performed equally well at predicting individual holdout choices, and ICE performed slightly better at predicting holdout choice shares. Latent Class was less able to predict individual choices than the other two methods, but performed relatively well at predicting aggregate choice shares (though not as consistently as ICE and HB).
Field Study: Three hundred fifty consumers were interviewed via mall intercept. Respondents completed 18 choice tasks, plus an additional nine holdout choices. ICE and HB performed equally well in predicting individual holdout choices and aggregate choice shares. Latent Class did not perform as well on either measure.
The good news is that it is possible to generate reliable individual part worths from choice data and that these values can be used in choice simulators, just as in classic conjoint analysis. Choice models that recognize heterogeneity through individual-level analysis strongly out-perform aggregate-level models in terms of predicting consumer choice.
The important result is that although HB is more theoretically elegant than ICE, our experience suggests that both methods work equally well in practice. Latent Class, for its part, does a poor job of predicting individual choices unless its weights are allowed to be negative, as they are with ICE.
12 Steps to a Successful Web-Based Conjoint Survey
The following article was submitted by Patrick Delana and Zach Curtis, POPULUS, Inc. We invite other users to submit case studies or other articles as well.
Recently a major communications firm commissioned us to determine demand for various configurations and pricing packages for its high-speed data (HSD) services. The measurement objectives clearly called for some sort of conjoint analysis. Because there were relatively few attributes, and because pricing was a key objective, we narrowed the choice to full-profile conjoint (Sawtooth Software's CVA) or Choice-Based Conjoint (Sawtooth Software's CBC).
The client's budget was modest, and there were only four weeks until the analyzed findings were due. The stimuli were sufficiently complex that telephone interviewing would not be possible. Computer-assisted self-administered interviews (CASI) were considered, but qualified field agencies were not available in all six geographic regions required by the client. Further, the budget would not permit recruiting current Internet users for central-location interviews.
The Internet was a logical means to interview people about an Internet service. So we chose to implement the study using Sawtooth Software's CVA Internet Module. It was a learning experience for us and we would like to share that learning with other Sawtooth Software users.
Here are the steps that we followed and recommend:
1. Use Sawtooth Software's CVA System Internet Module. The Windows-based system uses a template-driven, fill-in-the-blank approach that makes questionnaire development very straightforward (easier than Ci3). We used an Internet service provider (ISP) that gave us the permissions necessary to run the interview on its server. (Some ISPs do not permit users to run programs or to collect and store data on their servers.)
2. Study a universe appropriate for a web-based survey. Despite higher claimed estimates, real web access penetration is only about 20%.
3. Obtain an appropriate sample. In our recent study, we focused on only 5% of ZIP codes; these reflected the geographic areas in which our client would soon launch its service. Survey Sampling (www.ssisamples.com) was able to provide names and phone numbers of Internet subscribers within our targeted areas.
4. Recruit and screen participants by telephone. Recruiting can be accomplished by a brief CATI survey, screening potential respondents for Internet access along with topical and security screens. A successful recruit ended with verifying the respondent's name and obtaining an e-mail address. Aiming for a final survey sample of 500 and assuming 50% cooperation for a completed interview, POPULUS obtained the names and e-mail addresses of 1,000 qualified respondents.
5. Train interviewers to record e-mail addresses properly. Even the best interviewers are accustomed to recording open-end answers for meaning rather than precise wording. Interviewers must be instructed to read back each e-mail address, character by character.
6. Send personalized e-mails within 24 hours to each recruited respondent. Otherwise people can quickly forget what they've promised to do.
7. Create a unique password allowing each respondent access to the web site. The Sawtooth Software CVA Internet Module allows for the creation of passwords. Respondents use the password to begin the survey. If necessary, a respondent can leave the survey and use the password to resume it at a later time. However, once a respondent has completed the survey, the password is rendered inoperable, preventing repeated access to the survey site.
8. Use an e-mail package such as MailKing® (http://www.mailking.com) to send a personalized e-mail message to each respondent. Each message should contain a hyperlink to the survey web site and the respondent's unique password. Each morning, simply load the results of the previous evening's recruiting into a spreadsheet, add a password to each record, and MailKing does the rest.
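As a sketch of that daily merge-file preparation (illustrative only; the file and column names are hypothetical, and MailKing itself handles the sending):

    import csv, secrets

    # Build the day's mail-merge file: one row per recruit, adding a
    # unique password to each record.
    with open("recruits.csv", newline="") as fin, \
         open("merge.csv", "w", newline="") as fout:
        reader = csv.DictReader(fin)     # expects Name and Email columns
        writer = csv.DictWriter(fout, fieldnames=["Name", "Email", "Password"])
        writer.writeheader()
        for row in reader:
            writer.writerow({"Name": row["Name"],
                             "Email": row["Email"],
                             "Password": secrets.token_hex(4)})  # e.g. "9f2c41ab"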
9. Offer a generous incentive. In our case, respondents were informed that of those who completed the Web survey, one person from each of the six service area cells would be randomly selected to receive a check for $100. Notify winners via e-mail.
10. Assume a 50% cooperation rate. POPULUS sent out 1,057 e-mail messages, along with follow-up messages. Of these, 162 (15%) were returned as undeliverable. Completed surveys were obtained from 482 respondents within two weeks of the first mailing, about 54% of the 895 messages that were delivered.
11. Monitor the site regularly: daily, even hourly. It's easy to keep clients up-to-date regarding the number of completed web site interviews.
12. Download interim data frequently. Use the Sawtooth Software DOS CVA program for the conjoint analysis and any statistical package for the rest of the data. Schedule the top-line meeting with your client the day after the survey site is closed.
The project was completed on time and on budget. The client's only concern was a reluctance to use the CVA simulator. "Wouldn't it be easier for you just to do the simulations for us?" he asked. "Just give us fifteen minutes," we answered. The program and data files were zipped up, attached to an e-mail, and installed five minutes later. After less than ten minutes of instruction in the simulator's use, we heard what we have heard many times before: "Wow, this is really neat. Thanks very much!"
Reducing the Number-of-Attribute-Levels Effect in ACA with Optimal Weighting
This article is taken from a more complete technical paper available for downloading from our Technical Papers library.
Dick Wittink was the first to document the number-of-attribute-levels (NOL) effect in conjoint analysis. He found that the number of levels on which an attribute was defined had a direct impact on the resulting attribute importance. One could increase the apparent importance of an attribute simply by adding more levels!
The NOL effect occurs in varying degrees in all conjoint methods, and can even play a role in self-explicated approaches. Both psychological and algorithmic explanations have been proposed for the effect. This paper demonstrates that the optimal weighting option in ACA Version 4 for combining Priors and Pairs utilities significantly reduces the NOL effect relative to Version 3's equal weighting.
How Utilities Are Calculated in ACA
Before examining the NOL effect and optimal weighting in ACA, it may be helpful to review how ACA utilities are determined.
ACA is an adaptive hybrid conjoint model combining self-explicated evaluations with paired conjoint comparisons. The self-explicated half of the model is referred to as the Priors. The paired comparison conjoint section is referred to as the Pairs.
ACA computes utilities using Ordinary Least Squares (OLS) regression. The Priors contribute as many cases (rows) to the design matrix as levels in the study. The Pairs section contributes as many cases as pairs questions.
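Schematically, this stacks both sections into a single regression. The following sketch uses simplified coding and hypothetical numbers (ACA's actual coding of the dependent variables is more involved):

    import numpy as np

    # A tiny study with 4 levels in total. Priors: one row per level.
    X_priors = np.eye(4)
    y_priors = np.array([2.0, 0.5, 1.5, 0.0])    # coded self-explicated values
    # Pairs: one row per question, coded left concept minus right concept.
    X_pairs = np.array([[ 1.0, -1.0,  0.0,  0.0],
                        [ 0.0,  0.0,  1.0, -1.0]])
    y_pairs = np.array([1.2, -0.8])              # coded pair responses
    X = np.vstack([X_priors, X_pairs])
    y = np.concatenate([y_priors, y_pairs])
    utilities, *_ = np.linalg.lstsq(X, y, rcond=None)
    # One regression spans both sections, which implicitly assumes the
    # two response scales are congruent.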
Criticism Leads to Innovation
In an article published in the Journal of Marketing Research, Green et al. (1991) criticized ACA Version 3 for combining the information from the Priors and Pairs in a single OLS matrix operation. They argued that the coded dependent variables in the Priors and Pairs were not necessarily congruent.
Johnson (1987) had developed these coding procedures based on many years of experience and experimentation in combining self-explicated ratings with paired-comparison conjoint evaluations. The response scales and the coded dependent variables were determined empirically to work well in practice. Following Green's criticism, Johnson released a new version of ACA (Version 4) in 1993, which provided an option for Optimal Weighting.
Under optimal weighting, utilities are calculated independently (at the individual level) for the Priors and the Pairs (ridge regression is used to stabilize the Pairs utilities). Subsequent calibration concepts are rated on a 100-point purchase likelihood scale. These additional observations are used to determine the relative weights that should be applied to the Priors and Pairs utilities, using a simple linear model:
y = a + bX1 + cX2

where:

y  = the logit transform of the calibration concept rating
a  = intercept
b  = weight for the Priors utilities
X1 = utility of the concept as predicted by the Priors utilities
c  = weight for the Pairs utilities
X2 = utility of the concept as predicted by the Pairs utilities

The scaling of the dependent variable and the coding of the independent variables are not required to be congruent across the two halves of the design, since the utilities are calculated independently for each exercise.
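A minimal sketch of the weight estimation for one respondent (the rescaling of the 100-point ratings and the clipping guard against taking the logit of 0 or 1 are our assumptions, not necessarily ACA's exact treatment):

    import numpy as np

    def optimal_weights(ratings, u_priors, u_pairs):
        # ratings : calibration concept ratings on the 100-point scale
        # u_priors: concept utilities predicted from the Priors part worths
        # u_pairs : concept utilities predicted from the Pairs part worths
        p = np.clip(np.asarray(ratings, dtype=float) / 100.0, 0.01, 0.99)
        y = np.log(p / (1.0 - p))                    # logit transform
        X = np.column_stack([np.ones(len(y)), u_priors, u_pairs])
        a, b, c = np.linalg.lstsq(X, y, rcond=None)[0]
        return a, b, c                               # intercept and weights

    # e.g. optimal_weights([80, 35, 60, 10, 90],
    #                      [1.2, -0.3, 0.5, -1.0, 1.5],
    #                      [0.9, -0.1, 0.7, -1.2, 1.1])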
Optimal Weighting and the Number of Levels Effect
A paper (Wittink et al. 1997) given at our 1997 Conference suggested to us that the NOL effect in ACA may be in part due to an incompatibility in the way respondents use the scale in the Pairs and Priors sections. We examined two data sets to test this hypothesis.
The first data set was a commercial study with 336 respondents and 20 attributes, varying in number of levels from 2 to 5. The data were collected under Version 4 of ACA, but equal-weighted utilities (the Version 3 method) were also accessible in ACA's audit trail file; these are labeled as Version 3. Importances were calculated at the individual level for the 20 attributes, for both Version 4 and Version 3 utilities. (Attributes that have two levels are represented on the graph as "2," three levels as "3," etc.) The 45-degree line reflects where the data points should lie if the two methods for calculating utilities resulted in the same attribute importance. If a data point lies above the line, the importance from Version 3 exceeds the importance from Version 4's method.
All of the 5-level attributes lie above the line, and all 2-level attributes fall below it, strongly suggesting that the Version 4 method reduces the NOL effect. The 2-level attributes are significantly less important under Version 3 than under Version 4.
Under Version 3, the average importance for 2-level attributes is 12% less than the corresponding Version 4 importances, with an average t-value for the mean difference of 8.1 (p<0.001). Under Version 3, the average importance for a 5-level attribute is 6% higher than the corresponding optimally-weighted importances, with an average t-value for the mean difference of 6.3 (p<0.001).
The Version 3 utilities display a pattern consistent with the NOL effect. Attributes with more levels are biased to receive greater importance relative to the Version 4 result.
The second data set was an experimental study conducted among 80 MBAs in 1997. This design was considerably smaller in scope than Study #1: only 9 attributes were included, each having either two or three levels. The differences in importances were not as large for this data set as for the first. Since only 2- and 3-level attributes were measured, there was less potential bias from the NOL effect. Even so, the deviations were all in the expected direction: the 2-level attributes on average were 8% lower, and the 3-level attributes 3% higher, under the Version 3 approach than under Version 4.
The optimal weighting option in Version 4 reduces the NOL effect relative to the equal weighting approach of Version 3. It is important to note that the principal reason the optimal weighting method reduces the NOL effect is not the customized differential weights for Pairs and Priors, but the estimation of utilities independently within those separate components prior to combining the information.
We do not claim that optimal weighting completely controls the NOL effect in ACA; other factors contributing to the effect undoubtedly remain. We have seen, however, that even equal-weighted ACA is less susceptible to the number-of-levels effect than traditional full-profile methods (Wittink et al. 1991).
The optimal weighting method appears to have been a nice addition to ACA. It probably deserves more credit than it has been given. We at Sawtooth Software have believed for some time that optimal weighting provided modest improvements to ACA utilities relative to equal weighting. We now recognize that this method of calculating ACA utilities plays a significant role in reducing the NOL effect, and recommend that ACA users use optimal weighting, especially when the number of levels varies across attributes in the design.
References

Green, Paul E., Abba M. Krieger, and Manoj K. Agarwal (1991), "Adaptive Conjoint Analysis: Some Caveats and Suggestions," Journal of Marketing Research, 28 (May), 215-22.

Johnson, Richard M. (1987), "Adaptive Conjoint Analysis," Sawtooth Software Conference Proceedings, Ketchum, ID: Sawtooth Software, 253-65.

Wittink, Dick R., Joel Huber, John A. Fiedler, and Richard L. Miller (1991), "Attribute Level Effects in Conjoint Revisited: ACA versus Full Profile," Advanced Research Techniques Forum, Proceedings of the Second Conference, Rene Mora (ed.), Chicago: AMA, 51-61.

Wittink, Dick R., William G. McLauchlan, and P. B. Seetharaman (1997), "Solving the Number-of-Attribute-Levels Problem in Conjoint Analysis," Sawtooth Software Conference Proceedings, Sequim, WA: Sawtooth Software, 227-40.
Ci3 v2.5 Upgrade Forthcoming
Among other things, our staff has been busy preparing an upgrade for the Ci3 System. Version 2.5 will remedy some shortcomings of the current Ci3 system, and will offer a number of very useful enhancements.
A new editor will let you work with very large questionnaire files (no 64K limitation). In the Export Specifications table, you'll be able to toggle all Select questions between Present/Absent and Ordinal (reduces potential carpal tunnel syndrome). The Print Questionnaire option (absent in v2) is again part of the product. Version 2.5 will also support .jpg graphics files (much more compact than .bmp files). A new "Data Doctor" utility will scan and fix many corruptions that can occur in Ci3 data files, such as those due to media read/write errors.
We've also significantly increased the flexibility of Ci3 arithmetic and logical instructions, including multiple operations per line, AND/OR statements and string operations. Here's an example of a line of legitimate v2.5 Ci3 code:
IF ((Q1 = 1) AND ((Q3 <> 3) OR (Q2 > 2))) X = (Q4 + Q5 + Q6) / 3

We'll send out an official announcement and upgrade form (including pricing) when v2.5 is complete.
On Interaction Effects and CBC Designs
It is easy and automatic to get efficient designs using the CBC System. As long as you do not specify any prohibitions, CBC's randomized designs are near-orthogonal, providing near-optimal efficiency for measuring main effects. However, CBC version 1's design strategies (Complete Enumeration and Shortcut Method) are often not as efficient as they might be for measuring interactions.
CBC's design strategies include the criterion of Minimal Overlap (each level is shown as few times as possible within a choice task). Minimal level overlap within choice tasks is optimal for measuring main effects, but not optimal for measuring interactions. To illustrate this point, consider two attributes, each with three levels, in a minimal overlap design with three concepts per task. Each level, therefore, is shown exactly once per task. The following matrix represents the possible combinations of the two variables, which define a hypothetical league of nine volleyball teams (reconstructed here; the labels for the third city and third gender category are placeholders):

                Seattle     Chicago     [City 3]
    Men         Team 1      Team 2      Team 3
    Women       Team 4      Team 5      Team 6
    [Gender 3]  Team 7      Team 8      Team 9
For our minimal overlap design, Team #1 (Seattle, Men) can only be shown in a CBC task versus teams defined by cells not in the same row or column (teams 5, 6, 8, and 9). (Don't ask us to explain the rules for three-way volleyball matches!) Why is this problematic for measuring two-way interactions? If we wanted to judge how good each team was in our hypothetical league, we'd like to arrange for each to play all other teams to maximize our ability to declare a winner.
Only being able to match each team against four of the eight other teams limits our ability to learn how good each team is. For measuring interactions between Gender and City, it seems particularly useful to have teams in the same rows and columns play one another. For example, we'd like to have the Seattle women play the Chicago women. Not being able to directly compare products in the same row or column hinders our ability to most efficiently measure the interactions. But adding overlap comes at a price: it weakens the precision of main effects estimates.
Based on this observation, we have created two new randomized design strategies that will be available in the upcoming release of CBC for Windows: Random and Balanced Overlap. The Random method is, as its name implies, simple random selection of levels (with replacement). Balanced Overlap is a controlled middle ground between Random (a large amount of possible level overlap) and Complete Enumeration (the minimum possible level overlap). Both of the new methods permit levels to be displayed more than once within the same choice task.
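The two extremes are easy to sketch in Python (an illustration of the selection logic only, not CBC's actual implementation; Balanced Overlap, the controlled compromise, is not shown):

    import random

    def random_task(levels_per_attr, n_concepts):
        # "Random": each attribute level drawn independently, with
        # replacement, so any amount of overlap can occur within a task.
        return [[random.randrange(n) for n in levels_per_attr]
                for _ in range(n_concepts)]

    def minimal_overlap_task(levels_per_attr, n_concepts):
        # Minimal overlap: for each attribute, deal out shuffled levels so
        # no level repeats within a task until all levels have been used.
        columns = []
        for n in levels_per_attr:
            deck = []
            while len(deck) < n_concepts:
                block = list(range(n))
                random.shuffle(block)
                deck.extend(block)
            columns.append(deck[:n_concepts])
        return [list(concept) for concept in zip(*columns)]  # rows = concepts

    # With three 3-level attributes and 3 concepts per task,
    # minimal_overlap_task([3, 3, 3], 3) shows each level exactly once,
    # while random_task([3, 3, 3], 3) will often repeat levels.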
We created a synthetic data set to investigate the tradeoff between the precision of main effects and interaction terms under these different design strategies. (The Shortcut method is so similar to Complete Enumeration with respect to this issue that we've omitted it from the discussion.)
We used a design with three attributes, each with three levels. There were 500 respondents, 20 tasks each, 3 concepts per task, and no None alternative. We developed known utilities for both main effects and interactions. The main effects utilities were (1, 0, -1) representing the three levels of each of the three attributes. Additionally, we specified a two-way interaction effect between attributes 1 and 3. The interaction effects were one-fourth the size of the main effects, with values of (0.25, 0.00, -0.25). These were applied so that the sum of the interaction effects across rows and columns was 0. We added a relatively large random normal component (Z score times 3) to the utility sums before simulating respondent answers. The data below are the average of ten replicates of the synthetic data set using different random numbers and random designs.
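As a sketch of how one synthetic respondent's answer to a task can be generated (the placement of the interaction values within the 3x3 table is one arrangement consistent with the stated zero row and column sums, not necessarily the one we used):

    import numpy as np

    rng = np.random.default_rng(0)
    main = np.array([1.0, 0.0, -1.0])   # main effects, same for each attribute
    # Two-way interaction between attributes 1 and 3:
    inter = np.array([[ 0.25, 0.00, -0.25],
                      [ 0.00, 0.00,  0.00],
                      [-0.25, 0.00,  0.25]])

    def simulated_choice(task):
        # task: list of concepts, each a (level1, level2, level3) tuple
        u = [main[a] + main[b] + main[c] + inter[a, c]
             + 3.0 * rng.standard_normal()       # large random component
             for a, b, c in task]
        return int(np.argmax(u))

    # e.g. simulated_choice([(0, 1, 2), (1, 2, 0), (2, 0, 1)])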
As expected, the minimal overlap design (Complete Enumeration) had the highest precision for main effects estimates. The t-value is the effect (utility) divided by its standard error, and can be taken as a signal-to-noise ratio. Averaged across the known non-zero main effects, the t-value was 26.8 for Complete Enumeration versus 23.0 for Random, a 14% reduction (1 - 23.0/26.8); Balanced Overlap was only 6% lower than Complete Enumeration.
For interaction terms the pattern reverses: the Random approach achieves the greatest precision, followed closely by Balanced Overlap. The average signal-to-noise ratio for the interaction effects is 32% higher for Random and 22% higher for Balanced Overlap, relative to Complete Enumeration.
It is worth noting that the performance of the different design methods for estimation of main effects and interactions will vary depending on the number of attributes, levels, concepts per task and amount of variation in the data. The figures we've presented represent one such case. Even so, the findings should generalize to other cases.
These findings suggest that one should include at least some degree of overlap in the CBC design when interaction terms are of particular interest. Overlap for an attribute can be added to a design by simply using more concepts than attribute levels in tasks. Our example above represents a worst case scenario for estimating interaction effects under minimal overlap design strategies. We expect that minimal overlap strategies may be about as effective as the random approach for estimating interactions between attributes which have fewer levels than concepts per task, though we didn't investigate this specifically.
In summary, we suggest using complete enumeration (or its sister shortcut method) for main-effects only designs. If detecting and measuring interactions is the primary goal, then the random approach is favored. If the goal is to estimate both main effects and interactions efficiently, then overlap should be built into the design, at least for the attributes involved in the interaction. Using more concepts than attribute levels with complete enumeration, or utilizing the compromise balanced overlap approach would seem to be good alternatives.
More details on these two new design methods will be available in the forthcoming CBC for Windows documentation.