Cluster analysis based on CVA utilities

Hello Guys,

I am doing my first conjoint project (pairwise CVA) and I'm trying to run a hierarchical cluster analysis on the conjoint results.
I ran hierarchical cluster analysis on logit-recoded utilities and on zero-centered utilities (SSI Web - OLS - Recode Method). The problem is that I get totally different segments from the logit utilities than from the zero-centered utilities. I expected some differences, but not this much, given that the only change is the recode method (logit vs. zero-centered).
Is there any "best practice" saying that for clustering one should use logit or zero-centered utilities? How can I decide which solution is better in my case?

Many thanks in advance.
asked Oct 24, 2013 by Tomek

1 Answer

+1 vote
Best answer
The logit recode transforms the dependent variable to a logistic rather than a linear scale before the betas are computed under OLS. But it doesn't control for scale-use bias: some respondents use just a narrow part of the scale (the central ratings), while others tend to jump back and forth between the ends of the scale.

Similarly, selecting the zero-centered option for linear transformation of the dependent variable prior to running OLS doesn't control for scale-use bias.

Generally, researchers would want to use a final transformation (after computing betas) that puts all respondents on the same magnitude scale, and thus tries to reduce the effect of scale-use bias.

So, whether you use the logit or zero-centered transformation of the dependent variable, I strongly suggest that as the second step you use the Zero-Centered Diffs rescaling method to prepare data for cluster analysis.  This gives each respondent essentially an equal range of utility values.

You get zero-centered diffs by exporting the utilities from the SMRT software system (Analysis + Run Manager + Export) and selecting the "Zero-Centered Diffs" method.

(If you want to do the zero-centered diffs rescaling on your own in Excel, work one respondent at a time. First, subtract the mean utility within each attribute from each of that attribute's levels, which zero-centers the utilities within attribute. Then find the multiplier for each respondent such that the sum of the differences between the best and worst levels of each attribute, added across attributes, is the same constant for every respondent.)
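For anyone who would rather script the two steps above than do them in Excel, here is a minimal Python sketch. The function name, the `attr_sizes` argument (number of levels per attribute, in column order), and the constant of 100 points per attribute are all assumptions made for illustration; the recipe above only requires that the total be the same constant for every respondent.

```python
import numpy as np

def zero_centered_diffs(utils, attr_sizes, per_attr=100.0):
    """Rescale raw part-worths (rows = respondents, columns = levels).

    attr_sizes lists how many levels each attribute has, in column order.
    per_attr is an assumed constant: total best-minus-worst range per attribute.
    """
    utils = np.asarray(utils, dtype=float)
    if sum(attr_sizes) != utils.shape[1]:
        raise ValueError("attr_sizes must add up to the number of columns")

    centered = np.empty_like(utils)
    total_range = np.zeros(utils.shape[0])
    start = 0
    for size in attr_sizes:
        block = utils[:, start:start + size]
        # Step 1: zero-center the levels within this attribute, per respondent.
        block = block - block.mean(axis=1, keepdims=True)
        centered[:, start:start + size] = block
        # Accumulate the best-minus-worst difference for this attribute.
        total_range += block.max(axis=1) - block.min(axis=1)
        start += size

    # Step 2: one multiplier per respondent so the summed ranges hit the target.
    target = per_attr * len(attr_sizes)
    return centered * (target / total_range)[:, None]

# Two made-up respondents, one 3-level and one 2-level attribute.
raw = [[1, 2, 3, 0, 1],
       [10, 20, 60, -5, 5]]
print(zero_centered_diffs(raw, attr_sizes=[3, 2]).round(2))
```

A respondent whose utilities are completely flat has a total range of zero, so the multiplier is undefined; such cases would need to be handled separately before clustering.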

That said, after you do a final rescaling to Zero-Centered Diffs, you still might observe big differences in the final cluster solutions depending on your initial transformation of the dependent variable for the OLS estimation (logit or zero-centered scale). If that's the case, it is probably evidence that there isn't good stable segmentation occurring for the number of dimensions you've requested.
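One way to put a number on "big differences" between the two cluster solutions is an agreement index computed on the segment assignments. The sketch below uses the adjusted Rand index, which is my own choice for illustration and not something built into the steps above; the function name and the example labels are made up. A value of 1 means the two solutions group respondents identically (cluster numbering is ignored), and values near 0 mean they agree no more than chance would.

```python
from collections import Counter
from math import comb

def adjusted_rand_index(labels_a, labels_b):
    """Agreement between two cluster assignments of the same respondents."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both solutions must cover the same respondents")
    n = len(labels_a)
    if n < 2:
        raise ValueError("need at least two respondents")

    # Pairs of respondents placed together in both / in each solution.
    together_both = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    together_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    together_b = sum(comb(c, 2) for c in Counter(labels_b).values())

    expected = together_a * together_b / comb(n, 2)   # agreement expected by chance
    maximum = (together_a + together_b) / 2
    if maximum == expected:                           # e.g. a single cluster in both
        return 1.0
    return (together_both - expected) / (maximum - expected)

# Made-up segment labels from the logit-based and zero-centered-based runs.
logit_segments = [0, 0, 1, 1, 2, 2]
zero_centered_segments = [1, 1, 0, 0, 2, 2]
print(adjusted_rand_index(logit_segments, zero_centered_segments))
```

In the example the two label lists describe the same grouping with different cluster numbers, so the index treats them as fully in agreement.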

I would note that hierarchical clustering is a rather old technique.  There are more modern and generally better methods, such as Cluster Ensemble Analysis as provided by our CCEA software package.
answered Oct 24, 2013 by Bryan Orme Platinum Sawtooth Software, Inc. (171,265 points)
Thank you very much for the suggestions. I tried it, but using Zero-Centered Diffs does not help much. Standardising to z-scores and clustering on the z-scores is more helpful.
I am just wondering why the clusters in the two cases (zero-centered vs. logit) are so different. The utilities for the total sample are quite similar (differences below 10%).