# Analysis on Utilities

Can we do any statistical analysis (e.g. cluster analysis or regression) on the respondent-level utility data?
Are there any specific rules to follow?
Please share details.
Thanks!

The most important thing to remember is that the utilities should be normalized per respondent. That gives each respondent essentially the same "scale factor," or magnitude of the parameters.

There are multiple ways to do this. You can export the utilities from SMRT through Run Manager + Export, choosing the "Zero-Centered Diffs" option. Zero-centered diffs first subtracts the mean utility within each attribute, so that each attribute's utilities sum to zero (this should occur automatically in most conjoint analysis approaches, such as CBC). Then the zero-centered diffs scaling procedure multiplies all of each individual's zero-centered utilities by a constant such that the differences between the minimum and maximum utilities within each attribute average 100 across attributes.

There are other ways to normalize the parameters as well to give each respondent the same scaling.  For example, one could find the multiplicative factor that gives each respondent the same standard deviation across all utilities.
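As a rough sketch of the zero-centered diffs normalization described above (in Python, with made-up attribute names and utility values):

```python
import numpy as np

# Hypothetical raw part-worths for one respondent; attribute names
# and values are invented for illustration (3-, 2-, and 4-level attributes).
utilities = {
    "brand": np.array([0.8, -0.2, 1.4]),
    "size":  np.array([0.5, -0.5]),
    "price": np.array([2.0, 1.0, -0.5, -2.5]),
}

def zero_centered_diffs(utils):
    """Normalize one respondent's utilities to zero-centered diffs.

    Step 1: subtract the mean within each attribute, so each
            attribute's utilities sum to zero.
    Step 2: rescale so the (max - min) ranges within attributes
            average 100 across attributes for this respondent.
    """
    centered = {a: u - u.mean() for a, u in utils.items()}
    avg_range = np.mean([u.max() - u.min() for u in centered.values()])
    scale = 100.0 / avg_range
    return {a: u * scale for a, u in centered.items()}

zcd = zero_centered_diffs(utilities)
# Each attribute's utilities now sum to zero, and the per-attribute
# ranges average exactly 100 for this respondent.
```

Applying the same function to every respondent puts them all on a common scale before any cross-respondent analysis.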
answered Sep 24, 2013 by Platinum (169,815 points)
Thanks Bryan
Another thing to remember is that, because they sum to zero within each attribute, utilities for attributes with part-worth coding will create ill-conditioned matrices for some analyses (e.g. regression, factor analysis). So if you have a 5-level attribute, you'll need to drop one of its levels from your regression or factor analysis to get the analysis to run.

This won't be a problem for cluster analysis, which isn't a multivariate technique and which therefore isn't affected in this way by ill conditioning.
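A small NumPy sketch (with simulated data) of the rank deficiency described above, and of the fix of dropping one level:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated zero-centered part-worths for 100 respondents on one
# 5-level attribute: each row sums to zero, so the 5 columns are
# linearly dependent.
n = 100
raw = rng.normal(size=(n, 5))
X_full = raw - raw.mean(axis=1, keepdims=True)

# The full matrix is rank-deficient (rank 4, not 5), which is what
# breaks OLS regression and factor analysis.
assert np.linalg.matrix_rank(X_full) == 4

# Dropping one level restores full column rank; the dropped level is
# implied as the negative sum of the remaining ones.
X = X_full[:, :-1]

# An OLS regression on the remaining 4 columns now runs fine
# (y here is just noise, for illustration only).
y = rng.normal(size=n)
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
```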
answered Sep 24, 2013 by Platinum (84,975 points)
Oh Keith, I think it can be a problem for cluster analysis too, though the problem is not technical in nature. Cluster analysis results depend on the variance of the source variables. Now let's say we have 3 sources of variance from 3 attributes (with 2, 3, and 4 levels respectively). If we include all the part-worths of those attributes, 50% of the variance from the 1st attribute is spurious, but only 25% from the 3rd attribute ... And cluster analysis goes where the variance is, not where the genuine information lies. Also consider that a linearly coded attribute doesn't carry any spurious information: it has just one utility. So I always segment using k-1 utilities. Maybe I'm wrong about that, so correct me if you think so.
You're right, the choice between part-worth and linear coding of attributes can affect cluster analysis, and it is part of a more general topic of weighting (implicit in this case, explicit in others). That's a different issue from ill-conditioning, but an important one nonetheless.
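The k-1 approach described in the comment above can be sketched like this (Python, with made-up zero-centered utilities for two respondents on 3-, 2-, and 4-level attributes):

```python
import numpy as np

# Hypothetical zero-centered utilities, stored per attribute;
# names and numbers are invented. Each row sums to zero within
# each attribute.
utils = {
    "brand": np.array([[ 10., -40.,  30.],
                       [ 25.,   5., -30.]]),
    "size":  np.array([[ 20., -20.],
                       [-15.,  15.]]),
    "price": np.array([[ 50.,  10., -20., -40.],
                       [ 30.,  20., -10., -40.]]),
}

# Keep k-1 columns per attribute before clustering. The dropped
# level is recoverable as the negative sum of the kept ones, so no
# information is lost, and each attribute contributes variance in
# proportion to its genuine degrees of freedom.
X = np.hstack([u[:, :-1] for u in utils.values()])
# X has (3-1) + (2-1) + (4-1) = 6 columns per respondent; feed this
# matrix to your clustering routine of choice (e.g. k-means).
```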