Segment Finder: Finding Needs-Based Segments in CBC and MaxDiff Data

Podcast

This episode, originally a webinar, focuses on 'Segment Finder,' a tool designed to identify needs-based segments in CBC (Choice-Based Conjoint) and MaxDiff data. Hosted by Bryan Orme and Brian McEwan from Sawtooth Software, the episode covers the importance of market segmentation and the drawbacks of relying solely on mean values. It explains historical and modern methods of segmentation, including the limitations of traditional rating scales and the advantages of MaxDiff and CBC analyzed through latent class multinomial logistic regression. Practical suggestions for conducting effective segmentation and cleaning data are also discussed. The episode includes a live demonstration of the Segment Finder tool, showing how it segments respondents based on their latent preferences and supports meaningful data analysis.

About Our Guest(s)

Bryan Orme, CEO & President, Sawtooth

Bryan is the recipient of the American Marketing Association’s 2017 Charles Coolidge Parlin Award, an honor reserved for those who “have demonstrated outstanding leadership and sustained impact on advancing the evolving profession of marketing research over an extended period of time.” He joins a prestigious list of past recipients of the Parlin award that include Philip Kotler, Michael Porter, Paul Green, and George Gallup.

Bryan received his BA (1991) from Brigham Young University and an MBA from the University of Texas at Austin (1993). Prior to joining Sawtooth Software in 1995, Bryan worked in the marketing sciences department at IntelliQuest, Inc.

Bryan has served as committee chair of the American Marketing Association’s Advanced Research Techniques Forum, and has been the long-time chair of Sawtooth Software’s Research Conference. As an accomplished instructor, Bryan has taught over 70 multi-day courses involving conjoint analysis, MaxDiff, survey research, and market segmentation across North and South America, Europe, Southeast Asia and Australia.

Bryan has published over one hundred articles and white papers on conjoint analysis and related methods and received the David K. Hardin award for the best paper published in Marketing Research during 2004. He also authored the book Getting Started with Conjoint Analysis (now in its 4th edition) and co-authored the books Becoming an Expert in Conjoint Analysis and Applied MaxDiff. He has also served as an ad hoc reviewer for the Journal of Marketing Research.

Brian McEwan, Senior Advisor & Product Manager, Sawtooth

Like many folks, I kind of fell into marketing research and analytics. I had always liked consumer behavior, but never really enjoyed math until seeing the practical applications of it. My favorite part of my job is taking complex things (experimental design, regression analysis, models and predictions) and helping people get to a conceptual understanding of what's going on.

Teaching is my passion, and I'm always open to a guest lecture, student group presentation, or lunch and learn for your team on Conjoint Analysis and MaxDiff.

Transcript

Automatically transcribed

Vanessa: Quick disclaimer before we start, this episode was originally recorded as a webinar and edited for podcast format. You can find the original webinar recording on our website at sawtooth.com or on our YouTube channel.

Now back to the show.

Justin Luster: Good morning, everyone. Good afternoon, good evening, wherever you are in the world. We're grateful to have you join us. This one's entitled Segment Finder: Finding Needs-Based Segments in CBC and MaxDiff Data. And we're privileged to be joined today by the two Brians, Bryan Orme and Brian McEwan from [00:01:00] Sawtooth Software.

Bryan is the CEO and president of Sawtooth Software. He's been doing research for over 33 years. He's the author of two books on conjoint analysis and has written a hundred-plus white papers on conjoint and MaxDiff. I would say he's probably the world expert on conjoint. He loves travel and scuba diving. So that's Bryan with a Y, Bryan Orme. And then we also have Brian with an I, Brian McEwan. Brian is the senior advisor and product manager here at Sawtooth.

He was formerly the managing director of the Sawtooth UK office. He joined Sawtooth back in 2008 and has a lot of experience doing customer support and teaching people about conjoint and MaxDiff and how to use our products. I'm grateful to work with both of these guys. They're great coworkers.

And with that, Bryan [00:02:00] Orme, I'll turn the time over to you.

Bryan Orme: Great. Thank you for the introduction. When we talk about market segmentation, you just have to start with the joke you would hear from your professor in Stats 101 back at university: the one about the statistician who drowned crossing a river with an average depth of half a meter.

The point for us as researchers is that looking only at averages can mask critical things happening at the segment or individual level. In market research, the cases we're typically dealing with are individual respondents. So let's take an example, not about crossing a river, but about brand preference.

Let's imagine that we conduct a survey asking respondents to rate Brand X from zero to 10, and the average score across 500 respondents is five for Brand [00:03:00] X. What could this mean? Does it mean that people feel middle of the road about Brand X? An alternate explanation is that half of the respondents think very highly of Brand X and give it a 10, and half think very poorly of Brand X and give it a zero. Or it varies significantly between age groups: young people are giving Brand X an eight on average, and older people are giving it a two on average. Or maybe it varies significantly by household income. Or finally, maybe respondents were just answering randomly due to low motivation, and the five is simply the average of those random numbers.

So there are lots of different explanations that could be behind an average rating of five for Brand X across 500 respondents. The point is that you're usually not trying to market to the average consumer. [00:04:00] Segments and individuals differ in their preferences for brands, features, performance, style, price, et cetera.
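To make the point concrete, here is a small Python sketch using made-up ratings (not data from the webinar): three hypothetical sets of 500 Brand X ratings that all average exactly five but describe very different markets. The mean is identical each time; only the spread hints that segments exist.

```python
from statistics import mean, pstdev

# Hypothetical 0-10 ratings of Brand X from 500 respondents; every set averages 5.
lukewarm = [5] * 500                  # everyone is middle of the road
polarized = [10] * 250 + [0] * 250    # half love Brand X, half hate it
by_age = [8] * 250 + [2] * 250        # younger respondents give 8, older give 2

for name, ratings in [("lukewarm", lukewarm), ("polarized", polarized), ("by_age", by_age)]:
    # Same mean every time; the standard deviation is what differs.
    print(f"{name:<10} mean={mean(ratings):.1f} sd={pstdev(ratings):.1f}")
```

All three lines report mean=5.0, with standard deviations of 0.0, 5.0, and 3.0. Note that even the spread cannot tell you *why* people differ (age, income, or random answering); that is what segmentation is for.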

Sometimes those segments of respondents can be identified by age, by income, or other variables that you can ask about or observe, but sometimes the segments you want to target can't be identified by observable segmentation questions. Rather, the segments that matter for your targeting could be latent.

Latent meaning that they're not observable with a simple segmentation question. You can't just include a question to obtain this segmentation, but the segmentation can be found using another method that we're going to be talking about. So why do market segmentation? Well, we want to find out about and target the segments that would be most [00:05:00] receptive to your product or service.

That's the typical marketing research case. Or maybe we're about to create new products or modifications to our products or services, and we want to create multiple options that better meet the needs of a diverse set of customers, increase our market share, and hopefully avoid a lot of cannibalization within our own product line.

So how do you do market segmentation? Here we mean a needs-based segmentation focused on developing a product, or developing messages, that reach segments of people who think differently. Again, you could segment people based on variables you already know about or can ask about. Maybe these variables are the ones that help you target people and create new products that are [00:06:00] specific to them and make them happy. Maybe you're lucky, and variables like geography, company size, age, and income will get you where you need to be in terms of market segmentation. But often the stated variables that you can throw into surveys don't give you segments that are as useful for these purposes.

So then you turn to latent segmentation: you classify people into latent segments using segmentation variables that are revealed through, for example, a series of importance questions about needs, desires, and price sensitivity. The historic approach has been to ask a series of rating-scale questions followed by cluster analysis.

Often k-means. If any of you have done this in the past, you're probably cringing, because it often fails miserably. [00:07:00] If we ask, let's say, 10 or 20 rating questions in a grid, or one question at a time, about needs, desires, and price sensitivity, it often fails miserably due to lack of discrimination. A lot of respondents just say, well, everything's important. They give a bunch of fours and fives, or eights, nines, and tens, and there's not much discrimination for the algorithm to grab onto to separate signal from noise. The other big problem with rating scales in this historic approach to segmentation is scale use bias.

Different people, and particularly different groups of people such as different cultures, use rating scales differently. Germans are known to use somewhat lower scale points on average than South Americans or people from India, for example. These are norms we've seen many, many [00:08:00] times. What can happen is that if groups of people answer higher or lower on the scale mainly due to cultural bias, just because they're more yea-saying or more nay-saying, you can get a segmentation where the main thing driving it is whether you're German or South American.

It's not really whether you're more price sensitive, or you prefer a particular brand, or you need more performance; that's not really coming out. It's scale use bias that often drives the majority of these k-means segmentations on batteries of standard survey questions. So this just doesn't work out well.

MaxDiff (also known as best-worst scaling) or choice-based conjoint (CBC), [00:09:00] analyzed by latent class multinomial logistic regression (latent class MNL), which is a type of finite mixture modeling, is much, much better. It gets better discrimination, and it avoids scale use bias.

These choice methods, MaxDiff or CBC, force choices, or trade-offs, among attributes and items. Well, they don't really force choices; we'll talk about that in a moment. Whether it's MaxDiff or CBC, these methodologies engage respondents more, encouraging them to use their brains and think carefully rather than being lazy and straight-lining.

Now, they don't truly force choices, because respondents can just quit, for example. But we typically say that these are forced trade-offs, and in that sense respondents can't just say everything's important. They can't just say, oh, everything's a five on a five-point scale. With CBC and [00:10:00] MaxDiff, we don't let them do that.

Of course, respondents can alternatively get lazy and answer randomly if they don't want to engage with us. But we can find them by looking at the hierarchical Bayes fit statistic, so that we can clean the data afterward if some respondents won't engage with us. And these methods do so much better at engaging people to actually discriminate and think about the importance of the attributes and features involved in product choice.

And they avoid scale use bias. There's no scale. We're not asking people to rate products or messages on a five- or 10-point scale; there's no scale to use. We're asking people to choose: choose the best, or choose the best and the worst. So we avoid the issue that, and I'm generalizing, Germans tend to use lower points on a rating scale than South Americans.

It doesn't [00:11:00] matter with MaxDiff and CBC whether you have scale use bias, because we're not asking a scale question; we're just asking you to choose the best, or the best and worst. That means you're going to get much more meaningful discrimination among the items and attribute levels from MaxDiff and CBC than you'll get from, say, rating scales.

Your t-tests between items and your F-tests across groups of people are going to be much stronger at the same sample size when you use MaxDiff and CBC choice data to conduct market segmentation than if you use the schlocky approach of rating scales, which often fails miserably. So what do you get from latent class MNL, whether you're running it on MaxDiff or CBC? First, you get the number of groups or classes that you as the analyst request.

Just like cluster analysis, you as a researcher need to say to the algorithm, hey, go out and look for the two-group, the [00:12:00] three-group, the four-group solution, and let me look at some of those solutions and figure out which one works best for me. Measures of model fit help you decide which is more appropriate, and I'll talk about that in a minute.

The other thing you get from latent class MNL is the size of each group, or class. You're probably going to ignore groups that are too small. If a segment pops out that has 2% of respondents, you're probably going to want to rethink things, do some data cleaning, and not take that solution, because it's typically not very useful to think about targeting a group that has only 2% or 4% of the sample, or whatever your threshold might be.

The last thing you get from latent class MNL is a segment assignment for each respondent. Each respondent actually has a probability of membership in each group. So if there are two groups, I might be [00:13:00] 83% in group one and 17% in group two. But to make things easier to deal with and to explain to clients, we typically classify people into the group that they have the highest probability of membership in.
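The modal assignment rule just described, plus the class-size check from the previous paragraph, fits in a few lines of Python. This is a minimal sketch: the membership probabilities are hypothetical, and the 5% "too small" threshold is an illustrative placeholder for "whatever your threshold might be."

```python
# Hypothetical posterior membership probabilities from a two-class latent class run.
memberships = {
    "r001": [0.83, 0.17],
    "r002": [0.40, 0.60],
    "r003": [0.51, 0.49],   # nearly a coin flip, but still assigned to class 1
    "r004": [0.10, 0.90],
}

def assign_segment(probs):
    """Return the 1-based class with the highest membership probability (modal assignment)."""
    return max(range(len(probs)), key=lambda k: probs[k]) + 1

def class_shares(assignments, n_classes):
    """Share of respondents assigned to each class, for spotting segments that are too small."""
    counts = [0] * n_classes
    for seg in assignments.values():
        counts[seg - 1] += 1
    return [c / len(assignments) for c in counts]

assignments = {rid: assign_segment(p) for rid, p in memberships.items()}
shares = class_shares(assignments, 2)
MIN_SHARE = 0.05  # hypothetical cutoff for a segment too small to be useful
too_small = [k + 1 for k, s in enumerate(shares) if s < MIN_SHARE]
print(assignments)  # {'r001': 1, 'r002': 2, 'r003': 1, 'r004': 2}
print(shares, too_small)
```

Respondent r003 shows the trade-off of this simplification: the hard assignment hides that the respondent is almost evenly split between classes.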

Let me talk a little bit about how the algorithm works. This is an intuitive explanation rather than the full math, so forgive me for that. If you're a real math person, you can look it up in the latent class technical paper for the full specification. But this is essentially how it works.

The analyst specifies how many segments to look for, let's say K segments, and let's imagine that K is three. For the three segments, we randomly select smallish initial preference weights for the attributes that are assumed to drive choice. If it's a conjoint model, we have part-worth utilities for segment one, for segment two, and for segment three: three vectors of part-worth utilities. We [00:14:00] randomly flip a coin and make tiny preference-weight differences right around zero, like plus 0.01 or minus 0.01, just a tiny bit of signal to get the algorithm going. You can't start with all zeros, or else the algorithm can't get going.

After you've seeded the algorithm with some random starts, in step one you calculate the likelihood that each respondent belongs to each of the K segments, here each of the three segments, given the respondent's answers. We compare the respondent's choices to each segment's utilities and ask, which of these segments best fits this respondent?

We can express these likelihoods as probabilities that sum to 100%. Then, in step two, for each of the K segments we run a weighted regression, three weighted regressions here, where the weights are the respondents' likelihoods of belonging to each segment. So each respondent is involved [00:15:00] every time, but gets a different weight when we're updating the utilities for segment one than when we're updating the utilities for segment two, and so on across the three weighted regressions.

When you run those weighted regressions, you get a new set of utilities that differ from the initial plus 0.01 or minus 0.01 starting points. Then we repeat steps one and two until the fit of the model fails to improve by very much. At that point we break out and say, hey, we think we're good enough.
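The two steps described above amount to an expectation-maximization (EM) loop. Below is a minimal pure-Python sketch under stated assumptions: each respondent's data is a list of (alternatives, chosen index) tasks with numeric attribute codes; a few gradient-ascent steps stand in for each full weighted MNL estimation (a "generalized EM" shortcut); class sizes are updated as average membership probabilities; and the data are simulated from two hypothetical segments. All function names are illustrative, and this is not Sawtooth's implementation.

```python
import math
import random

def choice_probs(beta, alts):
    """Multinomial logit probabilities for one task; alts is a list of attribute vectors."""
    utils = [sum(b * x for b, x in zip(beta, alt)) for alt in alts]
    top = max(utils)
    exps = [math.exp(u - top) for u in utils]
    total = sum(exps)
    return [e / total for e in exps]

def respondent_loglik(beta, tasks):
    """Log-likelihood of one respondent's choices under one class's utilities."""
    return sum(math.log(choice_probs(beta, alts)[chosen]) for alts, chosen in tasks)

def latent_class_mnl(data, k, n_params, max_iter=100, tol=1e-6, steps=5, lr=0.5, seed=0):
    rng = random.Random(seed)
    # Tiny random starts around zero: all zeros would leave every class identical forever.
    betas = [[rng.uniform(-0.01, 0.01) for _ in range(n_params)] for _ in range(k)]
    sizes = [1.0 / k] * k
    prev_ll = -math.inf
    for _ in range(max_iter):
        # Step 1: probability that each respondent belongs to each class (sums to 1).
        posteriors, total_ll = [], 0.0
        for tasks in data:
            logs = [math.log(max(sizes[c], 1e-300)) + respondent_loglik(betas[c], tasks)
                    for c in range(k)]
            top = max(logs)
            log_sum = top + math.log(sum(math.exp(v - top) for v in logs))
            total_ll += log_sum
            posteriors.append([math.exp(v - log_sum) for v in logs])
        if total_ll - prev_ll < tol:  # fit stopped improving by very much
            break
        prev_ll = total_ll
        # Step 2: one weighted MNL per class; every respondent is used, weighted by membership.
        for c in range(k):
            weight = max(sum(p[c] * len(tasks) for p, tasks in zip(posteriors, data)), 1e-12)
            for _ in range(steps):
                grad = [0.0] * n_params
                for p, tasks in zip(posteriors, data):
                    for alts, chosen in tasks:
                        probs = choice_probs(betas[c], alts)
                        for j in range(n_params):
                            expected = sum(pr * alt[j] for pr, alt in zip(probs, alts))
                            grad[j] += p[c] * (alts[chosen][j] - expected)
                betas[c] = [b + lr * g / weight for b, g in zip(betas[c], grad)]
        sizes = [sum(p[c] for p in posteriors) / len(data) for c in range(k)]
    return betas, sizes, posteriors, total_ll

def simulate(n_per_segment=20, n_tasks=8, n_alts=3, seed=1):
    """Hypothetical data: two segments that disagree on attribute 0 and share price dislike."""
    rng = random.Random(seed)
    true_betas = [[2.0, -1.0], [-2.0, -1.0]]
    data, truth = [], []
    for seg, beta in enumerate(true_betas):
        for _ in range(n_per_segment):
            tasks = []
            for _ in range(n_tasks):
                alts = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(n_alts)]
                chosen = rng.choices(range(n_alts), weights=choice_probs(beta, alts))[0]
                tasks.append((alts, chosen))
            data.append(tasks)
            truth.append(seg)
    return data, truth

data, truth = simulate()
betas, sizes, posteriors, ll = latent_class_mnl(data, k=2, n_params=2)
assigned = [max(range(2), key=lambda c: p[c]) for p in posteriors]
print([[round(b, 2) for b in beta] for beta in betas], [round(s, 2) for s in sizes])
```

Class labels are arbitrary, so the recovered "class 0" may correspond to either simulated segment. A production implementation would fully solve each weighted MNL (for example with Newton-Raphson) rather than take a few gradient steps.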

That final solution depends, unfortunately, on the initial random start. A challenge with these methods is that you might stumble into a suboptimal solution if you're unlucky with your initial set of random weights. So the simple thing to do is to run it multiple times from different random starting points [00:16:00] and take as the final solution the one that leads to the best model fit in terms of log-likelihood or BIC. Either one will give you the same answer about which of, say, your 10 tries from different starting points fits the data best, because every try with the same number of segments has the same number of parameters. So you run it multiple times, take the one that fits the data best, and that way you avoid getting an unlucky solution due to an unlucky starting point.
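The "run it several times and keep the best log-likelihood" idea can be shown on a toy problem. In this sketch a hypothetical one-parameter function with several local peaks stands in for the fit of one latent class run, and plain gradient ascent stands in for the estimation; each start climbs to whichever peak is nearby, and we keep the best.

```python
import math
import random

def toy_loglik(x):
    """Hypothetical one-parameter 'log-likelihood' with several local peaks."""
    return math.sin(3 * x) - 0.1 * x * x

def climb(x, lr=0.05, iters=500):
    """Gradient ascent from one starting point; it stops at whichever peak is nearby."""
    for _ in range(iters):
        x += lr * (3 * math.cos(3 * x) - 0.2 * x)
    return x, toy_loglik(x)

def best_of_random_starts(n_starts=20, seed=0):
    """Run from several random starts and keep the run with the best log-likelihood."""
    rng = random.Random(seed)
    runs = [climb(rng.uniform(-3, 3)) for _ in range(n_starts)]
    best = max(runs, key=lambda run: run[1])
    return best, runs

best, runs = best_of_random_starts()
print(f"best x={best[0]:.3f} loglik={best[1]:.3f}")
print(sorted({round(ll, 2) for _, ll in runs}))  # the distinct peaks individual runs ended on
```

Individual runs end on different peaks; only some reach the global one near x = 0.51, which is exactly why a single unlucky start can mislead.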

So one of the big questions with latent class, cluster analysis, or any of these methodologies that find latent segments is how to pick the best solution: the best one for your client, for the data, or for the social research study you're doing. From a statistical standpoint:

The Bayesian information criterion (BIC) can be used to identify [00:17:00] the solution, whether the two-group, three-group, four-group solution, et cetera, that has the best justifiable fit to the data. The formula involves the log-likelihood plus a penalty for the number of parameters we're estimating (scaled by the log of the sample size): the number of part-worth utilities, for example, that you're fitting and the number of segments you're fitting them for. Similar to the idea of adjusted R-squared that you might have been taught many years ago, as we add more segments we want to see not just better fit, which can be spurious when we simply add more parameters to the model, but that the additional segments actually provide justifiable fit to the data. So the BIC adjusts for the number of parameters, asking: we added [00:18:00] more parameters, but did we get enough improvement in fit to justify going after yet another segment?

Interestingly, unlike R-squared or adjusted R-squared, lower BIC numbers are better. Lower numbers indicate a more justifiable fit, meaning more real segmentation structure in the data that is captured by the parameters.
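For reference, the standard BIC is -2 ln L + p ln N, where L is the maximized likelihood, p the number of estimated parameters, and N the sample size. The sketch below applies it to hypothetical log-likelihoods for 1- to 5-class runs. Two assumptions to flag: N is taken as the number of respondents (check how your software defines it), and the parameter count is K utility vectors plus K - 1 free class-size parameters.

```python
import math

def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion: -2*LL + p*ln(N). Lower is better."""
    return -2.0 * log_likelihood + n_params * math.log(n_obs)

def latent_class_params(k, utilities_per_class):
    """Each class gets its own utility vector, plus k - 1 free class-size parameters."""
    return k * utilities_per_class + (k - 1)

# Hypothetical log-likelihoods for 1- to 5-class runs (500 respondents, 12 utilities per class).
log_likelihoods = {1: -5200.0, 2: -4900.0, 3: -4780.0, 4: -4750.0, 5: -4735.0}
scores = {k: bic(ll, latent_class_params(k, 12), 500) for k, ll in log_likelihoods.items()}
for k, score in scores.items():
    print(k, round(score, 1))
best_k = min(scores, key=scores.get)  # use this to break ties, not to overrule usefulness
print("lowest BIC:", best_k)          # 3
```

The log-likelihood keeps improving all the way to five classes, but the penalty outgrows the gain after three, so BIC bottoms out at the three-class solution.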

But you shouldn't just look at the BIC and say to your client, hey, the six-segment solution has the lowest BIC, the best fit, so that's the one you should take, dear client. You really should ask yourself what the client needs, because the latent class solution with the lowest BIC may not be the one that helps your [00:19:00] client the most.

Rather, you should prioritize the latent class segmentation solution that is most useful for answering the business or research question. The most useful one typically isn't the two-group solution, and, at least for a strategic segmentation that's helpful, understandable, and memorable, it's probably not the 15-group solution either. In our experience, it's typically the three- to six-group solutions that are most helpful, understandable, and memorable from a strategic standpoint: for communicating throughout the organization, for creating messages or targeted products that reach those segments, et cetera.

After thinking about that, you should look at the solutions in terms of BIC. If you're trying to break a tie between, say, the four-group and the five-group [00:20:00] solution, and both of them seem to work well for your client, you probably want to take the one with the better (lower) BIC and let that break the tie.

I'm getting close to the end before I turn it over to Brian McEwan, who's going to show you how it works within the new Segment Finder in Discover, and he's going to do a great job. I say that because maybe I'm boring you at this point, and you're thinking, just show me the software. I'm getting close. Latent class or hierarchical Bayes? Some people ask: there are two methods I can use to estimate utilities, so which should I use? Well, latent class MNL produces both utilities and, importantly, segment assignments, whereas hierarchical Bayes does a really good job of estimating individual-level utilities.

You could use the segment utility weights from latent class to build a market simulator to make predictions of choices, but we think you should use each method for what it's best at. Use latent [00:21:00] class for classifying respondents and finding the segments, and then use the classification variable.

Maybe it's a three-group variable that assigns each respondent to group one, two, or three. Use that as a segmentation variable, and then run cross tabs and segment the predictions you get from individual-level, respondent-level data. So each respondent votes for products according to their HB utilities, but you slice and dice the results by the classification variable that came out of latent class.
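Here is a minimal sketch of that workflow: each respondent's own (hypothetical) HB part-worths drive their predicted shares, and the latent class assignment is only used to group and average the results. The level names, products, and the logit share-of-preference rule are illustrative assumptions; a first-choice rule would slot in the same way.

```python
import math

# Hypothetical HB part-worths for four respondents (one dict per respondent).
partworths = {
    "r1": {"brand_a": 1.0, "brand_b": 0.0, "price_low": 0.5, "price_high": -0.5},
    "r2": {"brand_a": 1.5, "brand_b": -0.5, "price_low": 0.2, "price_high": -0.2},
    "r3": {"brand_a": -0.8, "brand_b": 0.8, "price_low": 1.0, "price_high": -1.0},
    "r4": {"brand_a": 0.0, "brand_b": 0.4, "price_low": 0.6, "price_high": -0.6},
}
# Hypothetical segment labels from the latent class run (modal assignment).
segment_of = {"r1": 1, "r2": 1, "r3": 2, "r4": 2}
# Two hypothetical products in the simulation scenario, defined by their levels.
products = {"A": ["brand_a", "price_high"], "B": ["brand_b", "price_low"]}

def respondent_shares(pw):
    """Logit share of preference for one respondent, using that respondent's own utilities."""
    utils = {name: sum(pw[level] for level in levels) for name, levels in products.items()}
    top = max(utils.values())
    exps = {name: math.exp(u - top) for name, u in utils.items()}
    total = sum(exps.values())
    return {name: e / total for name, e in exps.items()}

def shares_by_segment():
    """Average the individual-level shares within each latent class segment."""
    sums, counts = {}, {}
    for rid, pw in partworths.items():
        seg = segment_of[rid]
        sums.setdefault(seg, {name: 0.0 for name in products})
        counts[seg] = counts.get(seg, 0) + 1
        for name, share in respondent_shares(pw).items():
            sums[seg][name] += share
    return {seg: {name: s / counts[seg] for name, s in totals.items()}
            for seg, totals in sorted(sums.items())}

for seg, shares in shares_by_segment().items():
    print(seg, {name: round(s, 3) for name, s in shares.items()})
```

Segment 1 leans toward product A and segment 2 strongly toward B, even though every prediction came from individual-level utilities.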

Last slide for me: some practical suggestions for latent class segmentation. Don't field sparse CBC or MaxDiff studies and hope to get great segmentations. Make sure you don't skimp on the number of questions relative to the standard rules of thumb. [00:22:00] The standard rule of thumb for MaxDiff is to show each item at least three times to each respondent.

For CBC, a standard rule of thumb is to make sure that each attribute level appears at least six times for each respondent. Questionnaires that fall much below these levels make it very hard to find meaningful segments. The second practical suggestion is to clean the data before running latent class.
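The two rules of thumb translate into simple arithmetic on questionnaire length. This sketch assumes balanced designs, where every MaxDiff item and every level of a CBC attribute is shown equally often; the item counts and levels-per-attribute values are hypothetical.

```python
import math

def maxdiff_min_tasks(n_items, items_per_task, min_exposures=3):
    """Fewest MaxDiff sets so each item appears at least min_exposures times (balanced design)."""
    return math.ceil(min_exposures * n_items / items_per_task)

def cbc_min_tasks(levels_per_attribute, concepts_per_task, min_exposures=6):
    """Fewest CBC tasks so every level appears at least min_exposures times (level balance).
    The attribute with the most levels is the binding constraint."""
    return math.ceil(min_exposures * max(levels_per_attribute) / concepts_per_task)

def cbc_level_exposures(levels_per_attribute, concepts_per_task, n_tasks):
    """Expected times each level of each attribute is shown to one respondent."""
    return [n_tasks * concepts_per_task / n_levels for n_levels in levels_per_attribute]

print(maxdiff_min_tasks(20, 4))                 # 15 sets for 20 items shown 4 at a time
print(cbc_min_tasks([4, 3, 2, 4, 5], 3))        # 10 tasks with 3 concepts per task
print(cbc_level_exposures([4, 3, 2, 4, 5], 3, 8))
```

With only eight tasks, the five-level attribute's levels appear 4.8 times each on average, short of the six-exposure rule; ten tasks reach exactly six.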

If you include random responders (and unfortunately today we're often finding that 20 to 50% of respondents aren't doing a very engaged job), these random responders will often form their own group of meaningless responses, their own latent class. Your client might look at that and say:

Oh my gosh, I like the pattern of preferences in that group. Then you start talking about and thinking about how to target those [00:23:00] respondents, and in the end they were just random responders. I've heard of this actually happening, where they had to back up and say, oops, that group we fell in love with was actually the group of random responders.

They don't really exist. Well, they exist as random respondents, but the signal we're seeing is not going to be reproducible if we repeat the study. It's fiction. So clean the data before running latent class. Hierarchical Bayes provides an excellent way to clean MaxDiff data, and it can also do a creditable job with CBC.

Identify the respondents whose fit statistics are near the chance level. We have other white papers that talk about that, so ask us if you want to know how to clean the data according to HB's fit statistic, the RLH (root likelihood). Next, use multiple criteria to identify bad respondents. A low HB fit statistic is the one I just mentioned, but also consider speeders.
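RLH is the geometric mean of the predicted probabilities of the alternatives a respondent actually chose, so a purely random CBC respondent lands near 1 divided by the number of alternatives per task. The sketch below assumes those chosen-alternative probabilities are already available from the HB utilities, and uses a hypothetical 1.2x-chance cutoff purely for illustration; in practice the cutoff would be chosen more carefully, for example by comparing against simulated random responders. For MaxDiff the chance baseline differs because respondents pick both best and worst.

```python
import math

def rlh(chosen_probs):
    """Root likelihood: geometric mean of the predicted probabilities
    of the alternatives the respondent actually chose."""
    return math.exp(sum(math.log(p) for p in chosen_probs) / len(chosen_probs))

def flag_near_chance(respondent_probs, n_alternatives, margin=1.2):
    """Flag respondents whose RLH is not clearly above chance (1 / alternatives per task).
    The 1.2x margin is a hypothetical cutoff for illustration only."""
    cutoff = margin / n_alternatives
    return sorted(rid for rid, probs in respondent_probs.items() if rlh(probs) < cutoff)

# Hypothetical predicted probabilities of each chosen concept (3 concepts per CBC task).
respondent_probs = {
    "A": [0.80, 0.70, 0.90, 0.75],  # consistent chooser, RLH about 0.78
    "B": [0.30, 0.40, 0.35, 0.30],  # near chance (0.33), RLH about 0.335
}
print(flag_near_chance(respondent_probs, n_alternatives=3))  # ['B']
```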

[00:24:00] Some speeders are actually really good respondents. They just read fast and answer quickly. So just because a respondent appears to be speeding doesn't mean they're bad, but it could be one strike in a multiple-strike setup for identifying bad respondents. Look at straight-lining in grids, if you have any grids, or in other repeated questions where you could identify straight-liners.

Look at your standard consistency checks. Maybe you ask people their date of birth early in the questionnaire, and later you ask them their age; then you compare their date of birth to their stated age and see how consistent they are. There are other consistency checks you can do as well.

There are also hidden honeypot questions that only bots can see, et cetera. Looking at the quality of open ends, for example, can also help you figure out [00:25:00] whether a respondent is being conscientious or not. And a number of panel platforms implement other checks that try to figure out whether the data we're seeing seem to come from a survey farm or a bot. So there are lots of checks, and maybe you have six or seven strikes that a respondent can get. Maybe after a respondent gets two or three strikes, you say, hey, that's enough evidence that we think they are bad.
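A multiple-strike rule like this is easy to encode. In this sketch every field name, threshold, and the survey year are hypothetical placeholders rather than recommendations; the point is that a single strike (such as a fast but careful reader) does not remove anyone on its own.

```python
from dataclasses import dataclass

@dataclass
class Respondent:
    rid: str
    rlh: float               # HB fit statistic
    seconds: float           # total interview length
    straightlined: bool      # identical answers down a grid
    birth_year: int
    stated_age: int
    honeypot_answered: bool  # answered a question hidden from human respondents

# All thresholds below are hypothetical placeholders.
RLH_CUTOFF = 0.40
MIN_SECONDS = 180
SURVEY_YEAR = 2024
MAX_STRIKES = 2  # remove respondents with this many strikes or more

def strikes(r):
    """Count quality strikes; no single check removes a respondent on its own."""
    age_from_birth = SURVEY_YEAR - r.birth_year
    checks = [
        r.rlh < RLH_CUTOFF,                                        # near-chance choice fit
        r.seconds < MIN_SECONDS,                                   # speeder (may still be fine)
        r.straightlined,                                           # straight-lining a grid
        r.stated_age not in (age_from_birth, age_from_birth - 1),  # birth year vs. stated age
        r.honeypot_answered,                                       # likely a bot
    ]
    return sum(checks)

respondents = [
    Respondent("r1", 0.72, 600, False, 1985, 39, False),
    Respondent("r2", 0.68, 150, False, 1990, 33, False),  # fast reader: one strike only
    Respondent("r3", 0.34, 140, True, 1970, 25, False),   # four strikes
]
clean = [r.rid for r in respondents if strikes(r) < MAX_STRIKES]
print(clean)  # ['r1', 'r2']
```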

Just clean them out before running latent class. There, I've set the table for you. Brian McEwan, please take over; I've bored them enough. Show them the software, and show them what a great job we've done with Segment Finder in Discover.

Brian McEwan: Okay, let me go ahead and start sharing my screen here.

And we're going to do something a little risky: we're [00:26:00] going to try to segment live. It might blow up in our face, but we'll see how it goes. I've got a different survey with a solution that's guaranteed to come out well as a backup, but let's see how it goes. I've got a very simple survey programmed up.

It's got just two questions at the beginning about your current phone and how old it is, a little intro about screen size, and then a conjoint exercise trading off the brand of the phone, the screen size, whether or not it has an AI assistant, the amount of storage, and price levels.

So I've got this survey published. We're going to take a minute and share it with everyone. If you want to scan the QR code on the screen, you [00:27:00] can take it on your phone, or you can type in sawtooth.com/phones to go to the survey.

Justin Luster: I did put it in the chat, so you should just be able to click on it in the chat if you can find the chat.

Brian McEwan: Awesome, thanks. So we're going to pause just for a minute. I'll come back in 60 seconds to talk a little bit more about this and give everyone an opportunity to generate some data.

And we'll see how it goes. Now I'm going to put on my product manager hat very briefly and tell you how easy we've made it to program these types of exercises. Discover is our online platform. If you're a user of our desktop software, Lighthouse Studio, Sawtooth has had tools for running [00:28:00] this type of segmentation work for a lot of years.

This is something we just recently added into Discover, where you can do skip logic, quotas, and all that sort of stuff, and then add in our expertise in conjoint and MaxDiff. If you haven't tried Discover, give it a shot. I've actually got a preview mode going here where I have access to nested attributes, or alternative-specific designs.

I've also got the option to merge attributes visually. And I've got a second survey up showing some nice enhancements coming to CBC exercises: we're going to run design tests, give you recommended sample sizes, and generally make sure you're in a good spot or alert you to any warnings down the pipeline.

All these features [00:29:00] should be coming out within the next few weeks, and we're very, very excited about them. Let me check the data really quick. Oh, we've got quite a lot of completes. That's great. So what I'm going to do is start the analysis here.

It's going to take a minute or two to run, because we have to run both our hierarchical Bayes individual-level utility models and then Segment Finder to look for these existing groups of people. So I'm going to come in here and start this off.

That's going to run up in the cloud. While we wait, I'm going to switch over to a MaxDiff study so you can also see the results there. This was an anchored MaxDiff, a message test for a new cola concept, a new [00:30:00] soft drink. So we had a lot of different messages here; it was mountain themed, and it was zero calories.

So there are a lot of different options here. In the MaxDiff, everyone would see a couple of messages at a time and choose the one they liked most and the one they liked least. Segment Finder is in addition to our typical analysis. Since this was an anchored MaxDiff, we have our anchor point, and then we estimate scores for all of our items.

So nothing changes there; that's an independent analysis. With Segment Finder, we ran a couple of different segmentation solutions. You can see them as tabs up here: my two-, three-, and four-segment solutions. Under Details is where you see that fit statistic, the Bayesian information criterion.

Remember what Bryan said: this is kind of an [00:31:00] inverted graph, because lower numbers are better. So we decided to flip it in the visualization so that lower numbers go up. And you can see here that, statistically, according to this fit statistic, we're not seeing a lot of improvement as we increase the number of segments.

This also passes the sniff test. If I go to my two-segment solution, an interesting segmentation comes out of it. Up here we have the greatest differentiators. This takes all of the scores; we generate scores for both of the groups.

It looks at where the biggest differences are and highlights a couple of the top ones. That's really helpful. It makes it really clear that segment one really likes the messages about the no-calorie aspect of the soda, [00:32:00] whereas segment two appreciates the adventurous messages, like "A taste adventure in every bottle."
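The talk doesn't spell out how Segment Finder ranks its "greatest differentiators," so the following is only a plausible sketch of the idea for two segments: compute each item's score gap between segments and sort by the size of that gap. The item names and scores are hypothetical; with three or more segments one might compare each segment against the others instead.

```python
def greatest_differentiators(scores_a, scores_b, top_n=3):
    """Items whose scores differ most between two segments, largest absolute gap first.
    Positive gaps favour segment A, negative gaps favour segment B."""
    gaps = {item: scores_a[item] - scores_b[item] for item in scores_a}
    return sorted(gaps.items(), key=lambda kv: abs(kv[1]), reverse=True)[:top_n]

# Hypothetical MaxDiff scores (for example, rescaled to 0-100) for two segments.
segment_1 = {"Message A": 80, "Message B": 20, "Message C": 50, "Message D": 45}
segment_2 = {"Message A": 30, "Message B": 70, "Message C": 48, "Message D": 60}
print(greatest_differentiators(segment_1, segment_2))
# [('Message A', 50), ('Message B', -50), ('Message D', -15)]
```

Message C, which both segments score about the same, drops out of the list, which mirrors how the tool surfaces only the messages that actually separate the groups.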

And "taste the thrill, share the chill"; they like that one too. You can see that reflected in a lot of the other adventurous ones as well, like "bold meets refreshing" and "unleash the flavor"; they've got higher scores coming out of here. "Satisfy those cravings," "boldness in every bubble," "the cola as adventurous as you are": those are all generally doing better, consistently above the anchor and being chosen as best more often by segment two. In segment one, by contrast, a lot of these messages are right around the anchor threshold of whether this is a good message or not. But messages like "zero calories" and "only made with natural ingredients" do well with this group, and "satisfy those cravings" also does [00:33:00] fairly well with them.
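One plausible way to compute a "greatest differentiators" list like the one in the demo: take each segment's mean item scores, compute the difference for every item, and rank by absolute difference. This is an illustrative sketch, not Segment Finder's documented algorithm; the scores are hypothetical and the message texts are borrowed from the demo.

```python
def greatest_differentiators(seg_a: dict, seg_b: dict, top_k: int = 3):
    """Rank items by the absolute difference between two segments' mean scores."""
    diffs = {item: seg_a[item] - seg_b[item] for item in seg_a}
    ranked = sorted(diffs, key=lambda item: abs(diffs[item]), reverse=True)
    return [(item, diffs[item]) for item in ranked[:top_k]]

# Hypothetical mean scores per segment (item texts borrowed from the demo).
segment_1 = {
    "Zero calories, 100% flavor": 2.1,
    "A taste adventure in every bottle": -0.8,
    "Only made with natural ingredients": 1.4,
    "Unleash the flavor": -0.2,
}
segment_2 = {
    "Zero calories, 100% flavor": -0.5,
    "A taste adventure in every bottle": 1.9,
    "Only made with natural ingredients": 0.1,
    "Unleash the flavor": 0.9,
}

for item, diff in greatest_differentiators(segment_1, segment_2, top_k=2):
    # Positive: segment 1 scores the item higher; negative: segment 2 does.
    print(f"{item}: {diff:+.2f}")
```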

So, a pretty interesting segmentation solution. And of course, these methods are very accommodating: like Brian said, if we ask them to keep going, they will keep finding segments and do their best job at it. So there is an element of evaluation here; you can't just look at the fit statistic and take it at face value.

So let's think about what happens with our three-segment solution. You can see segment three is still that low-calorie-favoring group. For segment one and segment two, however, some of the greatest differentiators aren't that big a difference, right?

You do have a difference on "embrace the bold flavor of mountains in every sip," but "a taste adventure in every bottle" is pretty much the same in both. So this is the art part of the [00:34:00] science: this solution is not quite as interesting or as explainable in terms of why all these segments exist, which is the opposite of the two-segment solution, where it was very clear what was going on.

I think that is also reflected in the fit statistic. As we create more groups, we're essentially splitting the two-group solution, so we're not really improving statistically. And as we look at the four-segment solution, we're again seeing the same thing: segment three is the low-calorie segment, and segment one just has a stronger preference for a lot of the adventurous, mountain-themed options. So there are some differences, but it's hard to tell an interesting story as we move on to the three- and four-group solutions. This is the interesting part of segmentation: it's not clear-cut. You're always going to get an [00:35:00] answer.

They're not necessarily going to be the most useful answers. But this was a very interesting grouping. Because we ran our HB utilities, we can still do the TURF analysis (total unduplicated reach and frequency). Interestingly, if we do that and ask, for example, which two messages are the top for as many people as possible who took my survey, the answer is "zero calories, 100% flavor" and "satisfy those cravings."
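A brute-force TURF sketch over individual-level scores. The definition of "reached" here is an assumption: a respondent counts as reached if at least one message in the bundle scores above the anchor, taken as 0 on an anchored MaxDiff scale. Other reach rules (such as first choice) are also common. All scores are hypothetical.

```python
from itertools import combinations

def reach(bundle, respondents, threshold=0.0):
    """Count respondents with at least one bundle item scored above the threshold (assumed anchor = 0)."""
    return sum(1 for scores in respondents if any(scores[item] > threshold for item in bundle))

def best_bundle(items, respondents, size=2, threshold=0.0):
    """Exhaustively search all bundles of the given size; return the one with the highest reach."""
    return max(combinations(items, size), key=lambda b: reach(b, respondents, threshold))

ZERO = "Zero calories, 100% flavor"
CRAVE = "Satisfy those cravings"
THRILL = "Taste the thrill, share the chill"

# Hypothetical individual-level scores, expressed relative to the anchor at 0.
respondents = [
    {ZERO: 1.2, CRAVE: 0.3, THRILL: -0.5},
    {ZERO: -0.4, CRAVE: 0.8, THRILL: 1.1},
    {ZERO: 0.9, CRAVE: -0.2, THRILL: -1.0},
    {ZERO: -0.7, CRAVE: 0.5, THRILL: -0.3},
]

bundle = best_bundle([ZERO, CRAVE, THRILL], respondents, size=2)
print(bundle, reach(bundle, respondents))
```

Exhaustive search is fine for a handful of messages, but the number of bundles grows combinatorially, so larger studies typically need a greedy or otherwise smarter search.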

Remember, "satisfy those cravings" also tested well in the adventurous group. So again, it passes the sniff test: we're seeing similar things in an ideal bundle of messages. What's also cool is that these segments are just variables in my survey now, so I ran some crosstabs, and again there's a very interesting outcome if we take [00:36:00] that two-segment solution and look at the gender question we had in the survey. The gender split is about the same in both groups; the segments don't line up with a lot of these demographics. If we look at our age question, we've got a little bit going on: segment two has a few more people in the 26 to 35 and 36 to 55 groups.

The people who were more adventurous tended to drink Coca-Cola more, whereas the low-calorie segment one tended to like Coke Zero and Diet Coke a little more. So that tracks a little, and it's an interesting thing to check: do these needs-based segments correlate with any of our other survey questions? Sometimes they will, sometimes they won't; it's just something you have to investigate. This question [00:37:00] asked how often people drink soda, and again there aren't very big differences. Our low-calorie and adventurous segments don't seem to line up with it.
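Once each respondent has a segment assignment, a crosstab is just a count of (segment, answer) pairs, expressed as percentages within each segment. A minimal sketch with hypothetical assignments and answers:

```python
from collections import Counter

def crosstab(segments, answers):
    """Column percentages: the share of each segment giving each answer."""
    counts = Counter(zip(segments, answers))
    segment_sizes = Counter(segments)
    return {(seg, ans): n / segment_sizes[seg] for (seg, ans), n in counts.items()}

# Hypothetical segment assignments and answers to a "which brand do you drink most" question.
segments = [1, 1, 1, 1, 2, 2, 2, 2]
brands = ["Coke Zero", "Diet Coke", "Coca-Cola", "Coke Zero",
          "Coca-Cola", "Coca-Cola", "Coke Zero", "Coca-Cola"]

table = crosstab(segments, brands)
for (seg, brand), share in sorted(table.items()):
    print(f"Segment {seg} | {brand}: {share:.0%}")
```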

So, going back to what Brian was saying, if we only look at averages, we can miss these insights that are latent, hidden in our dataset. So that's pretty interesting.

Okay, let's head back here.

And we'll go to our analysis area.

And so overall, this is kind of what I was hoping for: we've got some strong preferences around Apple, with the Chinese brand not doing quite as well, and people tending to prefer larger screens. So, interesting, but nothing crazy going on here. It looks like we had a couple of extra people take the survey [00:38:00] and finish.

So I think we'll just leave it as is. Some of you aren't represented here, and that is okay. So let's take a look; this is live, so we'll see what's going on. We've got our two-segment solution, and interestingly enough, this is what I was hoping for: segment one is people who really prefer Apple and also have a much higher tendency to choose "none of these," which tracks with the survey.

Sometimes you didn't see an Apple phone on the screen, whereas the Android people had a lot more options, so their None utility is lower than the Apple people's.

Justin Luster: Brian, I've got to say that's me, right? I chose Apple every time, but if Apple wasn't there, or if it was too expensive, I would say none.

I would never do something that wasn't Apple. That was me personally.

Brian McEwan: Yep. That's what I was going for; I was hoping for a strong segmentation [00:39:00] solution here. So that's pretty great. If we look at all the utilities, it doesn't look like we have a ton else going on. Android users tended to choose the lower prices a bit more, which again makes sense given the competitive landscape.

Let's see if there's anything in the three-segment solution. We've still got our Apple people here, split out, and these numbers have gotten a little stronger, so we must have moved some of those Apple people out. And it looks like we now have a very strongly price-driven segment in segment one, which still doesn't feel as negatively about Apple but is primarily driven by that low price point. Let's see if there's anything else [00:40:00] popping out here. It looks like we've got a bit of a preference there too: group one also tended to prefer not having an AI assistant on the phone, which is kind of interesting.

So, experiment successful. Thanks, everyone, for that data. Hopefully you can see the benefit of putting people into groups, especially with these methods that help you hunt for groups of people that don't necessarily correlate with traditional demographics.

In this example, we might have been able to come up with some questions that would correlate really highly, right? But it's really nice that if you're already running a conjoint or a MaxDiff for some other purpose, you can very easily run this type of segmentation [00:41:00] analysis and then see if anything interesting comes out.

Bryan Orme: Brian, could we see the BIC statistic on our data here?

Brian McEwan: Oh, yeah, let's do that. I didn't run a huge number of groups because I didn't want it to take super long in the background, but we do get a bump as we go from one segment (no segmentation) up to two, and then a small increase as we go up to three segments.

Bryan Orme: Yeah. And I'll just mention that a one-segment latent class solution is aggregate logit (MNL).

Brian McEwan: Yep. That represents the set of utilities we'd get if we smooshed all the answers together and treated them as if one person had answered all of the choice tasks.
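To make the "one person who answered all of the choice tasks" idea concrete: aggregate logit fits a single utility vector by maximizing the pooled multinomial logit log-likelihood over every task from every respondent. The sketch below only evaluates that objective (it does not run the optimizer), and the task encoding (attribute vectors per alternative plus the index of the chosen one) is an assumption for illustration.

```python
import math

def mnl_probabilities(utilities):
    """Logit choice probabilities, shifted by the max utility for numerical stability."""
    m = max(utilities)
    exps = [math.exp(u - m) for u in utilities]
    total = sum(exps)
    return [e / total for e in exps]

def pooled_log_likelihood(beta, tasks):
    """Log-likelihood of ONE utility vector over tasks pooled from all respondents.

    Assumed encoding: each task is (alternatives, chosen_index), where each
    alternative is a list of attribute codes aligned with beta.
    """
    ll = 0.0
    for alternatives, chosen in tasks:
        v = [sum(b * x for b, x in zip(beta, alt)) for alt in alternatives]
        ll += math.log(mnl_probabilities(v)[chosen])
    return ll

# Hypothetical tasks from two respondents, simply concatenated ("smooshed together").
respondent_a = [([[1, 0], [0, 1], [0, 0]], 0), ([[0, 1], [1, 0], [0, 0]], 1)]
respondent_b = [([[1, 0], [0, 1], [0, 0]], 1), ([[0, 1], [1, 0], [0, 0]], 2)]
pooled = respondent_a + respondent_b

print(pooled_log_likelihood([0.0, 0.0], pooled))  # all-zero utilities: 4 * ln(1/3)
print(pooled_log_likelihood([0.5, 0.2], pooled))
```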

That's what those utilities represent. For people who are a little more technical, you can actually produce utilities from latent [00:42:00] class by itself. Brian mentioned that you can see each respondent's likelihood (probability) of group membership. So if I were 100% group one, we could create a set of utilities for me that look exactly like group one's.
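The probability of group membership follows from Bayes' rule: a segment's posterior weight for a respondent is proportional to the segment's size times the likelihood of that respondent's observed choices under the segment's utilities. A minimal sketch, assuming logit choice probabilities and hypothetical utilities for two segments:

```python
import math

def choice_likelihood(utils, tasks):
    """Probability of a respondent's observed choices under one segment's utilities (logit rule)."""
    likelihood = 1.0
    for shown, chosen in tasks:
        denom = sum(math.exp(utils[alt]) for alt in shown)
        likelihood *= math.exp(utils[chosen]) / denom
    return likelihood

def membership_probabilities(segment_utils, segment_sizes, tasks):
    """Bayes' rule: posterior is proportional to segment size times choice likelihood."""
    joint = [size * choice_likelihood(u, tasks) for u, size in zip(segment_utils, segment_sizes)]
    total = sum(joint)
    return [j / total for j in joint]

# Hypothetical two-segment solution: an "Apple or nothing" segment and an Android-leaning one.
segment_utils = [
    {"Apple": 2.0, "Samsung": 0.0, "None": 1.0},
    {"Apple": 0.0, "Samsung": 1.0, "None": -1.0},
]
segment_sizes = [0.5, 0.5]

# One respondent: picked Apple when shown, and chose None when Apple was absent.
tasks = [(["Apple", "Samsung", "None"], "Apple"), (["Samsung", "None"], "None")]

print(membership_probabilities(segment_utils, segment_sizes, tasks))
```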

If I were 60/40 group one and group two, we could create a set of utilities for that too. However, as Brian mentioned before, we generally think HB gives the better set of utilities to use. So if I came into the simulator and started running some choice predictions, say an Apple phone versus a Samsung phone, it would use the HB utilities.
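For the 60/40 case, one natural construction is a membership-weighted average of the segment utility vectors; the transcript does not spell out the exact rule, so treat this as a sketch under that assumption, with hypothetical attribute labels and values.

```python
def blended_utilities(segment_utils, membership):
    """Membership-weighted average of segment utility vectors (assumed blending rule)."""
    if abs(sum(membership) - 1.0) > 1e-9:
        raise ValueError("membership probabilities must sum to 1")
    keys = segment_utils[0].keys()
    return {k: sum(p * u[k] for p, u in zip(membership, segment_utils)) for k in keys}

# Hypothetical segment utilities and a respondent who is 60% group one, 40% group two.
group_one = {"Apple": 2.0, "Low price": 0.5}
group_two = {"Apple": -1.0, "Low price": 1.5}

me = blended_utilities([group_one, group_two], [0.6, 0.4])
print(me)  # Apple about 0.8, Low price about 0.9
```

With membership [1.0, 0.0] the function returns group one's utilities unchanged, which matches the 100% case described above.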

There are no settings to choose. So when we're looking at the averages, that's using the HB utilities, for example if we wanted to segment by the type of phone people have.

[00:43:00] Yep. So everything passes the sniff test: people who have iPhones really like iPhones, and people who have iPhones are less price sensitive. That's all tracking. So this all uses the HB utilities, and that's actually the same for our Segment Finder reports.
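The simulator step that uses HB utilities can be sketched as a share-of-preference (logit) simulation: each respondent's product utilities are summed from their part-worths, converted to shares with the logit rule, and then averaged across respondents. Segmenting the averages (for example, by current phone type) amounts to running the same function on a subset of respondents. The part-worths, level names, and the choice of the logit share rule are assumptions for illustration; a simulator may offer other rules, such as first choice.

```python
import math

def product_utility(partworths, levels):
    """Total utility of a product: the sum of the respondent's part-worths for its levels."""
    return sum(partworths[level] for level in levels)

def share_of_preference(respondents, products):
    """Average of each respondent's logit shares across the products."""
    totals = [0.0] * len(products)
    for partworths in respondents:
        v = [product_utility(partworths, p) for p in products]
        m = max(v)
        exps = [math.exp(x - m) for x in v]
        denom = sum(exps)
        for i, e in enumerate(exps):
            totals[i] += e / denom
    return [t / len(respondents) for t in totals]

# Hypothetical HB part-worths for two respondents.
respondents = [
    {"Apple": 2.0, "Samsung": 0.0, "$799": 0.0, "$999": -0.5},
    {"Apple": 0.0, "Samsung": 1.0, "$799": 0.5, "$999": -1.0},
]
products = [["Apple", "$999"], ["Samsung", "$799"]]

print(share_of_preference(respondents, products))
```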

We are creating the group assignment for people, but all of this uses the HB utilities to produce the charts, graphs, simulations, TURF analysis, MaxDiff scores, and all that.

Bryan Orme: Brian, can you address where I see how respondents are assigned to segment one or segment two? Respondent by respondent, how do I know what they're assigned to?

Brian McEwan: If you want to get those respondent-by-respondent segments, you can download everything into an Excel spreadsheet. If we go to the Records area and click Download Data, this gives [00:44:00] us one row per respondent, and the columns are the raw choices and the survey answers, followed by the exported segmentation data and the utilities.

All that good stuff comes out in an Excel file, or an SPSS file if you want to take it and run it somewhere else.

Okay. And with that, I will turn it back over to Justin.

Justin Luster: Okay. Thanks, guys. With that, thanks, everybody. Great job, and we will see you at the next webinar. See ya. Bye.