Leveraging ChatGPT as a Research Data Co-pilot: A Deep Dive

Podcast

In this insightful episode, Ray Poynter, founder of NewMR, explores using ChatGPT as a co-pilot for research data analysis.

Ray covers study design, qualitative and quantitative analyses, data cleaning, driver analysis, and the use of ChatGPT for generating scripts and visualizations.

Ray also highlights the benefits and limitations of ChatGPT, emphasizing its role as a starting point and second pair of eyes for research projects.

Listen in to learn how AI can enhance your research and insights efforts.


Follow us on your preferred platform


About Our Guest(s)

Ray Poynter has spent the last 45 years at the intersection of insights, research, and new thinking. He has held director-level positions with companies such as The Research Business, IntelliQuest, Millward Brown, and Vision Critical.

Ray is committed to the research and insights industry; he has been a member of ESOMAR for over 30 years and is a fellow of the MRS.

In recent years Ray’s work has focused on training, writing, speaking and sharing. Ray has run training workshops for a variety of national and international organizations, including RANZ, TRS, JMRA, MRS and ESOMAR. He has written textbooks, taught at Saitama and Nottingham Universities, regularly blogs, and is active on social media.

Ray Poynter


Transcript

Automatically transcribed

Vanessa: Quick disclaimer before we start: this episode was originally recorded as a webinar and edited for podcast format. You can find the original webinar recording on our website at sawtooth.com or on our YouTube channel.

Now back to the show.

Justin Luster: Welcome to our latest webinar at Sawtooth Software, entitled "Using ChatGPT as a Co-pilot to Explore Research Data." We're joined by our guest speaker, Ray Poynter, founder of NewMR. Ray has spent the last 45 years at the intersection of insights, research, and new thinking. He has held director-level positions with companies such as The Research Business, IntelliQuest, Millward Brown, and Vision Critical.

Ray is committed to the research and insights industry, having been a member of ESOMAR for over 30 years, and is a fellow of the MRS; he's actually president of ESOMAR. In recent years, Ray's work has focused on training, writing, speaking, and sharing. Ray has run training workshops for a variety of national and international organizations, including RANZ, TRS, JMRA,

MRS, and ESOMAR. He has written textbooks, he's taught at Saitama and Nottingham Universities, he's a regular blogger, and he is active on social media. So with that, Ray, we're excited to hear from you.

Ray Poynter: Thanks, Justin, and hello everyone. We're going to be exploring some of the things you can do with ChatGPT. So here's the agenda. I'm going to start with a caveat; I think that's really important in this fast-moving area. Then I'll look at how you might design a study, then a little bit at qualitative information, but the big dive is going to be into the area of quantitative analysis.

So I'm using ChatGPT-4o, the Team version. The Team version is quite a good option. It's slightly more expensive than the standard Plus, but it defaults to not storing or learning from my data, and that's pretty important. You have to have at least two licenses to be a team.

The next level above Team is Enterprise, and that's got even more wonderful features. But you need 150 licenses, which gets pretty expensive, so you've got to find a lot of friends. The co-pilot I'm going to be talking about is like this: imagine you've got a learner, a new intern, a new employee. They're quite bright, they're quite knowledgeable, but they've got gaps in their knowledge and they make mistakes.

So they can help you get more work done, but you need to be careful. Use it to start projects. Use it to give you something to amend, to correct, and to refine. It's great as a second pair of eyes to check what you've done, and it's useful for looking things up. So imagine we're designing a study. I type in a prompt to ChatGPT:

I want to conduct a B2B market research study on behalf of my client who is thinking of launching a product to compete with Salesforce. The client wants to run a conjoint choice model study to understand the key drivers in procurement of customer relationship management software. Please describe the key attributes and levels to include in this study.

And what happens then is you go into an interactive process. It comes up with 10 attributes. I think to myself, no, I don't want 10, I want 7, so: reduce the attributes to 7. Now we have a look and see what they are. If this was a full example, I would be tweaking several of these; I'd be asking why it generated them, and we'd end up with a finished design.

I then want to ask it things like how many tasks: how many options to include in each task, how many tasks to show each participant. It comes back with some ideas, and what you'll see on the screens I'm showing you is that I've highlighted occasional things in red. I'm not expecting you to read the whole screen at the speed I'm talking, but the things that pop up in red are worth having a look at.

So, the number of options per task: it's suggesting three to four options per task. I tend to go with four, but that's a reasonable suggestion to start with. Number of tasks per participant: eight to twelve tasks, very much in line with most modern thinking at the moment. Quite acceptable. Then I ask it, should I include 'none of these'? Because I'm relatively knowledgeable,

I'm trying to develop my assistant to come along with me. It comes up with a whole range of reasons why none of these might be a good option. And then I ask it, how many people should I interview? It comes back with a rule of thumb, at least 300 respondents. It tells us that it's a lot to do with the complexity of the study.

It's to do with the number of choice sets you've got. But it makes this nice point at the bottom that if my client is going to want to segment the data, then the sample size will have to be larger, and that's a useful point to be making. What software could I use to administer this research? And I'm sure Justin and all of his colleagues, Bryan and so on, will be delighted to see the first one on the list

that it came up with was Sawtooth. It came up with Qualtrics. It said, well, if you've got a simple conjoint study, you could use SurveyMonkey. And it came up with a couple of other suggestions; Conjointly, if I remember correctly, was one of them. So I now move a little bit further forward: I intend to conduct the research in the USA with decision makers in medium to large corporations using Sawtooth Software.

Please design a questionnaire I can use. So it starts off with a nice welcome to the study, then company size 100 to 499. It's taken me probably a little bit too literally on "medium to large size", because a medium-sized company can be fewer than 100 people. But actually, I need to make sure that I deal with it if one gets recruited.

Are you a decision maker? We move into an interesting thing. It starts off by saying it wants simply to do an old-fashioned rating of importance on these seven attributes. And then it tells us that we're going to be presented with several sets of CRM software. Each one of these has got a "none of these" option, so we can see the attribute levels, the attributes, the choices, the tasks; it's going to repeat this for 10 combinations.

It's going to ask a few demographics. Now, what happens when I really use ChatGPT is I'm spending a lot of time adding two or three demographics and changing them, but it's quite nice to get that sort of framework coming through. It then tells me how I'd implement that in Sawtooth Software: we're going to define the attributes and the levels using the Sawtooth interface, create choice sets using the automatic choice set generator, set up screening questions, and deploy the survey, all those sorts of parts and processes.

So when we're designing research, we're going to treat ChatGPT as a source of ideas, as a starting point. We're going to refine the design with our own knowledge. We're going to ask it why it's picked things, ask it if there could be drawbacks of that option, ask it if there are alternatives. And a really nice question, I want to say, is: is there a custom-and-practice rule about this?

Are there rules of thumb that can guide me to where I need to be in this area? It won't write a study for you, but it will help you design one. The next topic I want to talk about is that, before we get into analysis, we always need to check and clean the data, especially if it's come from online panels; there are immense problems with data quality with most online panels.

So we can ask ChatGPT to do this. Now, there are some great tools using APIs, but we can do some basic stuff with the web-based interface, checking for things like duplicates, missing data, etc. The way you upload a file into ChatGPT, if you've not done it, is you use the little paper clip.

Click on the paper clip and you can upload it from Google Drive or Microsoft OneDrive, or, more likely, from your computer. So I created some data. It's really got quite a lot of bad data in it, considering there are only six participants. So I've got this dummy file. You can see that ID 3 appears twice.

We can see that we've got some missing data for ID 2. And at the bottom, we have something that looks like a bad piece of data entry: the NPS rating was a 3, and "How was your visit?" was answered "Monday". Above that, we've got some personally identifiable data from bogus@test.com. So: are there any duplicates?

Yes, there's a duplicate entry in the data. Here's the duplicate row. Would you like me to remove it? Yes, please remove the duplicate. Are there any cells with missing data? It says, yes, there's a cell with missing data. Would you like me to handle it in any specific way, for example, filling it with a specific value or putting in the mean of the column or some other option?

No, I'm happy to leave that as a missing value. I want to remove any responses which might be from bogus participants or participants who are not concentrating on giving valid answers. Are there any values which suggest a participant is not giving valid and useful information? And it suggested two cases.

Well, the first one of those, ID 1, has given me an NPS of 3. How was your visit? Okay. I'm happy with that. It thinks it's possibly not good. I disagree. So I'm going to tell it to leave that one. Then it comes up with this 5th ID, NPS rating 3. How was your visit? Monday. Yeah, that's somebody not paying attention, or it's a bot, or it's just somebody trying to earn the incentive.

So I'm going to delete that one: remove ID 5. Okay, the final check I'm going to run is to check each record for any PII, personally identifiable information. It goes through and tells me, yes, there is one entry: "Email me bogus@test.com". So what do I want? What am I supposed to do about that?

I tell it to replace it with John/Jane Doe but leave the rest of the field as it is. Then I'm going to export the file as a CSV. I could export it as Excel; I can export it in a variety of formats. Click on there and I get the downloaded file. That generates cleaned-up data like this.
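The same checks Ray asks ChatGPT to run can be sketched in a few lines of pandas. This is an illustrative sketch only: the column names and toy values are made up to mirror the dummy file described in the talk, not taken from the actual dataset.

```python
import pandas as pd

# Toy data mirroring the example: a duplicated ID, a missing cell,
# an email address in a free-text field, and a nonsense answer ("Monday").
df = pd.DataFrame({
    "id":    [1, 2, 3, 3, 4, 5],
    "nps":   [3, None, 9, 9, 8, 3],
    "visit": ["Okay", "Good", "Great", "Great",
              "Email me bogus@test.com", "Monday"],
})

# 1. Duplicates: drop the repeated row (ID 3 appears twice).
df = df.drop_duplicates()

# 2. Missing data: flag it, but leave it as NaN rather than imputing.
missing = df[df["nps"].isna()]

# 3. PII: mask anything that looks like an email address.
df["visit"] = df["visit"].str.replace(
    r"\S+@\S+", "John/Jane Doe", regex=True)

# 4. Inattentive answers: remove the record flagged on inspection (ID 5).
df = df[df["id"] != 5]

# Export the cleaned file, as in the walkthrough.
df.to_csv("cleaned.csv", index=False)
```

As in the talk, a script like this should be one of the first cleaning passes, not the only one.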

So we're going to use things like this frequently to help us clean the data. It shouldn't be the only thing you do to clean the data, but it can be one of the first things you do, to reduce the other steps you need. We can also use it to help us write code and as a reference. For example, if I want to know how to find the correlation between two sets of data in Excel, I ask it, and it shows me that you can use the CORREL function or the Data Analysis ToolPak.
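For readers who want to see what CORREL is actually computing, here is the Pearson correlation written out from first principles; the function name and sample data are illustrative.

```python
from math import sqrt

def correl(xs, ys):
    """Pearson correlation, equivalent to Excel's CORREL(range1, range2)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    # Covariance term over the product of the two standard deviations
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Perfectly linear data gives r = 1.0
print(correl([1, 2, 3, 4], [2, 4, 6, 8]))  # → 1.0
```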

We can also ask it how to write code. So here I'm looking at Google Apps Script: how can I write a program for Google Sheets to take a column of text, read it in, auto-translate it into English, and paste it into the adjacent column? It tells me how to do that, and it also gives me the code. I can click on "copy code".

I can paste that in as a Google Apps Script and run it. So I'm just going to show you an example of what happened when I did that. I took the script suggested by ChatGPT, copied it into my script editor, linked it to this worksheet, and put in some text here in a range of languages.

So if I select this text, I can then click on the run button, which I've linked to the script that ChatGPT suggested. I get a message that it's running, and we see that it's translated "Salut le monde", "Hallo Welt", "Salve Mundi", etc. into "Hello World". Now, just before moving on: I could have done that in VBA for Excel if I were using a Microsoft operating system.

If I'm using a Mac, it's actually a little bit flaky; it struggles to write VBA for Excel on a Mac. So not everything is equally good. Now let's think about qualitative information: open-ended comments, social media comments, online discussions, depth interviews, focus groups, et cetera, even images.

So this is from a NewMR study that we did earlier this year, which was looking at how people are using artificial intelligence. I uploaded the data and asked it to show me all the questions in the file, and I'm going to pick this one at the bottom, which is an open-ended question: looking forward two years, what impact do you think AI will have on research and insights?

So, I ask it to conduct a cluster analysis of this open-ended text in question number 14, and then to describe the clusters. I could also have said "show me the themes", but I chose to ask it to create clusters. The first cluster was about cautious optimism, and it describes what that was like. Then there are people who are talking about the integration challenges over the next two years.

There are people who talk mostly about efficiency and automation, and there are people who talk about significant impacts, so we can see that there. We can then pull some information about those groups. So: show me a table showing how often each cluster is used in this file. It comes back and shows me that cautious optimism was the most frequent category, followed by those people who were talking about integration challenges.

The significant impact group, by the way, were not overly interesting. When asked how they think AI is going to impact our industry, they tend to say things like "in a big way", which of course is true, but it's not necessarily helpful for diagnosing which way to take things. Create a table with the comments in column 1.

These are the raw comments that I had uploaded. In column 2, put the name of the cluster each one is allocated to. And then, in column 3, put a sort of sentiment analysis; but what I want specifically is: does this person think that the world in two years is going to be better, worse, or is it unclear?

And so we can get that sort of coding. We can then export that as an Excel file, because we might want to add it to some tables data we've got, or we might be looking at open ends. We can download this and bring it into somewhere else. And this is what that data would look like if we downloaded it and loaded it into Excel or whatever.
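Under the hood, clustering open-ended text usually means vectorizing the comments and grouping the vectors. This is a minimal, self-contained sketch of that idea (hand-rolled k-means over bag-of-words counts); the comments are invented to echo the themes in the talk, and real pipelines would use richer embeddings.

```python
import numpy as np
from collections import Counter

# Hypothetical open-ended answers echoing the clusters Ray describes.
comments = [
    "Cautiously optimistic, but quality checks will still be needed",
    "Optimistic overall, though we must stay cautious about errors",
    "Integrating AI into our existing workflows will be a challenge",
    "The hard part is integration with legacy research processes",
    "Huge efficiency gains from automating coding and tabulation",
    "Automation will make analysis much faster and more efficient",
]

# Bag-of-words vectors over the vocabulary of all comments.
vocab = sorted({w for c in comments for w in c.lower().split()})
X = np.array([[c.lower().split().count(w) for w in vocab]
              for c in comments], dtype=float)

def kmeans(X, k, iters=20, seed=0):
    """Minimal k-means: pick random centroids, then assign/update."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        # Assign each comment to its nearest centroid.
        d = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        # Move each centroid to the mean of its members.
        for j in range(k):
            if (labels == j).any():
                centroids[j] = X[labels == j].mean(axis=0)
    return labels

labels = kmeans(X, k=3)
print(Counter(labels.tolist()))  # cluster sizes, as in the frequency table
```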

So let's move into quant, starting with simple quant. This is another NewMR study, from about three years ago. We've got 527 people from all around the world. So I can ask it to do things like this, almost like a tables package: create a table with "Were you involved in creating that report or in receiving it?" as the columns and "How would you describe the quality of that report?" as the rows.

And it comes back with a table. It's got counts in there, and the columns are in alphabetical order across the top, which isn't particularly useful for me. So: add a total row and a total column, move "Other" to the last of the three columns, and reorder the rows so that "very poor" is at the top and "excellent" is at the bottom.

And it generates this table here, still counts. In my tables, do not show decimal places. Change the values in the cells to be column percentages, but leave the totals as counts. For the percentages, display a percentage symbol. And now I've got something much more conventional-looking. We've got the bases, effectively, at the bottom.

We've got the column percentages. We can see what's going on and can start to understand this information. Now I want to add a row that adds the top two boxes together, i.e. "very good" and "excellent", and label the row "top two boxes". So it's added this row, and now we see quite an interesting difference between the two sets.

69 percent of the people creating the reports rated them in the top two boxes, while only 36 percent of the people receiving market research reports did. We can then ask it whether or not there are significant differences between the views of those creating and receiving reports.

Now, one of the interesting things when you start using ChatGPT to do this is that it normally uses the right tests. So instead of using some version of the t-test or the z-test, which you will often see in market research, it looks at it and says: right, that's a chi-squared test, which is what we all did when we were at university, but not what we always do commercially.

And it runs the chi-squared test, and it says yes, there is a significant difference between the views of the people writing reports and the people receiving them. I have a hunch that over the next two to five years we will start using the correct statistics more often when we're testing things, because the more AI is looking at the problem, the more it's going to suggest the right test, unless we train it to behave like commercial market researchers.
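The crosstab-plus-chi-squared workflow Ray describes can be sketched directly. The data below is synthetic and tiny (12 invented responses skewed the same way as the talk's finding), and the chi-squared statistic is computed from first principles rather than via a stats library.

```python
import numpy as np
import pandas as pd

# Synthetic responses: creators skew positive, receivers skew negative.
df = pd.DataFrame({
    "role":    ["Creator"] * 6 + ["Receiver"] * 6,
    "quality": ["Excellent", "Very good", "Very good", "Good", "Good", "Fair",
                "Good", "Good", "Fair", "Fair", "Poor", "Very poor"],
})

# Cross-tabulate, ordering rows from "Very poor" down to "Excellent".
order = ["Very poor", "Poor", "Fair", "Good", "Very good", "Excellent"]
table = pd.crosstab(df["quality"], df["role"]).reindex(order, fill_value=0)

# Column percentages, keeping the counts as the base.
pct = table.div(table.sum(axis=0), axis=1) * 100

# Chi-squared test of independence on the raw counts.
obs = table.to_numpy(dtype=float)
row, col, total = obs.sum(axis=1), obs.sum(axis=0), obs.sum()
expected = np.outer(row, col) / total
chi2 = ((obs - expected) ** 2 / expected).sum()
dof = (obs.shape[0] - 1) * (obs.shape[1] - 1)

# Critical value of chi-squared at p = 0.05 with dof = 5 is 11.07;
# chi2 here is about 5.33, so this toy sample is too small to be significant.
print(chi2, dof, chi2 > 11.07)
```

With 527 real respondents, the same statistic easily clears the threshold, which is the significant difference ChatGPT reported.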

I could also, rather than asking it to produce tables for me so that I can read the information and come to a view, get a first draft by simply asking it to analyze the data. So: review this data and write a brief report that highlights the main differences between those people who say they create reports and those who receive them.

Now it's come up with some stuff on the left, which we can go into, but look at the summary; it's not quite got what I meant by this, and this is where we need to iterate quite often. "The main difference between those who create and receive reports is in their roles and experience levels." So if somebody is client side, they are more likely to be receiving reports.

If they are agency side, they are more likely to be writing reports. True, but not useful; we wouldn't highlight that as a finding, but ChatGPT highlighted it as one. However, it has also said that the people writing the reports have a better opinion of them than the people receiving them. We can make that very explicit by asking: is there a difference between how good these two groups think the reports are?

And it makes the point that both groups think they're good, but the receivers think they are less good and the writers think they are better. And additionally, it's gone a little bit further than I had asked it for, which is quite nice. It's a smart intern. Additionally, both groups largely agree that the quality of reports had remained consistent over the past two years at that point.

We can also use it to help generate our thinking around visualizations of the data. So: show me five visualizations of this data in terms of creators versus receivers. And it comes up with some charts here, showing that we've got a different shape between the quality rating for the creators, which is quite peaked on "very good", and the quality rating of the receivers, which is rather flatter across "fair", "good", and "very good".

This is the quality change chart, pretty similar between the two. Then it suggests some pie charts. So it hasn't come up with some wonderful, wonderful charts. But they are at least useful as ideas.

A quick tip while we're talking: these are all images, and if you export them, they will come out as images. If, however, you have a visualization you like, tell ChatGPT: I'd like you to create a PowerPoint slide for me with that chart, and I would like you to create the chart as a PowerPoint chart object. Then get it to export the PowerPoint slide.

It will give you editable outputs. You still have to tweak things like font sizes, but you can tweak them because they're not just an image. Okay, let's move to something a little bit more interesting: driver analysis. You can also get ChatGPT to generate these cheesy images.

So this is driver analysis about hotels. Again, we have to create prompts that are really specific about what we want to do. I run a market research agency. My client is a US hotel chain, and they want to create a CX tracking study for their 200 US hotels. The study should measure satisfaction with the key services and the overall NPS score.

Suggest a research design for this project. Assume that we will want to send email invitations to an online survey to customers the day after they check out of their hotel. We will include the name of the hotel they stayed in in the invitation, from the hotel's records. Let's ask an overall satisfaction question first, with "very dissatisfied" through to "very satisfied". Then we're going to ask a Net Promoter Score question. Then we're going to ask satisfaction with room cleanliness, comfort of the bed, quality of the food, customer service, and hotel amenities.

And we're going to use the same five point satisfaction scale that we use for overall satisfaction. My client wants to understand how to improve their NPS scores. What are my options for using this data to conduct a driver analysis?

To help the client understand this, it suggests the following. We could do correlation, and it tells us the benefits. We could do regression analysis, and it tells us the benefits. We could use relative importance analysis, for example Shapley values, and it tells us the benefits. And it also says we could use structural equation modeling.

I'm ruling that one out: it takes too long and it's too expensive. So I'm going to home in on the other three. Let's try correlations first. Show me the correlations between each of the satisfaction scores and the likelihood-to-recommend question, and rank the answers from highest to lowest. And we see overall satisfaction, not surprisingly, at the top.

Room cleanliness, comfort of the bed, and hotel amenities at the bottom. This is from some synthetic data which had some biases built into it; I used ChatGPT to create the synthetic data so that I was then able to look for the patterns that came out. Room cleanliness being really important was one of the biases.

Hotel amenities being really unimportant was the other bias I built in. Now, let's look at regression: use regression to calculate the importance of each of the satisfaction scores in terms of the NPS question. It comes back with the coefficients, sums those to 100, and tells us which ones are the most important. We see, again, room cleanliness coming out as very important.
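The regression-based driver analysis can be sketched as follows. The data here is synthetic with the same biases Ray describes baked in (cleanliness strong, amenities weak); the variable names are illustrative, and normalizing absolute coefficients to 100 is one common convention, not the only one.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 500

# Synthetic 1-5 satisfaction scores; cleanliness is made the strongest
# driver of NPS and amenities the weakest, mimicking the built-in biases.
clean = rng.integers(1, 6, n)
bed = rng.integers(1, 6, n)
food = rng.integers(1, 6, n)
amenities = rng.integers(1, 6, n)
nps = (1.5 * clean + 0.8 * bed + 0.5 * food + 0.05 * amenities
       + rng.normal(0, 1, n))

# Ordinary least squares with an intercept column.
X = np.column_stack([np.ones(n), clean, bed, food, amenities])
coefs, *_ = np.linalg.lstsq(X, nps, rcond=None)

# Normalize the absolute driver coefficients so they sum to 100.
drivers = dict(zip(["cleanliness", "bed", "food", "amenities"],
                   np.abs(coefs[1:])))
total = sum(drivers.values())
importance = {k: 100 * v / total for k, v in drivers.items()}
print(importance)
```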

We see hotel amenities coming out as not important. However, if I were a really novice user, I would have liked ChatGPT to have said: you'd better check that regression is appropriate for this analysis. So I'm now asking ChatGPT: are there any things we need to check when using regression in this way? And it comes up with a whole bunch of things.

In fact, it comes up with seven things we need to check: independence of errors, linearity, et cetera, and in particular multicollinearity. So I ask it to check for multicollinearity. It comes back with the VIF scores, as it should do, and it very nicely tells us that values above 10 are a problem.
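The VIF check itself is simple enough to write out by hand: regress each predictor on the others and see how much of it they explain. The data below is invented to show the two extremes (an independent predictor and a near-duplicate one).

```python
import numpy as np

def vif(X):
    """Variance inflation factor for each column of predictor matrix X.

    VIF_j = 1 / (1 - R²_j), where R²_j comes from regressing column j
    on all the other columns. Values above 10 (some prefer 5) signal
    problematic multicollinearity.
    """
    n, k = X.shape
    out = []
    for j in range(k):
        y = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(others, y, rcond=None)
        resid = y - others @ beta
        r2 = 1 - resid.var() / y.var()
        out.append(1 / (1 - r2))
    return out

rng = np.random.default_rng(0)
a = rng.normal(size=200)
b = rng.normal(size=200)          # independent of a → VIF near 1
c = a + rng.normal(0, 0.1, 200)   # nearly a copy of a → very large VIF
vifs = vif(np.column_stack([a, b, c]))
print(vifs)
```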

Some people like them to be below 5; we certainly don't want them above 10. So that's useful to know. Now let's try Shapley values: use the Shapley values approach to generate a driver analysis for this data and produce a table that shows the results. It produced lots and lots of intermediate pages.

I'm just showing you the final one it came out with, and it's slightly different. Now overall satisfaction is a chunk ahead of room cleanliness, because of the prioritization that happens with Shapley values, but pretty much everything else is very similar, with hotel amenities again coming out really low. So I can then ask it to show me the results of the three approaches side by side.
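For the curious, the Shapley-value (LMG) decomposition of R² can be sketched as averaging each driver's incremental R² over every order in which the drivers could enter the model. This is an illustrative two-driver toy (the full computation grows factorially with the number of drivers, so real tools approximate it).

```python
import numpy as np
from itertools import permutations

def r2(cols, y):
    """R² of an OLS regression of y on the given columns plus intercept."""
    n = len(y)
    X = np.column_stack([np.ones(n)] + list(cols))
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1 - resid.var() / y.var()

def shapley_drivers(cols, y):
    """Average each driver's incremental R² over all entry orders."""
    k = len(cols)
    shares = np.zeros(k)
    perms = list(permutations(range(k)))
    for order in perms:
        used, prev = [], 0.0
        for j in order:
            used.append(cols[j])
            cur = r2(used, y)
            shares[j] += cur - prev  # credit for the R² this driver added
            prev = cur
    return shares / len(perms)

rng = np.random.default_rng(1)
n = 400
clean = rng.normal(size=n)   # stand-in for room cleanliness
amen = rng.normal(size=n)    # stand-in for hotel amenities
nps = 2.0 * clean + 0.1 * amen + rng.normal(0, 1, n)

shares = shapley_drivers([clean, amen], nps)
print(shares, shares.sum())  # the shares sum to the full-model R²
```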

And despite the really red flag on using regression analysis, what we see is broad similarity between the three approaches. Overall satisfaction is really important, not surprisingly, and we're going to come to that point in a moment; room cleanliness is high, and hotel amenities really, really low in that process.

I ask it to summarize what's going on, and it comes back with these conclusions: overall satisfaction and room cleanliness are most important, and nearly everything else is a secondary factor. But there's quite a nice point that the average intern might miss: hotel amenities, while still important, can be a lower priority unless specific feedback indicates otherwise.

So, we don't just go with the numbers; we think about the context. What do the numbers mean? What can we do with them? But here is something I wish ChatGPT had said, and now I have to go to my intern and ask: if I'm conducting a driver analysis of satisfaction data for hotels with the aim of improving the NPS score, should I include a measure of overall satisfaction along with things like satisfaction with the amenities, room cleanliness, and the comfort of the bed?

It comes back with quite a long answer, but no, we shouldn't, because of multicollinearity and causal understanding. "Overall satisfaction should be removed" would have been a fantastic thing for it to say up front. And maybe if we uploaded custom GPTs and told it some of our rules, in the future it would highlight that at the beginning, rather than me having to ask whether it's a problem.

So, use your co-pilot to start projects. Use it to give you something to amend, to correct, and to refine, as a second pair of eyes, and also to look things up. Here is an example of one of the problems. At the end of that section on conjoint earlier, when I was preparing it, I asked: is there a good book I can read to help me design conjoint studies?

And it came back with the book by Bryan Orme; I'm sure we're all very happy with that. And it came back with three other books. I did what I normally do, which is to check that the books exist. This book doesn't exist. So I asked it, "Are you sure this is a real publication?" And it comes back, apologizes for making that mistake, and suggests this is actually the book.

Now, this book does exist; you can buy it for 125 in the Kindle version, so it is actually out there. I don't get many hallucinations when I'm working with numbers, but I always check for hallucinations: I do periodic checks and cross-checking. If this is true, then this must be true; let's see if that is true. Looking at things like that, it is much more prone to hallucinating when it's pulling up factual stuff, like who wrote which books or where people worked.

So that is one of the things we need to be aware of. And then finally, just before we jump to the Q&A: at the end of creating this presentation, I asked it, just to see: please check this presentation and tell me about any mistakes or bad errors; tell me the slide number and then tell me the error. It came up with a couple of pages of mistakes, and I went through and corrected them, but one of them is not a mistake. The font's too small for you to read, so I'll read it to you.

Slide four, context error: "I am using ChatGPT-4o"; it claims that should say "I'm using ChatGPT-4". That is because ChatGPT-4o's final date for learning was before ChatGPT-4o was released. It doesn't know about ChatGPT-4o, which is kind of a strange philosophical situation: it's really knowledgeable about lots of things, but it doesn't even know about itself, because it was launched after the last update of its training data.

So, thank you for listening to that; it was a whistle-stop tour. And I think we are now going to switch over to Justin.

Justin Luster: Yeah, that was excellent, Ray, brilliant. You're a great presenter, and that was a fascinating topic. With that, we'll wrap it up. See you at our next webinar. Thank you.