To write this post, I spoke into my phone and watched with no small amount of amazement how quickly my device translated my voice into written text, albeit with an aggressive use of punctuation.
When we talk about AI and how it will impact the work of UX, we need to consider the fact that in many ways the impact is already here - so much so that we don't even notice it anymore.
In that regard, AI's impact on UX research will probably center around transcription and summarization - if we can get the ableist, English-first, Western-centric bias out of the system.
And yet, the evidence that tools can do this well at the moment (at least insofar as ChatGPT 3.5, LLaMA, and Bard are concerned) is still uncertain. To understand why, we need to think about what LLMs are, and what they can and cannot do.
LLMs are quite good at inferring relationships between seemingly unrelated things, and following linkages that may not be apparent on the surface. Think of a simple spreadsheet listing types of animals as column headers (mammals, birds, reptiles, invertebrates, etc.) and a list of characteristics as rows (average height, average weight, number of legs, etc.).
An extremely simplistic LLM, looking at the data that fills in this spreadsheet, will "notice" that mammals tend to have two or four legs, and run larger and heavier than invertebrates.
If you then tell the LLM that you've found a new animal that weighs 300,000 lbs and is about 100 feet long, and ask it, "What kind of animal is this?", the LLM will probably guess "mammal" - without knowing anything else.
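To make that concrete, here's a toy sketch in Python. It is not an LLM in any real sense, just a nearest-pattern guess over invented class averages, but it captures the kind of statistical association at work:

```python
import math

# Toy illustration only: not an LLM, just a nearest-pattern guess over
# made-up class averages, to mimic the spreadsheet example. All numbers
# are rough inventions for demonstration.
CLASS_AVERAGES = {
    # class: (average length in feet, average weight in lbs)
    "mammal": (6.0, 400.0),
    "bird": (1.0, 2.0),
    "reptile": (3.0, 20.0),
    "invertebrate": (0.2, 0.1),
}

def guess_class(length_ft: float, weight_lbs: float) -> str:
    """Pick the class whose averages are closest on a log scale
    (sizes span orders of magnitude, so log distance is fairer)."""
    def distance(avgs):
        avg_len, avg_wt = avgs
        return (math.log(length_ft / avg_len) ** 2
                + math.log(weight_lbs / avg_wt) ** 2)
    return min(CLASS_AVERAGES, key=lambda cls: distance(CLASS_AVERAGES[cls]))

# The mystery animal: about 100 feet long, 300,000 lbs.
print(guess_class(100.0, 300_000.0))  # -> "mammal"
```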
But what happened here? Does the LLM "understand" that such a large creature would, biomechanically, need a skeleton to support that much weight? Would it infer that this mammal lives in water? Would it "know" that the animal lives in the present epoch, and is not extinct?
Not in the slightest. LLMs are essentially sentence-fragment completion engines. So when we "ask" it to identify this new animal, what we're really asking it to do is complete a sentence like, "An animal which is 100 feet long and weighs three hundred thousand pounds is likely to be..."
And then the LLM spits out: "a blue whale."
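Framed as code, the interaction looks something like the sketch below. The complete() function is a hypothetical stand-in for any text-completion API, not a real library call; the canned return value just keeps the sketch runnable:

```python
def complete(prompt: str) -> str:
    # Hypothetical stand-in for a text-completion API. A real version
    # would send `prompt` to a hosted LLM; the canned return value here
    # just keeps the sketch runnable.
    return "a blue whale."

# We never really "ask a question" - we hand the model a fragment to finish.
prompt = ("An animal which is 100 feet long and weighs "
          "three hundred thousand pounds is likely to be")
print(prompt, complete(prompt))
```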
Likewise, when we ask an LLM to summarize a research session, what we're really asking it to do is complete the sentence: "An abbreviated version of this body of text would be:"
And then the LLM will essentially search its vector space and find a sequence of letters, spaces, and punctuation that resides "close" to the space occupied by the body of text (the transcript) that it's been fed.
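You can get a feel for that notion of "closeness" with sentence embeddings. The sketch below uses the sentence-transformers library with a commonly used small model; it's a simplification (an LLM doesn't literally search a vector store), and the transcript and summaries are invented, but the geometric intuition is the same:

```python
# Illustrating "closeness" in vector space with sentence embeddings.
# A simplification of what an LLM does internally, but the geometric
# intuition is the same: a plausible summary sits "near" its source text.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # a common small model

transcript = "Participant: I kept tapping the icon, but nothing happened."
candidate_summaries = [
    "The participant struggled to activate a feature from its icon.",
    "The participant sailed through the checkout flow with no issues.",
]

transcript_vec = model.encode(transcript, convert_to_tensor=True)
for summary in candidate_summaries:
    summary_vec = model.encode(summary, convert_to_tensor=True)
    score = util.cos_sim(transcript_vec, summary_vec).item()
    print(f"{score:.3f}  {summary}")

# The "closer" candidate scores higher - but nothing here understands the
# participant's intent or any context outside the text itself.
```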
Just as the LLM can't infer that the whale has a skeleton, there's no semantic understanding of the intent behind the words spoken by the participant, or of the context that exists outside of what's captured in the transcript - details a skilled researcher would immediately grasp, and which are vital to interpreting what participants are saying.
Moreover, the transcript, in all likelihood, includes nothing about gestures, body movements, and nonverbal expressions made by the participant. All of these are vitally important to understanding what’s going on in our research.
Going back to the animal kingdom: if you had a set of pictures of dogs showing only their four legs, and asked a large language model like GPT-3 to categorize this set of images, you'd get some weird results.
The model may not understand, and in fact would have no frame of reference for, the fact that you had given it a bunch of pictures of dogs. It might just as well, and in some cases might be more likely to, categorize them all as giraffes.
This might point the way toward how AI might best be used in UX research. Image recognition, facial recognition, emotion recognition, and speech-to-text could all be combined into a fairly accurate "session analysis."
But you would need to spend a significant amount of time composing a reference set of analyses, and you would need an LLM with the capacity (GPT-4 might manage it) to take in that data set as part of each prompt. This would be the "cheaper" method of adapting a model to this kind of specialized task, as opposed to training the model on your own data from the ground up.
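Here is a hedged sketch of what that approach (few-shot prompting, with the reference set riding along in every request) might look like. Everything in it is hypothetical: REFERENCE_ANALYSES stands for your own curated examples, and complete() for whichever large-context LLM you have access to:

```python
# Sketch of the "cheaper" approach: instead of fine-tuning a model,
# prepend a hand-composed reference set of session analyses to every
# prompt (few-shot prompting). All names here are hypothetical.

REFERENCE_ANALYSES = [
    ("<transcript of session 1>", "<researcher's analysis of session 1>"),
    ("<transcript of session 2>", "<researcher's analysis of session 2>"),
]

def build_prompt(new_transcript: str) -> str:
    parts = ["You are assisting with UX research session analysis."]
    for transcript, analysis in REFERENCE_ANALYSES:
        parts.append(f"Transcript:\n{transcript}\nAnalysis:\n{analysis}")
    # The new session goes last, with the analysis left for the model.
    parts.append(f"Transcript:\n{new_transcript}\nAnalysis:")
    return "\n\n".join(parts)

# The whole reference set is re-sent with every request, which is why
# the model needs a large context window.
prompt = build_prompt("<transcript of today's session>")
```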
One could imagine cloud-hosted UX research platforms like UserTesting building a specialized model like this, but it's likely out of reach for most small UX research teams, and certainly for any independent UX researcher.
A possibly more fruitful area for UX researchers to work AI into their process centers on conducting sessions in a non-native language. With speech-to-text, text-to-text translation, and text-to-speech capabilities all vastly enhanced by AI, I think we're not far from a world where researchers can conduct one-on-one sessions with participants who do not speak the same language.
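A minimal sketch of that loop, assuming the three capabilities exist as services you can call: the function names below are hypothetical stand-ins, not real library calls, and each would wrap a real speech-recognition, machine-translation, or voice-synthesis model in practice:

```python
# Hypothetical stand-ins: each would wrap a real speech-recognition,
# machine-translation, or voice-synthesis service. None of these are
# actual library calls.
def speech_to_text(audio: bytes, language: str) -> str: ...
def translate(text: str, source: str, target: str) -> str: ...
def text_to_speech(text: str, language: str) -> bytes: ...

def relay_utterance(audio: bytes, src: str, dst: str) -> bytes:
    """One leg of the conversation: a speaker talks in `src`, and the
    listener hears a synthesized version in `dst`."""
    text = speech_to_text(audio, language=src)
    translated = translate(text, source=src, target=dst)
    return text_to_speech(translated, language=dst)

# Researcher (English) and participant (Spanish), one exchange each way:
# to_researcher = relay_utterance(participant_audio, src="es", dst="en")
# to_participant = relay_utterance(researcher_audio, src="en", dst="es")
```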
This opens up even more interesting areas of research, where researchers may find it easier to conduct sessions with participants who have disabilities that inhibit movement, speech, or other methods of interaction.