Comparison test


Designers learn early on that they should never produce just one version of a design. Instead, they produce a small handful of designs, say 3-5, that allow them to explore alternatives. Either the client decides which one they prefer, or the team puts it to a vote by asking users.

Researchers who should know better sometimes refer to this as A/B testing. They have two designs, A and B, and then attempt to find out which one participants prefer.

This is not A/B testing. This is preference testing. The difference is not a semantic one. In our book, Think Like a UX Researcher, Philip Hodgson and I distinguish between three kinds of UX research evidence:

  • Strong evidence: this comes from target users doing tasks or engaging in some activity that is relevant to the product being designed.
  • Moderately strong evidence: this comes from studies that at least include carrying out tasks. This could be by users or by usability experts, and may involve self-reporting of actual behaviours.
  • Weak evidence: this results from methods that are either flawed or little better than guesswork.

Preference testing provides weak evidence — evidence so weak it is worthless. In contrast, A/B testing provides strong research evidence.

What is A/B testing?

A/B testing, unlike preference testing, is a controlled experiment that takes place on a live web site. Half the visitors see one version of a page (Design A) and the other half see a slightly different version (Design B). Development teams use A/B tests to test calls to action, product pricing and images on landing pages. The "winner" of an A/B test is the design that best drives the behaviour you want (such as purchases, donations or sign-ups to a newsletter).
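
To make the mechanics concrete, here is a minimal sketch of how the split and the analysis might work. The visitor IDs, sample sizes and conversion figures are all invented for illustration; real A/B platforms handle both steps for you.

```python
# A minimal sketch of an A/B test, with hypothetical data throughout.
from hashlib import md5
from math import erf, sqrt

def assign_variant(visitor_id: str) -> str:
    """Deterministically split visitors 50/50 between Design A and Design B."""
    bucket = int(md5(visitor_id.encode()).hexdigest(), 16) % 2
    return "A" if bucket == 0 else "B"

def two_proportion_z_test(conv_a: int, n_a: int, conv_b: int, n_b: int):
    """Two-sided test: do the designs drive the target behaviour differently?"""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)                # rate if A and B are equal
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))  # standard error
    z = (p_b - p_a) / se
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))   # two-tailed p
    return p_a, p_b, z, p_value

print(assign_variant("visitor-42"))  # each visitor always sees the same design

# Hypothetical results: 5,000 visitors per arm, newsletter sign-ups as the goal.
p_a, p_b, z, p = two_proportion_z_test(conv_a=400, n_a=5000, conv_b=460, n_b=5000)
print(f"A: {p_a:.1%}  B: {p_b:.1%}  z = {z:.2f}  p = {p:.3f}")
```

A small p-value suggests the difference in sign-up rates reflects the designs rather than chance, so the team can ship the better performer.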

One of my favourite examples of A/B testing comes from Google. When Google launched ads on Gmail, the development team wanted to optimise the colour of the ad hyperlink, so they tested forty different shades of blue. They discovered that a slightly purpler shade of blue resulted in more clicks than a slightly greener shade. The upshot: by choosing the purpler shade of blue, Google made an extra $200m a year in ad revenue.
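
The same logic extends from two designs to many. Here is a hedged sketch of how a winner might be identified among several variants, using invented click counts for three hypothetical shades rather than Google's real data, and assuming scipy is available:

```python
# A sketch of picking a winner among several variants (invented data).
from scipy.stats import chi2_contingency

impressions = 100_000  # hypothetical ad impressions per shade
clicks = {"greener blue": 3_050, "mid blue": 3_120, "purpler blue": 3_310}

# Contingency table: one row per shade, columns = [clicked, did not click]
table = [[c, impressions - c] for c in clicks.values()]
chi2, p_value, dof, _ = chi2_contingency(table)
print(f"chi2 = {chi2:.1f}, dof = {dof}, p = {p_value:.4f}")

best = max(clicks, key=clicks.get)
print(f"Highest click-through: {best} at {clicks[best] / impressions:.2%}")
```

A small p-value tells you the click-through rates genuinely differ across shades; the winner is simply the variant with the highest rate.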

What's the problem with preference testing?

Imagine we asked participants which of those forty shades of blue they preferred. Would they pick out the slightly purpler shade that results in more clicks? Of course not. Expressed like this, it sounds ridiculous. So why does preference testing persist?

It persists because asking people which design they prefer appears to have face validity. Indeed, this kind of research method is popular in market research, where marketers aim to discover which imagery, logo or brand identity people prefer.

But UX research is not market research. There are at least four reasons why preference testing has only a minor role to play in UX research, if any role at all.

Preference testing does not reflect real-world usage

When researchers ask about preference, they generally show a participant two alternatives and ask them to pick the one they prefer. For example, "Do you prefer Design A or Design B?" Or, "Would you prefer some feature (e.g. the navigation bar) presented on the left or the right of a display?"

But that bears no resemblance to the way people make that judgement in reality. In the real world, people don't look at two web pages and decide which one they prefer: they use a product to achieve a goal. Asking people to choose a design they haven't used weights their judgement towards less important factors, like a product's visual design, whether it looks familiar, or an idiosyncratic preference (such as hating the colour orange). UX researchers should focus instead on the factors that lead to long-term usage and loyalty (such as "Can I do what I want to with this product?").

People are not invested in the outcome

It's true that people know what they prefer. You can tell me whether you prefer Coke or Pepsi, Apple or Microsoft, your friend Amy or your friend Janice. But what you don't know is why you feel that way. Preference testing asks people to introspect on the reasons for their choices so that the development team can choose a design direction. Questions like, "Why do you prefer Design A over Design B?" or "Why do you prefer the navigation on the left of the screen?" are common. It turns out people aren't good at answering this kind of question: they don't know why, they don't care enough to answer, or they may not want to tell you. When asked for an opinion, most people will form one on the spot, and such opinions aren't carefully considered or deeply held. It's not that UX researchers don't care what people like: it's just risky to make important design decisions based on fickle opinions.

Preference testing asks people to predict the future

Preference testing asks people to imagine a future where these designs exist and then predict which of the two they will use. But research shows that people are poor at predicting a wide range of future behaviours: how long tasks will take, whether they will have long and happy relationships, how much they will save, how they will perform in exams, how much they will give to charity, and how likely they are to adopt healthy behaviours. The best predictor of your users' future behaviour is their past behaviour, not their current opinions. Rather than ask people which design they prefer, we should discover which design they perform best with.

Preference testing conflates research questions with interview questions

Your research question is the purpose of your research. It guides the design of your study. Whether Design A is better than Design B is an example of a valid research question.

But a research question is not a question you can ask participants in your study. For example, your research question may be, "Does cycling affect employee productivity?", but you can't put that question to participants and expect meaningful answers. Instead, you might ask, "How many times a week do you cycle to work?" or "What stops you cycling more often?" and compare the answers with productivity measures, as the sketch below illustrates.
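
Here is a minimal sketch of that comparison step, with entirely invented data for five hypothetical participants: each person's answer to the cycling question is set against a productivity measure taken from existing records.

```python
# Invented data: we compare interview answers with a productivity
# measure, rather than asking the research question directly.
from statistics import correlation  # Pearson's r, Python 3.10+

days_cycling = [0, 1, 2, 4, 5]       # "How many times a week do you cycle to work?"
tasks_closed = [62, 70, 68, 79, 84]  # productivity measure from existing records

r = correlation(days_cycling, tasks_closed)
print(f"r = {r:.2f}")  # a positive r flags a relationship worth investigating
# Note: correlation is not causation; this only informs the research question.
```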

In fact, the way you answer your research question may not entail asking participants any questions at all. You may simply ask participants to do tasks as you observe (usability testing). In "The Mom Test", Rob Fitzpatrick captures this well when he writes:

"Trying to learn from customer conversations is like excavating a delicate archaeological site. The truth is down there somewhere, but it's fragile. While each blow with your shovel gets you closer to the truth, you're liable to smash it into a million little pieces if you use too blunt an instrument."

Usability testing: The next best thing to a time machine

The difficulty with A/B testing is that you can't use it early in design. What if we have two prototypes of an early design concept and want to compare them? This is presumably why people are drawn to preference testing instead.

But this is exactly why usability testing was invented. Rather than ask people which design they prefer, we ask them to use a prototype and we discover the usability issues that need to be fixed.

A usability issue is anything that:

  • Prevents task completion.
  • Slows down the participant.
  • Takes the participant off-course.
  • Causes the participant to find a workaround.
  • Makes the participant confused.
  • Irritates or annoys the participant.
  • Forces an error.
  • Prevents the participant from noticing something.
  • Implies things are OK (when they are not).
  • Implies a task is complete (when it isn't).
  • Causes the participant to misinterpret content.
  • Prevents the participant from taking the next step.

Usability testing provides strong research data because it is based on behaviour: what people do rather than what they say they do. It's the next best thing we have to a time machine, because we get to project the test participant forward to a future where the concept has become a real product, and we get to see how it will perform.

-oOo-

About the author

David Travis

Dr. David Travis (@userfocus) has been carrying out ethnographic field research and running product usability tests since 1989. He has published three books on user experience including Think Like a UX Researcher. If you like his articles, you might enjoy his free online user experience course.


