Taking notes in usability tests

Anyone who has ever conducted a usability evaluation of a web site, software application, or consumer product knows that human behaviour research often produces reams of data that can take significant time to analyse. To be productive, researchers must organize and reduce these data so that they can quickly perform their analysis and proceed with improving the product.

People who are new to the field tend to take notes on paper or on a computer. Unfortunately, this approach can make data compilation cumbersome. More complex and powerful tools to assist with compilation and analysis are becoming more common, but they can be expensive and often provide more functions than most researchers need. In addition, such proprietary tools are frequently restricted to the Windows operating system, and they offer little, if any, scope for customisation, making it difficult for practitioners to tailor the tool to their specific needs.

How to define behaviours and collect data in usability tests

Before discussing how to log the data that arises from usability research, let's review some of the associated terminology.

The ISO Definition of usability: Effectiveness, Efficiency, and Satisfaction

According to the International Organization for Standardization (ISO 9241-11), usability comprises three primary attributes: effectiveness, efficiency, and satisfaction. Some usability experts argue that we should also consider additional contributing elements such as learnability and retention. Table 1 shows how some of the more popular definitions of usability map to one another.

Table 1: Three popular definitions of usability

ISO 9241-11     Nielsen 1993    Shneiderman 1998
Efficiency      Efficiency      Speed of Performance
                Learnability    Time to Learn
Effectiveness   Memorability    Retention over time
                Errors          Rate of errors by users
                Safety
Satisfaction    Satisfaction    Subjective satisfaction

Distinguishing Quantitative/Qualitative and Objective/Subjective Data

The terms quantitative and qualitative have become pervasive in the user-centered design community. Because of the close working relationship between marketing and user experience groups in product design, there is often confusion surrounding these terms. In my experience, the confusion generally stems from the tendency to refer to research "methods" and "data" as if they were the same thing, which is not always the case.

For example, a survey of 1000 people is often regarded as quantitative research, yet it may collect both quantitative and qualitative data. Similarly, interviews and usability tests are often considered qualitative research, yet they too can collect both quantitative and qualitative data. By reducing the "research" to a single type, we fail to recognize that different types of data may be collected, and that different types of analysis are appropriate.

Another dimension of data that can be confusing is the distinction between objective and subjective. In general, objective data (that which is "external to the mind", based on facts, and unbiased by opinion or interpretation) is considered more valuable than subjective data (that which "exists in the mind" and belongs to the thinking subject rather than the object of thought). Just as both quantitative and qualitative data may be collected in usability research, so may both objective and subjective data.

Determining the types of data to collect in a study is really a function of the research questions that need to be answered. Table 2 provides some examples of usability data collection to help convey the possibilities.

Table 2: Types of data collected for effectiveness, efficiency, and satisfaction

Effectiveness
  • Quantitative, objective: count of tasks completed successfully (according to predefined criteria); count of errors committed by the user during task performance (according to predefined criteria).
  • Quantitative, subjective: Likert scale rating by the participant of how well the product solves the intended job.
  • Qualitative, objective: a description of the observed sequence of steps performed by the user.
  • Qualitative, subjective: the participant's comments related to completing a given task.

Efficiency
  • Quantitative, objective: time spent per completed task; count of clicks performed during task completion.
  • Quantitative, subjective: Likert scale rating by the participant of how efficient they perceive the product to be.
  • Qualitative, subjective: the participant's comments related to the perceived efficiency of the product.

Satisfaction
  • Quantitative, subjective: Likert scale rating of participant satisfaction.
  • Qualitative, objective: a description of observed participant behaviour (frustration, delight, etc.).
  • Qualitative, subjective: the participant's comments related to satisfaction with the product.

Distinguishing between Formative and Summative Evaluations

One final distinction that exists in the field of usability research is the one between formative and summative evaluations. In a formative evaluation, the emphasis is on the "formation" of the future design and direction of a product. Data collected to help drive this future direction may include qualitative data that is largely based on users' observed behaviours and comments about the product. It may also include quantitative data, however, especially when the research question involves an A-B comparison between two early prototypes.

A summative evaluation is intended to provide a "summation" of the product's current state, ideally in the form of a measurable score. Because of this desire for a numerical score, quantitative data collection is generally the priority in summative research. Qualitative data may still be collected as a 'bonus' to supplement the value gained from the study, provided it does not interfere with or influence the collection of quantitative data. For example, a think-aloud protocol is generally not used during a summative study because of its impact on the time required to complete a task. However, qualitative comments may be recorded after tasks and/or at the end of a study without adversely affecting other data collection.

Presenting Your Data with Confidence

Finally, it's worth noting a relatively recent enthusiasm in the usability community for including confidence intervals when presenting the effectiveness results from a usability study. Sauro (2005) makes a case for the benefit of including confidence intervals in addition to point estimates (i.e. the basic task completion percentage that is commonly reported) in order to help convey the margin of error associated with the results, and to "temper both excessive scepticism and overstated usability findings".

Calculating confidence intervals is a relatively straightforward procedure, although as Lewis and Sauro (2006) point out, there are numerous subtleties to consider when selecting which method to use. In practice, however, the results tend to be quite similar across the different methods of calculation, so applying any one of their top two or three recommended methods will achieve the goal of communicating that a margin of error is associated with your results.
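For illustration, here is a minimal sketch in Python of one commonly recommended method for small-sample completion rates, the Adjusted Wald (Agresti-Coull) interval. The function name and the example figures are illustrative only, not taken from Lewis and Sauro's papers:

```python
import math

def adjusted_wald_interval(successes, trials, z=1.96):
    """Adjusted Wald (Agresti-Coull) interval for a task completion rate.

    Adds z^2/2 'virtual' successes and z^2/2 'virtual' failures before
    computing the ordinary Wald interval; z = 1.96 gives roughly 95% confidence.
    """
    adj_n = trials + z ** 2
    adj_p = (successes + (z ** 2) / 2) / adj_n
    margin = z * math.sqrt(adj_p * (1 - adj_p) / adj_n)
    return max(0.0, adj_p - margin), min(1.0, adj_p + margin)

# Example: 7 of 9 participants completed the task (a 78% point estimate).
low, high = adjusted_wald_interval(7, 9)
print(f"95% confidence interval: {low:.0%} to {high:.0%}")
```

Reporting the interval (roughly 44% to 95% in this example) alongside the 78% point estimate makes the margin of error explicit to stakeholders.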

Datalogging Practices

What's the common practice?

As with most types of research, usability research is frequently characterized by "tailored approaches" refined and customized by organizations and individuals to meet their specific needs. This is particularly true when it comes to collecting data.

In an Idea Market session conducted at UPA 2004, Dr. David Dayton (Southern Polytechnic State University) delved into the details of how six different practitioners (including himself) logged usability data. As part of his review of datalogging practices, Dayton (2004) describes four popular types:

Problem Coding
Record predictable events and sort them on the fly into one or more categories. Analyse the resulting quantitative data with statistical methods and compare them to pre-defined benchmarks to assess the usability of the product. None of the practitioners reported regular use of this technique.
Event Description
Record free-form handwritten notes to capture significant events and/or usability problems. Analyse the notes post-test, group and categorize events, and rate their severity. Four of the practitioners reported regular use of this technique.
Event Description with Problem Coding
Combine the two previous methods: code events into pre-set categories, and enter descriptive notes for later team review and discussion of the most significant problems. One of the practitioners reported regular use of this technique.
Event Description with Problem Coding & Video Time Stamps
Capture the "story" of a test session in shorthand notes, code significant events into preset categories (e.g. "navigation problem", "mental model gap"), and time-stamp each entry to sync the notes with the video (see the sketch after this list). One of the practitioners reported regular use of this technique.
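To make the coded, time-stamped approaches concrete, here is a minimal sketch in Python of what a single logged event might look like as a data structure. The field names and category codes are hypothetical illustrations, not part of Dayton's description:

```python
from dataclasses import dataclass
from datetime import timedelta

# Hypothetical coding scheme for on-the-fly problem coding;
# a real study would define its own categories.
CATEGORIES = {
    "NAV": "navigation problem",
    "MM": "mental model gap",
    "TERM": "terminology problem",
    "POS": "positive comment",
}

@dataclass
class LoggedEvent:
    participant: str       # e.g. "P03"
    task: str              # e.g. "Task 2"
    video_time: timedelta  # offset into the session recording, for syncing notes with video
    category: str          # a CATEGORIES key, or "" for an uncoded note
    note: str              # free-form shorthand description of the event

# Example: one coded, time-stamped observation.
event = LoggedEvent(
    participant="P03",
    task="Task 2",
    video_time=timedelta(minutes=14, seconds=32),
    category="NAV",
    note="Looked for the setting under the Help menu twice",
)
print(event)
```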

Dayton (2004) made two other interesting observations about data logging in his UPA session. He noted that the practitioners and attendees at the session generally agreed that an effective data log is one that "allows a team to find answers to its questions without having to review the session tapes." Interestingly, of the six practitioners mapped to these common practices, the only one who regularly includes a reference to recorded video data was Dayton himself.

Dayton (2004) also discovered from his work with experienced practitioners that the most common mistake made by those new to logging is "observation overkill": recording excess information that ends up impeding the team when it reviews the logs during analysis sessions. In this author's experience, this ratio of signal to noise, of useful information to raw data, largely determines the value of your logged results.

The benefits of separating data collection from data analysis

With respect to the types of data collection commonly practiced, an interesting issue presents itself regarding the 'on the fly' coding of data into preset categories. Researchers should consider how important it is to their study to separate data collection from data analysis, because this decision may have implications for the time required to complete the analysis phase, as well as for the quality of the analysis performed.

Keeping data collection separate from analysis allows the researcher to concentrate on making quality observations, and leaves the identification of patterns and problems to be performed later, once all the data have been gathered. This approach may be especially appropriate when the product being evaluated is new and a predetermined set of categories or codes may not be entirely appropriate for it.

Alternatively, coding data 'on the fly' into pre-defined categories may help expedite a research study by reducing the amount of time spent in data analysis. When the product has been tested previously, pre-defined categories can be anticipated with a high degree of accuracy.

Data Collection: The low-tech solution

As Dayton (2004) revealed with his small sample at a UPA conference, data logging practices may consist of nothing more than a basic paper notepad or electronic document. A pre-designed form or template may further facilitate logging by anticipating some of the patterns to be recorded and providing a checklist approach to recording common observations. Even with a well-designed form, however, this method can result in a significant "paper shuffle" at the end of the study. Multiple pages of documents with scribbled notes and numbers, often out of order and inconsistently labelled, must then be collated, coded, entered into some type of sorting software, and categorized by task or question — all of this just so the analysis phase can begin!

Low-tech solutions are also limited when it comes to collecting efficiency measures. Typically, measuring efficiency requires a stopwatch or some other external timekeeping tool, whose results are then manually recorded onto the printed form. During analysis, these efficiency data may be manually entered a second time, from the paper form into analysis software, before insertion into the final report. From the perspective of data integrity, practices that require little or no data re-entry are the most desirable.
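One way to avoid both the stopwatch and the re-entry step is to derive time-on-task from timestamps captured at the moment of logging. A minimal sketch in Python, assuming each task's start and end were recorded as datetime values (the participants, tasks, and times shown are invented for illustration):

```python
from datetime import datetime

# Hypothetical log entries: (participant, task, start, end), captured as the
# moderator marks 'task started' and 'task ended' in the logging tool.
task_log = [
    ("P01", "Task 1", datetime(2007, 5, 1, 10, 2, 15), datetime(2007, 5, 1, 10, 4, 50)),
    ("P01", "Task 2", datetime(2007, 5, 1, 10, 6, 5),  datetime(2007, 5, 1, 10, 11, 40)),
]

# Time-on-task falls straight out of the logged timestamps,
# with no stopwatch readings to transcribe or re-enter.
for participant, task, start, end in task_log:
    print(participant, task, (end - start).total_seconds(), "seconds")
```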

Data Collection: the high-tech solution

In recent years, an increasing number of software packages have been developed specifically to support usability researchers in collecting, coding, analysing, and even reporting their usability data. Several of these programs provide particularly good solutions for managing the video data captured during a usability study. Like any solution, however, these applications have their strengths and weaknesses. The following section presents a quick review of some of the solutions available on the market.

A Survey of Usability Data Logging Software

The following software products represent a range of solutions currently available:

Bit Debris
The Usability Activity Log "offers an effective means to easily and unobtrusively document observational data and task performance". This application can be synchronized with existing video equipment so that recorded observations are directly 'linked' to the accompanying video data for easy access by the researcher. The product is a Windows application and costs $300.00 USD per license.
Noldus
The Noldus Observer is "a professional system for the collection, analysis, presentation and management of observational data." This application is able to accommodate data entry directly from a computer, a handheld device, or a video recorder, and offers extensive coding and analysis options to the researcher. While the Observer is packed with powerful features that may be needed for extensive qualitative research, it may be overkill for many usability researchers. Observer is Windows only.
OvoStudios
OVO Logger comes in three different flavours (freeware, a la carte, fully featured) and provides extensive logging options for both notes and "tapeless" video as well as powerful bookmarking and reporting features designed to optimise the analysis and reporting phase of a study. While the fully featured version may be more than many researchers need or have a budget for, the freeware version may be just the right ticket. OVO Logger is a Windows-only product.
TechSmith
Morae is touted as "the only fully integrated, all-digital solution for analysing human-computer interaction". This application takes full advantage of digital video technology to allow researchers to capture, store, locate, and edit the video data from a usability study. At $1495.00, it may be an attractively priced solution for managing video data. In addition, the software provides the ability to enter notes and set markers to reference the corresponding video. Morae is Windows only but can be used with UserVue to collect data from users anywhere in the world.
Usability Systems/Alucid
UsabilityWare 4.0 is "a single program that can be relied upon as a beginning-to-end tool for all of your data collection, analysis, and final deliverables." This program allows you to enter your recruitment and scheduling details prior to a study, record your observations during the study, analyze the results, and build a report based on the data. The product is a Windows application and costs $4500.00 per license.

Pros and Cons of Commercial Software Dataloggers

Pros

  • They address the majority of a researcher's usability data needs in a single tool (e.g. test protocols, participant details, qualitative observations, quantitative measures, data analysis and/or export to other analysis tools, and automatic report generation).
  • Ability to bookmark specific points on corresponding video to facilitate easy retrieval and editing following completion of the study.
  • Automatic linking within the final report to desired video selections.
  • Various levels of note-taking ability to allow capture of observations, issues, etc.
  • Remote capabilities allowing team members at remote locations to view sessions over the Internet and contribute notes, markers, etc. to a common data file.
  • Automatic collection of quantitative data (e.g. time on task, mouse clicks).
  • Exporting capabilities to transfer quantitative data to analysis tools (e.g. MS Excel).

Cons

  • Expensive, ranging from a few hundred dollars to several thousand dollars.
  • Complex to learn and master depending on the product's feature list.
  • Limited to the PC platform (few options for Macintosh-based researchers).
  • Often focused primarily on the recording, storage, and retrieval of selected video data, with less attention paid to recording and analysing observed data.
  • Little or no customisation available.
  • Require significant hard drive space for installation.

How to use Microsoft Excel for Data Logging

Since VisiCalc (short for 'visible calculator') arrived on the scene in 1979, the computer spreadsheet has been considered by many to be the original 'killer software application'. A few years later, Lotus 1-2-3 assumed the lead position in the spreadsheet category, and shortly after that Microsoft Excel captured first place. Today, Microsoft Excel holds one of the largest installed bases of any software application, and is an integral part of the Microsoft Office suite of business applications.

Pros

  • Ubiquitous nature of Microsoft Excel means it is familiar to many people and already resides on the majority of computers.
  • Built-in statistical and charting capabilities facilitate a variety of analyses.
  • Many researchers already use Excel to organize, sort, and chart their data.
  • Ability to incorporate numbers, text, formulas, and Visual Basic programming permits significant customisation to meet individual needs.
  • Cross-platform compatibility accommodates PC and Macintosh users.
  • Lightweight file size requires little additional hard drive space (assuming that Microsoft Excel application already exists on the computer).

Cons

  • Limited layout and formatting abilities due to fixed grid pattern of columns/rows.
  • Limited timing ability, even with Visual Basic programming enhancements.
  • Visual Basic knowledge is required in order to make certain customisations.
  • Visual Basic macros employ absolute references rather than relative references, making it necessary to update some macros when making certain customisations.
  • Placing the cursor in a cell (i.e. entering edit mode) interrupts any Visual Basic macros that are currently running (e.g. a timer feature).

Download Excel datalogger

If you'd like to try out an Excel datalogger, visit the Datalogger download page. The spreadsheet allows you to measure task completion rates and time-on-task, analyse questionnaire data, and summarise participant comments.
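Once a study's observations and measures are logged in a workbook, they can also be pulled into other tools for further analysis. Here is a minimal sketch in Python using pandas; the file name, sheet name, and column names are hypothetical, so adjust them to match your own datalogger's layout:

```python
import pandas as pd  # reading .xlsx files also requires the openpyxl package

# Hypothetical layout: one row per participant per task, with columns
# 'participant', 'task', 'completed' (1 or 0) and 'time_on_task_secs'.
log = pd.read_excel("datalogger.xlsx", sheet_name="TaskData")

# Completion rate and mean time-on-task per task, plus the sample size.
summary = log.groupby("task").agg(
    completion_rate=("completed", "mean"),
    mean_time_secs=("time_on_task_secs", "mean"),
    participants=("participant", "nunique"),
)
print(summary)
```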

About the author

Todd Zazelenchuk

Dr. Todd Zazelenchuk (@ToddZazelenchuk on Twitter) holds a BSc in Geography, a BEd, an MSc in Educational Technology and a PhD in Instructional Design. Todd is an associate of Userfocus and works in product design at Plantronics in Santa Cruz, CA where he designs integrated mobile, web, and client-based software applications that enhance the user experience with Plantronics' hardware devices.


