
How to Perform a Usability Evaluation

A usability evaluation tells you how well visitors can find what they need on your site, complete the tasks you want them to complete, and how they feel about the process. Done well, it surfaces the friction points that bounce people back to a search results page — before those bounces show up in your analytics.

This guide walks through the five usability evaluation methods worth knowing in 2026 — heuristic evaluation, usability testing, focus groups, card sorting, and first-click testing — plus how many users you actually need, how to recruit them, and which tools handle the work today.


What is Usability?

Usability is the degree to which your site is effective, efficient, and satisfying for the people who use it. The ISO standard (ISO 9241-11) breaks it into those three dimensions explicitly. Usability is not a new concept — Jakob Nielsen and others codified the discipline in the 1990s, the ISO definition has been around since 1998, and the field is older than most teams realize. What has changed is the variety of methods and tools available to evaluate it.

A useful informal definition: usability is the combination of ease of learning, intuitive design, efficiency of use, satisfaction, memorability, and the rate and severity of errors users encounter. For a longer foundation, the Nielsen Norman Group’s Usability 101 remains the canonical primer.

Why Usability Evaluations Matter

You have probably hit a site where the layout made no sense, the content you were looking for was somewhere on the site but not findable, and a few stray redirects sent you in circles. The next click went back to Google, and you ended up on a competitor’s page.

Usability evaluations tell you where your design, categorization, and layout fall short before that pattern shows up in your bounce rate. They give you a concrete picture of how the people you actually want to reach navigate your site, which lets you prioritize fixes by impact rather than guesswork. Done early in a project, evaluations are far cheaper than retroactive redesigns. For a related framing, see our overview of 18 usability guidelines and website design standards.

The Five Methods Worth Knowing

A complete usability evaluation usually combines several of these:

  1. Heuristic evaluation — experts review the site against known usability principles. Cheap, fast, catches obvious issues without participants.
  2. Usability testing — real users attempt tasks while you observe. The single highest-yield method for finding actual user problems.
  3. Focus groups — moderated discussions to understand attitudes and expectations.
  4. Card sorting — users group your content the way they think about it; informs information architecture.
  5. First-click testing — measures whether users click the right thing first when given a goal.

The rest of this guide covers each in turn, then how to recruit and how many participants you need.

Heuristic Evaluation

A heuristic evaluation is a structured expert review of your site against an established set of usability principles. The most widely used set is Jakob Nielsen’s 10 usability heuristics: visibility of system status, match between the system and the real world, user control and freedom, consistency and standards, error prevention, recognition over recall, flexibility and efficiency of use, aesthetic and minimalist design, helping users recognize and recover from errors, and help and documentation.

Three to five evaluators independently review the site, each writes up issues against the heuristics, and the team consolidates findings into a single ranked list. Heuristic evaluation is fast, requires no participants, and tends to surface a different class of problems than user testing — together they give you the most complete picture.
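
If your evaluators log findings in a spreadsheet or script, the consolidation step is easy to automate. Below is a minimal Python sketch; the tuple format and the 0-4 severity scale (a common convention from Nielsen’s severity-rating work) are illustrative assumptions, not a format any tool enforces.

```python
# Consolidating heuristic-evaluation findings into one ranked list.
from collections import defaultdict
from statistics import mean

# Each evaluator logs (issue_id, heuristic, severity 0-4).
findings = [
    ("checkout-status", "visibility of system status", 3),
    ("checkout-status", "visibility of system status", 4),
    ("jargon-labels", "match with the real world", 2),
]

by_issue = defaultdict(list)
for issue_id, heuristic, severity in findings:
    by_issue[(issue_id, heuristic)].append(severity)

# Rank by average severity, then by how many evaluators
# independently reported the same issue.
ranked = sorted(by_issue.items(),
                key=lambda kv: (mean(kv[1]), len(kv[1])),
                reverse=True)
for (issue_id, heuristic), sev in ranked:
    print(f"{issue_id} [{heuristic}]: severity {mean(sev):.1f}, "
          f"reported by {len(sev)} evaluator(s)")
```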

Usability Testing


In a usability test, you give participants representative tasks and watch them attempt those tasks on your site. Observers note where users hesitate, get lost, or give up. Three things to measure on every task:

  • Effectiveness — did the user complete the task?
  • Efficiency — how long did it take, and how many steps?
  • Satisfaction — how did the user feel about the experience?

Effectiveness and efficiency come from observation; satisfaction comes from short post-task questions or a standard questionnaire (see SUS below). Tests can be moderated (you watch live and ask follow-up questions) or unmoderated (a tool records sessions for later review).
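
The roll-up math is simple enough to sketch in a few lines. The Python example below computes all three measures for one task from per-session records; the field names and data are invented for illustration, and real testing tools export richer session data.

```python
# Rolling up effectiveness, efficiency, and satisfaction for one task.
from statistics import median

sessions = [
    {"completed": True,  "seconds": 95,  "post_task_rating": 4},
    {"completed": True,  "seconds": 140, "post_task_rating": 3},
    {"completed": False, "seconds": 300, "post_task_rating": 1},
]

successes = [s for s in sessions if s["completed"]]
effectiveness = len(successes) / len(sessions)        # task success rate
efficiency = median(s["seconds"] for s in successes)  # time on task, successes only
satisfaction = sum(s["post_task_rating"] for s in sessions) / len(sessions)

print(f"success rate: {effectiveness:.0%}")
print(f"median time on successful attempts: {efficiency:.0f}s")
print(f"mean post-task rating (1-5): {satisfaction:.1f}")
```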

For a tool roundup, see 16 Usability Testing Tools for Optimizing User Experience.

Focus Groups


A focus group is a moderated discussion among five to ten participants about how they perceive and react to your site. The key difference from usability testing: in a usability test you watch what people do; in a focus group you hear what they say. Both signal types matter, and they catch different problems.

Focus groups work best as a follow-up to usability testing. Participants who have just completed tasks have the experience fresh in their minds and can articulate frustrations and wins concretely. Run the session from a script of open-ended questions designed to encourage discussion rather than a Q&A. Record video and audio for later analysis.

Remote focus groups run well over Zoom, Microsoft Teams, or Google Meet — note that Skype was retired by Microsoft in May 2025, so older guides recommending it are out of date. Pick whichever platform your participants are most comfortable with, since unfamiliar tooling biases the discussion.

For more depth, see Focus Groups: Everything You Need to Know.

Card Sorting


In a card sorting session, participants are given a list of content topics and asked to organize them into groups that make sense to them. The output drives your information architecture: how content should be categorized, what your top-level navigation should look like, and which pages naturally cluster.

Two variants:

  • Open card sort — participants name the categories themselves. Best for greenfield IA work, when you do not yet have category labels.
  • Closed card sort — categories are fixed; participants drop topics into them. Best for validating an existing IA proposal.

A common pattern is to run an open sort first, identify recurring category labels, then run a closed sort to validate them. Combine card sorting with a discussion of how navigation differs from information architecture if you find your team conflating the two.
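
The standard analysis for an open sort is a co-occurrence count: for each pair of cards, how many participants grouped them together. A minimal Python sketch with made-up sort data:

```python
# Co-occurrence analysis for open card sorts: pairs of cards that
# most participants group together are candidates for living under
# the same navigation category.
from itertools import combinations
from collections import Counter

# Each participant's sort: group label -> cards placed in that group.
sorts = [
    {"billing": ["invoices", "payment methods"], "account": ["profile", "password"]},
    {"money": ["invoices", "payment methods", "profile"], "security": ["password"]},
]

pair_counts = Counter()
for sort in sorts:
    for group in sort.values():
        for a, b in combinations(sorted(group), 2):
            pair_counts[(a, b)] += 1

for (a, b), n in pair_counts.most_common():
    print(f"{a} + {b}: grouped together by {n}/{len(sorts)} participants")
```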

For tools, see 10 Card Sorting Tools for Surveying Information Architecture (IA).

First-Click Testing


First-click testing measures one specific behavior: where do users click first when trying to complete a goal? Research from the Nielsen Norman Group and others has shown that users who click the right thing first are roughly twice as likely to complete the task overall, which makes the first click an unusually high-leverage signal.

Set up by giving each participant a question (“Where would you go to update your billing address?”) and recording where they click on a static screenshot or live page. As the moderator, you should already know the correct path; compare your participants’ first clicks against it. If many participants click somewhere else, your IA or labeling has a problem.

First-click testing pairs naturally with card sorting — run a card sort to fix categorization, then a first-click test on the new layout to confirm the fix works.
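
Scoring is straightforward: compare each participant’s first click against the known-correct element, then look at where the misses went. A small Python sketch with hypothetical click data:

```python
# Scoring a first-click test. The target element and click log are
# hypothetical; tools like Lyssna report this directly.
from collections import Counter

correct_target = "account-settings"  # the element on the known-correct path

first_clicks = [
    "account-settings", "billing-tab", "account-settings",
    "help-center", "account-settings",
]

accuracy = first_clicks.count(correct_target) / len(first_clicks)
print(f"first-click accuracy: {accuracy:.0%}")

# Where the wrong clicks landed tells you what to relabel.
for element, n in Counter(c for c in first_clicks if c != correct_target).most_common():
    print(f"misdirected clicks on {element}: {n}")
```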

Moderated vs Unmoderated Remote Testing

Most usability evaluations run remote in 2026. The choice is between moderated (you watch live, can ask follow-up questions, sessions are scheduled one at a time) and unmoderated (participants follow a script on their own, sessions are recorded and reviewed later, you can run dozens in parallel).

  • Moderated is better when the questions are open-ended, when participant behavior needs to be probed, or when you are early in a design and the right follow-up questions are not yet obvious.
  • Unmoderated is better when tasks are well-defined, when you need quantitative data across many participants, and when scheduling overhead would otherwise dominate the project timeline.

Most modern projects mix the two — a small moderated round to identify open questions, then a larger unmoderated round to confirm patterns at scale.

How Many Users Do You Need?

Probably fewer than you think. The Nielsen Norman Group’s research shows that five participants will surface roughly 85% of usability issues in a moderated qualitative test. After five, you hit diminishing returns — each additional participant finds fewer new problems while costing the same.
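
The five-user figure falls out of the Nielsen-Landauer problem-discovery model: if a single participant has probability L of encountering any given issue (about 0.31 in the original data), then n participants together uncover 1 - (1 - L)^n of the issues. A quick Python check of the curve:

```python
# Nielsen-Landauer problem-discovery curve: issues found as a
# function of participant count, with L = 0.31 from NN/g's data.
L = 0.31  # average per-participant discovery rate

for n in range(1, 11):
    found = 1 - (1 - L) ** n
    print(f"{n:2d} participants -> ~{found:.0%} of issues found")
# Five participants land near 85%; participants 6-10 add little.
```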

The five-user rule applies to qualitative usability testing of a single user group. If you need quantitative data (success rates, average completion times, statistical significance), plan on 20-40 participants per condition. If you have multiple distinct user groups, run five-person rounds with each group separately.

The trap teams fall into is recruiting 30 participants for what is fundamentally a qualitative study, then over-investing in a single round. Three or four small iterative rounds beat one large round almost every time.

System Usability Scale: Quantifying What You Find

The System Usability Scale (SUS) is a 10-question questionnaire that produces a single 0-100 usability score. It is the de facto standard for putting a number on usability, and is useful for tracking changes across redesigns or benchmarking against industry norms (a SUS score of 68 is roughly average; above 80 is considered excellent). For the questionnaire and scoring instructions, see usability.gov’s SUS documentation.
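
Scoring is mechanical and worth getting exactly right: odd-numbered items contribute (answer - 1), even-numbered items contribute (5 - answer), and the raw 0-40 sum is multiplied by 2.5. In Python:

```python
# Standard SUS scoring (Brooke, 1996): ten items answered on a 1-5
# scale, odd items positively worded, even items negatively worded.
def sus_score(responses):
    """responses: list of ten 1-5 answers, in questionnaire order."""
    if len(responses) != 10 or not all(1 <= r <= 5 for r in responses):
        raise ValueError("SUS needs ten answers on a 1-5 scale")
    total = 0
    for i, r in enumerate(responses):
        # Odd-numbered items (1st, 3rd, ...): answer minus 1.
        # Even-numbered items: 5 minus answer.
        total += (r - 1) if i % 2 == 0 else (5 - r)
    return total * 2.5  # scale the 0-40 raw sum to 0-100

print(sus_score([4, 2, 4, 1, 5, 2, 4, 2, 4, 2]))  # -> 80.0
```

Report the mean across participants; a single respondent’s SUS score is too noisy to act on by itself.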

Recruiting Participants


The most important screening criterion is behavior, not demographics. Recruit people whose actual goals and tasks match the ones your site supports — someone who has shopped for office furniture in the last 90 days is a better participant for a furniture-site test than someone who matches the right age bracket but has never bought furniture.

A practical screener focuses on:

  • Task experience — has the participant done the kind of thing your site asks them to do?
  • Goals and motivations — do they have a real need that maps to what you are testing?
  • Tool familiarity — what platforms and devices do they currently use? Beware testing only with people who already use your category of product.
  • Accessibility needs — recruit participants who use assistive technology if your site has any meaningful audience using it.

Demographics like age, language, and region matter as secondary screeners — useful for ensuring your sample is not skewed, but not the primary recruitment lens. For a fuller framing of who to recruit and why, see how personas can help your content strategy.
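
If you collect screener answers through a form tool, a few lines of code can pre-filter candidates against the criteria above. The field names and thresholds below are hypothetical, purely to show the shape of a behavior-first screener:

```python
# A hypothetical behavior-first screener filter. Field names and the
# 90-day recency threshold are illustrative assumptions, not a
# standard from any recruiting tool.
def qualifies(candidate):
    return (
        candidate.get("days_since_relevant_task", 999) <= 90  # task experience
        and candidate.get("has_real_need", False)             # goals and motivations
        and candidate.get("device") in {"desktop", "mobile"}  # tool familiarity
    )

candidates = [
    {"name": "A", "days_since_relevant_task": 30, "has_real_need": True, "device": "mobile"},
    {"name": "B", "days_since_relevant_task": 400, "has_real_need": True, "device": "desktop"},
]
print([c["name"] for c in candidates if qualifies(c)])  # -> ['A']
```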

Where to find participants:

  • Existing customers — usually the highest-quality match, often willing to participate for a modest incentive
  • On-site recruitment widgets that pop up to qualifying visitors
  • Recruiting panels run by tools like UserTesting, Maze, and Lyssna (covered below)
  • Targeted social-media or LinkedIn outreach for niche audiences

Offer a fair incentive: gift cards in the $50-100 range for a 30-60 minute session are typical in 2026. Unpaid or trial-basis sessions tend to attract participants who do not match your real audience.

Modern Tools for Usability Evaluation

The market consolidated around a handful of options in the 2020s. The right tool depends on whether you need moderated or unmoderated testing and how much recruiting you need to outsource:

  • UserTesting — large recruitment panel, both moderated and unmoderated, AI-assisted session summarization. Best for teams that need participants delivered turnkey.
  • Maze — unmoderated tests on prototypes and live sites with strong quantitative reporting (success rate, time on task, heatmaps). Common for design-team workflows.
  • Lyssna (formerly UsabilityHub) — first-click tests, five-second tests, card sorting, surveys. Cheap, fast, good for quick iteration.
  • Hotjar — heatmaps, session recordings, and on-site surveys. Different category from the others; complements rather than replaces moderated testing.
  • Lookback, Optimal Workshop, and PlaybookUX round out the field for specific use cases (live moderated, IA research, and B2B research respectively).

AI-assisted features — auto-summarization of recorded sessions, sentiment analysis of think-aloud transcripts, and anomaly detection — are now standard in the major tools. They speed up review but should not replace watching at least a couple of full sessions yourself; nuances get lost in summary.

For broader UX tool coverage including AB testing platforms (which complement usability testing for quantitative validation), see our roundup of AB testing tools.

Usability and Accessibility

A site that fails for screen reader users, keyboard-only users, or users with low vision is not usable; it just fails for a smaller, less-visible group. Bake accessibility into every usability evaluation by including participants who use assistive technology, testing with keyboard navigation, and validating against WCAG 2.2. Our piece on how to involve users in web accessibility testing covers the recruiting and methodology specifics. Treating usability and accessibility as the same workstream rather than parallel ones is one of the highest-leverage practice changes a UX team can make.

Frequently Asked Questions

What is usability evaluation?

The structured process of measuring how effective, efficient, and satisfying a website is for its users. It typically combines methods like heuristic evaluation, usability testing, card sorting, and first-click testing.

What is the difference between usability testing and usability evaluation?

Usability evaluation is the broader category. Usability testing is one method within it — real users attempt tasks while you observe. A complete evaluation usually includes testing plus other methods.

How many users do I need?

Five participants surface around 85% of issues in qualitative testing of a single user group. For quantitative measurements, plan on 20-40 per condition.

What is heuristic evaluation?

An expert review of an interface against established usability principles, most commonly Jakob Nielsen’s 10 heuristics. It is fast, cheap, and requires no participants.

How is usability evaluation different from accessibility testing?

Accessibility testing checks whether the site works for users with disabilities — typically against WCAG. Usability evaluation measures broader effectiveness for any user. The two should be combined.

Bottom Line

Usability evaluation is not a single test — it is a stack of methods that catch different classes of problems. Heuristic evaluation finds principle violations cheaply. Usability testing finds the specific friction real users hit. Card sorting and first-click testing validate that your information architecture matches how users think. SUS and quantitative testing turn what you find into something you can track over time.

Run early, run small, run often. Five users beats fifty in most cases. Bake accessibility into every round. And avoid the trap of substituting analytics for evaluation — pageviews tell you what people did, but evaluation tells you why.
