How UX & UI Are Tested: The Full Process (From Lab Sensors to Cursor Data)

A practical walkthrough of modern UX/UI testing — planning, recruiting, moderated sessions, biometrics, eye tracking, and behavior analytics like cursor heatmaps.

11.01.2026 BY Jakub

Introduction

UX & UI testing isn’t a single method or a complicated science experiment. It’s simply a way to check whether people can actually use what you’ve designed.

Sometimes that means sitting in a room with a few users and watching them struggle. Other times it means looking at thousands of real sessions to see where people get stuck.

The most effective teams usually combine both approaches:

  • Qualitative testing to understand why people are confused or hesitant
  • Quantitative data to see how often those problems happen and whether fixes work

This guide walks through the full UX/UI testing process in plain terms — from moderated testing and eye tracking to analytics, heatmaps, and A/B experiments — without assuming you’re a researcher or data specialist.

User testing session with a researcher observing and notes on a tablet

The First Rule

Test a question, not a screen

Testing “this new UI” doesn’t help much. Good testing starts with a simple, concrete question you want answered.

Examples:

  • Can first-time users find pricing without help?
  • Where do people hesitate before paying?
  • Which step causes the most confusion on mobile?

The easiest way to waste time in UX testing is to collect lots of recordings without knowing what decision they should inform. Decide what you might change first — then test to support that decision.

UX researcher writing usability test questions and tasks

The Whole UX/UI Testing Process (End-to-End)

1) Define the goal, constraints, and success metrics

You need one primary outcome and a few supporting signals.

Common usability metrics (session-level):

  • Task success: completed / failed / completed with help
  • Time on task: how long it takes (watch for outliers)
  • Error rate: wrong taps, wrong paths, form mistakes
  • SEQ (Single Ease Question): “How easy was this task?”, rated on a 7-point scale right after each task
  • SUS (System Usability Scale): broader usability benchmark (10 statements rated 1–5; see the scoring sketch below)
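
SUS has a fixed scoring rule: odd-numbered statements contribute their score minus 1, even-numbered statements contribute 5 minus their score, and the total is multiplied by 2.5 to land on a 0–100 scale. Here is a minimal TypeScript sketch of that calculation (purely illustrative):

```typescript
// Minimal SUS scoring sketch. `responses` holds one participant's answers
// to the 10 standard SUS statements (each 1-5, in the standard order).
function susScore(responses: number[]): number {
  if (responses.length !== 10) {
    throw new Error("SUS expects exactly 10 responses");
  }
  const sum = responses.reduce((acc, answer, i) => {
    // Odd items (index 0, 2, ...) score as (answer - 1),
    // even items (index 1, 3, ...) score as (5 - answer).
    return acc + (i % 2 === 0 ? answer - 1 : 5 - answer);
  }, 0);
  return sum * 2.5; // scaled to 0-100
}

// Example: one participant's answers
console.log(susScore([4, 2, 5, 1, 4, 2, 5, 2, 4, 1])); // 85
```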

Common product metrics (behavior-level):

  • conversion rate, activation, retention
  • funnel drop-off per step
  • support contacts / returns / cancellations

“If you don’t define ‘success’, you’ll end up debating opinions. If you define it too narrowly, you’ll ship a ‘winner’ that harms trust.”

2) Pick the right method (the “why” and the “how many”)

Most teams combine 2–4 methods, because each one answers different questions.

Moderated usability testing (classic)

Best for: discovering friction, misunderstandings, missing expectations.

  • testing 5–8 participants per key audience segment often reveals the biggest usability issues
  • you learn language, mental models, and emotional reactions

Unmoderated remote testing

Best for: breadth, quick comparisons, and less “researcher influence”.

  • larger samples (often 20–100+) reduce noise
  • great for first-click tests, preference tests, comprehension checks

Analytics + session replay + cursor tracking

Best for: finding where to look and validating impact at scale.

  • shows patterns across thousands of sessions
  • doesn’t tell you the full “why” on its own

Experiments (A/B, multivariate, holdouts)

Best for: proving causality (did the change cause the improvement?).

Overview of UX research methods: moderated, unmoderated, analytics, experiments

3) Choose prototype fidelity (paper → clickable → coded)

The prototype should match what you’re trying to learn.

  • Paper / whiteboard: perfect for flows and information architecture (fast, cheap, honest feedback)
  • Clickable prototype (Figma, etc.): great for navigation and first-time understanding
  • Coded prototype / staging: needed for performance, real data, edge cases, accessibility, and device-specific behavior

“Wizard of Oz” testing is underrated: users interact with a realistic UI, while a human secretly powers the “AI” or complex logic behind the scenes. It lets you test value and trust before building heavy systems.

Prototype fidelity from paper sketches to clickable UI to coded prototype

4) Recruitment: who you test matters more than how many

Recruit by behavior, not demographics. Examples:

  • “Has switched email provider in the last 6 months”
  • “Books hotels for work at least once a month”
  • “Uses screen reader daily”

Also decide:

  • device and context (phone-on-the-go vs desktop-at-work)
  • incentives, NDAs, consent forms
  • exclusion criteria (avoid “professional testers” if you’re testing mainstream UX)

5) Write tasks that don’t give away the answer

Bad task: “Find the billing settings and change your plan.” Better: “You were charged for the wrong plan. What would you do?”

Good task design:

  • starts with a believable story
  • avoids button names and UI terms
  • can be completed in more than one reasonable way (so you can see real strategies)

Usability testing plan with tasks, metrics, and participant notes

The “Physical” Side

Lab Testing With Sensors, Cameras, and Eye Tracking

This is where UX research looks almost like sports science. It’s not always necessary — but it can reveal how people allocate attention, how hard a task is, and which moments create stress or confusion.

A typical lab setup (what’s being recorded)

  • Screen recording + audio (what they do, what they say)
  • Face camera (micro-reactions, confusion moments, trust signals)
  • Overhead camera (hands, device handling, posture, hesitations)
  • Eye tracking (where attention goes, what’s ignored, scan patterns)

Then, optionally:

  • EDA / GSR (skin conductance): changes in arousal (stress/effort), not “happiness”
  • Heart rate / HRV (often via PPG wrist sensors): workload and stress trends (noisy, context-dependent)
  • Pupil dilation (pupillometry): cognitive load signals (strongly affected by lighting)
  • Motion sensors (in VR/AR or physical prototypes): head/hand movement, reach, time-to-grab

An often-overlooked detail: eye tracking isn’t just “where they look”. Researchers often analyze time-to-first-fixation (how long it takes to notice a critical element) and revisits (how often people re-check something they don’t trust).

Participant wearing eye-tracking equipment during a usability test

What eye tracking can reveal (that interviews often miss)

Eye tracking is powerful when the problem is visual hierarchy, scanning, or attention. Examples:

  • users say they saw a CTA, but their gaze never landed on it
  • users “read” a form label, but their eyes bounce between two fields (ambiguity)
  • users over-scan navigation because information scent is weak

Common eye tracking outputs:

  • gaze plots (scan paths)
  • heatmaps (aggregated attention)
  • AOI metrics (areas of interest: time spent, order, revisits; see the sketch after this list)
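
The AOI metrics above, plus the time-to-first-fixation and revisits mentioned earlier, are easy to compute once you have exported fixation data. A minimal sketch, assuming records with a timestamp and an AOI label; the field names are placeholders, not a specific eye tracker’s export format.

```typescript
// Sketch of two AOI metrics from exported fixation data.
// The record shape below is an assumption, not a vendor format.
interface Fixation {
  timestampMs: number;   // time since task start
  aoi: string | null;    // area of interest hit, or null if none
}

// Time-to-first-fixation: how long until the target AOI was noticed.
function timeToFirstFixation(fixations: Fixation[], targetAoi: string): number | null {
  const first = fixations.find(f => f.aoi === targetAoi);
  return first ? first.timestampMs : null; // null = never noticed
}

// Revisits: how many times gaze re-entered the AOI after the first visit.
// Frequent revisits often signal ambiguity or distrust.
function revisitCount(fixations: Fixation[], targetAoi: string): number {
  let visits = 0;
  let inside = false;
  for (const f of fixations) {
    const hit = f.aoi === targetAoi;
    if (hit && !inside) visits++;
    inside = hit;
  }
  return Math.max(0, visits - 1);
}
```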

Limitation: looking is not understanding. Always pair with tasks + outcomes.

When biometrics help (and when they don’t)

Physiological signals can be useful for timing: they can flag “something happened here” even when the participant can’t articulate it.

But they’re rarely proof of a specific emotion. EDA spikes can mean: confusion, stress, excitement, surprise, or even a room temperature change.

Use biometrics to:

  • find moments worth replaying and discussing
  • compare workload between two versions (with careful setup)
  • validate that a “small” UI change actually reduces effort

Avoid using biometrics to:

  • label feelings (“they are happy now”)
  • justify a decision without behavioral outcomes

Lab user testing setup with cameras and biometric sensors

The “Digital Trail” Side

Cursor Data, Heatmaps, Replays, and Funnels

Not every team has a lab. But almost every product can measure real behavior at scale.

Cursor tracking and heatmaps (what they measure)

Cursor data is imperfect — but still useful. Modern tools can capture:

  • click maps: where people actually click (and miss-click)
  • scroll depth: how far people go before giving up (see the tracker sketch after this list)
  • move maps: where the cursor spends time (often correlates with attention on desktop)
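
Scroll depth in particular is cheap to capture yourself. A minimal browser sketch, assuming you own the page and report to your own endpoint (the /analytics/scroll URL is a placeholder):

```typescript
// Minimal scroll-depth tracker: records the deepest point reached as a
// percentage of page height and reports it when the tab is hidden.
// The "/analytics/scroll" endpoint is a placeholder, not a real API.
let maxDepthPct = 0;

function updateDepth(): void {
  const doc = document.documentElement;
  const seen = window.scrollY + window.innerHeight;
  const depth = (seen / doc.scrollHeight) * 100;
  maxDepthPct = Math.max(maxDepthPct, Math.min(100, depth));
}

updateDepth(); // count the initial viewport too
window.addEventListener("scroll", updateDepth, { passive: true });

document.addEventListener("visibilitychange", () => {
  if (document.visibilityState === "hidden") {
    // sendBeacon is more reliable than fetch while the page is closing
    navigator.sendBeacon("/analytics/scroll", JSON.stringify({ maxDepthPct }));
  }
});
```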

More “hidden” signals used in mature teams:

  • rage clicks: repeated clicks in frustration
  • dead clicks: clicks on non-interactive elements (false affordances)
  • u-turns: back-and-forth navigation patterns
  • hesitation time: long pauses before committing to an action

One of the most actionable metrics is “dead clicks per 100 sessions”. It’s a direct measure of the UI misleading people — and it’s often easier to fix than “conversion”.
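
If you log raw click events, these signals can be derived in a few lines. The sketch below assumes a simple event shape, and the thresholds (three clicks within one second and 30 px for a rage click) are illustrative, not standard values.

```typescript
// Illustrative click-signal detection. The event shape and thresholds
// are assumptions for this sketch, not values from a specific tool.
interface ClickRecord {
  timestampMs: number;
  x: number;
  y: number;
  targetInteractive: boolean; // did the click land on a link, button, input, ...?
}

// Rage clicks: three or more clicks within a short window and small radius.
function countRageClicks(clicks: ClickRecord[], windowMs = 1000, radiusPx = 30): number {
  const close = (p: ClickRecord, q: ClickRecord) =>
    Math.hypot(p.x - q.x, p.y - q.y) <= radiusPx;
  let rage = 0;
  for (let i = 2; i < clicks.length; i++) {
    const [a, b, c] = [clicks[i - 2], clicks[i - 1], clicks[i]];
    if (c.timestampMs - a.timestampMs <= windowMs && close(a, c) && close(b, c)) {
      rage++;
    }
  }
  return rage;
}

// Dead clicks: clicks on elements that do nothing.
function countDeadClicks(clicks: ClickRecord[]): number {
  return clicks.filter(c => !c.targetInteractive).length;
}

// "Dead clicks per 100 sessions" across many recorded sessions.
function deadClicksPer100Sessions(sessions: ClickRecord[][]): number {
  const totalDead = sessions.reduce((sum, s) => sum + countDeadClicks(s), 0);
  return (totalDead / sessions.length) * 100;
}
```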

Heatmap analytics showing clicks, scroll depth, and cursor movement

Session replay (why it’s powerful, and why it’s sensitive)

Session replay shows the full story: hesitations, loops, rage clicks, input errors, and device problems. It’s extremely effective for debugging UX — but it comes with privacy obligations.

If you use replay:

  • mask inputs by default, especially PII (see the sketch after this list)
  • limit retention
  • restrict access (research team only)
  • document consent and legal basis (GDPR/CCPA context)
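
If you build your own capture pipeline rather than buying a tool, “mask by default” can be as simple as never serializing raw input values. A minimal illustration; the payload shape is made up for this sketch.

```typescript
// Serialize form fields with values masked before anything leaves the
// browser. The payload shape is an assumption, not a replay tool's format.
interface MaskedField {
  name: string;
  maskedValue: string;
}

function collectMaskedFields(doc: Document): MaskedField[] {
  const fields = doc.querySelectorAll<HTMLInputElement | HTMLTextAreaElement>(
    "input, textarea"
  );
  return Array.from(fields).map(el => ({
    name: el.name || el.id || "unnamed",
    // Keep only the length, so replays show that "something was typed"
    // without ever transmitting the actual text.
    maskedValue: "*".repeat(el.value.length),
  }));
}

// Example: attach masked fields (never raw values) to a replay event
const replayEvent = {
  type: "form-state",
  timestamp: Date.now(),
  fields: collectMaskedFields(document),
};
```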

Funnel analytics (how teams pick what to test next)

Funnels answer:

  • where people drop off (see the drop-off sketch after this list)
  • which step is unexpectedly slow
  • which segment struggles the most (new users, mobile, certain regions)
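
Per-step drop-off is simply the share of users who reached one step but not the next. A quick sketch with made-up numbers:

```typescript
// Per-step drop-off from unique-user counts. Step names and counts
// below are made-up illustration data.
const funnel = [
  { step: "Viewed cart", users: 10_000 },
  { step: "Started checkout", users: 6_200 },
  { step: "Entered payment", users: 4_100 },
  { step: "Completed purchase", users: 3_600 },
];

for (let i = 1; i < funnel.length; i++) {
  const prev = funnel[i - 1];
  const curr = funnel[i];
  const dropOffPct = ((prev.users - curr.users) / prev.users) * 100;
  console.log(`${prev.step} -> ${curr.step}: ${dropOffPct.toFixed(1)}% drop-off`);
}
// Viewed cart -> Started checkout: 38.0% drop-off
// Started checkout -> Entered payment: 33.9% drop-off
// Entered payment -> Completed purchase: 12.2% drop-off
```

The step with the largest relative drop-off (here, cart to checkout) is usually the best candidate for replays and usability tests.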

The best workflow is:

  1. find the friction point in analytics
  2. watch replays to see patterns
  3. run usability tests to learn why
  4. iterate and validate with an experiment

A/B test dashboard comparing two UI variants and results

Experiments: A/B Testing Isn’t “Better UX” by Default

A/B tests are incredible for causality, but they can reward short-term behavior. Great teams pair A/B testing with “guardrails”:

  • long-term retention
  • refund / complaint rates
  • accessibility and error rates
  • trust proxies (drop-off after pricing, “account deletion” spikes)

“If an A/B test increases conversion but increases support tickets, you didn’t ‘win’ — you just moved the cost.”
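
One way to make that concrete is to encode guardrails in the experiment readout itself: a variant only “wins” if the primary metric improves and no guardrail regresses beyond a tolerance. The sketch below uses illustrative metrics and thresholds and skips significance testing for brevity.

```typescript
// Guardrail-aware A/B readout. Metrics and tolerances are illustrative;
// a real experiment also needs proper significance testing.
interface VariantStats {
  users: number;
  conversions: number;
  supportTickets: number;
  day30Retained: number;
}

const rate = (numerator: number, stats: VariantStats) => numerator / stats.users;

function evaluate(control: VariantStats, variant: VariantStats): string {
  const conversionLift =
    rate(variant.conversions, variant) - rate(control.conversions, control);

  // Guardrails: ship only if these do NOT get meaningfully worse.
  const ticketsWorse =
    rate(variant.supportTickets, variant) > rate(control.supportTickets, control) * 1.05;
  const retentionWorse =
    rate(variant.day30Retained, variant) < rate(control.day30Retained, control) * 0.98;

  if (conversionLift <= 0) return "No conversion win: keep control.";
  if (ticketsWorse || retentionWorse) {
    return "Conversion up, but a guardrail regressed: investigate before shipping.";
  }
  return "Conversion up and guardrails held: candidate to ship.";
}
```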

Accessibility Testing (The Most Ignored UX Superpower)

Accessibility testing isn’t a separate “compliance task” — it’s UX testing under real constraints.

Include sessions with:

  • keyboard-only navigation
  • screen readers (NVDA, VoiceOver, JAWS)
  • high zoom / reflow (200–400%)
  • reduced motion preferences
  • color contrast and color-blindness checks

Many “mysterious” usability problems disappear once a product is built to be robust and predictable.

Accessibility testing using keyboard navigation and a screen reader

A Realistic Example: One Study, Two Measurement Layers

Imagine you’re improving a checkout form.

In a moderated test (8 participants):

  • run 3–4 tasks (add item, apply discount, pay, find invoice)
  • record screen + audio
  • track: task success, time, errors, SEQ
  • optionally add eye tracking to see if key trust signals are noticed (delivery, returns, payment security)

In analytics (thousands of sessions):

  • track form error rate per field (see the sketch after this list)
  • track rage clicks / dead clicks around promo code
  • compare drop-off by device and country
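
Form error rate per field is easy to compute once validation failures are logged as events. A sketch, assuming a hypothetical event shape:

```typescript
// Form error rate per field from logged validation events.
// The event shape is an assumption for this sketch.
interface ValidationEvent {
  sessionId: string;
  field: string;     // e.g. "promoCode", "cardNumber"
  failed: boolean;   // true if this field blocked submission
}

function errorRatePerField(events: ValidationEvent[]): Map<string, number> {
  const perField = new Map<string, { total: number; failed: number }>();
  for (const e of events) {
    const entry = perField.get(e.field) ?? { total: 0, failed: 0 };
    entry.total++;
    if (e.failed) entry.failed++;
    perField.set(e.field, entry);
  }
  const rates = new Map<string, number>();
  for (const [field, { total, failed }] of perField) {
    rates.set(field, failed / total);
  }
  return rates;
}
```

A field that errors far more often than the rest (the promo code, in this example flow) usually points to a label, format, or validation problem worth a moderated test.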

The overlap between these layers gives you confidence: you know what broke, why it broke, and how often it breaks.

UX research findings translated into a prioritized design backlog

Conclusion

Modern UX/UI testing is a loop: observe → measure → fix → validate.

The “physical” side (eye tracking, sensors, cameras) can reveal attention and workload. The “digital trail” side (cursor data, replays, funnels, experiments) shows what happens at scale.

When you combine them, you don’t just ship prettier interfaces — you ship interfaces that people can actually use, trust, and understand.

Thanks for reading ✌️