Company Profile
Industry: SaaS (K-12 / Workflow Management)
Context: Scaling UX across multi-team product org
Constraint: No analytics layer, fragmented data sources
Tooling: Notion, Flow Metrics, internal scorecards
Overview
We had the data. We didn’t have the system. UX output existed across tickets, Flow metrics, and design artifacts. But there was no consistent way to answer simple questions:
Are we getting faster?
Are we stable?
Are we wasting effort?
The goal was to build a metrics system that connects UX work to delivery outcomes. Not a dashboard. A model.
Challenge
Three core problems:
1. Fragmented data
Tickets, Flow metrics, and scorecards lived in different systems
No shared structure or definitions
2. Misaligned metrics
Flow metrics (jitter, cycle time) existed
Scorecard metrics (efficiency, performance) existed
No mapping between them
3. Data model limitations
Period-based rows (YTD, H1, H2) prevented direct comparison
No backend or SQL layer
Only Notion formulas
Foundation
The first move was structural. Instead of comparing rows, I restructured the model to compare columns inside a row:
H1 Cycle Time (days)
H2 Cycle Time (days)
Jitter (YTD), H1 Jitter, H2 Jitter
This allowed direct computation of trends without needing joins or aggregation layers. That shift unlocked the entire system.
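As a minimal sketch (column names and values here are illustrative, not the real schema), the columns-in-a-row structure makes a trend a single formula instead of a join:

```python
# One Notion-style row carrying both periods as columns.
# Values are illustrative.
row = {
    "metric": "Cycle Time (days)",
    "h1": 12.4,
    "h2": 8.9,
}

def trend_pct(row):
    """Percent change from H1 to H2, computed inside a single row."""
    return (row["h2"] - row["h1"]) / row["h1"] * 100

result = trend_pct(row)  # negative change = faster cycle time
```

Because both periods sit in the same row, no aggregation layer or SQL join is needed, which is exactly the constraint Notion imposes.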
Thinking
I treated metrics like a pipeline:
Raw inputs
Normalize by time
Normalize by capacity
Calculate change
Add interpretation
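The five stages can be sketched as a chain of small functions; the names, values, and thresholds below are illustrative assumptions, not the production formulas:

```python
def run_pipeline(raw_h1, raw_h2, capacity_hours):
    """Toy version of the five-stage metrics pipeline."""
    # Stage 1: raw inputs (ticket counts per half)
    # Stages 2-3: normalize by time window and by capacity (hours)
    rate_h1 = raw_h1 / capacity_hours
    rate_h2 = raw_h2 / capacity_hours
    # Stage 4: calculate change
    change = (rate_h2 - rate_h1) / rate_h1
    # Stage 5: add interpretation
    label = "improving" if change > 0 else "flat or declining"
    return change, label

change, label = run_pipeline(140, 190, 3120)  # 3,120 h = half a year of capacity
```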
LLMs were used as thinking partners, not a source of truth.
Process
1. Metric Selection + Mapping
Started with Flow metrics:
Jitter
Cycle Time
Queue Time
Backflow Rate
Tickets Completed
Mapped each to a performance outcome:
Workload Efficiency
Feature Completion Time
Ticket Completion Efficiency
UX Overall Score
Constraint applied: no metric without a decision.
2. Metric Translation (LLM-Assisted)
LLM used to convert raw signals into meaning:
Jitter → workload stability
Backflow → quality / rework
Cycle Time → delivery speed
Queue Time → bottlenecks
Generated multiple interpretations fast, then validated against real team behavior.
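The translation table above can live as a simple lookup (a sketch; the real definitions were maintained in Notion):

```python
# Signal-to-meaning map, straight from the translations above.
SIGNAL_MEANING = {
    "jitter": "workload stability",
    "backflow": "quality / rework",
    "cycle_time": "delivery speed",
    "queue_time": "bottlenecks",
}

def explain(metric: str) -> str:
    # An unmapped metric surfaces the governance rule:
    # no metric without a decision.
    return SIGNAL_MEANING.get(metric, "unmapped: needs a decision first")
```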
3. Computation Layer
Core pattern: half-over-half change, computed directly between a row's H1 and H2 columns.
Applied to:
Jitter trend
Cycle Time improvement
Backflow change
This created a time-based performance model, not static reporting.
4. Capacity Normalization (FTE)
Initial mistake: an unnormalized calculation produced meaningless outputs (11,100 tickets per FTE).
Correction: normalize against available working hours.
Baseline:
3 workers
40 hrs/week
52 weeks
= 6,240 available hours per year
Key insight: all performance metrics must be time-normalized.
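A sketch of the corrected normalization, using the baseline above (the ticket count in the usage note is illustrative):

```python
WORKERS = 3
HOURS_PER_WEEK = 40
WEEKS_PER_YEAR = 52

# Total available capacity for the year: 3 * 40 * 52
annual_capacity_hours = WORKERS * HOURS_PER_WEEK * WEEKS_PER_YEAR

def tickets_per_capacity_hour(tickets_completed: int) -> float:
    """Throughput normalized by available working hours, not headcount."""
    return tickets_completed / annual_capacity_hours
```

For example, 312 tickets over the year comes out to 0.05 tickets per capacity hour, a rate that stays comparable across periods, unlike the raw 11,100-per-FTE figure.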
5. Sprint-Level Modeling
Added execution layer:
11-day sprint
~88 hours
A new per-sprint metric was added, so performance could now be evaluated:
per year
per half
per sprint
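The same normalization scoped to one sprint; hours per day and team size are assumptions for the sketch:

```python
SPRINT_DAYS = 11
HOURS_PER_DAY = 8                            # assumed working hours per day
sprint_hours = SPRINT_DAYS * HOURS_PER_DAY   # ~88 hours per person per sprint

def per_sprint_rate(tickets: int, team_size: int = 3) -> float:
    """Tickets completed per available team hour in a single sprint."""
    return tickets / (team_size * sprint_hours)
```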
6. Interpretation Layer
Raw numbers don’t drive decisions.
Added:
Productivity bands (Low, Moderate, High)
Stability signals (Low / Medium)
System classification (Balanced vs Bottleneck)
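A minimal sketch of how the bands and classifications could be encoded; the thresholds are invented for illustration, only the band names come from the system itself:

```python
def productivity_band(rate: float) -> str:
    """Map a normalized throughput rate to a band (placeholder cutoffs)."""
    if rate < 0.03:
        return "Low"
    if rate < 0.06:
        return "Moderate"
    return "High"

def system_state(queue_time_trend_pct: float) -> str:
    """Rising queue time reads as a bottleneck forming."""
    return "Bottleneck" if queue_time_trend_pct > 0 else "Balanced"
```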
LLM + Notion System
What the LLM Did
Synthesized metric definitions from raw inputs
Validated coverage across metrics
Standardized documentation
Generated audience-specific narratives
What Notion Did
Became the system of record
Stored:
metric definitions
mapping tables
formulas
rationale
Acted as a lightweight analytics layer + documentation system
Governance
Four rules kept the system honest:
No metric without a decision
Every metric has a failure mode (anti-gaming)
Trends over snapshots
Speed paired with quality
Results
Operational Impact
Jitter reduced by 35.39% → improved stability
Workload efficiency increased 48.3%
Strong correlations:
reuse rate → time saved (r = 0.82)
effort → delivery time (r = 0.77)
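Correlations like these can be computed with a plain Pearson coefficient; the sketch below uses fabricated toy data purely to show the mechanics, not the team's real numbers:

```python
import statistics

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Perfectly linear toy data correlates at 1.0
r = pearson([1, 2, 3, 4], [2.0, 4.0, 6.0, 8.0])
```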
System Impact
Unified Flow metrics + UX scorecard
Created shared language across UX, Product, Engineering
Enabled repeatable monthly and quarterly reviews


