Incident IQ

LLM-Driven UX Metrics System

Work Delivered:

Metrics Architecture
UX → Business Alignment
System Design
LLM-Assisted Synthesis
Operational Analytics
+31% faster time-to-market

Industry

EdTech, K–12 SaaS

Headquarters

Atlanta, GA

Founded

2012

Company Size

101–250 employees

Key Markets

U.S. K–12 Districts

Growth Stage

Series B Growth Investment

Company Profile

  • Industry: SaaS (K-12 / Workflow Management)

  • Context: Scaling UX across multi-team product org

  • Constraint: No analytics layer, fragmented data sources

  • Tooling: Notion, Flow Metrics, internal scorecards

Overview

We had the data. We didn’t have the system. UX output existed across tickets, Flow metrics, and design artifacts. But there was no consistent way to answer simple questions:

  • Are we getting faster?

  • Are we stable?

  • Are we wasting effort?

The goal was to build a metrics system that connects UX work to delivery outcomes. Not a dashboard. A model.

Challenge

Three core problems:

1. Fragmented data

  • Tickets, Flow metrics, and scorecards lived in different systems

  • No shared structure or definitions

2. Misaligned metrics

  • Flow metrics (jitter, cycle time) existed

  • Scorecard metrics (efficiency, performance) existed

  • No mapping between them

3. Data model limitations

  • Period-based rows (YTD, H1, H2) prevented direct comparison

  • No backend or SQL layer

  • Only Notion formulas

Foundation

The first move was structural. Instead of comparing rows, I restructured the model to compare columns inside a row:

  • H1 Cycle Time (days)

  • H2 Cycle Time (days)

  • Jitter (YTD), H1 Jitter, H2 Jitter

This allowed direct computation of trends without needing joins or aggregation layers. That shift unlocked the entire system.
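
A minimal sketch of the column-per-period idea in Python (field names here are illustrative; the real model lived in a Notion database):

# Each record carries both periods as columns, so a trend is a single
# expression over one row -- no joins or aggregation layers required.
row = {
    "metric": "Cycle Time (days)",
    "h1": 13.2,   # first-half value
    "h2": 9.8,    # second-half value
}

change = (row["h2"] - row["h1"]) / row["h1"]
print(f"{change:+.2%}")   # -25.76%: cycle time is lower-is-better, so negative is good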

Thinking

I treated metrics like a pipeline (sketched below):

  • Raw inputs

  • Normalize by time

  • Normalize by capacity

  • Calculate change

  • Add interpretation
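
A sketch of that pipeline as composable Python functions (the names and the cutoff are illustrative, not the production logic):

def normalize_by_time(count: float, hours: float) -> float:
    return count / hours              # raw count -> rate per hour

def normalize_by_capacity(rate: float, fte: int) -> float:
    return rate / fte                 # rate per hour, per FTE

def change(new: float, old: float) -> float:
    return (new - old) / old          # relative trend between periods

def interpret(rate_per_fte: float) -> str:
    return "Low" if rate_per_fte < 0.04 else "OK"   # assumed cutoff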

LLMs were used as a thinking partner, not as a source of truth.

Process

1. Metric Selection + Mapping

Started with Flow metrics:

  • Jitter

  • Cycle Time

  • Queue Time

  • Backflow Rate

  • Tickets Completed

Mapped each to a performance outcome:

  • Workload Efficiency

  • Feature Completion Time

  • Ticket Completion Efficiency

  • UX Overall Score

Constraint applied: no metric without a decision.
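
One way to encode the mapping, sketched in Python. The specific pairings and decision prompts below are assumptions for illustration (the real mapping table lived in Notion); the point is the shape, which makes the constraint enforceable:

# flow metric -> (performance outcome, decision it informs)
# A metric that maps to no decision simply has no entry.
METRIC_MAP = {
    "Jitter":            ("Workload Efficiency",          "Rebalance assignments?"),
    "Cycle Time":        ("Feature Completion Time",      "Re-scope or re-sequence?"),
    "Queue Time":        ("Feature Completion Time",      "Clear a handoff bottleneck?"),
    "Backflow Rate":     ("Ticket Completion Efficiency", "Tighten definition of done?"),
    "Tickets Completed": ("UX Overall Score",             "Adjust capacity or intake?"),
}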

2. Metric Translation (LLM-Assisted)

LLM used to convert raw signals into meaning:

  • Jitter → workload stability

  • Backflow → quality / rework

  • Cycle Time → delivery speed

  • Queue Time → bottlenecks

Generated multiple interpretations fast, then validated against real team behavior.

3. Computation Layer

Core pattern:

(New - Old) / Old

Applied to:

  • Jitter trend

  • Cycle Time improvement

  • Backflow change

Example:

(13.2 - 9.8) / 13.2 = 0.2576 → a 25.76% improvement

This created a time-based performance model, not static reporting.
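
The same pattern in Python, using the cycle-time numbers above. Note the sign: applied literally, (New - Old) / Old is negative for a lower-is-better metric, and the write-up reports the magnitude as the improvement:

def change(new: float, old: float) -> float:
    return (new - old) / old

delta = change(9.8, 13.2)   # cycle time fell from 13.2 to 9.8 days
print(f"{delta:.4f}")       # -0.2576 -> reported as a 25.76% improvement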

4. Capacity Normalization (FTE)

Initial mistake:

Tickets / FTE

Produced meaningless outputs (11,100 tickets per FTE).

Correction:

Tickets / (FTE hours)

Baseline:

  • 3 workers

  • 40 hrs/week

  • 52 weeks

6240 total hours

Final:

111 / 6240 = 0.018 tickets/hour

Key insight: all performance metrics must be time-normalized.
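
The corrected computation end to end, using the baseline from this section:

workers, hrs_per_week, weeks = 3, 40, 52
fte_hours = workers * hrs_per_week * weeks   # 3 * 40 * 52 = 6240 hours

tickets = 111
throughput = tickets / fte_hours             # time-normalized, not per-head
print(f"{throughput:.3f} tickets/hour")      # 0.018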

5. Sprint-Level Modeling

Added execution layer:

  • 11-day sprint

  • ~88 hours

New metric:

Tickets / (FTE * sprint hours)

Now performance could be evaluated:

  • per year

  • per half

  • per sprint
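
The sprint-level variant under the same assumptions (the per-sprint ticket count here is hypothetical):

fte, sprint_hours = 3, 88                    # 11-day sprint ~= 88 working hours
sprint_capacity = fte * sprint_hours         # 264 FTE-hours per sprint

tickets_done = 5                             # hypothetical sprint output
print(f"{tickets_done / sprint_capacity:.3f} tickets/hour")   # 0.019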

6. Interpretation Layer

Raw numbers don’t drive decisions.

Added:

  • Productivity bands (Low, Moderate, High)

  • Stability signals (Low / Medium)

  • System classification (Balanced vs Bottleneck)

Example:

0.03 tickets/hour per FTE → Low Productivity
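
The banding logic as a sketch. The 0.03 → Low example comes from the case study; the band edges themselves are assumptions for illustration:

def productivity_band(tickets_per_hour_per_fte: float) -> str:
    if tickets_per_hour_per_fte < 0.04:      # assumed cutoff
        return "Low"
    if tickets_per_hour_per_fte < 0.08:      # assumed cutoff
        return "Moderate"
    return "High"

print(productivity_band(0.03))               # "Low", matching the example above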

LLM + Notion System

What the LLM Did

  • Synthesized metric definitions from raw inputs

  • Validated coverage across metrics

  • Standardized documentation

  • Generated audience-specific narratives

What Notion Did

  • Became the system of record

  • Stored:

    • metric definitions

    • mapping tables

    • formulas

    • rationale

  • Acted as a lightweight analytics layer + documentation system

Governance

Four rules kept the system honest:

  • No metric without a decision

  • Every metric has a failure mode (anti-gaming)

  • Trends over snapshots

  • Speed paired with quality

Results

Operational Impact

  • Jitter reduced by 35.39% → improved stability

  • Workload efficiency increased 48.3%

  • Strong correlations:

    • reuse rate → time saved (0.82)

    • effort → delivery time (0.77)

System Impact

  • Unified Flow metrics + UX scorecard

  • Created shared language across UX, Product, Engineering

  • Enabled repeatable monthly and quarterly reviews

Impact

This turned UX from “we did the work” into “here’s how the system is performing and where it breaks.” The system made UX measurable, comparable, and defensible.

Reflection

The biggest shift wasn’t using LLMs. It was understanding where they fit. LLMs accelerated synthesis, iteration, and clarity, but they didn’t define the system. That came from structuring the data, defining the relationships, and enforcing constraints. Final insight: metrics are not formulas — they are models of how a system behaves.

Role

Director of UX, reporting to the VP of Engineering and CTO.

Designed the full metrics architecture, defined computation logic, and used LLM-assisted iteration to build a scalable performance system inside Notion.

Related Case Studies

Scaled UX from MVP to enterprise—cut onboarding friction 30% and supported $1.3B growth.

Scalable UX
Activation 25%
Operational Clarity
$2.3B Valuation
FinTech (SaaS)

Scaled $500M+ fundraising platform from MVP to acquisition. Led UX, design ops, and trust-first donation flows that enabled scalable giving.

Nonprofit Tech
Behavioral UX
Platform Strategy
Conversion 18%
$500M Donations
FinTech (SaaS)

Scaled national EdTech access platform to 25M+ users—led UX for SSO, onboarding, and system growth.

UX Systems
Brand Strategy
Conversion Clarity
Infrastructure-Led
Engagement 20%
10K+ Districts
EdTech

A better system makes everyone’s day less stupid. Clarity pays for itself.
