Company Profile
Industry: SaaS (K-12 / Workflow Management)
Context: Scaling UX across multi-team product org
Constraint: No analytics layer, fragmented data sources
Tooling: Notion, Flow Metrics, internal scorecards
Overview
We had the data. We didn’t have the system. UX output existed across tickets, Flow metrics, and design artifacts. But there was no consistent way to answer simple questions:
Are we getting faster?
Are we stable?
Are we wasting effort?
The goal was to build a metrics system that connects UX work to delivery outcomes. Not a dashboard. A model.
Challenge
Three core problems:
1. Fragmented data
Tickets, Flow metrics, and scorecards lived in different systems
No shared structure or definitions
2. Misaligned metrics
Flow metrics (jitter, cycle time) existed
Scorecard metrics (efficiency, performance) existed
No mapping between them
3. Data model limitations
Period-based rows (YTD, H1, H2) prevented direct comparison
No backend or SQL layer
Only Notion formulas
Foundation
The first move was structural. Instead of comparing rows, I restructured the model to compare columns inside a row:
H1 Cycle Time (days)
H2 Cycle Time (days)
Jitter (YTD), H1 Jitter, H2 Jitter
This allowed direct computation of trends without needing joins or aggregation layers. That shift unlocked the entire system.
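As a minimal sketch (column names and values here are illustrative, not the real schema), the columns-in-a-row structure makes a trend a single formula instead of a join:

```python
# One Notion-style row carrying both periods as columns.
# Values are illustrative.
row = {
    "metric": "Cycle Time (days)",
    "h1": 12.4,
    "h2": 8.9,
}

def trend_pct(row):
    """Percent change from H1 to H2, computed inside a single row."""
    return (row["h2"] - row["h1"]) / row["h1"] * 100

result = trend_pct(row)  # negative change = faster cycle time
```

Because both periods sit in the same row, no aggregation layer or SQL join is needed, which is exactly the constraint Notion imposes.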
Thinking
I treated metrics like a pipeline:
Raw inputs
Normalize by time
Normalize by capacity
Calculate change
Add interpretation
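The five stages can be sketched as a chain of small functions; the names, values, and thresholds below are illustrative assumptions, not the production formulas:

```python
def run_pipeline(raw_h1, raw_h2, capacity_hours):
    """Toy version of the five-stage metrics pipeline."""
    # Stage 1: raw inputs (ticket counts per half)
    # Stages 2-3: normalize by time window and by capacity (hours)
    rate_h1 = raw_h1 / capacity_hours
    rate_h2 = raw_h2 / capacity_hours
    # Stage 4: calculate change
    change = (rate_h2 - rate_h1) / rate_h1
    # Stage 5: add interpretation
    label = "improving" if change > 0 else "flat or declining"
    return change, label

change, label = run_pipeline(140, 190, 3120)  # 3,120 h = half a year of capacity
```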
LLMs were used as thinking partners, not a source of truth.
Process
1. Metric Selection + Mapping
Started with Flow metrics:
Jitter
Cycle Time
Queue Time
Backflow Rate
Tickets Completed
Mapped each to a performance outcome:
Workload Efficiency
Feature Completion Time
Ticket Completion Efficiency
UX Overall Score
Constraint applied: no metric without a decision.
2. Metric Translation (LLM-Assisted)
LLM used to convert raw signals into meaning:
Jitter → workload stability
Backflow → quality / rework
Cycle Time → delivery speed
Queue Time → bottlenecks
Generated multiple interpretations fast, then validated against real team behavior.
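The translation table above can live as a simple lookup (a sketch; the real definitions were maintained in Notion):

```python
# Signal-to-meaning map, straight from the translations above.
SIGNAL_MEANING = {
    "jitter": "workload stability",
    "backflow": "quality / rework",
    "cycle_time": "delivery speed",
    "queue_time": "bottlenecks",
}

def explain(metric: str) -> str:
    # An unmapped metric surfaces the governance rule:
    # no metric without a decision.
    return SIGNAL_MEANING.get(metric, "unmapped: needs a decision first")
```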
3. Computation Layer
Core pattern: half-over-half change, computed directly between a row's H1 and H2 columns.
Applied to:
Jitter trend
Cycle Time improvement
Backflow change
This created a time-based performance model, not static reporting.
4. Capacity Normalization (FTE)
Initial mistake: an unnormalized calculation produced meaningless outputs (11,100 tickets per FTE).
Correction: normalize against available working hours.
Baseline:
3 workers
40 hrs/week
52 weeks
= 6,240 available hours per year
Key insight: all performance metrics must be time-normalized.
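A sketch of the corrected normalization, using the baseline above (the ticket count in the usage note is illustrative):

```python
WORKERS = 3
HOURS_PER_WEEK = 40
WEEKS_PER_YEAR = 52

# Total available capacity for the year: 3 * 40 * 52
annual_capacity_hours = WORKERS * HOURS_PER_WEEK * WEEKS_PER_YEAR

def tickets_per_capacity_hour(tickets_completed: int) -> float:
    """Throughput normalized by available working hours, not headcount."""
    return tickets_completed / annual_capacity_hours
```

For example, 312 tickets over the year comes out to 0.05 tickets per capacity hour, a rate that stays comparable across periods, unlike the raw 11,100-per-FTE figure.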
5. Sprint-Level Modeling
Added execution layer:
11-day sprint
~88 hours
A new per-sprint metric was added, so performance could now be evaluated:
per year
per half
per sprint
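The same normalization scoped to one sprint; hours per day and team size are assumptions for the sketch:

```python
SPRINT_DAYS = 11
HOURS_PER_DAY = 8                            # assumed working hours per day
sprint_hours = SPRINT_DAYS * HOURS_PER_DAY   # ~88 hours per person per sprint

def per_sprint_rate(tickets: int, team_size: int = 3) -> float:
    """Tickets completed per available team hour in a single sprint."""
    return tickets / (team_size * sprint_hours)
```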
6. Interpretation Layer
Raw numbers don’t drive decisions.
Added:
Productivity bands (Low, Moderate, High)
Stability signals (Low / Medium)
System classification (Balanced vs Bottleneck)
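A minimal sketch of how the bands and classifications could be encoded; the thresholds are invented for illustration, only the band names come from the system itself:

```python
def productivity_band(rate: float) -> str:
    """Map a normalized throughput rate to a band (placeholder cutoffs)."""
    if rate < 0.03:
        return "Low"
    if rate < 0.06:
        return "Moderate"
    return "High"

def system_state(queue_time_trend_pct: float) -> str:
    """Rising queue time reads as a bottleneck forming."""
    return "Bottleneck" if queue_time_trend_pct > 0 else "Balanced"
```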
LLM + Notion System
What the LLM Did
Synthesized metric definitions from raw inputs
Validated coverage across metrics
Standardized documentation
Generated audience-specific narratives
What Notion Did
Became the system of record
Stored:
metric definitions
mapping tables
formulas
rationale
Acted as a lightweight analytics layer + documentation system
Governance
Four rules kept the system honest:
No metric without a decision
Every metric has a failure mode (anti-gaming)
Trends over snapshots
Speed paired with quality
Results
Operational Impact
Jitter reduced by 35.39% → improved stability
Workload efficiency increased 48.3%
Strong correlations:
reuse rate → time saved (r = 0.82)
effort → delivery time (r = 0.77)
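Correlations like these can be computed with a plain Pearson coefficient; the sketch below uses fabricated toy data purely to show the mechanics, not the team's real numbers:

```python
import statistics

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Perfectly linear toy data correlates at 1.0
r = pearson([1, 2, 3, 4], [2.0, 4.0, 6.0, 8.0])
```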
System Impact
Unified Flow metrics + UX scorecard
Created shared language across UX, Product, Engineering
Enabled repeatable monthly and quarterly reviews


