Noor
An AI copilot that drafts the reply before you've finished reading the thread.
Cut median time-to-first-reply in user tests from ~4 minutes to under 40 seconds, with humans keeping ~70% of the draft.
The bet
The bottleneck was never typing. It was context.
Every "AI for support" demo writes a reply from a single message. Real reps don't work from a single message — they work from a thread, an account history, three tabs, and a half-remembered Slack decision. The blank box isn't the problem. Gathering the context to fill it is.
Noor's bet: if the copilot does the gathering — reads the whole thread, retrieves the account's history and the relevant docs, and grounds a draft in all of it — the human's job shrinks from author to editor. Editing a good draft is a different, faster cognitive task than writing one.
What I shipped
Scope held tight on purpose
- Thread-aware drafting: the model reads the full conversation, not just the last message.
- Grounded retrieval over the team's docs and past resolved tickets, with the sources shown inline so the rep can trust or reject them.
- An "edit, don't accept" UX — the draft lands in an editable field, never auto-sends. The human is always the last decision.
- A lightweight eval harness so I could measure draft quality across releases instead of vibing it.
Tradeoffs
The decisions I'd defend in a review
Draft into an editable field, never auto-send
Auto-send demos beautifully and erodes trust the first time it's confidently wrong. Keeping the human as editor made the product something a team would actually turn on. Trust was the real adoption gate, not capability.
Show retrieved sources inline, even when ugly
Reps won't ship words they can't verify. Visible sources turned the copilot from a black box into a tool they could audit in two seconds — slower to read, far faster to trust.
Build the eval harness before scaling prompts
Without a way to measure draft quality, every prompt change is a guess. The harness cost a week and made every week after it honest.
What I learned
The hard part of AI products is the part that isn't AI.
The model was the easy 20%. The product was retrieval quality, the trust UX, the eval loop, and knowing which 30% of replies to *not* try to draft because getting them wrong was worse than staying silent. That last call — where to set the confidence floor — was a pure product decision the model couldn't make for me.
[PLACEHOLDER: add a real anecdote from a design partner — the moment someone said 'I'd pay for this' or the feature they asked for that you said no to.]