July 1, 2026 · 5 min read

Anatomy of a 95 and a 45

Same task. Two prompts. A fifty-point gap. Not because one writer is smarter — because one of them thought about what the model actually needs, and the other typed what they'd say to a coworker.

This is the walkthrough we wish we'd had when we started building Player Cue. Below is a real scenario, two prompts written for it, the outputs each one produced, and a line-by-line pass at why the rubric scored them the way it did.

Nothing here is exotic. The low-scoring prompt is what almost everyone would write on the first try. That's the point.

The task

You've written a long, rambling email to your boss explaining a technical problem and what you want to do about it. Before you hit send, you decide to ask AI to tighten it up. Here's what you wrote:

Hey Sarah,

I know you're busy so I'll try to keep this short but wanted to
flag something. So basically we've been having some issues with the
analytics dashboard — it's been really slow for a lot of users,
especially in the mornings. I looked into it a bit and it seems like
the issue is that we're running the queries against the main database
instead of a read replica, which was fine when we had less traffic
but now that we've onboarded the enterprise accounts it's causing
problems.

I think we should probably move it over. It would take maybe 2 days
of engineering time but would fix the slowness. Marcus mentioned he
could take it if we prioritize it.

Let me know what you think, happy to chat about it if that's easier.

Thanks!

The email works: it's not offensive, it contains the facts, and your boss could figure out what you want if she read it carefully. But it takes about 45 seconds to read, buries the ask at the bottom, and doesn't make it obvious what you need from her. She's going to skim it. You want to give her something she can act on in ten seconds.

Prompt A — scores 45

Here's what most people would type:

Rewrite this email to be shorter.

The output:

Hey Sarah,

Wanted to flag that the analytics dashboard has been slow, especially
in the mornings, since we onboarded the enterprise accounts. I think
we should move it to a read replica — Marcus said he could take it and
it'd be about 2 days of work. Let me know what you think!

Thanks!

At a glance this looks like a win. It's shorter. It has all the facts. It sounds friendly. If you skim it you'd say the AI did its job.

But read it as your boss. She has five minutes before her next meeting. What does she do with this email? She's not sure. It reads like a heads-up, but you actually want approval — and nothing in the email tells her that. It ends with "let me know what you think," which is a shrug, not a question. She has to write back and ask what you're actually requesting.

Specifically, the rewrite has four problems:

It still buries the ask. The proposal shows up in the middle of the email instead of at the top, where a skimmer would see it.
It doesn't tell her what to do. Is this a heads-up? An approval request? An FYI? The output leaves her guessing — just like the original did.
The filler survived. "Wanted to flag," "let me know what you think," and "thanks!" all made it into the shorter version. The model treats those as friendly, not as noise.
It kept the casual opening. "Hey Sarah" is fine between peers. For an approval request going up the chain, it undercuts the seriousness of the ask.

The model did exactly what it was asked. The problem is what it was asked to do. "Shorter" isn't a goal — it's a proxy for a goal. The real goal was "a busy exec can act on this in ten seconds." Nobody said that. So the model just... shortened it.

Prompt B — scores 95

Rewrite this email so it works for a busy executive.

The person reading it will spend ten seconds on it. They need to
know, in this order:

1. What's broken and who it affects
2. What you're proposing
3. What decision you need from them

Keep it under 80 words. Lead with the ask. Cut anything that isn't
a fact, a proposal, or a specific request — no filler phrases like
"wanted to flag" or "let me know what you think." Keep the two
concrete details: the 2-day estimate and that Marcus can own it.

Email:
Hey Sarah,

I know you're busy so I'll try to keep this short but wanted to
flag something. So basically we've been having some issues with the
analytics dashboard — it's been really slow for a lot of users,
especially in the mornings. I looked into it a bit and it seems like
the issue is that we're running the queries against the main database
instead of a read replica, which was fine when we had less traffic
but now that we've onboarded the enterprise accounts it's causing
problems.

I think we should probably move it over. It would take maybe 2 days
of engineering time but would fix the slowness. Marcus mentioned he
could take it if we prioritize it.

Let me know what you think, happy to chat about it if that's easier.

Thanks!

The output:

Sarah,

Approval request: move the analytics dashboard from the main
database to the read replica. It's been slow in the mornings since
the enterprise rollout — currently affecting all enterprise users.
Marcus can own it, ~2 days of work.

OK to prioritize this sprint?

Every sentence does work. It opens with what Sarah has to decide. It gives her the problem, who it affects, the proposed fix, the owner, and the cost. It ends with a yes/no question she can answer in a one-line reply. The filler is gone. The whole thing takes seven seconds to read.
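The constraints Prompt B states outright are concrete enough to check mechanically. Here is a minimal Python sketch of that idea. The helper name `failed_constraints`, the regex patterns, and the "ends with a question" heuristic are our own illustrative assumptions drawn from the prompt's wording, not Player Cue's actual rubric.

```python
import re

# Filler phrases Prompt B names explicitly (matched case-insensitively).
FILLER = ("wanted to flag", "let me know what you think")

# Details Prompt B says to keep: the 2-day estimate and Marcus as owner.
REQUIRED = (r"\b2[ -]days?\b", r"\bMarcus\b")


def failed_constraints(email: str, max_words: int = 80) -> list[str]:
    """Return the names of the stated constraints that `email` breaks."""
    failures = []
    # "Under 80 words" means 80 or more is a failure.
    if len(email.split()) >= max_words:
        failures.append("word limit")
    lowered = email.lower()
    if any(phrase in lowered for phrase in FILLER):
        failures.append("filler")
    if not all(re.search(pattern, email) for pattern in REQUIRED):
        failures.append("required details")
    # Rough heuristic: a specific request ends on a question.
    if not email.strip().endswith("?"):
        failures.append("closing question")
    return failures


# The two outputs from this post; "\u2014" is the dash in the original text.
OUTPUT_A = """Hey Sarah,

Wanted to flag that the analytics dashboard has been slow, especially
in the mornings, since we onboarded the enterprise accounts. I think
we should move it to a read replica \u2014 Marcus said he could take it and
it'd be about 2 days of work. Let me know what you think!

Thanks!"""

OUTPUT_B = """Sarah,

Approval request: move the analytics dashboard from the main
database to the read replica. It's been slow in the mornings since
the enterprise rollout \u2014 currently affecting all enterprise users.
Marcus can own it, ~2 days of work.

OK to prioritize this sprint?"""

print(failed_constraints(OUTPUT_A))  # ['filler', 'closing question']
print(failed_constraints(OUTPUT_B))  # []
```

Checks like these only cover what can be counted or pattern-matched. Whether the ask really comes first, or whether the tone suits the reader, still takes a human or a model-based grader to judge.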

What actually changed

The gap between these two prompts isn't effort. It's three specific decisions the second writer made that the first didn't.

1. They said who it's for.

"Shorter" is meaningless without an audience. Shorter for who? An intern who needs context? A peer who already knows the situation? A tired exec who will forget the details in an hour? Each of those calls for a different rewrite. The model has to pick one, so it picks the vaguest, most inoffensive middle path.

Prompt B says the email has to work for a busy executive who will spend ten seconds on it. Two short sentences. They do more work than any amount of tweaking on the output side. Once the model knows the audience, it stops being polite and starts being useful.

Most prompt failures aren't model failures — they're unstated-audience failures. You knew who you were writing to. The model had to guess.

2. They said what to cut, not just what to keep.

"Cut anything that isn't a fact, a proposal, or a specific request." That one sentence kills the filler. It removes "wanted to flag," "let me know what you think," and "thanks!", the phrases the first rewrite kept because the model read them as friendly, along with "happy to chat" from the original.

This is the single most under-used move in prompt writing. Most people obsess over what to include. The bigger lever, almost always, is telling the model what to leave out. Politeness, hedging, transitional phrases, restating what you just said — the model will keep all of it unless you specifically say to strip it.

3. They gave it a shape, not just a request.

"In this order: what's broken, what you're proposing, what decision you need." That's not a style note — it's a structure. Three slots. The model now has to fill each slot rather than write until it feels done.

Notice the difference in the two outputs. Prompt A's rewrite is a shorter version of the original in the same order — problem, cause, proposal, shrug. Prompt B's rewrite reorders everything so the decision is first, the context is second, and the yes/no question is last. That reordering isn't something the model would do on its own from "make it shorter." You have to ask for it.

When your task cares about the order of information, or about hitting specific beats, tell the model what those beats are. Don't hope it guesses.
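If you write prompts like this often, the three decisions can be treated as required fields in a small template. The sketch below is one way to do that in Python. The class `RewriteBrief` and its field names are hypothetical, invented for illustration, and the wording it produces simply mirrors Prompt B.

```python
from dataclasses import dataclass


@dataclass
class RewriteBrief:
    """Hypothetical container for the three decisions; not a Player Cue API."""

    audience: str      # decision 1: who the rewrite is for
    cut_rule: str      # decision 2: what to leave out
    beats: list[str]   # decision 3: what the reader needs, in order
    draft: str         # the text to rewrite

    def __post_init__(self) -> None:
        # Refuse to build a prompt that leaves any decision unstated.
        if not self.audience.strip():
            raise ValueError("audience is required")
        if not self.cut_rule.strip():
            raise ValueError("cut_rule is required")
        if not self.beats:
            raise ValueError("at least one beat is required")

    def to_prompt(self) -> str:
        numbered = "\n".join(
            f"{i}. {beat}" for i, beat in enumerate(self.beats, start=1)
        )
        return (
            f"Rewrite this email so it works for {self.audience}.\n\n"
            f"They need to know, in this order:\n\n"
            f"{numbered}\n\n"
            f"{self.cut_rule}\n\n"
            f"Email:\n{self.draft}"
        )


brief = RewriteBrief(
    audience="a busy executive who will spend ten seconds on it",
    cut_rule="Cut anything that isn't a fact, a proposal, or a specific request.",
    beats=[
        "What's broken and who it affects",
        "What you're proposing",
        "What decision you need from them",
    ],
    draft="<paste the draft email here>",
)
print(brief.to_prompt())
```

The validation is the useful part: an empty audience raises an error instead of quietly producing another "make it shorter."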

Why this scoring style is the point

Both prompts feel reasonable when you read them cold. If we asked a room of people to guess which one scored higher, most would probably pick the right one, but few could articulate the four specific places where the first one loses points. That's the gap Player Cue is built to close.

You don't learn to write prompts like Prompt B by reading advice. You learn by writing something, watching it fail against a rubric, seeing exactly which criterion it missed, and trying again. The scoring is the teacher.

The reason we grade the output rather than the prompt itself is exactly what this walkthrough shows. Prompt A is fine as a piece of English. It's only when you look at what it produced, and check that against what a good answer would actually look like, that the fifty-point gap becomes visible. No prompt-length checker or clarity-scoring tool would have caught it. Only the output reveals it.

Try it

The scenario above is representative of the daily challenges on Player Cue. New task every day. Real scoring. A retry loop that shows you the exact criteria you missed. If you've been meaning to get sharper at this, the fastest way is to write a prompt, get it scored, and see the gap for yourself.

Take today's challenge →