The rubric

How prompts get graded here

This page is the actual rubric the grader is held to — same source file, rendered for humans. No secret sauce, no vibes. Learn it and you won't need us.

The seven dimensions

Each is scored 0–10 against written anchors. The total is a weighted average computed in code — the model never picks your number. Dimensions that can't apply to your kind of task (few-shot examples for a one-line question, say) are marked n/a and their weight is redistributed, never held against you.

1.Clarity

Could a stranger execute this prompt without guessing what you meant?

0–2: Ambiguous referents, vague verbs ("make it better", "fix this"), or internal contradictions. The model must guess the actual task.
5: The task is understandable but has fuzzy edges the model fills in by assumption.
9–10: Exactly one reasonable reading. Every "it", "this", and instruction resolves unambiguously.

2.Specificity

Are the concrete details there — names, quantities, technologies, styles?

0–2: Generic to the point of interchangeability ("write something about marketing").
5: Topic and rough angle present, but no quantities, named tools, or concrete particulars.
9–10: Concrete subject, scope, quantities, named technologies or styles where they matter.

3.Context

Does the model get the background it cannot guess — audience, purpose, situation, inputs?

0–2: Zero background. The model must invent the situation, audience, and purpose.
5: Some background, but key facts (who it is for, why, what already exists) are missing.
9–10: Everything the model cannot guess is supplied: audience, purpose, prior state, relevant inputs.

4.Constraints & scope

Are the boundaries set — length, tone, scope, and what NOT to do?

0–2: No bounds at all. Length, tone, scope, and exclusions are entirely open.
5: A bound or two (often length), but large degrees of freedom remain that will produce unwanted output.
9–10: Length, tone, scope, and exclusions are pinned exactly where they would otherwise cause drift.

5.Output format

Is the desired shape of the answer specified?

0–2: No hint of the desired output shape.
5: Shape implied ("list some…") but structure unspecified.
9–10: Explicit structure: sections, fields, schema, table, or template.

6.Role & audience

Where it would change the output, does the prompt say who the model should be and who the output is for?

0–2: Neither a useful role framing nor an audience, in a task where both would change the result.
5: One of role or audience present, the other missing.
9–10: Role and audience defined wherever they would change the output (or genuinely not needed).

7.Examples

If the task needs style or format mimicry, is there a sample to mimic?

0–2: The task clearly begs for an example (match a style, follow a format) and there is none.
5: References a style or format without showing it ("like our usual tone").
9–10: Includes a sample input/output or concrete reference where it helps. Mark not-applicable if the task does not need one.

Not every task fails the same way

An image prompt lives or dies on specificity; agent instructions on constraints. The grader classifies your prompt first, then weights the dimensions to match. Higher number, more weight.

Task type	Clarity	Specificity	Context	Constraints	Output	Role	Examples
Creative writing	20	18	18	16	8	12	8
Coding	18	22	20	14	10	4	12
Analysis & research	18	16	16	16	18	8	8
Image generation	14	30	10	18	10	2	16
System / agent prompt	20	14	14	20	14	10	8
Conversational	22	20	24	12	8	6	8

The tiers

A+Prompt Whisperer90–100
ASharp75–89
BDecent, Honestly60–74
CMid40–59
DSucks20–39
FDumpster Fire0–19

Six habits that fix most prompts

Write for a stranger, not a mind-reader

The model knows nothing about your project, your audience, or what “better” means to you. Everything you don’t say gets replaced with a guess — and averaged guesses are exactly the generic output you keep complaining about.

Say what NOT to do

Models drift toward the statistical middle: hedged claims, bullet lists, “in conclusion”. Exclusions are how you fence off the middle. “No bullet points, no preamble, don’t restate the question” does more work than another adjective.

Show, don’t describe

One example of the tone or format you want beats three sentences describing it. If you want output that matches something — a style, a schema, a house voice — paste a sample of it.

Pin the output shape

If you’ll paste the result somewhere, say where and what shape: “a 5-row markdown table”, “a JSON object with keys x, y”, “three subject lines under 40 characters”. Format instructions are nearly free and nearly always followed.

One prompt, one job

“Summarize this, then rewrite it for LinkedIn, also suggest hashtags” gets you three mediocre results sharing one attention budget. Chain separate prompts and each one gets to be good.

Don’t prompt-engineer a simple question

The rubric punishes missing information the task needs — not missing ceremony. “What’s the capital of France” is a perfect prompt. Adding a role, a format spec, and three constraints to it would make it worse.