This is a prompt I'm in the process of developing, partly inspired by Jacob Andreas's RLCR work on multi-answer RL. I'll share the prompt, then explain why I think parts of it work and parts of it might be cope.
## The prompt itself
```
When a question involves genuine uncertainty, ambiguity, incomplete information,
or multiple defensible answers:
1. HYPOTHESIZE BEFORE COMMITTING. Generate 2-4 distinct hypotheses before
converging on any answer. "Distinct" means they invoke different causal
models, mechanisms, or framings — not surface-level rephrasings of the
same idea. If your hypotheses are clustering, you're mode-collapsing.
Force at least one from outside the cluster.
2. REASON ACROSS, NOT TOWARD. Think about the evidence for and against each
hypothesis jointly, not sequentially. Don't build a case for H1, then
grudgingly acknowledge H2 exists.
3. ASSIGN CONFIDENCE AS ORDINAL RANKING WITH ROUGH MAGNITUDES. For each
hypothesis, indicate your confidence. Use rough natural language bands:
- "very likely" (~0.75-0.95)
- "plausible" (~0.35-0.65)
- "possible but unlikely" (~0.10-0.30)
- "can't rule out" (~0.02-0.10)
Explicitly reserve probability mass for "something I haven't considered."
4. NAME THE DISTINGUISHING EVIDENCE. For your top 2-3 hypotheses, state
what information would shift probability mass between them.
"If X were true, I'd update toward H2 over H1."
5. IDENTIFY THE MODE AND QUESTION IT. Notice which answer you'd give if
forced to pick one. Then ask: am I favoring this because the evidence
actually points here, or because it's the most common/expected/safe answer?
Don't do this for questions with clear answers (math, well-established facts),
simple requests, or casual conversation. The trigger is genuine epistemic
uncertainty, not performative hedging.
```
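For context on how I run it: the block above goes in as a system prompt, with the actual question as the user message. A minimal sketch using the Anthropic Python SDK (the model name and example question are placeholders; any chat API works the same way):

```python
import anthropic

# Paste the full prompt from the block above here.
UNCERTAINTY_PROMPT = """When a question involves genuine uncertainty, ambiguity, ..."""

client = anthropic.Anthropic()
response = client.messages.create(
    model="claude-sonnet-4-20250514",  # placeholder; use whatever model you actually prompt
    max_tokens=2048,
    system=UNCERTAINTY_PROMPT,
    messages=[
        {"role": "user", "content": "Why did our signup conversion drop 30% last week?"},
    ],
)
print(response.content[0].text)
```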
## Where I think this actually helps
The biggest win is probably instruction #2 — "reason across hypotheses jointly." Without it, what you get from LLMs is: a confident answer, then a paragraph that starts with "However..." tacked on as a hedge. The model builds a case for its modal answer, then grudgingly acknowledges alternatives exist. Forcing joint reasoning at least changes the shape of the generation, even if the underlying weights still want to collapse.
The mode-collapse detection (#1 and #5) also seems to do something real. I've watched Claude generate three "different" hypotheses that are obviously the same idea in different clothes — "the economy is struggling" / "economic conditions are poor" / "growth has stalled." Explicitly calling this out in the prompt reduces it. Not eliminates, reduces.
## Where I'm genuinely skeptical
Here's the thing: the RLCR paper's key finding is that independently sampled outputs from standard RLHF models repeat the same reasoning tokens with only cosmetic variation. Multi-Answer RL fixes this *at the training level* by rewarding diverse correct answers with calibrated confidence. When you prompt a model to list multiple answers, you get them all in a single generation rather than as independent samples — the model's weights still want to collapse to the mode. You're fighting the training objective with a prompt, which is... possible but lossy.
The confidence numbers are the part I trust least. There's decent evidence that forcing models to verbalize confidence produces *worse* calibration than just letting them answer naturally — they anchor on round numbers and produce overconfident estimates. RLCR works because the Brier score reward jointly optimizes answer correctness and confidence calibration during training. A prompt has no such penalty signal. That's why I switched from asking for 0.0-1.0 scores to natural language bands ("plausible," "unlikely") — at least then you're getting ordinal rankings rather than fake precision.
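To make "ordinal, not precise" concrete: downstream, I only treat the bands as a rank order. A small sketch of that convention (the mapping is mine, not part of the prompt):

```python
# My own convention: treat the bands as an ordinal scale, not probabilities.
BAND_RANK = {
    "very likely": 3,
    "plausible": 2,
    "possible but unlikely": 1,
    "can't rule out": 0,
}

def rank_hypotheses(hypotheses):
    """hypotheses: (label, band) pairs pulled from the model's answer.
    Returns them sorted by rank, highest confidence first."""
    return sorted(hypotheses, key=lambda h: BAND_RANK[h[1]], reverse=True)

rank_hypotheses([("H2", "plausible"), ("H1", "very likely"), ("H3", "can't rule out")])
# -> [("H1", "very likely"), ("H2", "plausible"), ("H3", "can't rule out")]
```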
Andreas himself notes that RLCR "is far from solving the problem" and that there's room to better align external expressions of certainty with internal representations. So even the trained version has a gap between stated and felt uncertainty. The prompted version will have a bigger gap. I use these numbers as rough rankings, not as calibrated probabilities.
## What would actually work better
If you want real calibration rather than prompted theater, you'd need to do what RLCR does at inference time: confidence-weighted majority voting across multiple independent samples. Literally run the same query through the model 5-10 times, aggregate the answers, weight by stated confidence. That's expensive but real. Everything else is an approximation.
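A sketch of what that aggregation looks like, reusing the bands from the prompt. The band-to-weight mapping is my own rough assumption (midpoints of the ranges), a heuristic rather than anything calibrated:

```python
from collections import defaultdict

# Rough midpoints of the prompt's bands, used only as vote weights.
# These numbers are my own assumption, not calibrated probabilities.
BAND_WEIGHT = {
    "very likely": 0.85,
    "plausible": 0.50,
    "possible but unlikely": 0.20,
    "can't rule out": 0.05,
}

def confidence_weighted_vote(samples):
    """samples: (answer, band) pairs from 5-10 independent runs of the same query.
    Returns the answer with the largest total weight."""
    totals = defaultdict(float)
    for answer, band in samples:
        totals[answer] += BAND_WEIGHT.get(band, 0.0)
    return max(totals, key=totals.get)

# e.g. five independent samples of the same question:
samples = [("H1", "very likely"), ("H2", "plausible"), ("H1", "plausible"),
           ("H3", "can't rule out"), ("H1", "possible but unlikely")]
confidence_weighted_vote(samples)  # -> "H1"
```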
The honest version of my recommendation: use the structural format (multiple hypotheses, distinguish between them, reason jointly) but don't trust the numbers. The behavioral change — forcing the model to consider alternatives before committing — is probably the thing that survives the prompt-vs-training gap. The numerical confidence is decoration.
I pair this with a separate "push back on my ideas" instruction, and the two complement each other well — the multi-hypothesis framing gives the model explicit permission to say "actually your assumed framing might not be the right one."
Still iterating on this. Would be curious what others are using.