otto@localhost:~$ // TODO: be precise

The Holy Grail Has a Bug

Natural language interfaces were the great unlock of human-computer interaction. At the specification layer, they're the problem.

#llm #methodology #context-engineering #devlog

For decades, natural language interfaces were the goal. Get the computer to understand plain English and you remove the last barrier between human intent and machine execution. No more syntax, no more compilers, no more learning to think like a machine. Just say what you want.

LLMs delivered it. And the place where it matters most — specification, the layer where you tell the machine precisely what to build — turns out to be exactly where natural language fails.

The killer feature is the failure mode.


The attention problem

When LLM-assisted development starts breaking down, the first instinct is that the model doesn’t have enough context. The project has grown. The design has complexity. Feed it more — more docs, more history, more background. Surely the problem is that the model doesn’t know enough.

This is wrong, and understanding why is one of the more important things I’ve figured out in the past few months.

The issue isn’t context window size. It’s attention. Transformer models don’t process context uniformly — they attend to it. Relevance is weighted, not equal. Information buried in the middle of a long context window gets underweighted relative to what appears at the beginning and end. There’s a documented “lost in the middle” failure mode: the same relevant fact, at different positions in a long context, produces measurably different results.

The implication is counterintuitive. More context can actively degrade performance. Padding the context with a more complete picture of the project can make the model worse at the specific task, not better, because relevant information gets diluted by noise. The right answer isn’t more context. It’s more precise context — everything relevant, nothing extraneous.

This reframes the problem entirely. It isn’t “how do I give the model enough information.” It’s “how do I give it as little as possible while keeping everything it actually needs.”
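To make "as little as possible while keeping everything it actually needs" concrete, here's a minimal sketch of budgeted context assembly: rank candidate chunks by relevance and keep only the best ones that fit. Everything here is hypothetical illustration — the `relevance` score is a stand-in for whatever retrieval or embedding machinery would produce it, and the names are mine, not any particular tool's.

```python
from dataclasses import dataclass

@dataclass
class Chunk:
    text: str
    relevance: float  # hypothetical score from retrieval/embeddings
    tokens: int

def build_context(chunks: list[Chunk], budget: int, floor: float = 0.5) -> list[Chunk]:
    """Keep chunks above a relevance floor, best first, within a token budget.

    The point is exclusion: below the floor, more context only dilutes
    attention, so we stop rather than pad."""
    kept, used = [], 0
    for c in sorted(chunks, key=lambda c: c.relevance, reverse=True):
        if c.relevance < floor:
            break  # sorted descending, so nothing later qualifies either
        if used + c.tokens > budget:
            continue  # too big for the remaining budget; try smaller chunks
        kept.append(c)
        used += c.tokens
    return kept
```

The design choice worth noticing is the `break`: a completeness-minded version would keep stuffing marginal chunks in until the window was full, which is exactly the dilution failure described above.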


The compression intuition

If precision beats completeness and the context needs to be smaller, the natural next thought is compression. Take the ten-thousand-line spec and compress it. Preserve the meaning, reduce the size.

The intuition is right about the goal and wrong about the mechanism.

Semantic lossless compression isn’t possible in the general case. Meaning isn’t fully contained in text — it’s text plus reader context. The same document means different things to different readers, in different situations, at different points in a project’s life. A compression algorithm that guaranteed semantic equivalence would need a formal definition of meaning, and natural language doesn’t have one. Determinism is the killer requirement: losslessness demands an invertible mapping between semantic space and compressed space, and no such mapping exists for prose.

You can’t compress your way out of this. The structure of the problem won’t allow it.


The pivot

But specs aren’t general natural language.

A well-written spec isn’t prose. It’s a set of decisions, constraints, relationships, and scope boundaries. It has prose in it — rationale, context, explanation — but the load-bearing content is closer to structured data than narrative. A decision is a decision. A constraint is a constraint. Those things are compressible, because they’re closer to formal logic than to meaning-in-context.

The “why” behind each decision resists compression. The “what” and the constraints don’t. And the implementing LLM doesn’t need the why. It needs to know what to build and what constraints to operate within. The rationale is for humans — for auditing decisions later, for understanding why something is the way it is. Strip the rationale and you lose something important for humans. You lose almost nothing for the implementer.

The problem isn’t that specs need to be compressed. It’s that specs are being written in the wrong native format. Natural language is the wrong medium for the content they’re carrying.


What the right medium might look like

Formal specification languages exist. TLA+, Z notation, Alloy, VDM. Leslie Lamport’s TLA+ is the most famous — Amazon uses it to specify distributed systems designs before implementation. These languages never took over software development broadly because they require mathematical training, and most developers aren’t willing to learn them.

The interesting angle isn’t asking developers to write formal specs by hand. It’s whether the pipeline from design to implementation could use a denser intermediate format — not a formal language exactly, but something denser than prose. Decisions encoded as structured facts. Constraints as explicit conditions. A format designed for retrieval rather than readability.

The six-field spec format I’ve been working with — What, Boundaries, Inputs and outputs, Upstream and downstream constraints, Invariants, Done condition — is already gesturing at this. Not as project management discipline, though it looks like that from the outside. As a context compression artifact. Each field enforces decision-shaped content. The format excludes narrative reasoning by construction. The result is smaller and more precise than a prose spec carrying the same decisions.
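As an illustration of what a structured render of those six fields might look like, here is a sketch. The field names come from the format described above; the dataclass, the `render` function, and the flattened prompt layout are all hypothetical — one possible encoding, not the author's actual tooling.

```python
from dataclasses import dataclass

@dataclass
class Spec:
    """One unit of work as decision-shaped fields rather than prose.

    Each field forces a specific kind of content; there is nowhere
    to put narrative reasoning, which is the point."""
    what: str                       # the thing to build, one declarative sentence
    boundaries: list[str]           # explicit out-of-scope items
    inputs_outputs: dict[str, str]  # name -> shape/type of each input and output
    constraints: list[str]          # upstream and downstream constraints
    invariants: list[str]           # conditions that must hold throughout
    done: str                       # observable completion condition

def render(spec: Spec) -> str:
    """Flatten the spec into a compact, retrieval-friendly prompt block."""
    lines = [f"WHAT: {spec.what}"]
    lines += [f"OUT-OF-SCOPE: {b}" for b in spec.boundaries]
    lines += [f"IO: {k} -> {v}" for k, v in spec.inputs_outputs.items()]
    lines += [f"CONSTRAINT: {c}" for c in spec.constraints]
    lines += [f"INVARIANT: {i}" for i in spec.invariants]
    lines.append(f"DONE WHEN: {spec.done}")
    return "\n".join(lines)
```

Even this toy version shows the compression: the rendered block carries every decision and no rationale, and an implementer can retrieve any single fact by its label.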

The theoretical destination is a pipeline that looks like: natural language exploration → natural language normative doc → structured spec render → chunked implementation prompts. The natural language phases stay natural language — that’s where the LLM’s breadth is most valuable and precision matters less. The structured layer is a transformation at the point where decisions have been made and the work is to encode them precisely for execution.

Whether the structured render is done by hand or by an LLM translating from a prose spec into a denser representation is an open question. The translation step carries risk: an LLM rendering a spec into a formal format can silently reinterpret, which is exactly the failure mode you’re trying to guard against. The output needs to be legible enough to verify. The format can’t be opaque.
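One crude way to make that verification tractable: flag any rendered fact whose vocabulary never appears in the source prose. The word-overlap heuristic below is an assumption on my part — a sketch of the idea, not real verification, which would still be human review — but it shows the kind of legibility check a non-opaque format permits: a render that introduces terms the prose never used is exactly the silent-reinterpretation failure mode.

```python
def untraceable(rendered_facts: list[str], prose_spec: str,
                min_overlap: float = 0.5) -> list[str]:
    """Flag rendered facts whose key terms don't appear in the source prose.

    Crude traceability heuristic: for each fact, measure what fraction of
    its substantive words (length > 3) occur anywhere in the prose spec,
    and flag the fact if the overlap falls below the threshold."""
    prose_words = set(prose_spec.lower().split())
    flagged = []
    for fact in rendered_facts:
        words = [w for w in fact.lower().split() if len(w) > 3]
        if not words:
            continue  # nothing substantive to check
        overlap = sum(w in prose_words for w in words) / len(words)
        if overlap < min_overlap:
            flagged.append(fact)
    return flagged
```

A real pipeline would want something sturdier than bag-of-words, but even this much makes the translation step auditable instead of trusted.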


Where this leaves things

The pipeline shape is clearer than the tooling. The argument for a denser intermediate format is compelling. The exact format isn’t determined. The tooling to support it doesn’t exist in any satisfying form.

What I’m more confident about: natural language as the native format for specifications is a mistake that compounds. The exploration phase should be natural language. The normative design doc probably should be too. But once the decisions are made and it’s time to encode them for execution, the medium needs to change. Prose carries too much that the implementer doesn’t need and buries the signal it does.

The holy grail works. It just works better at some layers than others. At the specification layer, it’s working against you.

sign up for low quality and frequent spam