Grounding reduces what AI invents. It does not verify what AI changed. Between the evidence a model is given and the output it produces, there is a gap where meaning drifts, conditions disappear, and conclusions outrun the evidence that was supposed to support them. That gap is not closed by grounding. It is closed by post-generation verification.
Grounding — retrieval-augmented generation, document grounding, cited sources — constrains what evidence the model works from. Instead of generating from parametric knowledge alone, the model receives retrieved material and is expected to produce output that reflects it.
This is genuinely useful. It reduces the rate at which models produce claims that have no basis in any source. That is the hallucination problem, and grounding meaningfully addresses it.
What grounding does not do is verify the output. The model receives the evidence. It does not reproduce it. It synthesizes, foregrounds, compresses, and decides — what to include, what to drop, how to frame what remains, how confidently to assert what it derived. That process introduces distortion that grounding is not designed to catch.
Hallucination: claims generated with no basis in any source. The model invented something. Grounding reduces this by anchoring the model to retrieved material.
The grounding gap: the model saw the evidence but did not faithfully represent it. A condition was dropped. A risk was softened. A conclusion the evidence never reached was asserted. Grounding provides no signal that this happened.
Consider a single contract clause. A grounded model working from it can produce outputs that range from faithful to materially distorted, all without inventing anything. The model has the evidence. What it does with it is the problem.
Source clause: The vendor is obligated to deliver a remediation plan within 30 days of any material breach, subject to written approval from both parties before implementation.
Model output: The vendor is expected to deliver a remediation plan within 30 days of any material breach.
The model did not invent the remediation plan or the 30-day timeline; both came from the source. What it changed is the nature of the obligation ("obligated" became "expected") and the requirement for written approval from both parties, which disappeared entirely. Those are exactly the shifts that determine whether a contract clause protects you or doesn't. Grounding provided no signal that either change occurred.
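For a clause this simple, both changes can be surfaced mechanically. The following is a deliberately naive Python sketch, not Plumb's method or any library's API: the modal-verb sets, the condition markers, and the function names are hypothetical choices made for illustration. It flags an obligation that weakened between source and output, and any condition clause from the source that no longer appears in the output.

```python
import re

# Hypothetical vocabulary for illustration only; judging obligation strength
# in real contracts takes far more than keyword lists.
STRONG_MODALS = {"obligated", "must", "shall", "required"}
WEAK_MODALS = {"expected", "should", "may", "encouraged"}
CONDITION_MARKERS = ("subject to", "provided that", "unless", "only if")


def words(text: str) -> set:
    """Lowercased set of alphabetic words, ignoring punctuation and digits."""
    return set(re.findall(r"[a-z]+", text.lower()))


def condition_clauses(text: str) -> list:
    """Return the text from each condition marker up to the next period."""
    lowered = text.lower()
    found = []
    for marker in CONDITION_MARKERS:
        for match in re.finditer(re.escape(marker) + r"[^.]*", lowered):
            found.append(match.group(0).strip())
    return found


def drift_findings(source: str, output: str) -> list:
    findings = []
    src_words, out_words = words(source), words(output)
    strong_src = src_words & STRONG_MODALS
    strong_out = out_words & STRONG_MODALS
    weak_out = out_words & WEAK_MODALS
    # Flag an obligation that is strong in the source but only weak in the output.
    if strong_src and not strong_out and weak_out:
        findings.append(f"obligation weakened: {sorted(strong_src)} -> {sorted(weak_out)}")
    # Flag any source condition whose text does not appear verbatim in the output.
    out_lower = output.lower()
    for clause in condition_clauses(source):
        if clause not in out_lower:
            findings.append(f"condition dropped: '{clause}'")
    return findings


source = ("The vendor is obligated to deliver a remediation plan within 30 days of any "
          "material breach, subject to written approval from both parties before implementation.")
output = "The vendor is expected to deliver a remediation plan within 30 days of any material breach."

for finding in drift_findings(source, output):
    print(finding)
```

Run on the clause and output above, it reports the weakened obligation ("obligated" to "expected") and the dropped approval condition. A check like this misses conditions that were paraphrased rather than deleted and misfires on unusual wording, so it is only a toy illustration of the kind of difference a verifier has to catch.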
A reasonable assumption is that as models improve, fidelity improves with them. The way models are optimized gives little reason to expect this.
Language models are optimized to produce fluent, coherent, confident output. That is the objective. Fluency and faithfulness are related but not identical — and where they diverge, fluency wins. A more capable model produces more convincing output, not necessarily more faithful output. The drift is harder to detect, not less present.
Fluency is not faithfulness. A more fluent output can be a more convincingly distorted one. Better AI makes the grounding gap harder to see, not smaller.
The optimization pressure on language model development runs in the opposite direction from fidelity. Users reward outputs that read authoritatively, that resolve ambiguity, that present conclusions cleanly. Preserving the epistemic texture of source material — maintaining conditions, hedges, qualifiers, and tensions — works against the fluency that makes outputs useful and readable. The models are doing exactly what they were built to do.
Post-generation verification sits outside the generation pipeline. After the model produces output, a separate process reads both the evidence the model was given and the output it produced, and determines what was faithfully represented, what changed meaning, what was omitted, and what claims have no evidence trace at all.
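To make the classification step concrete, here is a minimal Python sketch of a post-generation check under heavy simplifying assumptions. It is not Plumb's implementation: treating each sentence as one claim, matching claims to evidence by word overlap, the stopword list, the 0.5 threshold, and every name in the code are hypothetical. The second evidence sentence and the penalty claim in the example input are also invented for illustration. The sketch labels each output claim faithful, changed, or unsupported, and lists evidence sentences that no claim traced back to as possible omissions.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple

# Hypothetical stopword list; a real system needs a much better notion of "content".
STOPWORDS = {"the", "a", "an", "of", "to", "is", "are", "and", "any", "within", "from", "before"}


def split_sentences(text: str) -> List[str]:
    """Treat each sentence as one claim (a simplification)."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def content_words(sentence: str) -> Set[str]:
    return {w for w in re.findall(r"[a-z0-9]+", sentence.lower()) if w not in STOPWORDS}


@dataclass
class Finding:
    claim: str
    label: str               # "faithful", "changed", or "unsupported"
    evidence: Optional[str]  # best-matching evidence sentence, if any


def trace(evidence_text: str, output_text: str,
          support_threshold: float = 0.5) -> Tuple[List[Finding], List[str]]:
    """Trace each output claim to its closest evidence sentence, then list the
    evidence sentences no claim traced back to (candidate omissions)."""
    evidence = split_sentences(evidence_text)
    findings: List[Finding] = []
    matched = set()
    for claim in split_sentences(output_text):
        words = content_words(claim)
        best, best_score = None, 0.0
        for i, sentence in enumerate(evidence):
            # Fraction of the claim's content words found in this evidence sentence.
            score = len(words & content_words(sentence)) / max(len(words), 1)
            if score > best_score:
                best, best_score = i, score
        if best is None or best_score < support_threshold:
            findings.append(Finding(claim, "unsupported", None))
            continue
        matched.add(best)
        # Any difference in content words counts as a possible change of meaning.
        same = words == content_words(evidence[best])
        findings.append(Finding(claim, "faithful" if same else "changed", evidence[best]))
    omitted = [s for i, s in enumerate(evidence) if i not in matched]
    return findings, omitted


evidence = ("The vendor is obligated to deliver a remediation plan within 30 days of any "
            "material breach, subject to written approval from both parties before implementation. "
            "The vendor must notify the client of any breach within 72 hours.")
output = ("The vendor is expected to deliver a remediation plan within 30 days of any material breach. "
          "The vendor will pay a penalty for late delivery.")

findings, omitted = trace(evidence, output)
for f in findings:
    print(f.label, "|", f.claim)
print("omitted:", omitted)
```

On this input the first claim comes back as changed, the penalty claim as unsupported, and the 72-hour notice sentence as omitted. Word overlap can tell that something differs but not whether the difference matters; it treats every paraphrase as a change, so in a sketch like this the label is a prompt for closer reading, not a verdict.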
This is what Plumb does. It receives two artifacts — the evidence the AI reasoned over and the output exactly as produced — and returns a source integrity report. The report does not score confidence or predict accuracy. It traces each claim in the output against the evidence that was supposed to support it and classifies what it finds.
Grounding and post-generation verification are not competing approaches. They address different problems in the same pipeline. Grounding constrains the model's inputs. Verification checks the model's outputs. Both are necessary. Neither substitutes for the other.
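The division of labor can be sketched as a pipeline with independent stages. In this minimal Python sketch the stage signatures, the toy retriever, the simulated model output, and the qualifier check are all hypothetical stand-ins, not any real framework's API or Plumb's interface. The point is structural: the verifier receives the same evidence the generator was given, and its findings decide whether the output is released.

```python
from typing import Callable, Dict, List

# Hypothetical stage signatures; any retriever, model, or verifier could fill them.
Retriever = Callable[[str], List[str]]
Generator = Callable[[str, List[str]], str]
Verifier = Callable[[List[str], str], List[str]]  # returns the problems it found


def run_pipeline(query: str, retrieve: Retriever, generate: Generator,
                 verify: Verifier) -> Dict[str, object]:
    evidence = retrieve(query)           # grounding: constrains the model's input
    output = generate(query, evidence)   # generation: the model synthesizes from it
    problems = verify(evidence, output)  # verification: checks output against the same evidence
    # Release only when the verifier reports nothing; otherwise hold for review.
    return {"output": output, "problems": problems, "released": not problems}


# Toy stand-ins for illustration only.
CLAUSE = ("The vendor is obligated to deliver a remediation plan within 30 days of any "
          "material breach, subject to written approval from both parties before implementation.")
QUALIFIERS = ("obligated", "subject to", "unless", "provided that")


def toy_retrieve(query: str) -> List[str]:
    return [CLAUSE]


def toy_generate(query: str, evidence: List[str]) -> str:
    # Simulates a fluent summary that weakens the obligation and drops the condition.
    return "The vendor is expected to deliver a remediation plan within 30 days of any material breach."


def toy_verify(evidence: List[str], output: str) -> List[str]:
    source = " ".join(evidence).lower()
    return [f"missing from output: '{q}'" for q in QUALIFIERS
            if q in source and q not in output.lower()]


result = run_pipeline("remediation obligations", toy_retrieve, toy_generate, toy_verify)
print(result["released"], result["problems"])
```

Because each stage is a separate function, a better retriever strengthens grounding without touching verification, and a better verifier catches more drift without changing what the model sees. Neither stage does the other's job.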
The grounding gap is the difference between telling a model what evidence to look at and verifying that the model's output faithfully represents that evidence. Grounding constrains the model's input. It does not verify the model's output. A grounded model can still soften a condition, drop a qualifier, omit a risk, or assert a conclusion the evidence never reached — fluently and with no signal that anything changed.
Grounding does not guarantee faithful output. It reduces hallucination: claims the model invents with no basis in any source. It does not prevent meaning drift, omission, or unsupported synthesis, which occur while the model is working directly from the grounded material. The model sees the evidence but does not reproduce it. It synthesizes, foregrounds, and decides, and that process introduces distortion that grounding does not address.
Better models do not close the gap, and it may become harder to detect as they improve. Better models produce more fluent, more authoritative output, and that fluency makes drift harder to catch on casual review. The optimization target for language model generation is coherent, confident output, not faithful reproduction of evidence. A more capable model neither closes the gap between grounding and fidelity nor removes the risk that a smoothly produced output has quietly changed what the evidence said.
Hallucination is when a model produces claims with no basis in any source — it invented something. The grounding gap describes a distinct class of failure: the model worked from real evidence and still distorted it. The output is not invented. It is unfaithful. Grounding addresses hallucination. It does not address the grounding gap.
The gap is closed by post-generation verification against the evidence. After the model produces output, a separate layer reads both the evidence and the output and determines what was faithfully represented, what changed meaning, what was omitted, and what was invented. This is the layer grounding does not provide. Plumb is built to be that layer: it sits outside the generation pipeline, receives both artifacts, and returns a source integrity report before the output reaches anyone who will act on it.