2026-06-10 · 9 min · #engineering #ai #thesis

If you can't diff a slide, you can't trust the AI that changed it

Zephyr Whimsy · Editorial · 2026-06-10

Binary decks make every AI edit a black box. Text-source slides turn each change into a reviewable diff you can revert line by line.

Here is a workflow that happens a thousand times a day: you ask an AI to "tighten the metrics on slide 4," it does something, the slide looks plausible, and you move on. You never actually find out what it touched. Maybe it only fixed the formatting. Maybe it also quietly changed your retention number from 71% to 76% because that read better. You have no way to tell, because the thing it edited was a binary file and the only record you have is "the deck before" and "the deck after" — two opaque states with no visible delta between them.

That is the real problem with letting AI edit presentations. It's not that the model is unreliable. It's that the file format makes reliability unverifiable. You can't review what you can't read, and you can't trust a change you can't review.

"Version history" in slide tools is not version control

Google Slides and PowerPoint both have a feature called version history, and it does something useful: it snapshots the file every so often so you can roll the whole document back to an earlier timestamp. People assume that's version control. It isn't. It's a save-state timeline. There's a precise difference, and it's the difference that matters for trusting AI edits.

A snapshot timeline answers one question: "what did the entire deck look like at 2:14pm?" It cannot answer the question you actually have after an AI edit: "what specifically changed, and was any of it wrong?" To find out, you'd have to open the 2:14pm version and the 2:31pm version side by side and visually scan every slide hunting for the difference. On a 30-slide board deck that's not a review, it's a spot-the-difference puzzle — and the one number the model hallucinated is exactly the kind of small change a human eye skips.

And rollback is all-or-nothing. If the AI made one good change (fixed a typo) and one bad change (invented a number), restoring the earlier snapshot throws away both. There is no "accept this, reject that." The unit of undo is the whole file at a moment in time, not the individual edit.

The root cause is the format. A .pptx is a zip of XML and binaries. The zipped file has no lines to diff at all, and even after you unzip it, changing one cell of a chart can sprawl across a hundred lines of XML and unrelated layout metadata. Google Slides stores the deck as an internal object graph you never see as text at all. In both cases there is no clean, line-oriented representation of "the slide" that a diff tool, or a human, can read. We covered why this dooms collaboration on documents generally in diffable documents; this piece is about the narrower, sharper case: trusting an AI's edits to a presentation.
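To make the format point concrete, here is a small sketch that compares a one-value edit as text and as a zipped package. The XML is a toy stand-in invented for illustration (a real .pptx has many more parts and far denser markup), and the helper names are hypothetical.

```python
import difflib
import io
import zipfile

# Toy slide markup, invented for this sketch; not real PresentationML.
BEFORE_XML = "\n".join(
    ["<slide>"]
    + [f'  <shape id="{n}" x="{n * 10}" y="40"/>' for n in range(1, 6)]
    + ["  <cell>71%</cell>", "</slide>"]
)
AFTER_XML = BEFORE_XML.replace("71%", "76%")


def make_archive(slide_xml: str) -> bytes:
    """Pack one XML part into an in-memory zip, loosely like a .pptx package."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        # A fixed timestamp so the two archives differ only because of content.
        info = zipfile.ZipInfo("ppt/slides/slide4.xml", date_time=(2026, 1, 1, 0, 0, 0))
        info.compress_type = zipfile.ZIP_DEFLATED
        archive.writestr(info, slide_xml)
    return buffer.getvalue()


def changed_text_lines(before: str, after: str) -> int:
    """Count removed plus added lines in a line-oriented diff."""
    diff = difflib.ndiff(before.splitlines(), after.splitlines())
    return sum(1 for line in diff if line.startswith(("- ", "+ ")))


def differing_bytes(a: bytes, b: bytes) -> int:
    """Count byte positions that differ, plus any difference in length."""
    paired = sum(1 for x, y in zip(a, b) if x != y)
    return paired + abs(len(a) - len(b))


print("text lines changed:", changed_text_lines(BEFORE_XML, AFTER_XML))
print("archive bytes differing:",
      differing_bytes(make_archive(BEFORE_XML), make_archive(AFTER_XML)))
```

The text diff reports one line removed and one added. The byte count for the archives depends on compression and checksums, and none of those bytes point back at the cell that changed.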

What an AI edit looks like when the source is text

Flip the format and the whole problem dissolves. If the slide's source is Markdown, an AI edit is just a text change — and a text change has a diff. Say slide 4 starts as this:

## Q1 retention held while CAC fell

- Net revenue retention: 71%
- Logo churn: 4.2% monthly
- CAC: $1,840 (down 12% QoQ)
- Payback period: 9 months

You ask the model: "Make this punchier, drop the weakest bullet." It returns a new version. Because both are text, your tool shows you exactly this:

  ## Q1 retention held while CAC fell

- - Net revenue retention: 71%
+ - Net revenue retention: 76%
  - Logo churn: 4.2% monthly
  - CAC: $1,840 (down 12% QoQ)
- - Payback period: 9 months

Now you can see what the model did. It made two changes, not one. It dropped the payback-period bullet, which is the tightening you asked for: fine, accept it. And it changed retention from 71% to 76%, which you did not ask for and which is simply wrong. The underlying number didn't move; the model nudged it toward a prettier story. Without the diff you'd have shipped a fabricated metric to your board. With the diff it's a five-second catch.
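A diff like this needs no special tooling. As a minimal sketch, Python's standard difflib can produce it from the two versions of the slide; the file labels and helper names are made up for illustration.

```python
import difflib

BEFORE = """\
## Q1 retention held while CAC fell

- Net revenue retention: 71%
- Logo churn: 4.2% monthly
- CAC: $1,840 (down 12% QoQ)
- Payback period: 9 months
"""

AFTER = """\
## Q1 retention held while CAC fell

- Net revenue retention: 76%
- Logo churn: 4.2% monthly
- CAC: $1,840 (down 12% QoQ)
"""


def slide_diff(before: str, after: str) -> list[str]:
    """Return a unified diff of two versions of a slide's Markdown source."""
    return list(difflib.unified_diff(
        before.splitlines(),
        after.splitlines(),
        fromfile="slide-04.md (before)",  # hypothetical label
        tofile="slide-04.md (after)",     # hypothetical label
        lineterm="",
    ))


def changed_lines(diff: list[str]) -> list[str]:
    """Keep only added or removed lines, skipping the two file-header lines."""
    return [line for line in diff[2:] if line.startswith(("+", "-"))]


for line in changed_lines(slide_diff(BEFORE, AFTER)):
    print(line)
```

In each printed line the first character is the diff marker and the second is the Markdown bullet, which is why removed bullets appear to start with two dashes.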

This is exactly how engineers review code, and it works for the same reason: the reviewable unit is the change, line by line, not the whole artifact. You accept the hunk you wanted and reject the hunk you didn't. The model gets to be a fast, sometimes-careless contributor, and you get to be the reviewer who has final say — which is the only arrangement under which delegating to an AI is actually safe.
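Accepting one hunk and rejecting another can be sketched in a few lines. This is an illustration built on difflib, not how git or any particular slide tool does it; apply_selected and the accept callback are hypothetical names.

```python
import difflib
from typing import Callable

BEFORE = [
    "## Q1 retention held while CAC fell",
    "",
    "- Net revenue retention: 71%",
    "- Logo churn: 4.2% monthly",
    "- CAC: $1,840 (down 12% QoQ)",
    "- Payback period: 9 months",
]

# The model's proposal: one rewritten line, one dropped line.
AFTER = [
    "## Q1 retention held while CAC fell",
    "",
    "- Net revenue retention: 76%",
    "- Logo churn: 4.2% monthly",
    "- CAC: $1,840 (down 12% QoQ)",
]


def apply_selected(
    before: list[str],
    after: list[str],
    accept: Callable[[list[str], list[str]], bool],
) -> list[str]:
    """Rebuild the slide, taking the proposed lines only for accepted hunks."""
    result: list[str] = []
    matcher = difflib.SequenceMatcher(a=before, b=after, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            result.extend(before[i1:i2])
        elif accept(before[i1:i2], after[j1:j2]):
            result.extend(after[j1:j2])   # accepted: take the model's lines
        else:
            result.extend(before[i1:i2])  # rejected: keep the original lines
    return result


def deletions_only(old: list[str], new: list[str]) -> bool:
    """Stand-in for a human reviewer: accept removals, reject rewrites."""
    return not new


merged = apply_selected(BEFORE, AFTER, deletions_only)
print("\n".join(merged))
```

Here the payback-period deletion is accepted and the retention rewrite is rejected, so the merged slide keeps 71%. In practice the accept decision would come from a person reading each hunk, not from a fixed rule.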

The mental model: AI is a contributor, the diff is the PR

Treating AI output as something you read top-to-bottom and either keep or discard wholesale is the wrong frame. The right frame is a pull request. The model proposes a change set. You read the diff. You merge the parts that are right and drop the parts that aren't. The presentation's source being plain text is what makes the "PR" possible — there's nothing to review in a binary blob.

A workflow you can actually run

Here's the loop in practice, whether your slide source lives in a git repo or in a tool that keeps text history for you like Plain does.

  1. Keep the deck source as text. Each slide is Markdown. The whole deck is one readable file (or a folder of them). This is the precondition for everything else.
  2. Commit the known-good version before you let the AI touch it. In git that's a commit; in a text-native tool it's just the current saved state, automatically tracked.
  3. Ask for the change. "Rewrite slide 4 to lead with the CAC win." The model edits the source text.
  4. Read the diff, not the deck. Don't re-read all 30 slides. Look at the handful of lines that changed. This is where you'd catch the 71%→76% hallucination above.
  5. Revert one hunk if needed. If five lines are good and one invented a number, you don't roll back the whole edit. In git, git checkout -p (or git restore -p in newer versions) lets you discard a single uncommitted hunk from the working tree; in a text-history tool you reject that one change and keep the rest. The good edits survive.

The keystone is step 4. Snapshot-based slide tools force you to re-audit the entire artifact after every AI touch, which is so tedious that people skip it — and skipping it is precisely how fabricated numbers ship. A diff makes the review small enough that you'll actually do it every time. The cost of trust drops from "re-read the deck" to "scan six lines."
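Step 4 can be made even cheaper with a small check that points at rewritten lines whose numbers changed. The sketch below is a rough heuristic under stated assumptions: the regular expression and the positional pairing of rewritten lines are illustrative choices, and numeric_changes is a hypothetical helper, not a feature of any tool named here.

```python
import difflib
import re

# Digits with optional decimal or thousands separators and an optional percent sign.
NUMBER = re.compile(r"\d+(?:[.,]\d+)*%?")

BEFORE = """\
## Q1 retention held while CAC fell

- Net revenue retention: 71%
- Logo churn: 4.2% monthly
- CAC: $1,840 (down 12% QoQ)
- Payback period: 9 months
"""

AFTER = """\
## Q1 retention held while CAC fell

- Net revenue retention: 76%
- Logo churn: 4.2% monthly
- CAC: $1,840 (down 12% QoQ)
"""


def numeric_changes(before: str, after: str) -> list[tuple[str, str]]:
    """Return (old_line, new_line) pairs where a rewritten line's numbers differ."""
    old_lines, new_lines = before.splitlines(), after.splitlines()
    flagged: list[tuple[str, str]] = []
    matcher = difflib.SequenceMatcher(a=old_lines, b=new_lines, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "replace":
            continue  # pure insertions and deletions still show up in the diff itself
        # Pair rewritten lines by position; good enough for a sketch.
        for old, new in zip(old_lines[i1:i2], new_lines[j1:j2]):
            if NUMBER.findall(old) != NUMBER.findall(new):
                flagged.append((old, new))
    return flagged


for old, new in numeric_changes(BEFORE, AFTER):
    print(f"number changed: {old!r} -> {new!r}")
```

A check like this does not replace reading the diff. It only surfaces the lines most likely to hide a changed metric, and it works only because the source is text.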

Why this is a property of the format, not the tool

You can't bolt real diffs onto PowerPoint with a plugin. The diffability has to live in the source representation. If the canonical thing — the artifact of record — is text, then diff, blame, branch, revert, and review all come for free, because forty years of version-control tooling already speaks text. If the canonical thing is a binary or a hidden object graph, no amount of UI can reconstruct a clean line-level change set after the fact. The bytes that moved don't map to the ideas that moved.

This is the same reason Plain treats the web page — rendered from Markdown source — as the real product, and treats .pptx as a downgrade export, not the default. The text source is what makes a slide reviewable, AI-editable, and revertible. The binary export is the thing you hand to someone still living in the old workflow, after you've already verified the change against a diff you could actually read.

Where this honestly doesn't help

Diff-based review is about trusting edits — yours or an AI's — to a versioned source. It is not the same thing as real-time co-editing, and it would be dishonest to pretend it wins everywhere. If three people need to drag boxes around the same slide simultaneously and watch each other's cursors, Google Slides' live multiplayer model is genuinely better at that, and a commit-and-diff workflow would feel clumsy.

These are two different philosophies, not a scoreboard. Real-time co-editing optimizes for "we're all in the room together right now." Versioned text optimizes for "I need to review, trust, and reproduce a change later." When the question is "did the AI change a number it shouldn't have," the second philosophy is the one that gives you an answer. When the question is "can four of us brainstorm on a canvas at once," the first one is. Most teams want a deck in the second category far more often than they admit, because most decks are reviewed and approved far more often than they're co-authored live.

The one-line takeaway

Delegating slide edits to an AI is only as safe as your ability to audit them, and your ability to audit them is decided entirely by the file format. A binary deck gives you a save-state timeline: restore the whole thing to a moment, or don't. A text-source deck gives you a diff: see every line the model touched, accept what's right, revert what's wrong, ship with the receipts. If you can't diff a slide, you can't trust the AI that changed it — so the first decision isn't which model to use, it's whether your slides are made of text.