Case Study · Workflow · Springfield Commonwealth Academy

The Process

Ed O'Connell May 2026

01Verification

The verification stack

AI is good at generation. It is uneven at telling you when its output is wrong.

The verification stack exists because of that gap — to catch what the generator cannot reliably catch about itself.

What runs, in order of cost:

astro check — TypeScript shape errors, instant.
axe-core via Playwright — accessibility regressions across mobile, tablet, desktop.
Per-page Playwright spec — this case study has its own, asserting required headings, correct JSON-LD, no forbidden internal vocabulary, no console errors.
Lighthouse CI — thresholds at 95+ for accessibility and best practices.
Manual GROQ reads when something looks suspicious in HTML. Stega encoding hides things that read clean.

Generation that doesn't pass the floor doesn't ship.

02Provenance

Provenance metadata

In this content model, Sanity-backed articles and demos can carry a provenance object — author, generator, reviewer, context, date.

The object is visible in the Studio, queryable via GROQ, and visible in source maps when stega encoding is on. A representative shape:

provenance: {
  author: "Ed O'Connell",
  generatedBy: 'AI-assisted with human authorship and review',
  reviewedBy: "Ed O'Connell",
  context: 'Sanity-backed article draft',
  date: '2026-05-07',
}

It is a queryable shape, not a watermark. A future filter — "show me every Sanity-backed document that was AI-drafted but not yet human-reviewed" — is a one-line GROQ change, not a project.

Committing to this metadata at authoring is cheap; backfilling it later is expensive and approximate.

03Agent at work

The alt-text afternoon

Late spring 2026. A schema validation pass (ADR-024 added rule.required().warning(...) on alt-text fields) flagged missing alt on dozens of image-bearing documents — news posts, hero images, program pages. Filling it by hand would have meant clicking through every document in the Studio, every photo, every byline.

Sanity's content agent did it in one pass. A few minutes of agent-generated alt text against the actual image content, a few more of human review against the school's brand voice, then publish across the site.

"That would have taken me all day," the VP of marketing said when it finished.

The speed mattered. The bigger thing was who watched it run.

The VP of marketing — not the engineer who set up the agent — reviewed each generation against the school's brand and signed off before publish. If she had needed to step away during the run, we would not have used the agent that way.

04Behavior gap

The agent-toolkit gap

Sanity's agent-toolkit MCP server has a tool called patch_document_from_markdown. The name implies a format conversion: markdown → Portable Text → applied as a patch.

On the documents I tested, the output didn't match the input. Three reproductions, at different scales of input:

Margaret test — 4 controlled sentences in; 1 sentence came back rewritten.
Context Sage framing — 2 editorial paragraphs in; a bulleted explainer came back with invented bold headers and a "not a truth claim" qualifier deleted.
Full article test — a 1,700-word essay in; a third-person book report came back, headings stripped, links removed, closing line deleted.

The mechanism turns out to be that the markdown-input MCP tools route through a model that maps markdown onto the document's Portable Text schema. Sanity's own docs acknowledge this — "some models may struggle to map markdown onto complex Portable Text schemas" — but the tool itself offers no diff, no verbatim mode, and no point-of-use disclosure of the model mediation.

The JSON-input equivalent, patch_document_from_json, does not go through a model. It applies the patch deterministically.

I opened sanity-io/agent-toolkit issue #20 to surface the disclosure-vs-consent gap — what I tested, what came back, and the gap between what the docs note about model behavior and what the tool surfaces at invocation time.

The workaround in my own pipeline: convert markdown to Portable Text JSON locally with a deterministic script (bold, code, links, headings), then patch via patch_document_from_json. Slower, verifiably faithful to the input.

05Decisions

The decision trail

Why migrate at all: editor velocity and a content layer that could outlast a single site design. Editors waited on a marketing contractor for routine copy changes. There was no way to ship a new content type without rebuilding the page that displayed it. The site looked fine. Underneath, it was inert.

Why headless specifically: separation of concerns. A schema editors can use, a query language engineers can use, and a render layer that can target a website today and something else tomorrow without a rewrite of the source.

Why Sanity in particular: three reasons.

Portable Text. Sanity treats prose as a typed array of blocks rather than an opaque HTML blob. That means AI agents can read and write it. It also means the same content can render to a web page, an OG card, a search index, or a JSON-LD payload without rewriting the source.

GROQ. SQL-shaped retrieval over typed documents. The state-machine and projection examples on the build page rely on it.

Visual Editing. Click-to-edit overlays that work without a custom CMS UI, which is what lets non-engineer editors update content directly.

None of these is unique to Sanity. The combination is what felt right for a small team that needed both editor velocity and content-as-data.

WordPress was a real alternative — easier ramp, larger ecosystem, broader hireable skill base. What you give up by choosing WordPress is the structured layer underneath. What you give up by choosing Sanity is the breadth of plugin ecosystem. For a school site whose content has a future beyond the current site design, the first trade-off was worth more.

06Decision records

ADRs as insurance

Every architectural decision lives in a single decisions.md file in the repo. Each entry includes context, options considered, the choice, and — when applicable — reversal instructions: what would need to change to undo this decision and what the cost of the undo would be.

ADRs read like discipline. They function like insurance.

ADR-019 is the example I keep coming back to: it documents why the public student-project showcase route is currently dormant — consent, content review, and the curated dataset all need to land before reactivation. Without that record, the next person to find the dormant route would reactivate it and only discover the gates by tripping them.

The cost of writing the ADR was twenty minutes. The cost of not having it would be a public mistake.