Skill Forge Visualizer

Interactive educational component explaining the Skill Forge pattern for heterogeneous AI deliberation.


Skill Forge Pattern

Deliberation → Human Qualification → Reusable Skill

Skill Forge is a deliberative process for decisions too complex or consequential to trust to a single model or a single human judgment. It builds on Irving et al.'s (2018) insight that humans can judge debates they could not have generated themselves, but departs from same-model debate by exploiting heterogeneous priors: models with genuinely different training are used to surface each other's blindspots. Unlike consensus-seeking approaches (A-HMAD, LLM Council), the goal is not model agreement but qualified human judgment: the human must articulate the decision before approving a model-produced skill. Unlike agent-centric skill accumulation (ExpeL), nothing enters the skill library without a human who has demonstrated understanding.
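The qualification gate described above can be illustrated with a minimal sketch (all names here are hypothetical, not from the source): a skill library whose `add` method refuses any model-produced skill unless it is accompanied by the human's own articulation of the decision.

```python
from dataclasses import dataclass, field

@dataclass
class Skill:
    name: str
    body: str                      # model-produced skill content
    human_articulation: str = ""   # the human's own statement of the decision

@dataclass
class SkillLibrary:
    skills: dict = field(default_factory=dict)

    def add(self, skill: Skill) -> bool:
        # Qualification gate: nothing enters the library without a human
        # who demonstrated understanding by articulating the decision.
        if not skill.human_articulation.strip():
            return False
        self.skills[skill.name] = skill
        return True

lib = SkillLibrary()
unqualified = Skill("provenance-display", "model-written skill text")
qualified = Skill("provenance-display", "model-written skill text",
                  human_articulation="Show epistemic status beside every AI artifact.")
print(lib.add(unqualified))  # rejected: no human articulation
print(lib.add(qualified))    # accepted
```

The point of the sketch is the asymmetry: the model may write the skill body, but only the human's articulation unlocks the library.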

Single Problem Flow Through the Forge

Unlike role-specialized approaches (A-HMAD's researcher/critic/synthesizer), models here aren't assigned roles—they're chosen for genuinely different priors. The value comes from different training, not from prompting one model to act as critic.

Precondition: Object Verification

Before deliberation begins, the object under discussion must be inspected—not assumed.

  • Artifact inspection complete: run tools (ffprobe, stat, etc.) on the source artifacts
  • All parameters [verified:artifact]: no parameter may be asserted from "general knowledge"
Why this matters: In the Forensic Audio case, both models agreed on a 48 kHz sample rate based on "typical iPhone behavior." The actual file was 44.1 kHz. Inspection saved the project.
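The verification precondition can be sketched as a small check, assuming a helper that runs ffprobe (from the ffmpeg suite) and compares the stream's actual sample rate against the value the models asserted; the file path and rates are illustrative:

```python
import json
import subprocess

def probe_audio_json(path: str) -> str:
    """Run ffprobe on the first audio stream and return its JSON description.
    Requires ffmpeg/ffprobe to be installed."""
    return subprocess.run(
        ["ffprobe", "-v", "quiet", "-print_format", "json",
         "-show_streams", "-select_streams", "a:0", path],
        capture_output=True, text=True, check=True,
    ).stdout

def parse_sample_rate(ffprobe_json: str) -> int:
    """Extract the sample rate from ffprobe's JSON output."""
    streams = json.loads(ffprobe_json)["streams"]
    return int(streams[0]["sample_rate"])

def verify(asserted_hz: int, actual_hz: int) -> str:
    """Tag a parameter [verified:artifact] only when inspection confirms it."""
    if asserted_hz == actual_hz:
        return f"[verified:artifact] sample_rate={actual_hz}"
    return f"REJECTED: models asserted {asserted_hz} Hz, artifact shows {actual_hz} Hz"

# The Forensic Audio case: both models asserted 48000 Hz from "general
# knowledge"; the artifact itself showed 44100 Hz.
sample_output = '{"streams": [{"sample_rate": "44100"}]}'
print(verify(48000, parse_sample_rate(sample_output)))
```

Keeping the parse and verify steps separate from the ffprobe call means the verification logic can be exercised without the original artifact on hand.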

Existing skills don't apply or have failed

Example: The portfolio site needs to display AI-generated artifacts with clear epistemic status markers—existing React components don't handle provenance display.