AhaDiff User Guide · v1.2.0

AhaDiff turns a git diff into a code-grounded lesson, a list of verifiable claims, a quiz, review cards, a concept graph and a quality ratchet. It is local-first by default: each repository keeps its own AhaDiff state under .ahadiff/.

Video tutorials: English walkthrough (MP4); Chinese walkthrough (Bilibili). Each video uses the matching UI language with burned-in subtitles; intermediate subtitle source files are not shipped in the public docs tree. Install with pip install ahadiff. The English walkthrough shows this install path; the Chinese Bilibili cut is being refreshed to match.
Getting started

1. Quick Start

Install the published CLI from PyPI. Contributors can still run from a source checkout when developing AhaDiff itself.

Install

Install the CLI and bundled WebUI with pip.

pip install ahadiff
ahadiff --version

Develop from source (contributors)

Sync the locked dev environment, then confirm the module runs.

uv sync --locked --dev
uv run python -m ahadiff --version
uv run python -m ahadiff doctor

First-time use in a repository

When you first use AhaDiff inside a repo, initialize it and confirm the environment.

cd /path/to/your/repo
ahadiff init
ahadiff doctor
ahadiff config show --resolved

Everything AhaDiff generates lives in this repo. There is no global database.

The storage gate accepts these SQLite builds

  • 3.51.3+
  • 3.50.4+ (backport)
  • 3.44.6+ (backport)

uv tool ships SQLite 3.50.4 on Python 3.11.14. Already accepted.

Run ahadiff doctor before your first learn session to confirm your SQLite build clears the gate; it reports the detected version and the required threshold, and fails closed when the build is too old. A higher version number is not automatically accepted. The gate requires SQLite 3.51.3 or newer, or a backported 3.50.4+ / 3.44.6+ build, so 3.51.0-3.51.2 fall below the bar and are not accepted. If ahadiff doctor flags your build, switch to a Python whose bundled SQLite meets the bar, for example uv tool, the python.org installer, Homebrew, or conda.
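
The acceptance rule above can be sketched in a few lines of Python. This is an illustrative reading of the rule, not AhaDiff's implementation: the helper name sqlite_build_accepted is hypothetical, and the sketch assumes a "backport" means a later patch release on the same 3.50 or 3.44 line.

```python
import sqlite3
from typing import Tuple

# Hypothetical helper that mirrors the rule described in this guide.
# Assumption: a "backport" is a later patch release on the same minor line.
def sqlite_build_accepted(version: Tuple[int, int, int]) -> bool:
    major, minor, patch = version
    if version >= (3, 51, 3):
        return True
    if (major, minor) == (3, 50):
        return patch >= 4
    if (major, minor) == (3, 44):
        return patch >= 6
    # Anything else is rejected, even if the number is higher than 3.44.6.
    return False

if __name__ == "__main__":
    detected = sqlite3.sqlite_version_info
    print(detected, sqlite_build_accepted(detected))
```

Under this reading, 3.45.0 and 3.51.2 are both rejected even though they are numerically newer than 3.44.6. ahadiff doctor remains the authoritative check.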

Welcome: hero tagline, intro paragraph, and the first-run quick start.
Welcome: the first screen after you open the WebUI.
  1. Serif hero headline: Ship with AI. Learn it back.
  2. Drop-cap intro paragraph describing AhaDiff.
  3. View latest lesson CTA next to a uv run learn snippet.
  4. Right card previews the latest finalized run's lesson outline.
Setup

2. Configure a Provider

Configuring one LLM provider is the only setup step before ahadiff learn. The key goes in an environment variable; you hand AhaDiff only the variable name. No key ever lands in Git, the README, manifests, or checked-in scripts.

What provider test does

Probe the model with a small request.
On success, save the provider to .ahadiff/config.toml.
Fail closed if the key env-var is unset.

Accepted key env-var names in repo config:

  • AHADIFF_PROVIDER_API_KEY
  • OPENAI_API_KEY
  • ANTHROPIC_API_KEY
  • GEMINI_API_KEY
  • AZURE_OPENAI_API_KEY

Name

--api-key-env is the variable NAME. An identifier-shaped value fails closed when the variable is unset, so a name is never sent as a bearer token by accident.

Secret

The real key stays in your shell environment. It never goes in a flag or a file.
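
The name-versus-secret split can be illustrated with a short Python sketch. It is not AhaDiff's code: resolve_api_key and the identifier pattern are assumptions made for illustration. The caller passes only the variable name, the secret is read from the environment at call time, and an unset variable raises instead of letting the name stand in for a token.

```python
import os
import re

# Identifier-shaped names such as OPENAI_API_KEY. This pattern is an assumption.
_ENV_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")

def resolve_api_key(env_name: str) -> str:
    """Return the secret stored under env_name, failing closed if it is missing."""
    if not _ENV_NAME.fullmatch(env_name):
        raise ValueError("expected an environment variable name, not a key value")
    secret = os.environ.get(env_name)
    if not secret:
        # Fail closed: never fall back to sending the name itself as a token.
        raise RuntimeError(f"environment variable {env_name} is not set")
    return secret
```

The error message includes only the variable name, so nothing secret reaches logs or terminal output.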

Provider setup

Run ahadiff serve and add a provider in Settings (the card previews model limits before you save, with no remote call and no key read), or run one provider test from the matching tab below.

Export your key and provider base URL, then run one probe.

export OPENAI_API_KEY="<your-provider-api-key>"
export AHADIFF_PROVIDER_BASE_URL="<provider-base-url>"
ahadiff provider test \
  --name default \
  --provider-class openai \
  --base-url "$AHADIFF_PROVIDER_BASE_URL" \
  --api-key-env OPENAI_API_KEY

Read the key without echoing it, then probe a Responses / API provider. On macOS / Linux:

export AHADIFF_PROVIDER_BASE_URL="<provider-base-url>"
# bash syntax; in zsh, use: read -rs "AHADIFF_PROVIDER_API_KEY?Provider API key: "
read -rsp "Provider API key: " AHADIFF_PROVIDER_API_KEY; export AHADIFF_PROVIDER_API_KEY; printf '\n'
ahadiff provider test \
  --name gpt55 \
  --provider-class openai_responses \
  --base-url "$AHADIFF_PROVIDER_BASE_URL" \
  --model gpt-5.5 \
  --api-key-env AHADIFF_PROVIDER_API_KEY \
  --privacy-mode explicit_remote

On Windows PowerShell, read it with Read-Host -AsSecureString:

$env:AHADIFF_PROVIDER_BASE_URL="<provider-base-url>"
$secure = Read-Host "Provider API key" -AsSecureString
$env:AHADIFF_PROVIDER_API_KEY = [System.Net.NetworkCredential]::new("", $secure).Password
ahadiff provider test `
  --name gpt55 `
  --provider-class openai_responses `
  --base-url $env:AHADIFF_PROVIDER_BASE_URL `
  --model gpt-5.5 `
  --api-key-env AHADIFF_PROVIDER_API_KEY `
  --privacy-mode explicit_remote

Point at a local OpenAI-compatible server. Ollama and LM Studio expose their own base URLs.

ahadiff provider test \
  --name local \
  --provider-class lmstudio \
  --base-url "$LOCAL_PROVIDER_BASE_URL" \
  --api-key-env AHADIFF_PROVIDER_API_KEY

Settings: providers, privacy, audit, preferences, and project-level AI tool guidance.
Settings: configuration and diagnostics.
  1. Left settings rail: Account, Provider, Capture, Privacy, Audit.
  2. Privacy Controls: privacy mode, local-only, redaction gate, audit log.
  3. Server section with the serve port field and a Save button.

Registry context example

openai: ≈400k context budget for ordinary openai access
vs
openai_responses / API: ≈1.05M context budget on Responses / API access

For gpt-5.5, the bundled registry keeps these two context profiles. A trustworthy live probe can override either profile when the endpoint reports a real total context.

Once configured, your single provider (or the generate provider/model you pick in Settings) is used by ahadiff learn automatically. Pass --provider / --model only to override one run.

ahadiff learn --last

How a saved max_output_tokens is treated

Empty → Auto

No value means AhaDiff sizes the output limit for you.

Trusted hard max

A saved value above a known, trusted ceiling is clamped to that ceiling on save, and the save returns a warning.

Unknown / low-confidence / route-specific / local-runtime

These stay warnings only. They are not treated as a hard guarantee.
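
The three cases can be read as one small decision function. The sketch below is an interpretation for illustration only: resolve_max_output, its parameters, and the warning strings are hypothetical, and auto_value stands in for whatever limit AhaDiff would size on its own.

```python
from typing import Optional, Tuple

def resolve_max_output(saved: Optional[int],
                       ceiling: Optional[int],
                       ceiling_trusted: bool,
                       auto_value: int) -> Tuple[int, Optional[str]]:
    """Return (effective_limit, warning). All names here are hypothetical."""
    if saved is None:
        # Empty -> Auto: the tool sizes the limit itself.
        return auto_value, None
    if ceiling is not None and saved > ceiling:
        if ceiling_trusted:
            # Trusted hard max: clamp and report it.
            return ceiling, f"clamped {saved} to trusted max {ceiling}"
        # Unknown or low-confidence ceiling: warn, but keep the saved value.
        return saved, f"{saved} exceeds an unverified limit of {ceiling}"
    return saved, None
```

For example, a saved value of 16000 against a trusted ceiling of 8192 comes back as 8192 with a warning, while the same value against an untrusted ceiling is kept and only warned about.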

Local server reports the wrong JSON capability?

Set only known boolean overrides in .ahadiff/config.toml. NewAPI defaults to supports_native_json_schema=false; if your gateway supports native JSON schema, add the override.

[providers.local.capability_overrides]
supports_native_json_schema = false

Workflow

3. Run a Learn Session

Pick a diff source

AhaDiff has 10 capture sources. Pick the tile that matches what you changed.

Generate the lesson and evidence

AhaDiff captures the diff, scans for safety issues, and produces claims, lesson, quiz, score and other artifacts.

Open the WebUI to learn

Read the Lesson for the explanation, jump to Diff for the evidence, take the Quiz to check understanding, and use Review for spaced repetition.

Run the source that matches what you changed. The Dashboard's New Run dialog mirrors these same groups.

Quick (working tree)
  Last commit | The most recent commit. | ahadiff learn --last
  Staged | What is staged for the next commit. | ahadiff learn --staged
  Unstaged | Edits not staged yet; add untracked. | ahadiff learn --unstaged --include-untracked
Git Advanced
  Time window | Commits since a point in time. | ahadiff learn --since "2 hours ago"
  Revision range | An explicit commit range. | ahadiff learn HEAD~1..HEAD
Patch
  Patch file | A unified diff on disk. | ahadiff learn --patch change.diff
  Patch stdin | A diff piped from another tool. | ahadiff learn --patch -
  Patch URL | A remote unified diff. | ahadiff learn --patch-url URL
File Compare
  Compare files | Two file versions, no git. | ahadiff learn --compare old.py new.py
  Compare dirs | Two directory trees, no git. | ahadiff learn --compare-dir old/ new/

10 modes → 8 internal source kinds
Path scope: working-tree modes only · repo-relative paths
Add --against-spec SPEC.md to any of these to check the diff against a spec; add --spec-semantic-review for the semantic pass.

--since alone vs --since --author

--since

A multi-commit time window. Every commit in the window is captured.
vs

--since --author

Exactly one matching commit. It skips other authors' commits in the window.

Patch and URL captures carry caveats:

  • No symbol index → claims may be weak (lesson still made)
  • Remote URL: same safety checks · loopback / private rejected
  • compare-dir: POSIX-only · fails closed elsewhere

The WebUI patch field accepts pasted unified diff text up to 65536 bytes; use the CLI for patch files, stdin, or larger patches. CLI patch files resolve inside the repo root; pass --patch - for externally generated patches.
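
Because the WebUI limit is stated in bytes, a pasted patch with non-ASCII content reaches it sooner than its character count suggests. A minimal sketch for checking a patch before pasting, assuming the text is measured as UTF-8 (the helper name is hypothetical):

```python
WEBUI_PATCH_LIMIT_BYTES = 65536  # limit stated in this guide

def fits_webui_patch_field(patch_text: str) -> bool:
    # Measure the encoded size, not the number of characters.
    # Assumption: the field counts UTF-8 bytes.
    return len(patch_text.encode("utf-8")) <= WEBUI_PATCH_LIMIT_BYTES
```

When the check fails, the CLI paths described above (a patch file, or --patch - on stdin) are the ones to use.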

Long-running CLI commands show Rich status while they run. Treat that as terminal feedback, not a machine-readable output contract.

Interface

4. Use the WebUI

Run one of these to open the local React SPA. serve opens a browser by default; add --no-browser to keep it headless, or --watch to rebuild on file changes.

ahadiff serve
ahadiff serve --port 8765 --no-browser
ahadiff serve --watch   # file watcher, included by default

Workflow at a glance

Diff: Capture the code change
Claims: Bind file:line evidence
Lesson: Generate the explanation
Quiz/Review: Test recall with SRS
Ratchet: Score history, improve preview, and export

Page map

Page | What it's for
/ | Dashboard: latest runs, KPIs, ratchet trends, and review activity.
/run/:runId | Run detail: Overview, Score, Judge, Artifacts. If the optional LLM judge fails, the Judge tab shows a redacted failure panel instead of raw provider output.
/run/:runId/lesson | Lesson body, claim summary, evidence, and knowledge notes.
/run/:runId/diff | Unified/Split diff, claim dots, and the ClaimInspector.
/run/:runId/quiz | Guided / Recall / Transfer questions. The mode badge says whether the current question is a one-time Socratic quiz or an SRS review card.
/review | FSRS review queue with Again / Hard / Good / Easy grading.
/concepts | Concept ledger, concept graph, and Graphify source.
/ratchet | Result history, benchmark summary, improve preview, and export entry points.
/settings | Providers, capture, privacy, audit, preferences, and project-level AI tool guidance. AI Tool Guidance groups targets as CLI / IDE / CI, shows usage hints, and includes a provider-free built-in demo.
/guide | Daily commands and the 15 project AI tool guidance targets. It shows category filters, usage hints, and what each target would write before you apply changes; writing and removal stay in Settings → AI Tool Guidance.

About a dozen AI-tool targets write tool-native files. Settings → AI Tool Guidance lists them all and previews each file before writing; they stay local unless you commit them. For example:
  • .claude/skills/ahadiff/SKILL.md
  • .agents/skills/ahadiff/SKILL.md
  • .gemini/skills/ahadiff/SKILL.md
  • .agents/skills/ahadiff-antigravity/SKILL.md
  • .agents/skills/ahadiff-antigravity-cli/SKILL.md
  • .agents/rules/ahadiff.md
  • .github/instructions/ahadiff.instructions.md
  • .opencode/agents/ahadiff.md
  • .clinerules/ahadiff.md
  • .continue/rules/ahadiff.md
  • .cursor/rules/ahadiff.mdc
  • .roo/rules/ahadiff.md
  • .windsurf/rules/ahadiff.md
Repo guidance sections still live in user-managed files such as CLAUDE.md, AGENTS.md, GEMINI.md, and .github/copilot-instructions.md. This repository ignores generated .agents/ installs so local Codex / Antigravity skill output is not committed by default.
Dashboard: recent runs, scores, pass rate, ratchet trends, and review activity.
Dashboard: runs, ratchet, and learning signals at a glance.
  1. Top stat cards: total runs, avg score, pass rate, concepts, LLM calls.
  2. Spec alignment block with score, trend, and evaluated count.
  3. Ratchet trajectory line chart of kept vs discarded scores.
  4. Review activity heatmap calendar on the right.

Claims and the diff

Every claim links to file:line evidence and carries one of five badges. Glyphs and semantics are the product's real ones; the hues are mapped to this page's palette.

verified: evidence at file:line backs the claim
weak: partial or indirect evidence
not_proven: no evidence found either way
contradicted: evidence conflicts with the claim
rejected: claim discarded
Lesson: claim status, claim summary, knowledge notes, and the evidence sidebar.
Lesson: the verified learning note generated from a diff.
  1. Header with PASS score, View score details, Mark as Learned.
  2. Compact / Hint / Full detail-level toggle, top right.
  3. Left outline rail: TL;DR, What Changed, Claims, Concepts.
  4. Right claims summary: verified / weak / not proven counts.
Diff: diff content, ClaimInspector, and the matching evidence lines.
Diff: code changes with inline claim evidence linking.
  1. Center code diff with red/green added and removed lines.
  2. File tabs and Unified / Split view toggle above the diff.
  3. Right Claim Inspector panel listing per-claim cards.
  4. Verified claim badges link the explanation back to code lines.

Quiz and review

Review grades each card; the grade drives the next due date.

Again: forgot it; see it soon
Hard: recalled with effort
Good: recalled cleanly
Easy: trivial; push it out
Quiz: recall and transfer questions, evidence reveal, and common misconceptions.
Quiz: active-recall questions drawn from verified claims.
  1. Single multiple-choice question with A/B/C/D options.
  2. Guided / Recall / Transfer mode tabs and progress counter.
  3. Right Evidence panel locked until you answer.
  4. Right Progress list of per-question status (pending).
Review: FSRS queue with Again / Hard / Good / Easy grading.
Review: spaced-repetition queue sorted by forgetting curve.
  1. Header counts: cards due, FSRS, at-risk concepts.
  2. Review Overview activity heatmap and Concept Mastery list.
  3. Needs Practice list with per-concept stability values.
  4. New Concepts list marked New on the right.

Concepts

Concepts ledger: all learned concepts in a sortable table.
  1. Ledger / Graph tabs and Health Status filter counts.
  2. Table columns: concept, runs, files, claims.
  3. Per-row health tags like Orphan and Contradicted.
  4. Run and file links per concept row.
Concept graph: Graphify source, concept nodes, and graph controls.
Concepts graph: the imported knowledge graph view.
  1. Ledger / Graph tabs with a Refresh Graph button.
  2. Graphify source card: node/edge counts, FRESH badge, sha256.
  3. Community filter chips and All / code / rationale filters.
  4. Canvas of graph nodes below with Export JSON, Fit to view.
Reference

5. Common Commands

Daily work runs through learn, quiz, and review. The clusters below cover everything else: serving the UI and managing local data.

Daily loop

Learn and verify

Generate a run, then check, grade, and improve it.

ahadiff learn --last
ahadiff quiz RUN_ID
ahadiff review
ahadiff claims RUN_ID
ahadiff verify RUN_ID
ahadiff score RUN_ID
ahadiff improve-run RUN_ID

Serve

Open the local WebUI. --watch rebuilds on file changes.

ahadiff serve
ahadiff serve --port 8765 --no-browser
ahadiff serve --watch   # file watcher, included by default

improve-run RUN_ID regenerates a lesson and keeps the new copy only when the deterministic score strictly improves, saving it as a separate run and leaving the original untouched. Use --candidates N (default 3, range 1-10) to control how many regeneration attempts are tried; a higher value improves the chance of beating the score but costs more tokens and time. It works in any install, including pip. The separate improve command tunes AhaDiff's own generation prompts and only runs inside an AhaDiff source checkout.
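
The keep-only-on-strict-improvement rule can be sketched as follows. This is an illustration, not the command's implementation: pick_improvement and the regenerate callback are hypothetical, and the sketch assumes the best-scoring candidate is the one kept when several beat the original.

```python
from typing import Callable, Optional

def pick_improvement(original_score: float,
                     regenerate: Callable[[int], float],
                     candidates: int = 3) -> Optional[float]:
    """Return the best strictly better score, or None to keep only the original."""
    if not 1 <= candidates <= 10:
        raise ValueError("candidates must be between 1 and 10")
    best = None
    for attempt in range(candidates):
        score = regenerate(attempt)  # hypothetical: score of one regenerated lesson
        # Strict comparison: a tie with the original is not an improvement.
        if score > original_score and (best is None or score > best):
            best = score
    return best
```

A tie returns None, which matches the "strictly improves" wording: an equal score does not produce a new run.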

Maintenance clusters

Database

Run the SQLite integrity gate over review.sqlite.

ahadiff db check

Graph

Inspect and re-import the Graphify artifact.

ahadiff graph status
ahadiff graph import
ahadiff graph refresh

Concepts

List, verify, and lint the concept ledger.

ahadiff concepts list
ahadiff concepts verify
ahadiff concepts lint

Export

Write results to disk for sharing (see Export & Share).

ahadiff export-results
ahadiff export preview RUN_ID

Challenge

Opt-in. Build a challenge from a run, then check state.

ahadiff challenge build RUN_ID
ahadiff challenge status

Refresh vs learn: what touches Graphify

graph refresh / WebUI refresh

Import an existing artifact only. It does not run graphify update.
vs

learn run

Runs graphify update first when the Graphify CLI is present, then imports.
Refresh timeout 600s
Output

6. Export & Share

The Ratchet / Export entry points in the WebUI offer four formats. The CLI covers the same ground.

ahadiff export-results
ahadiff export preview RUN_ID --out .ahadiff/export-preview

Static preview

A strict-local static bundle with a deterministic zip. Nothing leaves the machine.

TSV

Results as tab-separated values for spreadsheets.

JSON

Results as JSON for scripts and tooling.

Anki .apkg

Active review cards as an Anki deck.

default

No extra install needed.

APKG export works by default with pip install ahadiff because genanki is bundled.
Capabilities

7. Capabilities: Defaults, Opt-ins, and Dependencies

Capability pills distinguish default behavior from opt-in flows and dependency-backed exports. Open Details on any card for the exact behavior.

Default: works after install

Lesson / Claims / Quiz / Score

The default pipeline, produced after ahadiff learn.

Details

Which artifacts get created depends on the diff and its learnability. Patch-only captures can use weak diff-anchored claims when symbol-level proof is missing. In Quiz, source evidence stays locked until you answer.

Optional LLM judge

Advisory only; the deterministic score stays primary.

Details

Successful judge runs write judge.json. Failures write a bounded, redacted judge_failure.json with provider, model, error type, and a safe message. Missing judge artifacts return 404 rather than crashing the API.

Quiz question count

Fixed at 3 by default; adaptive is opt-in.

Details

Fixed mode accepts 1-30 questions. In Settings, or with --quiz-mode auto, AhaDiff uses diff stats to choose a bounded count; the default adaptive range is 3-12.
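
A small sketch of how the two modes bound the question count. The ranges come from this guide; everything else is an assumption made for illustration, including the function name, the "fixed" mode string, and the one-question-per-40-changed-lines heuristic, which is not how AhaDiff is documented to weigh diff stats.

```python
def quiz_question_count(mode: str,
                        fixed_count: int = 3,
                        changed_lines: int = 0,
                        auto_min: int = 3,
                        auto_max: int = 12) -> int:
    if mode == "fixed":
        if not 1 <= fixed_count <= 30:
            raise ValueError("fixed mode accepts 1-30 questions")
        return fixed_count
    if mode == "auto":
        # Hypothetical heuristic: one question per 40 changed lines,
        # then clamped into the adaptive range.
        estimate = changed_lines // 40
        return max(auto_min, min(auto_max, estimate))
    raise ValueError(f"unknown quiz mode: {mode}")
```

Whatever the real heuristic is, the clamp is the part the guide states: adaptive counts stay inside the 3-12 default range.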

Structured JSON output

JSON object mode with one validation retry.

Details

Public artifacts stay the same. Native schema is used only when the provider capability reports support. Unsupported modes downgrade; truncated or malformed fallback JSON is retried, not accepted.

Adaptive capture limits

Auto for fresh configs; manual once you customize a limit.

Details

Auto mode sizes capture limits from five inputs:

  • provider probe
  • model registry
  • output reserve
  • safety reserve
  • CJK density

Editing any of these migrates the repo to manual mode:

  • capture.max_files
  • capture.hard_limit
  • capture.max_patch_bytes
50 MiB patch cap

Provider smart config

Draft model-limit preview in Settings before you save.

Details

The provider card previews limits from the draft provider class, model, and optional limits profile, with no remote probe on every edit. It shows thinking support, low-confidence warnings, a recommended max-output, and any clamp the save API returns. Generate and judge limits are shown separately.

WebUI

Welcome, Dashboard, Diff, and Review open with ahadiff serve.

Details

The Welcome Before/After demo collapses long raw diffs to the lesson height and shows visible / total line counts. Short or empty diffs have no collapse control.

AI Tool Guidance

Settings writes the files; Guide is read-only.

Details

Settings shows for the 15 install targets:

  • CLI / IDE / CI grouping
  • Localized quick-start steps
  • Example prompts and expected behavior
  • Platform notes
  • A provider-free local demo

Settings

Writes files with guarded atomic replacement.

Guide

Read-only preview of the same hints and file writes; no write / remove buttons.

Rollback restores content and mode where the platform supports it

Opt-in: off until you turn it on

Challenge

Off by default; enable via config.

Details

The CLI exposes build / status; full progression happens through the WebUI / API.

Spec alignment

Runs only with --against-spec.

Details

Add --spec-semantic-review for the semantic pass. A run with no spec renders this dimension as N/A.

Dependency: needs an extra or external artifact

LLM provider (BYOK)

AhaDiff runs on the LLM provider you configure: any of the 8 supported formats, your choice of model. No specific model is required. Set it up in Configure a Provider.

Details

Set the provider, base URL, model, and the API-key environment variable before use.

APKG export

Works by default; genanki is bundled.

Details

APKG export works by default with pip install ahadiff because genanki is bundled. Active review cards export without any extra install.

Watch

Works by default; the file watcher is bundled.

Details

The file watcher is included by default, so ahadiff serve --watch works out of the box. You can also run ahadiff learn manually.

Graphify refresh

Imports an existing artifact; capped at 50000 nodes.

Details

It re-imports graphify-out/graph.json into AhaDiff's cache and does not run graphify update. The cap is graph.max_nodes_import (default 50000); over-limit refreshes return GRAPH_NODE_LIMIT.
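
To see ahead of time whether a graph would exceed the cap, the node count can be checked before a refresh. This sketch assumes the artifact is a JSON object with a top-level "nodes" array, which is a guess about the Graphify layout rather than a documented fact; the helper name is also hypothetical.

```python
import json

def check_node_limit(graph_json: str, max_nodes_import: int = 50000) -> int:
    """Return the node count, or raise if it exceeds the import cap."""
    graph = json.loads(graph_json)
    # Assumption: nodes live in a top-level "nodes" array.
    count = len(graph.get("nodes", []))
    if count > max_nodes_import:
        raise ValueError(f"GRAPH_NODE_LIMIT: {count} nodes exceeds {max_nodes_import}")
    return count
```

Reading graphify-out/graph.json into a string and passing it in is enough to try this locally.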

How a run is scored

Eight dimensions, 100 points. A run can fail on score gates, contradicted claims, evidence coverage, or safety gates.

accuracy | 20 pts | gate ≥14
evidence | 18 pts | gate ≥12
diff_coverage | 14 pts
learnability | 14 pts
quiz_transfer | 10 pts
spec_alignment | 10 pts
conciseness | 8 pts
safety_privacy | 6 pts
A run can score high yet still FAIL when a hard gate trips or when critical_safety_findings fire.
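
The interaction between the total and the hard gates can be sketched as below. The point ceilings and the two base gates come from the table above; the function itself is illustrative, and it deliberately covers only the base accuracy and evidence gates plus critical safety findings, not contradicted claims, evidence coverage, or adaptive thresholds.

```python
# Point ceilings and base gates as listed in this guide.
MAX_POINTS = {
    "accuracy": 20, "evidence": 18, "diff_coverage": 14, "learnability": 14,
    "quiz_transfer": 10, "spec_alignment": 10, "conciseness": 8, "safety_privacy": 6,
}
BASE_GATES = {"accuracy": 14, "evidence": 12}

def evaluate(scores, critical_safety_findings=0):
    """Return (total, verdict, failed_gates) for one run's dimension scores."""
    for name, value in scores.items():
        if not 0 <= value <= MAX_POINTS[name]:
            raise ValueError(f"{name} must be between 0 and {MAX_POINTS[name]}")
    total = sum(scores.values())
    failed = [name for name, minimum in BASE_GATES.items()
              if scores.get(name, 0) < minimum]
    if critical_safety_findings > 0:
        failed.append("critical_safety_findings")
    return total, ("FAIL" if failed else "PASS"), failed
```

A run with full marks everywhere except accuracy at 13 totals 93 points and still fails, because the accuracy gate sits at 14.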
Run Detail overview: a single learn run's outcome and metadata.
  1. Overview / Score / Judge / Artifacts tab bar.
  2. FAIL banner with score and failed-gate reason.
  3. Metadata rows: base ref, language, source kind, prompt version.
  4. Graphify Signoff cards: passed, freshness, node and edge counts.
Run Detail score: the 8-dimension evaluation breakdown.
  1. FAIL badge, overall score, and the count of failed gates.
  2. Eight scored dimension cards with progress bars.
  3. Hard Gates row; critical safety findings fail in red.
  4. Spec Alignment section noting no artifact for this run.
Help

8. FAQ

Why is there no Lesson / Diff / Quiz?

These pages need a specific run_id. Run ahadiff learn at least once, then open the corresponding run from the Dashboard or from the command's output.

What if the provider fails?

Run the diagnostics and re-test the provider. On Windows PowerShell, set the variables with the $env: form and pass --base-url $env:AHADIFF_PROVIDER_BASE_URL, as shown in Configure a Provider.

ahadiff doctor
ahadiff config show --resolved

Then rerun the matching provider command from Configure a Provider.

What if the lesson gets skipped?

The diff might be too small, learnability might be too low, or there aren't enough verified claims. If you're sure this is the change you want to study, re-run with --force-learn.

How do I change quiz question count?

Use Settings → Preferences. Fixed mode always asks the configured number of questions, from 1 to 30. Adaptive mode uses the captured diff stats; its default range is 3-12, and old runs without those stats fall back to the fixed count.

How are capture limits chosen?

New repos use auto capture sizing on a priority ladder:

  1. Live provider probe data.
  2. The bundled model registry.
  3. Conservative defaults.
Generate and judge limits resolve separately. The output reserve is subtracted from the input budget once. Only explicit split-envelope rows, such as Gemini, store input plus output as total context.
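
The ladder and the single subtraction of the output reserve can be sketched like this. The function, its parameters, and the fallback figure are hypothetical, and the sketch ignores split-envelope rows, which store input and output separately.

```python
from typing import Optional

CONSERVATIVE_CONTEXT = 32_000  # hypothetical fallback, not AhaDiff's real default

def input_budget(probe_context: Optional[int],
                 registry_context: Optional[int],
                 output_reserve: int) -> int:
    # Priority ladder: live probe first, then the bundled registry, then defaults.
    if probe_context is not None:
        total = probe_context
    elif registry_context is not None:
        total = registry_context
    else:
        total = CONSERVATIVE_CONTEXT
    # The output reserve comes off the total exactly once.
    return max(0, total - output_reserve)
```

Calling this once for the generate provider and once for the judge provider reflects the point that the two limits resolve separately.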

Manual mode keeps the numbers you set in config or Settings. The Settings provider form previews these limits from the unsaved draft, with no remote probe.

Does the LLM judge decide the final verdict?

No. The final verdict comes from deterministic score.json and hard gates. judge.json is advisory; it helps explain model feedback, but it does not override the deterministic verdict. If a dimension is not applicable, such as spec_alignment on a no-spec run, Score and Judge render it as N/A / 0/0.

Why can Accuracy / Evidence gates change?

The base thresholds hold:

  • Accuracy 14/20
  • Evidence 12/18

Large diffs can use an adaptive threshold from visible files, hunks, and changed lines, with ratio, regime, and basis written to hard_gates.*.policy. This never forgives bad evidence. The run still fails when:

  • high rejected-claim ratio
  • contradicted claims
  • safety issues
  • genuinely missing evidence

How is Diff Coverage judged?

Diff Coverage uses only the visible files and hunks in the persisted line_map.json:

  • visible-files-only denominator
  • changed-line weighting
  • hunk-count floor
  • adaptive anchor threshold

Omitted files do not enter the denominator. The hunk-count floor keeps tiny hunks counting. The hard gate writes its ratio, regime, and visible basis into the gate detail.

What safety findings fail a run?

An unmitigated Critical safety finding fails the run. A Critical secret finding is treated as mitigated only when the local capture record shows a complete redaction shape, including the rule, hash, source, line, and column.
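
The "complete redaction shape" rule amounts to requiring all five fields before a finding counts as mitigated. A minimal sketch, assuming a finding is a dict with hypothetical "severity" and "redaction" keys; it also simplifies by applying the same check to every Critical finding, whereas the guide states it for secret findings.

```python
REQUIRED_REDACTION_FIELDS = ("rule", "hash", "source", "line", "column")

def is_mitigated(finding: dict) -> bool:
    redaction = finding.get("redaction") or {}
    # Compare against None so that line or column 0 still counts as present.
    return all(redaction.get(field) is not None
               for field in REQUIRED_REDACTION_FIELDS)

def run_fails_on_safety(findings: list) -> bool:
    # Any Critical finding without a complete redaction record fails the run.
    return any(f.get("severity") == "critical" and not is_mitigated(f)
               for f in findings)
```

A record that carries only the rule and hash is treated as unmitigated, so the run fails closed.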

Can you guarantee there are no bugs?

No. AhaDiff reduces risk by keeping data local, linking lesson claims to diff evidence, and exposing deterministic score and hard-gate details. You should still review generated lessons and run your normal test suite before relying on a change. For release status, use the current GitHub Actions and release notes instead of this static guide.

Notes

9. About

Screenshots in this guide are examples only. They carry no real provider credentials, repository contents, or user data.

Platform support

macOS and Linux are the primary tested platforms. Windows runs the core flows; two features stay POSIX-only.

Feature | macOS · Linux | Windows
Core CLI | yes | yes
serve + WebUI | yes | yes
--compare-dir | yes | no
hooks install target | yes | no
Install rollback | yes | partial

--compare-dir needs the secure directory file descriptor available on macOS / Linux only, and fails closed elsewhere. The hooks target uses POSIX shell hooks. Rollback restores mode before atomic replace on POSIX, and uses a best-effort mode restore after replace where fchmod is missing.

Validation snapshot

Local checks recorded for the current docs snapshot. Release and CI status belong in GitHub Actions.

Validated

  • backend unit
  • ruff / format / pyright
  • viewer Vitest + build
  • i18n parity
  • local LLM E2E
For continuous-integration status, read the GitHub Actions tab of the upstream repository, not this document.