Changelog
All notable changes to Forgeplan are documented here. Format loosely follows
Keep a Changelog. Semver is MAJOR.MINOR.PATCH
with pre-1.0 minor bumps for breaking changes.
The canonical source is CHANGELOG.md
in the repository. This page is generated from it at build time via scripts/copy-changelog.mjs.
This file starts at v0.17.0. For prior releases, see git tags and the
corresponding sprint evidence under .forgeplan/evidence/.
[0.18.0] — 2026-04-11 — Production BM25 + Russian morphology + quality gates
Feature release upgrading the search engine and codifying quality rules.
- Production BM25 engine (`bm25` crate v2.3.2). Replaces 140 LOC of hand-written BM25 with a production-quality implementation including stemming, stop-word removal, and Unicode segmentation.
- Russian morphology support. `LanguageMode::Detect` with `whichlang` auto-selects the Snowball stemmer per document/query. “аутентификация” now matches “аутентификации” via a shared stem. 17 languages supported.
- Template noise stripping. `strip_indexing_noise()` removes YAML frontmatter, template placeholder lines `{...}`, markdown table rows `|...|`, and HTML comments from the BM25 index. Fixes false positives where `forgeplan search "auth"` matched unrelated PRDs via `author:` in frontmatter.
- O(N) batch search. Single-pass `search_scores()` replaces O(N²) per-record `.score()` calls. 193-artifact corpus: 0.23s.
- 8-point verification checklist in CLAUDE.md, mandatory before every commit/PR. Covers: unit tests, edge cases, E2E on fresh workspace, verbatim template test, dogfood stress test, regression guard (A/B), negative tests, cross-language verification.
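The noise-stripping step described above can be illustrated with a minimal sketch. It is not the project's `strip_indexing_noise()`; the function name `strip_indexing_noise_sketch` and the exact line-matching rules (frontmatter only at the top of the file, placeholders and table rows detected by their first and last character) are assumptions made for illustration.

```rust
/// Hypothetical sketch: drop lines that should not reach a keyword index.
/// The matching rules are assumptions, not the real implementation.
pub fn strip_indexing_noise_sketch(markdown: &str) -> String {
    let mut kept: Vec<&str> = Vec::new();
    let mut in_frontmatter = false;
    let mut in_comment = false;

    for (i, line) in markdown.lines().enumerate() {
        let t = line.trim();

        // YAML frontmatter: a `---` fence on the first line, closed by the next `---`.
        if i == 0 && t == "---" {
            in_frontmatter = true;
            continue;
        }
        if in_frontmatter {
            if t == "---" {
                in_frontmatter = false;
            }
            continue;
        }

        // HTML comments, either on one line or spanning several lines.
        if in_comment {
            if t.contains("-->") {
                in_comment = false;
            }
            continue;
        }
        if t.starts_with("<!--") {
            if !t.contains("-->") {
                in_comment = true;
            }
            continue;
        }

        // Template placeholder lines `{...}` and markdown table rows `|...|`.
        let placeholder = t.starts_with('{') && t.ends_with('}');
        let table_row = t.starts_with('|') && t.ends_with('|');
        if placeholder || table_row {
            continue;
        }

        kept.push(line);
    }
    kept.join("\n")
}
```

With input like this, an `author:` key in frontmatter never reaches the index, so a query for "auth" can only match body text.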
Changed
- Health debt resolved: 8 active stubs deprecated/superseded, 5 duplicate EVID pairs deprecated, 3 orphan NOTEs linked. Health dashboard reports “Project looks healthy!” with zero warnings.
- 1150 tests pass (+19 from v0.17.2 baseline 1131).
- New regression tests: Russian morphology (2), English stemming (1), plural forms (1), stop-word resilience (1), noise stripping (2), frontmatter false-positive guard (1).
[0.17.2] — 2026-04-09 — Quality hotfix: scoring & search integrity
Fixes five real bugs found during a dedicated /forge E2E verification sprint on a fresh workspace (separate from the dogfood audit that produced v0.17.1). Each bug was reproduced on the v0.17.1 release binary before fixing, and the fix verified A/B on an identical workspace.
The headline fix is PROB-034 (CRITICAL) — a silent trust-calculus regression present since v0.17.0 that inflated R_eff scores across every workspace using the default evidence template.
- PROB-034 (CRITICAL): Multi-line HTML comments shadowed real structured fields in `extract_field`. `crates/forgeplan-core/src/scoring/evidence.rs::extract_field` skipped only lines literally starting with `<!--`, not lines inside a multi-line comment block. The evidence template ships with a help comment:

      <!--
      verdict: supports | weakens | refutes
      congruence_level: 0 | 1 | 2 | 3 (CL3=same context, CL0=opposed)
      -->

  The placeholder line `congruence_level: 0 | 1 | 2 | 3 (CL3=...)` does not start with `<!--`, so the parser matched it, `parse::<u8>()` failed on the non-numeric string, `explicit_cl` became `None`, and the real `congruence_level: X` in the Structured Fields section below was never inspected. Every evidence artifact ever created via the default template silently reset to CL3 (no penalty), artificially inflating R_eff across every workspace since v0.17.0.
  - Fix: `extract_field` now implements a proper multi-line comment state machine. It tracks an `in_multiline_comment` boolean, skipping all lines between `<!--` and `-->` when they span multiple lines.
  - Affects all fields parsed via `extract_field`: `verdict`, `congruence_level`, `evidence_type`, `source_tier`. All were silently defaulted. The fix is transitive.
  - A/B verification on `/tmp/fp-prob034-repro` with identical workspace: v0.17.1 binary → `r_eff=1.0000, CL=3`; v0.17.2 binary → `r_eff=0.1000, CL=0` (correct for explicit CL0 evidence).
  - Regression tests: `extract_field_ignores_multiline_html_comments`, `extract_field_multiline_comment_nested_fields_all_ignored`.
- PROB-030: BM25 prefix queries returned 0 results. `crates/forgeplan-core/src/search/smart.rs` computed `keyword_score` (substring match) for diagnostics but passed only `bm25_norm` to `combined_score`. BM25 is token-based, so `auth` did not match the token `authentication`, and prefix queries silently returned nothing.
  - Fix: `let keyword_channel = bm25_norm.max(kw);`. BM25 still wins on exact-token matches (richer signal), but the substring fallback kicks in when BM25 returns 0 for prefix queries.
  - Regression tests: `smart_search_prefix_query_falls_back_to_substring`, `smart_search_exact_token_still_wins_over_prefix`.
- PROB-031: CLI `score` command had its own divergent evidence parser. The CLI `parse_evidence_from_record` in `score.rs` duplicated core’s function but with a different default CL (CL0 vs CL3), creating a visible contradiction: display said `CL0 = 0.1` while the `r_eff_recursive` rollup computed `1.00` via core’s parser. The local CLI parser also did NOT implement the PRD-035 Sprint 13.3 H2 security precedence (`min(tier_cl, explicit_cl)`), opening a trust-amplification attack surface on the display path.
  - Fix: deleted the local duplicate and `extract_field` helper; imported `forgeplan_core::scoring::evidence::parse_evidence_from_record`. Display and rollup now read identical values by construction.
  - Regression test: `score_uses_core_parser_with_cl3_default_when_no_structured_fields`.
- PROB-032: `forgeplan search` breakdown line lied about components. Display showed `kw=0.0 sem=0.0 r=0.0 g=0.0` while total was 0.57. Caused by PROB-030: `kw` was computed but never flowed into `combined_score`.
  - Auto-fixed as a side effect of PROB-030. Breakdown now shows real component values.
- PROB-033: `forgeplan new evidence` printed a confusing session warning after `forgeplan route`. The session state machine attempted a `Routing → Evidence` transition, which is disallowed. The file WAS created, but stderr showed `Session: Cannot go from 'routing' to 'evidence'`, which made legitimate backfill, audit, brownfield, and evidence-import flows look blocked even though they were not.
  - Fix: `forgeplan new evidence` is now phase-agnostic and never drives the session state machine. Only decision artifacts (prd/rfc/adr/epic/spec) advance to Shaping phase. The methodology guardrail still enforces at `activate` time via PRD-043 stub detection + validation gates.
  - Regression test: `new_evidence_works_in_routing_phase_without_session_warning`.
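The PROB-034 fix above describes a small state machine over lines. The following is a minimal sketch of that idea under stated assumptions: `extract_field_sketch` is a hypothetical stand-in, not the real `extract_field`, and the `key: value` matching rule is simplified for illustration.

```rust
/// Hypothetical stand-in for `extract_field`: returns the value of the first
/// `key: value` line that is not inside an HTML comment.
pub fn extract_field_sketch(body: &str, key: &str) -> Option<String> {
    let mut in_multiline_comment = false;

    for line in body.lines() {
        let trimmed = line.trim();

        if in_multiline_comment {
            // Stay inside the comment until a line carries the closing marker.
            if trimmed.contains("-->") {
                in_multiline_comment = false;
            }
            continue;
        }

        if trimmed.starts_with("<!--") {
            // A comment that does not close on its own line spans several lines.
            if !trimmed.contains("-->") {
                in_multiline_comment = true;
            }
            continue;
        }

        // Simplified field match: the key, then a colon, then the value.
        if let Some(rest) = trimmed.strip_prefix(key) {
            if let Some(value) = rest.strip_prefix(':') {
                return Some(value.trim().to_string());
            }
        }
    }
    None
}
```

Given the template help comment followed by a real `congruence_level: 0` line, this sketch skips the placeholder and returns `"0"`, which is the behavior the regression tests above are named after.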
- 1137 tests pass (+6 from v0.17.1 baseline 1131).
- 6 new regression tests cover PROB-030 (2), PROB-031 (1), PROB-033 (1), PROB-034 (2).
- `cargo fmt --check` clean, `cargo clippy --workspace --all-targets -- -D warnings` clean on both default and `semantic-search` feature.
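The PROB-030 fix listed above combines two signals with `max`. A minimal sketch of why that restores prefix queries follows; only the `bm25_norm.max(kw)` expression comes from this changelog. The helper `substring_score` and its 0.5 fallback value are assumptions chosen for illustration.

```rust
/// Assumed fallback score for a plain substring hit (illustrative value only).
pub const SUBSTRING_HIT: f32 = 0.5;

/// Hypothetical substring scorer: case-insensitive containment check.
pub fn substring_score(query: &str, text: &str) -> f32 {
    let hit = !query.is_empty() && text.to_lowercase().contains(&query.to_lowercase());
    if hit { SUBSTRING_HIT } else { 0.0 }
}

/// The combination named in the fix: take whichever signal is stronger.
pub fn keyword_channel(bm25_norm: f32, kw: f32) -> f32 {
    bm25_norm.max(kw)
}
```

A token-based score of 0.0 for the query `auth` against text containing `authentication` is lifted to the substring score, while a higher BM25 score for an exact-token match passes through unchanged.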
Impact
If you are upgrading from v0.17.0 or v0.17.1 and you have evidence
artifacts in your workspace, your R_eff scores were potentially
inflated by the CL3 default (PROB-034). Re-run forgeplan score on
critical PRDs after upgrade — any evidence that explicitly set
congruence_level in Structured Fields will now be honored, and weak
CL values may cause R_eff to drop. This is correct behavior; the
previous values were silently wrong.
[0.17.1] — 2026-04-09 — Post-v0.17.0 dogfood hotfix
Fixes two bugs found during the v0.17.0 final dogfood audit when running
forgeplan tree and forgeplan health on the dogfood workspace itself.
PRD-043 detection (Sprint 13.1) correctly flagged the issues but two
upstream bugs prevented them from being auto-resolved.
- PROB-028: Phantom rows in `forgeplan tree` (PRD-044). `reindex` Phase 2 (orphan cleanup) previously skipped rows whose `kind` field failed to parse via `continue`, letting corrupt/empty kind rows escape trim forever. Additionally, orphan relations whose source or target artifact had been deleted accumulated in the relations table and surfaced as `?` phantoms in tree rendering.
  - Fix 1: `Err(_) => continue` changed to treat unparseable kind as a definite orphan (no valid kind means no valid directory means no possible file). Rows with corrupt kind now get trimmed along with normal orphans.
  - Fix 2: new Phase 3 in `reindex` trims orphan relations where source or target no longer exists in artifacts.
  - Output now reports removal reason: `corrupt kind field` vs `no .md file found` vs `orphan relation (source|target|both missing)`.
  - `reindex` output gains a new counter: “K removed, N orphan relations”
- PROB-029: `forgeplan health` verdict contradicted its own warnings (PRD-045). Sprint 13.1 added `active_stubs` and `possible_duplicates` detection (PRD-043) and wired them into the warning display, but the `generate_next_actions` summary function was never updated to read those signals. Result: a workspace with 8 stubs + 5 duplicate pairs printed “Project looks healthy” at the bottom.
  - Fix: `generate_next_actions` now takes `possible_duplicates` and `active_stubs` as parameters; compute order reshuffled so signals are available before the summary runs.
  - Next actions for stubs suggest `forgeplan supersede ID --by NEW` or `forgeplan deprecate ID --reason "abandoned"` with the concrete offending ID.
  - Next actions for duplicates suggest `forgeplan deprecate B --reason "duplicate of A"` with the concrete pair IDs.
  - “Project looks healthy” message only appears when genuinely no warnings of any category exist.
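The Phase 3 relation trim from PROB-028 can be sketched as a single pass over the relations table. The `Relation` struct and `trim_orphan_relations` function below are hypothetical names for illustration. Only the three removal reasons are taken from the entry above.

```rust
use std::collections::HashSet;

/// Hypothetical relation row: an edge between two artifact IDs.
#[derive(Debug, Clone, PartialEq)]
pub struct Relation {
    pub source: String,
    pub target: String,
}

/// Keeps relations whose endpoints both exist; reports a reason for each removal.
pub fn trim_orphan_relations(
    artifact_ids: &HashSet<String>,
    relations: Vec<Relation>,
) -> (Vec<Relation>, Vec<String>) {
    let mut kept = Vec::new();
    let mut removed = Vec::new();

    for rel in relations {
        let has_source = artifact_ids.contains(&rel.source);
        let has_target = artifact_ids.contains(&rel.target);
        let missing = match (has_source, has_target) {
            (true, true) => {
                kept.push(rel);
                continue;
            }
            (false, true) => "source",
            (true, false) => "target",
            (false, false) => "both",
        };
        removed.push(format!("orphan relation ({missing} missing)"));
    }
    (kept, removed)
}
```

The length of the returned reason list corresponds to the “N orphan relations” counter mentioned in the entry.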
Methodology (NOTE-044 checklist addition)
- Phase 1 Implementation gains new rule: “Every new CLI flag / command / config option ships with ALL of these docs (no feature lands without): clap `--help` text, CHANGELOG entry, CLAUDE.md workflow section if user-facing, `docs/methodology/` subsection if command-level.” Red flag: a PR adding a flag/command without touching clap help + CHANGELOG is incomplete and should block merge.
- 1131 tests pass (+3 from v0.17.0 — PRD-045 verdict aggregator tests)
- 0 warnings on both default and `--features semantic-search` builds
- Clippy strict (`-D warnings`) clean on Rust 1.94
- Dogfood verification: `forgeplan tree` on dogfood workspace no longer shows `?` phantoms; `forgeplan health` reports 3 concrete next actions instead of “looks healthy”
- PROB-028 (phantom rows reindex bug)
- PROB-029 (health verdict logic bug)
- PRD-044 (reindex trim orphans — closes PROB-028)
- PRD-045 (health verdict aggregator — closes PROB-029)
- NOTE-044 (sprint checklist framework, docs completeness rule added)
- NOTE-046 (dogfood cleanup task — duplicate EVID pairs, deferred)
- NOTE-047 (dogfood cleanup task — false-active stubs, deferred)
0.17.0 — 2026-04-08 — EPIC-003: Search, Discovery, Intelligence
First release of EPIC-003. Adds keyword + semantic search, brownfield discovery, scoring/routing intelligence, FPF rule surface, methodology integrity gates, and reusable sprint checklist framework.
Highlights
- 1109 tests passing (+280 from v0.16.0), zero failures, zero warnings on both default and `--features semantic-search` builds
- 7 PRDs shipped across 8 sprints (13.0 → 13.7 + post-closeout hotfix)
- FPF Knowledge Base gains semantic vector search via BGE-M3 embeddings
- Methodology integrity gates catch stub artifacts, duplicates, orphans
- Sprint checklist framework (NOTE-044) to prevent regression in future releases
Smart Search v2 — PRD-039, Sprint 13.2
- BM25 ranking replaces substring scoring in `forgeplan search`
- Composable filter DSL (`--status`, `--depth`, `--since`, `--with-evidence`)
- 1-hop graph neighbor expansion (opt-out via `--no-expand`)
- Extended MCP `search` tool parameters
Brownfield Discovery — PRD-035, Sprints 13.3 + 13.4
- Tags system in frontmatter + LanceDB schema (v3→v4 migration)
- `forgeplan tag` / `untag` commands + `list --tag key=value` filter
- SourceTier → Congruence Level mapping (T1→CL3, T2→CL2, T3→CL1)
- `forgeplan discover` CLI command (session state machine)
- MCP tools: `forgeplan_discover_start`, `_scan`, `_next`, `_status`
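The tier mapping above, combined with the `min(tier_cl, explicit_cl)` precedence noted under PROB-031 in 0.17.2, can be sketched as follows. The enum variant names and the function names are assumptions, as is the choice to fall back to the tier's level when no explicit value is present.

```rust
/// Hypothetical source tier enum; variant names are assumed.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SourceTier {
    T1,
    T2,
    T3,
}

/// Mapping from this changelog: T1→CL3, T2→CL2, T3→CL1.
pub fn tier_cl(tier: SourceTier) -> u8 {
    match tier {
        SourceTier::T1 => 3,
        SourceTier::T2 => 2,
        SourceTier::T3 => 1,
    }
}

/// An explicit congruence level can lower the tier's level but never raise it.
/// Falling back to the tier level when no explicit value exists is an assumption.
pub fn effective_cl(tier: SourceTier, explicit_cl: Option<u8>) -> u8 {
    let cap = tier_cl(tier);
    match explicit_cl {
        Some(cl) => cap.min(cl),
        None => cap,
    }
}
```

Taking the minimum means a T3 source that declares `congruence_level: 3` still resolves to CL1, which is the trust-amplification case the precedence rule is meant to close.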
Scoring & Routing Intelligence — PRD-040, Sprint 13.5
- Routing Skills Memory with exponential decay (90-day half-life)
- R_eff confidence intervals heuristic (widens with sparse/stale evidence)
- `forgeplan score` displays `[low — high]` interval alongside point estimate
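A 90-day half-life means an observation's weight halves every 90 days, i.e. w(t) = 0.5^(t / 90) for an age of t days. The sketch below shows only that arithmetic. The function names are hypothetical, and how Forgeplan aggregates the weights is not stated here, so the plain sum is an assumption.

```rust
/// Half-life from this changelog entry, in days.
pub const HALF_LIFE_DAYS: f64 = 90.0;

/// Weight of an observation that is `age_days` old. Negative ages clamp to 0.
pub fn decay_weight(age_days: f64) -> f64 {
    0.5_f64.powf(age_days.max(0.0) / HALF_LIFE_DAYS)
}

/// Assumed aggregation: sum of decayed weights over a set of observation ages.
pub fn decayed_total(ages_days: &[f64]) -> f64 {
    ages_days.iter().map(|&age| decay_weight(age)).sum()
}
```

An observation from today counts fully, one from 90 days ago counts half, and one from 180 days ago counts a quarter.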
FPF Rules Surface — PRD-041, Sprint 13.6
- `forgeplan fpf rules`: action-grouped tree (EXPLORE/INVESTIGATE/EXPLOIT) with `--flat` and `--json` modes
- `forgeplan fpf check <id>`: per-artifact rule match introspection with `--verbose` (unmatched list) and `--json` (canonical shape)
- MCP tools: `forgeplan_fpf_rules` (with `action`/`name`/`summary`/`source` filters) and `forgeplan_fpf_check`
FPF KB Vector Search — PRD-042, Sprint 13.7 (supersedes PRD-018)
- `embedding` column (`FixedSizeList<Float32, 1024>`) added to `fpf_spec` table, backward-compatible migration via `NewColumnTransform::AllNulls`
- `LanceStore::search_fpf_by_vector(query_vec, limit)` using LanceDB native `vector_search` with `DistanceType::Cosine`
- `forgeplan fpf search <query> --semantic` CLI flag
- MCP `forgeplan_fpf_search` gains `semantic: Option<bool>` param
- Two-layer graceful fallback: compile-time (feature off) + runtime (Embedder init fail / encode fail / vector search fail) → warning + keyword fallback
- NaN/Inf rejection at `insert_fpf_chunks` boundary
- Runtime `Embedder::dim() == EMBEDDING_DIM` assertion
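The last two items above are boundary checks on embedding vectors. A minimal sketch of both checks in one place follows. `validate_embedding` is a hypothetical helper, and the error strings are invented for illustration. Only the 1024 dimension and the NaN/Inf rejection come from this changelog.

```rust
/// Embedding width from this changelog (BGE-M3, 1024 floats).
pub const EMBEDDING_DIM: usize = 1024;

/// Hypothetical boundary check: reject wrong-sized or non-finite vectors
/// before they are written to the store.
pub fn validate_embedding(vector: &[f32]) -> Result<(), String> {
    if vector.len() != EMBEDDING_DIM {
        return Err(format!(
            "expected {} dimensions, got {}",
            EMBEDDING_DIM,
            vector.len()
        ));
    }
    // `is_finite` is false for NaN, +Inf, and -Inf.
    if let Some(index) = vector.iter().position(|v| !v.is_finite()) {
        return Err(format!("non-finite value at index {index}"));
    }
    Ok(())
}
```

Rejecting at insert time keeps a single bad vector from poisoning cosine-distance results later, where NaN comparisons would be much harder to trace.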
Methodology Integrity — PRD-043, Sprint 13.1
- Duplicate guard (`forgeplan new` detects existing similar artifacts)
- Stub detection (blocks `activate` on unfilled templates)
- Health detection (`forgeplan health --ci` exits non-zero on blind spots)
- MCP warning envelope for methodology violations
- State machine: `Phase` enum with `validate_transition` enforcing Idle → Routing → Shaping → Coding → Evidence → PR for Standard+ depth
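The phase ordering above can be expressed as a small transition table. This is a simplified stand-in, not the project's `validate_transition`: the real signature, any depth-dependent rules, and the variant spelling `Pr` are unknown here and assumed. The error text follows the message quoted under PROB-033 in 0.17.2.

```rust
/// Session phases in the order listed in this changelog.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Phase {
    Idle,
    Routing,
    Shaping,
    Coding,
    Evidence,
    Pr,
}

impl Phase {
    /// Lowercase label used in error messages (assumed format).
    fn label(self) -> &'static str {
        match self {
            Phase::Idle => "idle",
            Phase::Routing => "routing",
            Phase::Shaping => "shaping",
            Phase::Coding => "coding",
            Phase::Evidence => "evidence",
            Phase::Pr => "pr",
        }
    }
}

/// Simplified sketch: only the next phase in the linear chain is allowed.
pub fn validate_transition(from: Phase, to: Phase) -> Result<(), String> {
    use Phase::*;
    let allowed = matches!(
        (from, to),
        (Idle, Routing) | (Routing, Shaping) | (Shaping, Coding) | (Coding, Evidence) | (Evidence, Pr)
    );
    if allowed {
        Ok(())
    } else {
        Err(format!("Cannot go from '{}' to '{}'", from.label(), to.label()))
    }
}
```

Under this table a jump from Routing straight to Evidence is rejected, which is the transition that produced the warning fixed in 0.17.2.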
Sprint Checklist Framework — NOTE-044 (post-closeout deliverable)
- Reusable quality gate for every future sprint, 7 phases with red flags
- Encodes lessons from Sprint 13.7 retrospective
- Explicit “what not to skip” checklist for planning / implementation / audit / fixer / re-audit / manual UX / closeout / meta phases
Changed
- FPF KB schema: backward-compatible migration adds `embedding` column (nullable). Existing workspaces work unchanged; re-ingest to populate embeddings.
- MCP tool registry expanded from ~37 to ~47 tools
- CI linter: `forgeplan health --ci` + `validate --ci` land (Sprint 11.3)
- FpfStorage trait extended: `insert_fpf_chunks` now accepts optional embeddings; `search_fpf_by_vector` added to trait (no default impl, forcing explicit backend choice per Sprint 13.7 hotfix re-audit)
- CLI `fpf search` input validation: empty / oversized (>8192 chars) queries rejected before store access
- MCP param length bounds on `forgeplan_fpf_search` and `forgeplan_fpf_rules` (id ≤128, name ≤128, action ≤64, source ≤16)
- ANSI strip on user-supplied query echoed in error messages (`No FPF sections match '{}'` sanitized against control chars)
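The query validation and echo sanitization items above can be sketched as two small helpers. Both function names are hypothetical, trimming whitespace before the emptiness check is an assumption, and the escape handling covers only CSI-style ANSI sequences plus bare control characters.

```rust
/// Upper bound on query length from this changelog, counted in characters.
pub const MAX_QUERY_CHARS: usize = 8192;

/// Hypothetical pre-store check: reject empty and oversized queries.
pub fn validate_query(query: &str) -> Result<&str, String> {
    let trimmed = query.trim();
    if trimmed.is_empty() {
        return Err("query is empty".to_string());
    }
    if trimmed.chars().count() > MAX_QUERY_CHARS {
        return Err(format!("query exceeds {} characters", MAX_QUERY_CHARS));
    }
    Ok(trimmed)
}

/// Hypothetical sanitizer for text echoed back in error messages.
pub fn sanitize_for_echo(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    let mut chars = input.chars().peekable();
    while let Some(c) = chars.next() {
        if c == '\u{1b}' {
            // CSI sequence: ESC '[' then parameters, ended by a byte in '@'..='~'.
            if chars.peek() == Some(&'[') {
                chars.next();
                for next in chars.by_ref() {
                    if ('@'..='~').contains(&next) {
                        break;
                    }
                }
            }
            continue;
        }
        if !c.is_control() {
            out.push(c);
        }
    }
    out
}
```

Sanitizing before formatting the message means a query such as a red-colored `auth` is echoed as plain `auth`, so terminal output cannot be restyled or cleared by user input.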
Deprecated / Superseded
- PRD-018 “FPF Knowledge Base: semantic search” is superseded by PRD-042. PRD-018 was a false-active stub with R_eff=1.0 but no real implementation, flagged by Sprint 13.1 methodology integrity work. PRD-042 closes the gap with actual BGE-M3 integration + supersedes PRD-018 to terminal state.
- Sprint 13.1.5 hardening: `LazyLock` for `check_stub`, typed `StubReport` return, `forgeplan import` gate for active stubs (security bypass closed), configurable `IntegrityConfig` MCP limits
- Sprint 13.1.7 integrity config wiring: `IntegrityConfig::validate()` now called by CLI command path; `forgeplan health` no longer crashes on minimal configs via `#[serde(default)]` on top-level `Config` fields
- Sprint 13.6 FPF Rules canonical JSON: CLI and MCP now emit identical `{artifact_id, kind, status, matched, unmatched, winning, summary}` shape via typed `RuleCheckResult`, replacing hand-rolled `serde_json::json!`
- Sprint 13.7 post-closeout hotfix (PR #156):
  - `FpfStorage::search_fpf_by_vector` added to trait (closes asymmetry)
  - MCP handler integration harness at `crates/forgeplan-mcp/tests/`
  - Real BGE-M3 end-to-end test (`#[ignore]`, feature-gated)
  - Real v3 workspace migration test
  - Runtime dim assert + `fpf_spec_schema` rustdoc tying 1024 → BGE-M3
  - `InMemoryStore::search_fpf_by_vector` returns `Err` (not silent empty)
  - Wave 2 completer work re-audited (was originally skipped)
- 1109 tests passing (+280 from v0.16.0)
- Core crate: 897 tests; CLI: 99 + 40 integration; MCP: 15 unit + 7 handler
- 42 MB release binary (strip + lto + opt-level=z)
- ~56 CLI commands, ~47 MCP tools
- 7 new PRDs activated, 1 superseded (PRD-018 → PRD-042)
- Sprint retrospective: 19 debts found, 11 fixed in hotfix, 8 backlog (NOTE-045), 6 process lessons (NOTE-044)
Methodology lessons captured
- Dependent sprint branch base verification: new CLAUDE.md section covering the Sprint 13.1.5 rebase hell that taught us to verify parent branches contain expected commits before spawning teammates
- Sprint Checklist Framework (NOTE-044): reusable 7-phase gate to prevent planning gaps (was: “user had to ask ‘what did we miss’”)
- Sprint 13.7 Deferred Debts (NOTE-045): backlog tracking for the 8 non-blocking items that rolled forward from the retrospective
Related PRs
PRs #141 → #156. See `git log main..release/v0.17.0` for full list.