Skip to content

Changelog

All notable changes to Forgeplan are documented here. Format loosely follows Keep a Changelog. Semver is MAJOR.MINOR.PATCH with pre-1.0 minor bumps for breaking changes.

The canonical source is CHANGELOG.md in the repository. This page is generated from it at build time via scripts/copy-changelog.mjs.


All notable changes to Forgeplan are documented here. Format loosely follows Keep a Changelog. Semver: MAJOR.MINOR.PATCH with pre-1.0 minor bumps for breaking changes.

This file starts at v0.17.0. For prior releases, see git tags and the corresponding sprint evidence under .forgeplan/evidence/.

[0.18.0] — 2026-04-11 — Production BM25 + Russian morphology + quality gates

Section titled “[0.18.0] — 2026-04-11 — Production BM25 + Russian morphology + quality gates”

Feature release upgrading the search engine and codifying quality rules.

  • Production BM25 engine (bm25 crate v2.3.2). Replaces 140 LOC hand-written BM25 with production-quality implementation including stemming, stop-word removal, and unicode segmentation.
  • Russian morphology support. LanguageMode::Detect with whichlang auto-selects Snowball stemmer per document/query. “аутентификация” now matches “аутентификации” via shared stem. 17 languages supported.
  • Template noise stripping. strip_indexing_noise() removes YAML frontmatter, template placeholder lines {...}, markdown table rows |...|, and HTML comments from BM25 index. Fixes false positives where forgeplan search "auth" matched unrelated PRDs via author: in frontmatter.
  • O(N) batch search. Single-pass search_scores() replaces O(N²) per-record .score() calls. 193-artifact corpus: 0.23s.
  • 8-point verification checklist in CLAUDE.md — mandatory before every commit/PR. Covers: unit tests, edge cases, E2E on fresh workspace, verbatim template test, dogfood stress test, regression guard (A/B), negative tests, cross-language verification.
  • Health debt resolved: 8 active stubs deprecated/superseded, 5 duplicate EVID pairs deprecated, 3 orphan NOTEs linked. Health dashboard reports “Project looks healthy!” with zero warnings.
  • 1150 tests pass (+19 from v0.17.2 baseline 1131).
  • New regression tests: Russian morphology (2), English stemming (1), plural forms (1), stop-word resilience (1), noise stripping (2), frontmatter false-positive guard (1).

[0.17.2] — 2026-04-09 — Quality hotfix: scoring & search integrity

Section titled “[0.17.2] — 2026-04-09 — Quality hotfix: scoring & search integrity”

Fixes five real bugs found during a dedicated /forge E2E verification sprint on a fresh workspace (separate from the dogfood audit that produced v0.17.1). Each bug was reproduced on the v0.17.1 release binary before fixing, and the fix verified A/B on an identical workspace.

The headline fix is PROB-034 (CRITICAL) — a silent trust-calculus regression present since v0.17.0 that inflated R_eff scores across every workspace using the default evidence template.

  • PROB-034 (CRITICAL) — Multi-line HTML comments shadowed real structured fields in extract_field. crates/forgeplan-core/src/scoring/evidence.rs::extract_field skipped only lines literally starting with <!--, not lines inside a multi-line comment block. The evidence template ships with a help comment:

    <!--
    verdict: supports | weakens | refutes
    congruence_level: 0 | 1 | 2 | 3 (CL3=same context, CL0=opposed)
    -->

    The placeholder line congruence_level: 0 | 1 | 2 | 3 (CL3=...) does not start with <!--, so the parser matched it, parse::<u8>() failed on the non-numeric string, explicit_cl became None, and the real congruence_level: X in the Structured Fields section below was never inspected. Every evidence artifact ever created via the default template silently reset to CL3 (no penalty), artificially inflating R_eff across every workspace since v0.17.0.

    • Fix: extract_field now implements a proper multi-line comment state machine — tracks an in_multiline_comment boolean, skipping all lines between <!-- and --> when they span multiple lines.
    • Affects all fields parsed via extract_field: verdict, congruence_level, evidence_type, source_tier — all were silently defaulted. The fix is transitive.
    • A/B verification on /tmp/fp-prob034-repro with identical workspace: v0.17.1 binary → r_eff=1.0000, CL=3; v0.17.2 binary → r_eff=0.1000, CL=0 (correct for explicit CL0 evidence).
    • Regression tests: extract_field_ignores_multiline_html_comments, extract_field_multiline_comment_nested_fields_all_ignored.
  • PROB-030 — BM25 prefix queries returned 0 results. crates/forgeplan-core/src/search/smart.rs computed keyword_score (substring match) for diagnostics but passed only bm25_norm to combined_score. BM25 is token-based, so auth did not match the token authentication, and prefix queries silently returned nothing.

    • Fix: let keyword_channel = bm25_norm.max(kw); — BM25 still wins on exact-token matches (richer signal), but substring fallback kicks in when BM25 returns 0 for prefix queries.
    • Regression tests: smart_search_prefix_query_falls_back_to_substring, smart_search_exact_token_still_wins_over_prefix.
  • PROB-031 — CLI score command had its own divergent evidence parser. The CLI parse_evidence_from_record in score.rs duplicated core’s function but with a different default CL (CL0 vs CL3), creating a visible contradiction: display said CL0 = 0.1 while the r_eff_recursive rollup computed 1.00 via core’s parser. The local CLI parser also did NOT implement the PRD-035 Sprint 13.3 H2 security precedence (min(tier_cl, explicit_cl)), opening a trust-amplification attack surface on the display path.

    • Fix: deleted the local duplicate and extract_field helper; imported forgeplan_core::scoring::evidence::parse_evidence_from_record. Display and rollup now read identical values by construction.
    • Regression test: score_uses_core_parser_with_cl3_default_when_no_structured_fields.
  • PROB-032 — forgeplan search breakdown line lied about components. Display showed kw=0.0 sem=0.0 r=0.0 g=0.0 while total was 0.57. Caused by PROB-030: kw was computed but never flowed into combined_score.

    • Auto-fixed as side effect of PROB-030. Breakdown now shows real component values.
  • PROB-033 — forgeplan new evidence printed confusing session warning after forgeplan route. The session state machine attempted a Routing → Evidence transition, which is disallowed. The file WAS created, but stderr showed Session: Cannot go from 'routing' to 'evidence' — blocking legitimate backfill, audit, brownfield, and evidence-import flows in perception if not in fact.

    • Fix: forgeplan new evidence is now phase-agnostic — it never drives the session state machine. Only decision artifacts (prd/rfc/adr/epic/spec) advance to Shaping phase. Methodology guardrail still enforces at activate time via PRD-043 stub detection + validation gates.
    • Regression test: new_evidence_works_in_routing_phase_without_session_warning.
  • 1137 tests pass (+6 from v0.17.1 baseline 1131).
  • 6 new regression tests cover PROB-030 (2), PROB-031 (1), PROB-033 (1), PROB-034 (2).
  • cargo fmt --check clean, cargo clippy --workspace --all-targets -- -D warnings clean on both default and semantic-search feature.

If you are upgrading from v0.17.0 or v0.17.1 and you have evidence artifacts in your workspace, your R_eff scores were potentially inflated by the CL3 default (PROB-034). Re-run forgeplan score on critical PRDs after upgrade — any evidence that explicitly set congruence_level in Structured Fields will now be honored, and weak CL values may cause R_eff to drop. This is correct behavior; the previous values were silently wrong.

[0.17.1] — 2026-04-09 — Post-v0.17.0 dogfood hotfix

Section titled “[0.17.1] — 2026-04-09 — Post-v0.17.0 dogfood hotfix”

Fixes two bugs found during the v0.17.0 final dogfood audit when running forgeplan tree and forgeplan health on the dogfood workspace itself. PRD-043 detection (Sprint 13.1) correctly flagged the issues but two upstream bugs prevented them from being auto-resolved.

  • PROB-028 — Phantom rows in forgeplan tree (PRD-044). reindex Phase 2 (orphan cleanup) previously skipped rows whose kind field failed to parse via continue, letting corrupt/empty kind rows escape trim forever. Additionally, orphan relations whose source or target artifact had been deleted accumulated in the relations table and surfaced as ? phantoms in tree rendering.

    • Fix 1: Err(_) => continue changed to treat unparseable kind as a definite orphan (no valid kind means no valid directory means no possible file). Rows with corrupt kind now get trimmed along with normal orphans.
    • Fix 2: new Phase 3 in reindex trims orphan relations where source or target no longer exists in artifacts.
    • Output now reports removal reason: corrupt kind field vs no .md file found vs orphan relation (source|target|both missing).
    • reindex output gains a new counter: “K removed, N orphan relations”
  • PROB-029 — forgeplan health verdict contradicted its own warnings (PRD-045). Sprint 13.1 added active_stubs and possible_duplicates detection (PRD-043) and wired them into the warning display, but the generate_next_actions summary function was never updated to read those signals. Result: workspace with 8 stubs + 5 duplicate pairs printed “Project looks healthy” at the bottom.

    • Fix: generate_next_actions now takes possible_duplicates and active_stubs as parameters; compute order reshuffled so signals are available before the summary runs.
    • Next actions for stubs suggest forgeplan supersede ID --by NEW or forgeplan deprecate ID --reason "abandoned" with the concrete offending ID.
    • Next actions for duplicates suggest forgeplan deprecate B --reason "duplicate of A" with the concrete pair IDs.
    • “Project looks healthy” message only appears when genuinely no warnings of any category exist.
  • Phase 1 Implementation gains new rule: “Every new CLI flag / command / config option ships with ALL of these docs (no feature lands without): clap --help text, CHANGELOG entry, CLAUDE.md workflow section if user-facing, docs/methodology/ subsection if command-level.” Red flag: a PR adding a flag/command without touching clap help + CHANGELOG is incomplete — block merge.
  • 1131 tests pass (+3 from v0.17.0 — PRD-045 verdict aggregator tests)
  • 0 warnings on both default and --features semantic-search builds
  • Clippy strict (-D warnings) clean on Rust 1.94
  • Dogfood verification: forgeplan tree on dogfood workspace no longer shows ? phantoms; forgeplan health reports 3 concrete next actions instead of “looks healthy”
  • PROB-028 (phantom rows reindex bug)
  • PROB-029 (health verdict logic bug)
  • PRD-044 (reindex trim orphans — closes PROB-028)
  • PRD-045 (health verdict aggregator — closes PROB-029)
  • NOTE-044 (sprint checklist framework, docs completeness rule added)
  • NOTE-046 (dogfood cleanup task — duplicate EVID pairs, deferred)
  • NOTE-047 (dogfood cleanup task — false-active stubs, deferred)

First release of EPIC-003. Adds keyword + semantic search, brownfield discovery, scoring/routing intelligence, FPF rule surface, methodology integrity gates, and reusable sprint checklist framework.

  • 1109 tests passing (+280 from v0.16.0), zero failures, zero warnings on both default and --features semantic-search builds
  • 7 PRDs shipped across 8 sprints (13.0 → 13.7 + post-closeout hotfix)
  • FPF Knowledge Base gains semantic vector search via BGE-M3 embeddings
  • Methodology integrity gates catch stub artifacts, duplicates, orphans
  • Sprint checklist framework (NOTE-044) to prevent regression in future releases

Smart Search v2 — PRD-039, Sprint 13.2

  • BM25 ranking replaces substring scoring in forgeplan search
  • Composable filter DSL (--status, --depth, --since, --with-evidence)
  • 1-hop graph neighbor expansion (opt-out via --no-expand)
  • Extended MCP search tool parameters

Brownfield Discovery — PRD-035, Sprints 13.3 + 13.4

  • Tags system in frontmatter + LanceDB schema (v3→v4 migration)
  • forgeplan tag / untag commands + list --tag key=value filter
  • SourceTier → Congruence Level mapping (T1→CL3, T2→CL2, T3→CL1)
  • forgeplan discover CLI command (session state machine)
  • MCP tools: forgeplan_discover_start, _scan, _next, _status

Scoring & Routing Intelligence — PRD-040, Sprint 13.5

  • Routing Skills Memory with exponential decay (90-day half-life)
  • R_eff confidence intervals heuristic (widens with sparse/stale evidence)
  • forgeplan score displays [low — high] interval alongside point estimate

FPF Rules Surface — PRD-041, Sprint 13.6

  • forgeplan fpf rules — action-grouped tree (EXPLORE/INVESTIGATE/EXPLOIT) with --flat and --json modes
  • forgeplan fpf check <id> — per-artifact rule match introspection with --verbose (unmatched list) and --json (canonical shape)
  • MCP tools: forgeplan_fpf_rules (with action/name/summary/source filters) and forgeplan_fpf_check

FPF KB Vector Search — PRD-042, Sprint 13.7 (supersedes PRD-018)

  • embedding column (FixedSizeList<Float32, 1024>) added to fpf_spec table, backward-compatible migration via NewColumnTransform::AllNulls
  • LanceStore::search_fpf_by_vector(query_vec, limit) using LanceDB native vector_search with DistanceType::Cosine
  • forgeplan fpf search <query> --semantic CLI flag
  • MCP forgeplan_fpf_search gains semantic: Option<bool> param
  • Two-layer graceful fallback — compile-time (feature off) + runtime (Embedder init fail / encode fail / vector search fail) → warning + keyword fallback
  • NaN/Inf rejection at insert_fpf_chunks boundary
  • Runtime Embedder::dim() == EMBEDDING_DIM assertion

Methodology Integrity — PRD-043, Sprint 13.1

  • Duplicate guard (forgeplan new detects existing similar artifacts)
  • Stub detection (blocks activate on unfilled templates)
  • Health detection (forgeplan health --ci exits non-zero on blind spots)
  • MCP warning envelope for methodology violations
  • State machine: Phase enum with validate_transition enforcing Idle → Routing → Shaping → Coding → Evidence → PR for Standard+ depth

Sprint Checklist Framework — NOTE-044 (post-closeout deliverable)

  • Reusable quality gate for every future sprint, 7 phases with red flags
  • Encodes lessons from Sprint 13.7 retrospective
  • Explicit “what not to skip” checklist for planning / implementation / audit / fixer / re-audit / manual UX / closeout / meta phases
  • FPF KB schema: backward-compatible migration adds embedding column (nullable). Existing workspaces work unchanged; re-ingest to populate embeddings.
  • MCP tool registry expanded from ~37 to ~47 tools
  • CI linter: forgeplan health --ci + validate --ci land (Sprint 11.3)
  • FpfStorage trait extendedinsert_fpf_chunks now accepts optional embeddings; search_fpf_by_vector added to trait (no default impl, forcing explicit backend choice per Sprint 13.7 hotfix re-audit)
  • CLI fpf search input validation — empty / oversized (>8192 chars) queries rejected before store access
  • MCP param length bounds on forgeplan_fpf_search and forgeplan_fpf_rules (id ≤128, name ≤128, action ≤64, source ≤16)
  • ANSI strip on user-supplied query echoed in error messages (No FPF sections match '{}' sanitized against control chars)
  • PRD-018 “FPF Knowledge Base — semantic search” — superseded by PRD-042. PRD-018 was a false-active stub with R_eff=1.0 but no real implementation, flagged by Sprint 13.1 methodology integrity work. PRD-042 closes the gap with actual BGE-M3 integration + supersedes PRD-018 to terminal state.
  • Sprint 13.1.5 hardening: LazyLock for check_stub, typed StubReport return, forgeplan import gate for active stubs (security bypass closed), configurable IntegrityConfig MCP limits
  • Sprint 13.1.7 integrity config wiring: IntegrityConfig::validate() now called by CLI command path; forgeplan health no longer crashes on minimal configs via #[serde(default)] on top-level Config fields
  • Sprint 13.6 FPF Rules canonical JSON: CLI and MCP now emit identical {artifact_id, kind, status, matched, unmatched, winning, summary} shape via typed RuleCheckResult, replacing hand-rolled serde_json::json!
  • Sprint 13.7 post-closeout hotfix (PR #156):
    • FpfStorage::search_fpf_by_vector added to trait (closes asymmetry)
    • MCP handler integration harness at crates/forgeplan-mcp/tests/
    • Real BGE-M3 end-to-end test (#[ignore], feature-gated)
    • Real v3 workspace migration test
    • Runtime dim assert + fpf_spec_schema rustdoc tying 1024 → BGE-M3
    • InMemoryStore::search_fpf_by_vector returns Err (not silent empty)
    • Wave 2 completer work re-audited (was originally skipped)
  • 1109 tests passing (+280 from v0.16.0)
  • Core crate: 897 tests; CLI: 99 + 40 integration; MCP: 15 unit + 7 handler
  • 42 MB release binary (strip + lto + opt-level=z)
  • ~56 CLI commands, ~47 MCP tools
  • 7 new PRDs activated, 1 superseded (PRD-018 → PRD-042)
  • Sprint retrospective: 19 debts found, 11 fixed in hotfix, 8 backlog (NOTE-045), 6 process lessons (NOTE-044)
  • Dependent sprint branch base verification — new CLAUDE.md section covering the Sprint 13.1.5 rebase hell that taught us to verify parent branches contain expected commits before spawning teammates
  • Sprint Checklist Framework (NOTE-044) — reusable 7-phase gate to prevent planning gaps (was: “user had to ask ‘what did we miss’”)
  • Sprint 13.7 Deferred Debts (NOTE-045) — backlog tracking for the 8 non-blocking items that rolled forward from the retrospective

PRs #141 → #156. See git log main..release/v0.17.0 for full list.