Skip to content

forgeplan score

forgeplan score computes the R_eff (effective reliability) of a decision artifact based on the EvidencePacks linked to it. R_eff follows the weakest-link rule from Quint-code: R_eff = min(evidence_scores)never an average. One weak piece of evidence sinks the whole decision, which is the whole point: a PRD backed by one strong benchmark and one refuted test is still a risky PRD.

Each evidence contribution is shaped by its congruence_level (CL0..CL3), verdict (supports / weakens / refuses), and valid_until (decay applies when expired). Without any linked evidence, R_eff = 0.0 and forgeplan health will flag the artifact as a blind spot.

  • Right after linking a new EvidencePack (forgeplan link EVID-012 PRD-001 --relation informs).
  • Before forgeplan activate — if R_eff is still 0, you are activating a promise without proof.
  • In bulk with --all after a sprint to refresh cached R_eff on every active decision.
  • As a debugging tool: if health flags a blind spot, score --json shows which evidence is dragging the min down.
  • On Notes, Problems, or Epics — they are not decisions and carry no R_eff.
  • Before you have any evidence — the answer will always be 0.0.
forgeplan score [OPTIONS] [ID]
[ID] Artifact ID (omit with --all to score everything)
--all Score all active decision artifacts and update cached R_eff
--json Output as JSON for machine consumption
-h, --help Print help
-V, --version Print version
Terminal window
forgeplan score PRD-001

Typical output:

PRD-001 — Auth System
Evidence contributions:
EVID-012 supports CL3 valid → 1.00
EVID-015 supports CL2 valid → 0.90
EVID-018 weakens CL1 valid → 0.40
R_eff = 0.40 (weakest link: EVID-018)
DerivedStatus: COMPARED

The weakest link (EVID-018) is pulling R_eff down. Either strengthen that evidence, refute it, or replace it.

Terminal window
forgeplan score --all

Used at sprint end to update cached R_eff across the workspace. forgeplan health reads these cached values.

Terminal window
forgeplan score PRD-001 --json | jq '.r_eff, .weakest_link'
R_eff rangeDerivedStatusWhat it means
0.00UNDERFRAMEDNo evidence linked — blind spot
0.01–0.39FRAMEDWeak evidence or refuting signals dominate
0.40–0.69EXPLORINGSome support, some doubt
0.70–0.89COMPAREDStrong support, at least one caveat
0.90–1.00DECIDED/APPLIEDHigh-trust, ready to build on

Congruence Level (CL) penalties: CL3=0.0, CL2=0.1, CL1=0.4, CL0=0.9. Evidence from the wrong context (CL0) is treated as almost-missing. Expired valid_until caps the contribution at 0.1.

R_eff confidence intervals (v0.17+, PRD-040)

Section titled “R_eff confidence intervals (v0.17+, PRD-040)”

As of v0.17.0 (PRD-040, Scoring Intelligence) forgeplan score reports a confidence interval alongside the point estimate. The interval widens when evidence is sparse, stale, or concentrated on a single EvidencePack, and narrows as more independent evidence is linked.

Old format (v0.16 and earlier):

PRD-001 — Auth System
R_eff = 0.80

New format (v0.17+):

PRD-001 — Auth System
R_eff = 0.80 [0.65 — 0.92]

The bracketed range is a lower / upper bound. Read it as: “point estimate 0.80, but with the evidence we currently have, the true reliability could plausibly be anywhere from 0.65 to 0.92.” Use the interval, not the point, when deciding whether a decision is safe to ship.

A point R_eff of 0.80 looks the same whether it is backed by one benchmark at CL3 or five benchmarks at CL3. But one-piece evidence is brittle — if that single EvidencePack turns out to be wrong, your R_eff collapses. The confidence interval exposes that brittleness:

SituationR_effInterval
1 supports / CL3 evidence0.80[0.40 — 0.92] wide
5 supports / CL3, 1 weakens / CL20.80[0.72 — 0.88] narrow
1 supports / CL3 with valid_until expiring0.80[0.30 — 0.90] wide (decay)

A wide interval means “R_eff looks fine but you are one surprise away from falling off a cliff — add more evidence.” A narrow interval means “R_eff is stable, multiple independent proofs agree.”

The interval is computed as a heuristic over evidence count, CL distribution, verdict mix, and valid_until proximity — it is not a formal statistical confidence interval, and you should not treat it as one.

code → new evidence → link → score → review → activate
└── R_eff must be > 0 to pass review gate

score is cheap, call it often. The cached R_eff feeds forgeplan health, forgeplan decay, and the project dashboard.