forgeplan calibrate-estimate

forgeplan calibrate-estimate closes the loop on forgeplan estimate. You pass the actual hours you spent on an artifact, and it reports how far the estimate was off — per grade, per FR, and overall. Over time this feeds back into your grade_profile config so estimates get sharper.

Without calibration, estimates drift silently: you think you’re “senior on backend”, but your actual hours consistently run 50% over the estimate. This command surfaces that gap so you can adjust the multiplier, not just your intuition.

When to use:

  • End of sprint: run on every PRD/RFC you closed to build a calibration dataset.
  • After a surprise (big over- or under-run): get immediate feedback while context is fresh.
  • Quarterly review: aggregate --json output to tune estimate.grade_profile in config.
  • Benchmarking a new grade profile: compare actual hours against estimates under different grade assumptions.

When not to use:

  • Mid-sprint on an unfinished artifact: actuals must be final.
  • On artifacts without FR/Phase items: the original estimate was a guess, not a model.

Usage:

forgeplan calibrate-estimate [OPTIONS] --actual-hours <ACTUAL_HOURS> <ID>

Arguments:

  <ID>  Artifact ID to calibrate

Options:

  --actual-hours <ACTUAL_HOURS>  Actual hours spent
  --grade <GRADE>                Grade to compare (junior, mid, senior). Defaults to total score
  -h, --help                     Print help
  -V, --version                  Print version

forgeplan calibrate-estimate PRD-001 --actual-hours 18

Output:

PRD-001 — Auth System
estimated (senior): 13.5h
actual: 18.0h
drift: +33.3% (over)
verdict: estimate undershot — adjust senior backend multiplier +15%
forgeplan calibrate-estimate PRD-001 --actual-hours 18 --grade middle

Output:

estimated (middle): 22.0h
actual: 18.0h
drift: -18.2% (under)
verdict: you performed between middle and senior on this one
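
The drift figures in both sample outputs are consistent with a signed percentage of the estimate: (actual - estimated) / estimated. The sketch below reproduces them under that assumption; the function name `drift_percent` is illustrative and not part of forgeplan, and the CLI may round or compute the value differently.

```python
def drift_percent(estimated_hours: float, actual_hours: float) -> float:
    """Signed drift of actual vs. estimate, as a percentage of the estimate.

    Positive means the work took longer than estimated ("over"),
    negative means it took less ("under").
    """
    if estimated_hours <= 0:
        raise ValueError("estimated_hours must be positive")
    return (actual_hours - estimated_hours) / estimated_hours * 100.0


# Values taken from the two sample outputs for PRD-001.
print(f"senior: {drift_percent(13.5, 18.0):+.1f}%")  # senior: +33.3%
print(f"middle: {drift_percent(22.0, 18.0):+.1f}%")  # middle: -18.2%
```

The same 18 actual hours read as an overrun against the senior estimate and an underrun against the middle one, which is why the verdict places the work between the two grades.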
for id in PRD-001 PRD-002 PRD-003; do
  forgeplan calibrate-estimate "$id" --actual-hours "$(cat "actuals/$id.txt")"
done

Pipe results to a script to compute average drift per grade/domain and propose new multipliers for .forgeplan/config.yaml.
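
A minimal sketch of such a script, assuming the plain-text output format shown in the examples above (a `drift: +33.3% (over)` line per artifact). The names `parse_drifts` and `propose_multiplier` are hypothetical, the third sample value is made up for illustration, and the proposal rule (scale the current multiplier by the mean drift) is a naive assumption rather than forgeplan's own formula.

```python
import re
import statistics

# Matches lines such as "drift: +33.3% (over)" from the sample output.
DRIFT_LINE = re.compile(r"^\s*drift:\s*([+-]?\d+(?:\.\d+)?)%")


def parse_drifts(text: str) -> list[float]:
    """Collect every drift percentage found in concatenated command output."""
    drifts = []
    for line in text.splitlines():
        match = DRIFT_LINE.match(line)
        if match:
            drifts.append(float(match.group(1)))
    return drifts


def propose_multiplier(current: float, mean_drift_pct: float) -> float:
    """Naive proposal (an assumption): scale the multiplier by the mean drift."""
    return current * (1.0 + mean_drift_pct / 100.0)


# Illustrative input: the first two lines come from the examples above,
# the third is invented to show averaging over several artifacts.
SAMPLE = """\
drift: +33.3% (over)
drift: -18.2% (under)
drift: +20.0% (over)
"""

drifts = parse_drifts(SAMPLE)
mean = statistics.fmean(drifts)
print(f"artifacts: {len(drifts)}, mean drift: {mean:+.1f}%")
print(f"proposed multiplier (from 1.00): {propose_multiplier(1.0, mean):.2f}")
```

To run it against the loop, replace `SAMPLE` with `sys.stdin.read()` and pipe the loop's output into the script. Grouping by grade or domain would need that information from somewhere else, since the text output shown here does not include a domain.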

Drift range     Interpretation
within ±15%     estimate is accurate, no tuning needed
+15% … +40%     under-estimated, increase grade multiplier
> +40%          depth was wrong, not estimate; escalate
-15% … -40%     over-estimated, decrease multiplier
< -40%          scope was cut or estimate inflated
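
The bands above can be expressed as a small lookup, which is handy when post-processing many results. This is a sketch, not forgeplan code: `interpret_drift` is a hypothetical helper, and the handling of the exact boundaries (±15% counts as accurate, ±40% falls in the middle band) is an assumption, since the table does not say which side they belong to.

```python
def interpret_drift(drift_pct: float) -> str:
    """Map a signed drift percentage to the interpretation bands in the table."""
    if abs(drift_pct) <= 15:
        return "accurate: no tuning needed"
    if drift_pct > 40:
        return "depth was wrong, not estimate: escalate"
    if drift_pct > 15:
        return "under-estimated: increase grade multiplier"
    if drift_pct < -40:
        return "scope was cut or estimate inflated"
    # Remaining case: between -15% and -40%.
    return "over-estimated: decrease multiplier"


# The two PRD-001 drifts from the examples, plus one value per remaining band.
for value in (33.3, -18.2, 5.0, 55.0, -60.0):
    print(f"{value:+.1f}% -> {interpret_drift(value)}")
```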

Track drift per domain (backend, frontend, devops, ai_ml). A single PRD is noise; ten PRDs is signal.

sprint close → calibrate-estimate (per artifact) → tune grade_profile → next sprint sharper

Think of this as telemetry for your own estimation. The CLI doesn’t auto-adjust config — you review the drift and decide what to change.