forgeplan calibrate-estimate
forgeplan calibrate-estimate closes the loop on forgeplan estimate. You pass the
actual hours you spent on an artifact, and it reports how far the estimate was off —
per grade, per FR, and overall. Over time this feeds back into your grade_profile
config so estimates get sharper.
Without calibration, estimates drift silently: you think you’re “senior on backend” but consistently overrun your estimates by 50%. This command surfaces that gap so you can adjust the multiplier, not just your intuition.
When to use
- End of sprint: run on every PRD/RFC you closed to build a calibration dataset.
- After a surprise (big over/under-run) — immediate feedback while context is fresh.
- Quarterly review: aggregate --json output to tune estimate.grade_profile in config.
- Benchmarking a new grade profile: compare actuals against different grade assumptions.
When NOT to use
- Mid-sprint on an unfinished artifact: actuals must be final.
- On artifacts without FR/Phase items — the original estimate was a guess, not a model.

    forgeplan calibrate-estimate [OPTIONS] --actual-hours <ACTUAL_HOURS> <ID>

Arguments

    <ID>  Artifact ID to calibrate

Options

        --actual-hours <ACTUAL_HOURS>  Actual hours spent
        --grade <GRADE>                Grade to compare (junior, mid, senior). Defaults to total score
    -h, --help                         Print help
    -V, --version                      Print version

Examples
Calibrate one PRD after sprint close

    forgeplan calibrate-estimate PRD-001 --actual-hours 18

Output:

    PRD-001 — Auth System
    estimated (senior): 13.5h
    actual: 18.0h
    drift: +33.3% (over)
    verdict: estimate undershot — adjust senior backend multiplier +15%

Compare against a different grade

    forgeplan calibrate-estimate PRD-001 --actual-hours 18 --grade middle

Output:

    estimated (middle): 22.0h
    actual: 18.0h
    drift: -18.2% (under)
    verdict: you performed between middle and senior on this one

Aggregate across the sprint

    for id in PRD-001 PRD-002 PRD-003; do
      forgeplan calibrate-estimate "$id" --actual-hours "$(cat "actuals/$id.txt")"
    done

Pipe results to a script to compute average drift per grade/domain and propose new multipliers for .forgeplan/config.yaml.
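
The averaging step might look like the sketch below. The machine-readable output format of calibrate-estimate is not described here, so this assumes you copy each result by hand into a whitespace-separated log with hypothetical columns (id, grade, estimated hours, actual hours). The drift formula, (actual - estimated) / estimated, is inferred from the two example outputs and is an assumption, not something taken from the tool itself. The helper name average_drift is likewise made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch: average drift per grade from a hand-maintained log.
# Assumed (hypothetical) columns: id grade estimated_hours actual_hours
# Assumed formula: drift% = (actual - estimated) / estimated * 100
average_drift() {
  awk '
    # "+ 0" forces a numeric comparison, so header or comment rows are skipped
    NF >= 4 && $3 + 0 > 0 {
      sum[$2] += ($4 - $3) / $3 * 100
      n[$2]++
    }
    END {
      for (g in sum) printf "%s %+.1f%% (n=%d)\n", g, sum[g] / n[g], n[g]
    }
  ' "$@" | sort
}

# The PRD-001 rows reuse the figures from the examples; PRD-002 is invented.
average_drift <<'EOF'
id      grade  estimated actual
PRD-001 senior 13.5      18
PRD-001 middle 22        18
PRD-002 senior 10        11
EOF
# prints:
# middle -18.2% (n=1)
# senior +21.7% (n=2)
```

Averaging percentages per artifact treats a small PRD and a large one equally; if large artifacts should weigh more, sum the hours first and compute one ratio per grade instead.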
Output interpretation
| Drift range | Interpretation |
|---|---|
| within ±15% | estimate is accurate — no tuning needed |
| +15% … +40% | under-estimated — increase grade multiplier |
| > +40% | depth was wrong, not estimate — escalate |
| -15% … -40% | over-estimated — decrease multiplier |
| < -40% | scope was cut or estimate inflated |
Track drift per domain (backend, frontend, devops, ai_ml). A single PRD is noise; ten PRDs is signal.
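
The bands in the table can be applied mechanically. The following is a minimal sketch, not part of forgeplan: classify_drift is a hypothetical helper, drift is assumed to be (actual - estimated) / estimated × 100 (consistent with the example outputs, but not confirmed), and values exactly on a boundary (±15% or ±40%) are assigned to the band closer to zero, a choice the table leaves open.

```shell
#!/usr/bin/env bash
# Hypothetical helper: map estimated/actual hours to a band from the table.
# Usage: classify_drift <estimated_hours> <actual_hours>
classify_drift() {
  awk -v est="$1" -v act="$2" 'BEGIN {
    est += 0; act += 0                      # force numeric interpretation
    if (est <= 0) { print "invalid estimate"; exit 1 }
    d = (act - est) / est * 100             # assumed drift formula
    if      (d >   40) band = "depth was wrong: escalate"
    else if (d >   15) band = "under-estimated: increase multiplier"
    else if (d >= -15) band = "accurate: no tuning needed"
    else if (d >= -40) band = "over-estimated: decrease multiplier"
    else               band = "scope was cut or estimate inflated"
    printf "%+.1f%% %s\n", d, band
  }'
}

classify_drift 13.5 18   # +33.3% under-estimated: increase multiplier
classify_drift 22 18     # -18.2% over-estimated: decrease multiplier
```

A single classification is still one data point; the band only becomes actionable once several artifacts in the same domain land in it.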
How it fits the workflow

    sprint close → calibrate-estimate (per artifact) → tune grade_profile → next sprint sharper

Think of this as telemetry for your own estimation. The CLI doesn’t auto-adjust config; you review the drift and decide what to change.
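
The per-artifact step of this flow could be scripted roughly as follows. This is a sketch under assumptions: it reuses the actuals/<ID>.txt layout from the loop example, expects each file to hold a single number of hours, and collects the reports in one file for manual review. The calibrate_sprint function, the FORGEPLAN_BIN variable, and the log path are hypothetical names introduced for illustration; only the calibrate-estimate invocation itself comes from the usage shown on this page.

```shell
#!/usr/bin/env bash
# Sketch of a sprint-close step: calibrate every artifact that has a recorded
# actual and keep the reports in one file for later review.
# Usage: calibrate_sprint [actuals_dir] [log_file]
calibrate_sprint() {
  local dir="${1:-actuals}" log="${2:-calibration.log}"
  local bin="${FORGEPLAN_BIN:-forgeplan}"   # hypothetical override, handy for dry runs
  local file id hours
  for file in "$dir"/*.txt; do
    [ -e "$file" ] || continue              # glob matched nothing: no actuals recorded
    id="$(basename "$file" .txt)"
    hours="$(tr -d '[:space:]' < "$file")"
    if [ -z "$hours" ]; then
      echo "skip $id: empty actuals file" >&2
      continue
    fi
    "$bin" calibrate-estimate "$id" --actual-hours "$hours" >> "$log"
  done
}

# Example (not executed here): calibrate_sprint actuals calibration.log
```

The script deliberately stops at collecting reports. Deciding whether a multiplier should change remains a manual review step.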
See also
- forgeplan estimate: the estimate this calibrates against
- forgeplan config: where grade_profile lives
- Unified Workflow: sprint close rituals
- CLI overview