Documenting a legacy Fortran codebase

`fortranspire doc` is a standalone documentation generator for legacy Fortran 90 code. It runs independently of the GPU porting pipeline. Its sole purpose is to produce readable, structured documentation for codes whose original authors are no longer around.

Two outputs are available, individually or together:

- Inline Doxygen-style docstrings (`!>` blocks) inserted directly above each subroutine or function. Insertion is idempotent: re-runs replace the previous block instead of stacking new ones.
- A self-contained Sphinx site (`--sphinx` or `--site-only`) with one `.rst` file per source file and one section per routine, including a "Show source" toggle when the sphinx-togglebutton extension is installed.
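The per-routine sections described above could be emitted roughly as follows. This is a minimal sketch: the function names and section layout are hypothetical, and the generated site may use a different construct. It relies on one documented sphinx-togglebutton behaviour, namely that admonitions carrying the `dropdown` class become collapsible, and falls back to a plain code block when the extension cannot be imported.

```python
import importlib.util


def togglebutton_available() -> bool:
    """True if sphinx-togglebutton is importable in the build environment."""
    return importlib.util.find_spec("sphinx_togglebutton") is not None


def _indent(text: str, pad: str) -> list[str]:
    # Indent non-blank lines; keep blank lines empty so RST stays clean.
    return [pad + line if line.strip() else "" for line in text.splitlines()]


def routine_section(name, summary, details, source, toggle=None) -> str:
    """Hypothetical RST section for one routine; source collapsible if toggle."""
    if toggle is None:
        toggle = togglebutton_available()
    out = [name, "-" * len(name), "", summary, "", details, ""]
    if toggle:
        # sphinx-togglebutton collapses admonitions that carry the 'dropdown' class.
        out += [".. admonition:: Show source", "   :class: dropdown", "",
                "   .. code-block:: fortran", ""]
        out += _indent(source, "      ")
    else:
        out += [".. code-block:: fortran", ""]
        out += _indent(source, "   ")
    return "\n".join(out) + "\n"
```

Deciding at generation time, rather than always emitting the dropdown, keeps the site readable when the extension is missing: the source is simply shown expanded.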

Every routine is documented at two levels in a single LLM call:

- `short_summary` (at most one line): the stakeholder or project-manager view.
- `detailed` (2–4 sentences): the developer view, covering INTENT semantics, invariants, and known gotchas (hidden SAVE variables, COMMON blocks, array index ordering).
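A caller might check that a single LLM response respects both levels with something like the following minimal sketch. It assumes, hypothetically, that the model returns a JSON object with `short_summary` and `detailed` keys. The field names come from the list above, but the JSON envelope, `parse_routine_doc`, and the regex sentence splitter are illustrative and not part of fortranspire.

```python
import json
import re
from dataclasses import dataclass

# Naive splitter: a sentence ends at '.', '!' or '?' followed by whitespace,
# so abbreviations such as "e.g." will be over-counted.
_SENTENCE_BREAK = re.compile(r"(?<=[.!?])\s+")


@dataclass(frozen=True)
class RoutineDoc:
    short_summary: str  # stakeholder view, one line
    detailed: str       # developer view, 2-4 sentences


def parse_routine_doc(raw: str) -> RoutineDoc:
    """Parse one (hypothetical) JSON response and enforce the length contract."""
    data = json.loads(raw)
    short = data["short_summary"].strip()
    if not short or "\n" in short:
        raise ValueError("short_summary must be a single non-empty line")
    detailed = " ".join(data["detailed"].split())  # collapse wrapping/newlines
    sentences = [s for s in _SENTENCE_BREAK.split(detailed) if s]
    if not 2 <= len(sentences) <= 4:
        raise ValueError(f"detailed must have 2-4 sentences, got {len(sentences)}")
    return RoutineDoc(short, detailed)
```

Rejecting a malformed response, rather than silently truncating it, leaves the retry decision to the caller.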

Quick start

```shell
# Annotate every kernel in src/ with inline !> docstrings
fortranspire doc src/

# Also generate a Sphinx site under documentation/<project>/
fortranspire doc --sphinx src/

# Sphinx site only; leave the source files untouched
fortranspire doc --site-only src/

# Show what would be inserted without modifying the source
fortranspire doc --dry-run src/kernel.f90

# No LLM call: useful in CI, for offline runs, or to verify the plumbing
fortranspire doc --no-llm src/
```

To build the generated Sphinx site:

```shell
cd documentation/<project>
pip install -r requirements.txt
make html
open build/html/index.html   # macOS; use xdg-open on Linux
```

What the inline docstring looks like

For a routine `update_vx`, the generated block above the `subroutine` keyword looks like this (abridged; `nx` and `ny` have no `@param` line in this excerpt):

```fortran
!> @generated_by fortranspire v1 routine=update_vx body=8c9f1e2a3b4c
!> @brief   Update the horizontal velocity component using a one-sided FD
!>          stencil on the σxx field.
!> @details Reads sigma_xx at (i,j) and (i-1,j); writes vx in place.
!>          INTENT(INOUT) on vx is required because the time loop calls
!>          this routine N times accumulating updates. Assumes a regular
!>          Cartesian grid; dx must be > 0.
!> @param[inout] vx        Horizontal velocity field, modified in place.
!> @param[in]    sigma_xx  Stress tensor xx component, read-only.
!> @param[in]    dx        Grid spacing in the x direction.
subroutine update_vx(vx, sigma_xx, dx, nx, ny)
  ...
```

The `@generated_by fortranspire` marker on the first line is the idempotency anchor. Re-running `fortranspire doc` detects the existing block, strips it, and emits a fresh one, so the source file never accumulates duplicate documentation.
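The strip-and-replace step can be sketched as follows, assuming the file is handled as a list of lines. `find_routine` and `upsert_doc_block` are hypothetical names, and the regex is a heuristic, not the Loki-based parsing the tool relies on. Only the contiguous `!>` run directly above the routine is examined, and any hand-written `!>` lines above the generated block are kept.

```python
import re

# Start of the first line of every block fortranspire generates.
MARKER = "!> @generated_by fortranspire"


def find_routine(lines, name):
    """Index of the SUBROUTINE/FUNCTION statement for `name` (never its END line)."""
    header = re.compile(
        rf"^\s*(?!end\b)(?:[\w()*]+\s+)*(?:subroutine|function)\s+{re.escape(name)}\b",
        re.IGNORECASE,
    )
    for i, line in enumerate(lines):
        if header.match(line):
            return i
    raise LookupError(f"routine {name!r} not found")


def upsert_doc_block(lines, name, new_block):
    """Put new_block above routine `name`, dropping a previously generated block."""
    sub = find_routine(lines, name)
    run_start = sub
    while run_start > 0 and lines[run_start - 1].lstrip().startswith("!>"):
        run_start -= 1  # walk up the contiguous !> comment run
    cut = sub  # default: nothing to strip, insert right above the routine
    for i in range(run_start, sub):
        if lines[i].lstrip().startswith(MARKER):
            cut = i  # the old generated block starts here and runs down to the routine
            break
    return lines[:cut] + new_block + lines[sub:]
```

Running `upsert_doc_block` twice with the same block yields the same file, which is the idempotency property described above.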

The trailing `body=<hash>` field fingerprints the routine body, so that a future incremental mode can skip routines whose body has not changed since the last documentation pass.
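Here is a minimal sketch of how the marker line could be produced and later checked. It assumes a SHA-256 digest truncated to 12 hex characters (the length of `8c9f1e2a3b4c` in the sample; the real hash function is not documented here). Ignoring trailing whitespace and blank lines before hashing is also an assumption, chosen so that cosmetic edits do not trigger a new LLM call. All function names are hypothetical.

```python
import hashlib
import re

MARKER_RE = re.compile(
    r"^\s*!>\s*@generated_by\s+fortranspire\s+(\S+)\s+routine=(\S+)\s+body=([0-9a-f]+)\s*$"
)


def body_hash(body_lines, length=12):
    """Assumed scheme: SHA-256 of the body, trailing spaces and blank lines ignored."""
    normalized = "\n".join(line.rstrip() for line in body_lines if line.strip())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()[:length]


def marker_line(routine, body_lines, version="v1"):
    """First line of a generated block, in the format of the sample above."""
    return (f"!> @generated_by fortranspire {version} "
            f"routine={routine} body={body_hash(body_lines)}")


def needs_refresh(first_doc_line, routine, body_lines):
    """True unless first_doc_line is the marker for this routine and this exact body."""
    m = MARKER_RE.match(first_doc_line)
    if m is None or m.group(2) != routine:
        return True
    return m.group(3) != body_hash(body_lines)
```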

Operating modes

| Flag | Effect |
|---|---|
| (none) | Inline `!>` docstrings; source files modified in place |
| `--sphinx` | Inline docstrings plus a Sphinx site under `documentation/<project>/` |
| `--site-only` | Sphinx site only; source files left untouched |
| `--no-llm` | Skip the LLM calls: signatures and argument lists only, no narrative |
| `--dry-run` | Print the rewritten source to stdout without writing anything |
| `--project NAME` | Project name used in the Sphinx site title (default: inferred from the path) |
| `--output DIR` | Directory the Sphinx site is written to (default: `documentation/`) |

Cost model

`fortranspire doc` makes one LLM call per routine, using the Codestral model selected by `MISTRAL_MODEL_CODE` (default: `codestral-latest`). Approximate figures at Codestral pricing:

| Codebase size | LLM calls | Wall-clock time | Cost |
|---|---|---|---|
| 10 routines | 10 | ~30 s | ~0.02 USD |
| 50 routines | 50 | ~3 min | ~0.10 USD |
| 200 routines | 200 | ~10 min | ~0.40 USD |
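The table is linear in the number of routines, at roughly 3 s and 0.002 USD per call. A small estimator built on those two constants, read off the 10-routine row, might look like the sketch below. It assumes calls are issued one after another and that per-call cost does not depend on routine length; both are simplifications, and `estimate` is a hypothetical helper.

```python
from dataclasses import dataclass

# Per-call figures read off the 10-routine row: ~30 s and ~0.02 USD for 10 calls.
SECONDS_PER_CALL = 3.0
USD_PER_CALL = 0.002


@dataclass(frozen=True)
class Estimate:
    llm_calls: int
    wall_clock_s: float
    cost_usd: float


def estimate(n_routines: int) -> Estimate:
    """Linear model: one LLM call per routine, calls issued one after another."""
    if n_routines < 0:
        raise ValueError("n_routines must be non-negative")
    return Estimate(
        llm_calls=n_routines,
        wall_clock_s=n_routines * SECONDS_PER_CALL,
        cost_usd=n_routines * USD_PER_CALL,
    )
```

For 50 routines this gives 150 s, a little under the ~3 min in the table, so treat the per-call latency as a rough average.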

Re-runs on unchanged routines will become free in a future iteration: the body hash is already recorded in the inline block, and only the cache lookup is missing. Until then, every run pays for every routine again, so prefer `--dry-run` on a single file when iterating on prompts.

Pairing with the analyzer

fortranspire doc and fortranspire analyze complement each other:

- `fortranspire analyze` answers “is this code GPU-ready?”: deterministic, no LLM calls, suitable for CI.
- `fortranspire doc` answers “what does this code do?”: LLM-driven and run once, either by the maintainer or when producing a release artifact.

Run the analyzer first to surface structural issues (COMMON blocks, SAVE variables, missing INTENT declarations), then run `fortranspire doc` to capture the intent that the analyzer cannot infer.
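Here is a regex-only sketch that makes the three structural checks concrete. It is illustrative, not how `fortranspire analyze` is implemented: it ignores continuation lines, `!` inside strings, and interface blocks, and every name in it is hypothetical.

```python
import re

# Heuristic patterns; a real analyzer would work on a parsed AST instead.
HEADER = re.compile(
    r"^\s*(?!end\b)(?:[\w()*]+\s+)*(?:subroutine|function)\s+(\w+)\s*\(([^)]*)\)",
    re.IGNORECASE,
)
END = re.compile(r"^\s*end\s*(?:subroutine|function)\b", re.IGNORECASE)
COMMON = re.compile(r"^\s*common\s*/", re.IGNORECASE)
SAVE = re.compile(r"^\s*save\b|,\s*save\b", re.IGNORECASE)
INTENT = re.compile(r"\bintent\s*\(", re.IGNORECASE)


def declared_names(rhs):
    """Variable names right of '::', with array bounds and initialisers dropped."""
    previous = None
    while previous != rhs:  # remove (...) groups, innermost first
        previous, rhs = rhs, re.sub(r"\([^()]*\)", "", rhs)
    return [m.group(1).lower() for part in rhs.split(",")
            if (m := re.match(r"\s*(\w+)", part))]


def scan(lines):
    """Return (line_number, message) findings for COMMON, SAVE and missing INTENT."""
    findings, args, with_intent = [], [], set()
    for n, raw in enumerate(lines, start=1):
        code = raw.split("!", 1)[0]  # drop comments (naive: ignores '!' in strings)
        if COMMON.match(code):
            findings.append((n, "COMMON block"))
        if SAVE.search(code):
            findings.append((n, "SAVE"))
        if header := HEADER.match(code):
            args = [a.strip().lower() for a in header.group(2).split(",") if a.strip()]
            with_intent = set()
        elif END.match(code):
            findings += [(n, f"missing INTENT for argument '{a}'")
                         for a in args if a not in with_intent]
            args = []
        elif "::" in code and INTENT.search(code):
            with_intent.update(declared_names(code.split("::", 1)[1]))
    return findings
```

Findings like these are cheap to compute on every commit, which is why this kind of check belongs in CI while the narrative pass does not.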

Limitations

- The Loki-based AST extraction handles modules and free-form Fortran 90 well. Fixed-form sources (Fortran 77 style, with comment markers such as `C` or `*` in column 1 and continuation marks in column 6) may need a manual conversion to free form first.
- LLM-generated narratives describe what the routine appears to do, not what it was meant to do. Always review them for domain-specific accuracy before publishing.
- The Sphinx site uses the furo theme and plain-RST per-routine sections. It does not use sphinxfortran, which is largely unmaintained; a "Show source" toggle is added when sphinx-togglebutton is installed.
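To find sources that will likely need conversion before a run, one quick approach reuses the compiler convention: gfortran treats `.f`, `.for`, `.fpp` and `.ftn` files (and their upper-case variants) as fixed form. The sketch below combines that suffix rule with one content hint. `looks_fixed_form` and `fixed_form_files` are hypothetical helpers, and a file they flag still needs a human look.

```python
from pathlib import Path

# Suffixes gfortran treats as fixed form by default; compared case-insensitively,
# so .F and .FOR are covered too.
FIXED_FORM_SUFFIXES = {".f", ".for", ".fpp", ".ftn"}
FREE_FORM_SUFFIXES = {".f90", ".f95", ".f03", ".f08"}


def looks_fixed_form(path, text):
    """Heuristic: True if the file is probably fixed-form Fortran."""
    if Path(path).suffix.lower() in FIXED_FORM_SUFFIXES:
        return True
    # '*' in column 1 is a classic fixed-form comment marker and unusual in
    # free-form code (it can still appear on a free-form continuation line).
    return any(line.startswith("*") for line in text.splitlines())


def fixed_form_files(root):
    """Sorted list of Fortran files under root that look fixed-form."""
    hits = []
    for p in sorted(Path(root).rglob("*")):
        if p.is_file() and p.suffix.lower() in FIXED_FORM_SUFFIXES | FREE_FORM_SUFFIXES:
            if looks_fixed_form(p, p.read_text(errors="replace")):
                hits.append(p)
    return hits
```

Calling `fixed_form_files("src")` before `fortranspire doc src/` gives a short list of files to convert first.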