Write an extractor script
Generated Markdown for references/process_writing_extractor.md.
Open book page Back to the skill graph
# Write an extractor script
## Purpose
Automate one machine-derivable slice of the WAD so it stays honest against its source of truth.
## Prerequisites
- uv installed
- A source of truth worth automating (CI config, build metadata, a database, an API)
## Flowchart

## Steps
### Step 1: Pick one concern
One script = one output file = one source of truth. Name it `extract_<thing>.py`; its output is `data/generated/<thing>.wcl`.
### Step 2: Start from the template
Copy `extractor_template.py` (bundled with this skill; `extract_repo.py` ships in every scaffold). Set the PEP 723 header's `dependencies` for whatever the source needs (`pyyaml`, a DB driver, `httpx`…).
### Step 3: Map source records to WAD blocks
Decide which block kinds the findings become (relations? pipeline stages? a `code_item :db_schema` with `db_table`/`db_column` children?). When the data genuinely fits no base family, declare a **typed custom block** in `schema/extensions.wcl` and emit that instead, adding a matching render to the book template — better a first-class block than data contorted into the wrong shape. Anything needing human naming — like source name → WAD container id — goes in a small in-script table, so customising the mapping is editing data, not logic.
### Step 4: Emit deterministically
Banner comment first, `namespace wcl.wad`, then blocks — collections sorted, ids derived from source names, no timestamps. Full overwrite of the one output file; an empty result still writes the banner.
### Step 5: Wire and verify
Add the one `import "./<thing>.wcl"` line in `data/generated/main.wcl` (once, committed). Run the script twice — the second run must be byte-identical — then `wcl check` and render. If the extractor replaces hand-authored blocks, delete those and leave a comment pointing at the script.
> [!TIP]
> **Verification**
> Running the script twice leaves `git status` quiet; `wcl check` stays green; the output file's blocks render in the book.
## Related
- [Extractor scripts](../references/fact_extractor_anatomy.md)
- [Generated vs hand-authored data](../references/concept_generated_vs_hand.md)
[← Back to SKILL.md](../SKILL.md)