Add Schema Followups
Tasks: 20/20
1. Reconciliation primitive
- Add `src/manifest-reconcile.ts` exporting `parseManifestStreams`, `parseSchemaStreams`, `scanEmittedStreams`, `reconcile`, and `reconcileFromDisk`. Pure functions only: no globbing or DB access.
- Add 16 unit tests in `src/manifest-reconcile.test.ts` covering every parser branch and reconcile case (aligned, undeclared-emit, schemaless-emit, ghost-stream, dynamic-emit-miss).
- Add `bin/reconcile-manifests.ts` CLI that checks schema-bearing connectors by default, finds matching manifests/connectors, reports drift, exits nonzero on any drift, and exposes `--all` for the broader schemaless-connector audit.
- Add `bin/reconcile-manifests.test.ts`: a regression net that asserts every connector with a `schemas.ts` aligns. 11 tests today (amazon, chase, chatgpt, claude_code, codex, github, gmail, reddit, slack, usaa, ynab).
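
The reconcile cases listed above can be illustrated with a minimal sketch. It is not the code in `src/manifest-reconcile.ts`: the names `reconcileSketch` and `exitCodeFor`, the input shape, and the meaning given to each case are assumptions inferred from the case names (for example, that a ghost-stream is one that is declared somewhere but never emitted). The dynamic-emit-miss case is left out because it depends on how the scanner treats computed stream names.

```typescript
// Hypothetical sketch: names and case definitions are assumptions,
// not the real manifest-reconcile module.
type DriftKind = "undeclared-emit" | "schemaless-emit" | "ghost-stream";

interface StreamSets {
  manifest: ReadonlySet<string>; // streams declared in the manifest
  schema: ReadonlySet<string>;   // streams with an entry in schemas.ts
  emitted: ReadonlySet<string>;  // stream names found in emit calls
}

interface Drift {
  stream: string;
  kind: DriftKind;
}

interface ReconcileReport {
  aligned: string[];
  drift: Drift[];
}

// Pure: works on already-parsed sets, touches no filesystem or database.
function reconcileSketch(sets: StreamSets): ReconcileReport {
  const all = new Set<string>();
  for (const group of [sets.manifest, sets.schema, sets.emitted]) {
    group.forEach((name) => all.add(name));
  }

  const aligned: string[] = [];
  const drift: Drift[] = [];
  for (const stream of Array.from(all).sort()) {
    const declared = sets.manifest.has(stream);
    const hasSchema = sets.schema.has(stream);
    const isEmitted = sets.emitted.has(stream);

    if (isEmitted && !declared) drift.push({ stream, kind: "undeclared-emit" });
    if (isEmitted && !hasSchema) drift.push({ stream, kind: "schemaless-emit" });
    if (!isEmitted) drift.push({ stream, kind: "ghost-stream" });
    if (declared && hasSchema && isEmitted) aligned.push(stream);
  }
  return { aligned, drift };
}

// A CLI wrapper could map any drift to a nonzero exit status.
function exitCodeFor(report: ReconcileReport): number {
  return report.drift.length === 0 ? 0 : 1;
}
```

Keeping the classification separate from disk access means the unit tests can pass in small literal sets, while a wrapper such as the CLI handles file discovery and the exit status.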
2. Pilot fixtures
- Author 42 fixture files (62 total records) at `fixtures/<connector>/scrubbed/pilot-real-shape/records/<stream>.jsonl` for the six newly schema-bearing connectors. Authored by sub-agents from `schemas.ts` + `parsers.ts` + `types.ts`; reviewed and replayed.
- Synthetic content uses `[REDACTED_*]` placeholders for identifying fields. No real owner data, no DB extracts. Records are committable.
- Add `bin/replay-pilot-fixtures.ts`: a diagnostic that replays every committed fixture through `validateRecord` and reports drift.
- Add `src/pilot-fixture-test-helper.ts`, which exports `registerPilotFixtureTests({ connector, validateRecord })`. Each invocation registers one test per stream file; missing fixtures or schema-failing records fail the test loudly.
- Add per-connector `pilot-fixture.test.ts` for github, gmail, ynab, codex, claude_code, slack. Three lines each.
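
A fixture replay of the kind described above can be sketched as a pure function over JSONL text. This is an illustrative sketch, not the contents of `bin/replay-pilot-fixtures.ts`: `replayJsonl`, the `ValidationResult` shape, and the `Validator` type are hypothetical, and the real `validateRecord` signature may differ.

```typescript
// Hypothetical sketch: the real validateRecord signature may differ.
interface ValidationResult {
  ok: boolean;
  error?: string;
}

type Validator = (stream: string, record: unknown) => ValidationResult;

interface ReplayFailure {
  line: number; // 1-based line number within the .jsonl file
  error: string;
}

// Replays each non-blank JSONL line through the validator and collects
// failures instead of throwing, so the caller can report all drift at once.
function replayJsonl(stream: string, jsonl: string, validate: Validator): ReplayFailure[] {
  const failures: ReplayFailure[] = [];
  jsonl.split("\n").forEach((raw, index) => {
    const text = raw.trim();
    if (text === "") return; // tolerate blank lines and a trailing newline

    let record: unknown;
    try {
      record = JSON.parse(text);
    } catch {
      failures.push({ line: index + 1, error: "invalid JSON" });
      return;
    }

    const result = validate(stream, record);
    if (!result.ok) {
      failures.push({ line: index + 1, error: result.error ?? "validation failed" });
    }
  });
  return failures;
}
```

A test helper could call a function like this once per stream file and fail the test when the returned list is non-empty, which matches the "one test per stream file" behaviour described for `registerPilotFixtureTests`.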
3. zod cleanup
- Replace `z.string().url()` with `z.url()` in amazon, chase, usaa schemas (zod 4 idiomatic; deprecation hint cleared).
4. Followups documented
- Update `add-polyfill-connector-system/tasks.md` usaa row with the schema-replay findings (4 stale-currency records, 2 empty-string records) and the decision: re-ingest, do not loosen schema.
- Confirm reddit re-ingest tracking exists in `add-polyfill-layer-two-stream-coverage/tasks.md` (it does, both in §1.7 and §1.8). No new tracking needed.
5. Validation gates
- `pnpm --dir packages/polyfill-connectors run typecheck` clean.
- All previously-existing tests continue to pass.
- 42 new pilot-fixture replay tests pass.
- 11 new fleet-reconciliation tests pass.
- 16 new manifest-reconcile unit tests pass.
- `pnpm --dir packages/polyfill-connectors run verify` passes.
- `openspec validate add-schema-followups --strict` and `openspec validate add-schema-validation-coverage --strict` pass.
- `openspec validate --all --strict` passes.
6. Out of scope
- USAA data cleanup (re-ingest the 6 stale records). Tracked in `add-polyfill-connector-system/tasks.md`.
- Reddit re-ingest. Tracked in `add-polyfill-layer-two-stream-coverage/tasks.md`.
- Browser-daemon retirement. Owner-reviewed separately.
- The 12 connectors with `parsers.ts` but no `schemas.ts` (apple_health, github prior to this tranche, ical, imessage, pocket, slack-extra-streams, spotify, strava, twitter_archive, whatsapp, etc.). Tracked separately as a connector-by-connector effort.