Define Collection Acquisition Coverage
tasks 52/52
1. Spec
- 1.1 Capture prior-art synthesis in docs/research/acquisition-coverage-profile-slvp-evaluation-2026-06-13.md.
- 1.2 Add OpenSpec proposal, design, tasks, and spec deltas for acquisition/coverage semantics.
- 1.3 Validate with openspec validate define-collection-acquisition-coverage --strict.
2. First Implementation Tranche
- 2.1 Add connector-manifest metadata for acquisition methods and trigger/setup posture without source-specific UI branching.
- 2.2 Add an acquisition-batch recording shape carrying acquisition method, source format, parser version, artifact/content hash when applicable, event time range, parsed/accepted/duplicate/skipped/failed counts, media coverage facts, and safe warnings.
- 2.3 Make repeated owner-artifact upload idempotent and return the existing batch receipt when content is already known.
- 2.4 Add a generic pre-commit validation preview for owner artifacts where the parser can inspect before durable import.
- 2.4a Setup-status receipt down-payment: manual-upload validation evidence already stored in source_binding.import_validation is now projected through owner setup status as a non-secret import_receipt so the owner can review what the parser found after submission. This is not yet a separate validate-confirm-commit screen.
- 2.4b The manual-upload owner flow now calls a non-durable validation-preview endpoint before import, renders "What PDPP found", blocks import until validation succeeds when a parser exists, and returns duplicate-artifact status without creating another draft.
- 2.5 Add a generic coverage receipt after commit with accepted, duplicate, skipped, failed, time-range, and gap facts.
- 2.5a Validation receipt down-payment: the setup-status page renders the uploaded file, validation status, detected format, estimated point/segment counts, coverage window, and acquisition method.
- 2.5b Acquisition-batch storage now carries committed accepted/duplicate/skipped/failed counts and setup status renders them in the durable receipt after ingest.
- 2.6 Project owner-artifact and multi-acquisition coverage into the reference owner-control projection alongside connection health without labeling expected manual staleness or missing media as generic failure.
- 2.6a _ref/connectors now projects owner-only acquisition_coverage summaries, and the records list can link to the latest import receipt without exposing acquisition diagnostics on grant-scoped reads.
- 2.6b The source detail page renders acquisition batches as coverage receipts, not as scheduler/run failures.
- 2.7 Ensure same-stream multi-acquisition keeps acquisition provenance and refuses silent cross-method merge without explicit identity rules.
- 2.7a Accepted records from batch-backed ingest now write a record_acquisition_provenance side table keyed by connector instance, stream, and record key. Stable record keys are the explicit identity rule for the first same-stream fixture; broader cross-method merge inference remains deferred.
- 2.7b Committed-count updates now scope to the latest active acquisition batch instead of mutating every historical batch for a connection.
- 2.8 Replace source-specific owner setup branches with a manifest-driven source catalog and acquisition path chooser.
- 2.8a UI copy down-payment: the manifest-driven source catalog now presents a source journey (name, recommended next action, current support fact, low-noise detail disclosure) instead of "one status and one next action"; the manual/upload page orders primary acquisition methods first with advanced paths behind one disclosure. No source-specific React branches added.
- 2.8b Catalog entries now carry manifest-authored acquisition paths, and Add Source renders primary paths plus secondary/advanced paths through a generic source-acquisition-path renderer.
- 2.9 Add the generic coverage-assistant UI flow: instructions, pre-commit preview, commit progress, durable receipt, and next-action copy.
- 2.9a Copy/framing down-payment: the manual/upload page reads as a coverage-assistant start (manifest-generated, validate-before-commit language when a validator exists, and an "import" rather than "first sync" CTA for owner artifacts); the setup status page uses import/receipt language for manual_upload and drops provider-credential semantics for imports.
- 2.9b Status receipt down-payment: manual-upload setup status now includes and renders a "What PDPP found" coverage preview, with copy explicitly distinguishing validation estimates from future acquisition-batch committed counts.
- 2.9c The manual/upload page now provides a two-step review/import interaction with manifest-authored accepted files, help links, validation expectations, duplicate receipt handoff, and source-neutral copy.
- 2.9d Setup status now renders generic import progress phases from existing setup-state and receipt facts: received, parsed, deduplicated, committed, indexed, and health-projected. No new lifecycle enum or source-specific branch was added.
- 2.10 Add acquisition lanes and recent batch receipts to source/connection detail pages so historical import, current sync, media sync, and backup import remain visible under one source.
- 2.10a The records list now surfaces the latest import receipt as a compact cue linked to setup status.
- 2.10b The source detail page now renders recent acquisition batches as source-neutral coverage lanes with durable receipt links and warning/count/date facts.
- 2.11 Add owner-journey fixtures for duplicate artifact upload, stale manual export, missing media, wrong-account artifact, parser failure, and same-stream historical-plus-current acquisition.
- 2.11a Fixture-backed tests now cover duplicate artifact upload, unsupported/parser-failure artifacts, WhatsApp missing-media warning facts, and Google Timeline stale/empty/too-large validation.
- 2.11b Wrong-source/account-report artifacts with accepted filenames now fail before commit and create no draft; true same-account matching remains explicitly impossible unless a future connector declares a verifiable identity extractor.
- 2.11c Same-stream historical-plus-current acquisition is covered by the owner-artifact plus provider-API fixture that preserves record-level acquisition provenance.
- 2.12 Support owner-artifact variant parity for WhatsApp chat exports: .txt without media and .zip with media both validate through the connector parser, with media presence recorded as coverage evidence rather than overclaimed attachment.
- 2.13 Allow repeat owner-artifact imports to target an existing manual/upload source while new account/profile/device/source identities create distinct owner-facing connections.
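Tasks 2.2 and 2.3 describe a batch recording shape and an idempotent upload path. The following is a minimal Python sketch of how the two could fit together, not the project's implementation: the class names `AcquisitionBatch` and `BatchStore`, the `upload` signature, and the in-memory dictionary are all hypothetical, and the field names are only inferred from the task text. The idea shown is that a SHA-256 content hash acts as the idempotency key, so a repeated upload returns the existing batch receipt instead of creating a second one.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class AcquisitionBatch:
    """Hypothetical recording shape mirroring the facts listed in task 2.2."""
    acquisition_method: str                      # e.g. "owner_artifact"
    source_format: str
    parser_version: str
    artifact_sha256: Optional[str]               # content hash, when applicable
    event_time_range: Optional[tuple[str, str]]  # assumed ISO-8601 (start, end)
    parsed: int = 0
    accepted: int = 0
    duplicate: int = 0
    skipped: int = 0
    failed: int = 0
    media_coverage: dict = field(default_factory=dict)
    warnings: list[str] = field(default_factory=list)


class BatchStore:
    """In-memory stand-in for durable batch storage, keyed by content hash."""

    def __init__(self) -> None:
        self._by_hash: dict[str, AcquisitionBatch] = {}

    def upload(
        self, content: bytes, source_format: str, parser_version: str
    ) -> tuple[AcquisitionBatch, bool]:
        """Return (batch, created); known content yields the existing receipt."""
        digest = hashlib.sha256(content).hexdigest()
        existing = self._by_hash.get(digest)
        if existing is not None:
            return existing, False
        batch = AcquisitionBatch(
            acquisition_method="owner_artifact",
            source_format=source_format,
            parser_version=parser_version,
            artifact_sha256=digest,
            event_time_range=None,  # would be filled in after parsing
        )
        self._by_hash[digest] = batch
        return batch, True
```

In this sketch the caller can use the returned `created` flag to decide whether to show a fresh receipt or the duplicate-artifact status. A real store would presumably scope the hash lookup to a connection or source rather than globally; that scoping is left out here.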
3. Connector Pilots
- 3.1 Wire Google Maps Timeline import through acquisition batches and coverage receipts.
- 3.2 Wire WhatsApp chat-export import through the same acquisition-batch and coverage-receipt substrate.
- 3.3 Add a media-sync pilot or explicit deferred design note for WhatsApp media folder sync as device_sync, distinct from chat export owner_artifact.
- 3.4 Add an acceptance fixture showing historical owner artifact plus current browser/API acquisition populating the same stream with preserved provenance.
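The same-stream fixture in task 3.4 relies on the provenance side table from task 2.7a, keyed by connector instance, stream, and record key. A rough Python sketch of that keying follows. The `ProvenanceTable` class, its method names, the `batch_id` field, and the `"provider_api"` method label are illustrative assumptions, not names taken from the codebase. The sketch treats a stable record key as the only identity rule and rejects records without one rather than guessing at a merge.

```python
from collections import defaultdict

# (connector_instance, stream, record_key): the key shape named in task 2.7a.
ProvenanceKey = tuple[str, str, str]


class ProvenanceTable:
    """Hypothetical in-memory version of a record-level provenance side table."""

    def __init__(self) -> None:
        self._rows: dict[ProvenanceKey, list[dict]] = defaultdict(list)

    def record(
        self,
        connector_instance: str,
        stream: str,
        record_key: str,
        acquisition_method: str,
        batch_id: str,
    ) -> None:
        """Append one provenance row; never overwrite rows from other methods."""
        if not record_key:
            # Without a stable key there is no explicit identity rule,
            # so the sketch refuses instead of merging silently.
            raise ValueError("stable record key required")
        entry = {"acquisition_method": acquisition_method, "batch_id": batch_id}
        rows = self._rows[(connector_instance, stream, record_key)]
        if entry not in rows:
            rows.append(entry)

    def methods_for(
        self, connector_instance: str, stream: str, record_key: str
    ) -> list[str]:
        """List the distinct acquisition methods that produced this record."""
        rows = self._rows.get((connector_instance, stream, record_key), [])
        return sorted({row["acquisition_method"] for row in rows})
```

With this shape, a record first seen in a historical owner artifact and later seen through a current API acquisition keeps both provenance rows under the same key, which is the property the acceptance fixture is meant to demonstrate.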
4. Deferred
- 4.1 Defer cross-method automatic merge beyond explicit identity rules.
- 4.2 Defer parser-upgrade reprocessing queues until at least two owner-artifact connectors prove the batch model.
- 4.3 Defer watched-folder automation until the manual/share/upload flow is owner-accepted.
- 4.4 Defer device-backup extraction implementation until platform-specific support can be proven end-to-end.
5. Validation
- 5.1 Run focused manifest/runtime tests after implementation.
- 5.2 Run owner-journey acceptance checks for Add Source, upload validation, coverage receipt, stale manual source, and same-stream multi-acquisition.
- 5.3 Verify dashboard, CLI, and owner API read the same health/coverage projection.
- 5.4 Verify grant-scoped REST/MCP reads expose records, not owner-only acquisition diagnostics unless separately authorized.
- 5.4a Public /v1 records read regression now proves an imported record is visible while acquisition_coverage, import_receipt, artifact_sha256, and media_coverage stay off the read envelope.
- 5.4b MCP canonical mirror regression now proves query_records returns grant-scoped records without adapter-added owner-only acquisition diagnostics.
- 5.5 Review the implemented UI against the SLVP screen choreography in design.md, including cognitive-load, progressive-disclosure, copy, responsive behavior, and no developer-only instructions.
- 5.6 Run fixture-backed UI checks for Google Timeline and WhatsApp-style owner artifacts before declaring the UX owner-accepted.
- 5.7 Re-run focused WhatsApp text/zip validation and manual-upload route tests for media variants, malformed zip rejection, and existing-source repeat import.
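The regression in task 5.4a checks that owner-only acquisition diagnostics stay off grant-scoped reads. One way to express that check is sketched below in Python. The four field names come from task 5.4a; the function `strip_owner_only`, the recursive filtering approach, and the example envelope layout are assumptions made for illustration. The actual read path may simply never attach these fields rather than filter them out.

```python
from typing import Any

# Field names listed in task 5.4a as owner-only.
OWNER_ONLY_FIELDS = frozenset(
    {"acquisition_coverage", "import_receipt", "artifact_sha256", "media_coverage"}
)


def strip_owner_only(value: Any) -> Any:
    """Return a copy of a read envelope with owner-only keys removed at any depth."""
    if isinstance(value, dict):
        return {
            key: strip_owner_only(inner)
            for key, inner in value.items()
            if key not in OWNER_ONLY_FIELDS
        }
    if isinstance(value, list):
        return [strip_owner_only(item) for item in value]
    return value


def leaked_fields(envelope: dict) -> set[str]:
    """Report owner-only keys present at the top level of an envelope."""
    return set(OWNER_ONLY_FIELDS.intersection(envelope))
```

A regression test built on this idea would read an imported record through the grant-scoped path, assert the record itself is present, and assert that `leaked_fields` is empty for both the envelope and each record.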