Proposal: migrate-postgres-semantic-index-to-pgvector
Artifacts
Official change artifacts tracked under openspec/.
The Postgres semantic-search path stores embeddings in semanticsearchblob.embedding as JSONB: 384-dimensional float arrays, roughly 4.8 KB/row versus roughly 1.5 KB as a pgvector vector. It answers queries by SELECTing candidate rows and brute-force cosine-scoring them in JavaScript (postgresSemanticSearch in postgres-search.js). The live deployment already runs the pgvector/pgvector:pg16 image, so the vector extension is available but unused. At the live table size (~1.85M rows / ~10 GB), the JSONB representation takes roughly 3× the storage a vector column would need, and the brute-force read path ships every candidate embedding over the wire just to score it in JS. Worse, the candidate SELECT applies a bare LIMIT with no ORDER BY. For scopes larger than the per-connector overscan, the JS pass therefore scores an arbitrary subset of candidates rather than the true nearest neighbors.
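The storage figures above can be checked with a short back-of-envelope calculation. This is a sketch, not a measurement. It takes the ~4.8 KB/row JSONB figure and the ~1.85M row count from this proposal as given, and it assumes pgvector's documented sizing of 4 bytes per dimension plus an 8-byte header per vector. The helper names (`pgvectorBytes`, `toGB`) are illustrative only. It ignores tuple headers, the table's other columns, indexes, and TOAST compression.

```javascript
// Back-of-envelope embedding storage for semanticsearchblob, using the figures quoted above.
// Assumption: pgvector stores a vector as 4 bytes per dimension plus an 8-byte header.
// Ignores tuple headers, the other columns, indexes, and TOAST compression.
const DIMS = 384;
const ROWS = 1_850_000;           // ~1.85M live rows
const JSONB_BYTES_PER_ROW = 4800; // ~4.8 KB/row, the JSONB figure quoted in this proposal (not derived here)

// Hypothetical helper: on-disk size of one pgvector value with `dims` dimensions.
function pgvectorBytes(dims) {
  return 4 * dims + 8;
}

const toGB = (bytes) => bytes / 1e9;

const vectorBytes = pgvectorBytes(DIMS);         // 1544 bytes, i.e. ~1.5 KB
const ratio = JSONB_BYTES_PER_ROW / vectorBytes; // ~3.1x

console.log(`vector(${DIMS}) per row: ${vectorBytes} bytes`);
console.log(`JSONB/vector ratio: ${ratio.toFixed(2)}x`);
console.log(`JSONB embeddings: ~${toGB(JSONB_BYTES_PER_ROW * ROWS).toFixed(2)} GB`);
console.log(`vector embeddings: ~${toGB(vectorBytes * ROWS).toFixed(2)} GB`);
```

At ~1.85M rows, this puts the embeddings alone at roughly 8.9 GB as JSONB versus roughly 2.9 GB as vector(384). The JSONB figure fits within the ~10 GB quoted for the whole table.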
semanticsearchblob on the Postgres backend stores one embedding per (connectorinstanceid, scopekey, recordkey) as a JSONB float array. postgresSemanticSearch SELECTs candidate rows (bare LIMIT, no ordering), parses each JSONB array, and computes cosine distance in JS, then sorts and slices. The live deployment runs pgvector/pgvector:pg16 (extension available, unused) with ~1.85M rows / ~10 GB in this table.
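The correctness problem with the bare LIMIT can be shown with a toy model of this read path. This is a sketch, not the code in postgres-search.js. The function names (`cosineDistance`, `topK`, `limitThenScore`), the row shape, and the `overscan` parameter are hypothetical. Embeddings are 2-dimensional stand-ins for the real 384-dimensional ones and are shown already parsed into arrays. Array order stands in for whatever order an unordered scan happens to return rows in.

```javascript
// Toy model of the current read path (hypothetical helper names; 2-dim vectors stand in for 384-dim).
// Embeddings are shown already parsed into arrays; the real path parses each JSONB value first.

// Cosine distance = 1 - cosine similarity, the quantity pgvector's <=> operator returns.
function cosineDistance(a, b) {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return 1 - dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Exact top-k: score every row, sort ascending by distance, then slice.
function topK(rows, query, k) {
  return rows
    .map((row) => ({ recordkey: row.recordkey, distance: cosineDistance(row.embedding, query) }))
    .sort((x, y) => x.distance - y.distance)
    .slice(0, k);
}

// Current pattern: an unordered LIMIT keeps the first `overscan` rows in whatever
// order the scan returns them (array order here), and only those rows get scored.
function limitThenScore(rows, query, k, overscan) {
  return topK(rows.slice(0, overscan), query, k);
}

const rows = [
  { recordkey: "r1", embedding: [0, 1] },
  { recordkey: "r2", embedding: [-1, 0] },
  { recordkey: "r3", embedding: [1, 0.1] },
  { recordkey: "r4", embedding: [1, 0] },
];
const query = [1, 0];

console.log(limitThenScore(rows, query, 1, 2).map((r) => r.recordkey)); // [ 'r1' ]
console.log(topK(rows, query, 1).map((r) => r.recordkey));              // [ 'r4' ]
```

With an overscan of 2, the limited pass never sees r4, the row identical to the query, and returns r1 instead. Scoring before limiting is what `ORDER BY embedding <=> $1 LIMIT k` would ask Postgres to do on a vector column, without shipping embeddings to JS. Without an index that scan is exact. An HNSW or IVFFlat index would trade exactness for speed (approximate nearest neighbors).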
Affected capabilities
Capability specs this change proposes to modify.