Multivector (ColBERT / ColPali)

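Multivector collections store one vector per token (or image patch) rather than a single vector per document. Late-interaction models such as ColBERT and ColPali score a query against a document with max_sim: each query token is matched to its most similar document token, and those best-match scores are summed. This token-level comparison enables fine-grained reranking.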
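To make the max_sim comparator concrete, here is a minimal sketch of the scoring rule in plain Python. It assumes cosine similarity between non-zero token vectors; the function names `cosine` and `max_sim` and the sample vectors are illustrative and are not part of QQL or Qdrant.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def max_sim(query_tokens, doc_tokens):
    """For each query token take its best match among the document tokens,
    then sum those best-match similarities."""
    return sum(max(cosine(q, d) for d in doc_tokens) for q in query_tokens)

# Illustrative 2-dimensional token vectors (real models emit e.g. 128 dimensions).
query = [[1.0, 0.0], [0.0, 1.0]]
doc_a = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # contains an exact match for each query token
doc_b = [[1.0, 1.0]]                          # a single token that partially matches both

score_a = max_sim(query, doc_a)
score_b = max_sim(query, doc_b)
print(score_a, score_b)
```

Because every query token is scored against its own best match, `doc_a` scores higher than `doc_b` even though both contain the token `[1.0, 1.0]`.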

CREATE COLLECTION pdf_retrieval (
original VECTOR(128, COSINE) WITH MULTIVECTOR (comparator = 'max_sim') WITH HNSW (m = 0),
mean_pooling_columns VECTOR(128, COSINE) WITH MULTIVECTOR (comparator = 'max_sim'),
mean_pooling_rows VECTOR(128, COSINE) WITH MULTIVECTOR (comparator = 'max_sim')
)

A collection can also pair a dense vector with a multivector:

CREATE COLLECTION docs (
dense VECTOR(384, COSINE),
colbert VECTOR(128, COSINE) WITH MULTIVECTOR (comparator = 'max_sim') WITH HNSW (m = 0)
)
  • dense: used for the fast first-stage ANN search
  • colbert: used for accurate late-interaction reranking. WITH HNSW (m = 0) disables HNSW index construction for this vector, so it is used only for reranking.

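Provide pre-computed vectors using the vector key. The sample values below are shortened for readability; real vectors are expected to match the dimensions declared in CREATE COLLECTION: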

-- Single multivector
INSERT INTO docs VALUES {
'id': 1,
'text': 'Qdrant vector database',
'vector': {'dense': [0.1, 0.2, 0.3], 'colbert': [[0.1, 0.2], [0.3, 0.4]]}
}

-- PDF retrieval: three named vectors
INSERT INTO pdf_retrieval VALUES {
'id': 1,
'vector': {
'original': [[0.1, 0.2], [0.3, 0.4]],
'mean_pooling_columns': [[0.1, 0.2]],
'mean_pooling_rows': [[0.3, 0.4]]
}
}
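The mean_pooling_rows and mean_pooling_columns values in the insert above have to be computed before insertion. The sketch below shows one common way to derive them, assuming the original multivector is a grid of patch embeddings (as ColPali-style models produce for a page image). The grid layout, the helper names, and the tiny 2 x 2 example are assumptions for illustration, not something QQL prescribes.

```python
def mean(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(component) / n for component in zip(*vectors)]

def pool_rows(grid):
    """One vector per grid row: average the patch embeddings across that row."""
    return [mean(row) for row in grid]

def pool_columns(grid):
    """One vector per grid column: average the patch embeddings down that column."""
    return [mean(list(column)) for column in zip(*grid)]

# Hypothetical 2 x 2 patch grid with 2-dimensional embeddings.
grid = [
    [[1.0, 0.0], [3.0, 2.0]],
    [[5.0, 4.0], [7.0, 6.0]],
]

original = [patch for row in grid for patch in row]  # flattened: one vector per patch
rows = pool_rows(grid)                               # fewer vectors than `original`
columns = pool_columns(grid)

print(len(original), rows, columns)
```

The pooled multivectors hold far fewer sub-vectors than the original, which is what makes them cheap enough for a first-stage search.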

The vector value is a map of named_vector → value:

  • 1D array [...] — dense vector
  • 2D array [[...], [...]] — multivector (one sub-array per token)
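The shape rule above can be checked on the client before building an INSERT statement. The following is a minimal sketch of such a check; `vector_kind` is a hypothetical helper, and the requirement that all sub-vectors share one length is an assumption based on how multivectors are normally stored.

```python
def vector_kind(value):
    """Return 'dense' for a flat list of numbers and 'multivector' for a list of such lists."""
    def is_number(x):
        # bool is a subclass of int in Python, so exclude it explicitly.
        return isinstance(x, (int, float)) and not isinstance(x, bool)

    if not isinstance(value, list) or not value:
        raise ValueError("vector value must be a non-empty list")
    if all(is_number(x) for x in value):
        return "dense"
    if all(isinstance(row, list) and row and all(is_number(x) for x in row) for row in value):
        if len({len(row) for row in value}) != 1:
            raise ValueError("all sub-vectors must have the same length")
        return "multivector"
    raise ValueError("expected a 1D or 2D array of numbers")

# Same shape as the 'vector' map in the INSERT examples.
payload = {"dense": [0.1, 0.2, 0.3], "colbert": [[0.1, 0.2], [0.3, 0.4]]}
kinds = {name: vector_kind(value) for name, value in payload.items()}
print(kinds)
```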

The recommended ColBERT / ColPali retrieval pattern has two stages: a fast first stage over the mean-pooled vectors, followed by accurate reranking with the original multivectors.

WITH
_pf0 AS (QUERY [0.1, 0.2, 0.3] USING 'mean_pooling_columns' LIMIT 100),
_pf1 AS (QUERY [0.1, 0.2, 0.3] USING 'mean_pooling_rows' LIMIT 100)
QUERY [0.1, 0.2, 0.3] FROM pdf_retrieval USING 'original' LIMIT 10
PREFETCH (_pf0, _pf1)

A complete example, from collection creation to two-stage retrieval:

-- 1. Create collection
CREATE COLLECTION colbert_docs (
dense VECTOR(384, COSINE),
colbert VECTOR(128, COSINE) WITH MULTIVECTOR (comparator = 'max_sim') WITH HNSW (m = 0)
)
-- 2. Insert with pre-computed vectors
INSERT INTO colbert_docs VALUES {
'id': 1,
'text': 'vector search with late interaction',
'vector': {
'dense': [0.1, 0.2, 0.3],
'colbert': [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
}
}
-- 3. Two-stage retrieval
WITH
first_stage AS (QUERY [0.1, 0.2, 0.3] USING 'dense' LIMIT 100)
QUERY [0.1, 0.2, 0.3] FROM colbert_docs USING 'colbert' LIMIT 10
PREFETCH (first_stage)
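What the prefetch-then-rerank query does can be simulated in memory. The sketch below is an approximation under stated assumptions: the first stage is a brute-force cosine ranking (a real first stage would use the ANN index), the second stage reranks only the prefetched candidates with max_sim, and the function and field names are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def max_sim(query_tokens, doc_tokens):
    """Sum over query tokens of the best cosine match among document tokens."""
    return sum(max(cosine(q, d) for d in doc_tokens) for q in query_tokens)

def two_stage(docs, query_dense, query_tokens, prefetch_limit, limit):
    # Stage 1: rank by dense similarity and keep only the top candidates.
    candidates = sorted(
        docs, key=lambda d: cosine(query_dense, d["dense"]), reverse=True
    )[:prefetch_limit]
    # Stage 2: rerank the surviving candidates with the multivector score.
    reranked = sorted(
        candidates, key=lambda d: max_sim(query_tokens, d["colbert"]), reverse=True
    )
    return [d["id"] for d in reranked[:limit]]

docs = [
    {"id": 1, "dense": [1.0, 0.0], "colbert": [[1.0, 0.0]]},
    {"id": 2, "dense": [0.9, 0.1], "colbert": [[1.0, 0.0], [0.0, 1.0]]},
    {"id": 3, "dense": [0.0, 1.0], "colbert": [[1.0, 0.0], [0.0, 1.0]]},
]
query_dense = [1.0, 0.0]
query_tokens = [[1.0, 0.0], [0.0, 1.0]]

result = two_stage(docs, query_dense, query_tokens, prefetch_limit=2, limit=1)
print(result)
```

In this toy data, document 1 wins the dense stage but document 2 wins after reranking. Document 3 would have reranked equally well, yet it never reaches the second stage, which is why the prefetch LIMIT is set generously (100 in the queries above).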

Use EXPLAIN to verify your multivector setup:

# EXPLAIN a multivector query
qql-go explain "WITH _pf0 AS (QUERY [0.1, 0.2] USING 'dense' LIMIT 100) QUERY [0.1, 0.2] FROM docs USING 'colbert' LIMIT 10 PREFETCH (_pf0)"

The explain output will show USING: colbert, CTEs: [_pf0], and PREFETCH REFS: [_pf0].

To replace a stored multivector on an existing point, use UPDATE with SET VECTOR:

UPDATE docs SET VECTOR 'colbert' = [[0.1, 0.2], [0.3, 0.4]] WHERE id = 1