Multi-Stage Retrieval

QQL supports multi-stage retrieval pipelines built from Common Table Expressions (CTEs). Each CTE is a retrieval stage, and PREFETCH clauses wire the stages into a directed acyclic graph (DAG). Each stage can have its own vector, filter, limit, and score threshold.

Basic CTE Prefetch

WITH
  dense AS (QUERY 'emergency care' USING dense LIMIT 200),
  sparse AS (QUERY 'emergency care' USING sparse LIMIT 300)
QUERY 'emergency care' FROM docs LIMIT 10
  PREFETCH (dense, sparse)
  FUSION RRF

Each CTE is a named retrieval stage. The top-level QUERY lists the stages it consumes in PREFETCH and merges their results with FUSION RRF (Reciprocal Rank Fusion) or FUSION DBSF (Distribution-Based Score Fusion).
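
To make the merge step concrete, here is a minimal Python sketch of Reciprocal Rank Fusion over two ranked lists. It is an illustration of the general technique, not QQL's implementation: the function name `rrf_fuse`, the 1-based ranks, the default `k=60`, and the sample ids are all assumptions, and the engine behind QQL may use a different constant or rank offset.

```python
from collections import defaultdict

def rrf_fuse(ranked_lists, k=60, limit=10):
    """Fuse ranked id lists with Reciprocal Rank Fusion.

    Each id scores sum(1 / (k + rank)) over the lists it appears in.
    Ranks start at 1 and k=60 is a commonly cited default (assumed here).
    """
    scores = defaultdict(float)
    for ranked in ranked_lists:
        for rank, doc_id in enumerate(ranked, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id for determinism.
    ordered = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return ordered[:limit]

# Hypothetical outputs of the `dense` and `sparse` stages (ids only).
dense_hits = ["d3", "d1", "d7"]
sparse_hits = ["d1", "d9", "d3"]

fused = rrf_fuse([dense_hits, sparse_hits])
fused_ids = [doc_id for doc_id, _ in fused]
print(fused_ids)
```

Ids that appear in both lists accumulate two terms, which is why documents found by both stages tend to rise to the top regardless of how the raw dense and sparse scores are scaled.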

Per-Prefetch Filters and Score Thresholds

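Filters and score thresholds can be attached to each CTE reference inside PREFETCH, independently of any WHERE clause in the CTE itself. In the example below, dense is filtered by category inside its CTE and by priority at the prefetch level: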

WITH
  dense AS (QUERY 'search' USING dense LIMIT 200 WHERE category = 'tech'),
  sparse AS (QUERY 'search' USING sparse LIMIT 300)
QUERY 'search' FROM docs LIMIT 10
  PREFETCH (
    dense WHERE priority = 'high' SCORE THRESHOLD 0.6,
    sparse SCORE THRESHOLD 0.3
  )
  FUSION RRF WITH (rrf_k = 20, rrf_weights = [0.6, 0.4])
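
The effect of the per-prefetch options can be sketched in Python as a post-processing step on each stage's hits. This is only a model of the semantics under stated assumptions: the helper `apply_prefetch_options`, the hit dictionaries, and the sample scores are hypothetical, higher scores are assumed to be better, and a real engine may push the filter into the search rather than apply it afterwards.

```python
def apply_prefetch_options(hits, where=None, score_threshold=None):
    """Keep hits that pass an optional payload predicate and a minimum score."""
    kept = []
    for hit in hits:
        if where is not None and not where(hit["payload"]):
            continue  # fails the prefetch-level WHERE
        if score_threshold is not None and hit["score"] < score_threshold:
            continue  # below SCORE THRESHOLD
        kept.append(hit)
    return kept

# Hypothetical results of the `dense` CTE (already limited to category = 'tech').
dense_hits = [
    {"id": "a", "score": 0.82, "payload": {"category": "tech", "priority": "high"}},
    {"id": "b", "score": 0.71, "payload": {"category": "tech", "priority": "low"}},
    {"id": "c", "score": 0.55, "payload": {"category": "tech", "priority": "high"}},
]
# Hypothetical results of the `sparse` CTE (no CTE-level filter).
sparse_hits = [
    {"id": "b", "score": 0.40, "payload": {"category": "tech", "priority": "low"}},
    {"id": "d", "score": 0.25, "payload": {"category": "news", "priority": "high"}},
]

# dense WHERE priority = 'high' SCORE THRESHOLD 0.6
dense_kept = apply_prefetch_options(
    dense_hits, where=lambda p: p["priority"] == "high", score_threshold=0.6
)
# sparse SCORE THRESHOLD 0.3
sparse_kept = apply_prefetch_options(sparse_hits, score_threshold=0.3)

print([h["id"] for h in dense_kept], [h["id"] for h in sparse_kept])
```

Only the hits that survive both checks reach the fusion step, so a strict threshold on one leg reduces that leg's influence on the final ranking.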

Named Vector Targeting

Each CTE can target a different named vector:

WITH
  _pf0 AS (QUERY [0.1, 0.2, 0.3] USING 'mean_pooling_columns' LIMIT 100),
  _pf1 AS (QUERY [0.1, 0.2, 0.3] USING 'mean_pooling_rows' LIMIT 100)
QUERY [0.1, 0.2, 0.3] FROM pdf_retrieval USING 'original' LIMIT 10
  PREFETCH (_pf0, _pf1)
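
One plausible reading of this query, since it has no FUSION clause, is that the prefetch stages only collect candidates and the top-level QUERY re-scores them against the 'original' vector. The Python sketch below models that reading and nothing more: the re-scoring behavior, the dot-product similarity, the single-vector representation, and all ids and vectors are assumptions made for illustration.

```python
def dot(a, b):
    """Dot-product similarity between two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def rescore(candidate_ids, vectors, query, limit):
    """Score each candidate against `query` and return the top `limit` pairs."""
    scored = [(doc_id, dot(vectors[doc_id], query)) for doc_id in candidate_ids]
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:limit]

# Hypothetical candidate ids returned by the two prefetch stages.
pf0_ids = ["p1", "p2"]
pf1_ids = ["p2", "p3"]
candidates = sorted(set(pf0_ids) | set(pf1_ids))  # de-duplicated union

# Hypothetical 'original' vectors, stored per point.
original_vectors = {
    "p1": [1.0, 0.0, 0.0],
    "p2": [0.0, 1.0, 0.0],
    "p3": [0.0, 0.0, 1.0],
}
query = [0.1, 0.2, 0.3]

top = rescore(candidates, original_vectors, query, limit=2)
top_ids = [doc_id for doc_id, _ in top]
print(top_ids)
```

Under this reading, the cheaper pooled vectors narrow the search space and the more expensive 'original' vector is only compared against the surviving candidates.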

Nested CTEs (Coarse-to-Fine)

CTEs can reference other CTEs to build hierarchical pipelines:

WITH
  broad AS (
    QUERY 'emergency neurological assessment' USING dense LIMIT 500
    WHERE department = 'emergency'
  ),
  narrow AS (
    QUERY 'emergency neurological assessment' USING sparse LIMIT 100
    PREFETCH (broad)
  )
QUERY 'emergency neurological assessment' FROM clinical_docs LIMIT 5
  PREFETCH (narrow)
  FUSION RRF

Pure Fusion (No Search Target)

Fuse CTE results without a new search:

WITH
  dense AS (QUERY 'search' USING 'dense' LIMIT 100),
  sparse AS (QUERY 'search' USING 'sparse' LIMIT 100)
FUSION RRF LIMIT 10 PREFETCH (dense, sparse)

Or without explicit CTEs (inline):

FUSION RRF LIMIT 10 PREFETCH (dense, sparse)

Three-Leg Priority Retrieval

WITH
  high_priority AS (QUERY 'kubernetes deployment' USING dense LIMIT 50
    WHERE priority = 'critical' AND status = 'open'),
  general AS (QUERY 'kubernetes deployment' USING dense LIMIT 200),
  keyword AS (QUERY 'kubernetes deployment' USING sparse LIMIT 200)
QUERY 'kubernetes deployment' FROM incidents LIMIT 10
  PREFETCH (
    high_priority SCORE THRESHOLD 0.7,
    general SCORE THRESHOLD 0.4,
    keyword SCORE THRESHOLD 0.3
  )
  FUSION RRF
  WITH (rrf_k = 30, rrf_weights = [0.5, 0.3, 0.2])

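Effect: the entries in rrf_weights follow the order of the PREFETCH list, so the high_priority leg (critical, open incidents) gets 50% of the RRF weight, the general dense leg 30%, and the keyword (sparse) leg 20%.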
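
A small Python sketch shows how such weights could shift the fused ranking. The formula used here, `weight / (k + rank)` summed over legs with 1-based ranks, is an assumed form of weighted RRF and not confirmed QQL behavior; the function `weighted_rrf` and the incident ids are hypothetical.

```python
def weighted_rrf(ranked_lists, weights, k):
    """Weighted RRF: each leg contributes weight / (k + rank) per id."""
    scores = {}
    for ranked, weight in zip(ranked_lists, weights):
        for rank, doc_id in enumerate(ranked, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + weight / (k + rank)
    # Highest fused score first; ties broken by id for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))

# Hypothetical ranked ids per leg, after score thresholds were applied.
high_priority = ["inc-7", "inc-2"]
general = ["inc-2", "inc-5", "inc-7"]
keyword = ["inc-5", "inc-9"]

fused = weighted_rrf(
    [high_priority, general, keyword],
    weights=[0.5, 0.3, 0.2],  # same order as the PREFETCH list
    k=30,
)
fused_ids = [doc_id for doc_id, _ in fused]
print(fused_ids)
```

In this sample, the two incidents that appear in the high_priority leg outrank those found only by the general or keyword legs, which is the intended effect of giving that leg the largest weight.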

Tiered Retrieval (Coarse → Fine)

A nested CTE prefetch gives coarse-to-fine retrieval: a broad dense first pass, followed by a narrow sparse second pass that searches only within the first pass's results:

WITH
  broad AS (
    QUERY 'emergency neurological assessment' USING dense LIMIT 500
    WHERE department = 'emergency'
  ),
  narrow AS (
    QUERY 'emergency neurological assessment' USING sparse LIMIT 100
    PREFETCH (broad)
  )
QUERY 'emergency neurological assessment' FROM clinical_docs LIMIT 5
  PREFETCH (narrow)
  FUSION RRF

Pipeline

broad (dense, 500 candidates, emergency dept)
  ↓
narrow (sparse, 100 results scoped to broad)
  ↓
FUSION RRF (final 5)

Decision	Why
`broad` retrieves 500 dense candidates	Wide semantic coverage
`narrow` sparse search inside `broad`	Keyword precision within semantic neighborhood
Top-level fuses only `narrow`	2-stage: dense broad → sparse narrow → RRF
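
The two-stage flow above can be modeled in a few lines of Python. This is a simplified sketch with assumptions: the helper `top_k`, the score dictionaries, and the ids are hypothetical, the department filter is omitted, and the limits are scaled down from 500/100/5 to 3/2/1 so the effect is visible.

```python
def top_k(scores, limit, allowed=None):
    """Return the ids with the highest scores, optionally restricted to `allowed`."""
    items = scores.items()
    if allowed is not None:
        items = [(doc_id, s) for doc_id, s in items if doc_id in allowed]
    ranked = sorted(items, key=lambda item: (-item[1], item[0]))
    return [doc_id for doc_id, _ in ranked[:limit]]

# Hypothetical per-document scores from the dense and sparse vectors.
dense_scores = {"n1": 0.9, "n2": 0.8, "n3": 0.7, "n4": 0.2}
sparse_scores = {"n4": 9.0, "n3": 5.0, "n1": 3.0, "n2": 1.0}

# Stage 1: broad dense pass keeps the top candidates.
broad = top_k(dense_scores, limit=3)
# Stage 2: sparse pass ranks only the candidates that survived stage 1.
narrow = top_k(sparse_scores, limit=2, allowed=set(broad))
# Top level: a single prefetch leg, so the final cut keeps narrow's order.
final = narrow[:1]

print(broad, narrow, final)
```

Note that `n4` has the best sparse score but never reaches the second stage because the dense pass did not select it. That is the trade-off of tiered retrieval: keyword precision is applied only inside the semantic neighborhood.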

Syntax Reference

WITH
  <name> AS (QUERY <target> [USING <vector>] [LIMIT <n>] [WHERE <filter>] [PREFETCH (<name>)]),
  <name> AS (...)
QUERY <target> FROM <collection> [USING '<vector>'] LIMIT <n>
  PREFETCH (
    <name> [WHERE <filter>] [SCORE THRESHOLD <float>],
    <name> [SCORE THRESHOLD <float>]
  )
  [FUSION <RRF | DBSF>]
  [WITH (rrf_k = <n>, rrf_weights = [<f>, <f>, ...])]