Meet Turbovec: A Rust Vector Index with Python Bindings, Built on Google’s TurboQuant Algorithm
Vector search is fundamental to most retrieval-augmented generation (RAG) pipelines, and at scale it becomes expensive. Storing 10 million 768-dimensional document embeddings in float32 takes about 31 GB of RAM. For development teams running local or on-premise inference, that is a real constraint.
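The 31 GB figure matches 10 million 768-dimensional float32 vectors; at 1536 dimensions the same corpus would need about 61 GB. The helper below (`index_bytes` is a hypothetical name) reproduces that arithmetic and shows the raw code size at the 4-bit and 2-bit widths discussed later. It deliberately ignores per-vector norms and any index overhead.

```python
def index_bytes(n_vectors: int, dim: int, bits_per_coord: int) -> int:
    """Raw storage for n_vectors embeddings of `dim` coordinates at a given bit width.

    Ignores per-vector metadata (such as a stored norm) and index overhead.
    """
    return n_vectors * dim * bits_per_coord // 8


n, d = 10_000_000, 768  # 768 dimensions is what makes the 31 GB figure work
for bits in (32, 4, 2):
    gb = index_bytes(n, d, bits) / 1e9
    print(f"{bits:>2}-bit: {gb:.2f} GB")
```

This prints 30.72 GB for float32, 3.84 GB at 4 bits and 1.92 GB at 2 bits.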
The TurboQuant Paper
TurboQuant was introduced by researchers at Google Research. The paper presents it as a data-oblivious quantizer that achieves near-optimal distortion rates across all bit-widths and dimensions, without any training or passes over the data.
Most production-grade vector quantizers, including FAISS’s Product Quantization, need a codebook training step: k-means is run on a representative sample of vectors before indexing begins. If your corpus grows or shifts, you may need to retrain the codebooks and rebuild the index entirely. TurboQuant skips this step by relying on the known coordinate distribution of randomly rotated vectors.
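To make the contrast concrete, here is a minimal NumPy sketch of the data-dependent step that PQ needs: split each vector into `m` sub-vectors and run k-means in each subspace. It is not FAISS's implementation, and `train_pq_codebooks` and `pq_encode` are illustrative names. The point it shows is that the codebooks are a function of the training sample, so a shifted corpus means retraining.

```python
import numpy as np


def train_pq_codebooks(x: np.ndarray, m: int, k: int, iters: int = 10, seed: int = 0) -> np.ndarray:
    """Plain k-means in each of m subspaces; returns codebooks of shape (m, k, d // m)."""
    n, d = x.shape
    assert d % m == 0, "dimension must split evenly into m subspaces"
    sub = d // m
    rng = np.random.default_rng(seed)
    codebooks = np.empty((m, k, sub), dtype=x.dtype)
    for j in range(m):
        xs = x[:, j * sub:(j + 1) * sub]
        # Random initialisation from training points (chosen here for brevity).
        cent = xs[rng.choice(n, size=k, replace=False)].copy()
        for _ in range(iters):
            # Assign each sub-vector to its nearest centroid.
            dist = ((xs[:, None, :] - cent[None, :, :]) ** 2).sum(-1)
            assign = dist.argmin(1)
            # Move each centroid to the mean of its members; empty clusters stay put.
            for c in range(k):
                members = xs[assign == c]
                if len(members):
                    cent[c] = members.mean(0)
        codebooks[j] = cent
    return codebooks


def pq_encode(x: np.ndarray, codebooks: np.ndarray) -> np.ndarray:
    """Replace each sub-vector by the index of its nearest centroid."""
    m, k, sub = codebooks.shape
    codes = np.empty((x.shape[0], m), dtype=np.uint8 if k <= 256 else np.uint16)
    for j in range(m):
        xs = x[:, j * sub:(j + 1) * sub]
        dist = ((xs[:, None, :] - codebooks[j][None, :, :]) ** 2).sum(-1)
        codes[:, j] = dist.argmin(1)
    return codes


rng = np.random.default_rng(1)
train = rng.standard_normal((2000, 64)).astype(np.float32)
codebooks = train_pq_codebooks(train, m=8, k=16)  # depends on `train`
codes = pq_encode(train[:5], codebooks)
```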
How turbovec Quantizes Vectors
The quantization pipeline consists of four steps:
- (1) Each vector is normalized. The length (norm) is stripped and stored as a single float. Every vector becomes a unit direction on a high-dimensional hypersphere.
- (2) A random rotation is applied. All vectors are multiplied by the same random orthogonal matrix. Each coordinate of the rotated unit vector then follows a (shifted and scaled) Beta distribution, and distinct coordinates become nearly independent; in high dimensions this distribution converges to a Gaussian N(0, 1/d). The coordinate distribution is therefore predictable regardless of the input data.
- (3) Lloyd-Max scalar quantization is applied. Because the distribution is known analytically, the optimal bucket boundaries and centroids can be precomputed from the math alone. For 2-bit quantization, this means 4 buckets per coordinate; for 4-bit, it means 16 buckets. No data passes are needed.
- (4) The quantized coordinates are bit-packed into bytes. At 2-bit, a 1536-dimensional vector shrinks from 6,144 bytes in FP32 to 384 bytes of packed codes (plus its stored norm), a 16x compression ratio.
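The four steps can be sketched in NumPy as follows. This is an illustrative approximation, not turbovec's Rust code: it assumes a dense Haar-random rotation built from a QR decomposition, quantizes against the Gaussian limit N(0, 1/d) rather than the exact Beta distribution, and packs codes low-bits-first. turbovec's actual rotation, codebook, and bit layout may differ, and the function names are hypothetical.

```python
import math

import numpy as np


def lloyd_max_gaussian(bits: int, iters: int = 200):
    """Lloyd-Max levels and thresholds for N(0, 1), derived from the pdf/cdf alone (no data)."""
    pdf = lambda t: math.exp(-0.5 * t * t) / math.sqrt(2 * math.pi)
    cdf = lambda t: 0.5 * (1.0 + math.erf(t / math.sqrt(2)))
    levels = np.linspace(-2.0, 2.0, 2 ** bits)  # initial guess
    for _ in range(iters):
        # Optimal thresholds sit halfway between neighbouring levels.
        edges = [-math.inf, *((levels[:-1] + levels[1:]) / 2), math.inf]
        # Optimal levels are conditional means: (pdf(a) - pdf(b)) / (cdf(b) - cdf(a)).
        levels = np.array([(pdf(a) - pdf(b)) / (cdf(b) - cdf(a))
                           for a, b in zip(edges[:-1], edges[1:])])
    return levels, (levels[:-1] + levels[1:]) / 2


def make_rotation(d: int, seed: int = 0) -> np.ndarray:
    """Random orthogonal matrix: QR of a Gaussian matrix, sign-corrected to be Haar-distributed."""
    rng = np.random.default_rng(seed)
    q, r = np.linalg.qr(rng.standard_normal((d, d)))
    return q * np.sign(np.diag(r))


def encode(x: np.ndarray, rot: np.ndarray, bits: int):
    if 8 % bits:
        raise ValueError("this sketch only packs bit widths that divide 8")
    d = x.shape[1]
    # (1) Strip the norm and keep it as one float per vector.
    norms = np.linalg.norm(x, axis=1)
    units = x / norms[:, None]
    # (2) Rotate; each coordinate is now approximately N(0, 1/d).
    y = units @ rot.T
    # (3) Scalar-quantize with Lloyd-Max thresholds rescaled from N(0, 1) to N(0, 1/d).
    _, thresholds = lloyd_max_gaussian(bits)
    codes = np.searchsorted(thresholds / math.sqrt(d), y).astype(np.uint8)
    # (4) Pack 8 // bits codes into each byte, lowest bits first.
    per_byte = 8 // bits
    codes = codes.reshape(len(x), -1, per_byte)
    shifts = np.arange(per_byte, dtype=np.uint8) * bits
    packed = np.bitwise_or.reduce(codes << shifts, axis=2).astype(np.uint8)
    return norms.astype(np.float32), packed


rng = np.random.default_rng(42)
d = 1536
x = rng.standard_normal((3, d)).astype(np.float32)
rot = make_rotation(d)
norms, packed = encode(x, rot, bits=2)
print(packed.shape)  # 384 bytes of codes per vector
```

The Lloyd-Max loop alternates the two optimality conditions (thresholds at midpoints, levels at conditional means) using only the normal pdf and cdf, which is why no pass over the data is needed. For 2 bits it converges to the classic levels ±0.4528 and ±1.5104 with thresholds 0 and ±0.9816, which the sketch scales by 1/sqrt(d).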
At search time, the query is rotated once into the same domain. Scoring then runs directly against the codebook values, using SIMD intrinsics for throughput: NEON on ARM and AVX-512BW on modern x86, with an AVX2 fallback. According to the paper, TurboQuant's distortion is within a factor of approximately 2.7 of the information-theoretic (Shannon) lower bound.
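Query-time scoring can be sketched the same way. Because the rotation is orthogonal, the inner product of the rotated query with a rotated unit vector equals the original inner product, so the query is rotated once and each stored vector is scored from a small per-coordinate lookup table of query × codebook-level products, then multiplied back by its norm. This is a plain-NumPy illustration under stated assumptions: unpacked 2-bit codes, hard-coded Lloyd-Max levels for N(0, 1), no bias correction, and hypothetical function names. turbovec's SIMD kernels work on packed codes with a layout the article does not describe.

```python
import numpy as np

# Lloyd-Max 2-bit reconstruction levels and thresholds for N(0, 1); rotated
# unit-vector coordinates are ~N(0, 1/d), so both get scaled by 1/sqrt(d).
LEVELS_2BIT = np.array([-1.5104, -0.4528, 0.4528, 1.5104])
THRESH_2BIT = np.array([-0.9816, 0.0, 0.9816])


def quantize(x: np.ndarray, rot: np.ndarray):
    """Return per-vector norms and unpacked 2-bit codes of the rotated unit vectors."""
    d = x.shape[1]
    norms = np.linalg.norm(x, axis=1)
    y = (x / norms[:, None]) @ rot.T
    codes = np.searchsorted(THRESH_2BIT / np.sqrt(d), y)
    return norms, codes


def search(query: np.ndarray, rot: np.ndarray, norms: np.ndarray, codes: np.ndarray, k: int):
    """Approximate top-k by inner product, scored from a (d, 4) lookup table."""
    d = rot.shape[0]
    q = rot @ query  # rotate the query once into the same domain
    lut = q[:, None] * (LEVELS_2BIT[None, :] / np.sqrt(d))  # lut[j, c] = q_j * level_c
    # For each stored vector, pick lut[j, code_j] at every coordinate and sum.
    scores = norms * lut[np.arange(d), codes].sum(axis=1)
    top = np.argsort(-scores)[:k]
    return scores[top], top


rng = np.random.default_rng(0)
d, n = 256, 1_000
rot, _ = np.linalg.qr(rng.standard_normal((d, d)))  # any orthogonal matrix works for the sketch
db = rng.standard_normal((n, d))
norms, codes = quantize(db, rot)
scores, top = search(db[0], rot, norms, codes, k=5)
print(top)  # the query vector itself (index 0) should rank first
```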
Recall and Speed: The Numbers
All benchmarks use 100K vectors, 1,000 queries, and k=64, and report the median of 5 runs. For recall, turbovec is compared against FAISS IndexPQ (LUT256, nbits=8, float32 LUT). Even though FAISS uses a higher-precision LUT at scoring time and trains its codebooks with k-means++, TurboQuant stays within 0–1 point of it at R@1 on OpenAI embeddings at d=1536 and d=3072, and both reach 1.0 recall by k=4–8. GloVe at d=200 is harder: at that dimension, TurboQuant trails FAISS by 3–6 points at R@1, closing the gap by k≈16–32.
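The article does not define R@k. Assuming it follows the common convention in FAISS-style benchmarks (the fraction of queries whose exact nearest neighbour appears among the first k results), it can be computed as below. The function names and toy data are illustrative.

```python
import numpy as np


def exact_top1(db: np.ndarray, queries: np.ndarray) -> np.ndarray:
    """Ground truth: index of the highest inner-product match for each query (brute force)."""
    return np.argmax(queries @ db.T, axis=1)


def recall_at_k(true_top1: np.ndarray, retrieved: np.ndarray, k: int) -> float:
    """Fraction of queries whose exact nearest neighbour is in the first k retrieved IDs."""
    hits = (retrieved[:, :k] == true_top1[:, None]).any(axis=1)
    return float(hits.mean())


# Toy example: three queries, approximate results ranked best-first.
true_top1 = np.array([3, 7, 1])
retrieved = np.array([[3, 5, 9], [2, 7, 4], [8, 6, 0]])
print(recall_at_k(true_top1, retrieved, 1))  # 1 of 3 queries hit at k=1
print(recall_at_k(true_top1, retrieved, 2))  # 2 of 3 queries hit at k=2
```

Sweeping k upward is what produces the "converges to 1.0 by k=4–8" style of curve described above.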
On speed, ARM results (Apple M3 Max) show turbovec beating FAISS IndexPQFastScan by 12–20% in every configuration. On x86 (Intel Xeon Platinum 8481C / Sapphire Rapids, 8 vCPUs), turbovec wins every 4-bit configuration by 1–6% and runs within ~1% of FAISS on 2-bit single-threaded. It falls slightly behind in two configurations, 2-bit multi-threaded at d=1536 and d=3072: there, the inner accumulate loop is too short to amortize unrolling, and FAISS’s AVX-512 VBMI path keeps a 2–4% edge.
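To run a similar comparison on your own hardware, a small harness that reports the median of 5 timed runs (the methodology stated for these benchmarks) is enough. The warm-up call and the percentage definition (relative reduction in median time) are assumptions; the article does not say exactly how its percentages are computed, and the helper names are hypothetical.

```python
import statistics
import time
from typing import Callable


def median_runtime(fn: Callable[[], object], runs: int = 5, warmup: int = 1) -> float:
    """Median wall-clock seconds of `runs` timed calls to fn, after `warmup` untimed calls."""
    for _ in range(warmup):
        fn()
    times = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        times.append(time.perf_counter() - start)
    return statistics.median(times)


def speedup_pct(baseline_s: float, candidate_s: float) -> float:
    """Percentage reduction in median time; positive when the candidate is faster."""
    return (baseline_s - candidate_s) / baseline_s * 100.0


# Placeholder workloads; in practice these would be batched searches on each index.
t_base = median_runtime(lambda: sorted(range(200_000), reverse=True))
t_cand = median_runtime(lambda: list(range(200_000)))
print(f"candidate vs baseline: {speedup_pct(t_base, t_cand):.1f}%")
```

Taking the median rather than the mean keeps one slow run (a page fault, a scheduler hiccup) from skewing a difference that may only be a few percent.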
Python API
Installation is a single command: pip install turbovec. The primary class is TurboQuantIndex, initialized with a dimension and bit width.
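```python
import numpy as np
from turbovec import TurboQuantIndex
vectors = np.random.rand(10_000, 1536).astype(np.float32)  # placeholder embeddings; (n, dim) float32 assumed
query = np.random.rand(1, 1536).astype(np.float32)  # placeholder query; (1, dim) shape assumed

index = TurboQuantIndex(dim=1536, bit_width=4)  # 1536-dim vectors, 4 bits per coordinate
index.add(vectors)
scores, indices = index.search(query, k=10)  # top-10 neighbours
index.write("my_index.tq")  # persist the index to disk
```

A second class, IdMapIndex, supports stable external uint64 IDs that survive deletes, with O(1) removal by ID. This is useful for document stores where vectors are frequently updated or deleted.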
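A common way to get stable external IDs with O(1) deletes is a hash map from ID to slot plus swap-remove: the last row is moved into the deleted slot, so internal positions change but external IDs do not. The class below is a hypothetical pure-Python illustration of that pattern, not turbovec's IdMapIndex internals, which the article does not describe.

```python
class IdMapSketch:
    """Hypothetical stable-ID layer over dense storage (not turbovec's IdMapIndex)."""

    def __init__(self) -> None:
        self.rows: list[bytes] = []   # dense storage: packed codes of each live vector
        self.ids: list[int] = []      # ids[i] is the external ID of rows[i]
        self.slot: dict[int, int] = {}  # external ID -> position in rows/ids

    def add(self, ext_id: int, row: bytes) -> None:
        if not 0 <= ext_id < 2**64:
            raise ValueError("IDs are unsigned 64-bit integers")
        if ext_id in self.slot:
            raise KeyError(f"duplicate ID {ext_id}")
        self.slot[ext_id] = len(self.rows)
        self.rows.append(row)
        self.ids.append(ext_id)

    def remove(self, ext_id: int) -> None:
        # O(1): pop the last row, and if it was not the one removed, move it into the hole.
        i = self.slot.pop(ext_id)
        last_row, last_id = self.rows.pop(), self.ids.pop()
        if i < len(self.rows):
            self.rows[i], self.ids[i] = last_row, last_id
            self.slot[last_id] = i

    def get(self, ext_id: int) -> bytes:
        return self.rows[self.slot[ext_id]]

    def __len__(self) -> int:
        return len(self.rows)


m = IdMapSketch()
for ext_id, row in [(101, b"\x01"), (202, b"\x02"), (303, b"\x03")]:
    m.add(ext_id, row)
m.remove(101)  # row for 303 moves into slot 0
print(len(m), m.get(303), m.ids)
```

A search over `rows` returns internal positions; `ids[pos]` maps them back to the caller's IDs.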
Turbovec integrates with LangChain (pip install turbovec[langchain]), LlamaIndex (pip install turbovec[llama-index]), and Haystack (pip install turbovec[haystack]). The Rust crate is available via cargo add turbovec.
Originally published at marktechpost.com. Curated by AI Maestro.