Turbovec: A Rust Vector Index with Python Bindings

Meet Turbovec: A Rust Vector Index with Python Bindings, and Built on Google’s TurboQuant Algorithm

Vector search underpins most retrieval-augmented generation (RAG) pipelines, and at scale it becomes expensive. Storing 10 million document embeddings in float32 requires about 31 GB of RAM, a figure that corresponds to 768-dimensional embeddings (10 million × 768 × 4 bytes ≈ 30.7 GB). For development teams running local or on-premise inference, that footprint is a real constraint.
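The float32 footprint is simple arithmetic. The sketch below reproduces it for a few embedding sizes; the helper name `embedding_bytes` is made up for illustration, and the 768-dimension case is an assumption, since the article does not state which dimension the 31 GB figure refers to.

```python
def embedding_bytes(n_vectors: int, dim: int, bits_per_coord: int = 32) -> int:
    """Raw storage in bytes for n_vectors embeddings, ignoring index overhead."""
    return n_vectors * dim * bits_per_coord // 8


n_vectors = 10_000_000
for dim in (768, 1536, 3072):  # 768 is an assumed dimension, see the note above
    gigabytes = embedding_bytes(n_vectors, dim) / 1e9
    print(f"d={dim}: {gigabytes:.1f} GB in float32")
```

Only the 768-dimension row lands near 31 GB; at d=1536 the same corpus needs roughly twice that.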

The TurboQuant Paper

TurboQuant comes from Google's research team. The paper proposes it as a data-oblivious quantizer that achieves near-optimal distortion rates across all bit-widths and dimensions, without any training or passes over the data.

Most production-grade vector quantizers, including FAISS’s Product Quantization, need a codebook training step. This involves running k-means on a representative sample of vectors before indexing begins. If your corpus grows or shifts, you may need to retrain and rebuild the index entirely. TurboQuant skips this step by using an analytical property of rotated vectors.
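To make the skipped step concrete, here is a minimal sketch of what codebook training for product quantization involves: k-means run independently on each sub-vector slice of a sample. This is a generic NumPy illustration, not FAISS's implementation, and the function name and parameters are hypothetical.

```python
import numpy as np


def train_pq_codebooks(sample, n_subspaces, n_centroids, iters=10, seed=0):
    """Run plain k-means on each sub-vector slice of a training sample."""
    rng = np.random.default_rng(seed)
    n, dim = sample.shape
    sub_dim = dim // n_subspaces
    codebooks = []
    for s in range(n_subspaces):
        sub = sample[:, s * sub_dim:(s + 1) * sub_dim]
        # Initialise centroids from randomly chosen training points.
        centroids = sub[rng.choice(n, n_centroids, replace=False)].copy()
        for _ in range(iters):
            # Assign every point to its nearest centroid.
            dist2 = ((sub[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
            assign = dist2.argmin(axis=1)
            # Move each centroid to the mean of its members.
            for c in range(n_centroids):
                members = sub[assign == c]
                if len(members):
                    centroids[c] = members.mean(axis=0)
        codebooks.append(centroids)
    return np.stack(codebooks)  # shape: (n_subspaces, n_centroids, sub_dim)


sample = np.random.default_rng(1).standard_normal((200, 8))
codebooks = train_pq_codebooks(sample, n_subspaces=2, n_centroids=4)
print(codebooks.shape)  # (2, 4, 4)
```

The codebooks depend on `sample`, which is why a shifting corpus can force a retrain. A data-oblivious quantizer has no equivalent of this function.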

How turbovec Quantizes Vectors

The quantization pipeline consists of four steps:

(1) Each vector is normalized. The length (norm) is stripped and stored as a single float. Every vector becomes a unit direction on a high-dimensional hypersphere.
(2) A random rotation is applied. All vectors are multiplied by the same random orthogonal matrix, after which each coordinate follows a known Beta-type distribution. In high dimensions, this converges to a Gaussian N(0, 1/d) and distinct coordinates become nearly independent. This makes the coordinate distribution predictable regardless of the input data.
(3) Lloyd-Max scalar quantization is applied. Because the distribution is known analytically, the optimal bucket boundaries and centroids can be precomputed from the math alone. For 2-bit quantization, this means 4 buckets per coordinate; for 4-bit, it means 16 buckets. No data passes are needed.
(4) The quantized coordinates are bit-packed into bytes. A 1536-dimensional vector shrinks from 6,144 bytes in FP32 to 384 bytes at 2-bit. This results in a 16x compression ratio.
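The four steps above can be sketched in NumPy. This is a minimal illustration under stated assumptions, not turbovec's Rust implementation: it uses the Gaussian N(0, 1/d) approximation for the codebook rather than the exact Beta distribution, builds the rotation from a QR decomposition, and uses a simple low-bits-first packing layout. All function names are hypothetical.

```python
import math
import numpy as np


def _pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2 * math.pi)


def _cdf(x):
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))


def lloyd_max_gaussian(bits, iters=200):
    """Lloyd-Max centroids and boundaries for N(0, 1), with no data involved."""
    centroids = np.linspace(-2.0, 2.0, 2 ** bits)
    for _ in range(iters):
        inner = (centroids[:-1] + centroids[1:]) / 2
        edges = np.concatenate(([-np.inf], inner, [np.inf]))
        # Conditional mean of N(0, 1) on each interval [a, b].
        centroids = np.array([(_pdf(a) - _pdf(b)) / (_cdf(b) - _cdf(a))
                              for a, b in zip(edges[:-1], edges[1:])])
    return centroids, (centroids[:-1] + centroids[1:]) / 2


def random_rotation(dim, seed=0):
    """Random orthogonal matrix from the QR decomposition of a Gaussian matrix."""
    rng = np.random.default_rng(seed)
    q, r = np.linalg.qr(rng.standard_normal((dim, dim)))
    return q * np.sign(np.diag(r))  # sign fix so the rotation is uniformly distributed


def quantize(vectors, rotation, bits):
    dim = vectors.shape[1]
    norms = np.linalg.norm(vectors, axis=1)             # step 1: strip the norm
    rotated = (vectors / norms[:, None]) @ rotation.T   # step 2: rotate
    _, boundaries = lloyd_max_gaussian(bits)
    # Step 3: bucket each coordinate; boundaries are rescaled to N(0, 1/d).
    codes = np.searchsorted(boundaries / math.sqrt(dim), rotated).astype(np.uint8)
    return norms.astype(np.float32), codes


def pack_codes(codes, bits):
    """Step 4: pack 8 // bits codes into each byte (dim must divide evenly)."""
    per_byte = 8 // bits
    n, dim = codes.shape
    grouped = codes.reshape(n, dim // per_byte, per_byte)
    shifts = (np.arange(per_byte) * bits).astype(np.uint8)
    return np.bitwise_or.reduce(grouped << shifts, axis=2).astype(np.uint8)


vectors = np.random.default_rng(1).standard_normal((8, 1536)).astype(np.float32)
rotation = random_rotation(1536)
norms, codes = quantize(vectors, rotation, bits=2)
packed = pack_codes(codes, bits=2)
print(packed.shape)  # (8, 384): 384 bytes per vector, down from 6,144 in float32
```

Note that `lloyd_max_gaussian` never sees `vectors`. The bucket boundaries come from the distribution alone, which is the property that removes the training step.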

At search time, the query is rotated once into the same domain. Scoring then happens directly against the codebook values using SIMD intrinsics (NEON on ARM, AVX-512BW on modern x86, with an AVX2 fallback) for throughput. TurboQuant achieves distortion within approximately 2.7x of the information-theoretic Shannon lower bound.
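A scalar version of that scoring step might look like the following. It is a sketch only: it assumes inner-product scoring, unpacked integer codes, and a per-coordinate lookup table, and it makes no claim about how turbovec lays out its SIMD kernels. The function name and arguments are hypothetical.

```python
import numpy as np


def score_query(query, rotation, codes, norms, centroids):
    """Approximate inner products between one query and all quantized vectors."""
    q_rot = rotation @ query                    # rotate the query once
    lut = q_rot[:, None] * centroids[None, :]   # lut[j, c] = q_rot[j] * centroid c
    dim = codes.shape[1]
    per_coord = lut[np.arange(dim), codes]      # look up every stored code
    return norms * per_coord.sum(axis=1)        # restore each vector's length


# Toy data: identity rotation, 1-bit codes with centroids -1 and +1.
rotation = np.eye(2)
centroids = np.array([-1.0, 1.0])
codes = np.array([[0, 1], [1, 1]], dtype=np.uint8)
norms = np.array([2.0, 1.0])
query = np.array([3.0, 4.0])

scores = score_query(query, rotation, codes, norms, centroids)
top = np.argsort(-scores)[:1]
print(scores, top)  # [2. 7.] [1]
```

The stored vectors are never decompressed to float32. Each score is a sum of table lookups, which is the part a SIMD shuffle instruction can do many lanes at a time.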

Recall and Speed: The Numbers

All benchmarks use 100K vectors, 1,000 queries, and k=64, and report the median of 5 runs. For recall, turbovec is compared against FAISS IndexPQ (LUT256, nbits=8, float32 LUT). Even though FAISS scores with a higher-precision LUT and trains its codebook with k-means++, TurboQuant and FAISS are within 0–1 point at R@1 for OpenAI embeddings at d=1536 and d=3072, and both converge to 1.0 recall by k=4–8. GloVe at d=200 is harder: at that dimension, TurboQuant trails FAISS by 3–6 points at R@1, and the gap closes by k≈16–32.
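For readers who want to reproduce this kind of measurement, the sketch below computes recall under one common reading of the metric: the fraction of queries whose exact nearest neighbor appears among the first k results. That definition is an assumption here; the article does not spell out how its R@1 figures are computed. The function name is hypothetical.

```python
import numpy as np


def recall_1_at_k(true_nn, retrieved, k):
    """Fraction of queries whose exact nearest neighbor is in the top-k results."""
    true_nn = np.asarray(true_nn)
    top_k = np.asarray(retrieved)[:, :k]
    hits = (top_k == true_nn[:, None]).any(axis=1)
    return float(hits.mean())


true_nn = [5, 2, 9]                            # exact nearest neighbor per query
retrieved = [[5, 1, 0], [3, 2, 7], [4, 8, 6]]  # approximate results, best first
for k in (1, 2, 3):
    print(k, recall_1_at_k(true_nn, retrieved, k))
```

In this toy example the first query hits at k=1, the second at k=2, and the third never does, so recall rises from one third to two thirds and then stays there.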

On speed, ARM results (Apple M3 Max) show turbovec beating FAISS IndexPQFastScan by 12–20% across every configuration. On x86 (Intel Xeon Platinum 8481C / Sapphire Rapids, 8 vCPUs), turbovec wins every 4-bit configuration by 1–6% and runs within ~1% of FAISS on 2-bit single-threaded. Two configurations sit slightly behind FAISS: 2-bit multi-threaded at d=1536 and d=3072. There, the inner accumulate loop is too short for the cost of unrolling to be amortized, and FAISS's AVX-512 VBMI path holds a 2–4% edge.
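Differences of a few percent are easy to lose in timing noise, which is why the benchmarks report a median over repeated runs. A minimal harness in that spirit, using only the standard library, could look like this; it is not the benchmark script behind the numbers above, and `median_runtime` is a hypothetical helper.

```python
import statistics
import time


def median_runtime(fn, runs=5):
    """Call fn `runs` times and return the median wall-clock time in seconds."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)


# Stand-in workload; in practice this would wrap a batch of index searches.
elapsed = median_runtime(lambda: sum(i * i for i in range(100_000)))
print(f"median of 5 runs: {elapsed * 1e3:.2f} ms")
```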

Python API

Installation is a single command: pip install turbovec. The primary class is TurboQuantIndex, initialized with a dimension and bit width.

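```python
from turbovec import TurboQuantIndex

# `vectors` and `query` are assumed to be float32 NumPy arrays prepared elsewhere.
index = TurboQuantIndex(dim=1536, bit_width=4)  # 4 bits per coordinate
index.add(vectors)                              # quantize and store, no training step
scores, indices = index.search(query, k=10)     # top-10 scores and positions
index.write("my_index.tq")                      # persist the index to disk
```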

A second class, IdMapIndex, supports stable external uint64 IDs that survive deletes. Removal is O(1) by ID. This is useful for document stores where vectors are frequently updated or deleted.
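One common way to get constant-time removal by ID is a hash map from external ID to storage slot, combined with swap-remove: the last slot is moved into the hole left by the deleted entry. The sketch below shows that pattern in pure Python. It illustrates the general technique and makes no claim about how IdMapIndex is implemented; the class `IdMap` and its methods are hypothetical.

```python
class IdMap:
    """Toy mapping from external IDs to dense slots with O(1) removal."""

    def __init__(self):
        self._slot_of = {}  # external id -> slot
        self._id_at = []    # slot -> external id

    def add(self, ext_id):
        if ext_id in self._slot_of:
            raise KeyError(f"duplicate id {ext_id}")
        self._slot_of[ext_id] = len(self._id_at)
        self._id_at.append(ext_id)

    def remove(self, ext_id):
        """Remove ext_id and return the slot it freed."""
        slot = self._slot_of.pop(ext_id)
        last_id = self._id_at.pop()
        if last_id != ext_id:
            # Move the last entry into the hole so slots stay dense.
            self._id_at[slot] = last_id
            self._slot_of[last_id] = slot
        return slot

    def slot(self, ext_id):
        return self._slot_of[ext_id]


ids = IdMap()
for ext_id in (10, 20, 30):
    ids.add(ext_id)
freed = ids.remove(10)
print(freed, ids.slot(30), ids.slot(20))  # 0 0 1
```

A real index would move the stored vector from the last slot into `freed` at the same time. The external IDs held by callers stay valid because only the internal slot numbers change.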

Turbovec integrates with LangChain (pip install turbovec[langchain]), LlamaIndex (pip install turbovec[llama-index]), and Haystack (pip install turbovec[haystack]). The Rust crate is available via cargo add turbovec.

Marktechpost’s Visual Explainer

How to Use turbovec: TurboQuant vector search in Rust and Python

Compresses vectors by up to 16x at 2-bit.
Beats FAISS on ARM hardware by 12–20% in speed benchmarks.
Fully local: no data egress is required for indexing or querying.
MIT-licensed open-source library, available via cargo add or pip install.


Originally published at marktechpost.com. Curated by AI Maestro.
