For developers building multilingual search tools, Liquid AI has released two new retrieval models: LFM2.5-ColBERT-350M and LFM2.5-Embedding-350M. Both have 350M parameters and are the first bidirectional models in the LFM family, built on the LFM2.5-350M-Base checkpoint released in March. They are designed for fast multilingual and cross-lingual search across 11 languages, and their small footprint makes them deployable almost anywhere. Both are available now on Hugging Face under the LFM Open License v1.0.
LFM2.5 Retrievers
Although they share a single backbone, the two models process text differently. LFM2.5-Embedding-350M is a dense bi-encoder that converts each document into a single vector. Choose it when you prioritise the fastest search and the smallest, cheapest index.
LFM2.5-ColBERT-350M is a late-interaction model: instead of producing one vector per document, it produces one vector per token. This enables token-level matching, which delivers higher accuracy and better generalisation at the cost of a larger index. Choose it when precision matters more than storage. Queries are capped at 32 tokens, and the model can also rerank results from a first-stage retriever without building an index.
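A back-of-the-envelope sketch of the index-size trade-off, using the output sizes given later in the article (one 1024-dim vector per document for Embedding, one 128-dim vector per token for ColBERT). The 256-token average document length, FP16 storage, and the absence of any compression or token pruning are assumptions chosen for illustration; real ColBERT indexes often compress their vectors, so treat the ratio as an upper bound rather than a measured figure.

```python
"""Rough index-size comparison: dense bi-encoder vs. late interaction.

Assumptions (illustrative only): FP16 storage (2 bytes per value), no
compression or token pruning, and an average document length of 256 tokens.
"""

BYTES_FP16 = 2


def dense_index_bytes(num_docs: int, dim: int = 1024,
                      bytes_per_value: int = BYTES_FP16) -> int:
    """Dense bi-encoder: one vector per document."""
    return num_docs * dim * bytes_per_value


def late_interaction_index_bytes(num_docs: int, tokens_per_doc: int = 256,
                                 dim: int = 128,
                                 bytes_per_value: int = BYTES_FP16) -> int:
    """Late interaction (ColBERT-style): one vector per document token."""
    return num_docs * tokens_per_doc * dim * bytes_per_value


if __name__ == "__main__":
    n = 1_000_000
    dense = dense_index_bytes(n)
    late = late_interaction_index_bytes(n)
    print(f"dense:            {dense / 1e9:.2f} GB")
    print(f"late interaction: {late / 1e9:.2f} GB")
    print(f"ratio:            {late / dense:.0f}x")
```

For a million documents this prints 2.05 GB versus 65.54 GB, a 32x gap, because 256 tokens × 128 dims is 32 times 1024 dims.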
Both models target short-context search, which makes them well suited to product catalogs, FAQ knowledge bases, and support documentation. Liquid AI positions both as drop-in replacements in an existing RAG pipeline.
The Architecture Change: Causal to Bidirectional
Both models start from LFM2.5-350M-Base, a mid-trained general-purpose checkpoint. Liquid AI applies a set of bidirectional patches to the LFM2 architecture that turn it from a causal decoder into a bidirectional encoder.
In a standard causal setup, each token attends only to itself and the tokens before it. That suits left-to-right generation, but retrieval benefits from representations in which every token sees the whole input. The team therefore replaces the causal attention mask with a bidirectional one, letting every token attend to both left and right context. They also make the LFM2 short convolutions non-causal, so they mix local information symmetrically around each token instead of only from the past.
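Both changes can be illustrated with a toy, single-channel sketch in plain Python. The kernel size of 3 and the zero padding are assumptions for illustration; this is not LFM2's actual attention or convolution code. As with most deep-learning "convolutions", `conv1d` here computes a cross-correlation.

```python
"""Toy illustration: causal vs. bidirectional attention masks and convolutions."""


def causal_mask(n: int) -> list[list[bool]]:
    """mask[i][j] is True when token i may attend to token j (only j <= i)."""
    return [[j <= i for j in range(n)] for i in range(n)]


def bidirectional_mask(n: int) -> list[list[bool]]:
    """Every token may attend to every other token, left and right."""
    return [[True] * n for _ in range(n)]


def conv1d(x: list[float], kernel: list[float], causal: bool) -> list[float]:
    """Single-channel 1D convolution (cross-correlation) with zero padding.

    causal=True  -> pad k-1 zeros on the left, so output[t] sees x[t-k+1 .. t].
    causal=False -> pad (k-1)//2 zeros on each side (odd k only), so output[t]
                    sees a window centred on t: x[t-(k-1)//2 .. t+(k-1)//2].
    """
    k = len(kernel)
    if causal:
        left, right = k - 1, 0
    else:
        if k % 2 == 0:
            raise ValueError("symmetric padding needs an odd kernel size")
        left = right = (k - 1) // 2
    padded = [0.0] * left + [float(v) for v in x] + [0.0] * right
    return [sum(kernel[i] * padded[t + i] for i in range(k)) for t in range(len(x))]


if __name__ == "__main__":
    x = [1.0, 2.0, 3.0, 4.0]
    window_sum = [1.0, 1.0, 1.0]
    print(causal_mask(3))                        # lower-triangular
    print(bidirectional_mask(3))                 # all True
    print(conv1d(x, window_sum, causal=True))    # [1.0, 3.0, 6.0, 9.0]
    print(conv1d(x, window_sum, causal=False))   # [3.0, 6.0, 9.0, 7.0]
```

In the causal version, changing a later value never affects earlier outputs; in the non-causal version, each output also depends on its right-hand neighbour.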
The bidirectional changes preserve the LFM2 backbone's efficiency while producing the full-context representations that retrieval needs. Each model has 17 layers: 10 convolutional, 6 attention, and 1 pooling or dense layer. The context window extends to 32,768 tokens, though the models are tuned for documents of 512 tokens. On top of the shared encoder, the two models differ only in their output: Embedding uses CLS-style pooling to produce a single 1024-dim vector, while ColBERT keeps 128-dim per-token embeddings for MaxSim late interaction.
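The difference between the two output heads can be sketched with toy vectors. `cls_pool` takes the first token's vector, which is an assumption about what "CLS-style" means here, and the sketch omits the learned layers that produce the real 1024-dim and 128-dim outputs. `maxsim_score` is the standard ColBERT MaxSim operator: each query token is matched to its most similar document token and those maxima are summed. `rerank` shows how the same score can reorder candidates from a first-stage retriever without an index.

```python
"""Toy sketch: single-vector (CLS-style) scoring vs. ColBERT MaxSim scoring."""

import math

Vector = list[float]


def l2_normalize(v: Vector) -> Vector:
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def cls_pool(token_vectors: list[Vector]) -> Vector:
    """Assumed CLS-style pooling: use the first token's vector as the text vector."""
    return token_vectors[0]


def dense_score(query_tokens: list[Vector], doc_tokens: list[Vector]) -> float:
    """Cosine similarity between the two pooled vectors (one vector per text)."""
    return dot(l2_normalize(cls_pool(query_tokens)), l2_normalize(cls_pool(doc_tokens)))


def maxsim_score(query_tokens: list[Vector], doc_tokens: list[Vector]) -> float:
    """ColBERT MaxSim: sum over query tokens of the best cosine match in the document."""
    doc = [l2_normalize(d) for d in doc_tokens]
    return sum(max(dot(l2_normalize(q), d) for d in doc) for q in query_tokens)


def rerank(query_tokens: list[Vector], candidates: dict[str, list[Vector]]) -> list[str]:
    """Reorder first-stage candidates by MaxSim score, best first."""
    return sorted(candidates,
                  key=lambda doc_id: maxsim_score(query_tokens, candidates[doc_id]),
                  reverse=True)


if __name__ == "__main__":
    query = [[1.0, 0.0], [0.0, 1.0]]  # two query tokens
    docs = {
        "a": [[1.0, 0.0], [0.0, 1.0]],  # matches both query tokens
        "b": [[1.0, 0.0], [1.0, 0.0]],  # matches only the first query token
    }
    print(dense_score(query, docs["a"]), dense_score(query, docs["b"]))    # 1.0 1.0
    print(maxsim_score(query, docs["a"]), maxsim_score(query, docs["b"]))  # 2.0 1.0
    print(rerank(query, docs))                                             # ['a', 'b']
```

The tie in the dense scores is an artifact of these toy vectors (pooling just copies the first token), not a claim about the real models; it simply shows that MaxSim can separate documents by token-level matches that a single vector may blur.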
Training and Data
Both models follow an identical three-stage training recipe:
- Stage one involves large-scale contrastive pretraining in English.
- Stage two consists of multilingual and cross-lingual distillation from a strong teacher across all 11 languages.
- Stage three is final fine-tuning on hard-mined negatives.
The Embedding model receives slightly more cross-lingual data than ColBERT, since cross-lingual retrieval emerges more naturally in the late-interaction setup. Training data combines curated internal resources with open-source English retrieval datasets, which are extended into multilingual and cross-lingual pairs through LLM-based translation.
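The contrastive objective of stage one and the hard negatives of stage three can be sketched with a generic InfoNCE loss. The article does not specify Liquid AI's exact loss, temperature, or mining procedure: the cosine-based InfoNCE formulation, the temperature of 0.05, and the top-k mining rule below are assumptions, and `mine_hard_negatives` is a hypothetical helper.

```python
"""Generic InfoNCE loss with mined hard negatives (illustrative, not Liquid AI's recipe)."""

import math

Vector = list[float]


def cosine(a: Vector, b: Vector) -> float:
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def info_nce_loss(query: Vector, positive: Vector, negatives: list[Vector],
                  temperature: float = 0.05) -> float:
    """Cross-entropy where the positive must win a softmax over [positive] + negatives."""
    logits = [cosine(query, positive) / temperature]
    logits += [cosine(query, n) / temperature for n in negatives]
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[0]


def mine_hard_negatives(query: Vector, corpus: dict[str, Vector],
                        positive_id: str, k: int) -> list[str]:
    """Hypothetical mining rule: the k non-positive documents most similar to the query."""
    scored = [(cosine(query, vec), doc_id)
              for doc_id, vec in corpus.items() if doc_id != positive_id]
    scored.sort(reverse=True)
    return [doc_id for _, doc_id in scored[:k]]


if __name__ == "__main__":
    query = [1.0, 0.0]
    corpus = {
        "pos": [1.0, 0.1],
        "easy": [0.0, 1.0],  # unrelated to the query
        "hard": [0.9, 0.4],  # close to the query but not the answer
    }
    hard_ids = mine_hard_negatives(query, corpus, "pos", k=1)
    print(hard_ids)  # ['hard']
    easy_loss = info_nce_loss(query, corpus["pos"], [corpus["easy"]])
    hard_loss = info_nce_loss(query, corpus["pos"], [corpus[i] for i in hard_ids])
    print(f"easy negative loss: {easy_loss:.4f}, hard negative loss: {hard_loss:.4f}")
```

The hard negative produces a noticeably larger loss than the easy one, which is why fine-tuning on mined negatives pushes the model to separate near-misses from true answers.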
Benchmark
Liquid AI evaluated two capabilities: multilingual retrieval on NanoBEIR and cross-lingual open-domain QA on MKQA-11. Both benchmarks report results across all 11 languages: Arabic, German, English, Spanish, French, Italian, Japanese, Korean, Norwegian, Portuguese, and Swedish.
Averaged across languages, each model posts the top score in its class (late interaction or dense) on both benchmarks. The full comparison:
| Model | Type | NanoBEIR multilingual (NDCG@10) | MKQA-11 (Recall@20) |
|---|---|---|---|
| LFM2.5-ColBERT-350M | late interaction | 0.605 | 0.694 |
| LFM2.5-Embedding-350M | dense | 0.577 | 0.691 |
| Qwen/Qwen3-Embedding-0.6B | dense | 0.556 | 0.638 |
| LFM2-ColBERT-350M | late interaction | 0.540 | 0.646 |
| Alibaba-NLP/gte-multilingual-base | dense | 0.528 | 0.675 |
| lightonai/GTE-ModernColBERT-v1 | late interaction | 0.489 | 0.459 |
| BAAI/bge-large-en-v1.5 | dense | 0.359 | 0.413 |
ColBERT leads on both averages, with Embedding close behind on MKQA-11 (0.691 vs 0.694). Both outperform the larger Qwen3-Embedding-0.6B. The new ColBERT also improves on the earlier LFM2-ColBERT-350M, rising from 0.540 to 0.605 on NanoBEIR. Liquid AI notes that English NanoBEIR closely tracks the far more expensive full BEIR: the two are highly correlated, with NanoBEIR scoring a near-constant ~15% higher. The team therefore uses NanoBEIR as a practical proxy during training runs.
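For readers unfamiliar with the two metrics, here is a minimal per-query sketch of NDCG@k and Recall@k (benchmark scores average these over queries). It uses the linear-gain NDCG formulation. The evaluation code behind the reported numbers is not described, and QA benchmarks sometimes define recall as a hit whenever any top-k passage contains the answer, so treat these definitions as assumptions.

```python
"""Minimal per-query NDCG@k and Recall@k (illustrative definitions)."""

import math


def dcg_at_k(gains: list[float], k: int) -> float:
    """Discounted cumulative gain: the gain at 1-based rank r is divided by log2(r + 1)."""
    return sum(g / math.log2(rank + 2) for rank, g in enumerate(gains[:k]))


def ndcg_at_k(ranked_ids: list[str], relevance: dict[str, float], k: int = 10) -> float:
    """DCG of the system ranking divided by the DCG of the ideal ranking."""
    gains = [relevance.get(doc_id, 0.0) for doc_id in ranked_ids]
    ideal_dcg = dcg_at_k(sorted(relevance.values(), reverse=True), k)
    return dcg_at_k(gains, k) / ideal_dcg if ideal_dcg > 0 else 0.0


def recall_at_k(ranked_ids: list[str], relevant_ids: set[str], k: int = 20) -> float:
    """Fraction of the relevant documents that appear in the top k results."""
    if not relevant_ids:
        return 0.0
    return len(set(ranked_ids[:k]) & relevant_ids) / len(relevant_ids)


if __name__ == "__main__":
    ranking = ["d3", "d1", "d7", "d2"]  # system output, best first
    qrels = {"d1": 1.0, "d2": 1.0}      # binary relevance judgments
    print(f"NDCG@10   = {ndcg_at_k(ranking, qrels):.4f}")
    print(f"Recall@20 = {recall_at_k(ranking, set(qrels)):.2f}")
    print(f"Recall@2  = {recall_at_k(ranking, set(qrels), k=2):.2f}")
```

NDCG rewards placing relevant documents near the top, while Recall@k only asks whether they appear anywhere in the first k results, which is why the two columns in the table can rank models differently.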
Latency and Edge Deployment
Liquid AI released GGUF variants for llama.cpp, so both models can run on CPUs, laptops, and edge devices. The figures below were measured on a MacBook Pro M4 Max at FP16, with 32-token queries and 256-token documents.
| Model | Stage | Docs cached | p50 latency |
|---|---|---|---|
| LFM2.5-Embedding-350M | Query embedding | yes | 7.3 ms |
| LFM2.5-ColBERT-350M | Query embedding + MaxSim | yes | 8.2 ms |
| LFM2.5-ColBERT-350M | Query + Doc embedding + MaxSim | no | 34.3 ms |
With document embeddings pre-computed, median (p50) query latency stays under 10 ms; encoding documents at query time pushes ColBERT to 34.3 ms. For enterprise scale, Liquid AI also built an internal GPU stack. On an H100 at FP16, it reports latencies as low as 1 ms, with a p50 of 1.5 ms for embedding queries.
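To reproduce a p50 figure on your own hardware, a small timing harness is enough. This sketch is generic: `measure_p50_ms` is a hypothetical helper, the warm-up and run counts are arbitrary choices, and the workload you pass in (for example a closure that encodes one query with whichever runtime you use) is up to you. Timing only query encoding plus scoring corresponds to the "docs cached" rows above; including document encoding in the timed call corresponds to the uncached row.

```python
"""Generic median (p50) latency harness for any callable (illustrative)."""

import statistics
import time
from typing import Callable


def measure_p50_ms(fn: Callable[[], object], warmup: int = 5, runs: int = 100) -> float:
    """Call fn repeatedly and return the median wall-clock latency in milliseconds."""
    for _ in range(warmup):  # warm caches, lazy initialisation, etc.
        fn()
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)


if __name__ == "__main__":
    # Stand-in workload; replace with e.g. a closure that encodes one query and scores it.
    def workload() -> int:
        return sum(i * i for i in range(10_000))

    print(f"p50: {measure_p50_ms(workload):.3f} ms")
```

The median is used rather than the mean because a few slow outliers (garbage collection, thermal throttling) would otherwise dominate the figure.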
Use Cases With Examples
- E-commerce: Search a product catalog across many languages with one index. A shopper types a Korean query and the system surfaces an English product listing. Cross-lingual retrieval makes this work without per-language indexes.
- FAQ and support knowledge bases: Retrieve the right answer reliably across customer-facing surfaces. A French support question maps to an English help article.
- On-device semantic search: Search files, emails, and notes locally on consumer hardware. The GGUF build keeps data on the device at near-zero cost.
- Enterprise knowledge assistants: Retrieve internal legal, financial, and technical documents across languages. ColBERT suits this when answer accuracy matters more than index size.
Code: Getting Started
The Embedding model runs through sentence-transformers. Always pass the asymmetric prompts: "query:" for queries and "document:" for documents. Omitting them silently degrades retrieval quality.
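A minimal sketch of that workflow follows. The Hugging Face repository id `LiquidAI/LFM2.5-Embedding-350M` and the exact prompt strings (the literal prefixes "query: " and "document: ", passed through the `prompt` argument of `SentenceTransformer.encode`) are assumptions; check the model card, which may instead define named prompts usable through `prompt_name`, and note that loading may require `trust_remote_code=True` depending on your library versions. The library import sits inside `main()` so the ranking helper can be used on its own.

```python
"""Sketch: cross-lingual dense retrieval with LFM2.5-Embedding-350M.

Assumptions (check the model card): the repository id and the exact prompt
prefixes below. Requires sentence-transformers >= 3.0 for model.similarity.
"""

MODEL_ID = "LiquidAI/LFM2.5-Embedding-350M"  # assumed Hugging Face repository id
QUERY_PROMPT = "query: "                     # assumed literal prefix for queries
DOCUMENT_PROMPT = "document: "               # assumed literal prefix for documents


def top_k(scores: list[float], k: int) -> list[tuple[int, float]]:
    """Return (index, score) pairs for the k highest scores, best first."""
    ranked = sorted(enumerate(scores), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


def main() -> None:
    # Imported here so top_k can be used without the library installed.
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer(MODEL_ID)

    documents = [
        "Refunds are available within 30 days of purchase.",
        "Orders to Europe usually arrive in 5 to 7 business days.",
    ]
    query = "Puis-je être remboursé après deux semaines ?"  # French query, English docs

    # `prompt` prepends the given string to every input before encoding.
    doc_embeddings = model.encode(documents, prompt=DOCUMENT_PROMPT)
    query_embeddings = model.encode([query], prompt=QUERY_PROMPT)

    # Scores use the model's configured similarity function; shape (1, len(documents)).
    scores = model.similarity(query_embeddings, doc_embeddings)
    for index, score in top_k(scores[0].tolist(), k=len(documents)):
        print(f"{score:.3f}  {documents[index]}")
```

Call `main()` to run the example. In a real deployment the document embeddings would be computed once and cached, which is the setup behind the sub-10 ms query latencies reported above.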