Introducing the Ettin Reranker Family
TL;DR
Today I’m releasing six new Sentence Transformers CrossEncoder rerankers, state-of-the-art at their respective sizes, built on top of the Ettin ModernBERT encoders, together with the data and full training recipe that produced them:
- cross-encoder/ettin-reranker-17m-v1
- cross-encoder/ettin-reranker-32m-v1
- cross-encoder/ettin-reranker-68m-v1
- cross-encoder/ettin-reranker-150m-v1
- cross-encoder/ettin-reranker-400m-v1
- cross-encoder/ettin-reranker-1b-v1
The models were trained with a distillation recipe: pointwise MSE on mixedbread-ai/mxbai-rerank-large-v2 scores over cross-encoder/ettin-reranker-v1-data, which is a subset of lightonai/embeddings-pre-training mixed with a reranked subset of lightonai/embeddings-fine-tuning.
We pair our six rerankers with google/embeddinggemma-300m on MTEB(eng, v2) Retrieval. See Results for five more embedder pairings.
I bootstrapped the training recipe below with the new train-sentence-transformers Agent Skill shipped in Sentence Transformers v5.5.0. Install it with `hf skills add train-sentence-transformers [--global] [--claude]` and ask your AI coding agent (Claude Code, Codex, Cursor, Gemini CLI, …) to fine-tune a `SentenceTransformer`, `CrossEncoder`, or `SparseEncoder` model on your data.
Table of contents
- What is a reranker, and why pair one with an embedder?
- Usage
- Architecture Details
- Results
- Training
- Conclusion
- Acknowledgements
What is a reranker, and why pair one with an embedder?
A reranker (a.k.a. pointwise cross-encoder) is a neural model that takes a `(query, document)` pair and outputs a single relevance score. Unlike an embedding model, which encodes the query and document separately and computes their similarity from the two embedding vectors, a reranker lets the two texts attend to each other through every transformer layer. That joint encoding is more accurate but also more expensive: the model has to be run once per `(query, document)` pair rather than once per text.
Because cross-encoders are too expensive to run over a full corpus, the common production pattern is retrieve-then-rerank: a fast embedding model retrieves the top-K candidates (cheap), then a cross-encoder re-orders just those K with high accuracy. The total cost stays bounded while the final ranking is much closer to what an exhaustive cross-encoder pass would produce.
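To make the cost argument concrete, here is a minimal back-of-the-envelope sketch that counts model forward passes for both strategies. The function names and the workload (1,000 queries, 1,000,000 documents, K = 100) are hypothetical and chosen only for illustration. Forward passes are also not equally expensive across models, so treat the numbers as a rough guide rather than a benchmark.

```python
def forward_passes_exhaustive(num_queries: int, corpus_size: int) -> int:
    """Cross-encoder over the full corpus: one pass per (query, document) pair."""
    return num_queries * corpus_size


def forward_passes_retrieve_then_rerank(
    num_queries: int, corpus_size: int, top_k: int
) -> dict:
    """Embedder encodes every text once; the cross-encoder only scores the top-K."""
    return {
        # Each document and each query is embedded exactly once
        # (document embeddings can be computed offline and cached).
        "embedder": corpus_size + num_queries,
        # The cross-encoder scores at most top_k candidates per query.
        "cross_encoder": num_queries * min(top_k, corpus_size),
    }


# Hypothetical workload, for illustration only
exhaustive = forward_passes_exhaustive(1_000, 1_000_000)
pipeline = forward_passes_retrieve_then_rerank(1_000, 1_000_000, top_k=100)

print(exhaustive)  # 1000000000
print(pipeline)    # {'embedder': 1001000, 'cross_encoder': 100000}
```

In this toy setting the cross-encoder runs 10,000x fewer times than in the exhaustive case, and its cost per query depends only on K, not on the corpus size.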
Throughout this blog post I’ll use “reranker” and “cross-encoder” interchangeably.
Usage
The released models are normal Sentence Transformers `CrossEncoder` models, so you can use them with just 3 lines of code:
from sentence_transformers import CrossEncoder
model = CrossEncoder("cross-encoder/ettin-reranker-32m-v1")
scores = model.predict([
    ("Where was Apple founded?", "Apple Inc. was founded in Cupertino, California in 1976 by Steve Jobs, Steve Wozniak, and Ronald Wayne."),
    ("Where was Apple founded?", "The Fuji apple is an apple cultivar developed in the late 1930s."),
])
print(scores)
# [11.393298 2.968891] <- larger means more relevant
For a query and a list of candidates, you can also use `rank` to get back sorted indices and scores:
from sentence_transformers import CrossEncoder
model = CrossEncoder("cross-encoder/ettin-reranker-32m-v1")
ranked = model.rank(
    query="Which planet is known as the Red Planet?",
    documents=[
        "Venus is often called Earth's twin because of its similar size and proximity.",
        "Mars, known for its reddish appearance, is often referred to as the Red Planet.",
        "Jupiter, the largest planet in our solar system, has a prominent red spot.",
        "Saturn, famous for its rings, is sometimes mistaken for the Red Planet.",
    ],
    top_k=4,
    return_documents=True,
)
for r in ranked:
    print(f"({r['score']:.2f}): {r['text']}")
# (10.82): Mars, known for its reddish appearance, is often referred to as the Red Planet.
# (9.86): Saturn, famous for its rings, is sometimes mistaken for the Red Planet.
# (8.55): Jupiter, the largest planet in our solar system, has a prominent red spot.
# (6.21): Venus is often called Earth's twin because of its similar size and proximity.
You can swap cross-encoder/ettin-reranker-32m-v1 for any other size to trade quality for speed. All six accept up to 8K tokens of context (useful for long-document reranking) thanks to ModernBERT’s long-context pre-training.
For the highest throughput, install `kernels` and set `model_kwargs={"dtype": "bfloat16", "attn_implementation": "flash_attention_2"}`. See the Speed section below for more details, but in general you can expect a 1.7x-8.3x speedup over default loading, depending on model size and sequence length.
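As a minimal sketch of how those settings might be passed, the snippet below forwards them through the `model_kwargs` argument of `CrossEncoder`. The helper names `fast_model_kwargs` and `load_reranker` are hypothetical and exist only for this illustration. The fallback that drops `attn_implementation` is my own assumption, meant for machines where Flash Attention 2 is not available.

```python
def fast_model_kwargs(use_flash_attention: bool = True) -> dict:
    """Build the model_kwargs recommended above (hypothetical helper)."""
    kwargs = {"dtype": "bfloat16"}
    if use_flash_attention:
        # Assumption: the hardware and the installed packages support
        # Flash Attention 2; otherwise leave this key out.
        kwargs["attn_implementation"] = "flash_attention_2"
    return kwargs


def load_reranker(
    model_name: str = "cross-encoder/ettin-reranker-32m-v1",
    use_flash_attention: bool = True,
):
    """Load a reranker with the high-throughput settings (hypothetical helper)."""
    # Imported lazily so the kwargs helper can be used without the library.
    from sentence_transformers import CrossEncoder

    return CrossEncoder(
        model_name,
        model_kwargs=fast_model_kwargs(use_flash_attention),
    )


# Usage (downloads the model on first call):
# reranker = load_reranker()
# scores = reranker.predict([("Where was Apple founded?", "Apple Inc. was founded in Cupertino.")])
```

Keeping the settings in one small helper makes it easy to switch back to default loading when comparing throughput on your own hardware.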
End-to-end retrieve-then-rerank pipeline
A complete example with a fast embedder for retrieval and the reranker for the final ordering:
from sentence_transformers import SentenceTransformer, CrossEncoder
# Fast retrieval with a static embedder (sub-millisecond on CPU per query)
embedder = SentenceTransformer("sentence-transformers/static-retrieval-mrl-en-v1")
reranker = CrossEncoder("cross-encoder/ettin-reranker-68m-v1")
corpus = [
    "Apple Inc. was founded in Cupertino, California in 1976 by Steve Jobs, Steve Wozniak, and Ronald Wayne.",
    "The Fuji apple is an apple cultivar developed in the late 1930s.",
    # ... thousands or millions more in production
]
query = "Where was Apple founded?"
# Step 1: encode + retrieve top-100
query_emb = embedder.encode_query(query, convert_to_tensor=True)
corpus_emb = embedder.encode_document(corpus, convert_to_tensor=True)
scores = embedder.similarity(query_emb, corpus_emb)[0]
top_k_idx = scores.topk(min(100, len(corpus))).indices.tolist()
# Step 2: rerank
top_k_docs = [corpus[i] for i in top_k_idx]
ranked = reranker.rank(
    query=query,
    documents=top_k_docs,
    top_k=5,
    return_documents=True,
)
for r in ranked:
    print(f"({r['score']:.2f}): {r['text']}")
# (11.63): Apple Inc. was founded in Cupertino, California in 1976 by Steve Jobs, Steve Wozniak, and Ronald Wayne.
# (4.71): Steve Jobs introduced the iPhone in 2007 at Macworld.
# (1.96): The Fuji apple is an apple cultivar developed in the late 1930s.
# (1.49): Macintosh computers were sold by Apple from 1984 onward.
This is the same shape used by most modern search systems. The retriever decides what enters the funnel, the reranker decides what wins.
Architecture Details
All six rerankers share the same architecture and differ only in their backbone size. The backbone is one of the six Ettin encoders from Johns Hopkins University's Ettin suite. These are ModernBERT-style models with unpadded attention, RoPE positional encodings, GeGLU, and 2T tokens of open-license pre-training, supporting up to 8192 tokens of context.
On top of each encoder, the reranker uses a 4-module classification head that mirrors `ModernBertForSequenceClassification` but is built from Sentence Transformers' modular components. The underlying `Transformer` is a plain `AutoModel` rather than `AutoModelForSequenceClassification`, which lets us use sequence unpadding for variable-length inputs with Flash Attention 2. At medium-document sequence lengths this is a 1.7x-8.3x speedup over fp32+SDPA depending on model size (see Speed for the full benchmark).
| Model | Backbone | Hidden size | Layers | Params (head incl.) |
|---|---|---|---|---|
| cross-encoder/ettin-reranker-17m-v1 | jhu-clsp/ettin-encoder-17m | 256 | 7 | 17.6M |
| cross-encoder/ettin-reranker-32m-v1 | jhu-clsp/ettin-encoder-32m | 384 | | |




