Introducing the Ettin Reranker Family

TL;DR

Today I’m releasing six new Sentence Transformers CrossEncoder rerankers, state-of-the-art at their respective sizes, built on top of the Ettin ModernBERT encoders, together with the data and full training recipe that produced them:

```
cross-encoder/ettin-reranker-17m-v1
cross-encoder/ettin-reranker-32m-v1
cross-encoder/ettin-reranker-68m-v1
cross-encoder/ettin-reranker-150m-v1
cross-encoder/ettin-reranker-400m-v1
cross-encoder/ettin-reranker-1b-v1
```

The models were trained with a distillation recipe: pointwise MSE on `mixedbread-ai/mxbai-rerank-large-v2` scores over `cross-encoder/ettin-reranker-v1-data`, which is a subset of `lightonai/embeddings-pre-training` mixed with a reranked subset of `lightonai/embeddings-fine-tuning`.
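To make "pointwise MSE on teacher scores" concrete, here is a minimal sketch in plain Python. It only illustrates the loss itself: the function name and the example scores are hypothetical, and the actual training recipe (batching, optimizer, data loading) is not shown here.

```python
from typing import Sequence


def pointwise_mse(student_scores: Sequence[float], teacher_scores: Sequence[float]) -> float:
    """Mean squared error between student and teacher relevance scores.

    Each position corresponds to one (query, document) pair.
    """
    if len(student_scores) != len(teacher_scores):
        raise ValueError("need exactly one teacher score per student score")
    if not student_scores:
        raise ValueError("need at least one score")
    squared_errors = [(s - t) ** 2 for s, t in zip(student_scores, teacher_scores)]
    return sum(squared_errors) / len(squared_errors)


# Hypothetical scores for three (query, document) pairs.
teacher = [11.0, 3.0, -2.0]  # produced by the large teacher reranker
student = [10.0, 3.0, 0.0]   # produced by the small student being trained
loss = pointwise_mse(student, teacher)
print(loss)
```

Because each pair is scored independently against the teacher's score, the student learns to reproduce the teacher's absolute scores rather than just the relative ordering within a batch.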

We pair our six rerankers with `google/embeddinggemma-300m` on MTEB(eng, v2) Retrieval. See Results for five more embedder pairings.

I bootstrapped the training recipe below with the new `train-sentence-transformers` Agent Skill shipped in Sentence Transformers v5.5.0. Install it with `hf skills add train-sentence-transformers [--global] [--claude]` and ask your AI coding agent (Claude Code, Codex, Cursor, Gemini CLI, …) to fine-tune a `SentenceTransformer`, `CrossEncoder`, or `SparseEncoder` model on your data.

What is a reranker, and why pair one with an embedder?
Usage
Architecture Details
Results
Training
Conclusion
Acknowledgements

What is a reranker, and why pair one with an embedder?

A reranker (a.k.a. pointwise cross-encoder) is a neural model that takes a `(query, document)` pair and outputs a single relevance score. Unlike an embedding model, which encodes the query and document separately and computes their similarity from the two embedding vectors, a reranker lets the two texts attend to each other through every transformer layer. That joint encoding is more accurate but also more expensive: the model has to be run once per `(query, document)` pair rather than once per text.

Because cross-encoders are too expensive to run over a full corpus, the common production pattern is retrieve-then-rerank: a fast embedding model retrieves the top-K candidates (cheap), then a cross-encoder re-orders just those K with high accuracy. The total cost stays bounded while the final ranking is much closer to what an exhaustive cross-encoder pass would produce.
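The cost argument can be made concrete by counting transformer forward passes per query. The sketch below is a back-of-the-envelope illustration, not a benchmark: the function name is hypothetical, it assumes document embeddings are computed once offline, and it ignores the (comparatively cheap) vector similarity search.

```python
def model_calls_per_query(corpus_size: int, top_k: int) -> dict:
    """Count transformer forward passes needed to answer one query."""
    top_k = min(top_k, corpus_size)
    return {
        # Cross-encoder over the whole corpus: one pass per (query, document) pair.
        "exhaustive_cross_encoder": corpus_size,
        # Embedder only: documents were embedded offline, so only the query is encoded.
        "embedder_only": 1,
        # Retrieve-then-rerank: one query encoding plus one pass per retrieved candidate.
        "retrieve_then_rerank": 1 + top_k,
    }


costs = model_calls_per_query(corpus_size=1_000_000, top_k=100)
print(costs)
```

With a million documents and K = 100, the pipeline needs 101 forward passes per query where an exhaustive cross-encoder pass would need a million.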

Throughout this blogpost I’ll use “reranker” and “cross-encoder” interchangeably.

Usage

The released models are normal Sentence Transformers `CrossEncoder` models, so you can use them with just 3 lines of code:

```python
from sentence_transformers import CrossEncoder

model = CrossEncoder("cross-encoder/ettin-reranker-32m-v1")
scores = model.predict([
    ("Where was Apple founded?", "Apple Inc. was founded in Cupertino, California in 1976 by Steve Jobs, Steve Wozniak, and Ronald Wayne."),
    ("Where was Apple founded?", "The Fuji apple is an apple cultivar developed in the late 1930s."),
])
print(scores)
# [11.393298  2.968891]   <- larger means more relevant
```

For a query and a list of candidates, you can also use `rank` to get back sorted indices and scores:

```python
from sentence_transformers import CrossEncoder

model = CrossEncoder("cross-encoder/ettin-reranker-32m-v1")
ranked = model.rank(
    query="Which planet is known as the Red Planet?",
    documents=[
        "Venus is often called Earth's twin because of its similar size and proximity.",
        "Mars, known for its reddish appearance, is often referred to as the Red Planet.",
        "Jupiter, the largest planet in our solar system, has a prominent red spot.",
        "Saturn, famous for its rings, is sometimes mistaken for the Red Planet.",
    ],
    top_k=4,
    return_documents=True,
)
for r in ranked:
    print(f"({r['score']:.2f}): {r['text']}")
# (10.82): Mars, known for its reddish appearance, is often referred to as the Red Planet.
# (9.86): Saturn, famous for its rings, is sometimes mistaken for the Red Planet.
# (8.55): Jupiter, the largest planet in our solar system, has a prominent red spot.
# (6.21): Venus is often called Earth's twin because of its similar size and proximity.
```
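Conceptually, ranking amounts to scoring every `(query, document)` pair and sorting by score. The sketch below shows that idea in plain Python. It is not the library's implementation: `rank_sketch` and `word_overlap_scorer` are hypothetical names, and the toy scorer simply counts shared words as a stand-in for a real model's `predict`.

```python
from typing import Callable, Dict, List, Optional, Sequence, Tuple, Union

Pair = Tuple[str, str]


def rank_sketch(
    query: str,
    documents: Sequence[str],
    score_pairs: Callable[[List[Pair]], List[float]],
    top_k: Optional[int] = None,
) -> List[Dict[str, Union[int, float, str]]]:
    """Score each (query, document) pair, then sort from most to least relevant."""
    pairs = [(query, doc) for doc in documents]
    scores = score_pairs(pairs)
    results = [
        {"corpus_id": idx, "score": float(score), "text": doc}
        for idx, (doc, score) in enumerate(zip(documents, scores))
    ]
    results.sort(key=lambda r: r["score"], reverse=True)
    return results if top_k is None else results[:top_k]


def word_overlap_scorer(pairs: List[Pair]) -> List[float]:
    """Hypothetical stand-in for a reranker: counts lowercase words shared by query and document."""
    return [float(len(set(q.lower().split()) & set(d.lower().split()))) for q, d in pairs]


ranked_toy = rank_sketch(
    "red planet",
    ["Venus is bright", "Mars is the red planet", "A red spot"],
    word_overlap_scorer,
    top_k=2,
)
for r in ranked_toy:
    print(r["corpus_id"], r["score"], r["text"])
```

Swapping the toy scorer for a real reranker's scoring call keeps the same shape: the only expensive step is scoring the pairs.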

You can swap `cross-encoder/ettin-reranker-32m-v1` for any other size to trade quality for speed. All six accept up to 8K tokens of context (useful for long-document reranking) thanks to ModernBERT’s long-context pre-training.

It is recommended to install `kernels` and set `model_kwargs={"dtype": "bfloat16", "attn_implementation": "flash_attention_2"}` for the highest throughput. See the Speed section below for more details, but in general you can expect a 1.7x-8.3x speedup over default loading depending on model size and sequence length.
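As a minimal sketch of how those settings are passed at load time: the helper names below are hypothetical, and the example assumes a GPU environment where Flash Attention 2 is available. The `use_flash_attention=False` path is an illustrative fallback of my own, not something the release prescribes.

```python
def fast_model_kwargs(use_flash_attention: bool = True) -> dict:
    """Build the model_kwargs recommended above for high-throughput loading."""
    kwargs = {"dtype": "bfloat16"}
    if use_flash_attention:
        # Assumes Flash Attention 2 can be used on the current hardware.
        kwargs["attn_implementation"] = "flash_attention_2"
    return kwargs


def load_fast_reranker(model_name: str = "cross-encoder/ettin-reranker-32m-v1"):
    """Load a reranker with the recommended settings (downloads the model on first use)."""
    # Imported lazily so fast_model_kwargs can be used without the library installed.
    from sentence_transformers import CrossEncoder

    return CrossEncoder(model_name, model_kwargs=fast_model_kwargs())


print(fast_model_kwargs())
# Usage, once the dependencies are installed:
#   model = load_fast_reranker()
#   scores = model.predict([("Where was Apple founded?", "Apple Inc. was founded in Cupertino.")])
```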

End-to-end retrieve-then-rerank pipeline

A complete example with a fast embedder for retrieval and the reranker for the final ordering:

```python
from sentence_transformers import SentenceTransformer, CrossEncoder

# Fast retrieval with a static embedder (sub-millisecond on CPU per query)
embedder = SentenceTransformer("sentence-transformers/static-retrieval-mrl-en-v1")
reranker = CrossEncoder("cross-encoder/ettin-reranker-68m-v1")

corpus = [
    "Apple Inc. was founded in Cupertino, California in 1976 by Steve Jobs, Steve Wozniak, and Ronald Wayne.",
    "The Fuji apple is an apple cultivar developed in the late 1930s.",
    "Steve Jobs introduced the iPhone in 2007 at Macworld.",
    "Macintosh computers were sold by Apple from 1984 onward.",
    # ... thousands or millions more in production
]
query = "Where was Apple founded?"

# Step 1: encode + retrieve top-100
query_emb = embedder.encode_query(query, convert_to_tensor=True)
corpus_emb = embedder.encode_document(corpus, convert_to_tensor=True)
scores = embedder.similarity(query_emb, corpus_emb)[0]
top_k_idx = scores.topk(min(100, len(corpus))).indices.tolist()

# Step 2: rerank only the retrieved candidates
top_k_docs = [corpus[i] for i in top_k_idx]
ranked = reranker.rank(
    query=query,
    documents=top_k_docs,
    top_k=5,
    return_documents=True,
)
for r in ranked:
    print(f"({r['score']:.2f}): {r['text']}")
# (11.63): Apple Inc. was founded in Cupertino, California in 1976 by Steve Jobs, Steve Wozniak, and Ronald Wayne.
# (4.71): Steve Jobs introduced the iPhone in 2007 at Macworld.
# (1.96): The Fuji apple is an apple cultivar developed in the late 1930s.
# (1.49): Macintosh computers were sold by Apple from 1984 onward.
```

This is the same shape used by most modern search systems. The retriever decides what enters the funnel, the reranker decides what wins.

Architecture Details

All six rerankers share the same architecture and differ only in their backbone size. The backbone is one of the six Ettin encoders from Johns Hopkins University's Ettin suite. These are ModernBERT-style models with unpadded attention, RoPE positional encodings, GeGLU, and 2T tokens of open-license pre-training, supporting up to 8192 tokens of context.

On top of each encoder, the reranker uses a 4-module classification head that mirrors `ModernBertForSequenceClassification` but is built from Sentence Transformers' modular components. The underlying `Transformer` is a plain `AutoModel` rather than `AutoModelForSequenceClassification`, which lets us use sequence unpadding for variable-length inputs with Flash Attention 2. At medium-document sequence lengths this is a 1.7x-8.3x speedup over fp32+SDPA depending on model size (see Speed for the full benchmark):

| Model | Backbone | Hidden size | Layers | Params (head incl.) |
| --- | --- | --- | --- | --- |
| cross-encoder/ettin-reranker-17m-v1 | jhu-clsp/ettin-encoder-17m | 256 | 7 | 17.6M |
| cross-encoder/ettin-reranker-32m-v1 | jhu-clsp/ettin-encoder-32m | 384 | | |
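
The sequence unpadding mentioned in this section can be pictured with a small plain-Python sketch. It illustrates the general idea only (concatenate variable-length sequences and track their boundaries instead of padding to the longest one); it is not the Sentence Transformers or Flash Attention code, and the function names and token ids are hypothetical.

```python
from typing import List, Tuple


def unpad(batch: List[List[int]]) -> Tuple[List[int], List[int]]:
    """Concatenate variable-length token sequences and record their boundaries.

    Returns the flat token list and cumulative sequence lengths, so that
    sequence i occupies flat_tokens[cu_seqlens[i]:cu_seqlens[i + 1]].
    """
    flat_tokens: List[int] = []
    cu_seqlens = [0]
    for seq in batch:
        flat_tokens.extend(seq)
        cu_seqlens.append(cu_seqlens[-1] + len(seq))
    return flat_tokens, cu_seqlens


def padded_token_count(batch: List[List[int]]) -> int:
    """Number of token slots processed if every sequence is padded to the longest one."""
    return len(batch) * max(len(seq) for seq in batch)


# Hypothetical token ids for three inputs of different lengths.
batch = [[101, 7, 8, 102], [101, 9, 102], [101, 4, 5, 6, 3, 102]]
flat_tokens, cu_seqlens = unpad(batch)
print(len(flat_tokens), cu_seqlens, padded_token_count(batch))
```

In this toy batch, the unpadded layout holds 13 tokens where a padded batch would hold 18 slots. The gap grows as input lengths within a batch become more uneven.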