Introducing the Ettin Reranker Family

By AI Maestro May 19, 2026 5 min read

TL;DR

Today I’m releasing six new Sentence Transformers CrossEncoder rerankers, state-of-the-art at their respective sizes, built on top of the Ettin ModernBERT encoders, together with the data and full training recipe that produced them:

  • cross-encoder/ettin-reranker-17m-v1
  • cross-encoder/ettin-reranker-32m-v1
  • cross-encoder/ettin-reranker-68m-v1
  • cross-encoder/ettin-reranker-150m-v1
  • cross-encoder/ettin-reranker-400m-v1
  • cross-encoder/ettin-reranker-1b-v1

The models were trained with a distillation recipe: pointwise MSE on mixedbread-ai/mxbai-rerank-large-v2 scores over cross-encoder/ettin-reranker-v1-data, which is a subset of lightonai/embeddings-pre-training mixed with a reranked subset of lightonai/embeddings-fine-tuning.
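To make "pointwise MSE on teacher scores" concrete, here is a minimal sketch in plain Python. It is an illustration under assumptions, not the training code behind these models: the function name pointwise_mse and the example scores are hypothetical, and a real training run would compute the same quantity on batched tensors.

```python
def pointwise_mse(student_scores, teacher_scores):
    """Mean squared error between student and teacher relevance scores.

    Each position corresponds to one (query, document) pair; the student
    is pushed to reproduce the teacher's raw score for that pair.
    """
    if len(student_scores) != len(teacher_scores):
        raise ValueError("student and teacher must score the same pairs")
    squared_errors = [(s - t) ** 2 for s, t in zip(student_scores, teacher_scores)]
    return sum(squared_errors) / len(squared_errors)


# Hypothetical scores for three (query, document) pairs.
teacher = [11.0, 3.0, -2.0]   # scores from the larger teacher reranker
student = [10.0, 5.0, -2.0]   # scores from the smaller student reranker
loss = pointwise_mse(student, teacher)
print(loss)
```

Because the loss is computed per pair, no grouping of positives and negatives per query is needed: any scored (query, document) pair is a valid training example.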

We pair our six rerankers with google/embeddinggemma-300m on MTEB(eng, v2) Retrieval. See Results for five more embedder pairings.

I bootstrapped the training recipe below with the new train-sentence-transformers Agent Skill shipped in Sentence Transformers v5.5.0. Install it with

hf skills add train-sentence-transformers [--global] [--claude]

and ask your AI coding agent (Claude Code, Codex, Cursor, Gemini CLI, …) to fine-tune a SentenceTransformer, CrossEncoder, or SparseEncoder model on your data.

Table of contents

  • What is a reranker, and why pair one with an embedder?
  • Usage
  • Architecture Details
  • Results
  • Training
  • Conclusion
  • Acknowledgements

What is a reranker, and why pair one with an embedder?

A reranker (a.k.a. pointwise cross-encoder) is a neural model that takes a (query, document) pair and outputs a single relevance score. Unlike an embedding model, which encodes the query and document separately and computes their similarity from the two embedding vectors, a reranker lets the two texts attend to each other through every transformer layer. That joint encoding is more accurate but also more expensive: the model has to be run once per (query, document) pair rather than once per text.

Because cross-encoders are too expensive to run over a full corpus, the common production pattern is retrieve-then-rerank: a fast embedding model retrieves the top-K candidates (cheap), then a cross-encoder re-orders just those K with high accuracy. The total cost stays bounded while the final ranking is much closer to what an exhaustive cross-encoder pass would produce.
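The funnel described above can be sketched without any model at all. The example below is a toy under stated assumptions: word_overlap stands in for an embedding similarity, toy_cross_score stands in for a cross-encoder, and all names are hypothetical. What it shows is the cost structure: the expensive scorer runs exactly K times, however large the corpus is.

```python
from typing import Callable, List, Sequence

Scorer = Callable[[str, str], float]


def retrieve_then_rerank(
    query: str,
    corpus: Sequence[str],
    cheap_score: Scorer,
    expensive_score: Scorer,
    k: int,
    top_n: int,
) -> List[int]:
    """Return indices of the top_n documents after a two-stage ranking."""
    # Stage 1: score every document with the cheap scorer, keep the best k.
    by_cheap = sorted(
        range(len(corpus)),
        key=lambda i: cheap_score(query, corpus[i]),
        reverse=True,
    )
    candidates = by_cheap[:k]
    # Stage 2: run the expensive scorer on those k candidates only.
    by_expensive = sorted(
        candidates,
        key=lambda i: expensive_score(query, corpus[i]),
        reverse=True,
    )
    return by_expensive[:top_n]


def word_overlap(query: str, doc: str) -> float:
    # Toy stand-in for an embedding similarity: number of shared words.
    return float(len(set(query.lower().split()) & set(doc.lower().split())))


expensive_calls = 0


def toy_cross_score(query: str, doc: str) -> float:
    # Toy stand-in for a cross-encoder; it counts how often it is called.
    global expensive_calls
    expensive_calls += 1
    return word_overlap(query, doc) / (1 + len(doc.split()))


corpus = [
    "red planet mars",
    "jupiter has a great red spot",
    "venus is hot",
    "saturn has rings",
    "mars is called the red planet because of iron oxide dust",
]
result = retrieve_then_rerank(
    "which planet is the red planet", corpus, word_overlap, toy_cross_score, k=3, top_n=2
)
print(result, expensive_calls)
# -> [0, 4] 3
```

The second stage reorders the candidates (document 4 led after stage 1, document 0 leads after stage 2), but it can only choose among what the first stage let through, so K has to be large enough for the relevant documents to make the cut.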

Throughout this blog post I’ll use “reranker” and “cross-encoder” interchangeably.

Usage

The released models are normal Sentence Transformers CrossEncoder models, so you can use them with just 3 lines of code:

from sentence_transformers import CrossEncoder

model = CrossEncoder("cross-encoder/ettin-reranker-32m-v1")
scores = model.predict([
    ("Where was Apple founded?", "Apple Inc. was founded in Cupertino, California in 1976 by Steve Jobs, Steve Wozniak, and Ronald Wayne."),
    ("Where was Apple founded?", "The Fuji apple is an apple cultivar developed in the late 1930s.")
])
print(scores)
# [11.393298  2.968891]   <- larger means more relevant

For a query and a list of candidates, you can also use rank to get back sorted indices and scores:

from sentence_transformers import CrossEncoder

model = CrossEncoder("cross-encoder/ettin-reranker-32m-v1")
ranked = model.rank(
    query="Which planet is known as the Red Planet?",
    documents=[
        "Venus is often called Earth's twin because of its similar size and proximity.",
        "Mars, known for its reddish appearance, is often referred to as the Red Planet.",
        "Jupiter, the largest planet in our solar system, has a prominent red spot.",
        "Saturn, famous for its rings, is sometimes mistaken for the Red Planet."
    ],
    top_k=4,
    return_documents=True
)
for r in ranked:
    print(f"({r['score']:.2f}): {r['text']}")
# (10.82): Mars, known for its reddish appearance, is often referred to as the Red Planet.
# (9.86): Saturn, famous for its rings, is sometimes mistaken for the Red Planet.
# (8.55): Jupiter, the largest planet in our solar system, has a prominent red spot.
# (6.21): Venus is often called Earth's twin because of its similar size and proximity.

You can swap cross-encoder/ettin-reranker-32m-v1 for any other size to trade quality for speed. All six accept up to 8K tokens of context (useful for long-document reranking) thanks to ModernBERT’s long-context pre-training.

For the highest throughput, install kernels and set model_kwargs={"dtype": "bfloat16", "attn_implementation": "flash_attention_2"} when loading the model. See the Speed section below for more details, but in general you can expect a 1.7x-8.3x speedup over default loading, depending on model size and sequence length.
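As a sketch of how those settings could be passed at load time: the helper names throughput_kwargs and load_reranker are hypothetical, and the sketch assumes Flash Attention 2 is only usable on a CUDA GPU, so it falls back to default loading otherwise. The model_kwargs values themselves are the ones recommended above.

```python
def throughput_kwargs(use_flash_attention: bool) -> dict:
    """Build keyword arguments for CrossEncoder(...).

    Returns an empty dict when Flash Attention 2 is not wanted, so the
    model loads with its defaults.
    """
    if not use_flash_attention:
        return {}
    return {
        "model_kwargs": {
            "dtype": "bfloat16",
            "attn_implementation": "flash_attention_2",
        }
    }


def load_reranker(name: str, use_flash_attention: bool):
    # Imported lazily so the helper above works without the library installed.
    from sentence_transformers import CrossEncoder

    return CrossEncoder(name, **throughput_kwargs(use_flash_attention))


print(throughput_kwargs(True))
```

On a GPU machine this would be called as load_reranker("cross-encoder/ettin-reranker-32m-v1", use_flash_attention=True); passing the result of torch.cuda.is_available() as the flag is one way to pick the setting automatically.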

End-to-end retrieve-then-rerank pipeline

A complete example with a fast embedder for retrieval and the reranker for the final ordering:

from sentence_transformers import SentenceTransformer, CrossEncoder

# Fast retrieval with a static embedder (sub-millisecond on CPU per query)
embedder = SentenceTransformer("sentence-transformers/static-retrieval-mrl-en-v1")
reranker = CrossEncoder("cross-encoder/ettin-reranker-68m-v1")

corpus = [
    "Apple Inc. was founded in Cupertino, California in 1976 by Steve Jobs, Steve Wozniak, and Ronald Wayne.",
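    "The Fuji apple is an apple cultivar developed in the late 1930s.",
    "Steve Jobs introduced the iPhone in 2007 at Macworld.",
    "Macintosh computers were sold by Apple from 1984 onward.",
    # ... thousands or millions more in production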
]
query = "Where was Apple founded?"

# Step 1: encode + retrieve top-100
query_emb = embedder.encode_query(query, convert_to_tensor=True)
corpus_emb = embedder.encode_document(corpus, convert_to_tensor=True)
scores = embedder.similarity(query_emb, corpus_emb)[0]
top_k_idx = scores.topk(min(100, len(corpus))).indices.tolist()

# Step 2: rerank
top_k_docs = [corpus[i] for i in top_k_idx]
ranked = reranker.rank(
    query=query,
    documents=top_k_docs,
    top_k=5,
    return_documents=True
)
for r in ranked:
    print(f"({r['score']:.2f}): {r['text']}")
# (11.63): Apple Inc. was founded in Cupertino, California in 1976 by Steve Jobs, Steve Wozniak, and Ronald Wayne.
# (4.71): Steve Jobs introduced the iPhone in 2007 at Macworld.
# (1.96): The Fuji apple is an apple cultivar developed in the late 1930s.
# (1.49): Macintosh computers were sold by Apple from 1984 onward.

This is the same shape used by most modern search systems. The retriever decides what enters the funnel, the reranker decides what wins.

Architecture Details

All six rerankers share the same architecture and differ only in their backbone size. The backbone is one of the six Ettin encoders from Johns Hopkins University's Ettin suite. These are ModernBERT-style models with unpadded attention, RoPE positional encodings, GeGLU, and 2T tokens of open-license pre-training, supporting up to 8192 tokens of context.

On top of each encoder, the reranker uses a 4-module classification head that mirrors ModernBertForSequenceClassification but is built from Sentence Transformers' modular components. The underlying Transformer is a plain AutoModel rather than AutoModelForSequenceClassification, which lets us use sequence unpadding for variable-length inputs with Flash Attention 2. At medium-document sequence lengths this is a 1.7x-8.3x speedup over fp32+SDPA depending on model size (see Speed for the full benchmark):

Model                                  Backbone                     Hidden size  Layers  Params (head incl.)
cross-encoder/ettin-reranker-17m-v1    jhu-clsp/ettin-encoder-17m   256          7       17.6M
cross-encoder/ettin-reranker-32m-v1    jhu-clsp/ettin-encoder-32m   384
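The sequence unpadding mentioned in this section can be illustrated with a small, model-free sketch. It is a simplification under assumptions, not the library's implementation: the function names and token ids are hypothetical. The idea is that a batch of variable-length inputs is packed into one flat sequence, with an array of cumulative lengths marking where each input starts and ends, so no compute is spent on padding tokens.

```python
from itertools import accumulate
from typing import List, Sequence, Tuple


def unpad(batch: Sequence[Sequence[int]]) -> Tuple[List[int], List[int]]:
    """Pack variable-length token sequences into one flat sequence.

    Returns the flat token list and the cumulative sequence lengths;
    sequence i occupies flat[cu_seqlens[i]:cu_seqlens[i + 1]].
    """
    flat = [token for seq in batch for token in seq]
    cu_seqlens = [0, *accumulate(len(seq) for seq in batch)]
    return flat, cu_seqlens


def padded_token_count(batch: Sequence[Sequence[int]]) -> int:
    # With padding, every sequence is stretched to the longest one.
    return len(batch) * max(len(seq) for seq in batch)


# Hypothetical token ids for three inputs of different lengths.
batch = [
    [101, 7, 8, 102],
    [101, 9, 102],
    [101, 4, 5, 6, 7, 8, 9, 102],
]
flat, cu_seqlens = unpad(batch)
print(len(flat), padded_token_count(batch))
# -> 15 24
print(cu_seqlens)
# -> [0, 4, 7, 15]
```

In this toy batch the packed form processes 15 tokens instead of 24. The gap widens as lengths within a batch become more uneven, which is typical when short queries are paired with documents of very different sizes.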
