Yandex Open-Sources YaFF: A Zero-Copy Wire Format for Protobuf With Near-Struct Read Speed

Why this matters for makers and engineers For developers building high-throughput backends, YaFF offers a practical escape from the CPU tax imposed…

By AI Maestro June 20, 2026 5 min read
Yandex Open-Sources YaFF: A Zero-Copy Wire Format for Protobuf With Near-Struct Read Speed

Why this matters for makers and engineers

For developers building high-throughput backends, YaFF offers a practical escape from the CPU tax imposed by standard Protobuf parsing. By introducing a zero-copy wire format that retains full Protobuf semantics, it allows you to access hot data with near-raw struct speeds while keeping your schema definitions unchanged. This means you can offload the parsing step entirely for read-heavy paths, introducing the format incrementally without rewriting your entire codebase.

What is YaFF?

Yandex has released YaFF, known internally as Yet Another Flat Format, under an Apache 2.0 license. It is a high-performance C++ library designed to sit alongside the existing Protobuf ecosystem. The core philosophy is simple: your .proto files remain the single source of truth; only the physical memory layout of the data changes. While the library focuses on server-side runtimes, it does not replace Protobuf. Instead, it acts as an alternative wire format that generates a compatible C++ API. Because reads bypass the parsing loop, fields are retrieved directly from the buffer. For code paths that do not require maximum speed, the library supports converting these raw buffers back into standard Protobuf messages, enabling a gradual migration strategy where you can swap the format in specific hot paths while leaving the rest of the system untouched.

The bottleneck it solves

In high-load backend environments, the act of parsing Protobuf messages can consume a significant portion of available CPU cycles, often reaching double-digit percentages. At scale, this inefficiency translates to the need for thousands of physical cores to maintain throughput. The industry standard for zero-copy alternatives is FlatBuffers, also from Google. However, FlatBuffers is not a drop-in replacement for Protobuf; it requires maintaining a separate schema and a dedicated conversion layer. The two formats are semantically incompatible, meaning migration involves duplicated schemas, conflicting evolution rules, and hand-written field converters. Many engineering teams have concluded that the operational cost outweighs the benefits. YaFF targets this specific gap by offering zero-copy reads while preserving Protobuf semantics.

Understanding the layouts

A layout defines how a message is physically stored within a memory buffer. Changing the layout alters only the physical representation, leaving the schema and generated interfaces identical. YaFF provides four distinct layouts, each trading off read speed against schema flexibility. The Fixed layout is a plain packed struct with no header and a frozen schema, ideal for small, inlined primitives. The Flat layout adds a two-byte header to support schema evolution, making it suitable for dense, hot data. The Sparse layout addresses fields via a meta table, fitting schemas that are sparse or evolve freely. The Dynamic layout is the default; it automatically selects between Flat and Sparse at runtime, using Flat when the schema permits and switching to Sparse when evolution breaks flat alignment.

LayoutRead accessPer-message overheadSchema evolutionBest for
Fixed1 read, 0 branches0 bytesFrozenSmall inlined primitives
Flat2 reads, 1 branch2 bytesRestricted (type preservation)Dense, hot data
Sparse4 reads, 2 branches6 bytesUnrestrictedSparse schemas, free evolution
Dynamic (default)Flat or Sparse at runtime2 or 6 bytesUnrestrictedGeneral application logic

Performance benchmarks

Yandex has provided a reproducible benchmark suite built with google/benchmark in a Release configuration. The following figures represent median nanoseconds per read on an AMD EPYC 7713 processor using Clang 20.1.8. Lower numbers indicate faster performance. In a hot hierarchical read scenario, the Flat Layout achieves a read time of 9.79 ns. By comparison, FlatBuffers requires 37.30 ns, and standard Protobuf requires 219.35 ns. The baseline for a raw C++ struct is 8.14 ns. Consequently, the Flat Layout reads approximately 3.8 times faster than FlatBuffers and roughly 22 times faster than Protobuf, staying within 1.2 times of the raw struct performance.

FormatRead time (ns)Slowdown vs raw struct
Raw C++ struct8.141.0×
YaFF Flat Layout9.791.2×
YaFF Sparse Layout21.232.6×
FlatBuffers37.304.6×
Protobuf219.3526.9×
Median ns per read, hierarchical / hot / no chain caching. Source: https://yaff.tech/docs/en/benchmarks/access

It is important to note that absolute timings depend on the host CPU and memory architecture, though the performance ratios between formats are expected to hold consistent across different hardware.

Compiler aliasing and code reuse

Both FlatBuffers and YaFF read fields by reinterpreting raw memory as the target type. This type-punning leaves the compiler without strong enough facts for Type-Based Alias Analysis (TBAA), causing LLVM’s alias analysis to fall back to a conservative MayAlias verdict. As a result, the compiler cannot prove that repeated accesses are safe to reuse, forcing the generation of code that re-walks the tree for calls like root.intermediate().leaf().a() even if the data has not changed. YaFF addresses this by adding specific annotations in its generated code that inform the compiler when reuse is safe. As long as memory is not modified between reads, YaFF caches the access chain internally, allowing the compiler to optimise away redundant traversals.

Where it fits: Use cases

YaFF is best suited for systems where you control both the producer and the consumer of the data. Recommendation engines and ad-serving backends are the clearest fit. Yandex reports that YaFF is currently running in their advertising recommendation system, where it delivers 10–20% CPU savings at production scale. Memory-mapped indexes are another strong use case; a host can hold tens of gigabytes of local data, and these mmap-able indexes survive service restarts without needing to re-parse. Search indexes, feature stores, and feed services share this read-heavy profile. Looking ahead, a planned Columnar Layout will target analytics and ML pipelines involving large repeated fields. Furthermore, YaFF can be more compact than FlatBuffers, which improves cache behaviour.

Getting started with the code

The read path mirrors Protobuf but omits the parsing step entirely. Integration is straightforward via CMake or Conan. The build process runs protobuf_generate() followed by yaff_generate(). The generated YaFF types reside in the protoyaff::<package> namespace. Most projects will only need to link against yaff::core and yaff::proto.

#include "feed.pb.h"     // generated by protoc
#include "feed.yaff.h"   // generated by yaff_generate()

// 1. Serialize an existing Protobuf message into a YaFF buffer.
feed::FeedResponse proto = LoadFeedResponse();
const auto buffer = yaff::Serialize<protoyaff::feed::FeedResponse>(proto);

// 2. Read fields directly from the buffer. There is no parsing step.
const auto& response = yaff::ReadMessage<protoyaff::feed::FeedResponse>(buffer.Data());
for (const auto& item : response.items()) {
    std::string_view title  = item.title();
    std::string_view author = item.author().name();  // empty if author is unset
}

// 3. Convert back to Protobuf when a consumer needs the parsed message.
feed::FeedResponse restored;
response.ParseTo(restored);

Key takeaways

  • YaFF allows engineers to achieve near-raw struct read speeds (within 1.2×) for Protobuf data, offering a 3.8× speedup over FlatBuffers in hot hierarchical scenarios.
  • The library preserves the Protobuf workflow, meaning your .proto files remain the source of truth while enabling zero-copy reads without a full system rewrite.
  • Adoption is designed to be incremental, allowing teams to drop the format into specific hot paths while using two-way conversion at the edges.
  • Real-world testing in Yandex’s ad-recommendation system shows CPU savings of 10–20% at production scale, particularly beneficial for read-heavy services like feed generators and search indexes.

Stay ahead of AI. Get the most important stories delivered to your inbox — no spam, no noise.

Name
Scroll to Top