Transformers.js developers face a storage problem with cross-origin models

A developer running an automatic speech recognition pipeline on one website cannot reuse a cached model downloaded by a different website, even if both sites use the exact same Hugging Face model.

The cache challenge

Transformers.js lets web developers run inference in the browser. You call the pipeline() function with a task, a model, and options, and it returns a ready-to-use pipeline. A developer might set up an automatic speech recognition task like this:

import { pipeline } from 'https://cdn.jsdelivr.net/npm/@huggingface/transformers@4.2.0';

const asr = await pipeline(
  'automatic-speech-recognition',
  'Xenova/whisper-tiny.en',
  { device: 'webgpu' },
);
const result = await asr('jfk.wav');
console.log(result);
The code specifies Xenova/whisper-tiny.en as the model. This is a standard choice for English speech recognition and is the default model for this task in Transformers.js.
Model resources
When you run the example in a browser, Transformers.js downloads and caches the necessary model resources and WebAssembly (Wasm) files. Chrome DevTools lists these files in the Application panel under Cache storage. On a page reload, the resources are served from the Cache API instead of the network, and the model returns results almost instantly.
However, Xenova/whisper-tiny.en is a popular model that many different apps might use. If you visit a different origin running the same example, the browser downloads and caches all the model resources again, even though they are byte-for-byte identical to the copy it already stored for the first site. In this test, the duplicate download and storage added up to 177 MB, and the waste grows with every additional origin that uses the model.
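To check from the console what DevTools shows under Cache storage, you can enumerate the Cache API directly. The following is a minimal sketch: the function name listCachedResources is hypothetical, and it takes a CacheStorage-like object as a parameter (in a page you would pass the global caches) so that it can also run against a stand-in. It makes no assumption about which cache names Transformers.js uses and simply lists everything.

```javascript
/**
 * List every cached request URL, grouped by cache name.
 * `cacheStorage` should behave like the browser's global `caches`
 * (a CacheStorage): keys() resolves to cache names, and open(name)
 * resolves to a Cache whose keys() resolves to Request objects.
 */
async function listCachedResources(cacheStorage) {
  const result = [];
  for (const cacheName of await cacheStorage.keys()) {
    const cache = await cacheStorage.open(cacheName);
    for (const request of await cache.keys()) {
      result.push({ cacheName, url: request.url });
    }
  }
  return result;
}

// Usage in a page, for example from the DevTools console:
// console.table(await listCachedResources(caches));
```

Running it on each of two origins that use the same model should show the same file URLs listed twice, once per origin.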
Wasm runtime resources
The issue gets worse when you add a second pipeline, such as sentiment analysis. If you do not specify a model for this task, Transformers.js uses Xenova/distilbert-base-uncased-finetuned-sst-2-english by default.
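// Assumes the page has a <pre> element that displays the output.
const pre = document.querySelector('pre');
const classifier = await pipeline('sentiment-analysis');
const sentiment = await classifier(result.text);
pre.append('\n\n' + JSON.stringify(sentiment, null, 2));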
Both models rely on the same 4,733 kB WebAssembly runtime file, ort-wasm-simd-threaded.asyncify.wasm, from the underlying ONNX Runtime library. Open the extended demo on a different origin, and the Network tab shows the Wasm runtime being downloaded and cached again.
Even when apps do not share any AI models, the browser makes redundant requests for shared Wasm resources it already has, and it caches them again, consuming disk space.
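Besides the Network tab, the Resource Timing API offers a way to check programmatically which files a page actually pulled over the network. The sketch below is a heuristic, not an exact test: the function name classifyResourceLoads is hypothetical, and it assumes the entries come from performance.getEntriesByType('resource'). A transferSize of 0 with a non-zero decodedBodySize usually means the HTTP cache answered, while cross-origin responses sent without a Timing-Allow-Origin header report all sizes as 0. Files that code reads back directly from the Cache API do not produce an entry at all.

```javascript
/**
 * Heuristically classify Resource Timing entries whose URL matches `pattern`.
 *  - transferSize > 0: bytes came over the network.
 *  - transferSize === 0 but decodedBodySize > 0: likely the HTTP cache.
 *  - all sizes 0: unknown (for example, a cross-origin response
 *    without a Timing-Allow-Origin header).
 * Do not pass a regular expression with the `g` flag: test() would be stateful.
 */
function classifyResourceLoads(entries, pattern) {
  return entries
    .filter((entry) => pattern.test(entry.name))
    .map((entry) => {
      let source = 'unknown';
      if (entry.transferSize > 0) {
        source = 'network';
      } else if (entry.decodedBodySize > 0) {
        source = 'http-cache';
      }
      return { url: entry.name, source, bytes: entry.transferSize };
    });
}

// Usage in a page, after the pipelines have loaded:
// console.table(classifyResourceLoads(performance.getEntriesByType('resource'), /\.wasm(\?|$)/));
```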
Cache isolation
AI model resources serving
AI model resources come from the Hugging Face Hub and are ultimately served from the Hugging Face CDN. The browser requests a resource like https://huggingface.co/Xenova/distilbert-base-uncased-finetuned-sst-2-english/resolve/main/config.json, which redirects to a final CDN URL like https://huggingface.co/api/resolve-cache/models/Xenova/distilbert-base-uncased-finetuned-sst-2-english/0b6928efcb76139cae2c6881d49cda67fe119f42/config.json?%2FXenova%2Fdistilbert-base-uncased-finetuned-sst-2-english%2Fresolve%2Fmain%2Fconfig.json=&etag=%223c36342ef1f74de2797d667c68c6b7b988d0b87c%22.
Wasm runtime resources serving
The Wasm runtime resources are served from the jsDelivr CDN by default. The file ort-wasm-simd-threaded.asyncify.wasm comes from https://cdn.jsdelivr.net/npm/onnxruntime-web@1.26.0-dev.20260416-b7804b056c/dist/ort-wasm-simd-threaded.asyncify.wasm at the time of writing.
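Both request URLs follow fixed patterns, so every app that asks for the same model file at the same revision, or for the same onnxruntime-web version, builds exactly the same string. The sketch below rebuilds the two URLs quoted above from their parts. The helper names hubResolveUrl and jsdelivrNpmUrl are hypothetical, and the sketch only formats strings; it ignores the redirect the Hub performs afterward.

```javascript
// Hypothetical helpers; they only format strings and make no network requests.

// Hugging Face Hub file URL: https://huggingface.co/<model>/resolve/<revision>/<file>.
// The Hub may answer with a redirect to a CDN URL.
function hubResolveUrl(modelId, filename, revision = 'main') {
  return `https://huggingface.co/${modelId}/resolve/${revision}/${filename}`;
}

// jsDelivr npm URL: https://cdn.jsdelivr.net/npm/<package>@<version>/<path>.
function jsdelivrNpmUrl(pkg, version, path) {
  return `https://cdn.jsdelivr.net/npm/${pkg}@${version}/${path}`;
}

console.log(hubResolveUrl('Xenova/distilbert-base-uncased-finetuned-sst-2-english', 'config.json'));
console.log(
  jsdelivrNpmUrl('onnxruntime-web', '1.26.0-dev.20260416-b7804b056c', 'dist/ort-wasm-simd-threaded.asyncify.wasm'),
);
```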
One might assume that if different apps load resources from the same CDN URLs, the browser cache would let them share a single copy. Browsers have not worked this way for years. They partition the HTTP cache by the site that requests a resource, to prevent timing attacks: without partitioning, a site could measure how quickly a shared resource loads, and a near-instant load would reveal that the browser fetched that resource before, possibly while visiting another site. That leaks information about the user's browsing history. The article Gaining security and privacy by partitioning the cache explains the details.
Chrome’s implementation
Chrome caches resources using a Network Isolation Key in addition to the resource URL. The key is composed of the top-level site and the current-frame site. Consider the toy examples hosted on https://googlechrome.github.io and https://rawcdn.rawgit.net. If both use the Wasm runtime from https://cdn.jsdelivr.net/npm/onnxruntime-web@1.26.0-dev.20260416-b7804b056c/dist/ort-wasm-simd-threaded.asyncify.wasm, their cache keys differ.
| Network Isolation Key: top-level site | Network Isolation Key: current-frame site | Resource URL |
|---|---|---|
| https://googlechrome.github.io | https://googlechrome.github.io | https://cdn.jsdelivr.net/npm/onnxruntime-web@1.26.0-dev.20260416-b7804b056c/dist/ort-wasm-simd-threaded.asyncify.wasm |
| https://rawcdn.rawgit.net | https://rawcdn.rawgit.net | https://cdn.jsdelivr.net/npm/onnxruntime-web@1.26.0-dev.20260416-b7804b056c/dist/ort-wasm-simd-threaded.asyncify.wasm |
Even though the resource URLs are identical, the Network Isolation Keys do not match, so there is no cache hit, and the browser downloads and stores a second copy. This is the problem the Cross-Origin Storage proposal aims to solve.
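The effect of the extra key components can be shown with a toy model. The sketch below is not Chrome's implementation; it is a plain Map keyed by the tuple (top-level site, current-frame site, resource URL), mirroring the table above. The names partitionKey and PartitionedCache are hypothetical.

```javascript
// Toy model only: a Map keyed by (top-level site, current-frame site, URL).
function partitionKey(topLevelSite, frameSite, url) {
  // JSON-encode the tuple so distinct combinations can never collide.
  return JSON.stringify([topLevelSite, frameSite, url]);
}

class PartitionedCache {
  constructor() {
    this.entries = new Map();
  }

  put(topLevelSite, frameSite, url, body) {
    this.entries.set(partitionKey(topLevelSite, frameSite, url), body);
  }

  // Returns the cached body, or undefined on a miss.
  get(topLevelSite, frameSite, url) {
    return this.entries.get(partitionKey(topLevelSite, frameSite, url));
  }
}

const wasmUrl =
  'https://cdn.jsdelivr.net/npm/onnxruntime-web@1.26.0-dev.20260416-b7804b056c/dist/ort-wasm-simd-threaded.asyncify.wasm';
const cache = new PartitionedCache();

// googlechrome.github.io downloads the runtime into its own partition.
cache.put('https://googlechrome.github.io', 'https://googlechrome.github.io', wasmUrl, 'wasm bytes');

// Same partition, same URL: hit.
console.log(cache.get('https://googlechrome.github.io', 'https://googlechrome.github.io', wasmUrl) !== undefined); // true
// Different top-level site, identical URL: miss, so the runtime is downloaded again.
console.log(cache.get('https://rawcdn.rawgit.net', 'https://rawcdn.rawgit.net', wasmUrl) !== undefined); // false
```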
Enter the Cross-Origin Storage API
💡 Note: The Cross-Origin Storage API is an early-stage proposal that isn't final. While the proposed API is not yet natively implemented in any browser, you don't have to wait to experiment with it. Install the Cross-Origin Storage extension to inject the navigator.crossOriginStorage polyfill on all pages and test the complete flow.
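Because the API is only available through the extension's polyfill for now, code that uses it should feature-detect it first and keep a fallback. A minimal sketch, assuming that checking for a requestFileHandle method is a sufficient signal; the function name hasCrossOriginStorage is hypothetical, and the navigator object is a parameter only so the check can run outside a browser.

```javascript
// Returns true when navigator.crossOriginStorage (native or polyfilled)
// exposes requestFileHandle(). Defaults to the global navigator, if any.
function hasCrossOriginStorage(nav = globalThis.navigator) {
  return typeof nav?.crossOriginStorage?.requestFileHandle === 'function';
}

if (hasCrossOriginStorage()) {
  console.log('Cross-Origin Storage is available.');
} else {
  // Fallback is up to the app, for example a plain fetch() plus the Cache API.
  console.log('Cross-Origin Storage is unavailable; using the network.');
}
```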
The proposed Cross-Origin Storage (COS) API introduces a dedicated navigator.crossOriginStorage interface that lets web apps store and retrieve large files across origin boundaries. Files are identified by a cryptographic hash of their contents rather than by URL or origin.
Identifying files by hash is the key. The ort-wasm-simd-threaded.asyncify.wasm runtime downloaded while visiting https://googlechrome.github.io is recognized as identical to the one https://rawcdn.rawgit.net requests, regardless of where either origin fetched it.
const hash = {
  algorithm: 'SHA-256',
  value: '8f434346648f6b96df89dda901c5176b10a6d83961dd3c1ac88b59b2dc327aa4',
};

// Declared outside try/catch so both branches can assign it.
let fileBlob;
try {
  const handle = await navigator.crossOriginStorage.requestFileHandle(hash);
  // Cache hit! Get the file as a Blob and use it directly.
  fileBlob = await handle.getFile();
} catch (err) {
  // Cache miss. Download from the network, then store the file for next time.
  const response = await fetch('https://cdn.jsdelivr.net/.../ort-wasm-simd-threaded.asyncify.wasm');
  if (!response.ok) {
    throw new Error(`Download failed with HTTP ${response.status}`);
  }
  fileBlob = await response.blob();
  const handle = await navigator.crossOriginStorage.requestFileHandle(
    hash,
    { create: true, origins: '*' },
  );
  const writableStream = await handle.createWritable();
  await writableStream.write(fileBlob);
  await writableStream.close();
}
// fileBlob now holds the runtime, whether it came from COS or the network.
If the resource is in COS, requestFileHandle() resolves to a FileSystemFileHandle, and its getFile() method returns a File that you can use directly, since File inherits from Blob. If the resource is not in COS, you fall back to the network and write the resource into COS for the next app that needs it. That next app could be your own or an unrelated one on a different origin.
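The same hit-or-download-and-store flow can be wrapped in a reusable function. The sketch below is an assumption-laden illustration, not part of the proposal: loadViaCrossOriginStorage and createInMemoryCos are hypothetical names, the storage object and fetch function are passed in so the flow can be tried outside a browser, and the in-memory stand-in ignores origins and does not verify that written bytes match the hash.

```javascript
/**
 * Read-through loader: try Cross-Origin Storage first, fall back to the
 * network on a miss, then store the download. `cos` must expose
 * requestFileHandle(hash, options) shaped like navigator.crossOriginStorage.
 */
async function loadViaCrossOriginStorage(cos, hash, url, { origins, fetchFn = globalThis.fetch } = {}) {
  try {
    const handle = await cos.requestFileHandle(hash);
    return await handle.getFile(); // Hit: a File (a Blob) from shared storage.
  } catch {
    // Like the example above, treat any lookup failure as a miss.
  }
  const response = await fetchFn(url);
  if (!response.ok) {
    throw new Error(`Download failed with HTTP ${response.status}: ${url}`);
  }
  const blob = await response.blob();
  // Pass `origins` only if the caller set it, so that leaving it out keeps
  // the default that applies when origins is omitted.
  const options = origins === undefined ? { create: true } : { create: true, origins };
  const handle = await cos.requestFileHandle(hash, options);
  const writable = await handle.createWritable();
  await writable.write(blob);
  await writable.close();
  return blob;
}

/**
 * Hypothetical in-memory stand-in for navigator.crossOriginStorage, only for
 * trying the loader outside a browser. It ignores `origins`, skips any hash
 * verification, and returns Blobs rather than Files.
 */
function createInMemoryCos() {
  const files = new Map();
  const keyOf = (hash) => `${hash.algorithm}:${hash.value}`;
  return {
    async requestFileHandle(hash, { create = false } = {}) {
      const key = keyOf(hash);
      if (!create && !files.has(key)) {
        throw Object.assign(new Error('File not found'), { name: 'NotFoundError' });
      }
      return {
        async getFile() {
          return files.get(key);
        },
        async createWritable() {
          const chunks = [];
          return {
            async write(data) {
              chunks.push(data);
            },
            async close() {
              files.set(key, new Blob(chunks));
            },
          };
        },
      };
    },
  };
}

// In a page with the polyfill installed, you might call:
// const wasm = await loadViaCrossOriginStorage(
//   navigator.crossOriginStorage, hash, wasmUrl, { origins: '*' });
```

With the stand-in, the first call downloads and stores the file, and a second call with the same hash is served from storage without calling fetchFn.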
The API is modeled on FileSystemDirectoryHandle.getFileHandle() from the File System Standard, which the Origin Private File System (OPFS) API uses. The hash parameter plays the role of the name parameter in OPFS, uniquely identifying a resource. The options.create flag behaves similarly: leave it absent or false for read-only access to an existing file, and set it to true when you intend to write.
Control who can read what
Not every resource should be globally shared. COS gives developers precise control over visibility through the origins option when storing a file.
Setting origins: '*' makes a file globally available. Any origin can find it by hash. This suits AI model resources or the Wasm runtime in the Transformers.js example, where every web app benefits from a single cached copy.
Passing a specific list of origins, like origins: ['https://write.example.com', 'https://calculate.example.com'], restricts access to those sites. This works for proprietary resources shared across a company’s properties that should not be discoverable by others, such as a proprietary proofreading AI model used in a commercial office suite.
Omitting origins entirely makes the file available only to same-site origins. This is a sensible default for resources shared across an organization's subdomains.
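The three visibility modes can be summarized as a small decision function. This is a toy model under stated assumptions, not the proposal's access-check algorithm: canRead, sameSite, and the writerOrigin field are hypothetical, and sameSite() naively compares the scheme plus the last two hostname labels, whereas real browsers consult the Public Suffix List (so two different github.io subdomains are not actually same-site).

```javascript
/**
 * ASSUMPTION: naive same-site check (scheme + last two hostname labels).
 * Real browsers use the Public Suffix List instead.
 */
function sameSite(originA, originB) {
  const a = new URL(originA);
  const b = new URL(originB);
  const registrable = (host) => host.split('.').slice(-2).join('.');
  return a.protocol === b.protocol && registrable(a.hostname) === registrable(b.hostname);
}

/**
 * entry.origins: '*', an array of origins, or undefined (omitted).
 * entry.writerOrigin: hypothetical field holding the origin that stored the file.
 */
function canRead(entry, requestingOrigin) {
  if (entry.origins === '*') return true; // Globally shared.
  if (Array.isArray(entry.origins)) {
    // Explicit allowlist: exact origin match only.
    return entry.origins.includes(new URL(requestingOrigin).origin);
  }
  // Omitted: only same-site origins relative to the writer.
  return sameSite(entry.writerOrigin, requestingOrigin);
}

const proofreaderModel = {
  origins: ['https://write.example.com', 'https://calculate.example.com'],
  writerOrigin: 'https://write.example.com',
};
console.log(canRead(proofreaderModel, 'https://calculate.example.com')); // true
console.log(canRead(proofreaderModel, 'https://example.org')); // false
```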
AI Maestro is an independent British AI publication. We test what we recommend, and we write it the way we would say it.

Experimenting with the proposed Cross-Origin Storage API in Transformers.js

