Transformers.js developers face a storage problem with cross-origin models
A developer running an automatic speech recognition pipeline on one website cannot reuse a cached model downloaded by a different website, even if both sites use the exact same Hugging Face model.
The cache challenge
Transformers.js lets web developers run inference in the browser by calling the `pipeline()` function. A developer might set up an automatic speech recognition task like this:
```javascript
import { pipeline } from 'https://cdn.jsdelivr.net/npm/@huggingface/transformers@4.2.0';

const asr = await pipeline(
  'automatic-speech-recognition',
  'Xenova/whisper-tiny.en',
  { device: 'webgpu' },
);
const result = await asr('jfk.wav');
console.log(result);
```
The code specifies `Xenova/whisper-tiny.en` as the model. This is a standard choice for English speech recognition and is the default model for this task in Transformers.js.
Model resources
When you run the example in a browser, Transformers.js downloads and caches the necessary model resources and WebAssembly (Wasm) files. Chrome DevTools shows these files in the Cache storage section. On a page reload, the browser serves the resources from the Cache API, and the model returns results almost instantly.
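To check the same thing from the console instead of the DevTools panel, a sketch along these lines lists what the Cache API holds. It assumes Transformers.js keeps its files in a cache named `transformers-cache`; verify the name in DevTools for your version. The `listCachedUrls` helper and its injectable `cacheStorage` parameter are illustrative and not part of the library.

```javascript
// List the URLs stored in one Cache API cache.
// ASSUMPTION: 'transformers-cache' is the cache name; confirm it in DevTools.
async function listCachedUrls(
  cacheName = 'transformers-cache',
  cacheStorage = globalThis.caches,
) {
  // The Cache API is unavailable outside secure contexts and outside browsers.
  if (!cacheStorage) return [];
  // Avoid creating an empty cache as a side effect of open().
  if (!(await cacheStorage.has(cacheName))) return [];
  const cache = await cacheStorage.open(cacheName);
  const requests = await cache.keys();
  return requests.map((request) => request.url);
}
```

Running `console.log(await listCachedUrls())` in the console on two different origins should show two separate lists, since each origin has its own Cache API storage.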
However, `Xenova/whisper-tiny.en` is a popular model. Many different apps might use it. If you visit a different origin running the same example, the browser must download and cache all the model resources again, even if they are byte-for-byte identical to what another site already has. In this test, the duplicate download and storage added up to 177 MB. This waste grows with every additional origin that uses the same model.
Wasm runtime resources
The issue worsens when you add a second pipeline, such as sentiment analysis. This task uses the `Xenova/distilbert-base-uncased-finetuned-sst-2-english` model by default. Transformers.js selects this automatically if you do not specify a model.
```javascript
// No model is specified, so Transformers.js falls back to the task default.
const classifier = await pipeline('sentiment-analysis');
const sentiment = await classifier(result.text);
// `pre` is presumably the demo page's <pre> output element.
pre.append('\n\n' + JSON.stringify(sentiment, null, 2));
```
The two different AI models rely on the same 4,733 kB `ort-wasm-simd-threaded.asyncify.wasm` WebAssembly runtime file from the underlying ONNX Runtime library. Open the extended demo on a different origin, and you will see in the Network tab that the Wasm runtime downloads and caches again.
Even if apps do not share the same AI models, your browser makes redundant requests for shared Wasm resources you already have. The browser also caches them again, consuming space on your hard disk.
Cache isolation
AI model resources serving
AI model resources are served from the Hugging Face Hub and, ultimately, the Hugging Face CDN. The browser requests a resource like `https://huggingface.co/Xenova/distilbert-base-uncased-finetuned-sst-2-english/resolve/main/config.json`. This redirects to a final CDN URL like `https://huggingface.co/api/resolve-cache/models/Xenova/distilbert-base-uncased-finetuned-sst-2-english/0b6928efcb76139cae2c6881d49cda67fe119f42/config.json?%2FXenova%2Fdistilbert-base-uncased-finetuned-sst-2-english%2Fresolve%2Fmain%2Fconfig.json=&etag=%223c36342ef1f74de2797d667c68c6b7b988d0b87c%22`.
Wasm runtime resources serving
The Wasm runtime resources are served from the jsDelivr CDN by default. At the time of writing, the file `ort-wasm-simd-threaded.asyncify.wasm` comes from `https://cdn.jsdelivr.net/npm/onnxruntime-web@1.26.0-dev.20260416-b7804b056c/dist/ort-wasm-simd-threaded.asyncify.wasm`.
One might assume that if different apps load resources from the same CDN URLs, caching should work. Browsers have not worked this way for a long time. Caches are isolated by origin to prevent timing attacks. The time a website takes to respond to HTTP requests can reveal that the browser has accessed the same resource in the past, creating security and privacy leaks. The article "Gaining security and privacy by partitioning the cache" explains the details.
Chrome’s implementation
Chrome caches resources using a Network Isolation Key in addition to the resource URL. The key is composed of the top-level site and the current-frame site. Consider the toy examples hosted on `https://googlechrome.github.io` and `https://rawcdn.rawgit.net`. If both use the Wasm runtime from `https://cdn.jsdelivr.net/npm/onnxruntime-web@1.26.0-dev.20260416-b7804b056c/dist/ort-wasm-simd-threaded.asyncify.wasm`, their cache keys differ.
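| Network Isolation Key: top-level site | Network Isolation Key: current-frame site | Resource URL |
|---|---|---|
| `https://googlechrome.github.io` | `https://googlechrome.github.io` | `https://cdn.jsdelivr.net/npm/onnxruntime-web@1.26.0-dev.20260416-b7804b056c/dist/ort-wasm-simd-threaded.asyncify.wasm` |
| `https://rawcdn.rawgit.net` | `https://rawcdn.rawgit.net` | `https://cdn.jsdelivr.net/npm/onnxruntime-web@1.26.0-dev.20260416-b7804b056c/dist/ort-wasm-simd-threaded.asyncify.wasm` |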
Even if the resource URLs are identical, the Network Isolation Keys do not match. There is no cache hit, leading to duplicate downloads and storage. This is the problem the Cross-Origin Storage proposal solves.
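The mismatch can be sketched as a toy model. The `cacheKey` function is purely illustrative and does not reproduce Chrome's internal key format; it only shows that a lookup keyed on all three values misses when the first two differ.

```javascript
// Toy model of a partitioned cache key:
// (top-level site, current-frame site, resource URL).
// Illustrative only; this is not Chrome's actual key format.
function cacheKey(topLevelSite, currentFrameSite, resourceUrl) {
  return JSON.stringify([topLevelSite, currentFrameSite, resourceUrl]);
}

const wasmUrl =
  'https://cdn.jsdelivr.net/npm/onnxruntime-web@1.26.0-dev.20260416-b7804b056c/dist/ort-wasm-simd-threaded.asyncify.wasm';

const keyA = cacheKey(
  'https://googlechrome.github.io',
  'https://googlechrome.github.io',
  wasmUrl,
);
const keyB = cacheKey(
  'https://rawcdn.rawgit.net',
  'https://rawcdn.rawgit.net',
  wasmUrl,
);

// Same resource URL, different partition, so no cache hit.
console.log(keyA === keyB); // false
```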
Enter the Cross-Origin Storage API
💡 Note: The Cross-Origin Storage API is an early-stage proposal that isn’t final. While the proposed API is not yet natively implemented in any browser, you don’t have to wait to experiment with it. Install the Cross-Origin Storage extension to inject the `navigator.crossOriginStorage` polyfill on all pages and test the complete flow.
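Because the API may be absent, code that relies on it would typically feature-detect first and fall back to regular fetching otherwise. A minimal sketch, assuming the polyfill exposes the same `navigator.crossOriginStorage` property as the proposal; the helper name `hasCrossOriginStorage` is hypothetical:

```javascript
// Returns true when the proposed API (or the extension's polyfill)
// is exposed on the given navigator-like object.
function hasCrossOriginStorage(nav = globalThis.navigator) {
  return Boolean(nav && 'crossOriginStorage' in nav);
}
```

Passing the navigator object as a parameter keeps the check easy to exercise with a stub when no polyfill is installed.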
The proposed Cross-Origin Storage (COS) API introduces a dedicated navigator.crossOriginStorage interface. Web apps can store and retrieve large files across origin boundaries. Identification uses a cryptographic hash rather than a URL or origin.
Identifying files by hash is the key. The ort-wasm-simd-threaded.asyncify.wasm runtime downloaded while visiting https://googlechrome.github.io is recognized as identical to the one https://rawcdn.rawgit.net requests, regardless of where either origin fetched it.
```javascript
const hash = {
  algorithm: 'SHA-256',
  value: '8f434346648f6b96df89dda901c5176b10a6d83961dd3c1ac88b59b2dc327aa4',
};

// Declared outside try/catch so the blob is usable afterwards.
let fileBlob;
try {
  const handle = await navigator.crossOriginStorage.requestFileHandle(hash);
  // Cache hit! Get the file as a Blob and use it directly.
  fileBlob = await handle.getFile();
} catch (err) {
  // Cache miss. Download from network, then store for next time.
  fileBlob = await fetch('https://cdn.jsdelivr.net/.../ort-wasm-simd-threaded.asyncify.wasm')
    .then(r => r.blob());
  const handle = await navigator.crossOriginStorage.requestFileHandle(
    hash,
    { create: true, origins: '*' },
  );
  const writableStream = await handle.createWritable();
  await writableStream.write(fileBlob);
  await writableStream.close();
}
```
If the resource is in COS, you get back a `FileSystemFileHandle` to read the blob directly via `getFile()`. The resulting `File` inherits from `Blob`. If the resource is not in COS, you fall back to the network and write the resource into COS for the next app that needs it. That next app could be your own or an unrelated one on a different origin.
The API is modeled on the File System Standard’s `FileSystemDirectoryHandle.getFileHandle()` from the Origin Private File System (OPFS) API. The `hash` parameter acts like the `name` parameter in OPFS, uniquely identifying a resource. The `options.create` flag behaves similarly: absent or `false` for read-only access, `true` when you intend to write.
Control who can read what
Not every resource should be globally shared. COS gives developers precise control over visibility through the `origins` option when storing a file.
- Setting `origins: '*'` makes a file globally available. Any origin can find it by hash. This suits AI model resources or the Wasm runtime in the Transformers.js example, where every web app benefits from a single cached copy.
- Passing a specific list of origins, like `origins: ['https://write.example.com', 'https://calculate.example.com']`, restricts access to those sites. This works for proprietary resources shared across a company’s properties that should not be discoverable by others, such as a proprietary proofreading AI model used in a commercial office suite.
- Omitting `origins` entirely makes the file available only to same-site origins. This is a sensible default for resources shared across an organization’s subdomains.
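The three cases above can be sketched as a small helper that builds the options object before writing. This assumes the proposal's `requestFileHandle(hash, options)` shape as shown in this article; the helper names and the `'public'`, `'listed'`, and `'same-site'` labels are hypothetical and exist only for this illustration.

```javascript
// Map a visibility choice to the `origins` option described above.
// The visibility labels are hypothetical names used only in this sketch.
function storeOptions(visibility, allowedOrigins = []) {
  switch (visibility) {
    case 'public':
      return { create: true, origins: '*' };
    case 'listed':
      if (allowedOrigins.length === 0) {
        throw new TypeError('Provide at least one origin');
      }
      return { create: true, origins: [...allowedOrigins] };
    case 'same-site':
      return { create: true }; // `origins` omitted on purpose
    default:
      throw new RangeError(`Unknown visibility: ${visibility}`);
  }
}

// Write `blob` under `hash` with the chosen visibility.
// `storage` stands in for navigator.crossOriginStorage (or the polyfill).
async function storeFile(storage, hash, blob, visibility, allowedOrigins) {
  const options = storeOptions(visibility, allowedOrigins);
  const handle = await storage.requestFileHandle(hash, options);
  const writable = await handle.createWritable();
  await writable.write(blob);
  await writable.close();
}
```

With the extension installed, a call might look like `await storeFile(navigator.crossOriginStorage, hash, fileBlob, 'public')`.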




