DeepInfra on Hugging Face Inference Providers 🔥
We’re excited to announce that DeepInfra has joined the Hugging Face Hub as a supported Inference Provider. This integration enhances our ecosystem, providing users with more options for serverless inference.
DeepInfra is now seamlessly integrated into both the client SDKs (for JavaScript and Python) and directly on model pages within the Hugging Face Hub. This makes it easier to integrate a wide range of AI models into applications without additional setup.
As part of this integration, DeepInfra offers some of the most cost-effective per-token pricing in the industry. With over 100 models available, developers can add a range of AI capabilities to their projects with minimal effort.
DeepInfra supports a diverse array of model types, including LLMs, text-to-image, text-to-video, and embeddings. At launch, the integration supports conversational tasks with popular open-weight language models, including DeepSeek V4 Pro, Kimi-K2.6, and GLM-5.1.
To learn how to use DeepInfra as an Inference Provider, see the detailed documentation on its dedicated page.
To see the full list of models supported by DeepInfra, please visit this link.
You can follow DeepInfra on Hugging Face at this link.
How it works
In the website UI
- You can set your own API keys for the Inference Providers you’ve signed up with. If no custom key is configured, requests will be routed through Hugging Face.
- Providers are ordered by user preference in both the widget and code snippets on model pages.
- Inference calls can be made either with your own provider API key or as a routed call through Hugging Face. The choice depends on whether you want to use a specific provider's API key or Hugging Face's billing.
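The two calling modes described in the list above can be sketched in a few lines. This is only an illustration, not how the Hub itself decides: `calling_mode` is a hypothetical helper, and it assumes that Hugging Face access tokens start with `hf_` while any other key is a provider-issued key.

```python
def calling_mode(api_key: str) -> dict:
    """Classify a key as a routed (Hugging Face) or direct (provider) call.

    Assumption: Hugging Face access tokens start with "hf_"; any other
    key is treated as a key issued by the provider itself.
    """
    if api_key.startswith("hf_"):
        # Routed call: the request goes through Hugging Face,
        # which also handles the billing.
        return {"mode": "routed", "billed_by": "Hugging Face"}
    # Direct call: the request uses the provider's own key,
    # so the provider bills it.
    return {"mode": "direct", "billed_by": "provider"}
```

A helper like this can be handy for logging which account a request will be charged to before it is sent.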
From the client SDKs
DeepInfra is available through the Hugging Face SDKs: huggingface_hub (>= 1.11.2) for Python and @huggingface/inference for JavaScript.
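As a minimal sketch of the Python SDK route, the snippet below uses `InferenceClient` from `huggingface_hub` with `provider="deepinfra"`. It assumes a recent enough `huggingface_hub` is installed and that `HF_TOKEN` is set in the environment. The helper names `build_messages` and `ask_deepinfra` are hypothetical, and the model ID is borrowed from the examples in this post.

```python
import os

# Model ID taken from the examples in this post; swap in any supported model.
DEFAULT_MODEL = "deepseek-ai/DeepSeek-V4-Pro"


def build_messages(prompt: str) -> list[dict]:
    """Wrap a prompt in the chat-message format used for conversational tasks."""
    return [{"role": "user", "content": prompt}]


def ask_deepinfra(prompt: str, model: str = DEFAULT_MODEL) -> str:
    """Send one chat request to DeepInfra through Hugging Face (needs network)."""
    # Imported lazily so build_messages works without the package installed.
    from huggingface_hub import InferenceClient

    client = InferenceClient(
        provider="deepinfra",            # assumed provider identifier
        api_key=os.environ["HF_TOKEN"],  # Hugging Face token, so the call is routed
    )
    completion = client.chat.completions.create(
        model=model,
        messages=build_messages(prompt),
    )
    return completion.choices[0].message.content
```

With `HF_TOKEN` exported, calling `ask_deepinfra("Explain memoization in one sentence.")` should return the model's reply as a string.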
From your favorite Agent Harnesses
Hugging Face Inference Providers are integrated into various Agent Harnesses, including Pi, OpenCode, Hermes Agents, OpenClaw, and others. This allows you to use DeepInfra-hosted models seamlessly within your preferred tools without needing additional glue code.
From Python
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://router.huggingface.co/v1",
    api_key=os.environ["HF_TOKEN"],
)

completion = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-V4-Pro:deepinfra",
    messages=[
        {
            "role": "user",
            "content": "Write a Python function that returns the nth Fibonacci number using memoization.",
        }
    ],
)

print(completion.choices[0].message)
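To make the request shape explicit, here is a rough standard-library sketch of what an OpenAI-compatible client sends to the router: a POST to the `/chat/completions` path under the base URL, with the provider selected by the `:deepinfra` suffix on the model ID. The helpers `build_chat_request` and `send` are hypothetical names, and the response parsing assumes the usual OpenAI-style chat completion body.

```python
import json
import os
import urllib.request

# Base URL from the example above plus the OpenAI-style chat completions path.
ROUTER_URL = "https://router.huggingface.co/v1/chat/completions"


def build_chat_request(prompt, model="deepseek-ai/DeepSeek-V4-Pro",
                       provider="deepinfra", token=None):
    """Build (but do not send) a chat completion request for the router."""
    token = token or os.environ.get("HF_TOKEN", "")
    payload = {
        # The ":provider" suffix selects which Inference Provider serves the model.
        "model": f"{model}:{provider}",
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        ROUTER_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request):
    """Send the request and return the assistant's reply text (needs network)."""
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    # Assumes an OpenAI-style response: choices[0].message.content
    return body["choices"][0]["message"]["content"]
```

Separating the builder from `send` keeps the payload easy to inspect or log without making a network call.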
From JS
import { OpenAI } from "openai";

const client = new OpenAI({
  baseURL: "https://router.huggingface.co/v1",
  apiKey: process.env.HF_TOKEN,
});

// create() takes a single options object, so the arguments are wrapped in braces.
const chatCompletion = await client.chat.completions.create({
  model: "deepseek-ai/DeepSeek-V4-Pro:deepinfra",
  messages: [
    {
      role: "user",
      content: "Write a Python function that returns the nth Fibonacci number using memoization.",
    },
  ],
});

console.log(chatCompletion.choices[0].message);
Billing
For direct requests, you are billed by the corresponding Inference Provider. For routed requests authenticated via Hugging Face, you only pay the standard API rates for the provider.
We provide free inference with a small quota for signed-in users, but we recommend upgrading to our PRO plan for more benefits like additional credits and higher limits.
Feedback and next steps
We would love your feedback on this integration! Please share your thoughts or comments via the discussion space: HuggingDiscussions.
Key Takeaways
- DeepInfra has joined the Hugging Face Hub as a supported Inference Provider.
- The integration allows seamless use of various AI models via different SDKs and in-browser tools.
- This provides cost-effective options for developers looking to incorporate AI capabilities into their applications.