The `hf` command-line interface is the official terminal entrypoint to the Hugging Face Hub. Any operation available through the Python SDK (managing models, datasets, and Spaces; handling repositories, branches, and pull requests; executing Jobs; or configuring Buckets and Inference Endpoints) is accessible directly from your shell. While originally designed for human developers, the tool is now increasingly driven by coding agents like Claude Code, Codex, and Cursor. We recently rebuilt the CLI to serve both audiences simultaneously. Our findings indicate that for complex, multi-step workflows, agents relying on raw `curl` or the Python SDK consume up to six times more tokens than those using the `hf` CLI.
Tracking AI agent traffic on the Hub
We began monitoring agent usage in April 2026. The `hf` CLI and its underlying `huggingface_hub` SDK detect when a coding agent is in control by reading specific environment variables. These include `CLAUDECODE` or `CLAUDE_CODE` for Claude Code, `CODEX_SANDBOX` for Codex, plus flags for Cursor, Gemini, and Pi, alongside the universal `AI_AGENT` tag. This signal serves a dual purpose: it adapts the CLI's output format and tags every Hub request with an `agent/<name>` user-agent header, allowing us to attribute traffic accurately.
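The detection step described above can be sketched in a few lines. This is a minimal illustration, not the `huggingface_hub` implementation: the variable names come from this article, while the labels `claude-code` and `codex`, the helper names `detect_agent` and `user_agent_tag`, and the idea that `AI_AGENT` carries the agent's name as its value are all assumptions made for the example. The variables for Cursor, Gemini, and Pi are omitted because their names are not given here.

```python
import os
from typing import Mapping, Optional

# Variable names are taken from the article; the labels are hypothetical.
_AGENT_ENV_VARS = {
    "CLAUDECODE": "claude-code",
    "CLAUDE_CODE": "claude-code",
    "CODEX_SANDBOX": "codex",
}


def detect_agent(env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return a label for the detected agent, or None if no agent is detected."""
    env = os.environ if env is None else env
    for var, label in _AGENT_ENV_VARS.items():
        if env.get(var):  # set and non-empty
            return label
    # Assumption: the generic AI_AGENT variable holds the agent's name.
    generic = (env.get("AI_AGENT") or "").strip().lower()
    return generic or None


def user_agent_tag(env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Build an `agent/<name>` tag for the user-agent header, if an agent is present."""
    label = detect_agent(env)
    return f"agent/{label}" if label else None
```

Passing the environment as a mapping keeps the check easy to test, for example `user_agent_tag({"CODEX_SANDBOX": "1"})`.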
Claude Code and Codex currently lead in distinct user count, significantly outpacing other agents. Claude Code alone accounts for roughly 40,000 users and nearly 49 million requests, with Codex trailing closely. Although these are early figures from our April 2026 tracking start, the volume is already substantial. We anticipate growth as coding agents become the standard method for interacting with the Hub.
Optimising for humans and agents
Humans and coding agents require fundamentally different outputs for identical commands. Humans expect rich terminal rendering: ANSI colours, truncated tables fitting the screen width, success indicators like `✔`, and prose hints. Agents require the opposite: no ANSI codes, zero truncation, and full data density to minimise token usage. Agents also cannot answer interactive prompts, and they will happily re-run commands after timeouts. The `hf` CLI now adapts to these needs: agent-mode output was introduced in version 1.9.0, and other features are being migrated gradually.
One command, multiple renderings
When `hf` auto-detects agent usage via environment variables, it renders the same command differently, optimising the format for the user type without requiring a flag:
- Human output (default): An aligned table, truncated to fit the terminal, accompanied by a hint. It uses colour cues for status, such as a green `✓` for success.
- Agent output (auto-detected): A complete record in TSV format. It includes full repository IDs, ISO timestamps, and every tag. Nothing is truncated, and there are no ANSI codes, making it clean for parsing and light on tokens.
We have implemented logging methods such as `.table(...)`, `.result(...)`, and `.json()` that handle formatting based on raw data. Beyond human and agent modes, we added `--json` and `--quiet` options to facilitate piping. While the default mode is context-aware, users can force a specific format using `--format human | agent | json | quiet`.
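To make the four formats concrete, here is a small sketch of rendering one set of records in each mode. It is not the CLI's internal `.table(...)` implementation; the `render` function, its parameters, and the truncation rule are assumptions chosen only to show the contrast between a width-limited table and untruncated TSV.

```python
import json
from typing import Dict, List


def render(rows: List[Dict[str, str]], columns: List[str],
           mode: str = "human", width: int = 40) -> str:
    """Render the same records as a table, TSV, JSON, or bare IDs."""
    if mode == "json":
        return json.dumps(rows)
    if mode == "quiet":
        # One value per line (first column), convenient for piping.
        return "\n".join(str(row[columns[0]]) for row in rows)
    if mode == "agent":
        # Full values, tab-separated, no truncation and no ANSI codes.
        lines = ["\t".join(columns)]
        lines += ["\t".join(str(row[c]) for c in columns) for row in rows]
        return "\n".join(lines)
    if mode == "human":
        # Aligned columns; each cell is clipped so the row fits `width`.
        cell = max(4, width // len(columns) - 1)

        def clip(value: object) -> str:
            text = str(value)
            return text if len(text) <= cell else text[: cell - 3] + "..."

        lines = [" ".join(clip(c).ljust(cell) for c in columns)]
        lines += [" ".join(clip(row[c]).ljust(cell) for c in columns) for row in rows]
        return "\n".join(line.rstrip() for line in lines)
    raise ValueError(f"unknown mode: {mode}")
```

With a 40-column terminal, a long repository ID is clipped in `human` mode but survives intact in `agent` mode, which is the property an agent needs to reuse the ID in a later command.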
Next-command hints
CLI commands rarely run in isolation; one step usually implies the next. Many `hf` commands now conclude with a hint: the exact next command, pre-filled with the IDs just used. This allows users or agents to chain steps without deriving parameters from scratch. Starting a Job in the background points to its logs; creating a Space indicates its boot status.
For example, running a detached Job provides a hint to fetch logs using the generated ID. Errors behave similarly, suggesting the fix rather than simply failing. For instance, a missing authentication prompts the user to run `hf auth login`.
These hints, warnings, and errors are sent to `stderr`, while data goes to `stdout`. This ensures guidance does not pollute the output stream that agents are parsing.
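The stream separation described above can be illustrated with a short sketch. The `emit` helper and the `Hint:` prefix are hypothetical and are not taken from the CLI's source; the point is only that data and guidance travel on different streams.

```python
import sys
from typing import Optional, TextIO


def emit(data: str, hint: Optional[str] = None,
         out: Optional[TextIO] = None, err: Optional[TextIO] = None) -> None:
    """Write data to stdout and any guidance to stderr."""
    out = sys.stdout if out is None else out
    err = sys.stderr if err is None else err
    print(data, file=out)  # parseable payload only
    if hint:
        print(f"Hint: {hint}", file=err)  # guidance stays out of the data stream
```

A caller that captures only `stdout` receives the data and nothing else, so a hint such as "run `hf auth login`" can never corrupt a TSV or JSON payload.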
Non-blocking and safe to retry
The `hf` CLI never waits on an interactive prompt that an agent cannot answer. Destructive commands still require human confirmation, but in agent mode they fail fast with a suggested fix, such as `Use --yes to skip confirmation.` The `-y` or `--yes` flag bypasses this check. Furthermore, operations are designed to be safe to repeat if an agent retries on timeout or lost context. Commands like `hf repos create --exist-ok` act as no-ops if the repository already exists, and re-running an upload cleanly re-commits. Data-moving commands support a `--dry-run` flag to preview transfers before execution, preventing blind syncs or unnecessary downloads.
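The two behaviours above, failing fast instead of prompting and treating a repeated create as a no-op, can be sketched as follows. This is an in-memory illustration under stated assumptions: `confirm_destructive`, `ConfirmationRequired`, and this `create_repo` are hypothetical names, and no Hub request is made.

```python
from typing import Callable, Set


class ConfirmationRequired(Exception):
    """Raised in agent mode when a destructive action lacks --yes."""


def confirm_destructive(action: str, *, yes: bool, agent_mode: bool,
                        ask: Callable[[str], str] = input) -> bool:
    """Return True if the action may proceed; never block when an agent is driving."""
    if yes:
        return True
    if agent_mode:
        # Fail fast with the fix instead of waiting on a prompt.
        raise ConfirmationRequired(
            f"{action} requires confirmation. Use --yes to skip confirmation."
        )
    return ask(f"{action}? [y/N] ").strip().lower() in {"y", "yes"}


def create_repo(existing: Set[str], repo_id: str, *, exist_ok: bool = False) -> bool:
    """Return True if the repo was created, False if it already existed (no-op)."""
    if repo_id in existing:
        if exist_ok:
            return False
        raise FileExistsError(f"{repo_id} already exists")
    existing.add(repo_id)
    return True
```

Because the second call to `create_repo(..., exist_ok=True)` changes nothing, an agent that retries after a timeout ends up in the same state as one that succeeded the first time.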
Discoverable, predictable commands
The `hf` CLI is built to be probed. Running `hf` displays resource groups, and `--help` on any command provides real, copy-pasteable examples that agents can match against faster than parsing descriptions. The command tree is consistent, using resource-plus-verb structures with obvious aliases (e.g., `hf models ls`, `hf repos create`, `hf jobs ps`). This consistency allows agents to generalise once they learn one command. Output is also composable: `-q` prints one ID per line for piping, while `--json` produces output suitable for `jq`.
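For a script that consumes this output rather than a shell pipeline, the two composable formats can be handled as in the sketch below. The helper names are hypothetical, and the JSON shape (an array of objects) is an assumption for illustration; the actual schema is not specified in this article.

```python
import json
from typing import Any, List


def ids_from_quiet(stdout: str) -> List[str]:
    """Parse `-q` style output: one ID per line, blank lines ignored."""
    return [line.strip() for line in stdout.splitlines() if line.strip()]


def field_from_json(stdout: str, key: str) -> List[Any]:
    """Extract one field from `--json` style output.

    Assumption: the payload is a JSON array of objects.
    """
    records = json.loads(stdout)
    return [record[key] for record in records if key in record]
```

In practice the text would come from the captured standard output of a command such as `hf models ls -q`, for example via `subprocess.run(..., capture_output=True, text=True).stdout`.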
Benchmarking the hf CLI for Coding Agents
To verify efficiency, we constructed an evaluation harness running identical Hub tasks through different interfaces. The headline result is clear: the `hf` CLI outperforms other methods, particularly on complex, multi-step tasks where token usage drops significantly.
| agent | tool | success score | token usage | self-report error |
|---|---|---|---|---|
| Claude Code (Sonnet 4.6) | hf CLI | 0.94 | baseline | 2 / 163 |
| Claude Code (Sonnet 4.6) | curl / Python SDK | 0.84 | 1.3-1.6× tokens | 11 / 163 |
| Codex (GPT-5.5) | hf CLI | 0.93 | baseline | 3 / 163 |
| Codex (GPT-5.5) | curl / Python SDK | 0.92 | 1.6-1.8× tokens | 10 / 163 |
“Self-report error” indicates cases where the agent claimed success but the Hub reported failure. The `hf` CLI rows represent the CLI with its agent skill installed. The reduction in tool calls provided by this skill is detailed in the skill section below. Representative transcripts are available in our public bucket.
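The self-report error counts in the table are easier to compare as rates. The short calculation below uses only the numerators and the denominator of 163 shown in the table; the dictionary layout and variable names are illustrative.

```python
TOTAL = 163  # denominator reported in the table

# (agent, tool) -> self-report error count, copied from the table
self_report_errors = {
    ("Claude Code", "hf CLI"): 2,
    ("Claude Code", "curl / Python SDK"): 11,
    ("Codex", "hf CLI"): 3,
    ("Codex", "curl / Python SDK"): 10,
}

# Percentage of cases where the agent claimed success but the Hub reported failure
rates = {key: round(100 * count / TOTAL, 1) for key, count in self_report_errors.items()}

for (agent, tool), rate in rates.items():
    print(f"{agent:12s} {tool:18s} {rate:.1f}%")
```

This gives roughly 1.2% versus 6.7% for Claude Code and 1.8% versus 6.1% for Codex, so the CLI rows show a several-fold lower rate of false success claims for both agents.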
The setup
We defined 18 non-trivial Hub tasks, moving beyond simple file downloads to realistic workflows: aggregating models from a trending organisation, inspecting repository files and sizes, uploading folders with include/exclude rules, deleting files, copying files across repos, opening PRs with licenses, creating repos with branches and tags, syncing and pruning buckets, and building collections. Each task was assigned to a fresh coding agent with exactly one way to interact with the Hub:
- The `hf` CLI
- Raw `curl` or the Python SDK: no `hf` CLI usage, forcing the agent to use `curl` against the REST API or the `huggingface_hub` library.
We executed the `hf` CLI against these tasks to measure performance and token efficiency.
Key takeaways
- The `hf` CLI reduces token consumption by up to six times compared to raw `curl` or SDK usage for complex, multi-step agent tasks.
- The CLI automatically detects agent usage via environment variables and switches to a structured, non-truncated output format to save tokens.
- Features like next-command hints, `--dry-run`, and non-blocking execution make the tool robust for autonomous agent workflows.




