NVIDIA has released the BioNeMo Agent Toolkit, a set of instructions that allows AI agents to call specific biomolecular models as standard tools. The update addresses a core problem in scientific computing: general coding agents cannot reliably execute wet-lab tasks or interpret complex biological data without a structured interface.
The gap between code and discovery
Current AI agents can read papers, write scripts, and call APIs. However, scientific discovery does not follow the same logic as software engineering. A hypothesis cannot be validated by a passing test suite. Research remains iterative, uncertain, and dependent on physical reality. Without a way to interact with specialised models, an agent is limited to generating text or code, not results.
What is the BioNeMo Agent Toolkit
The toolkit is an open-source collection of ‘skills’ that package NVIDIA biomolecular models for direct agent use. These skills cover protein folding, molecular docking, generative chemistry, genomics, and protein design. NVIDIA structures the platform into two layers. The first is an accelerated tool layer using NVIDIA NIM (NVIDIA Inference Microservices) and libraries like cuEquivariance and Parabricks. The second layer wraps these capabilities into agent-ready interfaces.
Each skill documents the model’s purpose, inputs, parameters, and expected outputs. Model Context Protocol (MCP) server wrappers expose open models that are not yet packaged as NIMs. This setup allows an agent to discover, select, and invoke models independently. The repository organises skills into nim-skills, open-models-skills, and library-skills. A workflows folder contains multi-step meta-skills. One example is generative_protein_binder_design, which chains RFdiffusion, ProteinMPNN, and OpenFold3.
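As a rough illustration of how an agent (or a human) might enumerate what is available, the sketch below walks a checkout and groups skill directories by category folder. It assumes each skill sits one level below its category folder and contains a SKILL.md file; that depth, the placement of boltz2-nim under nim-skills, and the two `example-*` skill names are assumptions made for the demo, not details confirmed by the repository.

```python
import tempfile
from pathlib import Path
from typing import Dict, List


def discover_skills(repo_root: Path) -> Dict[str, List[str]]:
    """Map each category folder to the skill directories found inside it.

    A directory counts as a skill if it contains a SKILL.md file.
    Assumes the layout <category>/<skill>/SKILL.md (an assumption).
    """
    catalog: Dict[str, List[str]] = {}
    for skill_file in sorted(repo_root.glob("*/*/SKILL.md")):
        category = skill_file.parent.parent.name
        catalog.setdefault(category, []).append(skill_file.parent.name)
    return catalog


def build_demo_repo(root: Path) -> None:
    """Create a tiny stand-in for a checkout so the sketch runs offline."""
    layout = {
        "nim-skills": ["boltz2-nim"],                  # placement assumed
        "open-models-skills": ["example-open-model"],  # hypothetical name
        "library-skills": ["example-library"],         # hypothetical name
    }
    for category, skills in layout.items():
        for skill in skills:
            skill_dir = root / category / skill
            skill_dir.mkdir(parents=True)
            (skill_dir / "SKILL.md").write_text(f"---\nname: {skill}\n---\n")


with tempfile.TemporaryDirectory() as tmp:
    demo_root = Path(tmp)
    build_demo_repo(demo_root)
    demo_catalog = discover_skills(demo_root)

for category, skills in demo_catalog.items():
    print(f"{category}: {', '.join(skills)}")
```

Pointing `discover_skills` at a real checkout instead of the temporary directory would list whatever skills that checkout actually contains, provided the layout assumption holds.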
How a BioNeMo skill works
Every skill is a directory containing a SKILL.md file with YAML frontmatter and instructions. An agent reads this documentation to understand how to act. The prompt pattern remains consistent across models. NVIDIA’s documentation uses OpenFold3 as a reference, but the structure applies to other NIMs for biology including Boltz-2, DiffDock, GenMol, ProteinMPNN, MSA Search, RFdiffusion, and Evo 2. You provide the skill name, input, and endpoint.
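To make the SKILL.md idea concrete, here is a minimal sketch of splitting such a file into its frontmatter metadata and its instruction body. The field names (`name`, `description`) and the example text are hypothetical, and the parser only handles flat `key: value` lines; a real YAML library would be needed for nested frontmatter.

```python
from typing import Dict, Tuple


def parse_frontmatter(text: str) -> Tuple[Dict[str, str], str]:
    """Split a SKILL.md-style document into (metadata, body).

    Only flat `key: value` lines are supported. If the document has no
    opening or closing `---` delimiter, it is returned unchanged as the body.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text
    metadata: Dict[str, str] = {}
    for index, line in enumerate(lines[1:], start=1):
        if line.strip() == "---":
            body = "\n".join(lines[index + 1:]).strip()
            return metadata, body
        if ":" in line:
            key, value = line.split(":", 1)
            metadata[key.strip()] = value.strip()
    return {}, text  # no closing delimiter found


# Hypothetical skill file; the fields are illustrative, not NVIDIA's schema.
EXAMPLE_SKILL = """---
name: example-fold-skill
description: Hypothetical skill that predicts a structure from a sequence
---
Send the sequence to the endpoint and save the returned CIF file.
"""

metadata, instructions = parse_frontmatter(EXAMPLE_SKILL)
print(metadata["name"])
print(instructions)
```

An agent runtime does something similar in spirit: it uses the short metadata to decide whether a skill is relevant, then reads the body for the steps to follow.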
Installation pulls skills via the open-source skills CLI:
# Browse and pick a skill interactively
npx skills add NVIDIA-BioNeMo/bionemo-agent-toolkit
# Or install one skill for a specific agent
npx skills add NVIDIA-BioNeMo/bionemo-agent-toolkit --skill boltz2-nim --agent claude-code
Deployment is a choice. Use hosted NIM endpoints for fast access without managing infrastructure. Move selected models to a local environment when you need lower warm latency, data locality, or repeated iteration.
Performance metrics
NVIDIA measured whether skills improved an agent’s workflow. All reported metrics came from Codex CLI running GPT-5.5 fast. The team compared the same agent with and without each skill.
Task completion was the first metric. Without skills, the agent completed 57.1% of required tasks on average. With access to NIM skills, completion reached 100%.
Efficiency was the second metric. NVIDIA counted passing assertions, the individual steps that compose a task. With skills, an agent produced twice as many passing assertions per 1,000 tokens as it did without them. That gain held across all ten NIM skills tested.
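The two metrics reduce to simple ratios. The sketch below shows one way to compute them; the `RunResult` type and every number in it are invented for illustration and are not NVIDIA's benchmark data.

```python
from dataclasses import dataclass


@dataclass
class RunResult:
    """Outcome of one agent run (hypothetical record type)."""
    tasks_completed: int
    tasks_required: int
    passing_assertions: int
    tokens_used: int


def completion_rate(run: RunResult) -> float:
    """Percentage of required tasks the agent completed."""
    return 100.0 * run.tasks_completed / run.tasks_required


def assertions_per_1k_tokens(run: RunResult) -> float:
    """Passing assertions normalised to 1,000 tokens of agent output."""
    return 1000.0 * run.passing_assertions / run.tokens_used


# Made-up numbers, chosen only to show the arithmetic.
without_skill = RunResult(tasks_completed=3, tasks_required=5,
                          passing_assertions=20, tokens_used=50_000)
with_skill = RunResult(tasks_completed=5, tasks_required=5,
                       passing_assertions=40, tokens_used=50_000)

efficiency_gain = (assertions_per_1k_tokens(with_skill)
                   / assertions_per_1k_tokens(without_skill))

print(f"completion without skill: {completion_rate(without_skill):.1f}%")
print(f"completion with skill:    {completion_rate(with_skill):.1f}%")
print(f"efficiency gain:          {efficiency_gain:.1f}x")
```

Normalising by tokens matters because a skill-equipped agent could otherwise look better simply by spending more output on each task.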
Use cases with examples
- Protein structure prediction: An agent folds a peptide sequence with Boltz-2 or OpenFold3. It returns a CIF file for downstream inspection.
- Multiple sequence alignment: An agent generates an MSA with MMseqs2 through the MSA Search skill. The artifact is an A3M file.
- Generative chemistry: An agent generates candidate molecules with GenMol. Outputs arrive as SDF or SMILES for filtering.
- Protein binder design: The generative_protein_binder_design workflow chains three models. RFdiffusion builds a backbone, ProteinMPNN designs the sequence, and OpenFold3 validates the fold.
Each loop follows the same shape: the agent selects a model, prepares inputs, runs it, inspects outputs, and explains results with caveats.
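The binder design chain can be pictured as a list of named steps, each consuming the previous step's output. The sketch below is only a structural illustration: the three step functions are stubs that pass strings along, not calls to RFdiffusion, ProteinMPNN, or OpenFold3, and the state keys are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

State = Dict[str, str]
Step = Tuple[str, Callable[[State], State]]


def design_backbone(state: State) -> State:
    """Stub standing in for the backbone generation step."""
    return {**state, "backbone": f"backbone_for_{state['target']}"}


def design_sequence(state: State) -> State:
    """Stub standing in for the sequence design step."""
    return {**state, "sequence": f"sequence_for_{state['backbone']}"}


def validate_fold(state: State) -> State:
    """Stub standing in for the structure validation step."""
    return {**state, "structure": f"structure_for_{state['sequence']}"}


def run_workflow(steps: List[Step], state: State) -> Tuple[State, List[str]]:
    """Run each step in order and record which models were invoked."""
    executed: List[str] = []
    for model_name, step in steps:
        state = step(state)
        executed.append(model_name)
    return state, executed


# Order follows the article: backbone, then sequence, then fold validation.
BINDER_DESIGN_STEPS: List[Step] = [
    ("RFdiffusion", design_backbone),
    ("ProteinMPNN", design_sequence),
    ("OpenFold3", validate_fold),
]

final_state, executed = run_workflow(BINDER_DESIGN_STEPS,
                                     {"target": "example_target"})
print(" -> ".join(executed))
print(final_state["structure"])
```

Keeping the steps as data rather than hard-coded calls makes it easy to log which model produced which artifact, which is useful when the agent later has to explain its results with caveats.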
Comparison: agent with vs without skills
| Dimension | General agent (no skills) | Agent + BioNeMo Skills |
|---|---|---|
| Task completion | 57.1% average | 100% average |
| Token efficiency | Baseline | 2x passing assertions per 1k tokens |
| Model selection | Guesses tool, format, and inputs | Reads purpose, inputs, and artifacts |
| Deployment | Manual setup from source | Hosted or local NIM, documented |
| Failure handling | Unknown failure modes | Documented failure modes per skill |
| Workflows | Isolated single calls | Multi-step meta-skills (binder design) |
Getting started
The prerequisites are minimal. You need an agent runtime such as Claude or Codex, and an NVIDIA API key for hosted BioNeMo NIM endpoints. A GPU node is optional and only needed for local NIM deployment.
Point the agent at the repository first. Let it enumerate the available capabilities before it acts. Then hand it a single skill to operate one model.
NVIDIA flags two cautions. The build.nvidia.com endpoints are for small-scale development and testing only. They are not production-grade inference. NVIDIA also stresses validation: check low-confidence structures and filter generated molecules before trusting them.
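One simple way to act on the validation advice is to gate results on a confidence score before anything downstream consumes them. The sketch below assumes each prediction carries a single `confidence` value between 0 and 1; that field name, the record shape, and the 0.70 threshold are all hypothetical and would need to be mapped onto whatever scores a given model actually reports.

```python
from typing import Dict, List, Tuple, Union

Prediction = Dict[str, Union[str, float]]


def split_by_confidence(
    predictions: List[Prediction], threshold: float
) -> Tuple[List[Prediction], List[Prediction]]:
    """Separate predictions into (accepted, flagged_for_review)."""
    accepted: List[Prediction] = []
    flagged: List[Prediction] = []
    for prediction in predictions:
        if prediction["confidence"] >= threshold:
            accepted.append(prediction)
        else:
            flagged.append(prediction)
    return accepted, flagged


# Invented records; real outputs would come from a model run.
demo_predictions: List[Prediction] = [
    {"id": "a", "confidence": 0.91},
    {"id": "b", "confidence": 0.42},
    {"id": "c", "confidence": 0.70},
]

accepted, flagged = split_by_confidence(demo_predictions, threshold=0.70)
print("accepted:", [p["id"] for p in accepted])
print("needs review:", [p["id"] for p in flagged])
```

Flagged items are kept rather than discarded so that a person can inspect them, which matches the article's point that outputs should be checked before they are trusted.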
What it means
The toolkit moves AI from generating ideas to executing steps. By defining clear inputs and outputs, it allows agents to perform tasks that require specific scientific models without guessing the correct tool or format. This reduces the need for manual oversight in repetitive discovery loops.




