“`html
Perplexity Open-Sources Bumblebee: A Read-Only Supply-Chain Scanner for Developer Endpoints
Attackers are increasingly targeting the packages, editor extensions, and AI tool configs on developer machines rather than just production systems. Perplexity has open-sourced an internal tool it uses to address this issue.
The Problem Bumblebee Solves
If you’re a software engineer or data scientist, you likely have dozens of packages installed locally. You may also have editor extensions, browser add-ons, and possibly MCP (Model Context Protocol) configs on your machine. When a new vulnerability is discovered, your security team must quickly answer: which developer machines are exposed right now?
Existing tools don’t fully address this question. Software Bills of Materials (SBOMs) and vulnerability scanners focus on build artifacts and repositories. Endpoint Detection and Response (EDR) products track processes that ran or touched the network. Neither checks local developer state—the lockfiles, package metadata, extension manifests, and AI tool configs scattered across a laptop’s filesystem.
Bumblebee fills this gap by answering which machines show a match in their on-disk metadata for any named package, extension, or version. The ecosystem scope was deliberate: it maps to recent active supply-chain campaigns like the Mini Shai-Hulud series that hit npm, PyPI, RubyGems, Go modules, and Composer packages across companies including TanStack, SAP, and Zapier.
How Bumblebee Works
Bumblebee is a one-shot scanner. Each invocation performs a single scan and exits. The cadence of scans is the operator’s responsibility—cron jobs, launchd tasks, systemd services, or MDM fleet tools can be used to schedule them. It outputs structured records as NDJSON (newline-delimited JSON), with diagnostics going to stderr.
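Because the output is NDJSON, a downstream consumer can process it one line at a time. The sketch below shows one way to do that in Python. The field names (record_type, ecosystem, name, version) and the sample lines are assumptions made for illustration, not Bumblebee's documented schema.

```python
import json


def parse_ndjson(stream_text):
    """Parse newline-delimited JSON: one JSON object per non-empty line."""
    records = []
    for line_number, line in enumerate(stream_text.splitlines(), start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        try:
            records.append(json.loads(line))
        except json.JSONDecodeError as exc:
            raise ValueError(f"line {line_number} is not valid JSON: {exc}") from exc
    return records


# Hypothetical sample output; the field names are illustrative only.
SAMPLE = (
    '{"record_type": "package", "ecosystem": "npm", "name": "left-pad", "version": "1.3.0"}\n'
    '{"record_type": "package", "ecosystem": "pypi", "name": "requests", "version": "2.32.3"}\n'
)

records = parse_ndjson(SAMPLE)

# Group package names by ecosystem for a quick per-host summary.
by_ecosystem = {}
for record in records:
    by_ecosystem.setdefault(record["ecosystem"], []).append(record["name"])
```

Keeping one record per line means a scheduler or fleet tool can append scan output to a log and still parse it incrementally.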
Bumblebee supports three scan profiles: baseline, which scans common global and user package roots, language toolchains, editor extensions, browser extensions, and MCP configs; project, targeting configured development directories like ~/code or ~/src; and deep, sweeping operator-supplied roots, typically a bare home directory during an active incident.
Internally, Perplexity uses Bumblebee within a five-step workflow. First, a threat signal arrives from public disclosures or third-party intel feeds. Second, the Computer drafts a catalog update, recording the signal as a structured entry with ecosystem, package name, and version, and opens a GitHub PR with source links. Third, a human developer reviews and merges the PR. Fourth, Bumblebee runs on endpoints with the updated catalog. Fifth, the findings are shared with the security team.
What Bumblebee Scans
Bumblebee covers four surface areas that existing tools typically handle separately:
- Packages: Reads npm, pnpm, Yarn, Bun, PyPI, Go modules, RubyGems, and Composer. It reads lockfiles and installed package metadata directly, from sources like package-lock.json, pnpm-lock.yaml, go.sum, and *.dist-info/METADATA. The text bun.lock format is supported, but the binary lockfile format bun.lockb is not.
- AI agent configs: Reads MCP JSON host configuration files: mcp.json, .mcp.json, claude_desktop_config.json, mcp_config.json, mcp_settings.json, cline_mcp_settings.json, and ~/.gemini/settings.json for Gemini CLI. Non-JSON MCP configs such as Codex config.toml and Continue YAML are not parsed in v0.1. It parses these files for server inventory but does not emit environment values or key names found in env blocks.
- Editor extensions: Reads manifests from VS Code, Cursor, Windsurf, and VSCodium.
- Browser extensions: Covers Chromium-family browsers (Chrome, Comet, Edge, Brave, and Arc) as well as Firefox.
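To illustrate the idea of inventorying MCP servers without leaking secrets, here is a minimal Python sketch. It assumes the common "mcpServers" layout used by files such as claude_desktop_config.json. The function name, the output fields, and the has_env flag are hypothetical choices for this example and do not describe Bumblebee's actual output.

```python
import json


def mcp_server_inventory(config_text):
    """List configured MCP servers without emitting env values or env key names."""
    config = json.loads(config_text)
    servers = config.get("mcpServers", {})  # assumed top-level key
    inventory = []
    for name, spec in sorted(servers.items()):
        inventory.append({
            "server": name,
            "command": spec.get("command"),
            "args": list(spec.get("args", [])),
            # Only record that an env block exists; never copy its contents.
            "has_env": bool(spec.get("env")),
        })
    return inventory


# Hypothetical config for illustration.
SAMPLE_CONFIG = """
{
  "mcpServers": {
    "filesystem": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-filesystem", "/tmp"],
      "env": {"API_TOKEN": "secret-value"}
    }
  }
}
"""

inventory = mcp_server_inventory(SAMPLE_CONFIG)
```

Dropping the env block at parse time, rather than filtering it later, keeps secrets out of every downstream log and record.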
Why Read-Only
npm packages can carry postinstall scripts that execute automatically on npm install. A scanner that invokes npm to check exposure has already triggered the attack it was looking for. Bumblebee avoids this entirely by never running install scripts or lifecycle hooks, never invoking npm, pnpm, bun, or pip, never reading application source files, and performing no process or network monitoring. It is not an EDR.
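The read-only approach can be sketched as plain file parsing: the lockfile is treated as data and no package manager is ever started. The example below assumes the "packages" map found in npm lockfile versions 2 and 3. The function name and sample lockfile are hypothetical, and this is not Bumblebee's own parser.

```python
import json


def packages_from_lockfile(lockfile_text):
    """Extract (name, version) pairs from package-lock.json text without running npm."""
    lock = json.loads(lockfile_text)
    found = []
    for path, meta in lock.get("packages", {}).items():
        if not path:
            continue  # the empty key describes the root project itself
        # Prefer an explicit name; otherwise derive it from the install path.
        name = meta.get("name") or path.rsplit("node_modules/", 1)[-1]
        version = meta.get("version")
        if version:
            found.append((name, version))
    return sorted(found)


# Hypothetical lockfile for illustration.
SAMPLE_LOCK = """
{
  "name": "demo-app",
  "lockfileVersion": 3,
  "packages": {
    "": {"name": "demo-app", "version": "1.0.0"},
    "node_modules/left-pad": {"version": "1.3.0"},
    "node_modules/@scope/util": {"version": "2.0.1"},
    "node_modules/left-pad/node_modules/dep": {"version": "0.1.0"}
  }
}
"""

packages = packages_from_lockfile(SAMPLE_LOCK)
```

Nothing here imports or executes package code, so a malicious postinstall script in a listed package has no opportunity to run.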
Output and Exposure Catalog
Each package record includes the hostname, OS, architecture, ecosystem, package name, version, source file, and a confidence field. The confidence is high when exact identity and version came from canonical metadata, medium when identity is reliable but version or source is partial, and low when only a config path or spec reference is found.
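The three confidence levels can be read as a small decision rule. The Python sketch below is one possible interpretation of that description. The function and its three boolean inputs are hypothetical and are not taken from Bumblebee's source.

```python
def classify_confidence(has_identity, has_version, from_canonical_metadata):
    """Map evidence quality to a confidence label, following the description above."""
    if has_identity and has_version and from_canonical_metadata:
        return "high"  # exact identity and version from canonical metadata
    if has_identity:
        return "medium"  # identity is reliable, but version or source is partial
    return "low"  # only a config path or spec reference was found


examples = {
    "lockfile entry": classify_confidence(True, True, True),
    "name without version": classify_confidence(True, False, True),
    "bare config path": classify_confidence(False, False, False),
}
```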
Security teams supply their own exposure catalogs: simple JSON files specifying ecosystem, package name, and affected versions. When Bumblebee finds a match, it emits a finding record that includes severity, catalog ID, and evidence, so each finding can be traced back to the catalog entry that triggered it.
The repo also includes a threat_intel/ directory with maintained exposure catalogs built from public supply-chain campaign reporting.
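Catalog matching of this kind can be sketched as a lookup keyed on ecosystem, package name, and version. The catalog layout, record fields, and identifiers below are assumptions for illustration. The real catalog schema may differ, for example by supporting version ranges rather than the exact-version matching shown here.

```python
def build_index(catalog):
    """Index catalog entries by (ecosystem, name, version) for constant-time lookup."""
    index = {}
    for entry in catalog:
        for version in entry["versions"]:
            index[(entry["ecosystem"], entry["name"], version)] = entry
    return index


def match_findings(records, catalog):
    """Return one finding per package record that exactly matches a catalog entry."""
    index = build_index(catalog)
    findings = []
    for record in records:
        entry = index.get((record["ecosystem"], record["name"], record["version"]))
        if entry is None:
            continue
        findings.append({
            "catalog_id": entry["id"],  # traces the finding back to its catalog entry
            "severity": entry["severity"],
            "evidence": {
                "source_file": record["source_file"],
                "version": record["version"],
            },
        })
    return findings


# Hypothetical catalog and scan records.
CATALOG = [
    {"id": "EXAMPLE-0001", "ecosystem": "npm", "name": "example-pkg",
     "versions": ["1.2.3"], "severity": "high"},
]
RECORDS = [
    {"ecosystem": "npm", "name": "example-pkg", "version": "1.2.3",
     "source_file": "/home/dev/app/package-lock.json"},
    {"ecosystem": "npm", "name": "example-pkg", "version": "1.2.4",
     "source_file": "/home/dev/other/package-lock.json"},
]

findings = match_findings(RECORDS, CATALOG)
```

Carrying the catalog ID into every finding is what makes the result auditable: a reviewer can go from an alert straight to the entry and its source links.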
Getting Started
Bumblebee requires Go 1.25 or later. Install with:
go install github.com/perplexityai/bumblebee/cmd/bumblebee@latest
After install, bumblebee selftest verifies the binary works correctly against embedded fixtures. The tool is licensed under Apache License 2.0. The current release is v0.1.1.
Key Takeaways
- Bumblebee is Perplexity’s open-sourced, read-only developer endpoint scanner for supply-chain exposure checks.
- It covers npm, pnpm, Yarn, Bun, PyPI, Go modules, RubyGems, Composer, MCP configs, editor extensions, and browser extensions.
- The tool supports three scan profiles: baseline, project, and deep.
- The tool never executes install scripts or invokes package managers, preventing scan-triggered attacks.
- Built in Go with zero non-stdlib dependencies; available now on GitHub under Apache 2.0.
“`
Originally published at marktechpost.com. Curated by AI Maestro.




