This is the full developer documentation for Dreadnode.
# Dreadnode Documentation
> Security engineering platform for testing, evaluating, and shipping AI systems with confidence.
## Platform Overview
| Component | What it provides |
| -------------- | ---------------------------------------------------------- |
| TypeScript SDK | Programmatic access to tasks, evaluations, and experiments |
| CLI | Local developer workflows and automation |
| Platform API | First-class API surface backed by OpenAPI |
| Self-Hosting | Docker-based deployment for enterprise environments |
## For AI Agents
If you are an AI agent or tool, use [`/llms.txt`](/llms.txt) for a condensed index of the docs or [`/llms-full.txt`](/llms-full.txt) for a full text snapshot.
# Authentication
> Log in, manage provider keys, and connect your CLI to a Dreadnode server.
Use the CLI to authenticate with the Dreadnode platform and manage provider API keys for local agent runs.
## Login methods
You can authenticate with a browser-based device flow or by pasting an API key.
```bash
/login browser
```
```bash
/login apikey
```
## Provider API keys
Use the `/keys` command to store provider keys used by generators.
```bash
/keys set
```
Supported providers: `anthropic`, `openai`, `google`, `mistral`, `groq`, `custom`.
| Provider | Key format (example) |
| --------- | -------------------- |
| anthropic | `sk-ant-...` |
| openai | `sk-...` |
| google | `AIza...` |
| mistral | `mistral-...` |
| groq | `gsk_...` |
| custom | `custom-...` |
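As a sketch, the prefixes in the table above can be checked before a key is stored. This helper is purely illustrative and is not part of the CLI; it only mirrors the example formats shown here.

```ts
// Hypothetical prefix check mirroring the table above. The CLI's real
// validation logic is not documented, so treat this as a heuristic sketch.
const KEY_PREFIXES: Record<string, string> = {
  anthropic: 'sk-ant-',
  openai: 'sk-',
  google: 'AIza',
  mistral: 'mistral-',
  groq: 'gsk_',
  custom: 'custom-',
};

function looksLikeProviderKey(provider: string, key: string): boolean {
  const prefix = KEY_PREFIXES[provider];
  return prefix !== undefined && key.startsWith(prefix);
}

console.log(looksLikeProviderKey('anthropic', 'sk-ant-abc123')); // true
console.log(looksLikeProviderKey('openai', 'gsk_abc123')); // false
```

Note that this is only a shape check; a key with the right prefix can still be invalid or revoked.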
## Profiles
Profiles store credentials per server. When you log in to a server URL, the CLI creates a profile named after that host and sets it as active.
To switch profiles, log in to another server or set the profile explicitly via the `DREADNODE_PROFILE` environment variable:
```bash
DREADNODE_PROFILE=dev.app.dreadnode.io dreadnode
```
## Connect to a Dreadnode server
If you are using a self-hosted deployment, connect your CLI session to the server URL:
```bash
/connect
```
# Installation
> Install the Dreadnode CLI and TypeScript SDK so you can start building agents.
Get set up with the Dreadnode CLI and TypeScript SDK in minutes.
## System requirements
- Node.js 20+
## Install the CLI globally
The CLI gives you the `dreadnode` binary for interactive workflows and slash commands.
```bash
npm install -g @dreadnode/agents
```
Verify the installation:
```bash
dreadnode --version
```
## Install the SDK in a project
If you only need the SDK (or want per-project versioning), install the library locally:
```bash
npm install @dreadnode/agents
```
## What you get
- **CLI:** `dreadnode` for interactive workflows and authentication.
- **SDK:** `@dreadnode/agents` for building and running TypeScript agents.
# Quickstart
> Build your first Dreadnode agent in 10 minutes with the TypeScript SDK.
**Build your first agent in 10 minutes.** This walkthrough installs the SDK, sets up a provider key, and runs a tiny agent locally.
## Step 1: Install the SDK
```bash
npm install @dreadnode/agents @ai-sdk/anthropic
```
## Step 2: Set a provider API key
Start the CLI and store a provider key (for example, Anthropic):
```bash
dreadnode
```
```bash
/keys set anthropic sk-ant-...
```
## Step 3: Create your first agent
Create a file called `quickstart.ts`:
```ts
import { anthropic } from '@ai-sdk/anthropic';
import { createAgent, createGenerator } from '@dreadnode/agents';

const generator = createGenerator(anthropic('claude-sonnet-4-20250514'));

const agent = createAgent({
  name: 'quickstart-agent',
  generator,
  systemPrompt: 'You are a helpful assistant.',
});

async function main(): Promise<void> {
  const result = await agent.run({
    input: 'Say hello and summarize what Dreadnode does in one sentence.',
  });

  const lastMessage = result.trajectory.lastMessage;
  if (lastMessage) {
    console.log(lastMessage.text);
  }
}

main().catch((error) => {
  console.error(error);
  process.exit(1);
});
```
## Step 4: Run it
```bash
npx tsx quickstart.ts
```
## Step 5: View the output
You should see a short response printed to the terminal.
## What's next
- Explore the CLI reference: [/cli](/cli)
- Dive into the SDK reference: [/sdk](/sdk)
- Learn core concepts: [/concepts](/concepts)
# Capabilities
> Load capability bundles that package prompts, tools, and skills.
## What is a capability?
Capabilities bundle a system prompt, tool access, and skills into a single loadable package.
They let you swap workflows quickly without leaving the CLI.
Built-in capabilities include:
- `dreadairt` — AI red teaming workflows
- `dreadweb` — web application pentesting workflows
## Load a capability and inspect tools
```bash
$ dreadnode
dreadnode> /capabilities
Available: dreadairt, dreadweb
dreadnode> load dreadairt
Capability "dreadairt" loaded
dreadnode> /cap-tools
Tools: prompt.attack, prompt.evaluate, tool.browser
```
## Command reference
| Command | Arguments | Description |
| --------------- | -------------- | ------------------------------------------- |
| `/capabilities` | — | List available capabilities |
| `/caps` | — | Shortcut for listing capabilities |
| `load` | `<name>` | Load a capability bundle |
| `reload` | — | Reload the active capability |
| `/cap-tools` | — | Show tools exposed by the active capability |
## Reload when developing
### Keep capability tweaks in sync
Use `reload` to refresh the active capability after editing its config or tools.
# Chat and Sessions
> Manage live conversations, save sessions, and export transcripts in the CLI.
## Manage a conversation end-to-end
```bash
$ dreadnode
dreadnode> Hello!
dreadnode> /tokens
Session tokens: 1,248 prompt / 412 completion
dreadnode> /save kickoff-demo
Saved session "kickoff-demo"
dreadnode> /clear
Conversation cleared
dreadnode> /load kickoff-demo
Loaded session "kickoff-demo"
dreadnode> /export ./kickoff-demo.md
Exported conversation to ./kickoff-demo.md
```
## Command reference
| Command | Arguments | Description |
| ----------- | ------------ | ------------------------------------------------- |
| `/clear` | — | Clear the current conversation history |
| `/compact` | `[strategy]` | Summarize the conversation to reduce context size |
| `/tokens` | — | Show token usage for the current session |
| `/export` | `[path]` | Export the current conversation to a file |
| `/save` | `[name]` | Save the current session |
| `/load` | `[name]` | Load a saved session |
| `/sessions` | — | List all saved sessions |
| `/delete` | `<name>` | Delete a saved session |
## Tips for everyday use
### Keep context lean
Use `/compact` after long back-and-forths to keep memory tight without losing intent.
### Organize by project
Prefix session names with a project slug (for example `acme-kickoff`) so `/sessions` stays tidy.
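For example, a tiny naming helper (purely illustrative; to the CLI a session name is just a string):

```ts
// Illustrative only: build slug-prefixed session names so /sessions groups by project.
function sessionName(project: string, topic: string): string {
  const slug = (s: string) =>
    s.toLowerCase().trim().replace(/[^a-z0-9]+/g, '-').replace(/^-|-$/g, '');
  return `${slug(project)}-${slug(topic)}`;
}

console.log(sessionName('Acme Corp', 'Kickoff Demo')); // "acme-corp-kickoff-demo"
```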
# Models and Configuration
> Switch models, set custom endpoints, and tune CLI configuration.
## Switch models in-session
```bash
$ dreadnode
dreadnode> /model anthropic/claude-sonnet-4-20250514
Active model set to anthropic/claude-sonnet-4-20250514
dreadnode> /model openai/gpt-4.1-mini
Active model set to openai/gpt-4.1-mini
```
## Configure a custom endpoint
Use `-e` to point the CLI at an OpenAI-compatible endpoint like Ollama.
```bash
$ dreadnode -e http://localhost:11434/v1 -m openai/gpt-4.1-mini
dreadnode> /config
api_endpoint: http://localhost:11434/v1
```
## Supported model providers
| Provider | Example model |
| ---------- | -------------------------------------- |
| Anthropic | anthropic/claude-sonnet-4-20250514 |
| OpenAI | openai/gpt-4.1-mini |
| Google | google/gemini-2.0-flash |
| Mistral | mistral/mistral-small |
| OpenRouter | openrouter/anthropic/claude-3.5-sonnet |
## Command reference
| Command | Arguments | Description |
| --------- | ------------------ | --------------------------------------- |
| `/model` | `[provider/model]` | Switch the active model |
| `/config` | — | Show or edit CLI configuration |
| `-m` | `<model>` | Select a model when starting the CLI |
| `-e` | `<url>` | Set a custom API endpoint when starting |
## Configuration workflow
### Pick a default model
Set a default model during startup with `-m`, then keep switching in-session with `/model`.
# CLI Overview
> Learn how the Dreadnode CLI works, how to start it, and find every command in one place.
## Architecture at a glance
The Dreadnode CLI is an interactive shell. You start it once, then drive your workflow with
slash commands and short flags. Think of it as a live workspace for conversations, sandboxes,
capabilities, skills, and swarms.
## Start the CLI
```bash
$ dreadnode
Connected to Dreadnode CLI v0.x
dreadnode>
```
## Pick a model and endpoint up front
Use `-m` to select a model and `-e` to point the CLI at a custom OpenAI-compatible endpoint.
```bash
$ dreadnode -m anthropic/claude-sonnet-4-20250514
$ dreadnode -e http://localhost:11434/v1 -m openai/gpt-4.1-mini
```
## Quick reference (all commands)
### Chat & sessions
| Command | Arguments | Description |
| ----------- | ------------ | ----------------------------------------- |
| `/clear` | — | Clear the current conversation history |
| `/compact` | `[strategy]` | Summarize history to keep context lean |
| `/tokens` | — | Show token usage for the current session |
| `/export` | `[path]` | Export the current conversation to a file |
| `/save` | `[name]` | Save the current session |
| `/load` | `[name]` | Load a saved session |
| `/sessions` | — | List available saved sessions |
| `/delete` | `<name>` | Delete a saved session |
### Models & configuration
| Command | Arguments | Description |
| --------- | ------------------ | --------------------------------------- |
| `/model` | `[provider/model]` | Switch the active model |
| `/config` | — | Show or edit CLI configuration |
| `-m` | `<model>` | Select a model when starting the CLI |
| `-e` | `<url>` | Set a custom API endpoint when starting |
### Server & sandboxes
| Command | Arguments | Description |
| ------------- | --------- | --------------------------------------- |
| `/connect` | `[url]` | Connect to a Dreadnode server |
| `/disconnect` | — | Disconnect from the current server |
| `/status` | — | Show connection and environment status |
| `/sandbox` | `[name]` | Provision or select a sandbox |
| `/eval` | `[suite]` | Run an evaluation in the active sandbox |
### Capabilities
| Command | Arguments | Description |
| --------------- | -------------- | ------------------------------------------- |
| `/capabilities` | — | List available capabilities |
| `/caps` | — | Shortcut for listing capabilities |
| `load` | `<name>` | Load a capability bundle |
| `reload` | — | Reload the currently active capability |
| `/cap-tools` | — | Show tools exposed by the active capability |
### Skills
| Command | Arguments | Description |
| --------- | --------- | ------------------------- |
| `/skills` | — | List installed skills |
| `reload` | — | Reload the skill registry |
### Swarms
| Command | Arguments | Description |
| --------------- | ---------------------- | -------------------------------- |
| `/swarm` | `<config>` | Start a swarm from a YAML config |
| `/swarm list` | — | List running swarms |
| `/swarm stop` | `<swarm-id>` | Stop a running swarm |
| `/swarm status` | `<swarm-id>` | Check swarm status |
| `/swarm send` | `<swarm-id> <message>` | Send a message to a swarm |
## Pre-1.0 note
The CLI is stable, but the project is still pre-1.0. Expect small UX refinements while the
command surface stays backwards compatible.
# Server and Sandboxes
> Connect to a Dreadnode server, provision sandboxes, and run evaluations.
## Connect → sandbox → eval
```bash
$ dreadnode
dreadnode> /connect https://api.dreadnode.io
Connected to https://api.dreadnode.io
dreadnode> /sandbox redteam-lab
Sandbox "redteam-lab" ready
dreadnode> /eval safety-smoke
Evaluation "safety-smoke" started
```
## Command reference
| Command | Arguments | Description |
| ------------- | --------- | --------------------------------------- |
| `/connect` | `[url]` | Connect to a Dreadnode server |
| `/disconnect` | — | Disconnect from the current server |
| `/status` | — | Show connection and environment status |
| `/sandbox` | `[name]` | Provision or select a sandbox |
| `/eval` | `[suite]` | Run an evaluation in the active sandbox |
## Status checks
### Confirm where you are connected
Use `/status` to confirm the server URL and active sandbox before running evaluations.
# Skills
> Discover and refresh skill packs available to the CLI.
## What are skills?
Skills are discoverable, loadable packs of instructions and assets that extend what the CLI can do.
They power repeatable workflows and can be refreshed as new packs are added.
## List and reload skills
```bash
$ dreadnode
dreadnode> /skills
Installed skills: prompt.attack, prompt.evaluate, tool.browser
dreadnode> reload
Skills reloaded
```
## Command reference
| Command | Arguments | Description |
| --------- | --------- | ------------------------- |
| `/skills` | — | List installed skills |
| `reload` | — | Reload the skill registry |
## Keeping your catalog up to date
### Refresh after installs
Run `reload` whenever you add or update skill packs locally.
# Swarms
> Orchestrate multi-agent swarms from YAML configs in the CLI.
## What is a swarm?
Swarms are multi-agent orchestration runs driven by a YAML config. They coordinate
specialized agents to collaborate on a single task.
## Start a swarm and interact with it
```bash
$ dreadnode
dreadnode> /swarm ./configs/redteam-swarm.yaml
Swarm started: swarm_9f3c
dreadnode> /swarm status swarm_9f3c
Status: running (3 agents)
dreadnode> /swarm send swarm_9f3c "Focus on prompt injection vectors"
Message delivered
dreadnode> /swarm stop swarm_9f3c
Swarm stopped
```
## YAML config overview
```yaml
name: redteam-swarm
agents:
  - name: lead
    model: anthropic/claude-sonnet-4-20250514
  - name: scout
    model: openai/gpt-4.1-mini
```
## Command reference
| Command | Arguments | Description |
| --------------- | ---------------------- | -------------------------------- |
| `/swarm` | `<config>` | Start a swarm from a YAML config |
| `/swarm list` | — | List running swarms |
| `/swarm stop` | `<swarm-id>` | Stop a running swarm |
| `/swarm status` | `<swarm-id>` | Check swarm status |
| `/swarm send` | `<swarm-id> <message>` | Send a message to a swarm |
## Learn more
### Extend swarm configs
For deeper YAML options, see the Extensibility section.
# Chat Sessions
> Understand how chat sessions capture conversations between users and agents.
Chat sessions are persistent conversation threads between a user and an agent. Each session captures the full exchange — messages, tool calls, model responses, and token usage — so you can resume, review, and export your work.
## What a session is
A session belongs to a single user within an organization and is optionally scoped to a project. It records:
- The model used for the conversation
- A running count of messages and tokens
- A timeline of events stored for replay and analysis
Sessions are created automatically when you start a conversation in the CLI or Studio. You can also save, load, and manage sessions explicitly.
## Session events
Every interaction within a session is recorded as a typed event. Events are stored in order and include metadata like the model, role, tool name, and token counts.
| Event type | What it captures |
| ---------------- | ------------------------------------------- |
| User message | Text sent by the user |
| Agent response | Model-generated reply |
| Tool call | Tool invocation and its result |
| Generation start | Beginning of a model inference call |
| Generation end | Completion of inference with token counts |
| Heartbeat | Keepalive signal during long-running events |
Events are stored in ClickHouse for efficient querying and are partitioned by month.
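One way to picture the event timeline is a discriminated union. This is a modeling sketch only; the field names are illustrative, not the platform's actual schema.

```ts
// Sketch of the session event timeline as a discriminated union.
// Field names are illustrative, not the platform's real ClickHouse schema.
type SessionEvent =
  | { type: 'user_message'; text: string }
  | { type: 'agent_response'; text: string; model: string }
  | { type: 'tool_call'; tool: string; result: unknown }
  | { type: 'generation_start'; model: string }
  | { type: 'generation_end'; promptTokens: number; completionTokens: number }
  | { type: 'heartbeat' };

// Example: the session's running token count is the sum over generation_end events.
function totalTokens(events: SessionEvent[]): number {
  return events.reduce(
    (sum, e) =>
      e.type === 'generation_end' ? sum + e.promptTokens + e.completionTokens : sum,
    0
  );
}

const timeline: SessionEvent[] = [
  { type: 'user_message', text: 'Hello!' },
  { type: 'generation_start', model: 'anthropic/claude-sonnet-4-20250514' },
  { type: 'generation_end', promptTokens: 12, completionTokens: 30 },
];
console.log(totalTokens(timeline)); // 42
```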
## Lifecycle
| Action | What happens |
| ------- | ------------------------------------------------------------ |
| Create | Session is created when you start a conversation |
| Update | Message and token counts increment as the conversation grows |
| Title | An AI-generated title is produced from the first message |
| Archive | Session is soft-deleted and hidden from listings |
| Delete | Session and all its events are permanently removed |
## Managing sessions
Sessions can be managed from the CLI or the API:
- **CLI:** Use `/save`, `/load`, `/sessions`, `/delete`, `/export`, and `/clear` during a conversation. See [Chat & Sessions](/cli/chat-and-sessions/) for the full command reference.
- **API:** Sessions are available at `GET /api/v1/user/session` and scoped to the authenticated user.
## Project scoping
Sessions can be associated with a project. When scoped to a project, sessions appear in that project's context and their events contribute to project-level analytics. Unscoped sessions live at the organization level.
# Evaluations
> Run repeatable evaluations to measure agent performance across datasets.
Evaluations run a task function over a dataset, score the outputs, and report results so you can compare agent performance over time.
## What an evaluation is
An evaluation answers: **How well does this agent perform on a defined workload?** You provide:
- A dataset of inputs
- A task function that produces outputs
- One or more scorers to grade those outputs
The SDK’s `Evaluation` class orchestrates the run and streams progress events while the agent executes inside its sandbox.
## Core building blocks
### Configuration essentials
| Setting | What it controls | Typical use |
| ---------------------- | ------------------------------- | ------------------------------- |
| `dataset` | Items the task runs on | Benchmarks, test cases, prompts |
| `task` | Function that produces outputs | Agent call, tool workflow |
| `scorers` | How outputs are scored | Accuracy, safety, or assertions |
| `scenarios` | Parameter variations | Prompt versions or tool configs |
| `iterations` | Repeats of the full dataset | Variance and stability checks |
| `concurrency` | Parallel samples per batch | Faster runs with safe limits |
| `maxErrors` | Total errors before stop | Circuit breaker |
| `maxConsecutiveErrors` | Back-to-back errors before stop | Circuit breaker |
### Scoring guidance
Prefer built-in scorers when possible to keep results consistent. Custom scorers are supported for domain-specific grading.
## Lifecycle and execution flow
Evaluations follow a predictable loop:
1. **Configure** the evaluation (dataset, task, scorers).
2. **Execute** samples across scenarios and iterations in batches.
3. **Score** each sample and aggregate metrics.
4. **Finish** with a summary report and stop reason.
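The steps above can be sketched in plain TypeScript. This is a simplified synchronous stand-in for the SDK's `Evaluation` class, with made-up names, to show the dataset → task → scorers → aggregate flow; it omits scenarios, iterations, concurrency, and error limits.

```ts
// Simplified stand-in for the evaluation loop: run a task over a dataset,
// score each output, and aggregate a mean score. Names are illustrative.
type Scorer<O> = (output: O) => number; // score in [0, 1]

function runEval<I, O>(
  dataset: I[],
  task: (input: I) => O,
  scorers: Scorer<O>[]
): { samples: number; meanScore: number } {
  let total = 0;
  let count = 0;
  for (const input of dataset) {
    const output = task(input);
    for (const scorer of scorers) {
      total += scorer(output);
      count += 1;
    }
  }
  return { samples: dataset.length, meanScore: count ? total / count : 0 };
}

// Usage: a toy task that upper-cases input, scored on whether it did.
const report = runEval(
  ['hello', 'world'],
  (s) => s.toUpperCase(),
  [(out) => (out === out.toUpperCase() ? 1 : 0)]
);
console.log(report); // { samples: 2, meanScore: 1 }
```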
## Results and reporting
### Result hierarchy
| Level | What it contains |
| ---------- | ------------------------------------- |
| Evaluation | Overall stop reason and timing |
| Scenario | Parameter set for the run |
| Iteration | One full pass over the dataset |
| Sample | Input, output, scores, and assertions |
### Where results show up
Evaluation results stream in real time and are stored for later analysis. Use the platform UI to review summaries, pass rates, and metrics across runs.
# Projects
> Learn how projects organize agent work, sandboxes, and traces within a workspace.
Projects are the primary unit of organization on Dreadnode. Each project groups an agent's sandbox, chat sessions, traces, and configuration into a single context that you can switch between.
## What a project is
A project lives inside a workspace and represents a focused piece of work — a red team engagement, a pentesting target, an evaluation suite, or an experiment. Projects provide:
- **A sandbox** — isolated compute for the agent scoped to this project
- **Chat sessions** — conversation history between you and the agent
- **Traces** — structured telemetry from agent runs (spans, tools, model calls)
- **Secret selection** — which credentials are injected into the sandbox
- **An agent type** — the kind of agent running in this project (e.g. `dreadnode`, `dreadweb`, `dreadairt`)
## Project keys
Every project has a `key` — a URL-safe slug that uniquely identifies it within its workspace. Keys appear in URLs, API paths, and CLI output. They are immutable after creation in most contexts.
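As an illustration, a URL-safe key check might look like the following. This is hypothetical; the platform's exact key rules are not documented here.

```ts
// Hypothetical check: lowercase letters, digits, and hyphens, with no
// leading or trailing hyphen. The platform's actual rules may differ.
function isValidProjectKey(key: string): boolean {
  return /^[a-z0-9](?:[a-z0-9-]*[a-z0-9])?$/.test(key);
}

console.log(isValidProjectKey('redteam-lab')); // true
console.log(isValidProjectKey('-bad-key')); // false
```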
## Sandboxes and activation
Each project can have an associated sandbox. When you open a project in Studio, the platform **activates** it:
1. If another project's sandbox is running, it is paused.
2. The target project's sandbox is resumed or provisioned fresh.
3. The sandbox URL and token are returned so the agent can connect.
Only one sandbox per user runs at a time. Switching projects automatically handles the pause/resume cycle.
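A rough sketch of that activation rule, as illustrative pseudologic rather than the platform implementation:

```ts
// Illustrative: one running sandbox per user. Activating a project pauses
// whichever sandbox is currently running and brings up the target.
type SandboxState = 'running' | 'paused' | 'killed';

function activate(
  sandboxes: Map<string, SandboxState>,
  target: string
): Map<string, SandboxState> {
  for (const [key, state] of sandboxes) {
    if (key !== target && state === 'running') sandboxes.set(key, 'paused');
  }
  // A paused sandbox is resumed; a killed one would be provisioned fresh.
  sandboxes.set(target, 'running');
  return sandboxes;
}

const s = activate(
  new Map<string, SandboxState>([['proj-a', 'running'], ['proj-b', 'paused']]),
  'proj-b'
);
console.log(s.get('proj-a'), s.get('proj-b')); // paused running
```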
### Sandbox states
| State | What it means | Typical trigger |
| ------- | -------------------------------------------- | ---------------------------------- |
| Running | Active agent session is available | Provisioning or resuming a project |
| Paused | Session is idle but preserved | Inactivity timeout or manual pause |
| Killed | Session was terminated and must be recreated | Manual restart or hard timeout |
### Keepalive
Active sandboxes require periodic keepalive signals to prevent timeout. The platform UI sends these automatically. If a keepalive is missed, the sandbox is paused after the inactivity timeout.
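The timeout decision can be sketched as follows. The 5-minute window is an assumption for the example; the real inactivity timeout is deployment-specific and not documented here.

```ts
// Illustrative: pause a sandbox when the time since the last keepalive
// exceeds the inactivity timeout. The real timeout value is an assumption.
function shouldPause(lastKeepaliveMs: number, nowMs: number, timeoutMs: number): boolean {
  return nowMs - lastKeepaliveMs > timeoutMs;
}

const TIMEOUT = 5 * 60 * 1000; // assumed 5-minute inactivity window
console.log(shouldPause(0, 6 * 60 * 1000, TIMEOUT)); // true
console.log(shouldPause(0, 2 * 60 * 1000, TIMEOUT)); // false
```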
## Traces and telemetry
Agent activity within a project is captured as traces. Each trace contains spans representing model calls, tool invocations, and agent decisions. Traces are queryable by project and are the basis for evaluation and debugging workflows.
## Secret selection
Projects store a list of `selected_secret_ids` that determine which of your secrets are injected as environment variables when the sandbox starts. Changing the selection restarts the sandbox to apply the new values. See [Secrets](/platform/secrets/) for more detail.
## Managing projects
Projects can be managed from Studio, the CLI, or the API:
- **Studio:** Create, rename, delete, and switch between projects from the sidebar.
- **API:** Projects are available at `GET /api/v1/org/{org}/ws/{workspace}/projects` (workspace-scoped) or `GET /api/v1/projects` (sandbox-backed, user-scoped).
# Tasks
> Learn how tasks define security challenges and how attempts are verified.
Tasks are the unit of security challenge execution on Dreadnode. Each task defines a sandboxed environment plus verification logic that determines whether an agent succeeded.
## What a task is
Tasks are authored and published by platform admins. When you start a task, the platform provisions a **task sandbox** built from a pre-made template and records an attempt for your user.
## Task definition
### Core task components
| Component | Purpose | Example |
| ------------ | ------------------------- | -------------------------------------------- |
| Instruction | Prompt given to the agent | “Find the admin endpoint and read the flag.” |
| Environment | Docker compose services | Web app, database, API |
| Verification | How completion is checked | Script or flag submission |
Tasks are immutable once published, so every attempt runs against a consistent environment.
## Task lifecycle
### Attempt states
| State | What it means |
| --------- | ---------------------------------------------- |
| Active | Task sandbox is running |
| Verifying | Completion signal received; checks are running |
| Passed | Verification succeeded |
| Failed | Verification failed |
| Abandoned | User stopped the sandbox |
| Expired | Sandbox timed out before verification |
### Execution model
- One sandbox session equals one attempt.
- Each new attempt starts a fresh task sandbox.
- Task sandboxes are one-shot and do not pause or resume.
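The attempt states above can be read as a small state machine. The transition table below is inferred from the state descriptions and is a sketch, not the platform's actual implementation.

```ts
// Illustrative transition table for the attempt states listed above.
// Passed, Failed, Abandoned, and Expired are terminal.
type AttemptState = 'active' | 'verifying' | 'passed' | 'failed' | 'abandoned' | 'expired';

const TRANSITIONS: Record<AttemptState, AttemptState[]> = {
  active: ['verifying', 'abandoned', 'expired'],
  verifying: ['passed', 'failed'],
  passed: [],
  failed: [],
  abandoned: [],
  expired: [],
};

function canTransition(from: AttemptState, to: AttemptState): boolean {
  return TRANSITIONS[from].includes(to);
}

console.log(canTransition('active', 'verifying')); // true
console.log(canTransition('passed', 'active')); // false
```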
## Results and verification
Verification runs inside the task sandbox and never exposes the solution scripts to the agent. You’ll see the attempt result in the UI after verification completes.
# Custom Capabilities
> Bundle prompts, tools, hooks, and skills into a reusable capability package.
Capabilities are portable bundles that combine a system prompt, tools, hooks, and optional skills
into a single package. They are the main unit of extensibility when you want a domain-specific
agent setup that can be loaded on demand.
## What a capability contains
A capability is defined by a `capability.yaml` manifest and optional supporting files:
- **System prompt**: `system-prompt.md` (optional) to extend or replace the base prompt
- **Tools**: shell or HTTP tools exposed by the manifest
- **Hooks, scorers, stop conditions**: optional runtime behaviors
- **Skills**: bundled skill packs (a `skills/` directory or custom path)
## Capability manifest (capability.yaml)
```yaml
name: threat-hunting
version: 0.1.0
description: Threat hunting tools + skills for indicator triage.
skills: true
config:
  intel_api_key:
    type: secret
    env: INTEL_API_KEY
    required: true
tools:
  - name: lookup_indicator
    description: Look up an indicator in the intel service.
    runtime: shell
    entry: tools/lookup_indicator.py
    parameters:
      type: object
      properties:
        indicator:
          type: string
          description: IP, domain, or hash.
        limit:
          type: number
      required: [indicator]
```
## Implement a tool entry script
Shell-runtime tools read JSON from stdin and return JSON on stdout.
```python
# tools/lookup_indicator.py
import json
import os
import sys


def main() -> None:
    payload = json.load(sys.stdin)
    indicator = payload.get("parameters", {}).get("indicator")
    api_key = payload.get("config", {}).get("intel_api_key") or os.environ.get("INTEL_API_KEY")

    if not indicator:
        print(json.dumps({"error": "Missing indicator"}))
        return

    # Replace this stub with real API calls.
    result = {
        "indicator": indicator,
        "verdict": "suspicious",
        "source": "example-intel",
        "api_key_set": bool(api_key),
    }
    print(json.dumps({"result": result}))


if __name__ == "__main__":
    main()
```
## Load and register a capability
Use `loadCapability` to parse the manifest and `wrapCapability` to turn tools/hooks into SDK
primitives. Capability tools are already AI SDK `tool()` instances, so you can pass them straight
into an agent's tool map.
```ts
import { anthropic } from '@ai-sdk/anthropic';
import { createAgent, createGenerator, loadCapability, wrapCapability } from '@dreadnode/agents';

async function main(): Promise<void> {
  const generator = await createGenerator(anthropic('claude-sonnet-4-20250514'));
  const loaded = await loadCapability('./capabilities/threat-hunting');
  const wrapped = wrapCapability(loaded);

  const capabilityTools = Object.fromEntries(
    wrapped.tools.map((tool) => {
      const meta = tool as Record<string, unknown>;
      const name = (meta._capName as string) ?? (meta.name as string);
      return [name, tool];
    })
  );

  const agent = createAgent({
    name: 'threat-hunter',
    generator,
    systemPrompt: 'You are a threat hunting assistant.',
    hooks: wrapped.hooks,
    generateOptions: { tools: capabilityTools },
  });

  const result = await agent.run({ input: 'Check 8.8.8.8 for suspicious activity.' });
  console.log(result.trajectory.lastMessage?.text ?? 'No output');
}

main().catch((error) => {
  console.error(error);
  process.exit(1);
});
```
# Custom Skills
> Author discoverable skill packs and load them with discoverSkills and createSkillTools.
Skills are discoverable, loadable packs of instructions and assets. Each skill lives in its own
directory with a `SKILL.md` file that contains YAML frontmatter and markdown instructions.
## Skill format
The directory name must match the skill name in frontmatter. Use `allowed-tools` to scope what
the agent can call when the skill is active.
```
.skills/
  incident-response/
    SKILL.md
    scripts/
      triage.py
    references/
      playbook.md
```
```md
---
name: incident-response
description: Triage host compromise signals and summarize next actions.
allowed-tools: read_logs run_skill_script
license: MIT
compatibility: dreadnode>=0.9
metadata:
owner: security
---
Follow this process:
1. Identify the host and timeframe.
2. Run the triage script for baseline indicators.
3. Summarize findings and next actions.
```
## Discover and load skills
Use `discoverSkills` for a specific directory, or `discoverAllSkills` to search the default
project paths and any extra paths (such as a capability's bundled skills path).
```ts
import { discoverAllSkills, discoverSkills, createSkillTools } from '@dreadnode/agents';

async function main(): Promise<void> {
  const projectSkills = await discoverSkills('.skills');
  const allSkills = await discoverAllSkills(['./capabilities/threat-hunting/skills']);

  const skillTools = createSkillTools([...projectSkills, ...allSkills]);
  const toolNames = skillTools.map((tool) => tool.name);
  console.log(`Loaded ${skillTools.length} skill tools:`, toolNames.join(', '));
}

main().catch((error) => {
  console.error(error);
  process.exit(1);
});
```
## Using skills in an agent
`createSkillTools` returns four tools (`list_skills`, `load_skill`, `read_skill_file`,
`run_skill_script`) that you can merge into an agent's tool map. Skills are loaded on-demand,
so the agent only sees metadata until it requests full instructions.
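The on-demand pattern can be sketched without the SDK. Everything here (the registry class and its fields) is illustrative, not the SDK's actual implementation: callers see only metadata until they explicitly load a skill.

```ts
// Illustrative lazy-loading pattern: expose skill metadata up front, and
// return full instructions only when a skill is explicitly loaded.
interface SkillMeta {
  name: string;
  description: string;
}

class SkillRegistry {
  constructor(
    private skills: Map<string, { meta: SkillMeta; instructions: string }>
  ) {}

  // Cheap: metadata only, analogous to list_skills.
  list(): SkillMeta[] {
    return [...this.skills.values()].map((s) => s.meta);
  }

  // Expensive: full instructions, analogous to load_skill.
  load(name: string): string {
    const skill = this.skills.get(name);
    if (!skill) throw new Error(`Unknown skill: ${name}`);
    return skill.instructions;
  }
}

const registry = new SkillRegistry(
  new Map([
    [
      'incident-response',
      {
        meta: { name: 'incident-response', description: 'Triage host compromise signals.' },
        instructions: 'Follow this process: identify the host and timeframe...',
      },
    ],
  ])
);
console.log(registry.list().length); // 1
console.log(registry.load('incident-response').startsWith('Follow')); // true
```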
# Custom Tools
> Define tools with schemas, wrap capability tools, and group them into toolkits.
Tools are structured functions that an LLM can call. The SDK uses Zod schemas to validate inputs
and serialize parameters for model providers.
## Define tools with schemas
Use `defineTool` to describe inputs and return types. The SDK will validate parameters before
execution and expose a JSON schema to the model.
```ts
import { defineTool } from '@dreadnode/agents';
import { z } from 'zod';

export const enrichIndicator = defineTool({
  name: 'enrich_indicator',
  description: 'Look up an indicator in a local cache.',
  parameters: z.object({
    indicator: z.string().describe('IP, domain, or hash to enrich'),
  }),
  execute: async ({ indicator }) => ({ indicator, verdict: 'unknown' }),
});
```
## Group tools with createToolkit
`createToolkit` turns a list of tools into a keyed map with helpers for execution and schema
generation.
```ts
import { createToolkit } from '@dreadnode/agents';
import { enrichIndicator } from './tools/enrichIndicator';
const toolkit = createToolkit([enrichIndicator]);
const schemas = toolkit.toSchema();
console.log(schemas);
```
## Wrap capability tools
Capabilities ship tool definitions in `capability.yaml`. Use `wrapTool` and `wrapCapability` to
convert capability tool defs into AI SDK `tool()` instances that can be merged into your tool map.
```ts
import { loadCapability, wrapCapability, wrapTool } from '@dreadnode/agents';

async function main(): Promise<void> {
  const loaded = await loadCapability('./capabilities/threat-hunting');
  const wrapped = wrapCapability(loaded);

  const firstTool = loaded.manifest.tools?.[0];
  if (firstTool) {
    const single = wrapTool(firstTool, loaded);
    console.log('Wrapped tool:', (single as Record<string, unknown>)._capName);
  }

  console.log(`Wrapped ${wrapped.tools.length} tools from ${wrapped.name}.`);
}

main().catch((error) => {
  console.error(error);
  process.exit(1);
});
```
## Tool design best practices
- Keep inputs small and explicit (prefer enums or constrained strings).
- Return structured objects, not raw strings.
- Fail fast with clear error messages when preconditions are not met.
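For instance, the fail-fast and structured-output rules in plain TypeScript (illustrative, without the SDK's Zod layer; the function and verdict set are made up for the example):

```ts
// Illustrative fail-fast guard: reject bad input with a clear message and
// return a structured object, not a raw string. A constrained verdict set
// stands in for the "prefer enums" advice.
const VERDICTS = ['unknown', 'benign', 'suspicious', 'malicious'] as const;
type Verdict = (typeof VERDICTS)[number];

function enrich(indicator: string): { indicator: string; verdict: Verdict } {
  if (indicator.trim().length === 0) {
    throw new Error('enrich_indicator: "indicator" must be a non-empty string');
  }
  return { indicator, verdict: 'unknown' };
}

console.log(enrich('8.8.8.8')); // { indicator: '8.8.8.8', verdict: 'unknown' }
```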
# Swarm Configs
> Define multi-agent swarm YAML configs, shared state, and delegate roles.
Swarm configs let you define multi-agent runs with a coordinator and named worker delegates.
Configs are loaded from `~/.dreadnode/swarms/` and can be started from the CLI.
## YAML config format
Swarm configs are YAML files with a top-level `name`, optional default `model`, a `coordinator`
prompt, and a list of `workers` (delegates). Each worker includes an `input` seed.
```yaml
name: incident-response
model: anthropic/claude-sonnet-4-20250514
coordinator:
  prompt: |
    You coordinate the incident response swarm. Assign tasks, collect results,
    and publish a final summary to shared state.
workers:
  - name: timeline
    prompt: Build a timeline of the incident.
    input: Start with system logs and note key timestamps.
  - name: iocs
    prompt: Extract indicators of compromise.
    input: List hashes, IPs, and domains.
  - name: containment
    prompt: Recommend containment steps.
    input: Propose immediate containment actions.
```
Save the file as `~/.dreadnode/swarms/incident-response.yaml` and run it with:
```bash
/swarm incident-response
```
## SharedState and messaging
Swarm agents share a `SharedState` instance. Agents can communicate using the built-in swarm tools
(`read_state`, `write_state`, `send_message`, `read_messages`), which write to the same state log.
```ts
import { anthropic } from '@ai-sdk/anthropic';
import { Agent, createGenerator, withSwarm } from '@dreadnode/agents';
async function main(): Promise<void> {
  const generator = await createGenerator(anthropic('claude-sonnet-4-20250514'));
  const coordinator = new Agent({ name: 'coordinator', generator });
  const researcher = new Agent({ name: 'researcher', generator });
  const swarm = withSwarm(coordinator, [
    { name: 'researcher', agent: researcher, input: 'Find related incidents.' },
  ]);
  const result = await swarm.run({ input: 'Coordinate the swarm.' });
  const summary = result.state.get('summary');
  console.log('SharedState summary:', summary);
}
main().catch((error) => {
  console.error(error);
  process.exit(1);
});
```
## Delegates and direct delegation tools
Workers are the delegates in a swarm. If you want to expose explicit delegation tools,
use the `delegate()` helper to create `delegate_` tools for direct hand-offs.
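As a mental model only (the real `delegate()` signature lives in the SDK), the per-worker delegation tools can be pictured as a map keyed by worker name. Everything below, including the `delegate_` plus worker-name convention, is an illustrative assumption:

```typescript
// Hypothetical sketch, not the SDK implementation: expose each worker as a
// delegation tool so the coordinator can hand off work directly.
type Worker = { name: string; run: (input: string) => Promise<string> };

function delegateTools(workers: Worker[]): Record<string, (input: string) => Promise<string>> {
  const tools: Record<string, (input: string) => Promise<string>> = {};
  for (const worker of workers) {
    // Assumed naming convention: one tool per worker.
    tools[`delegate_${worker.name}`] = (input) => worker.run(input);
  }
  return tools;
}

const tools = delegateTools([
  { name: 'researcher', run: async (input) => `researched: ${input}` },
]);
console.log(Object.keys(tools)); // ["delegate_researcher"]
```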
# Credits
> Understand how credits power usage-based billing in SaaS deployments.
import { Aside } from '@astrojs/starlight/components';
{/* Source: docs/domains/credits.md */}
Credits are the platform’s unit of usage measurement. In SaaS mode, your organization uses credits as sandboxes run. Credits are shared across all members of the organization.
## How credits work
Credits are consumed in real time while sandboxes are active. Usage is recorded automatically so you can track spend and remaining balance.
| Event | What happens |
| -------------------- | -------------------------------------------------------------- |
| Sandbox keepalive | Credits are deducted based on runtime since the last deduction |
| Sandbox pause/stop | A final deduction is recorded |
| Balance reaches zero | All running sandboxes are terminated |
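The keepalive deduction in the table above is a simple runtime-times-rate calculation. A minimal sketch, with a hypothetical rate (actual rates and deduction intervals are platform-defined):

```typescript
// Illustrative arithmetic only — rates and intervals are not part of the public docs.
function keepaliveDeduction(lastDeductionMs: number, nowMs: number, creditsPerHour: number): number {
  const elapsedHours = (nowMs - lastDeductionMs) / 3_600_000;
  return elapsedHours * creditsPerHour;
}

// 30 minutes of sandbox runtime at a hypothetical 10 credits/hour:
console.log(keepaliveDeduction(0, 1_800_000, 10)); // 5
```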
## Purchasing and balance
Organizations receive an initial credit allocation at signup and can purchase additional credits through Stripe. Each purchase increases the shared org balance.
### Transaction types
| Type | Description |
| ----------------- | ------------------------------------------- |
| Signup allocation | Initial credits granted at org creation |
| Purchase | Stripe-backed credit purchase |
| Usage | Runtime deductions from sandbox activity |
| Admin adjustment | Manual credit changes by platform operators |
## Deployment modes
Credits apply only to SaaS deployments. In Enterprise mode, credit endpoints are unavailable and sandboxes are not limited by credit balance.
# Organizations
> Understand how organizations group users, workspaces, and billing on Dreadnode.
Organizations are the top-level container on Dreadnode. Everything — users, workspaces, projects, credits, and billing — is scoped to an organization.
## What an organization is
An organization represents a team, company, or group that shares access to the platform. Each organization has:
- A unique `key` (URL slug) used in API paths and URLs
- A display `name`
- A member list with role-based access
- Workspaces that contain projects
## Membership and roles
Users are added to an organization as members. Each member has a role that determines their permissions:
| Role | What they can do |
| ----------- | ------------------------------------------------------- |
| Owner | Full access — manage members, workspaces, billing, keys |
| Contributor | Create and manage workspaces and projects |
| Reader | View workspaces, projects, and traces |
### Invitations
Organization owners can invite users by email. Invitations have an expiration window and can be accepted or rejected by the recipient. External invites can be toggled on or off per organization.
## Organization limits
Each organization has a configurable maximum member count (default: 500). Platform administrators can adjust this limit.
## Managing organizations
- **API:** Organization details are available at `GET /api/v1/org/{org}`. Members are listed at `GET /api/v1/org/{org}/members`.
- **Workspaces:** Listed at `GET /api/v1/org/{org}/ws`.
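The endpoints above all hang off the same org-keyed prefix, so a small path helper covers them. A minimal sketch (`acme` is a placeholder org key):

```typescript
// Build the documented organization API paths from an org key (a URL slug).
function orgPath(org: string): string {
  return `/api/v1/org/${encodeURIComponent(org)}`;
}

console.log(orgPath('acme'));              // "/api/v1/org/acme"
console.log(`${orgPath('acme')}/members`); // "/api/v1/org/acme/members"
console.log(`${orgPath('acme')}/ws`);      // "/api/v1/org/acme/ws"
```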
## Relationship to other concepts
```
Organization
├── Members (users with roles)
├── Invitations (pending)
├── Workspaces
│ ├── Projects
│ │ ├── Sandboxes
│ │ ├── Sessions
│ │ └── Traces
│ └── Permissions (user + team)
└── Credits (SaaS mode)
```
# Secrets
> Store and inject sensitive credentials into sandboxes safely.
{/* Source: docs/domains/secrets.md */}
Secrets are encrypted credentials (API keys, tokens, and passwords) that you can inject into sandboxes as environment variables without exposing them in API responses.
## What secrets are
- **Private to you:** secrets are owned by your user and never shared by default.
- **Encrypted at rest:** plaintext values are never returned by any API.
- **Injected at runtime:** secrets are decrypted only when a sandbox is provisioned.
## Scoping and selection
Secrets are **user-owned**. You maintain a personal library of secrets and choose which of your secrets to inject when provisioning a sandbox for a project.
When you create or update a project sandbox, you pass the list of secret IDs to inject (`selected_secret_ids`). That selection is stored on the project and used for subsequent sandbox provisioning.
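The selection payload can be sketched as follows. Only the `selected_secret_ids` field name comes from the docs; the shape of the surrounding request and the ID values are placeholders:

```typescript
// Hypothetical request fragment for selecting secrets on a project sandbox.
const sandboxUpdate = {
  selected_secret_ids: ['secret-id-1', 'secret-id-2'], // placeholder IDs from your library
};

console.log(JSON.stringify(sandboxUpdate));
```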
## Injection into sandboxes
Secrets are injected as environment variables at sandbox creation time. If you change the selected secrets for a project, the platform restarts the sandbox so the new values are applied.
## Lifecycle and management
### Common actions
- Create and update secrets from the UI or CLI (`dreadnode secrets set`).
- List available secrets and presets (`dreadnode secrets list`).
- Delete secrets you no longer use (`dreadnode secrets delete`).
### Lifecycle expectations
| Step | What happens |
| --------- | ---------------------------------------------------------- |
| Create | Secret is stored encrypted and shown with a masked preview |
| Select | You choose which secrets to inject for a project |
| Provision | Secrets are decrypted and injected into the sandbox |
| Rotate | Update the value and restart the sandbox to apply |
# Workspaces
> Learn how workspaces organize projects and control access within an organization.
Workspaces are containers within an organization that group related projects and control who can access them.
## What a workspace is
A workspace lives inside an organization and provides:
- A boundary for grouping related projects (e.g. by team, engagement, or client)
- Fine-grained access control via user and team permissions
- A unique `key` (URL slug) within the organization
Each user gets a **default workspace** that is private to them. Additional workspaces can be created and shared with other members.
## Permissions
Workspace access is controlled separately from organization roles. Permissions can be granted to individual users or to teams.
| Permission | What it allows |
| ----------- | ------------------------------------------- |
| Owner | Full access — manage permissions, delete |
| Contributor | Create and manage projects within workspace |
| Reader | View projects and traces |
### User permissions
Individual users can be added to a workspace with a specific permission level. The workspace creator is automatically assigned the `owner` permission.
### Team permissions
Teams (groups of users within the organization) can also be granted workspace access. All members of the team inherit the team's permission level for that workspace.
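When a user holds both a direct grant and one or more team grants, one plausible resolution is to take the most permissive level. This is an illustrative assumption, not documented platform behavior; only the `owner` > `contributor` > `reader` ordering comes from the table above:

```typescript
// Illustrative only: resolving an effective workspace permission from direct
// and team grants (actual resolution rules are platform-defined).
const rank = { reader: 0, contributor: 1, owner: 2 } as const;
type Permission = keyof typeof rank;

function effectivePermission(direct: Permission | null, teamGrants: Permission[]): Permission | null {
  const all = [...(direct ? [direct] : []), ...teamGrants];
  if (all.length === 0) return null; // no access
  return all.reduce((a, b) => (rank[a] >= rank[b] ? a : b));
}

console.log(effectivePermission('reader', ['contributor'])); // "contributor"
```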
## Default workspaces
When a user joins an organization, they receive a default workspace that is private to them. Default workspaces:
- Are automatically created and cannot be deleted
- Are not shared with other members unless explicitly configured
- Provide a personal space for individual projects
## Managing workspaces
- **API:** Create a workspace with `POST /api/v1/org/{org}/ws`. Retrieve details with `GET /api/v1/org/{org}/ws/{workspace}`.
- **Sharing:** Add users with `POST /api/v1/org/{org}/ws/{workspace}/users`.
- **Listing:** All workspaces in an organization are listed at `GET /api/v1/org/{org}/ws`.
# Agents
> Build and run agents with trajectories, hooks, and reactions.
import { Aside } from '@astrojs/starlight/components';
The Agent class is the core runtime loop in the TypeScript SDK. It coordinates generations,
tool calls, and lifecycle events, while a Trajectory records everything that happened.
## Key types & signatures
```ts
class Agent {
  constructor(config: AgentConfig);
  run(options: RunOptions): Promise<AgentRunResult>; // result type name illustrative; exposes `trajectory` and `state`
  stream(options: RunOptions): AsyncGenerator<AgentEvent>;
}
function createAgent(config: AgentConfig): Agent;
class Trajectory {
  constructor(options: {
    sessionId?: string;
    agentId: string;
    agentName?: string;
    systemPrompt?: string;
  });
  get events(): readonly AgentEvent[];
  get messages(): Message[];
  get lastMessage(): Message | undefined;
}
const reactions: {
  continue(options?: { messages?: Message[]; feedback?: string }): Reaction;
  retry(): Reaction;
  retryWithFeedback(feedback: string): Reaction;
  fail(reason: string): Reaction;
  finish(result?: unknown): Reaction;
};
```
## Create and run an agent
```ts
import { anthropic } from '@ai-sdk/anthropic';
import { createAgent, createGenerator } from '@dreadnode/agents';
const generator = await createGenerator(anthropic('claude-sonnet-4-20250514'));
const agent = createAgent({
  name: 'support-agent',
  generator,
  systemPrompt: 'You are a crisp, friendly support agent.',
});
async function main(): Promise<void> {
  const result = await agent.run({
    input: 'Summarize the Dreadnode platform in one sentence.',
  });
  const message = result.trajectory.lastMessage;
  if (message) {
    console.log(message.text);
  }
}
main().catch((error) => {
  console.error(error);
  process.exit(1);
});
```
## Add reactions with hooks
Hooks listen to agent events and can return reactions to steer or stop the loop.
```ts
import { anthropic } from '@ai-sdk/anthropic';
import {
  createAgent,
  createGenerator,
  hook,
  reactions,
  type GenerationStepEvent,
} from '@dreadnode/agents';
const generator = await createGenerator(anthropic('claude-sonnet-4-20250514'));
const qualityHook = hook('retry-on-empty', 'GenerationStep', (event) => {
  const hasOutput = event.messages.some((msg) => msg.role === 'assistant');
  return hasOutput ? null : reactions.retryWithFeedback('Please provide a complete answer.');
});
const agent = createAgent({
  name: 'quality-agent',
  generator,
  hooks: [qualityHook],
});
const result = await agent.run({ input: 'Write a one-line mission statement.' });
console.log(result.trajectory.lastMessage?.text ?? 'No output');
```
## Trajectory highlights
The Trajectory records all events, messages, usage, and stop reasons. Use it to:
- Inspect the final output (`trajectory.lastMessage`)
- Review all events (`trajectory.events`)
- Compute token usage (`trajectory.usage`)
# Data
> Load datasets from the Hub, URLs, or inline arrays with Arrow support.
import { Aside } from '@astrojs/starlight/components';
The data module loads datasets for evaluations. It supports inline arrays, Hugging Face Hub
datasets, and URLs, and can return either plain arrays or Arrow tables.
## Key types & signatures
```ts
type DatasetRef = unknown[] | { url: string } | { hf: string; split?: string; revision?: string };
function loadDataset(ref: DatasetRef, options?: DatasetLoadOptions): Promise<unknown[]>;
function loadDatasetFromHub(
  repoId: string,
  options?: DatasetLoadOptions & { split?: string; revision?: string }
): Promise<unknown[]>;
type ArrowTable = import('apache-arrow').Table;
```
## Load a dataset from the Hub
```ts
import { loadDatasetFromHub } from '@dreadnode/agents';
const rows = await loadDatasetFromHub('dreadnode/evals-support', {
  split: 'train',
});
console.log(rows.length);
```
## Load an inline dataset and keep Arrow format
```ts
import { loadDatasetAsArrow, type ArrowTable } from '@dreadnode/agents';
const table: ArrowTable = await loadDatasetAsArrow([
  { input: 'What is Dreadnode?', expected: 'An AI agent platform.' },
  { input: 'What is a generator?', expected: 'A model wrapper.' },
]);
console.log(table.numRows);
```
## Use loadDataset in evaluations
```ts
import { loadDataset } from '@dreadnode/agents';
const samples = await loadDataset({
  hf: 'dreadnode/evals-support',
  split: 'test',
});
for (const sample of samples) {
  console.log(sample);
}
```
# Evaluations
> Run dataset-driven evaluations with SampleExecutor and TrialExecutor.
import { Aside } from '@astrojs/starlight/components';
Evaluations are built from worker primitives: SampleExecutor runs a task once, while
TrialExecutor runs a batch with parameters for studies and optimization.
## Key types & signatures
```ts
type TaskFn<In, Out> = (input: In, context: TaskContext) => Promise<Out> | Out;
class SampleExecutor<In, Out> {
  constructor(options: { task: TaskFn<In, Out>; scorers?: EvalScorerConfig[] });
  execute(request: SampleRequest<In>, signal?: AbortSignal): Promise<SampleResult<Out>>;
  executeBatch(
    requests: SampleRequest<In>[],
    options?: { signal?: AbortSignal; concurrency?: number }
  ): Promise<SampleResult<Out>[]>;
}
class TrialExecutor<In, Out> {
  constructor(options: {
    task: TaskFn<In, Out>;
    scorers?: EvalScorerConfig[];
    objectiveMetric: string;
    objectiveMode?: 'maximize' | 'minimize';
  });
  execute(
    trial: TrialRequest,
    samples: { input: In; context?: Record<string, unknown> }[],
    signal?: AbortSignal
  ): Promise<TrialResult>;
}
class Evaluation {
  constructor(config: EvaluationConfig);
  run(): Promise<EvaluationResult>;
}
```
## Run a sample evaluation
```ts
import { SampleExecutor, evalScorer, similarity, type SampleRequest } from '@dreadnode/agents';
type Input = { question: string; expected: string };
type Output = string;
const task = async (input: Input): Promise