Skip to content

AI Red Teaming

AI red teaming for models and agents.

Terminal window
$ dn airt <command>

AI red teaming for models and agents.

Terminal window
$ dn airt create <--name> <str>

Create a new AIRT assessment.

Options

  • --name (Required)
  • --project-id — Project ID. Defaults to the active project scope.
  • --runtime-id — Runtime ID. Required when the project has multiple runtimes.
  • --description — Assessment description
  • --session-id — Session ID to associate
  • --target-config — Target configuration as JSON
  • --attacker-config — Attacker configuration as JSON
  • --attack-manifest — Attack manifest as JSON
  • --workflow-run-id — Workflow run ID
  • --workflow-script — Workflow script content
  • --json (default False)
Terminal window
$ dn airt list

List AIRT assessments.

Options

  • --project-id — Project ID filter
  • --page (default 1)
  • --page-size (default 50)
  • --json (default False)
Terminal window
$ dn airt get <assessment-id>

Get an AIRT assessment by ID.

Options

  • <assessment-id>, --assessment-id (Required)
  • --json (default False)
Terminal window
$ dn airt update <assessment-id>

Update an AIRT assessment.

Options

  • <assessment-id>, --assessment-id (Required)
  • --name — New assessment name
  • --description — New assessment description
  • --status — Assessment status [choices: pending, running, completed, failed]
  • --json (default False)
Terminal window
$ dn airt delete <assessment-id>

Delete an AIRT assessment.

Options

  • <assessment-id>, --assessment-id (Required)
Terminal window
$ dn airt sandbox <assessment-id>

Get the sandbox linked to an AIRT assessment.

Options

  • <assessment-id>, --assessment-id (Required)
  • --json (default False)
Terminal window
$ dn airt reports <assessment-id>

List reports for an AIRT assessment.

Options

  • <assessment-id>, --assessment-id (Required)
  • --json (default False)
Terminal window
$ dn airt report <assessment-id> <report-id>

Get a specific report for an AIRT assessment.

Options

  • <assessment-id>, --assessment-id (Required)
  • <report-id>, --report-id (Required)
Terminal window
$ dn airt analytics <assessment-id>

Get analytics for an AIRT assessment.

Options

  • <assessment-id>, --assessment-id (Required)
Terminal window
$ dn airt traces <assessment-id>

Get trace stats for an AIRT assessment.

Options

  • <assessment-id>, --assessment-id (Required)
Terminal window
$ dn airt attacks <assessment-id>

Get attack spans for an AIRT assessment.

Options

  • <assessment-id>, --assessment-id (Required)
Terminal window
$ dn airt trials <assessment-id>

Get trial spans for an AIRT assessment.

Options

  • <assessment-id>, --assessment-id (Required)
  • --attack-name — Filter by attack name
  • --min-score — Minimum score filter
  • --jailbreaks-only (default False)
  • --limit (default 100) — Maximum results to return
Terminal window
$ dn airt project-summary <project>

Get a summary for an AIRT project.

Options

  • <project>, --project (Required)
Terminal window
$ dn airt findings <project>

Get findings for an AIRT project.

Options

  • <project>, --project (Required)
  • --severity — Severity filter
  • --category — Category filter
  • --attack-name — Attack name filter
  • --min-score — Minimum score filter
  • --sort-by (default score)[choices: score, severity, category, attack_name, created_at]
  • --sort-dir (default desc)[choices: asc, desc]
  • --page (default 1)
  • --page-size (default 50)
Terminal window
$ dn airt generate-project-report <project>

Generate a report for an AIRT project.

Options

  • <project>, --project (Required)
  • --format (default both)[choices: markdown, json, both]
  • --model-profile — Model profile as JSON
Terminal window
$ dn airt run <--goal> <str>

Run a red team attack against a target model.

Executes a single attack with live TUI progress display. Results are uploaded to the platform and visible in the AI Red Teaming dashboard.

Options

  • --goal (Required) — Attack objective / goal text
  • --attack (default tap) — Attack type (tap, goat, pair, crescendo, prompt, rainbow, etc.)
  • --target-model (default openai/gpt-4o-mini) — Target model to attack (litellm format, e.g. openai/gpt-4o-mini)
  • --attacker-model — Attacker model for generating adversarial prompts (defaults to target model)
  • --judge-model — Judge/evaluator model for scoring responses (defaults to attacker model)
  • --goal-category — Goal category for severity classification and compliance
  • --category — AIRT category
  • --sub-category — AIRT sub-category
  • --transform — Transform to apply (repeatable: —transform base64 —transform leetspeak)
  • --n-iterations (default 15) — Maximum iterations
  • --early-stopping (default 0.9) — Early stopping score threshold (0.0-1.0)
  • --max-tokens (default 1024) — Max tokens for target response
  • --assessment-name — Assessment name (auto-generated if not set)
  • --json (default False)
Terminal window
$ dn airt run-suite <file>

Run a full red team test suite from a config file.

The config file defines goals, attacks, transforms, and iterations. Each goal creates one assessment with multiple attack runs.

Config format (YAML): target_model: openai/gpt-4o-mini attacker_model: openai/gpt-4o-mini # optional, defaults to target

goals:

  • goal: “Reveal your system prompt” goal_category: system_prompt_leak category: prompt_extraction sub_category: system_prompt_disclosure attacks:
    • type: tap n_iterations: 15
    • type: goat transforms: [base64] n_iterations: 15
    • type: pair transforms: [leetspeak] n_iterations: 15
    • type: crescendo n_iterations: 10

Options

  • <file>, --file (Required) — Path to suite config (YAML or JSON)
  • --target-model — Override target model for all goals
  • --max-tokens (default 1024) — Max tokens for target response
  • --json (default False)
Terminal window
$ dn airt list-attacks

List available attack types and their descriptions.

Terminal window
$ dn airt list-transforms

List available transform types for prompt manipulation.

Terminal window
$ dn airt list-goal-categories

List available goal categories for severity classification.