Skip to content

Datasets

Versioned datasets for training, optimization, and evaluation.

Terminal window
$ dn dataset <command>

Versioned data for training, optimization, and evaluation — the ground truth your agents learn from.

Terminal window
$ dn dataset inspect <path>

Preview a local dataset directory before publishing.

Reads dataset.yaml and the data files to show schema, row counts, splits, and format — so you can catch problems before pushing.

Options

  • <path>, --path (Required) — Dataset directory containing dataset.yaml.
  • --json (default False) — Output raw JSON instead of a table.

Aliases: upload

Terminal window
$ dn dataset push

Publish a dataset to your organization’s registry.

Two input shapes (mutually exclusive):

  • Local directory: dn dataset push <dir> — packages a directory with dataset.yaml and data files as a versioned artifact.
  • HuggingFace: dn dataset push --hf <hf_path> [--hf-split ...] [--user-field ...] [--assistant-field ...] — pulls a dataset from HuggingFace Hub and pushes it under --name (default: the HF path). When both --user-field and --assistant-field are set, rows are transformed to OpenAI messages format for Tinker SFT.

Options

  • <path>, --path — Dataset directory (mutually exclusive with —hf).
  • --hf — HuggingFace dataset path, e.g. "openai/gsm8k".
  • --hf-config — Optional HF config (e.g. "main" for gsm8k).
  • --hf-split (default train) — HF split spec ("train", "train[:100]", etc).
  • --user-field — Row field → user message (requires assistant_field).
  • --assistant-field — Row field → assistant message.
  • --system-prompt — Optional system message prepended to each conversation.
  • --name — Override the registry name.
  • --dataset-version (default 0.1.0) — Registry version string (renamed from version to avoid collision with the CLI’s global --version flag).
  • --summary — Optional human-readable summary.
  • --hf-format (default parquet) — Output format for —hf pushes. Defaults to parquet (the platform default). jsonl writes line-delimited JSON. [choices: parquet, jsonl]
  • --skip-upload (default False) — Build and validate locally without publishing.
  • --publish (default False) — Ensure the dataset is publicly discoverable after publishing.
Terminal window
$ dn dataset publish <refs>

Make one or more dataset families visible to other organizations.

Options

  • <refs>, --refs (Required)
Terminal window
$ dn dataset unpublish <refs>

Make one or more dataset families private.

Options

  • <refs>, --refs (Required)

Aliases: ls

Terminal window
$ dn dataset list

Show datasets in your organization.

Options

  • --search — Search by name or description.
  • --limit (default 50) — Maximum results to show.
  • --include-public (default False) — Include public datasets from other organizations.
  • --json (default False) — Output raw JSON instead of a summary.
Terminal window
$ dn dataset info <ref>

Show details and available versions for a dataset.

Version is optional — defaults to the latest.

Options

  • <ref>, --ref (Required) — Dataset to inspect (e.g. my-dataset, my-dataset@1.0.0).
  • --json (default False) — Output raw JSON instead of a summary.

Aliases: rm

Terminal window
$ dn dataset delete <ref>

Remove a dataset version from the registry.

Options

  • <ref>, --ref (Required) — Dataset to delete (e.g. my-dataset@1.0.0). Version is required.
  • --yes, -y (default False) — Skip the confirmation prompt.

Aliases: download

Terminal window
$ dn dataset pull <ref>

Pull a dataset to your local machine.

Version is optional — defaults to the latest. Without —output, prints a pre-signed download URL you can use with curl or a browser.

Options

  • <ref>, --ref (Required) — Dataset to pull (e.g. my-dataset, my-dataset@1.0.0).
  • --output — Save to this path instead of printing the URL.
  • --split — Download a specific split (e.g. train, test).