Datasets
Versioned datasets for training, optimization, and evaluation: the ground truth your agents learn from.
$ dn dataset <command>
inspect
$ dn dataset inspect <path>
Preview a local dataset directory before publishing.
Reads dataset.yaml and the data files to show schema, row counts, splits, and format, so you can catch problems before pushing.
Options
- <path>, --path (required): Dataset directory containing dataset.yaml.
- --json (default: False): Output raw JSON instead of a table.
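As a rough sketch of how inspect might fit into a pre-publish check, the helper below confirms that dataset.yaml exists before calling the CLI. The function name inspect_dataset and the error message are hypothetical; only the inspect command and its --json flag come from the reference above.

```shell
# Hypothetical helper: fail early if the directory has no dataset.yaml,
# otherwise print the inspection result as raw JSON.
inspect_dataset() {
  dir=$1
  if [ ! -f "$dir/dataset.yaml" ]; then
    echo "no dataset.yaml found in $dir" >&2
    return 1
  fi
  dn dataset inspect "$dir" --json
}
```

Call it as inspect_dataset ./my-dataset, where ./my-dataset is a placeholder path.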
push
Aliases: upload
$ dn dataset push
Publish a dataset to your organization’s registry.
Two input shapes (mutually exclusive):
- Local directory: dn dataset push <dir> packages a directory with dataset.yaml and data files as a versioned artifact.
- HuggingFace: dn dataset push --hf <hf_path> [--hf-split ...] [--user-field ...] [--assistant-field ...] pulls a dataset from HuggingFace Hub and pushes it under --name (default: the HF path). When both --user-field and --assistant-field are set, rows are transformed to OpenAI messages format for Tinker SFT.
Options
- <path>, --path: Dataset directory (mutually exclusive with --hf).
- --hf: HuggingFace dataset path, e.g. "openai/gsm8k".
- --hf-config: Optional HF config (e.g. "main" for gsm8k).
- --hf-split (default: train): HF split spec ("train", "train[:100]", etc.).
- --user-field: Row field mapped to the user message (requires --assistant-field).
- --assistant-field: Row field mapped to the assistant message.
- --system-prompt: Optional system message prepended to each conversation.
- --name: Override the registry name.
- --dataset-version (default: 0.1.0): Registry version string (renamed from version to avoid collision with the CLI’s global --version flag).
- --summary: Optional human-readable summary.
- --hf-format (default: parquet): Output format for --hf pushes. parquet is the platform default; jsonl writes line-delimited JSON. [choices: parquet, jsonl]
- --skip-upload (default: False): Build and validate locally without publishing.
- --publish (default: False): Ensure the dataset is publicly discoverable after publishing.
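To illustrate the HuggingFace input shape, here is a minimal sketch that wraps a push of the first 100 gsm8k training rows as chat-format data. The function name push_gsm8k_sample and the registry name gsm8k-sft are hypothetical, and the field names question and answer are assumed to match the columns of that HF dataset. The --skip-upload flag keeps the run local, so nothing is published.

```shell
# Hypothetical wrapper: build a chat-format dataset from a HuggingFace split.
# --skip-upload validates locally; drop it to actually publish.
push_gsm8k_sample() {
  dn dataset push \
    --hf "openai/gsm8k" \
    --hf-config "main" \
    --hf-split "train[:100]" \
    --user-field question \
    --assistant-field answer \
    --name gsm8k-sft \
    --skip-upload
}
```

Because --user-field and --assistant-field are both set, each row should be converted to the OpenAI messages format described above.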
publish
$ dn dataset publish <refs>
Make one or more dataset families visible to other organizations.
Options
- <refs>, --refs (required)
unpublish
$ dn dataset unpublish <refs>
Make one or more dataset families private.
Options
- <refs>, --refs (required)
list
Aliases: ls
$ dn dataset list
Show datasets in your organization.
Options
- --search: Search by name or description.
- --limit (default: 50): Maximum results to show.
- --include-public (default: False): Include public datasets from other organizations.
- --json (default: False): Output raw JSON instead of a summary.
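The list options can be combined. A small sketch, with the hypothetical helper name find_datasets, that searches both your organization and public datasets and caps the output at 10 results:

```shell
# Hypothetical helper: search by name or description, including public datasets.
find_datasets() {
  term=$1
  dn dataset list --search "$term" --limit 10 --include-public
}
```

For example, find_datasets gsm8k would search for datasets whose name or description matches "gsm8k".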
info
$ dn dataset info <ref>
Show details and available versions for a dataset.
Version is optional and defaults to the latest.
Options
- <ref>, --ref (required): Dataset to inspect (e.g. my-dataset, my-dataset@1.0.0).
- --json (default: False): Output raw JSON instead of a summary.
delete
Aliases: rm
$ dn dataset delete <ref>
Remove a dataset version from the registry.
Options
- <ref>, --ref (required): Dataset to delete (e.g. my-dataset@1.0.0). Version is required.
- --yes, -y (default: False): Skip the confirmation prompt.
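Since delete requires a version and --yes skips the confirmation prompt, a script may want its own guard before calling it. The sketch below uses a hypothetical helper, delete_version, that rejects refs without an @version suffix. This check is an illustration, not something the CLI is documented to need.

```shell
# Hypothetical guard: only delete when the ref pins an explicit version.
delete_version() {
  ref=$1
  case "$ref" in
    *@*) dn dataset delete "$ref" --yes ;;
    *)
      echo "version required, e.g. my-dataset@1.0.0" >&2
      return 1
      ;;
  esac
}
```

With this in place, delete_version my-dataset@1.0.0 runs the delete without prompting, while delete_version my-dataset fails before the CLI is called.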
pull
Aliases: download
$ dn dataset pull <ref>
Pull a dataset to your local machine.
Version is optional and defaults to the latest. Without --output, the command prints a pre-signed download URL you can use with curl or a browser.
Options
- <ref>, --ref (required): Dataset to pull (e.g. my-dataset, my-dataset@1.0.0).
- --output: Save to this path instead of printing the URL.
- --split: Download a specific split (e.g. train, test).
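The two pull modes can be sketched as follows. The helper names pull_split and fetch_via_url and all paths are hypothetical. The second helper assumes that, without --output, the command writes only the pre-signed URL to standard output, which the reference above does not spell out.

```shell
# Hypothetical helper: save one split of a dataset directly to a local path.
pull_split() {
  ref=$1
  split=$2
  out=$3
  dn dataset pull "$ref" --split "$split" --output "$out"
}

# Hypothetical helper: capture the printed pre-signed URL and download it with curl.
# Assumes stdout contains only the URL.
fetch_via_url() {
  ref=$1
  out=$2
  url=$(dn dataset pull "$ref") || return 1
  curl -fsSL -o "$out" "$url"
}
```

For example, pull_split my-dataset@1.0.0 train ./train-data saves the train split locally, and fetch_via_url my-dataset ./my-dataset.download fetches the latest version through the URL.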