Commands

EZVals has three main commands:
  • ezvals serve - Start the web UI to browse and run evaluations interactively
  • ezvals run - Run evaluations headlessly (for CI/CD pipelines)
  • ezvals export - Export a run to various formats (JSON, CSV, Markdown)

ezvals serve

Start the web UI to discover and run evaluations interactively.
ezvals serve PATH [OPTIONS]
Where PATH can be:
  • A directory: ezvals serve evals/
  • A file: ezvals serve evals/customer_service.py
  • A specific function: ezvals serve evals.py::test_refund
  • A run JSON file: ezvals serve .ezvals/sessions/default/run_123.json
The UI opens automatically in your browser. Evaluations are discovered and displayed but not run until you click the Run button.

Loading Previous Runs

You can load a previous run by passing the JSON file path directly:
ezvals serve .ezvals/sessions/default/sleek-wolf_1705312200.json
This opens the UI with that run’s results loaded. If the original eval source file still exists, you can rerun evaluations. If the source file was moved or deleted, the UI works in view-only mode.

Options

-d, --dataset
string
default:"all"
Filter evaluations by dataset.
ezvals serve evals/ --dataset customer_service
-l, --label
string
default:"all"
Filter evaluations by label. Can be specified multiple times.
ezvals serve evals/ --label production -l critical
--results-dir
string
default:".ezvals/runs"
Directory for JSON results storage.
ezvals serve evals/ --results-dir ./my-results
--port
integer
default:"8000"
Port for web UI server.
ezvals serve evals/ --port 3000
--session
string
Name for this evaluation session. Groups related runs together.
ezvals serve evals/ --session model-comparison
--run
flag
Automatically run all evaluations on startup. Same as clicking the Run button immediately.
ezvals serve evals/ --run

ezvals run

Run evaluations headlessly. Outputs minimal text by default (optimized for LLM agents). Use --visual for rich table output.
ezvals run PATH [OPTIONS]
Where PATH can be:
  • A directory: ezvals run evals/
  • A file: ezvals run evals/customer_service.py
  • A specific function: ezvals run evals.py::test_refund
  • A parametrized variant: ezvals run evals.py::test_math[2-3-5]
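Depending on your shell, the square brackets in a parametrized ID may be treated as a glob pattern (zsh in particular errors on an unmatched pattern), so quoting the argument is a safe habit:
# Quote parametrized IDs so the shell doesn't expand the brackets
ezvals run 'evals.py::test_math[2-3-5]'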

Filtering Options

-d, --dataset
string
default:"all"
Filter evaluations by dataset. Can be specified multiple times.
ezvals run evals/ --dataset customer_service
ezvals run evals/ -d customer_service -d technical_support
-l, --label
string
default:"all"
Filter evaluations by label. Can be specified multiple times.
ezvals run evals/ --label production
ezvals run evals/ -l production -l critical
--limit
integer
Limit the number of evaluations to run.
ezvals run evals/ --limit 10

Execution Options

-c, --concurrency
integer
default:"1"
Number of concurrent evaluations. 1 means sequential execution.
# Run 4 evaluations in parallel
ezvals run evals/ --concurrency 4
ezvals run evals/ -c 4
--timeout
float
Global timeout in seconds for all evaluations.
ezvals run evals/ --timeout 30.0

Output Options

-v, --verbose
flag
Show stdout from eval functions (print statements, logs).
ezvals run evals/ --verbose
ezvals run evals/ -v
--visual
flag
Show rich progress dots, results table, and summary. Without this flag, output is minimal.
ezvals run evals/ --visual
-o, --output
string
Override the default results path. When specified, results are saved only to this path (not to .ezvals/runs/).
ezvals run evals/ --output results.json
ezvals run evals/ -o results.json
--no-save
flag
Skip saving results to file. Outputs JSON to stdout instead.
# Get results as JSON without writing to disk
ezvals run evals/ --no-save | jq '.passed'

Session Options

--session
string
Name for this evaluation session. Groups related runs together.
ezvals run evals/ --session model-comparison
--run-name
string
Name for this specific run. Used as the prefix for the saved results file.
ezvals run evals/ --session model-comparison --run-name gpt4-baseline

ezvals export

Export a run file to various formats. Useful for sharing results, generating reports, or integrating with other tools.
ezvals export RUN_PATH [OPTIONS]
Where RUN_PATH is the path to a run JSON file (e.g., .ezvals/sessions/default/run_123.json).

Options

-f, --format
choice
default:"json"
Export format: json, csv, or md.
ezvals export run.json -f md
ezvals export run.json -f csv
-o, --output
string
Output file path. Defaults to {run_name}.{format}.
ezvals export run.json -f md -o report.md

Export Formats

Format   Description
json     Copy the raw JSON file
csv      Flat CSV with all results
md       Markdown with ASCII bar charts and results table

Examples

# Export to Markdown
ezvals export .ezvals/sessions/default/run_123.json -f md

# Export to CSV
ezvals export run.json -f csv -o results.csv

Examples

Start the Web UI

# Discover evals in a directory and open the UI
ezvals serve evals/

# Start UI on a custom port
ezvals serve evals/ --port 8080

# Filter what's shown in the UI
ezvals serve evals/ --dataset qa --label production

# Load a previous run to view or continue
ezvals serve .ezvals/sessions/default/sleek-wolf_1705312200.json

Run All Evaluations

ezvals run evals/

Run Specific File

ezvals run evals/customer_service.py

Run Specific Function

ezvals run evals/customer_service.py::test_refund

Run Parametrized Variant

ezvals run evals/math.py::test_addition[2-3-5]

Filter by Dataset and Label

ezvals run evals/ --dataset qa --label production

Run with Concurrency and Timeout

ezvals run evals/ -c 8 --timeout 60.0

Save Results

# Results auto-save to .ezvals/runs/ by default
ezvals run evals/

# Override output path
ezvals run evals/ -o results.json

Verbose Debug Run

# Show eval stdout and rich output
ezvals run evals/ -v --visual --limit 5

Production CI Pipeline

# Minimal output for LLM agents/CI
ezvals run evals/ -c 16 --timeout 120

Session Tracking

# Group runs under a session
ezvals run evals/ --session model-comparison --run-name baseline

# Continue the session with another run
ezvals run evals/ --session model-comparison --run-name improved

Configuration File

EZVals supports an ezvals.json config file for persisting default CLI options. The file is auto-generated in your project root on first run.

Default Config

{
  "concurrency": 1,
  "results_dir": ".ezvals/runs"
}

Supported Options

Option        Type      Description                        Used by
concurrency   integer   Number of concurrent evaluations   run
timeout       float     Global timeout in seconds          run
verbose       boolean   Show stdout from eval functions    run
results_dir   string    Directory for results storage      serve
port          integer   Web UI server port                 serve
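
As an illustration, an ezvals.json that sets every supported option might look like the sketch below; the values shown are examples, not recommended defaults:
{
  "concurrency": 4,
  "timeout": 60.0,
  "verbose": true,
  "results_dir": ".ezvals/runs",
  "port": 8000
}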

Precedence

CLI flags always override config values:
# Config has concurrency: 1, but this uses 4
ezvals run evals/ -c 4

Editing via UI

Click the settings icon in the web UI header to view and edit config values. Changes are saved to ezvals.json.

Exit Codes

Code       Meaning
0          Evaluations completed (regardless of pass/fail)
Non-zero   Error during execution (bad path, exceptions, etc.)
The CLI does not currently set non-zero exit codes for failed evaluations—only for execution errors. Check the JSON output or summary for pass/fail status.
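
Because failed evaluations do not change the exit code, a CI job that should fail on eval failures has to inspect the results itself. A minimal sketch using --no-save and jq, assuming the top-level failed count shown under JSON File Output below:
# Fail the CI step when any evaluation failed
failed=$(ezvals run evals/ --no-save | jq '.failed')
if [ "$failed" -gt 0 ]; then
  echo "$failed evaluation(s) failed"
  exit 1
fi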

Environment Variables

Variable             Description
EZVALS_CONCURRENCY   Default concurrency level
EZVALS_TIMEOUT       Default timeout in seconds
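
Assuming these behave like ordinary environment variables read at startup, they can supply defaults for a single invocation:
# Set default concurrency and timeout for this run only
EZVALS_CONCURRENCY=8 EZVALS_TIMEOUT=60 ezvals run evals/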

Output Format

Minimal Output (Default)

By default, ezvals run outputs minimal text optimized for LLM agents and CI pipelines:
Running...
Results saved to .ezvals/runs/swift-falcon_2024-01-15T10-30-00Z.json

Visual Output (--visual)

Use --visual for rich progress dots, results table, and summary:
Running...
customer_service.py ..F

┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃                     customer_service                           ┃
┣━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┫
┃ Name                ┃ Status   ┃ Score    ┃ Latency           ┃
┡━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━┩
│ test_refund         │ ✓ passed │ 1.0      │ 0.23s             │
│ test_complaint      │ ✗ failed │ 0.0      │ 0.45s             │
└─────────────────────┴──────────┴──────────┴───────────────────┘

Summary: 1/2 passed (50.0%)

JSON File Output

Unless --no-save is used, results are saved as JSON to .ezvals/runs/ (or a custom path via -o):
{
  "run_id": "2024-01-15T10-30-00Z",
  "total": 2,
  "passed": 1,
  "failed": 1,
  "results": [...]
}
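
Because the summary fields live in the saved file, downstream tooling can read them directly. For example, using jq with the field names from the sample above:
# Compute the pass rate from a saved run file
jq '.passed / .total' .ezvals/runs/swift-falcon_2024-01-15T10-30-00Z.json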