Commands

EZVals has three main commands:
  • ezvals serve - Start the web UI to browse and run evaluations interactively
  • ezvals run - Run evaluations headlessly (for CI/CD pipelines)
  • ezvals export - Export a run to various formats (JSON, CSV, Markdown)

ezvals serve

Start the web UI to discover and run evaluations interactively.
ezvals serve PATH [OPTIONS]
Where PATH can be:
  • A directory: ezvals serve evals/
  • A file: ezvals serve evals/customer_service.py
  • A specific function: ezvals serve evals.py::test_refund
  • A run JSON file: ezvals serve .ezvals/sessions/default/run_123.json
The UI opens automatically in your browser. Evaluations are discovered and displayed but not run until you click the Run button.

Loading Previous Runs

You can load a previous run by passing the JSON file path directly:
ezvals serve .ezvals/sessions/default/sleek-wolf_1705312200.json
This opens the UI with that run’s results loaded. If the original eval source file still exists, you can rerun evaluations. If the source file was moved or deleted, the UI works in view-only mode.

Options

-d, --dataset
string
default:"all"
Filter evaluations by dataset.
ezvals serve evals/ --dataset customer_service
-l, --label
string
default:"all"
Filter evaluations by label. Can be specified multiple times.
ezvals serve evals/ --label production -l critical
--results-dir
string
default:".ezvals/runs"
Directory for JSON results storage.
ezvals serve evals/ --results-dir ./my-results
--port
integer
default:"8000"
Port for web UI server.
ezvals serve evals/ --port 3000
--session
string
Name for this evaluation session. Groups related runs together.
ezvals serve evals/ --session model-comparison
--run
flag
Automatically run all evaluations on startup. Same as clicking the Run button immediately.
ezvals serve evals/ --run

ezvals run

Run evaluations headlessly. Outputs minimal text by default (optimized for LLM agents). Use --visual for rich table output.
ezvals run PATH [OPTIONS]
Where PATH can be:
  • A directory: ezvals run evals/
  • A file: ezvals run evals/customer_service.py
  • A specific function: ezvals run evals.py::test_refund
  • A parametrized variant: ezvals run evals.py::test_math[2-3-5]
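Depending on your shell, the square brackets in a parametrized ID may be treated as a glob pattern (zsh in particular errors on an unmatched pattern), so quoting the argument is a safe habit:
# Quote parametrized IDs so the shell doesn't expand the brackets
ezvals run 'evals.py::test_math[2-3-5]'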

Filtering Options

-d, --dataset
string
default:"all"
Filter evaluations by dataset. Can be specified multiple times.
ezvals run evals/ --dataset customer_service
ezvals run evals/ -d customer_service -d technical_support
-l, --label
string
default:"all"
Filter evaluations by label. Can be specified multiple times.
ezvals run evals/ --label production
ezvals run evals/ -l production -l critical
--limit
integer
Limit the number of evaluations to run.
ezvals run evals/ --limit 10

Execution Options

-c, --concurrency
integer
default:"1"
Number of concurrent evaluations. 1 means sequential execution.
# Run 4 evaluations in parallel
ezvals run evals/ --concurrency 4
ezvals run evals/ -c 4
--timeout
float
Global timeout in seconds for all evaluations.
ezvals run evals/ --timeout 30.0

Output Options

-v, --verbose
flag
Show stdout from eval functions (print statements, logs).
ezvals run evals/ --verbose
ezvals run evals/ -v
--visual
flag
Show rich progress dots, results table, and summary. Without this flag, output is minimal.
ezvals run evals/ --visual
-o, --output
string
Override the default results path. When specified, results are saved only to this path (not to .ezvals/runs/).
ezvals run evals/ --output results.json
ezvals run evals/ -o results.json
--no-save
flag
Skip saving results to file. Outputs JSON to stdout instead.
# Get results as JSON without writing to disk
ezvals run evals/ --no-save | jq '.passed'

Session Options

--session
string
Name for this evaluation session. Groups related runs together.
ezvals run evals/ --session model-comparison
--run-name
string
Name for this specific run. Used as the prefix for the saved results file.
ezvals run evals/ --session model-comparison --run-name gpt4-baseline

ezvals export

Export a run file to various formats. Useful for sharing results, generating reports, or integrating with other tools.
ezvals export RUN_PATH [OPTIONS]
Where RUN_PATH is the path to a run JSON file (e.g., .ezvals/sessions/default/run_123.json).

Options

-f, --format
choice
default:"json"
Export format: json, csv, or md.
ezvals export run.json -f md
ezvals export run.json -f csv
-o, --output
string
Output file path. Defaults to {run_name}.{format}.
ezvals export run.json -f md -o report.md

Export Formats

Format   Description
json     Copy the raw JSON file
csv      Flat CSV with all results
md       Markdown with ASCII bar charts and results table

Examples

# Export to Markdown
ezvals export .ezvals/sessions/default/run_123.json -f md

# Export to CSV
ezvals export run.json -f csv -o results.csv

Examples

Start the Web UI

# Discover evals in a directory and open the UI
ezvals serve evals/

# Start UI on a custom port
ezvals serve evals/ --port 8080

# Filter what's shown in the UI
ezvals serve evals/ --dataset qa --label production

# Load a previous run to view or continue
ezvals serve .ezvals/sessions/default/sleek-wolf_1705312200.json

Run All Evaluations

ezvals run evals/

Run Specific File

ezvals run evals/customer_service.py

Run Specific Function

ezvals run evals/customer_service.py::test_refund

Run Parametrized Variant

ezvals run evals/math.py::test_addition[2-3-5]

Filter by Dataset and Label

ezvals run evals/ --dataset qa --label production

Run with Concurrency and Timeout

ezvals run evals/ -c 8 --timeout 60.0

Save Results

# Results auto-save to .ezvals/runs/ by default
ezvals run evals/

# Override output path
ezvals run evals/ -o results.json

Verbose Debug Run

# Show eval stdout and rich output
ezvals run evals/ -v --visual --limit 5

Production CI Pipeline

# Minimal output for LLM agents/CI
ezvals run evals/ -c 16 --timeout 120

Session Tracking

# Group runs under a session
ezvals run evals/ --session model-comparison --run-name baseline

# Continue the session with another run
ezvals run evals/ --session model-comparison --run-name improved

Configuration File

EZVals supports an ezvals.json config file for persisting default CLI options. The file is auto-generated in your project root on first run.

Default Config

{
  "concurrency": 1,
  "results_dir": ".ezvals/runs"
}

Supported Options

Option        Type      Description                        Used by
concurrency   integer   Number of concurrent evaluations   run
timeout       float     Global timeout in seconds          run
verbose       boolean   Show stdout from eval functions    run
results_dir   string    Directory for results storage      serve
port          integer   Web UI server port                 serve
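
As an illustration, an ezvals.json that sets every supported option might look like the sketch below; the values shown are examples, not recommended defaults:
{
  "concurrency": 4,
  "timeout": 60.0,
  "verbose": true,
  "results_dir": ".ezvals/runs",
  "port": 8000
}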

Precedence

CLI flags always override config values:
# Config has concurrency: 1, but this uses 4
ezvals run evals/ -c 4

Editing via UI

Click the settings icon in the web UI header to view and edit config values. Changes are saved to ezvals.json.

Exit Codes

Code       Meaning
0          Evaluations completed (regardless of pass/fail)
Non-zero   Error during execution (bad path, exceptions, etc.)
The CLI does not currently set non-zero exit codes for failed evaluations—only for execution errors. Check the JSON output or summary for pass/fail status.
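
Because failed evaluations do not change the exit code, a CI job that should fail on eval failures has to inspect the results itself. A minimal sketch using --no-save and jq, assuming the top-level failed count shown under JSON File Output below:
# Fail the CI step when any evaluation failed
failed=$(ezvals run evals/ --no-save | jq '.failed')
if [ "$failed" -gt 0 ]; then
  echo "$failed evaluation(s) failed"
  exit 1
fi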

Environment Variables

Variable             Description
EZVALS_CONCURRENCY   Default concurrency level
EZVALS_TIMEOUT       Default timeout in seconds
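
Assuming these behave like ordinary environment variables read at startup, they can supply defaults for a single invocation:
# Set default concurrency and timeout for this run only
EZVALS_CONCURRENCY=8 EZVALS_TIMEOUT=60 ezvals run evals/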

Output Format

Minimal Output (Default)

By default, ezvals run outputs minimal text optimized for LLM agents and CI pipelines:
Running...
Results saved to .ezvals/runs/swift-falcon_2024-01-15T10-30-00Z.json

Visual Output (--visual)

Use --visual for rich progress dots, results table, and summary:
Running...
customer_service.py ..F

┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃                     customer_service                           ┃
┣━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┫
┃ Name                ┃ Status   ┃ Score    ┃ Latency           ┃
┡━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━┩
│ test_refund         │ ✓ passed │ 1.0      │ 0.23s             │
│ test_complaint      │ ✗ failed │ 0.0      │ 0.45s             │
└─────────────────────┴──────────┴──────────┴───────────────────┘

Summary: 1/2 passed (50.0%)

JSON File Output

Unless --no-save is used, results are saved as JSON to .ezvals/runs/ (or a custom path via -o):
{
  "run_id": "2024-01-15T10-30-00Z",
  "total": 2,
  "passed": 1,
  "failed": 1,
  "results": [...]
}
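
Because the summary fields live in the saved file, downstream tooling can read them directly. For example, using jq with the field names from the sample above:
# Compute the pass rate from a saved run file
jq '.passed / .total' .ezvals/runs/swift-falcon_2024-01-15T10-30-00Z.json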