Install the Skill

The fastest way to get started—install the EZVals skill for your coding agent:
npx skills add camronh/evals-skill
This teaches your agent how to write evals, choose grading strategies, and analyze results. See the Agent Skill guide for global install, version management, and other options.
If you already have EZVals installed, you can install the skill directly:
ezvals skills add --claude
This installs the version-matched skill from your local package. At least one target flag is required.

Install the Library

If you want to install EZVals directly (or your agent hasn’t done it yet):
uv add ezvals --dev

How It Works

1. Decorate

Use @eval to define test cases with inputs, references, and datasets.
from ezvals import eval, EvalContext

@eval(input="What is 2+2?", reference="4")
async def test_math(ctx: EvalContext):
    ctx.output = await my_llm(ctx.input)
    assert ctx.output == ctx.reference
2. Execute

Your target function runs and populates an EvalContext with inputs, outputs, and metadata.
3. Score

Assertions and evaluators produce scores. Passing assertions score as pass; failures capture the assertion message.
4. Store

Results save as JSON in your repo under .ezvals/sessions/. Every run is versioned and grouped by session.
5. Analyze

View, compare, annotate, and export from the web dashboard—or let your agent parse the JSON directly.
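
For the agent-parses-the-JSON path, a minimal sketch of a summary pass. The field names below (`scores`, `passed`) are assumptions about the result layout, not a documented schema; inspect a saved file under .ezvals/sessions/ and adjust accordingly:

```python
import json
from pathlib import Path

def summarize(session_dir: str) -> dict:
    """Tally passed/failed scores across every result file in a session.

    Assumes each file holds a list of results, each with a "scores"
    list of {"passed": bool} entries -- a hypothetical schema.
    """
    totals = {"passed": 0, "failed": 0}
    for path in Path(session_dir).glob("*.json"):
        for result in json.loads(path.read_text()):
            for s in result.get("scores", []):
                totals["passed" if s.get("passed") else "failed"] += 1
    return totals
```

Swap the field names for whatever the saved JSON actually contains.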

Run Evals

Web UI

ezvals serve evals.py
Opens a local dashboard at http://127.0.0.1:8000 where you can run, filter, compare, annotate, and export results.

CLI

ezvals run evals.py
Compact output for agents. Results save to .ezvals/ and can be analyzed programmatically. See the CLI Reference for filtering, sessions, concurrency, and other options.

Project Structure

After running evals, your project looks like:
your-project/
├── evals.py                # Your eval functions
├── ezvals.json             # Optional config
└── .ezvals/
    └── sessions/
        └── default/
            └── cool-cloud_a1b2c3d4.json
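
Given that layout, saved runs can be enumerated with nothing but the standard library. A sketch that assumes only the directory structure shown above (one folder per session, one JSON file per run):

```python
from pathlib import Path

def list_runs(root: str = ".ezvals/sessions") -> dict:
    """Map each session name to its saved run files, sorted by name."""
    runs = {}
    for session in sorted(Path(root).iterdir()):
        if session.is_dir():
            runs[session.name] = sorted(p.name for p in session.glob("*.json"))
    return runs
```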

Configuration

Create ezvals.json in your project root to set defaults:
{
  "concurrency": 4,
  "results_dir": ".ezvals/runs"
}
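
A sketch of how a tool might merge such a file over built-in defaults; the `DEFAULTS` values here are assumptions for illustration, not EZVals's documented defaults:

```python
import json
from pathlib import Path

# Hypothetical defaults, overridden by any keys present in ezvals.json.
DEFAULTS = {"concurrency": 1, "results_dir": ".ezvals/sessions"}

def load_config(path: str = "ezvals.json") -> dict:
    """Return defaults merged with the JSON config file, if it exists."""
    cfg = dict(DEFAULTS)
    p = Path(path)
    if p.exists():
        cfg.update(json.loads(p.read_text()))
    return cfg
```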

Next Steps