Install the Skill

The fastest way to get started—install the EZVals skill for your coding agent:
npx skills add camronh/evals-skill
This teaches your agent how to write evals, choose grading strategies, and analyze results. See the Agent Skill guide for global install, version management, and other options.
If you already have EZVals installed, you can install the skill directly:
ezvals skills add --claude
This installs the version-matched skill from your local package. At least one target flag is required.

Install the Library

If you want to install EZVals directly (or your agent hasn’t done it yet):
uv add ezvals --dev

How It Works

1. Decorate

Use @eval to define test cases with inputs, references, and datasets.
from ezvals import eval, EvalContext

@eval(input="What is 2+2?", reference="4")
async def test_math(ctx: EvalContext):
    ctx.output = await my_llm(ctx.input)
    assert ctx.output == ctx.reference
2. Execute

Your target function runs and populates an EvalContext with inputs, outputs, and metadata.
3. Score

Assertions and evaluators produce scores. Passing assertions score as pass; failures capture the assertion message.
4. Store

Results save as JSON in your repo under .ezvals/sessions/. Every run is versioned and grouped by session.
5. Analyze

View, compare, annotate, and export from the web dashboard—or let your agent parse the JSON directly.
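
For the agent-parses-the-JSON path, a minimal sketch of a summary pass. The field names below (`scores`, `passed`) are assumptions about the result layout, not a documented schema; inspect a saved file under .ezvals/sessions/ and adjust accordingly:

```python
import json
from pathlib import Path

def summarize(session_dir: str) -> dict:
    """Tally passed/failed scores across every result file in a session.

    Assumes each file holds a list of results, each with a "scores"
    list of {"passed": bool} entries -- a hypothetical schema.
    """
    totals = {"passed": 0, "failed": 0}
    for path in Path(session_dir).glob("*.json"):
        for result in json.loads(path.read_text()):
            for s in result.get("scores", []):
                totals["passed" if s.get("passed") else "failed"] += 1
    return totals
```

Swap the field names for whatever the saved JSON actually contains.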

Run Evals

Web UI

ezvals serve evals.py
Opens a local dashboard at http://127.0.0.1:8000 where you can run, filter, compare, annotate, and export results.

CLI

ezvals run evals.py
Compact output for agents. Results save to .ezvals/ and can be analyzed programmatically. See the CLI Reference for filtering, sessions, concurrency, and other options.

Project Structure

After running evals, your project looks like:
your-project/
├── evals.py                # Your eval functions
├── ezvals.json             # Optional config
└── .ezvals/
    └── sessions/
        └── default/
            └── cool-cloud_a1b2c3d4.json
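
Given that layout, saved runs can be enumerated with nothing but the standard library. A sketch that assumes only the directory structure shown above (one folder per session, one JSON file per run):

```python
from pathlib import Path

def list_runs(root: str = ".ezvals/sessions") -> dict:
    """Map each session name to its saved run files, sorted by name."""
    runs = {}
    for session in sorted(Path(root).iterdir()):
        if session.is_dir():
            runs[session.name] = sorted(p.name for p in session.glob("*.json"))
    return runs
```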

Configuration

Create ezvals.json in your project root to set defaults:
{
  "concurrency": 4,
  "results_dir": ".ezvals/runs"
}
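
A sketch of how a tool might merge such a file over built-in defaults; the `DEFAULTS` values here are assumptions for illustration, not EZVals's documented defaults:

```python
import json
from pathlib import Path

# Hypothetical defaults, overridden by any keys present in ezvals.json.
DEFAULTS = {"concurrency": 1, "results_dir": ".ezvals/sessions"}

def load_config(path: str = "ezvals.json") -> dict:
    """Return defaults merged with the JSON config file, if it exists."""
    cfg = dict(DEFAULTS)
    p = Path(path)
    if p.exists():
        cfg.update(json.loads(p.read_text()))
    return cfg
```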

Next Steps