The cases= argument on @eval lets you generate multiple evaluations from one function.
Basic Usage
from ezvals import eval, EvalContext
@eval(
    dataset="math",
    cases=[
        {"input": {"a": 2, "b": 3}, "reference": 5},
        {"input": {"a": 10, "b": 20}, "reference": 30},
        {"input": {"a": 0, "b": 0}, "reference": 0},
    ],
)
def test_addition(ctx: EvalContext):
    result = ctx.input["a"] + ctx.input["b"]
    ctx.output = result
    assert result == ctx.reference, f"Expected {ctx.reference}, got {result}"
This generates three evaluations with numeric IDs:
test_addition[0]
test_addition[1]
test_addition[2]
Without custom IDs, test variants are numbered sequentially. Provide IDs for readable names.
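The naming rule above can be sketched in plain Python. This is an illustration only, not ezvals's actual implementation: `variant_names` is a hypothetical helper, and falling back to the index for a case that lacks an `id` when other cases have one is an assumption.

```python
def variant_names(func_name, cases):
    """Build display names such as 'test_addition[0]' or 'test_thresholds[low]'.

    Hypothetical helper for illustration; it does not call ezvals.
    """
    names = []
    for index, case in enumerate(cases):
        # Use the case's own "id" when present, otherwise its position (assumed fallback).
        suffix = case.get("id", index)
        names.append(f"{func_name}[{suffix}]")
    return names


# Cases without IDs are numbered sequentially.
print(variant_names("test_addition", [{"input": 1}, {"input": 2}, {"input": 3}]))

# Cases with IDs get readable names.
print(variant_names("test_thresholds", [{"id": "low", "input": 0.2}, {"id": "high", "input": 0.8}]))
```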
Case Shape
cases must be a list of dicts. Each dict can override any @eval argument plus an id:
@eval(
    dataset="sentiment",
    labels=["prod"],
    cases=[
        {"id": "pos", "input": "I love this!", "reference": "positive"},
        {"id": "neg", "input": "Terrible!", "reference": "negative", "labels": ["edge"]},
    ],
)
def test_sentiment(ctx: EvalContext):
    ctx.output = analyze_sentiment(ctx.input)
    assert ctx.output == ctx.reference
Per-Case Overrides
Case dicts can override:
- input, reference, metadata, dataset, labels, default_score_key
- timeout, target, evaluators
- id (for naming)
Rules:
- If a key is omitted, the decorator default is used.
- If a key is present with None, it clears the default.
- labels merge with the decorator's labels (duplicates removed); labels: None clears them.
- metadata merges with the decorator's metadata; on conflicting keys the case value wins.
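The rules above can be modeled in a few lines of plain Python. This is a minimal sketch of the described behavior, not the library's code: `resolve_case` is a hypothetical function, and representing a cleared default as `None` (rather than removing the key) is an assumption.

```python
def resolve_case(defaults, case):
    """Combine decorator-level defaults with one case dict (illustrative only)."""
    resolved = dict(defaults)  # omitted keys keep the decorator default
    for key, value in case.items():
        if value is None:
            # A key present with None clears the default (stored as None here, by assumption).
            resolved[key] = None
        elif key == "labels":
            # Merge with the default labels, dropping duplicates but keeping order.
            merged = list(defaults.get("labels") or []) + list(value)
            resolved["labels"] = list(dict.fromkeys(merged))
        elif key == "metadata":
            # Merge the two dicts; the case's values win on conflicting keys.
            resolved["metadata"] = {**(defaults.get("metadata") or {}), **value}
        else:
            # Any other key simply overrides the default.
            resolved[key] = value
    return resolved


defaults = {
    "dataset": "sentiment",
    "labels": ["prod"],
    "metadata": {"owner": "nlp", "tier": 1},
}

# labels merge and de-duplicate; dataset falls back to the default.
print(resolve_case(defaults, {"input": "Terrible!", "labels": ["edge", "prod"]}))

# labels: None clears the default labels; metadata merges with the case winning.
print(resolve_case(defaults, {"input": "ok", "labels": None, "metadata": {"tier": 2}}))
```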
Custom IDs
@eval(cases=[
    {"id": "low", "input": 0.2},
    {"id": "mid", "input": 0.5},
    {"id": "high", "input": 0.8},
])
def test_thresholds(ctx: EvalContext):
    ...
Generates:
test_thresholds[low]
test_thresholds[mid]
test_thresholds[high]
Explicit Grids
You can build explicit grids using list comprehensions:
MODEL_CASES = [
    {"input": {"model": m, "temperature": t}}
    for m in ["gpt-4", "gpt-3.5"]
    for t in [0.0, 0.7, 1.0]
]
@eval(dataset="models", cases=MODEL_CASES)
def test_model_grid(ctx: EvalContext):
    ctx.output = run_model(ctx.input["model"], ctx.input["temperature"])
    assert ctx.output is not None
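As the grid gains dimensions, `itertools.product` may read more cleanly than nested comprehension clauses, and each case can carry an `id` so the variants are not just numbered. The sketch below builds the same six cases in the same order; the ID format is made up for illustration, and whether ezvals accepts characters such as dots in IDs is an assumption to verify.

```python
from itertools import product

MODELS = ["gpt-4", "gpt-3.5"]
TEMPERATURES = [0.0, 0.7, 1.0]

# One case per (model, temperature) pair, in the same order as the nested comprehension.
MODEL_CASES = [
    {
        # Hypothetical ID scheme, e.g. "gpt-4-t0.7"
        "id": f"{model}-t{temperature}",
        "input": {"model": model, "temperature": temperature},
    }
    for model, temperature in product(MODELS, TEMPERATURES)
]

for case in MODEL_CASES:
    print(case["id"], case["input"])
```

The resulting list can be passed as cases=MODEL_CASES exactly as in the example above.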
Running Specific Variants
# Run all variants
ezvals run evals.py::test_thresholds
# Run a specific variant
ezvals run evals.py::test_thresholds[low]