An `EvalContext` accumulates evaluation data as your eval function runs and builds into an `EvalResult` when it completes.
## Pre-populated Fields

Set context fields directly in the `@eval` decorator:
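As an illustration, here is a minimal stand-in for that behavior. The `eval_` decorator below (named with an underscore to avoid shadowing Python's builtin) is a sketch, not EZVals internals; the keyword names mirror the context fields described in this document:

```python
from types import SimpleNamespace

# Stand-in sketch (not EZVals internals): a decorator that pre-fills
# context fields from its keyword arguments before the eval body runs.
def eval_(**fields):
    def wrap(fn):
        def run():
            ctx = SimpleNamespace(input=None, reference=None, metadata={})
            for name, value in fields.items():
                setattr(ctx, name, value)  # fields arrive pre-populated
            fn(ctx)
            return ctx
        return run
    return wrap

@eval_(input="What is 2+2?", reference="4", metadata={"topic": "math"})
def check_math(ctx):
    # ctx.input and ctx.reference are already set by the decorator
    assert ctx.input == "What is 2+2?"
    assert ctx.reference == "4"

result = check_math()
```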
With `@parametrize`, the special parameter names (`input`, `reference`, `metadata`, `trace_data`, `latency`) auto-populate the matching context fields:
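The routing can be sketched as follows. This is a stand-in, assuming parameter names outside the special set are forwarded to the function as plain arguments:

```python
# Names in this special set land on the matching context field
# (set taken from the docs above); anything else is forwarded
# to the eval function as a regular argument.
SPECIAL_PARAMS = {"input", "reference", "metadata", "trace_data", "latency"}

def bind_case(ctx_fields: dict, case: dict) -> dict:
    extra = {}
    for name, value in case.items():
        if name in SPECIAL_PARAMS:
            ctx_fields[name] = value   # auto-populates the context
        else:
            extra[name] = value        # forwarded as a plain argument
    return extra

ctx_fields = {}
extra = bind_case(ctx_fields, {"input": "2+2", "reference": "4", "model": "gpt-4o"})
```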
## Setting Fields
### Input, Output, and Reference
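A minimal sketch of setting these fields inside an eval body, with `SimpleNamespace` standing in for the real context and `summarize` as a hypothetical system under test:

```python
from types import SimpleNamespace

# Hypothetical system under test (illustrative only).
def summarize(text: str) -> str:
    return text.split(".")[0] + "."

ctx = SimpleNamespace()  # stands in for the EvalContext your function receives
ctx.input = "EZVals collects results. It builds an EvalResult."
ctx.output = summarize(ctx.input)           # what your system produced
ctx.reference = "EZVals collects results."  # what you expected
```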
### Metadata
Store additional context for debugging and analysis:
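For example (stand-in context; the metadata keys are illustrative, not required by EZVals):

```python
from types import SimpleNamespace

# Stand-in ctx; in EZVals this dict travels with the result.
ctx = SimpleNamespace(metadata={})
ctx.metadata["model"] = "gpt-4o"     # illustrative keys
ctx.metadata["temperature"] = 0.2
ctx.metadata["retries"] = 1
```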
## Scoring with Assertions

Use assertions to score:
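A sketch of the semantics, assuming a failing `assert` is recorded as a failing score; the harness below is illustrative, not EZVals code:

```python
# Stand-in harness: a failing assert inside the eval body
# becomes a failing score with the assertion message as the reason.
def run_with_assertions(fn):
    try:
        fn()
        return {"passed": True, "reason": None}
    except AssertionError as exc:
        return {"passed": False, "reason": str(exc)}

def eval_sum():
    output = 2 + 2
    assert output == 4, f"expected 4, got {output}"

def eval_bad_sum():
    output = 2 + 3
    assert output == 4, f"expected 4, got {output}"

ok = run_with_assertions(eval_sum)
bad = run_with_assertions(eval_bad_sum)
```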
## Using store() for Explicit Scoring

For numeric scores or multiple named metrics, use `store()`:
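A stand-in sketch of the score shapes, assuming a plain number adds a single score and a dict adds one named metric per key (the default score name `"score"` below is an assumption, not EZVals behavior):

```python
# Minimal stand-in for store(scores=...), not EZVals internals.
class EvalContext:
    def __init__(self):
        self.scores = []

    def store(self, scores=None):
        if isinstance(scores, dict):
            for name, value in scores.items():
                self.scores.append({"name": name, "value": value})
        elif scores is not None:
            self.scores.append({"name": "score", "value": scores})

ctx = EvalContext()
ctx.store(scores=0.87)                               # single numeric score
ctx.store(scores={"fluency": 0.9, "accuracy": 1.0})  # named metrics
```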
## Setting Multiple Fields at Once
`store()` lets you set multiple context fields in one call:
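A stand-in sketch, assuming `metadata` merges into the existing dict rather than replacing it (per the parameter table later in this document):

```python
# Stand-in for a single store() call setting several fields at once.
class EvalContext:
    def __init__(self):
        self.input = None
        self.output = None
        self.latency = None
        self.metadata = {}

    def store(self, input=None, output=None, latency=None, metadata=None):
        if input is not None:
            self.input = input
        if output is not None:
            self.output = output
        if latency is not None:
            self.latency = latency
        if metadata:
            self.metadata.update(metadata)  # merges, not replaces

ctx = EvalContext()
ctx.metadata["model"] = "gpt-4o"
ctx.store(input="2+2", output="4", latency=0.12, metadata={"attempt": 1})
```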
## Auto-Return Behavior
You don’t need to explicitly return anything; the context automatically builds into an `EvalResult` when the function completes:
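The behavior can be sketched with a stand-in runner. This is illustrative; `build()` here returns a plain dict rather than a real `EvalResult`:

```python
# Stand-in: the framework's runner, not your function, calls build().
class EvalContext:
    def __init__(self):
        self.output = None
        self.scores = []

    def build(self):
        # frozen snapshot standing in for an immutable EvalResult
        return {"output": self.output, "scores": list(self.scores)}

def run(eval_fn):
    ctx = EvalContext()
    eval_fn(ctx)        # eval body returns nothing...
    return ctx.build()  # ...the runner builds the result anyway

def my_eval(ctx):
    ctx.output = "4"    # note: no return statement

result = run(my_eval)
```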
## Exception Safety
If your evaluation throws an exception, partial data is preserved. The resulting `EvalResult` will have:
- `input` and `output` preserved
- An `error` field with the exception message
- A failing score automatically added
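A stand-in runner illustrating these guarantees (not EZVals internals; the dict stands in for the context):

```python
# On an exception, fields set so far survive, the error message is
# recorded, and a failing score is appended automatically.
def run_safely(eval_fn):
    ctx = {"input": None, "output": None, "error": None, "scores": []}
    try:
        eval_fn(ctx)
    except Exception as exc:
        ctx["error"] = str(exc)
        ctx["scores"].append({"passed": False})  # automatic failing score
    return ctx

def flaky_eval(ctx):
    ctx["input"] = "2+2"
    ctx["output"] = "5"
    raise ValueError("model returned the wrong answer")

result = run_safely(flaky_eval)
```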
## Default Scoring
If no score is added and no assertions fail, EZVals auto-adds a passing score:
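The fallback can be sketched as stand-in logic over the context's score list (score shapes here are assumptions for illustration):

```python
# If the eval body finishes with no scores recorded,
# a passing score is appended automatically.
def finalize_scores(scores: list) -> list:
    if not scores:
        return [{"passed": True}]  # auto-added passing score
    return scores

explicit = finalize_scores([{"name": "accuracy", "value": 0.9}])
defaulted = finalize_scores([])
```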
## Custom Parameters

For parametrized tests with custom parameter names, include them in your function signature:
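A sketch of the mechanism; the `parametrize` call shape and the bare-dict `ctx` below are assumptions for illustration, not the library's exact API:

```python
# Stand-in: a parametrize-style decorator that forwards custom
# parameter names (here `city` and `country`) into the function
# signature, running the eval once per case.
def parametrize(**param_lists):
    def wrap(fn):
        def run_all():
            results = []
            keys = list(param_lists)
            for values in zip(*param_lists.values()):
                kwargs = dict(zip(keys, values))
                results.append(fn(ctx={}, **kwargs))
            return results
        return run_all
    return wrap

@parametrize(city=["Paris", "Rome"], country=["France", "Italy"])
def eval_capital(ctx, city, country):
    return f"{city} is the capital of {country}"

runs = eval_capital()
```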
## Run Metadata (for Observability)

When your eval function runs, the context includes metadata about the current run and eval. This is useful for tagging traces in LangSmith or other observability tools.

### Run-Level Metadata
| Property | Type | Description |
|---|---|---|
| `run_id` | `str \| None` | Unique run identifier (timestamp) |
| `session_name` | `str \| None` | Session name for the run |
| `run_name` | `str \| None` | Human-readable run name |
| `eval_path` | `str \| None` | Path to eval file(s) being run |
### Per-Eval Metadata
| Property | Type | Description |
|---|---|---|
| `function_name` | `str \| None` | Name of the eval function |
| `dataset` | `str \| None` | Dataset from the `@eval` decorator |
| `labels` | `list[str] \| None` | Labels from the `@eval` decorator |
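For example, building trace tags from these properties (stand-in context with fabricated values; the tag format is an illustration, not a LangSmith requirement):

```python
from types import SimpleNamespace

# Stand-in ctx carrying properties from the two tables above.
ctx = SimpleNamespace(
    run_id="20240501-120000",        # fabricated example values
    run_name="nightly-regression",
    function_name="eval_capital",
    labels=["geography", "smoke"],
)

# Tags you might attach to a LangSmith (or similar) trace:
tags = [f"run:{ctx.run_id}", f"eval:{ctx.function_name}", *ctx.labels]
```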
## API Reference
| Method | Description |
|---|---|
| `store(input, output, reference, latency, scores, messages, trace_url, metadata, trace_data)` | Set any context fields at once |
| `build()` | Convert to an immutable `EvalResult` |
### `store()` Parameters
| Parameter | Type | Description |
|---|---|---|
| `input` | `Any` | The test input |
| `output` | `Any` | The system output |
| `reference` | `Any` | Expected output |
| `latency` | `float` | Execution time in seconds |
| `scores` | `bool \| float \| dict \| list` | Score(s) to add |
| `messages` | `list` | Conversation messages (sets `trace_data.messages`) |
| `trace_url` | `str` | External trace link (sets `trace_data.trace_url`) |
| `metadata` | `dict` | Merges into `ctx.metadata` |
| `trace_data` | `dict` | Merges into `ctx.trace_data` |
### Context Properties

| Property | Type | Description |
|---|---|---|
| `input` | `Any` | The test input |
| `output` | `Any` | The system output |
| `reference` | `Any` | Expected output (optional) |
| `metadata` | `dict` | Custom metadata |
| `trace_data` | `TraceData` | Debug/trace data |
| `latency` | `float` | Execution time in seconds |
| `scores` | `list` | List of `Score` objects |

