Agent Skill
EZVals includes a skill that teaches AI coding agents how to write and analyze evaluations.
What’s a Skill?
Skills are markdown files that provide context and instructions to AI coding agents like Claude Code, Cursor, and others. When you ask your agent to help with evals, it can reference the skill to understand:
- EZVals API and patterns
- Best practices for eval design
- Grading strategies (code vs model vs human)
- Patterns for different agent types
Installation
From Package (Version-Matched)
Install the skill that matches your installed EZVals version:
Global Installation
Install globally for all projects:
Target Flags
ezvals skills add requires at least one explicit target flag.
- --agents installs the canonical source to .agents/skills/evals/
- --claude, --codex, --cursor, --windsurf, --kiro, --roo install/link those agent directories
- You can combine flags, like ezvals skills add --claude --codex
- If --agents is included with other targets, .agents is canonical and the selected agent targets link to it
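The flag rules above can be modeled as a small planning function. This is only an illustrative sketch, not EZVals' implementation: plan_install and agent_dir are hypothetical names, and the per-agent directory layout is an assumption that mirrors the documented .agents/skills/evals/ path.

```python
from pathlib import PurePosixPath

# Agent flags named in the text above (without the leading "--").
AGENT_FLAGS = ("claude", "codex", "cursor", "windsurf", "kiro", "roo")
CANONICAL = PurePosixPath(".agents/skills/evals")


def agent_dir(name: str) -> PurePosixPath:
    # ASSUMPTION: per-agent paths are not given in the text; this layout
    # simply mirrors the documented .agents/skills/evals/ path.
    return PurePosixPath(f".{name}/skills/evals")


def plan_install(flags: set[str]) -> dict[str, list[str]]:
    """Return which directories receive the skill and which link to it."""
    if not flags:
        raise ValueError("at least one explicit target flag is required")
    unknown = flags - set(AGENT_FLAGS) - {"agents"}
    if unknown:
        raise ValueError(f"unknown target flags: {sorted(unknown)}")

    agents = [str(agent_dir(n)) for n in AGENT_FLAGS if n in flags]
    if "agents" in flags:
        # .agents is canonical; the selected agent targets link to it.
        return {"copy": [str(CANONICAL)], "link": agents}
    # ASSUMPTION: without --agents, each selected agent directory
    # receives the skill directly and nothing is linked.
    return {"copy": agents, "link": []}
```

Under these assumptions, plan_install({"agents", "claude"}) yields one canonical copy with the Claude directory linked to it, while plan_install({"claude", "codex"}) yields two direct installs and no links.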
From Marketplace (Latest)
Install the latest version from the skills marketplace:
Usage
After installation, invoke the skill in your AI coding agent:
What’s Included
The skill provides guidance across multiple reference files:
| File | Purpose |
|---|---|
| SKILL.md | Overview and navigation hub |
| EZVALS_REFERENCE.md | Complete API reference for @eval, EvalContext, @parametrize |
| BEST_PRACTICES.md | Eval design principles from Anthropic’s research |
| GRADERS.md | Choosing between code, model, and human graders |
| AGENT_EVALS.md | Patterns for coding, conversational, research agents |
| ROADMAP.md | Zero-to-one guide for building evals |
Example Prompts
Try these prompts with your AI coding agent:
Getting Started
- “Help me write my first eval for my customer support agent”
- “What’s the best way to evaluate my RAG pipeline?”
- “Set up an eval suite for my coding assistant”
Writing Evals
- “Create an eval that tests my agent’s ability to handle refund requests”
- “Write a parametrized eval for testing sentiment analysis across edge cases”
- “How do I test multi-turn conversations?”
Improving Evals
- “My eval is flaky - help me make it more deterministic”
- “Should I use code-based or model-based grading for this eval?”
- “Review my evals and suggest improvements”
Analysis
- “Analyze my eval results and suggest improvements”
- “Help me understand why my agent is failing this eval”
- “What patterns do you see in these failures?”
Managing the Skill
Check Installation
Remove Skill
Reinstall
How It Works
The skill is installed only to the selected target directories. If --agents is selected, .agents/skills/evals/ is the canonical source and the other selected targets symlink to it.
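The canonical-plus-symlink arrangement can be tried out in a throwaway directory. The sketch below is not how EZVals performs the install. It only shows why a linked target always reflects the canonical content. The .claude/skills/evals path and the link_to_canonical helper are hypothetical, and creating symlinks may need extra privileges on Windows.

```python
import tempfile
from pathlib import Path


def link_to_canonical(root: Path, target: Path) -> None:
    """Make root/target a symlink to the canonical skill directory."""
    canonical = root / ".agents" / "skills" / "evals"
    link = root / target
    link.parent.mkdir(parents=True, exist_ok=True)
    link.symlink_to(canonical, target_is_directory=True)


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)

    # Canonical source, as described in the text.
    canonical = root / ".agents" / "skills" / "evals"
    canonical.mkdir(parents=True)
    (canonical / "SKILL.md").write_text("Overview and navigation hub\n")

    # HYPOTHETICAL agent directory, used only for illustration.
    link_to_canonical(root, Path(".claude/skills/evals"))

    # Reading through the link returns the canonical file's content.
    linked_dir = root / ".claude" / "skills" / "evals"
    is_link = linked_dir.is_symlink()
    shared = (linked_dir / "SKILL.md").read_text()
```

Because the agent directory is only a link, updating the canonical directory once updates what every linked agent sees.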
Keeping Updated
When you upgrade EZVals, run ezvals skills add --claude again (or use your chosen target flags) to get the latest skill content.

