evaluation
This skill should be used when building agent evaluation systems: deterministic checks, regression suites, multi-dimensional rubrics, quality gates, production monitoring, baseline comparison, and outcome measurement for agent pipelines.
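As a rough illustration of two of the items named above (deterministic checks and a quality gate against a baseline), here is a minimal sketch. The names `Check`, `run_checks`, and `gate`, the three example checks, and the `[source]` marker are all hypothetical, chosen for illustration; they are not taken from the skill's files.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Check:
    """A deterministic check: a name plus a pure predicate over agent output."""
    name: str
    passed: Callable[[str], bool]


def run_checks(output: str, checks: list[Check]) -> dict[str, bool]:
    """Run every check against one agent output and record pass/fail by name."""
    return {c.name: c.passed(output) for c in checks}


def gate(results: dict[str, bool], baseline_pass_rate: float) -> bool:
    """Quality gate: pass only if the pass rate meets or exceeds the baseline."""
    if not results:
        return False  # no evidence means the gate stays closed
    rate = sum(results.values()) / len(results)
    return rate >= baseline_pass_rate


# Hypothetical checks, for illustration only.
checks = [
    Check("non_empty", lambda out: bool(out.strip())),
    Check("has_citation", lambda out: "[source]" in out),
    Check("under_limit", lambda out: len(out) <= 200),
]

results = run_checks("Paris is the capital of France. [source]", checks)
passed_gate = gate(results, baseline_pass_rate=1.0)
```

Because each check is a pure function of the output, the same list could plausibly double as a regression suite: rerun it on stored outputs and compare the pass rate with the previous baseline.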