
agent-evaluation

Enables the design and implementation of evaluation systems for AI agents, enhancing their performance through structured benchmarks and grading.

Security score: 47/100

The agent-evaluation skill was audited on Mar 7, 2026. The audit found 7 security issues across 2 threat categories, 3 of them high-severity. Review the findings below before installing.

Security Issues

high, line 454: Direct command execution function call
Source: SKILL.md, line 454: exec(code) # In sandbox

high, line 400: Eval function call (arbitrary code execution)
Source: SKILL.md, line 400: 2. Run eval (expect failure)

high, line 402: Eval function call (arbitrary code execution)
Source: SKILL.md, line 402: 4. Run eval (expect pass)

medium, line 67: Python subprocess execution
Source: SKILL.md, line 67: result = subprocess.run(

low, line 429: External URL reference
Source: SKILL.md, line 429: - [Anthropic: Demystifying evals for AI agents](https://www.anthropic.com/engineering/demystifying-evals-for-ai-agents)

low, line 430: External URL reference
Source: SKILL.md, line 430: - [SWE-bench](https://www.swebench.com/)

low, line 431: External URL reference
Source: SKILL.md, line 431: - [WebArena](https://webarena.dev/)
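To illustrate why the first finding is rated high-severity: a bare exec(code) call runs arbitrary Python with the caller's full privileges, and a comment such as "# In sandbox" provides no actual confinement. The sketch below is hypothetical (the `code` string and `captured` dict are illustrative stand-ins, not taken from SKILL.md) and shows untrusted input escaping the intended evaluation logic.

```python
import os

# Hypothetical demonstration: `code` stands in for untrusted,
# model-generated input that the skill would pass to exec().
captured = {}

# Instead of running the intended evaluation code, the string reaches
# into the host process: here it grabs the working directory, but it
# could just as easily read files or environment variables.
code = "import os; captured['leak'] = os.getcwd()"

exec(code)  # Nothing in exec() restricts what this string can do

# The "sandboxed" code wrote straight into our namespace.
assert captured["leak"] == os.getcwd()
```

Genuine isolation requires an OS-level boundary (a separate process, container, or VM with resource limits), which is why even subprocess.run, as in the medium-severity finding, still warrants review of what arguments it receives.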