agent-evaluation
Enables the design and implementation of evaluation systems for AI agents, enhancing their performance through structured benchmarks and grading.
Security score: 47/100
The agent-evaluation skill was audited on Mar 7, 2026 and we found 7 security issues across 2 threat categories, including 3 high-severity. Review the findings below before installing.
Security Issues

| Severity | Line | Finding | Source | Snippet |
|---|---|---|---|---|
| high | 454 | Direct command execution function call | SKILL.md | `exec(code) # In sandbox` |
| high | 400 | Eval function call - arbitrary code execution | SKILL.md | `2. Run eval (expect failure)` |
| high | 402 | Eval function call - arbitrary code execution | SKILL.md | `4. Run eval (expect pass)` |
| medium | 67 | Python subprocess execution | SKILL.md | `result = subprocess.run(` |
| low | 429 | External URL reference | SKILL.md | `[Anthropic: Demystifying evals for AI agents](https://www.anthropic.com/engineering/demystifying-evals-for-ai-agents)` |
| low | 430 | External URL reference | SKILL.md | `[SWE-bench](https://www.swebench.com/)` |
| low | 431 | External URL reference | SKILL.md | `[WebArena](https://webarena.dev/)` |
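The high-severity findings center on dynamic code execution via `exec`. A minimal, hypothetical illustration (not taken from SKILL.md) of why auditors flag this pattern even when a comment claims it runs "in sandbox":

```python
# Hypothetical snippet showing the risk behind an exec(code) finding:
# the string passed to exec() runs with full access to this process.
untrusted = "result = 2 + 2"

scope = {}
exec(untrusted, scope)   # executes arbitrary Python in-process
print(scope["result"])   # the string could just as easily touch files or the network

# Stripping builtins is sometimes presented as a "sandbox", but it only
# blocks direct name lookups; it is not a real security boundary.
try:
    exec("__import__('os').system('echo pwned')", {"__builtins__": {}})
except NameError as err:
    print("blocked:", err)
```

This is why such calls are scored high-severity regardless of surrounding comments: the safety of `exec(code)` depends entirely on where `code` comes from, which a static scan cannot verify.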
Scanned on Mar 7, 2026
Install this skill with one command:

/learn @supercent-io/agent-evaluation

GitHub Stars: 28
Category: development
Updated: March 29, 2026
Tags: claude, chatgpt, gemini-cli, testing, api, ml-ai-engineer, qa-engineer, data-scientist, product-manager, technical-pm, development, data analytics, product
supercent-io/skills-template