evaluating-code-models
Evaluates code generation models across HumanEval, MBPP, MultiPL-E, and 15+ other benchmarks with pass@k metrics. Use when benchmarking or comparing code models.
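The pass@k metric mentioned above is commonly computed with the unbiased estimator from the HumanEval paper: given n samples per problem of which c pass, pass@k = 1 - C(n-c, k) / C(n, k). A minimal sketch (the function name is illustrative, not part of this skill's API):

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: n samples generated, c of them correct.

    pass@k = 1 - C(n - c, k) / C(n, k), i.e. the probability that a
    random draw of k samples contains at least one correct solution.
    """
    if n - c < k:
        # Fewer than k incorrect samples exist, so any k-subset
        # must contain a correct one.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)
```

For example, with 2 samples of which 1 passes, pass@1 is 0.5; averaging this estimator over all benchmark problems gives the reported score.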
Security score: 93/100
The evaluating-code-models skill was audited on Feb 28, 2026; the audit found 3 security issues across 2 threat categories. Review the findings below before installing.
Security Issues
Medium (SKILL.md, line 230): Template literal with variable interpolation in command context

| Line | Snippet |
| 230 | ```bash |
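The flagged pattern, interpolating a variable into a string that is later executed by a shell, is a classic command-injection risk. The audited snippet itself is not reproduced in this report, so the command and variable names below are illustrative; this Python sketch shows the same class of bug and one mitigation:

```python
import shlex

# Hypothetical attacker-controlled value (illustrative; the real snippet
# lives at SKILL.md line 230 and is not shown here).
model_name = "gpt; rm -rf /"

# Unsafe: interpolating the variable straight into a command string lets
# the shell parse the ";" and run a second command.
unsafe_cmd = f"evaluate --model {model_name}"

# Safer: quote the value with shlex.quote (or better, pass an argv list
# to subprocess.run without shell=True so no shell parsing happens).
safe_cmd = f"evaluate --model {shlex.quote(model_name)}"

print(shlex.split(unsafe_cmd))  # payload splits into extra shell tokens
print(shlex.split(safe_cmd))    # payload stays a single argument
```

Passing an argument vector directly (e.g. `subprocess.run([...])` without `shell=True`) avoids shell parsing entirely and is generally preferable to quoting.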
Low (SKILL.md, line 403): External URL reference

| Line | Snippet |
| 403 | - **BigCode Leaderboard**: https://huggingface.co/spaces/bigcode/bigcode-models-leaderboard |
Low (SKILL.md, line 404): External URL reference

| Line | Snippet |
| 404 | - **HumanEval Dataset**: https://huggingface.co/datasets/openai/openai_humaneval |