evaluating-code-models

Evaluates code generation models across HumanEval, MBPP, MultiPL-E, and 15+ benchmarks with pass@k metrics. Use when benchmarking code models, comparing coding
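The pass@k metric this skill reports is conventionally computed with the unbiased estimator from the HumanEval paper: given n generated samples per problem, of which c pass the tests, pass@k = 1 - C(n-c, k) / C(n, k). A minimal sketch (the function name is illustrative, not part of the skill's API):

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k: probability that at least one of k samples
    drawn from n generations (c of which passed) is correct."""
    if n - c < k:
        # Fewer failures than k draws: some draw must be correct.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# Example: 200 samples per problem, 50 passing -> pass@1 = 0.25
print(pass_at_k(200, 50, 1))
```

The guard clause matters: when every k-subset necessarily contains a passing sample, the combinatorial term is undefined or zero and the answer is exactly 1.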

Security score: 93/100

The evaluating-code-models skill was audited on Feb 28, 2026 and we found 3 security issues across 2 threat categories. Review the findings below before installing.

Security Issues

Medium severity (SKILL.md, line 230): Template literal with variable interpolation in command context.
Source line 230: ```bash
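The flagged pattern is worth illustrating: when a variable is interpolated directly into a string that is executed as a shell command, crafted input can inject additional commands. A hypothetical sketch of the risk and a quoting mitigation (illustrated here in Python for concreteness; the actual SKILL.md code at line 230 is not reproduced):

```python
import shlex
import subprocess

# An attacker-controlled value containing shell metacharacters.
payload = "humaneval; echo INJECTED"

# Risky: interpolating the value into a shell command string causes
# the "; echo INJECTED" suffix to run as a second command.
out = subprocess.run(f"echo running {payload}", shell=True,
                     capture_output=True, text=True).stdout

# Safer: quote the value so it stays a single argument
# (or avoid the shell entirely with an argument list).
safe = subprocess.run(f"echo running {shlex.quote(payload)}", shell=True,
                      capture_output=True, text=True).stdout
```

Here `out` contains a second line, `INJECTED`, while `safe` echoes the payload verbatim as one argument, which is why auditors flag unquoted interpolation in command contexts.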
Low severity (SKILL.md, line 403): External URL reference.
Source line 403: - **BigCode Leaderboard**: https://huggingface.co/spaces/bigcode/bigcode-models-leaderboard
Low severity (SKILL.md, line 404): External URL reference.
Source line 404: - **HumanEval Dataset**: https://huggingface.co/datasets/openai/openai_humaneval