evaluating-code-models
Evaluates code generation models across HumanEval, MBPP, MultiPL-E, and 15+ other benchmarks with pass@k metrics. Use it when benchmarking code models, comparing coding abilities, testing multi-language support, or measuring code generation quality. The underlying harness is the industry-standard BigCode Project evaluation suite used by HuggingFace.
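The pass@k scores this kind of harness reports are typically computed with the unbiased estimator from the HumanEval paper (Chen et al., 2021): generate n samples per problem, count the c that pass the unit tests, and estimate the probability that at least one of k randomly drawn samples passes. A minimal Python sketch of that estimator (the function name pass_at_k is illustrative, not this harness's API):

```python
import numpy as np

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator (Chen et al., 2021).

    n: total samples generated for a problem
    c: samples that passed the unit tests
    k: the k in pass@k
    """
    if n - c < k:
        # Fewer than k failing samples: every size-k draw contains a pass.
        return 1.0
    # pass@k = 1 - C(n-c, k) / C(n, k), expanded into a numerically
    # stable running product over i = n-c+1 .. n.
    return 1.0 - np.prod(1.0 - k / np.arange(n - c + 1, n + 1))

# Example: 200 samples per problem, 37 passing -> estimated pass@10
print(pass_at_k(n=200, c=37, k=10))
```

Averaging this value over all problems in a benchmark gives the reported pass@k score.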
Install this skill with one command
/learn @davila7/evaluation-bigcode-evaluation-harness

GitHub Stars: 22.3K
Category: data analytics
Updated: March 16, 2026
Tags: openclaw, api, testing, data-scientist, ml-ai-engineer, data-analyst, backend-developer, qa-engineer, data analytics, development
Source: davila7/claude-code-templates