evaluating-code-models

Evaluates code generation models across HumanEval, MBPP, MultiPL-E, and 15+ other benchmarks using pass@k metrics. Use it when benchmarking code models, comparing coding ability, testing multi-language support, or measuring code generation quality. It is the industry-standard harness from the BigCode Project, used by Hugging Face.
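
The pass@k metric estimates the probability that at least one of k sampled completions for a problem passes its unit tests. Below is a minimal sketch of the standard unbiased estimator from Chen et al. (2021), which these benchmarks conventionally use; the function name and signature are illustrative, not this skill's actual API:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator (Chen et al., 2021).

    n: total completions sampled for a problem
    c: completions that passed all unit tests
    k: evaluation budget
    """
    # If fewer than k samples failed, every size-k subset
    # must contain at least one passing sample.
    if n - c < k:
        return 1.0
    # 1 - P(all k drawn samples fail) = 1 - C(n-c, k) / C(n, k)
    return 1.0 - comb(n - c, k) / comb(n, k)

# Example: 200 samples per problem, 20 passing -> pass@1 estimate of 0.10
print(pass_at_k(200, 20, 1))
```

Sampling n > k completions and applying this estimator gives a lower-variance score than drawing exactly k samples per problem, which is why harnesses like this one typically generate many samples per task.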
