nemo-evaluator-sdk
Evaluates LLMs across 100+ benchmarks from 18+ harnesses (MMLU, HumanEval, GSM8K, safety, VLM) with multi-backend execution. Use it when you need scalable evaluation on local Docker, Slurm HPC, or cloud platforms. NVIDIA's enterprise-grade platform with container-first architecture for reproducible b...
Evaluates LLMs using 100+ benchmarks with scalable execution on Docker and Slurm HPC for reproducible results.
nemo-evaluator-sdk (5 files)
Install this skill with one command
/learn @davila7/evaluation-nemo-evaluator
GitHub Stars: 22.3K
Category: development
Updated: March 16, 2026
davila7/claude-code-templates