nemo-evaluator-sdk

Evaluates LLMs across 100+ benchmarks from 18+ harnesses (MMLU, HumanEval, GSM8K, safety, VLM) with multi-backend execution. Use it when you need scalable evaluation on local Docker, Slurm HPC, or cloud platforms. NVIDIA's enterprise-grade platform with container-first architecture for reproducible b...

Evaluates LLMs using 100+ benchmarks with scalable execution on Docker and Slurm HPC for reproducible results.
