nemo-evaluator-sdk

Evaluates LLMs across 100+ benchmarks (MMLU, HumanEval, GSM8K, safety, VLM) drawn from 18+ harnesses, with multi-backend execution. Use it when you need scalable evaluation on local Docker, Slurm HPC, or cloud platforms. NVIDIA's enterprise-grade platform with container-first architecture for reproducible b...

Evaluates LLMs using 100+ benchmarks with scalable execution on Docker and Slurm HPC for reproducible results.

nemo-evaluator-sdk (5 files)

Install this skill with one command

/learn @davila7/evaluation-nemo-evaluator

GitHub Stars: 22.3K

Category: development

Updated: March 16, 2026

openclaw api testing ml-ai-engineer data-scientist data-analyst docker development data analytics

davila7/claude-code-templates
