ai-model-benchmarking

Enables rigorous benchmarking of AI models using over 60 academic evaluation suites and metrics for reliable performance assessment.

Security score: 96/100

The ai-model-benchmarking skill was audited on May 18, 2026 and we found 4 security issues across 1 threat category. Review the findings below before installing.

Security Issues

Low, line 206: External URL reference
Source: SKILL.md
- [Open LLM Leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard) -- Hugging Face community leaderboard

Low, line 207: External URL reference
Source: SKILL.md
- [MMLU paper](https://arxiv.org/abs/2009.03300) -- Hendrycks et al., 2021

Low, line 208: External URL reference
Source: SKILL.md
- [Holistic Evaluation of Language Models (HELM)](https://crfm.stanford.edu/helm/) -- Stanford CRFM

Low, line 209: External URL reference
Source: SKILL.md
- [Chatbot Arena](https://chat.lmsys.org/) -- Human preference-based evaluation (LMSYS)

Scanned on May 18, 2026