ai-model-benchmarking

Enables rigorous benchmarking of AI models using over 60 academic evaluation suites and metrics for reliable performance assessment.

Security score: 96/100

The ai-model-benchmarking skill was audited on May 18, 2026 and we found 4 security issues across 1 threat category. Review the findings below before installing.

Security Issues

low line 206

External URL reference

Source: SKILL.md

206	- [Open LLM Leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard) -- Hugging Face community leaderboard

low line 207

External URL reference

Source: SKILL.md

207	- [MMLU paper](https://arxiv.org/abs/2009.03300) -- Hendrycks et al., 2021

low line 208

External URL reference

Source: SKILL.md

208	- [Holistic Evaluation of Language Models (HELM)](https://crfm.stanford.edu/helm/) -- Stanford CRFM

low line 209

External URL reference

Source: SKILL.md

209	- [Chatbot Arena](https://chat.lmsys.org/) -- Human preference-based evaluation (LMSYS)

Scanned on May 18, 2026

GitHub Stars: 231

Category: development

Updated: June 24, 2026

Tags: openclaw, api, ml-ai-engineer, data-scientist, researcher, marketing-analyst, product-manager, development, data, analytics, education, research, marketing, product

wentorai/research-plugins