ai-model-benchmarking
Enables rigorous benchmarking of AI models using over 60 academic evaluation suites and metrics for reliable performance assessment.
Security score: 96/100
The ai-model-benchmarking skill was audited on May 18, 2026 and we found 4 security issues across 1 threat category. Review the findings below before installing.
Security Issues
low, line 206: External URL reference (source: SKILL.md)
    - [Open LLM Leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard) -- Hugging Face community leaderboard
low, line 207: External URL reference (source: SKILL.md)
    - [MMLU paper](https://arxiv.org/abs/2009.03300) -- Hendrycks et al., 2021
low, line 208: External URL reference (source: SKILL.md)
    - [Holistic Evaluation of Language Models (HELM)](https://crfm.stanford.edu/helm/) -- Stanford CRFM
low, line 209: External URL reference (source: SKILL.md)
    - [Chatbot Arena](https://chat.lmsys.org/) -- Human preference-based evaluation (LMSYS)
Scanned on May 18, 2026
GitHub Stars: 222
Category: development
Updated: June 10, 2026
Tags: openclaw, api, ml-ai-engineer, data-scientist, researcher, marketing-analyst, product-manager, development, data analytics, education research, marketing, product
wentorai/research-plugins