evaluating-llms-harness
Evaluates LLMs using academic benchmarks like MMLU and GSM8K, aiding in model quality assessment and comparison.
Security score: 54/100
The evaluating-llms-harness skill was audited on May 17, 2026 and we found 6 security issues across 2 threat categories, including 1 critical. Review the findings below before installing.
Security Issues
critical line 192
Eval function call - arbitrary code execution
Source: SKILL.md
| 192 | Avoid for frequent eval (too slow): |
medium line 206
System command execution
Source: SKILL.md
| 206 | os.system(f"./eval_checkpoint.sh checkpoints step-{step}") |
medium line 223
System command execution
Source: SKILL.md
| 223 | os.system(f"lm_eval --model hf --model_args pretrained={checkpoint_path} ...") |
medium line 206
Python os.system command execution
Source: SKILL.md
| 206 | os.system(f"./eval_checkpoint.sh checkpoints step-{step}") |
medium line 223
Python os.system command execution
Source: SKILL.md
| 223 | os.system(f"lm_eval --model hf --model_args pretrained={checkpoint_path} ...") |
low line 495
External URL reference
Source: SKILL.md
| 495 | - Leaderboard: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard (uses this harness) |
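The two `os.system` findings above (lines 206 and 223 of SKILL.md) both interpolate a variable into a shell command string. One common way to address this kind of finding is to pass an argument list to `subprocess.run`, which does not invoke a shell. The following is a minimal sketch for the line 206 call only, not the skill author's code: the function names `build_checkpoint_eval_argv` and `run_checkpoint_eval` are hypothetical, and the rule that `step` must be a non-negative integer is an assumption about how the script is used.

```python
import subprocess


def build_checkpoint_eval_argv(step: int) -> list[str]:
    """Build the argument list for the script named in the line 206 finding."""
    # Assumption: step is a training step counter, so anything that is not a
    # non-negative integer (including bool) is rejected before use.
    if isinstance(step, bool) or not isinstance(step, int) or step < 0:
        raise ValueError(f"step must be a non-negative integer, got {step!r}")
    # Same three tokens as the flagged command, but as separate list items.
    return ["./eval_checkpoint.sh", "checkpoints", f"step-{step}"]


def run_checkpoint_eval(step: int) -> int:
    """Run the script without a shell and return its exit code."""
    # With a list and the default shell=False, arguments are handed to the
    # program verbatim instead of being parsed by a shell.
    # check=True raises CalledProcessError on a non-zero exit status.
    completed = subprocess.run(build_checkpoint_eval_argv(step), check=True)
    return completed.returncode
```

The same pattern would apply to the `lm_eval` call on line 223, with each flag and value as its own list element. The elided arguments in that finding are not reproduced here.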
Scanned on May 17, 2026
GitHub Stars: 185.0K
Category: data analytics
Updated: June 10, 2026
openclaw, api, data-scientist, ml-ai-engineer, researcher, marketing-analyst, product-manager, data analytics, development, education research, marketing, product
NousResearch/hermes-agent