advanced-evaluation

This skill should be used for advanced LLM evaluation: LLM-as-judge systems, direct scoring, pairwise comparison, rubric calibration, evaluator bias mitigation, confidence scoring, and automated quality assessment.

Security score: 96/100

The advanced-evaluation skill was audited on Jun 24, 2026. The audit found 4 security issues, all rated low, in 1 threat category. Review the findings below before installing.

Security Issues

Low, line 392: External URL reference
Source: SKILL.md
- [Eugene Yan: Evaluating the Effectiveness of LLM-Evaluators](https://eugeneyan.com/writing/llm-evaluators/) - Read when: surveying the state of the art in LLM evaluation

Low, line 393: External URL reference
Source: SKILL.md
- [Judging LLM-as-a-Judge (Zheng et al., 2023)](https://arxiv.org/abs/2306.05685) - Read when: understanding position bias and MT-Bench methodology

Low, line 394: External URL reference
Source: SKILL.md
- [G-Eval: NLG Evaluation using GPT-4 (Liu et al., 2023)](https://arxiv.org/abs/2303.16634) - Read when: implementing chain-of-thought evaluation scoring

Low, line 395: External URL reference
Source: SKILL.md
- [Large Language Models are not Fair Evaluators (Wang et al., 2023)](https://arxiv.org/abs/2305.17926) - Read when: diagnosing systematic bias in evaluation outputs
Scanned on Jun 24, 2026
GitHub Stars: 16.7K
Category: devops
Updated: June 24, 2026
muratcankoylan/Agent-Skills-for-Context-Engineering