advanced-evaluation
This skill should be used for advanced LLM evaluation: LLM-as-judge systems, direct scoring, pairwise comparison, rubric calibration, evaluator bias mitigation, confidence scoring, and automated quality assessment.
Security score: 96/100
The advanced-evaluation skill was audited on Jun 24, 2026, and the audit found 4 security issues across 1 threat category. Review the findings below before installing.
Categories Tested
Security Issues
low line 392
External URL reference
Source: SKILL.md
| 392 | - [Eugene Yan: Evaluating the Effectiveness of LLM-Evaluators](https://eugeneyan.com/writing/llm-evaluators/) - Read when: surveying the state of the art in LLM evaluation |
low line 393
External URL reference
Source: SKILL.md
| 393 | - [Judging LLM-as-a-Judge (Zheng et al., 2023)](https://arxiv.org/abs/2306.05685) - Read when: understanding position bias and MT-Bench methodology |
low line 394
External URL reference
Source: SKILL.md
| 394 | - [G-Eval: NLG Evaluation using GPT-4 (Liu et al., 2023)](https://arxiv.org/abs/2303.16634) - Read when: implementing chain-of-thought evaluation scoring |
low line 395
External URL reference
Source: SKILL.md
| 395 | - [Large Language Models are not Fair Evaluators (Wang et al., 2023)](https://arxiv.org/abs/2305.17926) - Read when: diagnosing systematic bias in evaluation outputs |
Scanned on Jun 24, 2026