advanced-evaluation

This skill should be used for advanced LLM evaluation: LLM-as-judge systems, direct scoring, pairwise comparison, rubric calibration, evaluator bias mitigation, confidence scoring, and automated quality assessment.

Security score: 96/100

The advanced-evaluation skill was audited on Jun 24, 2026. The audit found 4 low-severity security issues across 1 threat category. Review the findings below before installing.

Security Issues

low line 392

External URL reference

Source: SKILL.md

392	- [Eugene Yan: Evaluating the Effectiveness of LLM-Evaluators](https://eugeneyan.com/writing/llm-evaluators/) - Read when: surveying the state of the art in LLM evaluation

low line 393

External URL reference

Source: SKILL.md

393	- [Judging LLM-as-a-Judge (Zheng et al., 2023)](https://arxiv.org/abs/2306.05685) - Read when: understanding position bias and MT-Bench methodology

low line 394

External URL reference

Source: SKILL.md

394	- [G-Eval: NLG Evaluation using GPT-4 (Liu et al., 2023)](https://arxiv.org/abs/2303.16634) - Read when: implementing chain-of-thought evaluation scoring

low line 395

External URL reference

Source: SKILL.md

395	- [Large Language Models are not Fair Evaluators (Wang et al., 2023)](https://arxiv.org/abs/2305.17926) - Read when: diagnosing systematic bias in evaluation outputs

Scanned on Jun 24, 2026

GitHub Stars: 16.7K

Category: devops

Updated: June 24, 2026

muratcankoylan/Agent-Skills-for-Context-Engineering