ai-llm-inference

Provides operational patterns for optimizing LLM inference performance, cost, and reliability in production environments.

Security score: 95/100

The ai-llm-inference skill was audited on Mar 8, 2026; the audit found 5 security issues in 1 threat category (External URL reference). Review the findings below before installing.

Security Issues

Severity: low · External URL reference · Source: SKILL.md, line 11

> Use **continuous batching / smart scheduling** when serving many concurrent requests (Orca scheduling: https://www.usenix.org/conference/osdi22/presentation/yu).
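The quoted line recommends continuous batching; as a toy illustration of the scheduling idea (not any real server's API), the sketch below decodes one token per active request per step and lets finished requests leave the batch while queued ones join immediately, rather than draining the whole batch as static batching would:

```python
from collections import deque

def continuous_batching(requests, max_batch: int):
    """Toy continuous-batching scheduler (illustrative only).

    `requests` is a list of (request_id, tokens_to_generate). Each step
    decodes one token for every active request; finished requests free
    their slot and waiting requests are admitted at every step.
    Returns (request_id, step_finished) in completion order.
    """
    queue = deque(requests)
    active = {}              # request_id -> tokens remaining
    completion_order = []
    step = 0
    while queue or active:
        # Admit waiting requests into free batch slots at every step.
        while queue and len(active) < max_batch:
            rid, n = queue.popleft()
            active[rid] = n
        # One decode step: each active request emits one token.
        for rid in list(active):
            active[rid] -= 1
            if active[rid] == 0:
                del active[rid]
                completion_order.append((rid, step))
        step += 1
    return completion_order
```

With a batch size of 2, request "c" here starts decoding as soon as "a" finishes, without waiting for "b" to drain: `continuous_batching([("a", 1), ("b", 3), ("c", 2)], 2)`.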
Severity: low · External URL reference · Source: SKILL.md, line 12

> Use **KV-cache aware serving** (PagedAttention/vLLM: https://arxiv.org/abs/2309.06180) and **efficient attention kernels** (FlashAttention: https://arxiv.org/abs/2205.14135).
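The paged KV-cache idea behind this line can be sketched with a toy block allocator (this mirrors the concept, not vLLM's actual implementation): KV entries live in fixed-size physical blocks, and a per-sequence block table maps logical token positions to blocks, so memory is claimed on demand and recycled when a sequence finishes:

```python
class PagedKVCache:
    """Toy paged KV-cache allocator (illustrative, not vLLM's API).

    A per-sequence block table maps logical token positions to
    fixed-size physical blocks, so no contiguous per-sequence
    reservation is needed and freed blocks are reused.
    """
    def __init__(self, num_blocks: int, block_size: int):
        self.block_size = block_size
        self.free_blocks = list(range(num_blocks))
        self.block_tables = {}   # seq_id -> list of physical block ids
        self.lengths = {}        # seq_id -> tokens stored so far

    def append_token(self, seq_id: int):
        """Reserve a KV slot for one new token; returns (block, offset)."""
        table = self.block_tables.setdefault(seq_id, [])
        length = self.lengths.get(seq_id, 0)
        if length % self.block_size == 0:   # current block full (or none yet)
            table.append(self.free_blocks.pop())
        self.lengths[seq_id] = length + 1
        return table[-1], length % self.block_size

    def free(self, seq_id: int) -> None:
        """Return a finished sequence's blocks to the free pool."""
        self.free_blocks.extend(self.block_tables.pop(seq_id, []))
        self.lengths.pop(seq_id, None)
```

The point of the design is that fragmentation is bounded by one partially filled block per sequence, instead of whole over-reserved contiguous regions.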
Severity: low · External URL reference · Source: SKILL.md, line 13

> Use **speculative decoding** when latency is critical and draft-model quality is acceptable (speculative decoding: https://arxiv.org/abs/2302.01318).
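For the greedy case, one speculative-decoding step can be sketched as follows (a simplification of the cited method, which handles sampling via acceptance probabilities; `target_next` and `draft_next` are hypothetical next-token callables): the draft proposes k tokens cheaply, the target verifies them, and the longest agreeing prefix is accepted plus one corrected token, so output matches target-only greedy decoding:

```python
def speculative_decode_step(target_next, draft_next, context, k: int):
    """One greedy speculative-decoding step (illustrative sketch).

    `draft_next` / `target_next` map a token list to the next token.
    In a real system the target verifies all k positions in a single
    batched forward pass; here it is a plain loop for clarity.
    """
    # Draft model proposes k tokens autoregressively.
    proposed = []
    ctx = list(context)
    for _ in range(k):
        t = draft_next(ctx)
        proposed.append(t)
        ctx.append(t)
    # Target verifies each position; stop at the first disagreement.
    accepted = []
    ctx = list(context)
    for t in proposed:
        expected = target_next(ctx)
        if t != expected:
            accepted.append(expected)   # emit the target's correction
            return accepted
        accepted.append(t)
        ctx.append(t)
    # All k accepted: one extra target token comes for free.
    accepted.append(target_next(ctx))
    return accepted
```

When the draft agrees on all k positions, the step yields k + 1 tokens for one target verification pass, which is where the latency win comes from.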
Severity: low · External URL reference · Source: SKILL.md, line 123

> **Security & privacy**: prompts/outputs can contain sensitive data; scrub logs, enforce auth/tenancy, and rate-limit abuse (OWASP LLM Top 10: https://owasp.org/www-project-top-10-for-large-language-
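The rate-limiting advice in this finding is commonly implemented as a per-tenant token bucket; a minimal sketch (illustrative, not tied to any particular gateway) that caps sustained request rate while still allowing short bursts:

```python
class TokenBucket:
    """Per-tenant token-bucket rate limiter (minimal sketch).

    Each tenant holds up to `capacity` tokens, refilled at `rate`
    tokens per second; a request is admitted only if `cost` tokens
    are available. Timestamps are passed in explicitly so the logic
    is deterministic and testable.
    """
    def __init__(self, capacity: float, rate: float):
        self.capacity = capacity
        self.rate = rate
        self.buckets = {}  # tenant -> (tokens_remaining, last_timestamp)

    def allow(self, tenant: str, now: float, cost: float = 1.0) -> bool:
        tokens, last = self.buckets.get(tenant, (self.capacity, now))
        # Refill proportionally to elapsed time, capped at capacity.
        tokens = min(self.capacity, tokens + (now - last) * self.rate)
        if tokens >= cost:
            self.buckets[tenant] = (tokens - cost, now)
            return True
        self.buckets[tenant] = (tokens, now)
        return False
```

For LLM serving, `cost` can be scaled by estimated token count rather than fixed at 1, so long prompts consume proportionally more budget.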
Severity: low · External URL reference · Source: SKILL.md, line 129

> **Export telemetry**: request-level tokens, TTFT/ITL, queue depth, GPU memory headroom, and error classes (OpenTelemetry GenAI semantic conventions: https://opentelemetry.io/docs/specs/semconv/gen-a
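The TTFT/ITL metrics named in this finding reduce to simple arithmetic over token emission timestamps; a small sketch (field names here are illustrative, not the OpenTelemetry GenAI attribute names):

```python
def latency_metrics(request_start: float, token_times: list) -> dict:
    """Compute per-request TTFT and mean ITL from token timestamps.

    TTFT (time to first token) = first_token_time - request_start.
    ITL (inter-token latency)  = mean gap between consecutive tokens.
    Returns None for metrics that are undefined (no tokens, or a
    single token with no inter-token gaps).
    """
    if not token_times:
        return {"ttft_s": None, "mean_itl_s": None, "tokens": 0}
    ttft = token_times[0] - request_start
    gaps = [b - a for a, b in zip(token_times, token_times[1:])]
    mean_itl = sum(gaps) / len(gaps) if gaps else None
    return {"ttft_s": ttft, "mean_itl_s": mean_itl, "tokens": len(token_times)}
```

Tracking TTFT and ITL separately matters because prefill and decode bottlenecks show up in different metrics: a deep queue or long prompt inflates TTFT, while decode-side contention inflates ITL.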
Scanned on Mar 8, 2026