ai-llm-inference
Provides operational patterns for optimizing LLM inference performance, cost, and reliability in production environments.
Security score: 95/100
The ai-llm-inference skill was audited on Mar 8, 2026; the audit found 5 security issues, all in a single threat category. Review the findings below before installing.
Categories Tested
Security Issues
low · External URL reference · Source: SKILL.md, line 11
- Use **continuous batching / smart scheduling** when serving many concurrent requests (Orca scheduling: https://www.usenix.org/conference/osdi22/presentation/yu).
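The Orca-style scheduling this finding links to can be illustrated with a toy iteration-level scheduler: finished requests leave the batch after every decode step and queued ones join immediately, rather than waiting for the whole batch to drain. Everything below (the request fields, the batch-size cap, the fake tokens) is an illustrative assumption, not the skill's actual code.

```python
from collections import deque
from dataclasses import dataclass, field

@dataclass
class Request:
    rid: int
    tokens_left: int                       # decode steps still needed
    output: list = field(default_factory=list)

def continuous_batching(requests, max_batch=2):
    """Toy iteration-level (continuous) batching scheduler: slots are
    refilled from the queue as soon as any request finishes, instead of
    only between full batches (the core idea behind Orca scheduling)."""
    queue = deque(requests)
    running, finished = [], []
    step = 0
    while queue or running:
        # admit new requests the moment a slot frees up
        while queue and len(running) < max_batch:
            running.append(queue.popleft())
        step += 1
        for req in running:                # one decode iteration for the batch
            req.output.append(f"tok{step}")
            req.tokens_left -= 1
        # retire finished requests mid-flight
        finished += [r for r in running if r.tokens_left == 0]
        running = [r for r in running if r.tokens_left > 0]
    return finished, step

done, steps = continuous_batching(
    [Request(0, 1), Request(1, 3), Request(2, 2)])
```

With static batching the same workload would take 5 iterations (3 for the first batch, then 2 for the straggler); here request 2 slips into the slot request 0 frees after step 1, so everything finishes in 3.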
low · External URL reference · Source: SKILL.md, line 12
- Use **KV-cache aware serving** (PagedAttention/vLLM: https://arxiv.org/abs/2309.06180) and **efficient attention kernels** (FlashAttention: https://arxiv.org/abs/2205.14135).
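The memory trick behind the PagedAttention reference above, allocating the KV cache in fixed-size blocks from a shared pool instead of one contiguous slab per sequence, can be sketched with a free-list allocator. The class name, block size, and API are illustrative assumptions, not vLLM's interface.

```python
class PagedKVCache:
    """Toy block allocator in the spirit of PagedAttention (vLLM):
    each sequence maps to a list of fixed-size blocks, so memory is
    claimed on demand and blocks freed by finished sequences are
    immediately reusable by others."""
    def __init__(self, num_blocks, block_size):
        self.block_size = block_size
        self.free = list(range(num_blocks))   # free-list of block ids
        self.tables = {}                      # seq_id -> [block ids]
        self.lengths = {}                     # seq_id -> tokens stored

    def append_token(self, seq_id):
        table = self.tables.setdefault(seq_id, [])
        used = self.lengths.get(seq_id, 0)
        if used == len(table) * self.block_size:   # current block is full
            if not self.free:
                raise MemoryError("KV cache exhausted")
            table.append(self.free.pop())
        self.lengths[seq_id] = used + 1

    def free_sequence(self, seq_id):
        self.free.extend(self.tables.pop(seq_id, []))
        self.lengths.pop(seq_id, None)

cache = PagedKVCache(num_blocks=4, block_size=2)
for _ in range(3):
    cache.append_token("a")      # 3 tokens -> 2 blocks of size 2
cache.append_token("b")          # 1 token  -> 1 block
cache.free_sequence("a")         # both of "a"'s blocks return to the pool
```

Because sequences hold at most one partially filled block, worst-case internal fragmentation is bounded by one block per sequence, rather than the large contiguous over-reservation a slab-per-sequence layout requires.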
low · External URL reference · Source: SKILL.md, line 13
- Use **speculative decoding** when latency is critical and draft-model quality is acceptable (speculative decoding: https://arxiv.org/abs/2302.01318).
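The speculative-decoding reference above can be sketched in its greedy special case: a cheap draft model proposes k tokens, the target model checks them in one pass, and the longest agreeing prefix is accepted plus one target token. The toy integer "models" below are assumptions for illustration; the paper's general case replaces exact-match acceptance with a probabilistic accept/reject rule over distributions.

```python
def speculative_decode(target_next, draft_next, prompt, k=4, n_tokens=8):
    """Greedy-mode speculative decoding sketch: with greedy sampling,
    a drafted token is accepted iff the target model would have emitted
    the same token, so the output is identical to target-only decoding
    while the target verifies k positions per (batched) pass."""
    out = list(prompt)
    while len(out) - len(prompt) < n_tokens:
        # draft proposes k tokens autoregressively (cheap)
        proposal, ctx = [], list(out)
        for _ in range(k):
            t = draft_next(ctx)
            proposal.append(t)
            ctx.append(t)
        # target verifies the proposals (one expensive parallel pass)
        accepted, ctx = 0, list(out)
        for t in proposal:
            if target_next(ctx) != t:
                break
            ctx.append(t)
            accepted += 1
        # keep the agreeing prefix, then one guaranteed target token
        out += proposal[:accepted]
        out.append(target_next(out))
    return out[len(prompt):][:n_tokens]

# Toy models over integers: the target counts up by one; the draft
# agrees only when the last token is even.
target = lambda ctx: ctx[-1] + 1
draft  = lambda ctx: ctx[-1] + 1 if ctx[-1] % 2 == 0 else ctx[-1] + 2

out = speculative_decode(target, draft, prompt=[0], k=4)
```

The output matches what greedy decoding with the target alone would produce; the speed-up comes from verifying several drafted positions per target pass whenever the draft keeps agreeing.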
low · External URL reference · Source: SKILL.md, line 123
- **Security & privacy**: prompts/outputs can contain sensitive data; scrub logs, enforce auth/tenancy, and rate-limit abuse (OWASP LLM Top 10: https://owasp.org/www-project-top-10-for-large-language-
low · External URL reference · Source: SKILL.md, line 129
- **Export telemetry**: request-level tokens, TTFT/ITL, queue depth, GPU memory headroom, and error classes (OpenTelemetry GenAI semantic conventions: https://opentelemetry.io/docs/specs/semconv/gen-a
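Two of the request-level metrics this finding lists, TTFT (time to first token) and ITL (inter-token latency), fall directly out of raw token timestamps. The function and dict keys below are illustrative names, not OpenTelemetry GenAI semantic-convention attributes.

```python
import statistics

def request_metrics(request_start, token_times):
    """Derive per-request latency metrics from raw timestamps (seconds):
    TTFT is the gap from request arrival to the first emitted token,
    ITL is the gap between consecutive tokens."""
    if not token_times:
        raise ValueError("no tokens emitted")
    ttft = token_times[0] - request_start
    gaps = [b - a for a, b in zip(token_times, token_times[1:])]
    return {
        "ttft_s": ttft,
        "itl_mean_s": statistics.mean(gaps) if gaps else 0.0,
        "itl_max_s": max(gaps) if gaps else 0.0,
        "tokens": len(token_times),
    }

# Request arrived at t=10.0s; four tokens streamed out afterwards.
m = request_metrics(10.0, [10.2, 10.25, 10.3, 10.4])
```

In production these values would be recorded as span attributes or histogram points per request, alongside queue depth and GPU memory headroom sampled from the serving engine.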
GitHub stars: 3
Category: development
Updated: April 10, 2026
Repository: diegosouzapw/awesome-omni-skill