ai-llm-inference

Provides operational patterns for optimizing LLM inference performance, cost, and reliability in production environments.

Security score: 95/100

The ai-llm-inference skill was audited on Mar 8, 2026; the audit found 5 security issues in 1 threat category (External URL reference). Review the findings below before installing.

Security Issues

Severity: low · External URL reference · Source: SKILL.md, line 11

> Use **continuous batching / smart scheduling** when serving many concurrent requests (Orca scheduling: https://www.usenix.org/conference/osdi22/presentation/yu).
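The quoted line recommends continuous batching; as a toy illustration of the scheduling idea (not any real server's API), the sketch below decodes one token per active request per step and lets finished requests leave the batch while queued ones join immediately, rather than draining the whole batch as static batching would:

```python
from collections import deque

def continuous_batching(requests, max_batch: int):
    """Toy continuous-batching scheduler (illustrative only).

    `requests` is a list of (request_id, tokens_to_generate). Each step
    decodes one token for every active request; finished requests free
    their slot and waiting requests are admitted at every step.
    Returns (request_id, step_finished) in completion order.
    """
    queue = deque(requests)
    active = {}              # request_id -> tokens remaining
    completion_order = []
    step = 0
    while queue or active:
        # Admit waiting requests into free batch slots at every step.
        while queue and len(active) < max_batch:
            rid, n = queue.popleft()
            active[rid] = n
        # One decode step: each active request emits one token.
        for rid in list(active):
            active[rid] -= 1
            if active[rid] == 0:
                del active[rid]
                completion_order.append((rid, step))
        step += 1
    return completion_order
```

With a batch size of 2, request "c" here starts decoding as soon as "a" finishes, without waiting for "b" to drain: `continuous_batching([("a", 1), ("b", 3), ("c", 2)], 2)`.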
Severity: low · External URL reference · Source: SKILL.md, line 12

> Use **KV-cache aware serving** (PagedAttention/vLLM: https://arxiv.org/abs/2309.06180) and **efficient attention kernels** (FlashAttention: https://arxiv.org/abs/2205.14135).
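The paged KV-cache idea behind this line can be sketched with a toy block allocator (this mirrors the concept, not vLLM's actual implementation): KV entries live in fixed-size physical blocks, and a per-sequence block table maps logical token positions to blocks, so memory is claimed on demand and recycled when a sequence finishes:

```python
class PagedKVCache:
    """Toy paged KV-cache allocator (illustrative, not vLLM's API).

    A per-sequence block table maps logical token positions to
    fixed-size physical blocks, so no contiguous per-sequence
    reservation is needed and freed blocks are reused.
    """
    def __init__(self, num_blocks: int, block_size: int):
        self.block_size = block_size
        self.free_blocks = list(range(num_blocks))
        self.block_tables = {}   # seq_id -> list of physical block ids
        self.lengths = {}        # seq_id -> tokens stored so far

    def append_token(self, seq_id: int):
        """Reserve a KV slot for one new token; returns (block, offset)."""
        table = self.block_tables.setdefault(seq_id, [])
        length = self.lengths.get(seq_id, 0)
        if length % self.block_size == 0:   # current block full (or none yet)
            table.append(self.free_blocks.pop())
        self.lengths[seq_id] = length + 1
        return table[-1], length % self.block_size

    def free(self, seq_id: int) -> None:
        """Return a finished sequence's blocks to the free pool."""
        self.free_blocks.extend(self.block_tables.pop(seq_id, []))
        self.lengths.pop(seq_id, None)
```

The point of the design is that fragmentation is bounded by one partially filled block per sequence, instead of whole over-reserved contiguous regions.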
Severity: low · External URL reference · Source: SKILL.md, line 13

> Use **speculative decoding** when latency is critical and draft-model quality is acceptable (speculative decoding: https://arxiv.org/abs/2302.01318).
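For the greedy case, one speculative-decoding step can be sketched as follows (a simplification of the cited method, which handles sampling via acceptance probabilities; `target_next` and `draft_next` are hypothetical next-token callables): the draft proposes k tokens cheaply, the target verifies them, and the longest agreeing prefix is accepted plus one corrected token, so output matches target-only greedy decoding:

```python
def speculative_decode_step(target_next, draft_next, context, k: int):
    """One greedy speculative-decoding step (illustrative sketch).

    `draft_next` / `target_next` map a token list to the next token.
    In a real system the target verifies all k positions in a single
    batched forward pass; here it is a plain loop for clarity.
    """
    # Draft model proposes k tokens autoregressively.
    proposed = []
    ctx = list(context)
    for _ in range(k):
        t = draft_next(ctx)
        proposed.append(t)
        ctx.append(t)
    # Target verifies each position; stop at the first disagreement.
    accepted = []
    ctx = list(context)
    for t in proposed:
        expected = target_next(ctx)
        if t != expected:
            accepted.append(expected)   # emit the target's correction
            return accepted
        accepted.append(t)
        ctx.append(t)
    # All k accepted: one extra target token comes for free.
    accepted.append(target_next(ctx))
    return accepted
```

When the draft agrees on all k positions, the step yields k + 1 tokens for one target verification pass, which is where the latency win comes from.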
Severity: low · External URL reference · Source: SKILL.md, line 123

> **Security & privacy**: prompts/outputs can contain sensitive data; scrub logs, enforce auth/tenancy, and rate-limit abuse (OWASP LLM Top 10: https://owasp.org/www-project-top-10-for-large-language-
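The rate-limiting advice in this finding is commonly implemented as a per-tenant token bucket; a minimal sketch (illustrative, not tied to any particular gateway) that caps sustained request rate while still allowing short bursts:

```python
class TokenBucket:
    """Per-tenant token-bucket rate limiter (minimal sketch).

    Each tenant holds up to `capacity` tokens, refilled at `rate`
    tokens per second; a request is admitted only if `cost` tokens
    are available. Timestamps are passed in explicitly so the logic
    is deterministic and testable.
    """
    def __init__(self, capacity: float, rate: float):
        self.capacity = capacity
        self.rate = rate
        self.buckets = {}  # tenant -> (tokens_remaining, last_timestamp)

    def allow(self, tenant: str, now: float, cost: float = 1.0) -> bool:
        tokens, last = self.buckets.get(tenant, (self.capacity, now))
        # Refill proportionally to elapsed time, capped at capacity.
        tokens = min(self.capacity, tokens + (now - last) * self.rate)
        if tokens >= cost:
            self.buckets[tenant] = (tokens - cost, now)
            return True
        self.buckets[tenant] = (tokens, now)
        return False
```

For LLM serving, `cost` can be scaled by estimated token count rather than fixed at 1, so long prompts consume proportionally more budget.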
Severity: low · External URL reference · Source: SKILL.md, line 129

> **Export telemetry**: request-level tokens, TTFT/ITL, queue depth, GPU memory headroom, and error classes (OpenTelemetry GenAI semantic conventions: https://opentelemetry.io/docs/specs/semconv/gen-a
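The TTFT/ITL metrics named in this finding reduce to simple arithmetic over token emission timestamps; a small sketch (field names here are illustrative, not the OpenTelemetry GenAI attribute names):

```python
def latency_metrics(request_start: float, token_times: list) -> dict:
    """Compute per-request TTFT and mean ITL from token timestamps.

    TTFT (time to first token) = first_token_time - request_start.
    ITL (inter-token latency)  = mean gap between consecutive tokens.
    Returns None for metrics that are undefined (no tokens, or a
    single token with no inter-token gaps).
    """
    if not token_times:
        return {"ttft_s": None, "mean_itl_s": None, "tokens": 0}
    ttft = token_times[0] - request_start
    gaps = [b - a for a, b in zip(token_times, token_times[1:])]
    mean_itl = sum(gaps) / len(gaps) if gaps else None
    return {"ttft_s": ttft, "mean_itl_s": mean_itl, "tokens": len(token_times)}
```

Tracking TTFT and ITL separately matters because prefill and decode bottlenecks show up in different metrics: a deep queue or long prompt inflates TTFT, while decode-side contention inflates ITL.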
Scanned on Mar 8, 2026