sglang

Fast structured generation and serving for LLMs with RadixAttention prefix caching. Use for JSON/regex outputs, constrained decoding, agentic workflows with tool calls, or when you need 5× faster inference than vLLM with prefix sharing. Powers 300,000+ GPUs at xAI, AMD, NVIDIA, and LinkedIn.
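As a hedged sketch of the constrained-decoding workflow: sglang's frontend lets you pass a regex to `sgl.gen`, and the server then only samples tokens that keep the output inside that pattern. The endpoint URL and prompt below are assumptions for illustration; offline, we can at least verify that a conforming output matches the constraint.

```python
import re

# A JSON-shaped regex of the kind you would pass to sglang's
# constrained decoding via sgl.gen("json", regex=PATTERN).
PATTERN = r'\{"name": "[A-Za-z ]+", "age": [0-9]+\}'

# With a running sglang server, the call would look roughly like this
# (hypothetical endpoint and prompt, sketch only):
#
#   import sglang as sgl
#   sgl.set_default_backend(sgl.RuntimeEndpoint("http://localhost:30000"))
#
#   @sgl.function
#   def extract(s, text):
#       s += "Extract name and age as JSON from: " + text + "\n"
#       s += sgl.gen("json", max_tokens=64, regex=PATTERN)
#
#   state = extract.run(text="Alice is 30 years old.")
#   print(state["json"])

# Offline check: a well-formed output is guaranteed to match the regex.
sample = '{"name": "Alice", "age": 30}'
print(bool(re.fullmatch(PATTERN, sample)))  # True
```

Because decoding is constrained at the token level, the output never needs post-hoc JSON repair; the regex (or a JSON schema) is enforced during sampling.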




Install this skill with one command

/learn @davila7/inference-serving-sglang