sglang
Fast structured generation and serving for LLMs with RadixAttention prefix caching. Use it for JSON/regex-constrained outputs, constrained decoding, agentic workflows with tool calls, or workloads with heavy prefix sharing, where it can deliver up to 5× faster inference than vLLM. Powers 300,000+ GPUs at xAI, AMD, NVIDIA, and LinkedIn.
Enables fast structured generation and serving for LLMs, optimizing inference with RadixAttention for efficient workflows.
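Prefix caching pays off because requests that share a prefix (for example, the same system prompt) do not need to recompute it. The sketch below illustrates only that idea, under stated assumptions: it uses a plain token-level trie instead of a compressed radix tree, stores no KV-cache tensors, and does no eviction. `PrefixCache`, `match_prefix`, `insert`, and `tokens_to_compute` are hypothetical names, not SGLang's API.

```python
from dataclasses import dataclass, field


@dataclass
class _Node:
    # Children keyed by the next token id.
    children: dict = field(default_factory=dict)


class PrefixCache:
    """Toy prefix cache (hypothetical; not SGLang's RadixAttention code).

    Stores token sequences in a trie so a new request can find how many of
    its leading tokens were already processed by an earlier request.
    """

    def __init__(self) -> None:
        self.root = _Node()

    def match_prefix(self, tokens: list) -> int:
        """Return the length of the longest cached prefix of `tokens`."""
        node, matched = self.root, 0
        for tok in tokens:
            nxt = node.children.get(tok)
            if nxt is None:
                break
            node, matched = nxt, matched + 1
        return matched

    def insert(self, tokens: list) -> None:
        """Record `tokens` so later requests can reuse its prefixes."""
        node = self.root
        for tok in tokens:
            node = node.children.setdefault(tok, _Node())


def tokens_to_compute(cache: PrefixCache, tokens: list) -> int:
    """Tokens that still need a prefill pass after reusing the cached prefix."""
    reused = cache.match_prefix(tokens)
    cache.insert(tokens)
    return len(tokens) - reused


if __name__ == "__main__":
    cache = PrefixCache()
    system = [1, 2, 3, 4]  # made-up token ids for a shared system prompt
    print(tokens_to_compute(cache, system + [10, 11]))  # 6: nothing cached yet
    print(tokens_to_compute(cache, system + [20]))      # 1: 4 tokens reused
```

In a real server each cached node would also hold the attention KV entries for its tokens, so a matched prefix skips both the lookup and the prefill compute; this toy version only counts the savings.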
sglang: 4 files
Install this skill with one command:
/learn @davila7/inference-serving-sglang
GitHub Stars: 22.3K
Category: development
Updated: March 16, 2026
davila7/claude-code-templates