tensorrt-llm

Optimizes LLM inference with NVIDIA TensorRT-LLM for maximum throughput and minimal latency. Use it for production deployment on NVIDIA GPUs (A100/H100), when you need 10-100x faster inference than eager-mode PyTorch, or for serving models with quantization (FP8/INT4), in-flight batching, and multi-GPU scaling.

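As a minimal sketch of what this looks like in practice, the snippet below uses the high-level `LLM` API shipped in recent `tensorrt_llm` releases, which builds (or loads a cached) TensorRT engine and schedules requests with in-flight batching. It assumes `pip install tensorrt-llm` on a machine with a supported NVIDIA GPU; the model name and sampling values are illustrative placeholders.

```python
# Minimal sketch, assuming a recent tensorrt_llm release with the
# high-level LLM API and a supported NVIDIA GPU.
from tensorrt_llm import LLM, SamplingParams

def main():
    # Builds (or loads a cached) TensorRT engine for the model.
    # Model name is an illustrative placeholder.
    llm = LLM(model="TinyLlama/TinyLlama-1.1B-Chat-v1.0")

    prompts = ["The key advantage of in-flight batching is"]
    sampling_params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

    # generate() schedules requests with in-flight (continuous) batching.
    for output in llm.generate(prompts, sampling_params):
        print(output.outputs[0].text)

if __name__ == "__main__":
    main()
```

For multi-GPU scaling, the same constructor accepts a `tensor_parallel_size` argument to shard the model across devices; quantized checkpoints (e.g. FP8) are served through the same interface.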


Install this skill with one command

/learn @davila7/inference-serving-tensorrt-llm
Category: development
Files: 4
Updated: March 16, 2026
Source: davila7/claude-code-templates (22.3K GitHub stars)