deepspeed
Expert guidance for distributed training with DeepSpeed: ZeRO optimization stages, pipeline parallelism, FP16/BF16/FP8 mixed precision, 1-bit Adam, and sparse attention.
Provides expert guidance for distributed training with DeepSpeed, helping you train large-scale deep learning models efficiently.
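The features listed above are configured through DeepSpeed's JSON-style config dictionary. As a minimal sketch (not part of this listing), here is how a PyTorch model might be wrapped with ZeRO stage 2 and BF16 via deepspeed.initialize; the model, batch size, and learning rate are placeholder values.

import torch
import deepspeed

# Placeholder model; any torch.nn.Module works.
model = torch.nn.Linear(1024, 1024)

ds_config = {
    "train_micro_batch_size_per_gpu": 4,
    "bf16": {"enabled": True},            # BF16 mixed precision
    "zero_optimization": {"stage": 2},    # shard optimizer states and gradients
    "optimizer": {"type": "Adam", "params": {"lr": 1e-4}},
}

# deepspeed.initialize returns (engine, optimizer, dataloader, lr_scheduler).
engine, _, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    config=ds_config,
)

x = torch.randn(4, 1024, device=engine.device, dtype=torch.bfloat16)
loss = engine(x).float().pow(2).mean()    # toy loss for illustration
engine.backward(loss)                     # engine handles scaling and ZeRO partitioning
engine.step()

Scripts like this are normally started under the DeepSpeed launcher (e.g. deepspeed train.py), which sets up the distributed environment.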
Files: 8
Install this skill with one command
/learn @davila7/distributed-training-deepspeed
GitHub Stars: 22.3K
Category: development
Updated: March 16, 2026
Repository: davila7/claude-code-templates