
deepspeed


Expert guidance for distributed training with DeepSpeed: ZeRO optimization stages, pipeline parallelism, FP16/BF16/FP8 mixed precision, 1-bit Adam, and sparse attention.

Provides expert guidance for distributed training with DeepSpeed, optimizing large-scale deep learning models efficiently.
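As a concrete illustration of the kind of setup this guidance covers, here is a minimal sketch of a DeepSpeed configuration enabling ZeRO stage 2 and bf16 mixed precision. The specific values (batch size, overlap setting) are illustrative assumptions, not taken from the skill itself.

```python
# Minimal DeepSpeed configuration sketch (illustrative values, not from the skill).
# ZeRO stage 2 shards optimizer state and gradients across data-parallel ranks;
# the "bf16" section enables bfloat16 mixed precision.
ds_config = {
    "train_batch_size": 32,            # global batch = micro_batch * grad_accum * world_size
    "gradient_accumulation_steps": 1,
    "bf16": {"enabled": True},
    "zero_optimization": {
        "stage": 2,                    # 0=off, 1=optimizer state, 2=+gradients, 3=+parameters
        "overlap_comm": True,          # overlap communication with the backward pass
    },
}
```

Such a dict would typically be passed as `config=ds_config` to `deepspeed.initialize()`, or written to a JSON file referenced by the `--deepspeed_config` flag.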
