
deepspeed


Expert guidance for distributed training with DeepSpeed: ZeRO optimization stages, pipeline parallelism, FP16/BF16/FP8 mixed precision, 1-bit Adam, and sparse attention.

Provides expert guidance for distributed training with DeepSpeed, optimizing large-scale deep learning models efficiently.
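As a concrete illustration of the kind of setup this guidance covers, here is a minimal sketch of a DeepSpeed configuration enabling ZeRO stage 2 and bf16 mixed precision. The specific values (batch size, overlap setting) are illustrative assumptions, not taken from the skill itself.

```python
# Minimal DeepSpeed configuration sketch (illustrative values, not from the skill).
# ZeRO stage 2 shards optimizer state and gradients across data-parallel ranks;
# the "bf16" section enables bfloat16 mixed precision.
ds_config = {
    "train_batch_size": 32,            # global batch = micro_batch * grad_accum * world_size
    "gradient_accumulation_steps": 1,
    "bf16": {"enabled": True},
    "zero_optimization": {
        "stage": 2,                    # 0=off, 1=optimizer state, 2=+gradients, 3=+parameters
        "overlap_comm": True,          # overlap communication with the backward pass
    },
}
```

Such a dict would typically be passed as `config=ds_config` to `deepspeed.initialize()`, or written to a JSON file referenced by the `--deepspeed_config` flag.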
