tag: #mlops

projects

Distributed Data Parallel training implemented from first principles in PyTorch — gradient synchronization, process groups, and multi-GPU scaling without using the DDP wrapper.
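The core of the gradient synchronization this project implements is an all-reduce that averages each parameter's gradient across workers. A minimal sketch of the classic ring all-reduce (scatter-reduce phase followed by all-gather phase) is below, simulated in pure Python with plain lists so it runs without GPUs; this is an illustration of the algorithm, not the project's actual code, which would use `torch.distributed` point-to-point ops over NCCL or Gloo process groups:

```python
def ring_allreduce(vectors):
    """Simulate ring all-reduce: every worker ends with the elementwise
    mean of all workers' vectors, as DDP does for gradients."""
    n = len(vectors)
    length = len(vectors[0])

    def chunk_bounds(i):
        # split the vector into n contiguous chunks, front-loading remainders
        base, extra = divmod(length, n)
        start = i * base + min(i, extra)
        end = start + base + (1 if i < extra else 0)
        return start, end

    bufs = [list(v) for v in vectors]

    # Phase 1, scatter-reduce: at step t, worker r sends chunk (r - t) mod n
    # to its ring neighbor, which accumulates it. After n-1 steps, worker r
    # holds the complete sum for chunk (r + 1) mod n.
    for step in range(n - 1):
        new = [list(b) for b in bufs]  # snapshot: all sends happen "simultaneously"
        for r in range(n):
            s, e = chunk_bounds((r - step) % n)
            dst = (r + 1) % n
            for i in range(s, e):
                new[dst][i] += bufs[r][i]
        bufs = new

    # Phase 2, all-gather: each worker forwards its completed chunk around
    # the ring; receivers overwrite rather than accumulate.
    for step in range(n - 1):
        new = [list(b) for b in bufs]
        for r in range(n):
            s, e = chunk_bounds((r + 1 - step) % n)
            dst = (r + 1) % n
            for i in range(s, e):
                new[dst][i] = bufs[r][i]
        bufs = new

    # DDP averages rather than sums, so each worker steps identically.
    return [[x / n for x in b] for b in bufs]
```

Each of the 2(n-1) steps moves only 1/n of the vector per worker, which is why ring all-reduce's bandwidth cost is nearly independent of worker count; the DDP wrapper hides this behind gradient hooks, which is exactly the machinery the project rebuilds by hand.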