Alerting system for the PyTorch org.
A Python-embedded DSL that makes it easy to write fast, scalable ML kernels with minimal boilerplate.
Collective communications library with various primitives for multi-machine training.
Fault tolerance for PyTorch training (HSDP, LocalSGD, DiLoCo, Streaming DiLoCo).
TORCH_LOGS parser for PyTorch 2 (PT2).