Category: Tutorials

  • Exploring torch.func: Functional Transformations in PyTorch

    🧠 Introduction: What Is torch.func? In PyTorch, most models are built using stateful objects like nn.Module. While this is intuitive and powerful, functional programming can unlock more control, composability, and efficiency in advanced machine learning workflows. That’s where torch.func comes in. torch.func is a PyTorch module introduced to bring functional programming paradigms into the PyTorch…
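The functional transforms this post introduces can be sketched in a few lines; a minimal example, assuming torch ≥ 2.0 where `torch.func` ships `grad()` and `vmap()`:

```python
import torch
from torch.func import grad, vmap

# A pure function: squared dot product of weights and inputs
def loss_fn(w, x):
    return (w * x).sum() ** 2

# grad() returns a new function that computes d(loss)/d(w)
dloss_dw = grad(loss_fn)

w = torch.tensor([1.0, 2.0])
x = torch.tensor([3.0, 4.0])
g = dloss_dw(w, x)  # 2 * (w . x) * x = [66., 88.]

# vmap() maps the function over a batch dimension without a Python loop
batched_loss = vmap(loss_fn, in_dims=(None, 0))
xs = torch.stack([x, 2 * x])
losses = batched_loss(w, xs)  # [121., 484.]
```

Because `loss_fn` is pure (no hidden module state), the transforms compose freely, e.g. `vmap(grad(loss_fn), in_dims=(None, 0))` for per-sample gradients.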

  • 🧠 torch.fft in PyTorch: fft(), rfft(), ifft() with Real Code Examples

    If you’re working with signal processing or deep learning in PyTorch, Fourier Transforms can help analyze frequencies and patterns in data. Fortunately, PyTorch provides a built-in module, torch.fft, that makes it easy to apply 1D, 2D, or N-dimensional FFTs. In this tutorial, we’ll explain how to use torch.fft effectively — including functions like fft(), ifft(),…
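As a quick taste of what the tutorial covers, a minimal sketch of `fft()`, `rfft()`, and `ifft()` on a pure sine (bin numbers assume a 64-sample signal):

```python
import math
import torch

# A 64-sample sine with exactly 4 cycles: its spectrum should peak at bin 4
n = 64
t = torch.arange(n, dtype=torch.float32)
signal = torch.sin(2 * math.pi * 4 * t / n)

spectrum = torch.fft.fft(signal)      # complex tensor, length n
real_spec = torch.fft.rfft(signal)    # real-input FFT, length n // 2 + 1
recovered = torch.fft.ifft(spectrum)  # inverse transform, back to the signal

peak_bin = torch.argmax(spectrum.abs()[: n // 2]).item()  # -> 4
```

`rfft()` is the usual choice for real-valued signals: it returns only the non-redundant half of the spectrum.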

  • Mastering torch.distributions: Probabilistic Modeling in PyTorch

    🧠 Introduction: What Is torch.distributions? Probabilistic modeling is at the core of many machine learning and deep learning algorithms—from variational autoencoders (VAEs) to Bayesian inference. PyTorch offers a powerful, flexible module to handle these needs: torch.distributions. torch.distributions is a PyTorch subpackage that provides a rich set of probability distributions and probabilistic tools. It allows you…
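A minimal sketch of the `torch.distributions` API described above, using the Normal distribution:

```python
import math
import torch
from torch.distributions import Normal

# A standard normal distribution
dist = Normal(loc=0.0, scale=1.0)

samples = dist.sample((5,))              # 5 draws, shape (5,)
logp = dist.log_prob(torch.tensor(0.0))  # log density at x = 0
density = logp.exp()                     # 1 / sqrt(2*pi) ~= 0.3989

# rsample() draws via the reparameterization trick, so gradients can flow
# through the sampling step (the mechanism VAEs rely on)
reparam = dist.rsample((3,))
```

The same interface (`sample`, `rsample`, `log_prob`, `entropy`, …) is shared by the other distributions in the package, which is what makes them easy to swap inside a model.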

  • A Complete Guide to torch.distributed.checkpoint in PyTorch

    🧠 Introduction: What Is torch.distributed.checkpoint? In large-scale distributed training, saving and restoring model state is not as simple as calling torch.save() and torch.load(). When training across many GPUs or nodes, traditional checkpointing becomes slow, memory-intensive, and often fails due to inconsistency or I/O bottlenecks. That’s where torch.distributed.checkpoint comes in. torch.distributed.checkpoint is a PyTorch module designed…
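The real `torch.distributed.checkpoint` API needs a multi-process launch, so here is only a toy single-process sketch of the underlying idea: each rank persists just its own shard, so no process ever serializes the full state (`torch.save` stands in for the actual sharded writer):

```python
import os
import tempfile
import torch

# Each "rank" writes only its own shard of the state, so no single process
# has to materialize or serialize the full model
world_size = 2
full_state = torch.arange(8.0)
ckpt_dir = tempfile.mkdtemp()

for rank, shard in enumerate(full_state.chunk(world_size)):
    torch.save(shard, os.path.join(ckpt_dir, f"shard_{rank}.pt"))

# On restore, each rank loads just its shard; reassembled here for checking
restored = torch.cat(
    [torch.load(os.path.join(ckpt_dir, f"shard_{r}.pt")) for r in range(world_size)]
)
```

This is what removes the I/O bottleneck the excerpt mentions: writes and reads are spread across ranks instead of funneled through rank 0.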

  • Understanding torch.distributed.pipelining

    🧠 Introduction: What Is torch.distributed.pipelining? As deep learning models become increasingly large, training them on a single GPU or even with traditional data parallelism becomes challenging. That’s where pipeline parallelism comes in. torch.distributed.pipelining is a part of PyTorch’s distributed framework designed to enable pipeline parallelism. This technique divides a model into stages, where each stage…
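A toy, single-process sketch of the staging idea only (plain modules and micro-batches, not the actual `torch.distributed.pipelining` API):

```python
import torch
import torch.nn as nn

# Two "stages" that would live on different devices in a real pipeline
stage0 = nn.Linear(8, 16)
stage1 = nn.Linear(16, 4)

batch = torch.randn(32, 8)
micro_batches = batch.chunk(4)  # 4 micro-batches of 8 samples each

# Micro-batches flow stage0 -> stage1; in a real pipeline, stage0 starts on
# the next micro-batch while stage1 is still busy with the previous one
outputs = [stage1(stage0(mb)) for mb in micro_batches]
result = torch.cat(outputs)  # shape (32, 4), same as an unsplit forward pass
```

Splitting the batch into micro-batches is what lets the stages overlap: it keeps every device busy instead of idling while the others compute.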

  • Mastering torch.distributed.optim: Distributed Optimizers in PyTorch

    🚀 Introduction: What Is torch.distributed.optim? In distributed deep learning, keeping model weights consistent across devices is crucial for correct training. That’s where torch.distributed.optim comes in. torch.distributed.optim is a PyTorch module that wraps standard optimizers (such as Adam or SGD) so they work efficiently in distributed training environments. Its optimizers apply parameter updates consistently across processes and devices, and can shard optimizer state to cut per-GPU memory, enabling scalable…
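One concrete wrapper from this module is `ZeroRedundancyOptimizer`; the sketch below runs it on a single-process gloo group purely to show the API shape (real training would launch one process per device with `torchrun`, and the gloo backend is assumed available):

```python
import os
import torch
import torch.distributed as dist
from torch.distributed.optim import ZeroRedundancyOptimizer

# A single-process "cluster" just to exercise the API; real training would
# launch one process per device with torchrun
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29511")
dist.init_process_group("gloo", rank=0, world_size=1)

model = torch.nn.Linear(4, 2)
# Wraps a standard optimizer; each rank keeps optimizer state only for its
# own shard of the parameters (the ZeRO idea)
opt = ZeroRedundancyOptimizer(
    model.parameters(), optimizer_class=torch.optim.Adam, lr=0.1
)

before = model.weight.detach().clone()
model(torch.randn(8, 4)).sum().backward()
opt.step()
changed = not torch.equal(before, model.weight.detach())

dist.destroy_process_group()
```

With many ranks, the memory saved is roughly (world_size − 1)/world_size of the optimizer state, which is substantial for Adam's two extra tensors per parameter.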

  • torch.distributed.tensor.parallel

    Introduction: What Is torch.distributed.tensor.parallel? torch.distributed.tensor.parallel is a module in PyTorch that provides tools to implement tensor parallelism—a technique used to split large model tensors (e.g., weights) across multiple GPUs. Unlike data parallelism, where each GPU holds a copy of the model and processes different inputs, tensor parallelism divides the model itself across devices to handle…
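A toy single-process sketch of the column-sharding idea behind tensor parallelism (plain matmuls, not the `torch.distributed.tensor.parallel` API):

```python
import torch

# Column-parallel sketch: split the weight matrix column-wise into two
# shards, compute each slice of the output, then concatenate. On real
# hardware each shard (and its matmul) would live on a different GPU.
torch.manual_seed(0)
x = torch.randn(4, 8)       # input batch
W = torch.randn(8, 6)       # full weight matrix

W0, W1 = W.chunk(2, dim=1)  # two column shards of shape (8, 3)
y_sharded = torch.cat([x @ W0, x @ W1], dim=1)

y_full = x @ W              # reference, unsharded computation
```

Each device holds only its shard of `W`, which is exactly how a layer too large for one GPU's memory can still be computed.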

  • PyTorch Fully Shard Your Models

    What is torch.distributed.fsdp.fully_shard? The fully_shard function is PyTorch’s granular, module-level API for applying Fully Sharded Data Parallelism (FSDP) to specific model components. Unlike wrapping an entire model with FSDP, fully_shard enables per-module control. Key differences from the FSDP() wrapper: scope (entire model vs. per-module), flexibility (less vs. more), adoption (all-or-nothing vs. gradual), and use case (standard FSDP vs. custom parallelism). Code Examples 1. Basic fully_shard Application…

  • PyTorch Fully Sharded Data Parallel (FSDP) Training

    What is torch.distributed.fsdp? torch.distributed.fsdp (Fully Sharded Data Parallel) is PyTorch’s advanced distributed training strategy that optimizes memory usage by sharding model parameters, gradients, and optimizer states across multiple GPUs. Unlike traditional DDP (DistributedDataParallel), which replicates the entire model on every GPU, FSDP keeps only a shard of each tensor resident on each rank. Code Examples 1. Basic FSDP Setup from torch.distributed.fsdp import FullyShardedDataParallel as FSDP from torch.distributed.fsdp import ShardingStrategy #…
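A toy single-process sketch of the sharding idea FSDP is built on, an illustration rather than the FSDP API: each rank stores one shard of a flattened parameter at rest, and the full tensor exists only transiently when gathered for compute.

```python
import torch

# Each rank stores one shard of a flattened parameter at rest; the full
# tensor is rebuilt (all-gathered) only just before the layer runs
world_size = 4
param = torch.arange(16.0)               # full parameter (16 elements)
shards = list(param.chunk(world_size))   # each rank keeps 4 elements

per_rank_memory = shards[0].numel()      # 16 / 4 = 4 elements resident

# Simulated all-gather: rebuild the full parameter for computation
gathered = torch.cat(shards)
```

At rest, each rank's memory for this parameter is 1/world_size of the full size, which is where FSDP's savings over DDP come from.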

  • PyTorch Elastic Distributed Training

    What is torch.distributed.elastic? torch.distributed.elastic is PyTorch’s framework for fault-tolerant, elastic distributed training that automatically adapts to cluster changes. Unlike static distributed training, elastic training scales up or down as workers join, leave, or fail. Code Examples 1. Basic Elastic Training Setup import torch.distributed as dist from torch.nn.parallel import DistributedDataParallel as DDP def train_loop(): # Initialize the process group (torchrun supplies rank and world size) dist.init_process_group(backend="nccl") model = DDP(MyModel().cuda()) optimizer = torch.optim.Adam(model.parameters()) #…