NVIDIA BioNeMo Recipes Scale Biology Transformer Models with PyTorch

Published on November 5, 2025
NVIDIA announced on November 5, 2025, that its BioNeMo Recipes are now available to simplify and accelerate the training of large-scale AI models for biology. The recipes are step-by-step guides built on PyTorch and Hugging Face that lower the barrier to entry for large-scale model training. By integrating accelerated libraries such as NVIDIA Transformer Engine (TE), researchers can unlock speed and memory efficiency through techniques like Fully Sharded Data Parallel (FSDP) and context parallelism.

The recipes demonstrate these techniques on the Hugging Face ESM-2 protein language model, trained with a native PyTorch training loop. Key features include:
  • Transformer Engine (TE) Integration: Optimizes transformer computations on NVIDIA GPUs.
  • FSDP2 Integration: Enables auto-parallelism by sharding model parameters, gradients, and optimizer state across GPUs (see the sketch after this list).
  • Sequence Packing: Achieves greater performance by removing padding tokens.
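
FSDP2 refers to PyTorch's fully_shard API. The following is a minimal sketch of how an ESM-2 checkpoint might be sharded with it, assuming PyTorch 2.6 or later (where fully_shard is exported from torch.distributed.fsdp) and an already-initialized distributed process group; the exact wrapping used in the BioNeMo Recipes may differ.

    import torch
    from torch.distributed.fsdp import fully_shard
    from transformers import AutoModelForMaskedLM

    # Load ESM-2 and move it to the GPU assigned to this rank.
    model = AutoModelForMaskedLM.from_pretrained("facebook/esm2_t33_650M_UR50D")
    model = model.cuda()

    # Shard each transformer layer first, then the root module, so parameters,
    # gradients, and optimizer state are all distributed across ranks.
    for layer in model.esm.encoder.layer:
        fully_shard(layer)
    fully_shard(model)

    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)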
The Transformer Engine (TE) optimizes transformer computations on NVIDIA GPUs and can be integrated into existing training pipelines without significant changes. For architectures that deviate from a standard Transformer block, TE can be integrated at the layer level by replacing standard PyTorch modules with their TE counterparts and running the forward pass under FP8 autocasting (see the layer-swap sketch below).

Sequence packing, an alternative to the standard padded input format, improves efficiency when samples have varying sequence lengths. Index vectors mark the boundaries between input sequences, so padding tokens are removed entirely, reducing memory usage and increasing token throughput. NVIDIA's BioNeMo Recipes also offer THD-aware collators, such as Hugging Face's DataCollatorWithFlattening (see the packing example below).

According to Tom Sercu, co-founder and VP of Engineering at EvolutionaryScale, integrating NVIDIA Transformer Engine was crucial to training ESM3, the largest foundation model trained on biological data, at a 98B-parameter scale with high throughput and GPU utilization.

TE layers can be embedded directly inside a Hugging Face Transformers PreTrainedModel and remain fully compatible with AutoModel.from_pretrained. The NVIDIA BioNeMo Collection on the Hugging Face Hub offers pre-optimized models.

To get started, users need PyTorch, NVIDIA CUDA 12.8, and the BioNeMo Framework Recipes from GitHub, along with the NVIDIA BioNeMo Framework documentation.
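
To illustrate the layer-level integration described above, here is a minimal sketch using Transformer Engine's PyTorch API. The layer dimensions are arbitrary and the FP8 recipe settings are illustrative, not necessarily those used in the BioNeMo Recipes.

    import torch
    import transformer_engine.pytorch as te
    from transformer_engine.common.recipe import DelayedScaling, Format

    # te.Linear is a drop-in replacement for torch.nn.Linear on NVIDIA GPUs.
    proj = te.Linear(1024, 4096, bias=True, params_dtype=torch.bfloat16).cuda()

    # Delayed-scaling FP8 recipe; HYBRID uses E4M3 for forward and E5M2 for backward.
    fp8_recipe = DelayedScaling(fp8_format=Format.HYBRID)

    x = torch.randn(8, 1024, device="cuda", dtype=torch.bfloat16)

    # The FP8 autocast context handles the casts and scaling factors internally.
    with te.fp8_autocast(enabled=True, fp8_recipe=fp8_recipe):
        y = proj(x)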
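
The DataCollatorWithFlattening mentioned above ships with the Hugging Face transformers library. A small, self-contained example of how it packs variable-length sequences follows; the protein strings are made up for illustration.

    from transformers import AutoTokenizer, DataCollatorWithFlattening

    tokenizer = AutoTokenizer.from_pretrained("facebook/esm2_t33_650M_UR50D")

    # Tokenize variable-length protein sequences without padding them.
    proteins = ["MKTAYIAKQR", "MVLSPADKTNVKAAW", "MGDVEK"]
    features = [tokenizer(p) for p in proteins]

    # The collator concatenates all samples into one packed (THD-style) batch and
    # emits position_ids that restart at each sequence boundary, so no compute is
    # spent on padding tokens.
    collator = DataCollatorWithFlattening(return_position_ids=True)
    batch = collator(features)
    print(batch["input_ids"].shape, batch["position_ids"].shape)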
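
Finally, loading a pre-optimized checkpoint from the NVIDIA BioNeMo Collection should go through the standard AutoModel interface. The repository id below is a placeholder, and trust_remote_code is only an assumption for checkpoints that ship custom TE-backed modeling code.

    from transformers import AutoModelForMaskedLM, AutoTokenizer

    # Placeholder repository id: browse the NVIDIA BioNeMo Collection on the
    # Hugging Face Hub for the actual model names.
    repo_id = "nvidia/example-te-esm2-checkpoint"

    # trust_remote_code is only needed if the checkpoint defines custom
    # (TE-accelerated) modeling classes.
    model = AutoModelForMaskedLM.from_pretrained(repo_id, trust_remote_code=True)
    tokenizer = AutoTokenizer.from_pretrained(repo_id)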