Low Precision

Papers on quantization and low-precision training techniques.

Overview

This section covers five papers:

  • SageAttention3 - FP4 attention for inference and 8-bit training
  • QeRL - Quantization-enhanced reinforcement learning
  • NVFP4 - Pretraining LLMs in 4-bit floating point
  • BitNet - 1-bit weight LLMs with competitive accuracy
  • FP8 Training Framework - Full FP8 mixed-precision training that extends beyond matrix operations