Low Precision
Papers on quantization and low-precision training techniques.
Overview
This section contains 5 papers covering:
- SageAttention3 - FP4 attention for inference and 8-bit training
- QERL - Quantization-enhanced reinforcement learning
- NVFP4 - Pretraining LLMs in 4-bit floating point
- BitNet - 1-bit weight LLMs with competitive accuracy
- FP8 Training Framework - Full FP8 mixed-precision beyond matrix ops