Pretraining

Papers on pretraining methods, data quality, and scaling laws.

Overview

This section contains three papers:

  • PaLM 2 - Compute-optimal scaling with multilingual data
  • phi-1.5 - Textbook-quality synthetic data for small models
  • In-Context Pretraining - Document ordering for better learning