Alignment

Papers on alignment techniques and fine-tuning methods for large language models.

Overview

This section covers six papers:

  • LIMA - Alignment from only 1,000 curated examples
  • Large Reasoning Models - Learning alignment from flawed thinking
  • RECAP - Mitigating capability forgetting during RL
  • PPO-max - Stabilizing RLHF with careful PPO engineering
  • Reducing Sycophancy - Synthetic data to reduce agreement bias
  • TÜLU 2 - Advanced instruction-tuning dataset mixture