Architecture

Papers on novel architectures and efficiency improvements.

Overview

This section contains 10 papers covering:

  • Ring-linear - Hybrid attention mechanisms for long context
  • Core Attention Disaggregation - Efficient long-context training
  • Looped Language Models - Latent reasoning via recurrence
  • AutoDeco - Learned dynamic decoding parameters
  • Mamba - Selective state space models with input-dependent parameters
  • LongNet - Dilated attention for billion-token sequences
  • YaRN - RoPE extension for long contexts
  • ReLU Attention - Softmax-free attention for Vision Transformers
  • LongLoRA - Efficient context extension with shifted sparse attention
  • Relax - ML compiler abstraction unifying computational graphs, tensor programs, and library calls