
Paper Summaries


bzhng-development/summary-of-some-paper-in-cuda

Architecture

Papers on novel architectures and efficiency improvements.

Overview

This section contains 10 papers covering:

Ring-linear - Hybrid linear and softmax attention for long context
Core Attention Disaggregation - Efficient long-context training
Looped Language Models - Latent reasoning via recurrence
AutoDeco - Learned, per-token decoding parameters (temperature, top-p)
Mamba - Linear-time sequence modeling with selective (input-dependent) state spaces
LongNet - Dilated attention for billion-token sequences
YaRN - RoPE extension for long contexts
ReLU Attention - Softmax-free attention for Vision Transformers
LongLoRA - Efficient context extension with shifted sparse attention
Relax - ML compiler abstraction unifying computational graphs, tensor programs, and library calls
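To make one of the entries above concrete, here is a minimal sketch contrasting standard softmax attention with a softmax-free ReLU variant. It assumes the ReLU scores are divided by the sequence length L, which is the scaling we understand the ReLU Attention paper to study; treat that detail, and every function name here (`attention_scores`, `weighted_sum`, `softmax_attention`, `relu_attention`), as illustrative assumptions rather than the paper's code.

```python
import math


def attention_scores(q, k):
    # Scaled dot-product scores: s[i][j] = (q_i . k_j) / sqrt(d)
    d = len(q[0])
    return [[sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d) for kj in k] for qi in q]


def weighted_sum(weights, v):
    # Combine value vectors using one query's attention weights.
    dim = len(v[0])
    return [sum(w * vj[c] for w, vj in zip(weights, v)) for c in range(dim)]


def softmax_attention(q, k, v):
    out = []
    for row in attention_scores(q, k):
        m = max(row)                           # row-wise max for numerical stability
        exps = [math.exp(s - m) for s in row]
        z = sum(exps)                          # row-wise sum: weights normalize to 1
        out.append(weighted_sum([e / z for e in exps], v))
    return out


def relu_attention(q, k, v):
    # Softmax-free variant (assumed form): clip negative scores to 0,
    # then divide by sequence length L instead of normalizing across keys.
    seq_len = len(k)
    out = []
    for row in attention_scores(q, k):
        weights = [max(s, 0.0) / seq_len for s in row]
        out.append(weighted_sum(weights, v))
    return out


if __name__ == "__main__":
    q = [[1.0, 0.0]]
    k = [[1.0, 0.0], [-1.0, 0.0]]
    v = [[2.0, 4.0], [6.0, 8.0]]
    print(softmax_attention(q, k, v))
    print(relu_attention(q, k, v))  # → approximately [[0.7071, 1.4142]]
```

Because the ReLU weights are not normalized across keys, the 1/L factor makes each query's total weight equal to the mean of its clipped scores, so it does not grow with sequence length. The ReLU path also drops the per-row max and sum reductions that softmax needs, which is the basic efficiency argument for softmax-free attention.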