
Titans and MIRAS: Rethinking Long Context in AI

Explore Google Research's Titans and MIRAS for efficient long context modeling in AI.

What comes after Transformers?

Google Research proposes a new architecture, Titans, and a broader framework, MIRAS, to give sequence models effective long-term memory while keeping training efficient and inference linear in context length.

Titans Architecture: Deep Neural Memory

Titans integrates a deep neural memory into a Transformer-style backbone. The design targets a key limitation of standard Transformers, whose key-value caches grow with context length and make very long contexts expensive. At the same time, Titans aims to combine efficient training with strong long-range recall, a balance that alternatives such as Mamba-2 (a linear-time recurrent model with a fixed-size state) and FlashAttention (faster exact attention that is still quadratic in compute) only partly achieve.

Why Titans and MIRAS?

Standard Transformers incur attention costs that grow quadratically with context length, which quickly becomes impractical for tasks that demand extreme lengths, such as genome sequence modeling. Titans combines traditional attention with a dedicated long-term memory module, improving quality on long inputs while keeping computational cost manageable.
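
To make the scaling argument concrete, here is a rough, illustrative cost model in Python; the operation counts and the 4,096-slot memory size are assumptions for illustration, not figures from the paper.

    # Back-of-the-envelope comparison: full self-attention touches every pair
    # of tokens, so its cost grows quadratically with context length, while a
    # fixed-size memory does a constant amount of work per token.

    def attention_cost(n_tokens: int) -> int:
        """Pairwise token interactions for one full-attention layer."""
        return n_tokens * n_tokens

    def fixed_memory_cost(n_tokens: int, memory_size: int = 4096) -> int:
        """Per-token work against a constant-size memory state."""
        return n_tokens * memory_size

    for n in (8_000, 128_000, 2_000_000):
        print(f"{n:>9} tokens | attention ~{attention_cost(n):.2e} ops"
              f" | fixed memory ~{fixed_memory_cost(n):.2e} ops")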

Titans: Long-term Memory Learning

The Titans long-term memory module is a deep multi-layer perceptron: attention serves as short-term memory over the current segment, while the separate neural memory retains information over the long term. The architecture uses three memory branches:

  • Core branch: short-term memory, implemented with standard attention.
  • Contextual memory branch: a neural memory that adapts to the recent sequence as it is processed.
  • Persistent memory branch: learnable, input-independent parameters that encode knowledge from pre-training.

By compressing prior tokens into summaries held in this neural memory, Titans feeds the extra context back into the attention mechanism, so the model can draw on relevant history while processing the current segment. A minimal sketch of how the three branches might be combined follows.
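
The PyTorch sketch below shows one plausible wiring: persistent tokens and a memory-derived summary are concatenated with the current segment before attention reads it. The module names, sizes, and the way the summary is produced are illustrative assumptions, not the paper's exact design.

    # A minimal sketch of a Titans-style "memory as context" block (assumed layout).
    import torch
    import torch.nn as nn

    class NeuralMemory(nn.Module):
        """Long-term memory as a small MLP that maps queries to stored content."""
        def __init__(self, dim: int, hidden: int = 256):
            super().__init__()
            self.mlp = nn.Sequential(
                nn.Linear(dim, hidden), nn.SiLU(), nn.Linear(hidden, dim))

        def forward(self, queries: torch.Tensor) -> torch.Tensor:
            return self.mlp(queries)  # retrieve M(q)

    class TitansStyleBlock(nn.Module):
        def __init__(self, dim: int = 128, n_heads: int = 4, n_persistent: int = 8):
            super().__init__()
            # Persistent branch: learnable, input-independent memory tokens.
            self.persistent = nn.Parameter(torch.randn(n_persistent, dim) * 0.02)
            # Contextual branch: neural memory queried with the current segment.
            self.memory = NeuralMemory(dim)
            # Core branch: standard attention.
            self.attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)

        def forward(self, segment: torch.Tensor) -> torch.Tensor:
            # segment: (batch, segment_len, dim) -- the current chunk of a long input
            batch = segment.shape[0]
            persistent = self.persistent.unsqueeze(0).expand(batch, -1, -1)
            summary = self.memory(segment)  # compressed view of earlier context
            # Attention reads the current tokens in the context of both memories.
            context = torch.cat([persistent, summary, segment], dim=1)
            out, _ = self.attn(segment, context, context)
            return out

    x = torch.randn(2, 64, 128)         # (batch, segment length, model dim)
    print(TitansStyleBlock()(x).shape)  # torch.Size([2, 64, 128])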

Experimental Effectiveness

On benchmarks such as C4 and HellaSwag, Titans outperforms comparable state-of-the-art models, and it handles contexts of up to 2,000,000 tokens without giving up efficient training. The design also lets it reach high accuracy with far fewer parameters than much larger models such as GPT-4.

MIRAS Framework: Associative Memory

MIRAS generalizes the view of a sequence model as an associative memory that maps keys to values while balancing learning against forgetting. By describing any sequence model through four pivotal design choices (memory structure, attentional bias, retention gate, and optimization rule), MIRAS encourages the development of new attention-free architectures.
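
As a loose illustration of that design space, the Python sketch below treats the four choices as fields of a specification and fills them in for a Titans-like memory and for a classical linear-attention model. The option names are informal descriptions, not an official MIRAS API.

    # Schematic view of the MIRAS design space (illustrative, not an official API).
    from dataclasses import dataclass

    @dataclass
    class SequenceModelSpec:
        memory_structure: str   # e.g. vector state, matrix, deep MLP
        attentional_bias: str   # objective that fits values to keys, e.g. l2, l1, Huber
        retention_gate: str     # how old associations decay or are forgotten
        optimization_rule: str  # how the memory is updated per token

    # Titans' long-term memory expressed as one point in this design space.
    titans = SequenceModelSpec(
        memory_structure="deep MLP",
        attentional_bias="l2 regression of values on keys",
        retention_gate="weight decay (forgetting)",
        optimization_rule="gradient descent with momentum (surprise-driven)",
    )

    # A classical linear-attention model occupies a different point.
    linear_attention = SequenceModelSpec(
        memory_structure="matrix",
        attentional_bias="l2 / dot-product",
        retention_gate="none, or a simple decay",
        optimization_rule="Hebbian-style additive update",
    )

    print(titans)
    print(linear_attention)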

Key Takeaways

  • Titans provides a scalable deep neural memory architecture that stores tokens selectively through an associative memory loss (a minimal sketch follows this list).
  • MIRAS establishes a unified approach to sequence modeling by describing architectures in terms of memory-management principles.
  • Titans shows strong improvements on language modeling and reasoning tasks, making it a promising route to handling extensive contexts efficiently.
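
To make the first takeaway concrete, here is a minimal sketch of an associative memory loss with a gradient-driven ("surprise") update and a retention term. The plain SGD-with-momentum update and all hyperparameters are simplifying assumptions, not the exact rule from the paper.

    # Online memorization: surprising (high-loss) tokens change the memory most.
    import torch
    import torch.nn as nn

    dim = 64
    memory = nn.Sequential(nn.Linear(dim, 128), nn.SiLU(), nn.Linear(128, dim))
    lr, momentum, retention = 0.1, 0.9, 0.99
    velocity = [torch.zeros_like(p) for p in memory.parameters()]

    def memorize(key: torch.Tensor, value: torch.Tensor) -> float:
        """One online update of the memory on a batch of (key, value) pairs."""
        loss = ((memory(key) - value) ** 2).mean()  # associative loss ||M(k) - v||^2
        grads = torch.autograd.grad(loss, list(memory.parameters()))
        with torch.no_grad():
            for p, vel, g in zip(memory.parameters(), velocity, grads):
                vel.mul_(momentum).add_(g)        # momentum carries past "surprise"
                p.mul_(retention).sub_(lr * vel)  # retention forgets, gradient writes
        return loss.item()

    for step in range(3):  # a stream of key/value pairs derived from incoming tokens
        k, v = torch.randn(16, dim), torch.randn(16, dim)
        print(f"step {step}: associative loss {memorize(k, v):.3f}")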