Quick answer
AI Summary: This MIT-led research addresses the 'memory wall' problem for LLMs by sharply reducing the amount of data the model must keep in its active memory. Attention Matching identifies the most important parts of a conversation or document and 'compacts' the rest, without degrading the model's ability to recall key details.