Quick answer
We introduce DeepSeek-V3.2, a model that harmonizes high computational efficiency with superior reasoning and agent performance. The key technical breakthroughs of DeepSeek-V3.2 are as follows: (1) DeepSeek Sparse Attention (DSA): We introduce DSA, an efficient attention mechanism that substantially reduces computational complexity while preserving model performance in long-context scenarios.
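
To make the complexity claim concrete, the sketch below contrasts dense attention with a generic top-k sparse variant for a single query. The text above does not describe how DSA chooses which tokens to attend to, so this is not DSA itself: the selection rule, the `index_scores` input, and all function names are assumptions made for illustration. The only point is that attending to `top_k` positions instead of all `n` reduces the per-query cost of the attention step from O(n) to O(top_k).

```python
import math


def _softmax_weighted_sum(scores, values):
    """Softmax over scores, then the weighted sum of the value vectors."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    dim = len(values[0])
    return [sum(e / total * v[j] for e, v in zip(exps, values)) for j in range(dim)]


def _scaled_dot(query, key):
    """Scaled dot-product score between one query and one key."""
    return sum(q * k for q, k in zip(query, key)) / math.sqrt(len(query))


def dense_attention(query, keys, values):
    """Baseline: the query scores every key, so cost grows with len(keys)."""
    scores = [_scaled_dot(query, key) for key in keys]
    return _softmax_weighted_sum(scores, values)


def topk_sparse_attention(query, keys, values, index_scores, top_k):
    """Hypothetical sparse variant: attend only to the top_k positions.

    index_scores is an assumed, cheaply computed relevance score per position;
    how such scores would be produced is outside the scope of this sketch.
    """
    ranked = sorted(range(len(keys)), key=lambda i: index_scores[i], reverse=True)
    keep = sorted(ranked[:top_k])  # restore positional order
    scores = [_scaled_dot(query, keys[i]) for i in keep]
    return _softmax_weighted_sum(scores, [values[i] for i in keep])


if __name__ == "__main__":
    keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    values = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
    query = [0.5, -0.5]
    index_scores = [0.1, 0.9, 0.3]
    print(dense_attention(query, keys, values))
    print(topk_sparse_attention(query, keys, values, index_scores, top_k=2))
```

When `top_k` equals the sequence length, the sparse function reduces to dense attention. Smaller values of `top_k` trade exactness for less computation, and whether quality holds up depends on how well the selection scores identify the relevant positions.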