Quick answer
5 focuses on the 'industrialization' of agent training through asynchronous reinforcement learning. It addresses the inefficiency of GPUs sitting idle during long agent actions by decoupling experience generation from model training, so the two run concurrently instead of in lockstep.
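The decoupling described above can be illustrated with a toy producer/consumer sketch. This is not the system's actual implementation: the names `rollout_worker`, `trainer`, and `run` are hypothetical, `time.sleep` stands in for a slow agent action, and incrementing a version counter stands in for a gradient step. It only shows the general pattern, in which rollout workers push trajectories into a shared buffer while the trainer consumes whatever has arrived.

```python
import queue
import threading
import time


def rollout_worker(worker_id, num_episodes, buffer, policy_version):
    """Hypothetical rollout worker: generates experience independently of the trainer."""
    for episode in range(num_episodes):
        time.sleep(0.001)  # stand-in for a slow agent action (tool call, environment step)
        # Tag each trajectory with the policy version that produced it,
        # so a real trainer could measure or correct for staleness.
        buffer.put({"worker": worker_id, "episode": episode, "version": policy_version[0]})


def trainer(buffer, policy_version, total, batch_size):
    """Hypothetical trainer: consumes trajectories as they arrive, without waiting for all rollouts."""
    consumed, updates = 0, 0
    while consumed < total:
        # Block only until enough trajectories for the next batch are available.
        batch = [buffer.get() for _ in range(min(batch_size, total - consumed))]
        consumed += len(batch)
        policy_version[0] += 1  # stand-in for a gradient step on the batch
        updates += 1
    return consumed, updates


def run(num_workers=4, episodes_per_worker=5, batch_size=4):
    buffer = queue.Queue()   # shared experience buffer between the two phases
    policy_version = [0]     # mutable cell so workers can read the latest version
    total = num_workers * episodes_per_worker
    workers = [
        threading.Thread(
            target=rollout_worker,
            args=(i, episodes_per_worker, buffer, policy_version),
        )
        for i in range(num_workers)
    ]
    for w in workers:
        w.start()
    result = trainer(buffer, policy_version, total, batch_size)
    for w in workers:
        w.join()
    return result


if __name__ == "__main__":
    consumed, updates = run()
    print(f"consumed {consumed} trajectories in {updates} updates")
```

Because the trainer starts updating as soon as the first batch is ready, trajectories generated later may come from an older policy version than the one being trained. Asynchronous setups typically have to bound or correct for that staleness; this sketch only records the version and does not handle it.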