BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models
Paper • Jun 15, 2023 • arxiv.org • Junnan Li, Dongxu Li, Silvio Savarese, Steven Hoi
The cost of vision-and-language pre-training has become increasingly prohibitive due to end-to-end training of large-scale models. This paper proposes BLIP-2, a generic and efficient pre-training s...