Topic: World Model

Track this topic after sign-in.

Short answer

This page shows the most relevant public items for World Model, ranked by trend activity and review signal. Use weekly for fast changes, monthly for more stable patterns, and all-time for evergreen picks.

WeeklyMonthlyAll time

← Back to home

  1. An X-ray Significantly Variable, Luminous, Type 2 Quasar at z = 2.99 with a Massive Host Galaxy

    PaperSep 3, 2024arxiv.orgXiurui Zhao, Stefano Marchesi, Marco Ajello, Francesca Civano, Roberto Gilli, Giorgio Lanzuisi, Iván E. López, Ross Silver, Nuria Torres-Albà, Peter G. Boorman, Andrealuna Pizzetti

    We present a comprehensive X-ray analysis and spectral energy distribution (SED) fitting of WISEA J171419.96+602724.6, an extremely luminous type 2 quasar at $z$ = 2.99. The source was suggested as...

  2. ByteCheckpoint: A Unified Checkpointing System for Large Foundation Model Development

    PaperApr 2, 2025arxiv.orgBorui Wan, Mingji Han, Yiyao Sheng, Yanghua Peng, Haibin Lin, Mofan Zhang, Zhichao Lai, Menghan Yu, Junda Zhang, Zuquan Song, Xin Liu, Chuan Wu

    Checkpointing to preserve training states is crucial during the development of Large Foundation Models (LFMs), for training resumption upon various failures or changes in GPU resources and parallel...

  3. IDA-VLM: Towards Movie Understanding via ID-Aware Large Vision-Language Model

    PaperJul 10, 2024arxiv.orgYatai Ji, Shilong Zhang, Jie Wu, Peize Sun, Weifeng Chen, Xuefeng Xiao, Sidi Yang, Yujiu Yang, Ping Luo

    The rapid advancement of Large Vision-Language models (LVLMs) has demonstrated a spectrum of emergent capabilities. Nevertheless, current models only focus on the visual content of a single scenari...

  4. Seed-ASR: Understanding Diverse Speech and Contexts with LLM-based Speech Recognition

    PaperJul 10, 2024arxiv.orgYe Bai, Jingping Chen, Jitong Chen, Wei Chen, Zhuo Chen, Chuang Ding, Linhao Dong, Qianqian Dong, Yujiao Du, Kepan Gao, Lu Gao, Yi Guo, Minglun Han, Ting Han, Wenchao Hu, Xinying Hu, Yuxiang Hu, Deyu Hua, Lu Huang, Mingkun Huang, Youjia Huang, Jishuo Jin, Fanliu Kong, Zongwei Lan, Tianyu Li, Xiaoyang Li, Zeyang Li, Zehua Lin, Rui Liu, Shouda Liu, Lu Lu, Yizhou Lu, Jingting Ma, Shengtao Ma, Yulin Pei, Chen Shen, Tian Tan, Xiaogang Tian, Ming Tu, Bo Wang, Hao Wang, Yuping Wang, Yuxuan Wang, Hanzhang Xia, Rui Xia, Shuangyi Xie, Hongmin Xu, Meng Yang, Bihong Zhang, Jun Zhang, Wanyi Zhang, Yang Zhang, Yawei Zhang, Yijie Zheng, Ming Zou

    Modern automatic speech recognition (ASR) model is required to accurately transcribe diverse speech signals (from different domains, languages, accents, etc) given the specific contextual informati...

  5. Let the Code LLM Edit Itself When You Edit the Code

    PaperMar 4, 2025arxiv.orgZhenyu He, Jun Zhang, Shengjie Luo, Jingjing Xu, Zhi Zhang, Di He

    In this work, we investigate a typical scenario in code generation where a developer edits existing code in real time and requests a code assistant, e.g., a large language model, to re-predict the ...

  6. SD-Eval: A Benchmark Dataset for Spoken Dialogue Understanding Beyond Words

    PaperJan 16, 2025arxiv.orgJunyi Ao, Yuancheng Wang, Xiaohai Tian, Dekun Chen, Jun Zhang, Lu Lu, Yuxuan Wang, Haizhou Li, Zhizheng Wu

    Speech encompasses a wealth of information, including but not limited to content, paralinguistic, and environmental information. This comprehensive nature of speech significantly impacts communicat...

  7. Depth Anything V2

    PaperOct 20, 2024arxiv.orgLihe Yang, Bingyi Kang, Zilong Huang, Zhen Zhao, Xiaogang Xu, Jiashi Feng, Hengshuang Zhao

    This work presents Depth Anything V2. Without pursuing fancy techniques, we aim to reveal crucial findings to pave the way towards building a powerful monocular depth estimation model. Notably, com...

  8. Autoregressive Pretraining with Mamba in Vision

    PaperJun 11, 2024arxiv.orgSucheng Ren, Xianhang Li, Haoqin Tu, Feng Wang, Fangxun Shu, Lei Zhang, Jieru Mei, Linjie Yang, Peng Wang, Heng Wang, Alan Yuille, Cihang Xie

    The vision community has started to build with the recently developed state space model, Mamba, as the new backbone for a range of tasks. This paper shows that Mamba's visual capability can be sign...

  9. Towards Semantic Equivalence of Tokenization in Multimodal LLM

    PaperFeb 26, 2025arxiv.orgShengqiong Wu, Hao Fei, Xiangtai Li, Jiayi Ji, Hanwang Zhang, Tat-Seng Chua, Shuicheng Yan

    Multimodal Large Language Models (MLLMs) have demonstrated exceptional capabilities in processing vision-language tasks. One of the crux of MLLMs lies in vision tokenization, which involves efficie...

  10. Seed-TTS: A Family of High-Quality Versatile Speech Generation Models

    PaperJun 4, 2024arxiv.orgPhilip Anastassiou, Jiawei Chen, Jitong Chen, Yuanzhe Chen, Zhuo Chen, Ziyi Chen, Jian Cong, Lelai Deng, Chuang Ding, Lu Gao, Mingqing Gong, Peisong Huang, Qingqing Huang, Zhiying Huang, Yuanyuan Huo, Dongya Jia, Chumin Li, Feiya Li, Hui Li, Jiaxin Li, Xiaoyang Li, Xingxing Li, Lin Liu, Shouda Liu, Sichao Liu, Xudong Liu, Yuchen Liu, Zhengxi Liu, Lu Lu, Junjie Pan, Xin Wang, Yuping Wang, Yuxuan Wang, Zhen Wei, Jian Wu, Chao Yao, Yifeng Yang, Yuanhao Yi, Junteng Zhang, Qidi Zhang, Shuo Zhang, Wenjie Zhang, Yang Zhang, Zilin Zhao, Dejian Zhong, Xiaobin Zhuang

    We introduce Seed-TTS, a family of large-scale autoregressive text-to-speech (TTS) models capable of generating speech that is virtually indistinguishable from human speech. Seed-TTS serves as a fo...

  11. Seeing the Image: Prioritizing Visual Correlation by Contrastive Alignment

    PaperNov 5, 2024arxiv.orgXin Xiao, Bohong Wu, Jiacong Wang, Chunyuan Li, Xun Zhou, Haoyuan Guo

    Existing image-text modality alignment in Vision Language Models (VLMs) treats each text token equally in an autoregressive manner. Despite being simple and effective, this method results in sub-op...

  12. 3DitScene: Editing Any Scene via Language-guided Disentangled Gaussian Splatting

    PaperMay 28, 2024arxiv.orgQihang Zhang, Yinghao Xu, Chaoyang Wang, Hsin-Ying Lee, Gordon Wetzstein, Bolei Zhou, Ceyuan Yang

    Scene image editing is crucial for entertainment, photography, and advertising design. Existing methods solely focus on either 2D individual object or 3D global scene editing. This results in a lac...

  13. ClassDiffusion: More Aligned Personalization Tuning with Explicit Class Guidance

    PaperMar 14, 2025arxiv.orgJiannan Huang, Jun Hao Liew, Hanshu Yan, Yuyang Yin, Yao Zhao, Humphrey Shi, Yunchao Wei

    Recent text-to-image customization works have proven successful in generating images of given concepts by fine-tuning diffusion models on a few examples. However, tuning-based methods inherently te...

  14. Unveiling the Tapestry of Consistency in Large Vision-Language Models

    PaperOct 6, 2024arxiv.orgYuan Zhang, Fei Xiao, Tao Huang, Chun-Kai Fan, Hongyuan Dong, Jiawen Li, Jiacong Wang, Kuan Cheng, Shanghang Zhang, Haoyuan Guo

    Large vision-language models (LVLMs) have recently achieved rapid progress, exhibiting great perception and reasoning abilities concerning visual information. However, when faced with prompts in di...

  15. PeRFlow: Piecewise Rectified Flow as Universal Plug-and-Play Accelerator

    PaperSep 2, 2024arxiv.orgHanshu Yan, Xingchao Liu, Jiachun Pan, Jun Hao Liew, Qiang Liu, Jiashi Feng

    We present Piecewise Rectified Flow (PeRFlow), a flow-based method for accelerating diffusion models. PeRFlow divides the sampling process of generative flows into several time windows and straight...

  16. Hyper-SD: Trajectory Segmented Consistency Model for Efficient Image Synthesis

    PaperNov 4, 2024arxiv.orgYuxi Ren, Xin Xia, Yanzuo Lu, Jiacheng Zhang, Jie Wu, Pan Xie, Xing Wang, Xuefeng Xiao

    Recently, a series of diffusion-aware distillation algorithms have emerged to alleviate the computational overhead associated with the multi-step inference process of Diffusion Models (DMs). Curren...

  17. HQ-Edit: A High-Quality Dataset for Instruction-based Image Editing

    PaperApr 15, 2024arxiv.orgMude Hui, Siwei Yang, Bingchen Zhao, Yichun Shi, Heng Wang, Peng Wang, Yuyin Zhou, Cihang Xie

    This study introduces HQ-Edit, a high-quality instruction-based image editing dataset with around 200,000 edits. Unlike prior approaches relying on attribute guidance or human feedback on building ...

← PreviousPage 5Next →

Top Entities In This Topic

Related Topics

FAQ

What does this World Model page rank?

It ranks public content for World Model using recent discussion, review, and engagement signals so you can triage faster. This guidance is specific to World Model topic page on Attendemia and is written so it still makes sense without reading other sections on the page.

How should I use weekly vs monthly vs all-time?

Use weekly for fast-moving updates, monthly for stable trend confirmation, and all-time for evergreen references. This guidance is specific to World Model topic page on Attendemia and is written so it still makes sense without reading other sections on the page.

How can I discover organizations active in World Model?

Use the linked entities section to jump to labs, companies, and experts connected to this topic and explore their timelines. This guidance is specific to World Model topic page on Attendemia and is written so it still makes sense without reading other sections on the page.

Can I follow this topic for updates?

Yes. Use the follow button on this page to subscribe and track new high-signal activity. This guidance is specific to World Model topic page on Attendemia and is written so it still makes sense without reading other sections on the page.