Topic: World Model


Short answer

This page shows the most relevant public items for World Model, ranked by trend activity and review signal. Use weekly for fast changes, monthly for more stable patterns, and all-time for evergreen picks.



  1. Magic-Boost: Boost 3D Generation with Multi-View Conditioned Diffusion

Paper · Jan 9, 2025 · arxiv.org · Fan Yang, Jianfeng Zhang, Yichun Shi, Bowen Chen, Chenxu Zhang, Huichao Zhang, Xiaofeng Yang, Xiu Li, Jiashi Feng, Guosheng Lin

    Benefiting from the rapid development of 2D diffusion models, 3D content generation has witnessed significant progress. One promising solution is to finetune the pre-trained 2D diffusion models to ...

  2. MegaScale: Scaling Large Language Model Training to More Than 10,000 GPUs

Paper · Feb 23, 2024 · arxiv.org · Ziheng Jiang, Haibin Lin, Yinmin Zhong, Qi Huang, Yangrui Chen, Zhi Zhang, Yanghua Peng, Xiang Li, Cong Xie, Shibiao Nong, Yulu Jia, Sun He, Hongmin Chen, Zhihao Bai, Qi Hou, Shipeng Yan, Ding Zhou, Yiyao Sheng, Zhuo Jiang, Haohan Xu, Haoran Wei, Zhang Zhang, Pengfei Nie, Leqi Zou, Sida Zhao, Liang Xiang, Zherui Liu, Zhe Li, Xiaoying Jia, Jianxi Ye, Xin Jin, Xin Liu

    We present the design, implementation and engineering experience in building and deploying MegaScale, a production system for training large language models (LLMs) at the scale of more than 10,000 ...

  3. SDXL-Lightning: Progressive Adversarial Diffusion Distillation

Paper · Mar 2, 2024 · arxiv.org · Shanchuan Lin, Anran Wang, Xiao Yang

    We propose a diffusion distillation method that achieves new state-of-the-art in one-step/few-step 1024px text-to-image generation based on SDXL. Our method combines progressive and adversarial dis...

  4. Magic-Me: Identity-Specific Video Customized Diffusion

Paper · Mar 20, 2024 · arxiv.org · Ze Ma, Daquan Zhou, Chun-Hsiao Yeh, Xue-She Wang, Xiuyu Li, Huanrui Yang, Zhen Dong, Kurt Keutzer, Jiashi Feng

    Creating content with specified identities (ID) has attracted significant interest in the field of generative models. In the field of text-to-image generation (T2I), subject-driven creation has ach...

  5. Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data

Paper · Apr 7, 2024 · arxiv.org · Lihe Yang, Bingyi Kang, Zilong Huang, Xiaogang Xu, Jiashi Feng, Hengshuang Zhao

    This work presents Depth Anything, a highly practical solution for robust monocular depth estimation. Without pursuing novel technical modules, we aim to build a simple yet powerful foundation mode...

  6. MagicAnimate: Temporally Consistent Human Image Animation using Diffusion Model

Paper · Nov 27, 2023 · arxiv.org · Zhongcong Xu, Jianfeng Zhang, Jun Hao Liew, Hanshu Yan, Jia-Wei Liu, Chenxu Zhang, Jiashi Feng, Mike Zheng Shou

    This paper studies the human image animation task, which aims to generate a video of a certain reference identity following a particular motion sequence. Existing animation works typically employ t...

  7. Make Pixels Dance: High-Dynamic Video Generation

Paper · Nov 18, 2023 · arxiv.org · Yan Zeng, Guoqiang Wei, Jiani Zheng, Jiaxin Zou, Yang Wei, Yuchen Zhang, Hang Li

    Creating high-dynamic videos such as motion-rich actions and sophisticated visual effects poses a significant challenge in the field of artificial intelligence. Unfortunately, current state-of-the-...

  8. SALMONN: Towards Generic Hearing Abilities for Large Language Models

Paper · Apr 8, 2024 · arxiv.org · Changli Tang, Wenyi Yu, Guangzhi Sun, Xianzhao Chen, Tian Tan, Wei Li, Lu Lu, Zejun Ma, Chao Zhang

    Hearing is arguably an essential ability of artificial intelligence (AI) agents in the physical world, which refers to the perception and understanding of general auditory information consisting of...

  9. MagicEdit: High-Fidelity and Temporally Coherent Video Editing

Paper · Aug 28, 2023 · arxiv.org · Jun Hao Liew, Hanshu Yan, Jianfeng Zhang, Zhongcong Xu, Jiashi Feng

    In this report, we present MagicEdit, a surprisingly simple yet effective solution to the text-guided video editing task. We found that high-fidelity and temporally coherent video-to-video translat...

  10. AudioLDM 2: Learning Holistic Audio Generation with Self-supervised Pretraining

Paper · May 11, 2024 · arxiv.org · Haohe Liu, Yi Yuan, Xubo Liu, Xinhao Mei, Qiuqiang Kong, Qiao Tian, Yuping Wang, Wenwu Wang, Yuxuan Wang, Mark D. Plumbley

    Although audio generation shares commonalities across different types of audio, such as speech, music, and sound effects, designing models for each type requires careful consideration of specific o...

  11. PolyVoice: Language Models for Speech to Speech Translation

Paper · Jun 13, 2023 · arxiv.org · Qianqian Dong, Zhiying Huang, Qiao Tian, Chen Xu, Tom Ko, Yunlong Zhao, Siyuan Feng, Tang Li, Kexin Wang, Xuxin Cheng, Fengpeng Yue, Ye Bai, Xi Chen, Lu Lu, Zejun Ma, Yuping Wang, Mingxuan Wang, Yuxuan Wang

    We propose PolyVoice, a language model-based framework for speech-to-speech translation (S2ST) system. Our framework consists of two language models: a translation language model and a speech synth...

  12. Efficient Neural Music Generation

Paper · May 25, 2023 · arxiv.org · Max W. Y. Lam, Qiao Tian, Tang Li, Zongyu Yin, Siyuan Feng, Ming Tu, Yuliang Ji, Rui Xia, Mingbo Ma, Xuchen Song, Jitong Chen, Yuping Wang, Yuxuan Wang

    Recent progress in music generation has been remarkably advanced by the state-of-the-art MusicLM, which comprises a hierarchy of three LMs, respectively, for semantic, coarse acoustic, and fine aco...

  13. DINOISER: Diffused Conditional Sequence Learning by Manipulating Noises

Paper · May 1, 2024 · arxiv.org · Jiasheng Ye, Zaixiang Zheng, Yu Bao, Lihua Qian, Mingxuan Wang

    While diffusion models have achieved great success in generating continuous signals such as images and audio, it remains elusive for diffusion models in learning discrete sequence data like natural...

  14. ByteTransformer: A High-Performance Transformer Boosted for Variable-Length Inputs

Paper · Feb 20, 2023 · arxiv.org · Yujia Zhai, Chengquan Jiang, Leyuan Wang, Xiaoying Jia, Shang Zhang, Zizhong Chen, Xin Liu, Yibo Zhu

    Transformers have become keystone models in natural language processing over the past decade. They have achieved great popularity in deep learning applications, but the increasing sizes of the para...

  15. Cross-modal Contrastive Learning for Speech Translation

Paper · May 5, 2022 · arxiv.org · Rong Ye, Mingxuan Wang, Lei Li

    How can we learn unified representations for spoken utterances and their written text? Learning similar representations for semantically similar speech and text is important for speech translation....




FAQ

What does this World Model page rank?

It ranks public content for World Model using recent discussion, review, and engagement signals so you can triage faster. This guidance is specific to the World Model topic page on Attendemia and is written to stand on its own without the other sections of the page.

How should I use weekly vs monthly vs all-time?

Use weekly for fast-moving updates, monthly for stable trend confirmation, and all-time for evergreen references.

How can I discover organizations active in World Model?

Use the linked entities section to jump to labs, companies, and experts connected to this topic and explore their timelines.

Can I follow this topic for updates?

Yes. Use the follow button on this page to subscribe and track new high-signal activity.