Topic: Multimodal Model

Short answer

This page shows the most relevant public items for Multimodal Model, ranked by trend activity and review signal. Use weekly for fast changes, monthly for more stable patterns, and all-time for evergreen picks.

MagicEdit: High-Fidelity and Temporally Coherent Video Editing
Paper • Aug 28, 2023 • arxiv.org • Jun Hao Liew, Hanshu Yan, Jianfeng Zhang, Zhongcong Xu, Jiashi Feng
In this report, we present MagicEdit, a surprisingly simple yet effective solution to the text-guided video editing task. We found that high-fidelity and temporally coherent video-to-video translat...
AudioLDM 2: Learning Holistic Audio Generation with Self-supervised Pretraining
Paper • May 11, 2024 • arxiv.org • Haohe Liu, Yi Yuan, Xubo Liu, Xinhao Mei, Qiuqiang Kong, Qiao Tian, Yuping Wang, Wenwu Wang, Yuxuan Wang, Mark D. Plumbley
Although audio generation shares commonalities across different types of audio, such as speech, music, and sound effects, designing models for each type requires careful consideration of specific o...
PolyVoice: Language Models for Speech to Speech Translation
Paper • Jun 13, 2023 • arxiv.org • Qianqian Dong, Zhiying Huang, Qiao Tian, Chen Xu, Tom Ko, Yunlong Zhao, Siyuan Feng, Tang Li, Kexin Wang, Xuxin Cheng, Fengpeng Yue, Ye Bai, Xi Chen, Lu Lu, Zejun Ma, Yuping Wang, Mingxuan Wang, Yuxuan Wang
We propose PolyVoice, a language model-based framework for speech-to-speech translation (S2ST) system. Our framework consists of two language models: a translation language model and a speech synth...
Efficient Neural Music Generation
Paper • May 25, 2023 • arxiv.org • Max W. Y. Lam, Qiao Tian, Tang Li, Zongyu Yin, Siyuan Feng, Ming Tu, Yuliang Ji, Rui Xia, Mingbo Ma, Xuchen Song, Jitong Chen, Yuping Wang, Yuxuan Wang
Recent progress in music generation has been remarkably advanced by the state-of-the-art MusicLM, which comprises a hierarchy of three LMs, respectively, for semantic, coarse acoustic, and fine aco...
DINOISER: Diffused Conditional Sequence Learning by Manipulating Noises
Paper • May 1, 2024 • arxiv.org • Jiasheng Ye, Zaixiang Zheng, Yu Bao, Lihua Qian, Mingxuan Wang
While diffusion models have achieved great success in generating continuous signals such as images and audio, it remains elusive for diffusion models in learning discrete sequence data like natural...
Diffusion Glancing Transformer for Parallel Sequence to Sequence Learning
Paper • Nov 29, 2023 • arxiv.org • Lihua Qian, Mingxuan Wang, Yang Liu, Hao Zhou
Previously, non-autoregressive models were widely perceived as being superior in generation efficiency but inferior in generation quality due to the difficulties of modeling multiple target modalit...
ByteTransformer: A High-Performance Transformer Boosted for Variable-Length Inputs
Paper • Feb 20, 2023 • arxiv.org • Yujia Zhai, Chengquan Jiang, Leyuan Wang, Xiaoying Jia, Shang Zhang, Zizhong Chen, Xin Liu, Yibo Zhu
Transformers have become keystone models in natural language processing over the past decade. They have achieved great popularity in deep learning applications, but the increasing sizes of the para...
Cross-modal Contrastive Learning for Speech Translation
Paper • May 5, 2022 • arxiv.org • Rong Ye, Mingxuan Wang, Lei Li
How can we learn unified representations for spoken utterances and their written text? Learning similar representations for semantically similar speech and text is important for speech translation....
Contrastive Learning for Many-to-many Multilingual Neural Machine Translation
Paper • Jul 22, 2021 • arxiv.org • Xiao Pan, Mingxuan Wang, Liwei Wu, Lei Li
Existing multilingual machine translation approaches mainly focus on English-centric directions, while the non-English directions still lag behind. In this work, we aim to build a many-to-many tran...

FAQ

What does this Multimodal Model page rank?

It ranks public content for Multimodal Model using recent discussion, review, and engagement signals so you can triage faster. This guidance is specific to the Multimodal Model topic page on Attendemia.

How should I use weekly vs monthly vs all-time?

Use weekly for fast-moving updates, monthly for stable trend confirmation, and all-time for evergreen references.

How can I discover organizations active in Multimodal Model?

Use the linked entities section to jump to labs, companies, and experts connected to this topic and explore their timelines.

Can I follow this topic for updates?

Yes. Use the follow button on this page to subscribe and track new high-signal activity.
