Generative AI for Audio-Visual Content Creation

1st Workshop on Gen4AVC, ICCV 2025
Date: October 19, 2025 (Sunday)  ·  Location: TBD  ·  Honolulu, Hawaii, USA

Workshop Overview

Seamless integration of audio and visual elements is crucial for creating immersive and engaging content. Audio-visual generation, which synthesizes one modality from the other or both jointly, has become a key research area. It has significant potential in applications such as virtual reality, gaming, film production, and interactive media, where advanced generative models can enhance the quality and realism of multimedia content.

This workshop highlights the growing importance of audio-visual generation in modern content creation, bringing together researchers and practitioners from academia and industry to explore the latest advances, challenges, and emerging opportunities in this dynamic field.

Topics Include:

  • Vision-to-audio synthesis
  • Audio-to-vision synthesis
  • Joint generation of audio and video

Schedule

Morning Session, October 19th, 2025

Note: All times are in Hawaii Standard Time (HST).

8:55 - 9:00

Opening Remarks

Welcome and introduction to the workshop

9:00 - 9:30

Invited Talk 1: Danilo Comminiello

Sapienza University of Rome

"Weaving Time, Space & Semantics: Multimodal Alignment for Audio-Visual Generation"

9:30 - 10:00

Invited Talk 2: Andrew Owens

Cornell Tech

"Generating Sounds from Physical Interactions in 3D Scenes"

10:00 - 10:15

Coffee Break & Poster Setup

Authors set up posters

10:15 - 11:00

Poster Session

Poster presentations - see poster list below

11:00 - 11:30

Invited Talk 3: Gunhee Kim

Seoul National University

"ViSAGe: Towards Scene-Aware Video-to-Spatial Audio Generation"

11:30 - 12:00

Invited Talk 4: Kristen Grauman

University of Texas at Austin

"Discovering and Generating Action Sounds from Video"

Poster Presentations

Regular Posters

  1. "LD-LAudio-V1: Video-to-Long-Form-Audio Generation Extension with Dual Lightweight Adapters"

    Authors: Haomin Zhang, Kristin Qi, Shuxin Yang, Zihao Chen, Chaofan Ding, and Xinhan Di

  2. "Do State-of-the-art Audio-visual VLMs Understand Audio-video Temporal Misalignment"

    Authors: Motonobu Kimura, Ren Ohkubo, Yue Qiu, and Yutaka Satoh

  3. "Seeing What You Say: Expressive Image Generation from Speech"

    Authors: Jiyoung Lee, Song Park, Sanghyuk Chun, and Soo-Whan Chung

  4. "KeyVID: Keyframe-Aware Video Diffusion for Audio-Synchronized Visual Animation"

    Authors: Xingrui Wang, Jiang Liu, Ze Wang, Xiaodong Yu, Jialian Wu, Ximeng Sun, Yusheng Su, Alan Yuille, Zicheng Liu, and Emad Barsoum

  5. "Not Like Transformers: Drop the Beat Representation for Dance Generation with Mamba-Based Diffusion Model"

    Authors: Sangjune Park, Inhyeok Choi, Donghyeon Soon, Youngwoo Jeon, and Kyungdon Joo

  6. "High-Fidelity Talking Portrait Synthesis with Personalized 3D Generative Prior"

    Authors: Jaehoon Ko, Kyusun Cho, JoungBin Lee, Heeji Yoon, and Seungryong Kim

  7. "Dance Video Generation using Music-to-Pose Encoder Trained on Synthetic Dataset Generation Pipeline leveraging Latent Diffusion Framework"

    Author: Nokap Tony Park

  8. "Differentiable Room Acoustic Rendering with Multi-View Vision Priors"

    Authors: Derong Jin and Ruohan Gao

  9. "SpecMaskFoley: Efficient Yet Effective Synchronized Video-to-audio Synthesis via Pretraining and ControlNet"

    Authors: Zhi Zhong, Akira Takahashi, Shuyang Cui, Keisuke Toyama, Shusuke Takahashi, and Yuki Mitsufuji

  10. "JWB-DH-V1: Benchmark for Joint Whole-Body Talking Avatar and Speech Generation Version I"

    Authors: Xinhan Di and Kristin Qi

Invited Posters

  1. "TITAN-Guide: Taming Inference-Time Alignment for Guided Text-to-Video Diffusion Models"

    Authors: Christian Simon, Masato Ishii, Akio Hayakawa, Zhi Zhong, Shusuke Takahashi, Takashi Shibuya, and Yuki Mitsufuji

    ICCV 2025

  2. "TARO: Timestep-Adaptive Representation Alignment with Onset-Aware Conditioning for Synchronized Video-to-Audio Synthesis"

    Authors: Tri Ton, Ji Woo Hong, and Chang D. Yoo

    ICCV 2025

  3. "How Would It Sound? Material-Controlled Multimodal Acoustic Profile Generation for Indoor Scenes"

    Authors: Mahnoor Fatima Saad and Ziad Al-Halah

    ICCV 2025

Important Dates

Paper Submission (Regular Track)

July 9, 2025 23:59 AoE (Anywhere on Earth)

Paper Submission (Invited Track)

Closed

Decision Notification

August 8, 2025

Camera Ready

August 22, 2025 23:59 AoE (Anywhere on Earth)

Workshop Date

October 19th, 2025 (morning session)

Call for Papers (Regular Track)

Overview

We welcome submissions on (but not limited to) the following topics:

  • Vision-to-audio synthesis
  • Audio-to-vision synthesis
  • Joint generation of audio and video
  • Cross-modal representation learning
  • Evaluation of audio-visual alignment
  • Datasets for audio-visual generation
  • Applications of audio-visual generation models

Submission Guidelines

Papers are limited to four pages in the ICCV style, excluding references and appendices.

Submissions are closed.

Important Dates

  • Paper Submission Deadline: July 9, 2025 23:59 AoE (Anywhere on Earth)
  • Decision Notification: August 8, 2025
  • Camera Ready Deadline: August 22, 2025 23:59 AoE (Anywhere on Earth)
  • Workshop Date: October 19th, 2025 (morning session)

Review Process

All submissions will undergo a double-blind review process. Please ensure that your submission does not contain any identifying information about the authors.


Publication

The workshop will be non-archival. Authors of accepted papers retain the full copyright of their work and are free to submit extended versions to conferences or journals.

Call for Papers (Invited Track)

We also welcome papers that have been accepted to ICCV 2025 or were presented at recent top-tier conferences (e.g., CVPR, ECCV, NeurIPS, ICML, ICLR, AAAI). Invited papers can be submitted as-is and will not undergo a formal peer review process. Accepted submissions will be presented as invited posters at the venue. Please apply using the following form: https://forms.gle/6ZVbn7yBPC1gifKo9.

Submissions are closed.

Workshop Location

ICCV 2025

Honolulu, Hawaii, USA

Exact venue details will be announced closer to the event.