Immersive Audio-Visual 3D Scene Reproduction Using a Single 360 Camera

Basic Information

  • Grant number:
    EP/V03538X/1
  • Principal investigator:
  • Amount:
    $340,800
  • Host institution:
  • Host institution country:
    United Kingdom
  • Project type:
    Research Grant
  • Fiscal year:
    2021
  • Funding country:
    United Kingdom
  • Duration:
    2021 to (no data)
  • Project status:
    Completed

Project Summary

The COVID-19 pandemic has changed our lifestyles and created high demand for remote communication and remote experiences. Many organisations have had to set up remote working systems based on video conferencing platforms. However, current video conferencing systems do not meet basic requirements for remote collaboration because they lack eye contact, gaze awareness and spatial audio synchronisation. Reproducing a real space as an audio-visual 3D model allows users to remotely experience real-time interaction in real environments, so it can be widely used in applications such as healthcare, teleconferencing, education and entertainment. The goal of this project is to develop a simple and practical solution for estimating the geometric structure and acoustic properties of general scenes, allowing spatial audio to be adapted to the environment and listener location so that an immersive rendering of the scene improves the user experience.
Existing 3D scene reproduction systems have two problems. (i) Audio and vision systems have been researched separately. Computer vision research has mainly focused on improving the visual side of scene reconstruction, yet in an immersive display such as a VR system the experience is not perceived as "realistic" if the sound does not match the visual cues. Audio research, on the other hand, has relied only on audio sensors to measure acoustic properties, without considering the complementary information provided by visual sensors. (ii) Current capture and recording systems for 3D scene reproduction require a setup that is too invasive, and a process too specialised, for ordinary users to deploy in their own private spaces: a LiDAR sensor is expensive and requires a long scanning time, and perspective images require a large number of photos to cover the whole scene.
The objective of this research is to develop an end-to-end audio-visual 3D scene reproduction pipeline using a single shot from a consumer 360 (panoramic) camera. To make the system easily accessible to ordinary users in their own private spaces, the back-end should include an automatic solution based on computer vision and artificial intelligence algorithms. A deep neural network (DNN) jointly trained for semantic scene reconstruction and acoustic property prediction of the captured environment will be developed; this includes inferring regions that are not visible to the camera. Impulse responses (IRs), which characterise the acoustic attributes of an environment, make it possible to reproduce the acoustics of the space with any sound source. They also allow the original (dry) sound to be extracted from a recording by removing the acoustic effects, so that the source can be re-rendered in new environments with different acoustics. A simple and efficient method for estimating acoustic IRs from a single captured 360 photo will be investigated.
This semantic scene data will be used to provide an immersive audio-visual experience to users. Two display scenarios will be considered: a personalised display system, such as a VR headset with headphones, and a communal display system (e.g., a TV or projector) with loudspeakers. Real-time 3D human pose tracking using a single 360 camera will be developed to render the 3D audio-visual scene accurately at each user's location. Delivering binaural sound to listeners through loudspeakers is a challenging task, so audio beamforming techniques aligned with human pose tracking for multiple loudspeakers will be investigated in collaboration with the project partners in audio processing.
The resulting system would have a significant impact on innovation in VR and multimedia systems, and open up new and interesting applications for their deployment. This award should provide the foundation for the PI to establish and lead a group with a unique research direction that is aligned with national priorities and addresses a major long-term research challenge.
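The abstract's point that an impulse response characterises a space's acoustics can be made concrete: rendering a dry source in a room amounts to convolving it with the room's IR. Below is a minimal NumPy/SciPy sketch of that idea with synthetic data; the sample rate, test tone and toy decaying-noise IR are invented for illustration and are not part of the project's pipeline.

```python
# Minimal sketch: re-rendering a dry (anechoic) source in a room by
# convolving it with the room's impulse response (IR).
# All data below is synthetic; a real IR would be measured or, as this
# project proposes, estimated from a single 360 photo.
import numpy as np
from scipy.signal import fftconvolve

def render_in_room(dry: np.ndarray, ir: np.ndarray) -> np.ndarray:
    """Apply a room's acoustics to a dry signal via convolution with its IR."""
    wet = fftconvolve(dry, ir, mode="full")
    peak = np.max(np.abs(wet))
    return wet / peak if peak > 0 else wet  # normalise to avoid clipping

fs = 48_000                                  # sample rate in Hz
t = np.arange(fs) / fs                       # 1 second of samples
dry = np.sin(2 * np.pi * 440.0 * t)          # dry 440 Hz test tone
# Toy IR: exponentially decaying noise, roughly mimicking room reverberation.
ir = np.random.randn(fs // 2) * np.exp(-6.0 * np.arange(fs // 2) / fs)
wet = render_in_room(dry, ir)                # the tone as heard in the "room"
```

Recovering the dry source from a recording (the de-reverberation direction mentioned in the abstract) is the inverse problem and is considerably harder than this forward convolution.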

Project Outcomes

Journal articles (9)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Computer Vision, Imaging and Computer Graphics Theory and Applications - 17th International Joint Conference, VISIGRAPP 2022, Virtual Event, February 6-8, 2022, Revised Selected Papers
  • DOI:
    10.1007/978-3-031-45725-8_4
  • Publication date:
    2023
  • Journal:
  • Impact factor:
    0
  • Authors:
    Heng Y
  • Corresponding author:
    Heng Y
Material Recognition for Immersive Interactions in Virtual/Augmented Reality
  • DOI:
    10.1109/vrw58643.2023.00131
  • Publication date:
    2023
  • Journal:
  • Impact factor:
    0
  • Authors:
    Heng Y
  • Corresponding author:
    Heng Y
CAM-SegNet: A Context-Aware Dense Material Segmentation Network for Sparsely Labelled Datasets
  • DOI:
    10.5220/0010853200003124
  • Publication date:
    2022
  • Journal:
  • Impact factor:
    0
  • Authors:
    Yuwen Heng;Yihong Wu;S. Dasmahapatra;Hansung Kim
  • Corresponding author:
    Yuwen Heng;Yihong Wu;S. Dasmahapatra;Hansung Kim
DBAT: Dynamic Backward Attention Transformer for Material Segmentation with Cross-Resolution Patches
  • DOI:
    10.48550/arxiv.2305.03919
  • Publication date:
    2023-05
  • Journal:
  • Impact factor:
    0
  • Authors:
    Yuwen Heng;S. Dasmahapatra;Hansung Kim
  • Corresponding author:
    Yuwen Heng;S. Dasmahapatra;Hansung Kim
Spatial Audio Reconstruction for VR Applications Using a Combined Method Based on SIRR and RSAO Approaches
  • DOI:
    10.1109/mmsp59012.2023.10337683
  • Publication date:
    2023
  • Journal:
  • Impact factor:
    0
  • Authors:
    Alinaghi A
  • Corresponding author:
    Alinaghi A

Similar Overseas Grants

EduSay™ - developing a digital, audio-visual and kinesthetic English pronunciation training programme for international students and professionals; upskilling communications for education, employability, UK productivity and integration
  • Grant number:
    10063001
  • Fiscal year:
    2023
  • Funding amount:
    $340,800
  • Project type:
    Collaborative R&D
Empowering Archivists: Applying New Tools and Approaches for Better Representation of Women in Audio-Visual Collections
  • Grant number:
    AH/Y007328/1
  • Fiscal year:
    2023
  • Funding amount:
    $340,800
  • Project type:
    Research Grant
User-centric Audio-Visual Scene Understanding for Augmented Reality Smart Glasses in the Wild
  • Grant number:
    23K16912
  • Fiscal year:
    2023
  • Funding amount:
    $340,800
  • Project type:
    Grant-in-Aid for Early-Career Scientists
Audio-visual poetics for the environmental pollutions: A research on the documentaries and expressions of "Kogai" films
  • Grant number:
    22H00613
  • Fiscal year:
    2022
  • Funding amount:
    $340,800
  • Project type:
    Grant-in-Aid for Scientific Research (B)
Using eye tracking to examine audio-visual rhythm perception in infants
  • Grant number:
    572614-2022
  • Fiscal year:
    2022
  • Funding amount:
    $340,800
  • Project type:
    University Undergraduate Student Research Awards
Emotional McGurk: Developing a novel tool to examine audio-visual integration of affective signals
  • Grant number:
    574638-2022
  • Fiscal year:
    2022
  • Funding amount:
    $340,800
  • Project type:
    University Undergraduate Student Research Awards
Neural Rendering of object-based audio-visual scenes
  • Grant number:
    2644080
  • Fiscal year:
    2022
  • Funding amount:
    $340,800
  • Project type:
    Studentship
Ghosts amongst us: an audio-visual exploration of haunting in Palestine
  • Grant number:
    2733997
  • Fiscal year:
    2022
  • Funding amount:
    $340,800
  • Project type:
    Studentship
Audio-visual object-based dynamic scene representation from monocular video
  • Grant number:
    2701695
  • Fiscal year:
    2022
  • Funding amount:
    $340,800
  • Project type:
    Studentship
Towards in-vehicle situation awareness using visual and audio sensors
  • Grant number:
    LP210200931
  • Fiscal year:
    2022
  • Funding amount:
    $340,800
  • Project type:
    Linkage Projects