High-Fidelity and High-Efficiency Talking Portrait Synthesis With Detail-Aware Neural Radiance Fields

doi:10.1109/TVCG.2024.3488960


Status	Published
Title	High-Fidelity and High-Efficiency Talking Portrait Synthesis With Detail-Aware Neural Radiance Fields
Author	Wang, Muyu 1; Zhao, Sanyuan 2; Dong, Xingping 1; Shen, Jianbing 3
Date	2024-11
Source Publication	IEEE Transactions on Visualization and Computer Graphics
ISSN	1077-2626
Abstract	In this paper, we propose HH-NeRF, a novel rendering framework based on neural radiance fields (NeRF) that generates high-resolution, audio-driven talking portrait videos with high fidelity and fast rendering. Specifically, our framework includes a detail-aware NeRF module and an efficient conditional super-resolution module. First, the detail-aware NeRF efficiently generates a high-fidelity, low-resolution talking head by using encoded volume density estimation and audio-eye-aware color calculation. This module captures natural eye blinks and high-frequency details while keeping rendering time similar to that of previous fast methods. Second, we present an efficient conditional super-resolution module for the dynamic scene that directly generates the high-resolution portrait from our low-resolution head. By incorporating prior information, such as the depth map and audio features, the proposed module can adopt a lightweight network to efficiently generate realistic and sharp high-resolution videos. Extensive experiments demonstrate that our method generates sharper and higher-fidelity talking portraits on high-resolution (900 × 900) videos than state-of-the-art methods.
Keyword	Neural Radiance Fields; Talking Portrait; Audio; Super-resolution
DOI	10.1109/TVCG.2024.3488960
Language	English
Scopus ID	2-s2.0-85208531865
Document Type	Journal article
Collection	Faculty of Science and Technology; The State Key Laboratory of Internet of Things for Smart City (University of Macau); Department of Computer and Information Science
Affiliation	1. Wuhan University, School of Computer Science, National Engineering Research Center for Multimedia Software, Institute of Artificial Intelligence, Hubei Key Laboratory of Multimedia and Network Communication Engineering, Wuhan, 430072, China; 2. Beijing Institute of Technology, School of Computer Science, Beijing, China; 3. University of Macau, State Key Laboratory of Internet of Things for Smart City, Department of Computer and Information Science, Macao
Recommended Citation GB/T 7714	Wang, Muyu, Zhao, Sanyuan, Dong, Xingping, et al. High-Fidelity and High-Efficiency Talking Portrait Synthesis With Detail-Aware Neural Radiance Fields[J]. IEEE Transactions on Visualization and Computer Graphics, 2024.
APA	Wang, M., Zhao, S., Dong, X., & Shen, J. (2024). High-Fidelity and High-Efficiency Talking Portrait Synthesis With Detail-Aware Neural Radiance Fields. IEEE Transactions on Visualization and Computer Graphics.
MLA	Wang, Muyu, et al. "High-Fidelity and High-Efficiency Talking Portrait Synthesis With Detail-Aware Neural Radiance Fields." IEEE Transactions on Visualization and Computer Graphics (2024).
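
The abstract mentions volume density estimation, color calculation, and a depth map used as a prior for super-resolution. As a rough illustration only, the sketch below shows the standard NeRF quadrature that turns per-sample densities and colors along one camera ray into a pixel color and an expected depth. This record does not give HH-NeRF's exact formulation, so the function name `composite_ray`, its arguments, and the handling of the last sample spacing are assumptions made for this example rather than the authors' implementation.

```python
import math


def composite_ray(sigmas, colors, ts):
    """Standard NeRF volume-rendering quadrature along a single ray.

    sigmas: density at each sample point (hypothetical inputs)
    colors: (r, g, b) tuple at each sample point
    ts:     increasing sample distances along the ray
    Returns (rgb, depth, weights).
    """
    rgb = [0.0, 0.0, 0.0]
    depth = 0.0
    weights = []
    transmittance = 1.0  # fraction of light not yet absorbed

    for i, (sigma, color, t) in enumerate(zip(sigmas, colors, ts)):
        # Spacing to the next sample; the last sample reuses the
        # previous spacing (an assumption made for this sketch).
        if i + 1 < len(ts):
            delta = ts[i + 1] - t
        elif i > 0:
            delta = t - ts[i - 1]
        else:
            delta = 1.0

        alpha = 1.0 - math.exp(-sigma * delta)  # opacity of this segment
        w = transmittance * alpha               # contribution weight
        weights.append(w)

        rgb = [acc + w * c for acc, c in zip(rgb, color)]
        depth += w * t                          # expected ray depth
        transmittance *= 1.0 - alpha

    return rgb, depth, weights


if __name__ == "__main__":
    rgb, depth, weights = composite_ray(
        sigmas=[0.1, 2.0, 5.0],
        colors=[(0.9, 0.7, 0.6), (0.8, 0.6, 0.5), (0.2, 0.2, 0.2)],
        ts=[1.0, 1.5, 2.0],
    )
    print(rgb, depth, sum(weights))
```

Repeating this per pixel yields both a low-resolution image and a depth map from the same weights, which is one plausible way a depth prior could be made available to a later super-resolution stage.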
