Given a short RGB video captured by a monocular camera, our method efficiently generates the corresponding editable 3D avatar, supporting both text- and image-guided 3D editing with locally adapted geometry and photorealistic renderings.
Personalized 3D avatar editing holds significant promise due to its user-friendliness and its applicability to domains such as AR/VR and virtual try-on. Previous studies have explored the feasibility of 3D editing, but they often struggle to produce visually pleasing results, possibly because representation learning becomes unstable when geometry and texture are optimized jointly in complex reconstructed scenes. In this paper, we aim to provide an accessible solution that lets ordinary users create editable 3D avatars with precise region localization, geometric adaptability, and photorealistic renderings. To tackle this challenge, we introduce a carefully designed framework that decouples the editing process into local spatial adaptation and realistic appearance learning, using a hybrid Tetrahedron-constrained Gaussian Splatting (TetGS) as the underlying representation. TetGS combines the controllable explicit structure of tetrahedral grids with the high-precision rendering capability of 3D Gaussian Splatting and is optimized progressively in three stages: (1) 3D avatar instantiation from a real-world monocular video to provide accurate priors for TetGS initialization; (2) localized spatial adaptation with explicitly partitioned tetrahedrons to guide the redistribution of Gaussian kernels; and (3) geometry-based appearance generation with a coarse-to-fine activation strategy. Both qualitative and quantitative experiments demonstrate the effectiveness and superiority of our approach in generating photorealistic, editable 3D avatars.
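As a rough illustration of what a tetrahedron-constrained Gaussian parameterization might look like, the sketch below anchors each Gaussian center inside a tetrahedral cell via softmax-normalized barycentric weights, so that kernels are redistributed whenever the tetrahedron vertices are deformed. The tensor names (tet_vertices, tet_indices, bary_logits) and the softmax barycentric parameterization are illustrative assumptions introduced here, not the paper's exact formulation.

```python
import torch
import torch.nn as nn

class TetConstrainedGaussians(nn.Module):
    """Toy sketch: Gaussian centers expressed in barycentric coordinates of
    their enclosing tetrahedra (an assumed parameterization, for illustration)."""

    def __init__(self, tet_vertices: torch.Tensor, tet_indices: torch.Tensor,
                 gaussians_per_tet: int = 4):
        super().__init__()
        # Tetrahedral grid: V x 3 vertex positions, T x 4 vertex indices.
        self.tet_vertices = nn.Parameter(tet_vertices)     # deformable during spatial adaptation
        self.register_buffer("tet_indices", tet_indices)   # fixed connectivity
        num_tets = tet_indices.shape[0]
        # Unconstrained logits -> softmax gives valid barycentric weights per Gaussian.
        self.bary_logits = nn.Parameter(torch.zeros(num_tets, gaussians_per_tet, 4))
        # Per-Gaussian appearance attributes (opacity, colour), activated in a later stage.
        self.opacity_logits = nn.Parameter(torch.zeros(num_tets, gaussians_per_tet, 1))
        self.rgb_logits = nn.Parameter(torch.zeros(num_tets, gaussians_per_tet, 3))

    def centers(self) -> torch.Tensor:
        """Gaussian centers as convex combinations of the enclosing tet's corners,
        so moving a tetrahedron drags its Gaussians with it."""
        corners = self.tet_vertices[self.tet_indices]          # T x 4 x 3
        weights = torch.softmax(self.bary_logits, dim=-1)      # T x G x 4, rows sum to 1
        return torch.einsum("tgf,tfc->tgc", weights, corners)  # T x G x 3

# Minimal usage with a single unit tetrahedron.
verts = torch.tensor([[0., 0., 0.], [1., 0., 0.], [0., 1., 0.], [0., 0., 1.]])
tets = torch.tensor([[0, 1, 2, 3]])
model = TetConstrainedGaussians(verts, tets)
print(model.centers().shape)  # torch.Size([1, 4, 3])
```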
An overview of our method, built on the proposed hybrid Tetrahedron-constrained Gaussian Splatting (TetGS). Our method first learns an accurate TetGS initialization from a monocular video, then updates the spatial allocation of the localized editing Gaussians together with the explicitly partitioned tetrahedrons under diffusion guidance. Given the learned distribution, we perform texture editing by optimizing the restricted Gaussians on few-shot inpainted images and activating their attributes under augmented guidance.
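Read as a procedure, the overview amounts to three optimization stages run back-to-back. The skeleton below is only a hedged sketch of that control flow; the stage functions (reconstruct_initial_tetgs, adapt_edit_region, generate_appearance) and their arguments are placeholders introduced for illustration, not the authors' actual API.

```python
from dataclasses import dataclass

# Hypothetical container standing in for the real TetGS data structure.
@dataclass
class TetGS:
    stage: str = "uninitialized"

def reconstruct_initial_tetgs(video_frames) -> TetGS:
    # Stage 1 (sketch): fit TetGS to the monocular capture to obtain
    # accurate priors for the unedited avatar.
    return TetGS(stage="initialized")

def adapt_edit_region(tetgs: TetGS, edit_prompt: str) -> TetGS:
    # Stage 2 (sketch): deform the explicitly partitioned tetrahedrons in the
    # edit region under diffusion guidance, redistributing the constrained Gaussians.
    tetgs.stage = "spatially_adapted"
    return tetgs

def generate_appearance(tetgs: TetGS, inpainted_views) -> TetGS:
    # Stage 3 (sketch): optimize the restricted Gaussians on a few inpainted
    # views, then activate their attributes coarse-to-fine.
    tetgs.stage = "textured"
    return tetgs

def edit_avatar(video_frames, edit_prompt: str, inpainted_views) -> TetGS:
    avatar = reconstruct_initial_tetgs(video_frames)
    avatar = adapt_edit_region(avatar, edit_prompt)
    return generate_appearance(avatar, inpainted_views)

print(edit_avatar([], "add a beret", []).stage)  # textured
```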
Multi-view renderings and the underlying geometries before and after editing, across various subjects and accessories.