Given a 3D Gaussian Splatting model $G_{1:N}$ pretrained on multi-view images $I_{1:K}$ with camera poses $\xi_{1:K}$, our goal is to perform 3D inpainting based on the object masks $M_{1:K}$ (e.g., provided by SAM). With the rendered depth maps $D_{1:K}$, the stage of Inferring Depth-Guided Inpainting Mask refines the inpainting masks to preserve backgrounds that are visible across camera views. The stage of Inpainting-guided 3DGS Refinement then utilizes such masks to jointly update the new Gaussians $G'_{1:N'}$ for both novel-view rendering and inpainting purposes.
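The two-stage pipeline described above can be outlined as follows; this is a minimal structural sketch, where `View`, `inpaint_3dgs`, and the two stage callables are hypothetical names, not the authors' actual implementation:

```python
from dataclasses import dataclass
import numpy as np

# Hypothetical container for one training view: image I_k, object mask M_k,
# camera pose xi_k, and the depth map D_k rendered from the pretrained 3DGS.
@dataclass
class View:
    image: np.ndarray   # (H, W, 3)
    mask: np.ndarray    # (H, W) bool, True inside the object to remove
    pose: np.ndarray    # (4, 4) camera-to-world
    depth: np.ndarray   # (H, W)

def inpaint_3dgs(views, refine_masks, refine_gaussians):
    """Two-stage sketch: (1) depth-guided inpainting-mask refinement
    produces M'_{1:K}; (2) inpainting-guided 3DGS refinement uses those
    masks to update the Gaussians, yielding G'_{1:N'}."""
    refined_masks = refine_masks(views)            # stage 1: M'_{1:K}
    return refine_gaussians(views, refined_masks)  # stage 2: G'_{1:N'}
```

The stage functions are injected by the caller here purely to make the two-stage control flow explicit.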
Taking $\{I_1, M_1\}$ at view $\xi_1$ as an example reference view, we first extract the original background region $I^B_1$. We then project the background region $I^B_2$ from $\xi_2$ to $\xi_1$ using the rendered depth, updating the accumulated background ${I'}^B_1$ and the associated inpainting mask $M'_1$. By repeating this process across all camera views, the final inpainting mask $M'_1$ contains only the regions that are \textit{not} visible from any training camera view.
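This cross-view refinement can be illustrated with a minimal NumPy sketch for a single source view: background pixels of the source view are lifted to 3D with its depth map, reprojected into the reference camera, and the reference inpainting mask is cleared wherever background content lands. The function names, pinhole-camera conventions, and the nearest-pixel splatting are our assumptions for illustration, not the paper's exact procedure (which would also repeat this over all views and check depth consistency):

```python
import numpy as np

def backproject(depth, K, cam_to_world):
    """Lift a depth map to world-space points of shape (H, W, 3),
    assuming a pinhole camera with intrinsics K."""
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).astype(float)  # (H, W, 3)
    rays = pix @ np.linalg.inv(K).T          # camera-space ray directions
    pts_cam = rays * depth[..., None]        # scale rays by depth
    pts_h = np.concatenate([pts_cam, np.ones((H, W, 1))], axis=-1)
    return (pts_h @ cam_to_world.T)[..., :3]

def refine_mask(mask_ref, depth_src, mask_src, K, src_to_world, world_to_ref):
    """Clear reference-view mask pixels whose background is visible in the
    source view: project source background points into the reference camera
    and mark where they land as 'keep' (not to be inpainted). All views are
    assumed to share resolution and intrinsics K; masks are True = inpaint."""
    H, W = mask_ref.shape
    pts = backproject(depth_src, K, src_to_world)           # world points
    pts_h = np.concatenate([pts, np.ones((H, W, 1))], axis=-1)
    cam = (pts_h @ world_to_ref.T)[..., :3]                 # ref camera space
    z = cam[..., 2]
    proj = cam @ K.T
    uv = proj[..., :2] / np.clip(proj[..., 2:3], 1e-8, None)
    # only background pixels of the source view reveal visible content
    ok = (~mask_src) & (z > 0)
    u = np.round(uv[..., 0]).astype(int)
    v = np.round(uv[..., 1]).astype(int)
    inside = ok & (u >= 0) & (u < W) & (v >= 0) & (v < H)
    refined = mask_ref.copy()
    refined[v[inside], u[inside]] = False
    return refined
```

Looping `refine_mask` over every other view accumulates the visible background, so the surviving True pixels correspond to regions occluded in all training views, i.e., the final $M'_1$.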
Although the original object mask $M_1$ could be used to inpaint the occluded regions, it may also cover background regions that are visible from other views. This leads to an inpainted background that conflicts with the existing one. In contrast, our refined mask $M'_1$ is more accurate and preserves visible background content.
We verify the effectiveness of our Inferring Depth-Guided Inpainting Mask and Inpainting-guided 3DGS Refinement.
@inproceedings{huang20253d,
title={{3D} Gaussian Inpainting with Depth-Guided Cross-View Consistency},
author={Huang, Sheng-Yu and Chou, Zi-Ting and Wang, Yu-Chiang Frank},
booktitle={Proceedings of the Computer Vision and Pattern Recognition Conference},
pages={26704--26713},
year={2025}
}