Abstract
Depth estimation of the surrounding environment is a key component in autonomous driving applications such as object detection and SLAM. To predict scene depth accurately, depth completion combines information from a sparse LiDAR depth map and a camera image. In this paper, we propose a two-branch architecture based on a CNN and a Vision Transformer to fuse the different modalities from these heterogeneous sensors. The proposed model consists of two branch modules: a coarse branch and a fine branch. From the sparse depth map and the image, the coarse-branch module generates a coarse depth map focused on color information. The fine-branch module then estimates the final dense depth map focused on depth information, using the color-dominant coarse depth map and the sparse depth map. Experiments on the NYUv2 dataset demonstrate that the proposed method outperforms previous models.
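The coarse-to-fine dataflow described above can be sketched as follows. This is only an illustration of how the two modalities are routed between the branches; the branch bodies here are hypothetical placeholders (the actual model uses CNN and Vision-Transformer blocks), not the paper's implementation:

```python
import numpy as np

def coarse_branch(image, sparse_depth):
    # Placeholder for the color-dominant branch: in the paper this is a
    # learned CNN/ViT network. Here we simply blend image intensity with
    # the zero-filled sparse depth to produce a dense coarse depth map.
    gray = image.mean(axis=-1)
    return 0.5 * gray + 0.5 * sparse_depth

def fine_branch(coarse_depth, sparse_depth):
    # Placeholder for the depth-dominant branch: where a LiDAR sample
    # exists, keep it; elsewhere fall back to the coarse prediction.
    mask = sparse_depth > 0
    return np.where(mask, sparse_depth, coarse_depth)

# Toy inputs: a 4x4 RGB image and a sparse depth map with two LiDAR hits.
image = np.ones((4, 4, 3))
sparse = np.zeros((4, 4))
sparse[1, 2] = 2.0
sparse[3, 0] = 3.5

coarse = coarse_branch(image, sparse)
dense = fine_branch(coarse, sparse)
print(dense.shape)      # (4, 4) -- a dense map at full resolution
print(dense[1, 2])      # 2.0 -- LiDAR samples are preserved
```

The point of the sketch is the two-stage routing: the first branch sees both inputs and leans on color, while the second branch refines that output using the sparse depth measurements.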