arxiv:2511.10647

Depth Anything 3: Recovering the Visual Space from Any Views

Published on Nov 13
· Submitted by Adina Yakefu on Nov 14
#2 Paper of the day

Abstract

Depth Anything 3 (DA3) uses a plain transformer for geometry prediction from visual inputs, achieving state-of-the-art results in camera pose estimation, any-view geometry, visual rendering, and monocular depth estimation.

AI-generated summary

We present Depth Anything 3 (DA3), a model that predicts spatially consistent geometry from an arbitrary number of visual inputs, with or without known camera poses. In pursuit of minimal modeling, DA3 yields two key insights: a single plain transformer (e.g., vanilla DINO encoder) is sufficient as a backbone without architectural specialization, and a singular depth-ray prediction target obviates the need for complex multi-task learning. Through our teacher-student training paradigm, the model achieves a level of detail and generalization on par with Depth Anything 2 (DA2). We establish a new visual geometry benchmark covering camera pose estimation, any-view geometry and visual rendering. On this benchmark, DA3 sets a new state-of-the-art across all tasks, surpassing prior SOTA VGGT by an average of 44.3% in camera pose accuracy and 25.1% in geometric accuracy. Moreover, it outperforms DA2 in monocular depth estimation. All models are trained exclusively on public academic datasets.
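To make the "single plain transformer plus a single depth-ray prediction target" idea concrete, here is a minimal PyTorch-style sketch. It is not the authors' implementation: the module names, patch size, feature dimensions, and the exact depth-plus-ray-direction parameterization are assumptions made purely for illustration.

```python
# Minimal sketch (not the official DA3 code): a plain transformer backbone
# with a single per-patch "depth + ray" prediction head. Shapes, patch size,
# and the ray parameterization are illustrative assumptions.
import torch
import torch.nn as nn


class DepthRayHead(nn.Module):
    """Predicts one depth value and a 3-D ray direction per patch token."""

    def __init__(self, dim: int):
        super().__init__()
        self.proj = nn.Linear(dim, 4)  # 1 depth + 3 ray-direction channels

    def forward(self, tokens: torch.Tensor):
        out = self.proj(tokens)                                 # (B, N, 4)
        depth = out[..., :1].exp()                              # positive depth
        rays = nn.functional.normalize(out[..., 1:], dim=-1)    # unit ray directions
        return depth, rays


class PlainTransformerGeometry(nn.Module):
    """Vanilla ViT-style encoder (stand-in for a DINO backbone) + depth-ray head."""

    def __init__(self, img_size=224, patch=14, dim=384, depth=6, heads=6):
        super().__init__()
        self.patch_embed = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)
        n_patches = (img_size // patch) ** 2
        self.pos = nn.Parameter(torch.zeros(1, n_patches, dim))
        layer = nn.TransformerEncoderLayer(dim, heads, dim * 4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, depth)
        self.head = DepthRayHead(dim)

    def forward(self, views: torch.Tensor):
        # views: (B*V, 3, H, W) — multiple views flattened into the batch dimension
        x = self.patch_embed(views).flatten(2).transpose(1, 2) + self.pos
        tokens = self.encoder(x)
        return self.head(tokens)  # coarse per-patch depth and ray maps


if __name__ == "__main__":
    model = PlainTransformerGeometry()
    depth, rays = model(torch.randn(2, 3, 224, 224))
    print(depth.shape, rays.shape)  # torch.Size([2, 256, 1]) torch.Size([2, 256, 3])
```

The point of the sketch is the paper's minimal-modeling claim: a generic encoder plus one lightweight head covering both depth and rays, rather than task-specific branches for pose, depth, and point maps.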

Community

Paper submitter

Depth Anything 3 (DA3) uses a simple Transformer and a single depth-ray target to handle multi-view geometry without needing camera poses.


I want to write code based on this paper.

Paper submitter
•
edited 16 days ago

Hi @naveediqbal15 - you can find their model in the sidebar on the right of this page, and the GitHub link is just under the abstract. Feel free to contribute!
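If you want to experiment quickly, the snippet below follows the transformers depth-estimation pipeline pattern used by Depth Anything 2 checkpoints. Whether DA3 publishes a transformers-compatible checkpoint is not confirmed here, so treat the DA3 model id as a placeholder and check the model card / GitHub README for the official usage.

```python
# Rough starting point, not verified against the DA3 release: this is the
# transformers "depth-estimation" pipeline pattern used by Depth Anything 2.
# Swap in the actual DA3 checkpoint id from the model card once released.
from transformers import pipeline
from PIL import Image

pipe = pipeline(
    task="depth-estimation",
    model="depth-anything/Depth-Anything-V2-Small-hf",  # placeholder; replace with the DA3 checkpoint
)
result = pipe(Image.open("example.jpg"))
result["depth"].save("depth.png")  # predicted depth map as a PIL image
```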


Models citing this paper 1

Datasets citing this paper 0

No dataset links this paper yet.

Cite arxiv.org/abs/2511.10647 in a dataset README.md to link it from this page.

Spaces citing this paper 8

Collections including this paper 10