Lotus-2: Advancing Geometric Dense Prediction with Powerful Image Generative Model

¹HKUST(GZ) ²UC San Diego ³HKUST
§Corresponding author
arXiv Preprint 2025
[Figure: Teaser 1]

We present Lotus-2, an advanced estimator for monocular geometric dense prediction. Using only 0.66% of the training data used by MoGe-2, Lotus-2 not only achieves new state-of-the-art (SoTA) accuracy but also produces significantly finer details. It also performs better on rare and challenging cases, such as oil paintings and transparent objects, highlighting its superior zero-shot generalization.

Abstract

Recovering pixel-wise geometric properties from a single image is fundamentally ill-posed due to appearance ambiguity and non-injective mappings between 2D observations and 3D structures. While discriminative regression models achieve strong performance through large-scale supervision, their success is bounded by the realism of their training data and by their limited physical reasoning.

Recent flow matching models exhibit powerful world priors that encode geometry and semantics learned from massive image-text data, yet directly reusing their stochastic generative formulation is suboptimal for deterministic geometric inference.
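For reference, one common rectified-flow (flow-matching) convention used by recent text-to-image models can be written as below. The symbols are our own notation and not necessarily the paper's: $x_0$ is a clean data sample, $\epsilon$ is Gaussian noise, $c$ is the conditioning (e.g. a text prompt), and $v_\theta$ is the network's velocity prediction. Generation starts from a random $\epsilon$ and integrates the learned velocity over many steps, which is the stochastic generative formulation the paragraph above refers to.

```latex
\begin{aligned}
z_t &= (1 - t)\,x_0 + t\,\epsilon, \qquad \epsilon \sim \mathcal{N}(0, I), \quad t \in [0, 1] \\
\mathcal{L}_{v} &= \mathbb{E}_{x_0,\,\epsilon,\,t}\,\big\| v_\theta(z_t, t, c) - (\epsilon - x_0) \big\|_2^2 \\
\hat{x}_0 &= z_t - t\, v_\theta(z_t, t, c)
\end{aligned}
```

The last line shows that a velocity prediction and a clean-data prediction carry the same information at any $t$: substituting the ideal velocity $\epsilon - x_0$ gives $(1-t)\,x_0 + t\,\epsilon - t\,(\epsilon - x_0) = x_0$.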

In this work, we propose Lotus-2, a two-stage deterministic framework that leverages pre-trained generative priors for accurate and stable geometric dense prediction. The core predictor employs a single-step rectified-flow formulation with a clean-data objective and a lightweight local continuity module (LCM) to generate globally coherent structures without grid artifacts. The detail sharpener performs a constrained multi-step rectified-flow refinement within the manifold defined by the predictor, enhancing fine-grained geometry through noise-free flow matching.
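The two stages can be illustrated with a toy NumPy sketch. This is not the authors' implementation. The function names (`single_step_predict`, `refine`, and the helpers), the fixed zero latent fed to the predictor, and our reading of "noise-free" refinement as a straight-line flow that starts from the predictor's coarse output instead of Gaussian noise are all assumptions. The LCM is not modeled, and the "networks" are oracle functions so that the example runs without trained weights.

```python
import numpy as np


def rf_interpolate(x_target, x_source, t):
    """Straight-line (rectified-flow) path: t=0 gives x_target, t=1 gives x_source."""
    return (1.0 - t) * x_target + t * x_source


def velocity_target(x_target, x_source):
    """Constant velocity dz/dt along the straight path."""
    return x_source - x_target


def clean_from_velocity(z_t, v, t):
    """Recover the clean (t=0) endpoint from a velocity prediction: x0 = z_t - t * v."""
    return z_t - t * v


def single_step_predict(x0_model, image, t=1.0):
    """Hypothetical single-step predictor with a clean-data objective: the network
    outputs the geometry map directly, in one call, from a deterministic input."""
    z = np.zeros_like(image)  # assumption: fixed zero latent instead of random noise
    return x0_model(z, t, image)


def refine(velocity_model, coarse, image, num_steps=4):
    """Hypothetical noise-free refinement: Euler-integrate dz/dt = v from t=1
    (the predictor's coarse output, not Gaussian noise) down to t=0."""
    z = coarse.copy()
    ts = np.linspace(1.0, 0.0, num_steps + 1)
    for t_cur, t_next in zip(ts[:-1], ts[1:]):
        v = velocity_model(z, t_cur, image)
        z = z + (t_next - t_cur) * v  # step size is negative: moving toward t=0
    return z


# --- Toy usage with oracle "networks" (stand-ins for trained models) ---
rng = np.random.default_rng(0)
image = rng.normal(size=(4, 4))
fine_depth = rng.normal(size=(4, 4))                          # stand-in ground truth
coarse_depth = fine_depth + 0.1 * rng.normal(size=(4, 4))     # stand-in predictor error


def x0_oracle(z, t, img):
    return coarse_depth


def v_oracle(z, t, img):
    return (z - fine_depth) / t  # exact velocity of the straight coarse-to-fine path


pred = single_step_predict(x0_oracle, image)
sharp = refine(v_oracle, pred, image, num_steps=4)
print(np.allclose(sharp, fine_depth))  # True
```

Because the oracle velocity is exact and the path is straight, plain Euler integration reaches the target for any number of steps. With a learned, imperfect velocity field, the step count would typically trade compute for accuracy. Note also that `single_step_predict` involves no random sampling, so repeated calls return identical maps, which is the deterministic behavior the framework targets.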

Using only 59K training samples (less than 1% of existing large-scale datasets), Lotus-2 establishes new SoTA results in monocular depth estimation and highly competitive results in surface normal prediction. These results demonstrate that flow-matching models can serve as deterministic world priors, enabling efficient, accurate, and physically consistent geometric reasoning beyond traditional generative paradigms.

BibTeX

Coming soon!