JointDreamer: Ensuring Geometry Consistency and Text Congruence in Text-to-3D Generation via Joint Score Distillation

Abstract

Score Distillation Sampling (SDS) by well-trained 2D diffusion models has shown great promise in text-to-3D generation. However, this paradigm distills view-agnostic 2D image distributions into the rendering distribution of 3D representation for each view independently, overlooking the coherence across views and yielding 3D inconsistency in generations.

In this work, we propose Joint Score Distillation (JSD), a new paradigm that ensures coherent 3D generations. Specifically, we model the joint image distribution, which introduces an energy function to capture the coherence among denoised images from the diffusion model. We then derive the joint score distillation on multiple rendered views of the 3D representation, as opposed to a single view in SDS. In addition, we instantiate three universal view-aware models as energy functions, demonstrating compatibility with JSD.
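To make the contrast between the two paradigms concrete, the following is a minimal sketch in LaTeX. The first equation is the standard SDS gradient as introduced in DreamFusion. The second and third lines are only an illustrative assumption of how an energy function can couple views; the symbols $V$ (number of views), $\mathcal{C}$ (energy function), and $g(\theta, c)$ (renderer at camera $c$) are notation chosen here, and the sketch glosses over noise levels and over the fact that the energy is evaluated on denoised images. It should not be read as the paper's exact formulation.

```latex
% Standard SDS: each rendered view x = g(\theta, c) is supervised independently.
\nabla_\theta \mathcal{L}_{\mathrm{SDS}}(\theta)
  = \mathbb{E}_{t,\epsilon,c}\!\left[
      w(t)\,\bigl(\epsilon_\phi(x_t; y, t) - \epsilon\bigr)\,
      \frac{\partial x}{\partial \theta}
    \right],
  \qquad x = g(\theta, c).

% Illustrative joint model over V views (assumed form): per-view image priors
% coupled by an energy function \mathcal{C} that is low for coherent views.
p\bigl(x^{1}, \dots, x^{V} \mid y\bigr)
  \;\propto\;
  \exp\!\bigl(-\mathcal{C}(x^{1}, \dots, x^{V})\bigr)
  \prod_{i=1}^{V} p\bigl(x^{i} \mid y\bigr).

% Under that assumption, the score for view i gains a coherence term
% in addition to the single-view score used by SDS.
\nabla_{x^{i}} \log p\bigl(x^{1}, \dots, x^{V} \mid y\bigr)
  = \nabla_{x^{i}} \log p\bigl(x^{i} \mid y\bigr)
  - \nabla_{x^{i}} \mathcal{C}\bigl(x^{1}, \dots, x^{V}\bigr).
```

Read this way, SDS corresponds to the special case of a constant $\mathcal{C}$, where the views decouple and no cross-view term remains.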

Empirically, JSD significantly mitigates the 3D inconsistency problem in SDS, while maintaining text congruence. Moreover, we introduce the Geometry Fading scheme and Classifier-Free Guidance (CFG) Switching strategy to enhance generative details. Our framework, JointDreamer, establishes a new benchmark in text-to-3D generation, achieving outstanding results with an 88.5% CLIP R-Precision and 27.7% CLIP Score. These metrics demonstrate exceptional text congruence, as well as remarkable geometric consistency and texture fidelity.
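For readers unfamiliar with the retrieval metric quoted above, here is a minimal sketch of how an R-Precision score can be computed from precomputed embeddings. It assumes that row i of `image_emb` was rendered from the prompt in row i of `text_emb`; the function name `r_precision`, the toy vectors, and the tie-breaking rule are illustrative assumptions. The evaluation protocol actually used for the reported numbers (CLIP variant, candidate prompt set, value of R) is not stated on this page.

```python
import numpy as np

def r_precision(image_emb, text_emb, r=1):
    """Fraction of images whose own prompt is among the r most similar prompts."""
    # Normalize rows so the dot product equals cosine similarity.
    image_emb = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)
    text_emb = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    sim = image_emb @ text_emb.T              # shape: (n_images, n_prompts)
    true_sim = np.diag(sim)[:, None]          # similarity to the matching prompt
    # Rank of the true prompt = number of prompts that score strictly higher.
    rank = (sim > true_sim).sum(axis=1)
    return float((rank < r).mean())

# Toy 2-D embeddings (hypothetical values, not CLIP outputs).
text = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]])
image = np.array([[0.9, 0.1], [0.2, 0.8], [0.1, 1.0]])  # third image matches prompt 1 best
print(r_precision(image, text, r=1))
```

In the toy data, the third image is closer to the second prompt than to its own, so it counts as a retrieval miss at r=1.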

Example generated objects

JointDreamer generates objects with consistent geometry and texture.

A panda rowing a boat in a pond, 8K, HD, photorealistic

a confused beagle sitting at a desk working on homework

Woodies talking with each other, Toy Story, Anime style, more details, 8K, HD

Image of Michael Jackson, showcasing his signature dance moves, fedora hat, and stylish wardrobe

Comparison Results

We collected 14 prompts from different sources to compare with other text-to-3D methods. All methods are run in threestudio with a fixed default configuration for every prompt, without hyper-parameter tuning.

Methods compared: Dreamfusion-IF, Magic3D-IF-SD, ProlificDreamer, MVDream, Ours

a DSLR photo of a squirrel playing guitar

a DSLR photo of a fox working on a jigsaw puzzle, 8K, HD, photorealistic

A zoomed out DSLR photo of a hippo biting through a watermelon, 8K, HD, photorealistic

a wide angle zoomed out DSLR photo of a skiing penguin wearing a puffy jacket

Citation


@inproceedings{jiang2024jointdreamer,
  author    = {Jiang, Chenhan and Zeng, Yihan and Hu, Tianyang and Xu, Songcun and Zhang, Wei and Xu, Hang and Yeung, Dit-Yan},
  title     = {JointDreamer: Ensuring Geometry Consistency and Text Congruence in Text-to-3D Generation via Joint Score Distillation},
  booktitle = {ECCV},
  year      = {2024},
}

ECCV 2024

Chenhan JIANG¹*

Yihan ZENG²*

Tianyang HU²

Songcun XU²

Wei ZHANG²

Hang XU²

Dit-Yan YEUNG¹

*Equal Contribution

¹Hong Kong University of Science and Technology

²Huawei Noah's Ark Lab
