DreamCube

Generates depth-aware 3D panoramas and scene models from single images

Features

Open Source, 3D

System Requirements

Minimum 16 GB RAM; 12 GB or more of free storage recommended.
macOS 15+: Apple M-series chip required.
Windows 10/11: tested with an NVIDIA 50-series GPU; 16 GB or more of VRAM required.
Note: For NVIDIA GPUs, install a recent driver.

Introduction

DreamCube lets anyone create a 3D panorama from a single photo. In plain terms, it can:

  • Generate a 3D panorama from a single image: Supply one photo and DreamCube expands it into a panorama you can view in all 360 degrees, filling in the parts of the scene that the original photo did not capture.

  • Add depth to a photo: Besides color, it estimates how far each object is from the camera and outputs an image with depth information, giving the photo a sense of three dimensions.

  • Generate a 3D scene model with one click: The finished panorama can be converted directly into a 3D model, such as a mesh of a house or street, or a more realistic 3D Gaussian scene, which is convenient for building virtual scenes.
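To make the link between a depth panorama and a 3D scene concrete, here is a minimal sketch of how a panoramic depth map can be lifted into 3D points. It is an illustration only, not DreamCube's actual code: the function names, the equirectangular pixel layout, the axis convention (y up, z forward), and the reading of depth as distance from the camera center are all assumptions.

```python
import math

def pixel_to_direction(u, v, width, height):
    """Unit view direction for pixel (u, v) of an equirectangular image.

    Assumed convention: longitude spans [-pi, pi) left to right,
    latitude spans [pi/2, -pi/2] top to bottom, y is up, z is forward.
    """
    lon = ((u + 0.5) / width) * 2.0 * math.pi - math.pi
    lat = math.pi / 2.0 - ((v + 0.5) / height) * math.pi
    return (
        math.cos(lat) * math.sin(lon),
        math.sin(lat),
        math.cos(lat) * math.cos(lon),
    )

def unproject_panorama(depth):
    """Turn a depth panorama (list of rows of distances) into 3D points.

    Each depth value is assumed to be the distance from the camera
    center along the pixel's view direction.
    """
    height = len(depth)
    width = len(depth[0])
    points = []
    for v, row in enumerate(depth):
        for u, d in enumerate(row):
            dx, dy, dz = pixel_to_direction(u, v, width, height)
            points.append((d * dx, d * dy, d * dz))
    return points
```

Under these assumptions, a panorama whose depth is the same everywhere unprojects to points on a sphere of that radius. Points like these are the usual starting material for building a mesh or initializing a 3D Gaussian scene.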

Technical Foundation

  • The core technology is Multi-plane Synchronization, which adapts 2D diffusion models to multi-plane panoramic representations (e.g., cubemaps).
  • Built upon open-source projects like CubeDiff, CubeGAN, and PanFusion, it constructs a diffusion-based framework for RGB-D panorama generation.
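The bullets above name Multi-plane Synchronization without saying which operations are synchronized. As an illustration only, assume it means that an operator such as normalization computes its statistics jointly over all cube faces rather than separately per face. The sketch below contrasts the two in plain Python; the function names and the representation of each face as a flat list of activations are hypothetical, not DreamCube's implementation.

```python
import math

def mean_var(values):
    """Population mean and variance of a flat list of numbers."""
    m = sum(values) / len(values)
    v = sum((x - m) ** 2 for x in values) / len(values)
    return m, v

def normalize(values, mean, var, eps=1e-5):
    """Standardize values with the given statistics."""
    return [(x - mean) / math.sqrt(var + eps) for x in values]

def per_face_norm(faces):
    """Unsynchronized: every face is normalized with its own statistics."""
    out = []
    for face in faces:
        m, v = mean_var(face)
        out.append(normalize(face, m, v))
    return out

def synced_norm(faces):
    """Synchronized: one set of statistics is shared by all faces."""
    flat = [x for face in faces for x in face]
    m, v = mean_var(flat)
    return [normalize(face, m, v) for face in faces]
```

With two faces of different overall brightness, for example `[[0.0, 1.0], [10.0, 11.0]]`, `per_face_norm` maps both faces to the same values and erases the difference between them, while `synced_norm` keeps the brighter face brighter. Preserving such cross-face relationships is one plausible reason a shared operator would help panoramic consistency.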

Core Functions

  1. RGB-D Panorama Generation: Generate RGB-D cubemaps and equirectangular panoramas from single-view inputs.
  2. Panoramic Depth Estimation: Simultaneously estimate scene depth to form 3D panoramic representations with depth information.
  3. 3D Scene Generation: Output 3D meshes and 3D Gaussian scenes (3DGS), enabling conversion from 2D inputs to 3D scenes.
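The list above mentions both cubemaps and equirectangular panoramas. The two are related by a standard lookup: each equirectangular pixel corresponds to a view direction, and that direction falls on exactly one cube face. The sketch below shows that lookup under assumed conventions (face names, axis orientation, and in-face coordinates in [0, 1] are illustrative choices; DreamCube's own layout may differ).

```python
import math

def pixel_to_direction(u, v, width, height):
    """Unit view direction for an equirectangular pixel (y up, z forward)."""
    lon = ((u + 0.5) / width) * 2.0 * math.pi - math.pi
    lat = math.pi / 2.0 - ((v + 0.5) / height) * math.pi
    return (
        math.cos(lat) * math.sin(lon),
        math.sin(lat),
        math.cos(lat) * math.cos(lon),
    )

def direction_to_face_uv(x, y, z):
    """Pick the cube face a direction hits and its coordinates on that face.

    The face is chosen by the axis with the largest magnitude; the other
    two components, divided by that magnitude, give in-face coordinates.
    """
    ax, ay, az = abs(x), abs(y), abs(z)
    major = max(ax, ay, az)
    if major == 0:
        raise ValueError("direction must be non-zero")
    if ax >= ay and ax >= az:
        face, a, b = ("+x", -z, y) if x > 0 else ("-x", z, y)
    elif ay >= az:
        face, a, b = ("+y", x, -z) if y > 0 else ("-y", x, z)
    else:
        face, a, b = ("+z", x, y) if z > 0 else ("-z", -x, y)
    # Rescale from [-1, 1] to [0, 1] within the face.
    return face, (a / major + 1.0) / 2.0, (b / major + 1.0) / 2.0

def equirect_lookup(width, height):
    """For every equirectangular pixel, the cube face and position to sample."""
    return [
        [direction_to_face_uv(*pixel_to_direction(u, v, width, height))
         for u in range(width)]
        for v in range(height)
    ]
```

Sampling each face image at the returned coordinates (with interpolation) would assemble an equirectangular panorama from six cubemap faces. The same table works for RGB and depth channels alike.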

Technical Features and Advantages

  • Efficient Single-View Generation: Requires only a single image to produce 3D panoramas with color and depth, reducing data collection complexity.
  • Multi-plane Synchronization: Optimizes diffusion model inference across multiple planes to enhance panoramic consistency and 3D perception.
  • High-performance Inference: Takes ~20 seconds on an NVIDIA L40S GPU to generate complete RGB-D panoramas and 3D scenes, fast enough for interactive use or batch processing.
  • Open-source and User-friendly: Provides a Gradio interface and command-line tools, with model weights downloaded automatically from HuggingFace for easy deployment.