Creative Tech Digest

🧠 Stable Diffusion 2.0 Out, Adding a New Dimension of Depth — Here’s Why It Matters 🎨

Bilawal Sidhu
Nov 26, 2022

AI Christmas came early because Stable Diffusion 2.0 is out — and the feature I’m most excited about is depth2img. Inferring a depth map to maintain structural coherence will be pretty sweet for all sorts of #img2img use cases. Let’s explore why.

Why Depth-Aware Image-to-Image Matters

With current image-to-image workflows, the image pixels and text prompt only tell the AI model so much — so no matter how you tweak the parameters, there’s a good chance that the output will deviate quite a bit from the input image, especially in terms of geometric structure.


Instead, we can get a much better result by guiding the image generation process using a depth map — which coarsely represents the 3D structure of the human face in the example above. A depth map is used under the hood by your smartphone to give you that nice bokeh blur by separating you from the background, and even to relight your face while respecting its contours.

[Figure: how iOS creates a Portrait photo effect using a color camera image and a depth map. Source: Apple documentation, “Capturing Photos with Depth.”]
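
To make the portrait-mode idea concrete, here’s a toy sketch of the depth-based separation trick using OpenCV. The file names and the 0.5 near/far threshold are placeholder assumptions, and real smartphone pipelines are far more sophisticated, but the principle is the same:

```python
import cv2
import numpy as np

# Toy illustration of the portrait-mode idea: use a depth map to separate the
# subject from the background, then blur only the background.
# File names are placeholders; any RGB photo plus a grayscale depth map works.
image = cv2.imread("photo.jpg")
depth = cv2.imread("depth.png", cv2.IMREAD_GRAYSCALE).astype(np.float32) / 255.0

# Treat brighter depth values as "near" (the subject); soften the mask edges.
mask = cv2.GaussianBlur((depth > 0.5).astype(np.float32), (21, 21), 0)
mask = mask[..., None]  # broadcast over the 3 color channels

# Blur the whole frame, then composite: sharp subject over blurred background.
blurred = cv2.GaussianBlur(image, (51, 51), 0)
bokeh = (mask * image + (1.0 - mask) * blurred).astype(np.uint8)
cv2.imwrite("portrait_bokeh.png", bokeh)
```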

So how do we generate a depth map? Well, Stable Diffusion uses MiDaS for monocular depth estimation in its depth2img feature. MiDaS is a state-of-the-art model created by Intel and ETH Zurich researchers that can infer depth using a single 2D photo as input.
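
Want to try it yourself? The MiDaS authors expose the model through torch.hub. Here’s a minimal sketch (the model variant and file names are placeholders) that turns a single photo into a relative depth map:

```python
import cv2
import torch

# Load a MiDaS model via torch.hub (the small variant trades accuracy for speed).
midas = torch.hub.load("intel-isl/MiDaS", "MiDaS_small")
midas.eval()

# Matching input transforms published alongside the model.
transforms = torch.hub.load("intel-isl/MiDaS", "transforms")
transform = transforms.small_transform

# A single RGB photo is the only input MiDaS needs.
img = cv2.cvtColor(cv2.imread("photo.jpg"), cv2.COLOR_BGR2RGB)

with torch.no_grad():
    prediction = midas(transform(img))
    # Resize the prediction back to the photo's resolution.
    prediction = torch.nn.functional.interpolate(
        prediction.unsqueeze(1),
        size=img.shape[:2],
        mode="bicubic",
        align_corners=False,
    ).squeeze()

# MiDaS predicts inverse relative depth (nearer = larger values);
# normalize to 0-255 so it can be saved as a grayscale depth map.
depth = prediction.cpu().numpy()
depth = 255 * (depth - depth.min()) / (depth.max() - depth.min())
cv2.imwrite("depth.png", depth.astype("uint8"))
```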

Bilawal "Billyfx" Sidhu (@bilawalsidhu), Nov 25, 2022:
"Take a stroll through latent space and explore The Dharma Initiative HQ - that mysterious island from the TV show LOST! 🏝 🛬 With monocular depth estimation, I was able to create these fun parallax animations from my AI art session w/ MidJourney 🤖💻✨ #wigglegram #aiart #vfx"

What else can we do with depth2img? While this type of “approximate” depth is a good start, I suspect we’ll quickly see a Blender plug-in that plumbs in a far more accurate z-depth pass for 3D img2img fun. Since 3D software is already dimensional (duh!), generating such a synthetic depth map is trivial, and such passes are already used extensively in VFX workflows.
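
For the curious, here’s roughly what that plumbing could look like, sketched against the Hugging Face diffusers depth2img pipeline, which accepts an optional depth_map tensor instead of running MiDaS internally. The file names, prompt, and normalization are placeholder assumptions:

```python
import numpy as np
import torch
from PIL import Image
from diffusers import StableDiffusionDepth2ImgPipeline

# Load the Stable Diffusion 2.0 depth-conditioned checkpoint from Hugging Face.
pipe = StableDiffusionDepth2ImgPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-depth", torch_dtype=torch.float16
).to("cuda")

# A greybox render plus its z-depth pass exported from the 3D scene
# (file names are placeholders).
init_image = Image.open("greybox_render.png").convert("RGB")
zdepth = np.array(Image.open("zdepth_pass.png").convert("L"), dtype=np.float32) / 255.0

# Depending on your renderer's convention, you may need to invert the pass so
# nearer surfaces are brighter, matching the MiDaS-style depth the model was trained on.
depth_map = torch.from_numpy(1.0 - zdepth).unsqueeze(0)  # shape (1, H, W)

# If depth_map is omitted, the pipeline falls back to estimating depth with MiDaS.
result = pipe(
    prompt="moody sci-fi corridor, volumetric lighting, detailed concept art",
    image=init_image,
    depth_map=depth_map,
    strength=0.8,
).images[0]
result.save("reskinned_render.png")
```

Since the pipeline rescales the depth map internally, the exact value range matters less than getting the relative ordering of near and far right.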

What this means is that artists can quickly “greybox” a 3D scene, focusing on the spatial layout rather than textures, lighting, and shading, and immediately explore a multitude of directions with generative AI before committing to implementing one “for real.” Such depth-aware img2img workflows will save countless hours in 3D world building and concept art. Check out this example below to imagine what’s in store:

Manu.Vision | Futuriste™ (@ManuVision), Oct 17, 2022:
"New @Blender AI addon just dropped: @ai_render (with the power of @StableDiffusion). It allows you to turn your 3D scene into an AI generated image. We just published this video with @polygonrunway on IG/Tiktok/Youtube. Follow them to learn 3D illustration (link in their bio)."

What else? I demand more! Of course, given my particular set of passions, what I wanna do is plumb in metric-accurate depth from a photogrammetry or LiDAR scan, or even a NeRF (neural radiance field), to take these “reskinning reality” experiments I’ve been loving to the next level… unless someone else beats me to it, which would be pretty cool :)

Update: the creative @CoffeeVectors went ahead and made an amazing example, taking a generic 3D character walk cycle animation and “upleveling” it to photorealistic quality — look at that hair and facial lighting! A little jitter, but nothing the more temporally coherent models of the future won’t be able to fix.

CoffeeVectors (@CoffeeVectors), Dec 18, 2022:
"Used #stablediffusion2 #depth2img model to render a more photoreal layer ontop of a walking animation I made in #UnrealEngine5 with #realtime clothing and hair on a #daz model. Breakdown thread 1/6 @UnrealEngine @daz3d #aiart #MachineLearning #aiartcommunity #aiartprocess #aiia"

The velocity of these innovations cannot be overstated. Just 4 years ago, in 2018 at Google, using multi-view stereo to generate depth maps for VFX felt cutting edge, and style transfer was the bleeding edge. But creators needed a fancy 360 camera and deep technical know-how… now all they need is a phone to capture and a browser to create. Exciting times indeed!

Source: Seeing art in a new way: VR tools let characters jump right in

Enjoyed this write-up? Consider recommending it to your fellow creatives and technologists. You can also hit me up on your favorite platform where I share informational and inspirational goodness: https://beacons.ai/billyfx

