Creative Tech Digest

🧠 Stable Diffusion 2.0 Out, Adding a New Dimension of Depth — Here’s Why It Matters 🎨

Bilawal Sidhu
Nov 26, 2022

AI Christmas came early because Stable Diffusion 2.0 is out — and the feature I’m most excited about is depth2img. Inferring a depth map to maintain structural coherence will be pretty sweet for all sorts of #img2img use cases. Let’s explore why.

Why Depth-Aware Image-to-Image Matters

With current image-to-image workflows, the image pixels and text prompt only tell the AI model so much — so no matter how you tweak the parameters, there’s a good chance that the output will deviate quite a bit from the input image, especially in terms of geometric structure.


Instead, we can get a much better result by guiding the image generation process using a depth map — which coarsely represents the 3D structure of the human face in the example above. A depth map is used under the hood by your smartphone to give you that nice bokeh blur by separating you from the background, and even to relight your face while respecting its contours.

[Figure: how iOS creates a Portrait photo effect using a color camera image and a depth map. Source: Apple documentation, “Capturing Photos with Depth.”]
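
To make the portrait-mode idea concrete, here’s a toy sketch of the depth-based separation trick using OpenCV. The file names and the 0.5 near/far threshold are placeholder assumptions, and real smartphone pipelines are far more sophisticated, but the principle is the same:

```python
import cv2
import numpy as np

# Toy illustration of the portrait-mode idea: use a depth map to separate the
# subject from the background, then blur only the background.
# File names are placeholders; any RGB photo plus a grayscale depth map works.
image = cv2.imread("photo.jpg")
depth = cv2.imread("depth.png", cv2.IMREAD_GRAYSCALE).astype(np.float32) / 255.0

# Treat brighter depth values as "near" (the subject); soften the mask edges.
mask = cv2.GaussianBlur((depth > 0.5).astype(np.float32), (21, 21), 0)
mask = mask[..., None]  # broadcast over the 3 color channels

# Blur the whole frame, then composite: sharp subject over blurred background.
blurred = cv2.GaussianBlur(image, (51, 51), 0)
bokeh = (mask * image + (1.0 - mask) * blurred).astype(np.uint8)
cv2.imwrite("portrait_bokeh.png", bokeh)
```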

So how do we generate a depth map? Well, Stable Diffusion uses MiDaS for monocular depth estimation in its depth2img feature. MiDaS is a state-of-the-art model created by Intel and ETH Zurich researchers that can infer depth using a single 2D photo as input.
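
Want to try it yourself? The MiDaS authors expose the model through torch.hub. Here’s a minimal sketch (the model variant and file names are placeholders) that turns a single photo into a relative depth map:

```python
import cv2
import torch

# Load a MiDaS model via torch.hub (the small variant trades accuracy for speed).
midas = torch.hub.load("intel-isl/MiDaS", "MiDaS_small")
midas.eval()

# Matching input transforms published alongside the model.
transforms = torch.hub.load("intel-isl/MiDaS", "transforms")
transform = transforms.small_transform

# A single RGB photo is the only input MiDaS needs.
img = cv2.cvtColor(cv2.imread("photo.jpg"), cv2.COLOR_BGR2RGB)

with torch.no_grad():
    prediction = midas(transform(img))
    # Resize the prediction back to the photo's resolution.
    prediction = torch.nn.functional.interpolate(
        prediction.unsqueeze(1),
        size=img.shape[:2],
        mode="bicubic",
        align_corners=False,
    ).squeeze()

# MiDaS predicts inverse relative depth (nearer = larger values);
# normalize to 0-255 so it can be saved as a grayscale depth map.
depth = prediction.cpu().numpy()
depth = 255 * (depth - depth.min()) / (depth.max() - depth.min())
cv2.imwrite("depth.png", depth.astype("uint8"))
```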

Bilawal "Billyfx" Sidhu (@bilawalsidhu), Nov 25, 2022:
"Take a stroll through latent space and explore The Dharma Initiative HQ - that mysterious island from the TV show LOST! 🏝 🛬 With monocular depth estimation, I was able to create these fun parallax animations from my AI art session w/ MidJourney 🤖💻✨ #wigglegram #aiart #vfx"

What else can we do with depth2img? While this type of “approximate” depth is a good start, I suspect we’ll quickly see a Blender plug-in that plumbs in a far more accurate z-depth pass for 3D img2img fun. Since 3D software is already dimensional (duh!), generating such a synthetic depth map is trivial, and such passes are already used extensively in VFX workflows.
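
For the curious, here’s roughly what that plumbing could look like, sketched against the Hugging Face diffusers depth2img pipeline, which accepts an optional depth_map tensor instead of running MiDaS internally. The file names, prompt, and normalization are placeholder assumptions:

```python
import numpy as np
import torch
from PIL import Image
from diffusers import StableDiffusionDepth2ImgPipeline

# Load the Stable Diffusion 2.0 depth-conditioned checkpoint from Hugging Face.
pipe = StableDiffusionDepth2ImgPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-depth", torch_dtype=torch.float16
).to("cuda")

# A greybox render plus its z-depth pass exported from the 3D scene
# (file names are placeholders).
init_image = Image.open("greybox_render.png").convert("RGB")
zdepth = np.array(Image.open("zdepth_pass.png").convert("L"), dtype=np.float32) / 255.0

# Depending on your renderer's convention, you may need to invert the pass so
# nearer surfaces are brighter, matching the MiDaS-style depth the model was trained on.
depth_map = torch.from_numpy(1.0 - zdepth).unsqueeze(0)  # shape (1, H, W)

# If depth_map is omitted, the pipeline falls back to estimating depth with MiDaS.
result = pipe(
    prompt="moody sci-fi corridor, volumetric lighting, detailed concept art",
    image=init_image,
    depth_map=depth_map,
    strength=0.8,
).images[0]
result.save("reskinned_render.png")
```

Since the pipeline rescales the depth map internally, the exact value range matters less than getting the relative ordering of near and far right.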

What this means is that artists can quickly “greybox” a 3D scene, focusing on the spatial layout rather than textures, lighting, and shading, and immediately explore a multitude of directions with generative AI before committing to implementing one “for real.” Such depth-aware img2img workflows will save countless hours in 3D world building and concept art. Check out this example below to imagine what’s in store:

Manu.Vision | Futuriste™ (@ManuVision), Oct 17, 2022:
"New @Blender AI addon just dropped: @ai_render (with the power of @StableDiffusion). It allows you to turn your 3D scene into an AI generated image. We just published this video with @polygonrunway on IG/Tiktok/Youtube. Follow them to learn 3D illustration (link in their bio)."

What else? I demand more! Of course, given my particular set of passions, what I wanna do is plumb in metric-accurate depth from a photogrammetry or LiDAR scan, or even a NeRF (neural radiance field), to take these “reskinning reality” experiments I’ve been loving to the next level… unless someone else beats me to it, which would be pretty cool :)

Update: the creative @CoffeeVectors went ahead and made an amazing example, taking a generic 3D character walk cycle animation and “upleveling” it to photorealistic quality — look at that hair and facial lighting! A little jitter, but nothing the more temporally coherent models of the future won’t be able to fix.

CoffeeVectors (@CoffeeVectors), Dec 18, 2022:
"Used #stablediffusion2 #depth2img model to render a more photoreal layer ontop of a walking animation I made in #UnrealEngine5 with #realtime clothing and hair on a #daz model. Breakdown thread 1/6 @UnrealEngine @daz3d #aiart #MachineLearning #aiartcommunity #aiartprocess #aiia"

The velocity of these innovations cannot be overstated. Just 4 years ago, in 2018 at Google, using multi-view stereo to generate depth maps for VFX felt cutting edge, and style transfer was the bleeding edge. But creators needed a fancy 360 camera and deep technical know-how… now all they need is a phone to capture and a browser to create. Exciting times indeed!

Source: Seeing art in a new way: VR tools let characters jump right in

Enjoyed this write-up? Consider recommending it to your fellow creatives and technologists. You can also hit me up on your favorite platform where I share informational and inspirational goodness: https://beacons.ai/billyfx

