SD 3.5 Medium: Skip Layer Guidance! (and Fix Composition, Hands, and Anatomy)
Today, Stable Diffusion 3.5 Medium was released. It is a smaller model that will be appreciated by users of GPUs with less VRAM (12 GB, maybe even less for some use cases; I would say it fits into 8-10 GB on Linux).
But that is not all. There is also a new node created for the Medium model that tries to fix the persistent issues with anatomy and composition (the main drawback of SD 3.5, as I described in this article about the SD 3.5 Large and Turbo models), which are especially annoying when generating hand poses. These issues are now partially fixed (and without additional models or training).
'All good things come to those who wait.'
About the SD 3.5 Medium Model
"Stable Diffusion 3.5 Medium" developed by Stability AI is a text-to-image model (Multimodal Diffusion Transformer with improvements, MMDiT-X) that uses three pretrained text encoders for prompts and self-attention modules, enhancing multi-resolution generation and image coherence. It has a different data distribution than the SD 3.5 Large model, 2.5 billion parameters (the Large model has 8 billion). SAI recommends using resolutions divisible by 64, but various resolutions are possible. They also mention a limit of 255 tokens for the T5 encoder, after which some artifacts can occur, which I have not yet encountered (or noticed).
The model itself is relatively small, at 5.11 GB to download. Place it in the standard Stable Diffusion model directory (a typical folder layout is sketched after the links below). It will require the usual CLIP encoders (clip_g, clip_l, t5xxl):
Download .safetensors or .gguf
- Download (and SAI workflows): https://huggingface.co/stabilityai/stable-diffusion-3.5-medium/tree/main
- GGUF (smaller, quantized) versions: https://huggingface.co/city96/stable-diffusion-3.5-medium-gguf/tree/main
- Encoders: https://huggingface.co/Comfy-Org/stable-diffusion-3.5-fp8/tree/main/text_encoders
- Workflows with adjustments for SD 3.5 speed and VRAM usage (these need the Turbo LoRA).
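As a rough orientation only (the exact folders depend on your install and may differ in newer ComfyUI versions), the files typically end up like this:

```
ComfyUI/
  models/
    checkpoints/   <- the sd3.5_medium .safetensors checkpoint goes here
    clip/          <- the clip_l, clip_g, and t5xxl text encoder files go here
```

The quantized .gguf files usually require a separate GGUF loader custom node; check that node's documentation for the folder it expects.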
You need to update ComfyUI and restart it.
How to Fix Hands in SD 3.5
After updating ComfyUI, load the workflows and test the comparisons. Alternatively, insert the SkipLayerGuidanceSD3 node into your workflow and experiment with its settings. The final image may change depending on the prompt: with more general descriptions, the result can change substantially.
To be fair, the fix does not work in all cases; sometimes the result in SD 3.5 Medium is even better without it. You need to tweak the settings to get the best compositions.
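For those scripting outside ComfyUI: recent diffusers releases reportedly expose an equivalent skip-layer option directly on the SD3 pipeline. The sketch below assumes that interface; in particular, the skip_guidance_layers argument and its layer list are my assumption based on the SD 3.5 Medium release notes, so verify them against your installed diffusers version:

```python
# Hedged sketch: applying skip-layer guidance through diffusers rather than the
# ComfyUI node. The `skip_guidance_layers` argument is assumed to exist in
# recent diffusers releases for SD 3.5 Medium; check your version's docstring.
import torch
from diffusers import StableDiffusion3Pipeline

pipe = StableDiffusion3Pipeline.from_pretrained(
    "stabilityai/stable-diffusion-3.5-medium",
    torch_dtype=torch.bfloat16,
).to("cuda")

image = pipe(
    "close-up of two hands shaking, natural skin texture, sharp focus",
    num_inference_steps=25,
    guidance_scale=4.5,
    skip_guidance_layers=[7, 8, 9],  # assumed counterpart of the node's "layers" field
).images[0]
image.save("slg_hands_test.png")
```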
Examples:
The good news is that the fix works with the Turbo and Large models as well.
Look, It Has Layers!
You can play with the SkipLayerGuidanceSD3 node by skipping different layers for various effects and output variations. After testing layer combinations, I would say you often get better results by varying the "layers" numbers while keeping the default settings. I have lately been trying to fix the same composition issues by interfering with the diffusion process using Negative Shift (workflows on my GitHub), noise, and prompt injections, with mixed results. Layer skipping is more streamlined, and even if it is only a temporary solution, it opens up a new playground.
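To make that playground a bit more concrete, here is a toy sketch of how I understand skip-layer guidance: on top of normal classifier-free guidance, it adds a term that pushes the prediction away from a degraded pass in which the chosen transformer blocks are skipped. Everything below is illustrative (the toy_denoiser stand-in, the scales, the layer sets), not ComfyUI's actual implementation:

```python
# Conceptual sketch of skip-layer guidance (SLG), not ComfyUI's real code.
# Assumption: SLG adds an extra guidance term that pushes the prediction away
# from a pass in which the selected transformer blocks are skipped.
import torch

def toy_denoiser(x, conditioned=True, skip_layers=()):
    """Stand-in for the MMDiT forward pass; a real model takes text embeddings
    and a timestep, and `skip_layers` would disable those transformer blocks."""
    scale = 1.0 if conditioned else 0.5
    degrade = 1.0 - 0.05 * len(skip_layers)  # crude stand-in for skipping blocks
    return x * scale * degrade

def slg_step(x, cfg=4.5, slg_scale=3.0, skip_layers=(7, 8, 9)):
    cond = toy_denoiser(x, conditioned=True)
    uncond = toy_denoiser(x, conditioned=False)
    cond_skip = toy_denoiser(x, conditioned=True, skip_layers=skip_layers)
    guided = uncond + cfg * (cond - uncond)          # standard classifier-free guidance
    return guided + slg_scale * (cond - cond_skip)   # extra skip-layer guidance term

latent = torch.randn(1, 16, 64, 64)  # SD3-style latent shape (16 channels)
for layers in [(7, 8, 9), (6, 7, 8, 9, 10), (8,)]:
    out = slg_step(latent, skip_layers=layers)
    print(layers, out.abs().mean().item())  # different layer sets shift the prediction
```

Different layer sets change the direction of that extra term, which is why varying the "layers" field alone can already produce noticeably different compositions.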
Possible Errors
Everything should go smoothly. However, if you encounter issues:
The SkipLayerGuidanceSD3 node is not showing; where can I find it?
The node may show up as a red rectangle in the workflow, and it may not be possible to find it using Search or the Manager. Update ComfyUI as described below to resolve this.
Error(s) in loading state_dict for OpenAISignatureMMDITWrapper: size mismatch for joint_blocks.0.x_block.adaLN_modulation.1.weight (CheckpointLoaderSimple):
Update ComfyUI, close the browser and all ComfyUI instances, and then restart. If the problem persists, you may need to reinstall the ComfyUI dependencies using update_comfyui_and_python_dependencies.bat.
SD 3.5 Medium: How It Compares to the Large Model
Surprisingly well, considering the different use cases for the two models. You may not get the best pictures right from the start; some tweaking is needed. The model is also quite fast, depending on your configuration.
In this example, you can see the difference in overall output between the Large and Medium models. SLG (SkipLayerGuidanceSD3) tends to oversaturate images with some loss of detail, so the CFG and ModelSampling values need to be tuned down. The examples were generated with fixed seeds 50-55 using simple Euler/sgm_uniform with 25 steps (I get better results with around 50 steps for more detail). Additionally, the dpmpp_2m sampler gives nice results for general use with the Medium model.
Note that in this case, I have used the strong token "pirate," yet the output is quite diverse, and you can clearly see different characters in the images.
Conclusion
Overall, SD 3.5 Medium is a good model to have, especially with a weaker GPU. The Medium model is not just a dumbed-down version of the Large model; it can produce impressive results that differ significantly from it, and it has a lot going in its favor.
Tinkering with the SkipLayerGuidanceSD3 node will add another option for output variations; it can correct an image or resurface an interesting style.
Resources
- Model Page, SD 3.5 Medium (SAI): https://huggingface.co/stabilityai/stable-diffusion-3.5-medium
- MMDiT-X Architecture: https://stability.ai/news/stable-diffusion-3-research-paper
- Comfy Announcement: https://blog.comfy.org/sd-35-medium/
- My workflows for SD 3.5.