Semantic Guidance SDXL: Instructing Diffusion using Semantic Dimensions
When engineering a prompt for Stable Diffusion, even small changes to the input often produce wildly different output images. This is especially true for heavily weighted tokens (keywords) such as a color, style, or environment. The result is heavy prompt tweaking and, in the end, many tiresome experiments. The composition is also at stake unless we take drastic ControlNet measures.
Luckily, we can now use Semantic Guidance to interact with the diffusion process through a convenient A1111 extension.
What It Does
Semantic Guidance (SEGA) steers the diffusion process using simple textual descriptions, with no additional segmentation masks required. It enables multiple simultaneous edits to an image, mitigates some biases of the trained model, and can make subtle changes to composition, style, and artistic conception. This is genuinely interesting research touching on the disentanglement of the models' latent space and on architecture-agnostic guidance; for the details, see the paper in the References at the end of this article.
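Under the hood, SEGA extends classifier-free guidance with an extra term derived from the edit prompt. A simplified sketch of the core equation from the paper (momentum omitted, notation lightly adapted):

$$
\hat{\epsilon}(z_t, c_p, c_e) = \epsilon_\theta(z_t) + s_g\big(\epsilon_\theta(z_t, c_p) - \epsilon_\theta(z_t)\big) + \gamma(z_t, c_e)
$$

Here $c_p$ is the main prompt and $c_e$ the edit (Semantic Guidance) prompt. The semantic term $\gamma$ keeps only the largest-magnitude elements of the edit direction $\epsilon_\theta(z_t, c_e) - \epsilon_\theta(z_t)$ and scales them by the edit guidance scale; that percentile cut-off is what the extension exposes as the Tail percentage threshold.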
Installing the A1111 Extension
In the A1111 web UI, go to Extensions / Available / Load from. In the list, find Semantic Guidance (sd-webui-semantic-guidance), click Install, and restart A1111.
How to Use It
The basic settings are shown in the image above. Warmup controls the step at which Semantic Guidance starts to take effect. The Semantic Guidance prompt steers the main prompt you have. In the example, the main prompt "couple mature, fashion clothes, character photo portrait" was adjusted with "Victorian elegant clothes costumes".
For the most common use, adjust the warmup steps and set the Tail percentage threshold to around 0.5. If you want the result to stay closer to your main prompt, lower the Edit Guidance Scale. Momentum Scale and Momentum Beta do not seem to have any effect (tested in SDXL).
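If you prefer scripting over the web UI, the same SEGA technique is available in Hugging Face diffusers as SemanticStableDiffusionPipeline (built on SD 1.x class models, not SDXL, as far as I know). A minimal sketch; the model ID and the values are placeholders, but the parameter names map directly onto the extension's sliders:

```python
import torch
from diffusers import SemanticStableDiffusionPipeline

# Load an SD 1.5 checkpoint; SEGA works on top of the base model.
pipe = SemanticStableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16,
).to("cuda")

image = pipe(
    prompt="couple mature, fashion clothes, character photo portrait",
    editing_prompt=["Victorian elegant clothes costumes"],  # the SEGA prompt
    edit_guidance_scale=5,    # "Edit Guidance Scale" in the extension
    edit_warmup_steps=10,     # "Warmup": steps before SEGA kicks in
    edit_threshold=0.5,       # "Tail percentage threshold"
    edit_momentum_scale=0.3,  # "Momentum Scale"
    edit_mom_beta=0.6,        # "Momentum Beta"
).images[0]
image.save("sega_portrait.png")
```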
Implanting LoRA with Semantic Guidance
You can also modify an image with a LoRA model applied through the Semantic Guidance extension, with the benefit that the overall composition is preserved. This is an interesting way to use various LoRA styles and transfer some of their features.
LoRA tags entered in the Semantic Guidance prompt have no effect (yet). You can work around this by putting a LoRA model's trigger keywords (activation words) in the Semantic Guidance prompt instead. This is an interesting alternative to adjusting LoRA weights in the main prompt (and you can combine both effects). Reminder: the LoRA tag itself must be placed in the main prompt, as in the sketch below.
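For illustration, assuming a hypothetical style LoRA named victorianStyleXL with the trigger word "victorianstyle", the split would look like this:

```
Main prompt:              couple mature, fashion clothes, character photo portrait <lora:victorianStyleXL:0.8>
Semantic Guidance prompt: victorianstyle, elegant costumes
```

The <lora:...> tag in the main prompt loads the model, while the trigger word in the Semantic Guidance prompt steers how strongly its style shapes the image.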
Comparison
It works very well with SDXL and also works with SD 1.5 models. Achieving similar results by other means would be very time- and resource-consuming. In combination with Regional Prompter or Latent Couple it could become even more powerful in the future (for now, you cannot target a single region).
Conclusion
Simplicity. This is the key appeal of this interesting technique. The extension also does not seem to get in the way of other A1111 add-ons, and it is a robust addition to your SD toolbox.
References
- Research code: https://github.com/ml-research/semantic-image-editing
- A1111 extension: https://github.com/v0xie/sd-webui-semantic-guidance
- Paper: SEGA: Instructing Text-to-Image Models using Semantic Guidance, https://arxiv.org/abs/2301.12247