Semantic Guidance SDXL: Instructing Diffusion using Semantic Dimensions

Semantic Guidance in A1111 using SDXL and SD 1.5 models

When engineering a prompt for Stable Diffusion, even small changes to the input prompt often result in wildly different output images. This is especially true for tokens (keywords) with high weight, typically a color, style, or environment. The result is heavy prompt adjustment and, in the end, many tiresome experiments. The composition is also at stake unless we take some heavy ControlNet measures.

Luckily, we can now use Semantic Guidance to interact with the diffusion process through a convenient A1111 extension.

Adjusting portraits in SDXL models using Semantic Guidance
Semantic Guidance using different prompts

What It Does

It uses simple textual descriptions for Semantic Guidance (SEGA) and needs no additional segmentation masks. This enables several simultaneous edits to an image, mitigates some biases of a trained model, and can make slight changes to composition and style, as well as to the artistic conception. The underlying research is very interesting: it touches on the disentanglement of a model's latent space and on architecture-agnostic quantifications. For details, see the paper in the References at the end of this article.
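To make the idea more concrete, here is a minimal NumPy sketch of the guidance term as we read it in the SEGA paper. It is not the extension's code. The function names `sega_edit_term` and `guided_noise`, the `keep_fraction` parameter, and the random arrays standing in for the model's noise estimates are all assumptions made for illustration.

```python
import numpy as np

def sega_edit_term(eps_uncond, eps_edit, edit_scale=5.0, keep_fraction=0.05, reverse=False):
    """Guidance term for one edit concept at one denoising step (illustrative)."""
    # Direction from the unconditioned estimate toward the edit concept.
    direction = eps_edit - eps_uncond
    if reverse:
        # Negative guidance pushes away from the concept instead.
        direction = -direction
    # Keep only the upper tail of magnitudes; this plays the role of a mask,
    # so no segmentation mask has to be supplied by the user.
    cutoff = np.quantile(np.abs(direction), 1.0 - keep_fraction)
    mask = np.abs(direction) >= cutoff
    return edit_scale * np.where(mask, direction, 0.0)

def guided_noise(eps_uncond, eps_prompt, eps_edit, cfg_scale=7.0, **edit_kwargs):
    """Classifier-free guidance plus the semantic edit term."""
    cfg = eps_uncond + cfg_scale * (eps_prompt - eps_uncond)
    return cfg + sega_edit_term(eps_uncond, eps_edit, **edit_kwargs)

# Random stand-ins for the three noise estimates a real model would produce.
rng = np.random.default_rng(0)
shape = (4, 8, 8)
eps_uncond, eps_prompt, eps_edit = (rng.normal(size=shape) for _ in range(3))

term = sega_edit_term(eps_uncond, eps_edit, edit_scale=5.0, keep_fraction=0.1)
print(np.count_nonzero(term), "of", term.size, "entries are edited")
```

Only a small share of the entries is touched, which is one plausible reason why the overall composition tends to survive the edit.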

Installation of A1111 Extension

In the A1111 web UI, go to Extensions/Available/Load from. In the list, find Semantic Guidance (sd-webui-semantic-guidance), click Install, and restart A1111.

Semantic Guidance in Stable Diffusion, A1111
Semantic Guidance Extension in the stack

How to Use It

The basic settings are shown in the image above. Warmup controls the step at which Semantic Guidance starts to take effect. The Semantic Guidance prompt steers the main prompt. In the example, the main prompt "couple mature, fashion clothes, character photo portrait" was adjusted with "Victorian elegant clothes costumes".
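The effect of Warmup can be pictured with a tiny Python sketch. It assumes that the edit is skipped for the first `warmup_steps` steps and applied on every step after that; the exact boundary convention inside the extension is not confirmed here, and `edit_is_active` is a hypothetical helper.

```python
def edit_is_active(step_index, warmup_steps):
    """True once the sampler has passed the warmup phase (assumed convention)."""
    return step_index >= warmup_steps

total_steps = 20
for warmup in (0, 5, 10):
    active = [i for i in range(total_steps) if edit_is_active(i, warmup)]
    print(f"warmup={warmup:2d}: edit applied on {len(active)} of {total_steps} steps")
```

Since the early steps largely decide the layout of the image, a higher warmup value should leave more of the original composition intact, at the price of a weaker edit.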

Semantic Guidance Tail Percentage and Warmup Steps grid
Edit Guidance Scale option in Semantic Guidance
Semantic Guidance in SDXL scene
SDXL and SD 1.5 models affected by Semantic Guidance, Total steps 20

For the most common use, adjust Warmup Steps and set the Tail Percentage Threshold to around 0.5. If you want the result to stay closer to your main prompt, lower the Edit Guidance Scale. Momentum Scale and Beta do not seem to have any effect (tested in SDXL).
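For reference, this is what the two momentum settings are meant to do according to our reading of the SEGA paper: a running average of past guidance is added to the current guidance, and it already builds up during warmup. The sketch below is an illustration under that assumption, not the extension's implementation; `run_momentum` and the constant toy edit terms are hypothetical.

```python
import numpy as np

def run_momentum(edit_terms, warmup_steps=5, momentum_scale=0.3, beta=0.6):
    """Return the guidance applied at each step, with momentum (illustrative)."""
    velocity = np.zeros_like(edit_terms[0])
    applied = []
    for step, term in enumerate(edit_terms):
        # Current guidance plus a scaled share of the accumulated momentum.
        term_with_momentum = term + momentum_scale * velocity
        # Momentum is updated on every step, including the warmup steps.
        velocity = beta * velocity + (1.0 - beta) * term_with_momentum
        if step >= warmup_steps:
            applied.append(term_with_momentum)
        else:
            applied.append(np.zeros_like(term))
    return applied

# Toy input: the same edit direction on each of 20 steps.
terms = [np.ones((2, 2)) for _ in range(20)]
plain = run_momentum(terms, momentum_scale=0.0)
boosted = run_momentum(terms, momentum_scale=0.5)
print("last step without momentum:", float(plain[-1].mean()))
print("last step with momentum:   ", float(boosted[-1].mean()))
```

In this toy setting momentum strengthens an edit that keeps pointing in the same direction. A similar side-by-side run with a fixed seed would be one way to check whether the sliders really do nothing in a given model.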

Implanting LoRA with Semantic Guidance

You can also modify an image with a LoRA model applied through the Semantic Guidance extension, which has the benefit of keeping the composition. This is an interesting way to use various LoRA styles and transfer some of their features.

LoRAs entered in Semantic Guidance prompts have no effect (yet). You can work around this by putting the LoRA's keyword tokens (activation words) in the Semantic Guidance prompt. This is an interesting alternative to adjusting LoRA weights in the main prompt (you can also combine both effects). Reminder: the LoRA itself still needs to be inserted into the main prompt.
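As a small illustration of that split, the Python helper below separates A1111-style `<lora:name:weight>` tags from the plain text of a prompt, so the tags can go into the main prompt and only the activation words remain for the Semantic Guidance prompt. The LoRA name `example_glow`, its trigger words, and the helper `split_lora_tags` are made up for this example.

```python
import re

# Matches A1111 extra-network tags such as <lora:example_glow:0.8>.
LORA_TAG = re.compile(r"<lora:[^>]+>")

def split_lora_tags(prompt):
    """Return (list of LoRA tags, prompt text with the tags removed)."""
    tags = LORA_TAG.findall(prompt)
    text = LORA_TAG.sub("", prompt)
    # Tidy up the comma-separated keywords left behind.
    cleaned = ", ".join(part.strip() for part in text.split(",") if part.strip())
    return tags, cleaned

# Hypothetical draft that mixes activation words with a LoRA tag.
draft = "glowing light, <lora:example_glow:0.8>"
tags, sg_prompt = split_lora_tags(draft)

main_prompt = "couple mature, fashion clothes, character photo portrait"
main_prompt = ", ".join([main_prompt] + tags)

print("Main prompt:             ", main_prompt)
print("Semantic Guidance prompt:", sg_prompt)
```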

Semantic Guidance and LoRA effects
Testing LoRA glow and style in Semantic Guidance prompt (20 steps)

Comparison

It responds very well in SDXL and also works with SD 1.5 models. Achieving similar results by other means would cost a lot of time and resources. In combination with Regional Prompter or Latent Couple it could become even more powerful in the future (for now you cannot target a single region).

Conclusion

Simplicity is the key to using this interesting technique. The extension also does not seem to get in the way of other A1111 add-ons. It is a very robust addition to your SD toolbox.

References

  • Research https://github.com/ml-research/semantic-image-editing
  • A1111 extension https://github.com/v0xie/sd-webui-semantic-guidance
  • Paper SEGA: Instructing Text-to-Image Models using Semantic Guidance https://arxiv.org/abs/2301.12247