Hyper-SD LoRAs: Trajectory Segmented Distillation for Better and Faster Outputs in FLUX.1/SDXL/SD Models

"Une Bonne Blague", by Daniel Sandner, 2024

Hyper-SD lets users of FLUX and Stable Diffusion models reduce the number of generation steps while maintaining, or even improving, image quality. The technique is available for FLUX.1 and for several Stable Diffusion models, including SDXL, SD3, and SD 1.5. The result is a faster and more efficient way to create high-quality images.

Using Hyper-SD LoRAs

Download

SD/SDXL note: use CFG Scale 1-1.5 (the 'CFG-lora' versions require a CFG Scale of 3-7).

A1111/Forge

Add the Hyper-SD LoRA to the prompt like any other LoRA, with a suitable weight (see the guide on how to install Forge).

The weight for Hyper FLUX is 0.125, for Hyper SDXL around 0.75, and for Hyper SD15 around 1.
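
As a minimal sketch of the prompt syntax, the helper below builds the `<lora:NAME:WEIGHT>` tag that A1111 and Forge read from the prompt, using the weights listed above. The LoRA file names are assumptions for illustration (use the name of the file you actually downloaded, without its extension), and `lora_tag` and `build_prompt` are hypothetical helpers, not part of either UI.

```python
# Suggested LoRA weights from this article, keyed by model family.
HYPER_WEIGHTS = {
    "flux": 0.125,
    "sdxl": 0.75,
    "sd15": 1.0,
}

# Assumed file names (without extension); replace with your downloaded files.
HYPER_LORA_NAMES = {
    "flux": "Hyper-FLUX.1-dev-8steps-lora",
    "sdxl": "Hyper-SDXL-8steps-lora",
    "sd15": "Hyper-SD15-8steps-lora",
}


def lora_tag(name: str, weight: float) -> str:
    """Return an A1111/Forge prompt tag such as <lora:my-lora:0.75>."""
    return f"<lora:{name}:{weight:g}>"


def build_prompt(text: str, family: str) -> str:
    """Append the Hyper-SD LoRA tag for the given model family to a prompt."""
    tag = lora_tag(HYPER_LORA_NAMES[family], HYPER_WEIGHTS[family])
    return f"{text} {tag}"


if __name__ == "__main__":
    print(build_prompt("three chefs in a crowded kitchen", "flux"))
```

The "around" values for SDXL and SD15 are starting points, so it may be worth adjusting the weight slightly for a given checkpoint.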

ComfyUI

The 1-step UNet checkpoint requires a specific scheduler node, while the LoRA versions use a standard sampler. Links to the original ComfyUI workflows and more information are in References. My test workflows are on GitHub, and a ComfyUI setup tutorial is available as well.

The Hyper-SD LoRAs for FLUX, SDXL, and SD are compatible with ControlNet.

"Un Trop-plein de Chefs", by Daniel Sandner, 2024 (FLUX+Hyper-SD)

Results

The results obtained using Hyper-SD are surprisingly impressive across all three model ecosystems tested: FLUX, SDXL, and SD 1.5. The SDXL UNet 1-step version did produce ugly noisy artifacts (and more stylized results), but these issues can be addressed with upscaling techniques and LoRAs. Hyper-SD in its LoRA forms truly excels at rendering complex compositions with numerous figures, where it sometimes seems to surpass the limitations of the base models, particularly in anatomical accuracy.

While speed might not be the primary concern when prioritizing image quality, the significant reduction in rendering time achieved with Hyper-SD becomes particularly valuable for FLUX (dev) models running locally, especially when using GPUs with limited VRAM. By eliminating approximately 5-10 steps per image, Hyper-SD can lead to substantial time savings.
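
To put a rough number on that, the saving scales linearly with the time each step takes. The sketch below assumes a fixed cost per sampling step and ignores model loading and VAE decoding; the 12 seconds per step is a made-up placeholder, not a measurement, and `seconds_saved` is a hypothetical helper.

```python
def seconds_saved(steps_saved: int, seconds_per_step: float, images: int = 1) -> float:
    """Estimate time saved, assuming every sampling step costs the same."""
    return steps_saved * seconds_per_step * images


if __name__ == "__main__":
    # Hypothetical low-VRAM GPU: 12 s per step (placeholder value).
    for steps in (5, 10):
        total = seconds_saved(steps, seconds_per_step=12.0, images=20)
        print(f"{steps} steps saved over 20 images: {total / 60:.0f} min")
```

Replacing the placeholder with the seconds-per-step figure from your own console output gives a quick estimate for a batch.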

FLUX (dev)

Hyper-FLUX LoRA comparison
Hyper-FLUX 8 steps LoRA, Steps: 9, Sampler: Euler, Schedule type: Simple, CFG scale: 1, Distilled CFG Scale: 3.5
Hyper FLUX, FLUX.1 low step output comparison
Hyper FLUX, 1-pass: Sampler: Euler, Schedule type: SGM Uniform, CFG scale: 1, Distilled CFG Scale: 2.5, Model: flux1-dev-bnb-nf4-v2

SDXL

Using Hyper-SD LoRAs may introduce some artifacts in a 1-pass workflow.

Hyper SDXL LoRA, Hyper-SD steps comparison, portrait of a couple
2-pass Hires fix, Hyper SDXL LoRA: Sampler: Euler, Schedule type: SGM Uniform, CFG scale: 1
Hyper SDXL low steps image generation, portrait composition
2-pass, CFG LoRA: Sampler: Euler, Schedule type: SGM Uniform, CFG scale: 3, Model: cinematix_v2, Denoising strength: 0.6, Hires upscale: 1.5, Hires upscaler: None
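
For readers who script their generations, the settings in the caption above could be expressed as a request body for the A1111 web API (`/sdapi/v1/txt2img`, available when the UI is started with `--api`). This is a sketch only: the step count, base resolution, and prompt are not given in the caption and are placeholders, the `scheduler` field is assumed to be supported (it exists only in newer A1111 builds), and Forge's API may differ. `hires_payload` and `final_size` are hypothetical helpers.

```python
from __future__ import annotations

import json


def hires_payload(prompt: str, steps: int, width: int = 1024, height: int = 1024) -> dict:
    """Build a txt2img request body mirroring the 2-pass CFG LoRA settings."""
    return {
        "prompt": prompt,
        "steps": steps,               # not stated in the caption; caller decides
        "cfg_scale": 3,               # CFG LoRA variants need CFG in the 3-7 range
        "sampler_name": "Euler",
        "scheduler": "SGM Uniform",   # assumed field; newer A1111 builds only
        "width": width,               # placeholder base resolution
        "height": height,
        "enable_hr": True,            # second pass (Hires fix)
        "hr_scale": 1.5,
        "hr_upscaler": "None",
        "denoising_strength": 0.6,
    }


def final_size(payload: dict) -> tuple[int, int]:
    """Resolution after the Hires pass scales the base image."""
    scale = payload["hr_scale"]
    return int(payload["width"] * scale), int(payload["height"] * scale)


if __name__ == "__main__":
    body = hires_payload("portrait of a couple", steps=8)
    print(json.dumps(body, indent=2))
    print("final size:", final_size(body))
```

The checkpoint (cinematix_v2 in the caption) is selected separately in the UI, so it is left out of the request body here.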

SD15

Hyper-SD LoRA test in Stable Diffusion 1.5 model
Sampler: Euler, Schedule type: SGM Uniform, CFG scale: 6, Model: photomatix_v3.fp16, Denoising strength: 0.55, Hires upscale: 1.5, Hires upscaler: None

Unet Versions (SDXL)

  • In Forge, put 'Hyper-SDXL-1step-Unet-Comfyui.fp16' into models/Stable-diffusion and use CFG = 1 for generations. It needs 2-3 steps to form an image and may require a 2-pass / Hires fix workflow to produce a usable result.
Hyper SDXL Unet model using Hyper-SD technique
Hyper SDXL Unet 1-step model in 'ForgeUI', Sampler: Euler a, Schedule type: SGM Uniform, CFG scale: 1

The 1-step SDXL UNet for ComfyUI requires installing the scheduler folder; see References for details.

Alternatively, the required TCDModelSamplingDiscrete node (ComfyUITCD) can be installed via ComfyUI Manager (drag and drop the HYPERXL-1stepUNET test images from my ComfyUI test workflows).

Conclusion

In conclusion, Hyper-SD LoRAs are a worthwhile addition for improving the outputs of Stable Diffusion models. The benefits are greatest for the FLUX dev model, where saving steps matters more because generation is slower. For very low step counts, however, FLUX (schnell) is the better alternative (see also this comparison with Flux Turbo LoRA).

By optimizing the generation process through trajectory segmentation, this technique offers a compelling way to achieve interesting image generation outputs. Its benefits extend to various applications, and it can even correct outputs in some sampler/scheduler/model combinations. Because it comes in LoRA form, it is also easy to use. A very low number of steps produces artifacts in SDXL, which take additional passes and upscaling to remove. It may still be worth it for achieving interesting compositions, and with the FLUX model it helps to get even better details.

References

"Un Dessert Mortel", by Daniel Sandner, 2024 (FLUX+Hyper-SD)