Hyper-SD LoRAs: Trajectory Segmented Distillation for Better and Faster Outputs in FLUX.1/SDXL/SD Models

Daniel Sandner September 11, 2024

"Une Bonne Blague", by Daniel Sandner, 2024 (Hyper-SD for FLUX and SDXL models experiment)

Hyper-SD lets FLUX and Stable Diffusion users reduce the number of generation steps while maintaining, and in some cases improving, image quality. The technique is available for several model families, including FLUX.1, SDXL, SD3, and SD 1.5. Fewer steps mean a faster and more efficient way to create high-quality images.

Using Hyper-SD LoRAs
Results
Unet Versions (SDXL)
Conclusion
References

Using Hyper-SD LoRAs

Download

Original: https://huggingface.co/ByteDance/Hyper-SD/tree/main
FP16 (half-size) version of Hyper-FLUX.1-dev-8steps-lora: https://huggingface.co/nakodanei/Hyper-FLUX.1-dev-8steps-lora-fp16/tree/main
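
For scripted setups, a file from the repositories above can be fetched by direct URL. The following is a minimal sketch that only builds such a URL, using the `resolve/main` pattern Hugging Face uses for file downloads. The file name is an assumption and should be checked against the repository file list before use.

```python
from urllib.parse import quote

HF_BASE = "https://huggingface.co"


def hf_file_url(repo_id: str, filename: str, revision: str = "main") -> str:
    """Build a direct-download URL for a file in a Hugging Face model repo."""
    return f"{HF_BASE}/{repo_id}/resolve/{quote(revision)}/{quote(filename)}"


# The file name below is an assumption; verify it in the repository listing.
url = hf_file_url("ByteDance/Hyper-SD", "Hyper-FLUX.1-dev-8steps-lora.safetensors")
print(url)
```

The resulting URL can be passed to any downloader; the file then goes into the LoRA folder of the UI you use.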

Note for SD 1.5 and SDXL: use a CFG scale of 1-1.5 (the 'CFG-lora' versions require a CFG scale of 3-7).
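
The CFG ranges from the note can be kept as a small lookup to sanity-check settings before a batch run. This is only a sketch: the variant labels `lora` and `cfg-lora` are illustrative names, not identifiers used by any UI.

```python
# CFG ranges taken from the note above; the keys are illustrative labels.
CFG_RANGES = {
    "lora": (1.0, 1.5),      # regular Hyper-SD LoRAs for SD 1.5 / SDXL
    "cfg-lora": (3.0, 7.0),  # 'CFG-lora' versions
}


def cfg_in_range(variant: str, cfg_scale: float) -> bool:
    """Return True if cfg_scale lies inside the suggested range for the variant."""
    low, high = CFG_RANGES[variant]
    return low <= cfg_scale <= high


print(cfg_in_range("lora", 1.0))
print(cfg_in_range("cfg-lora", 1.0))
```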

A1111/Forge

Add the Hyper-SD LoRA to the prompt like any other LoRA, with an appropriate weight (see the guide on how to install Forge).

The weight for Hyper FLUX is 0.125, for Hyper SDXL around 0.75, and for Hyper SD15 around 1.
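
In A1111 and Forge, a LoRA is added to the prompt with a tag of the form `<lora:name:weight>`, where the name is the LoRA file name without its extension. The sketch below builds such a tag from the weights listed above. The dictionary keys, the example prompt, and the LoRA file name are assumptions for illustration.

```python
# Starting weights from the text above; the keys are illustrative labels.
HYPER_WEIGHTS = {"flux": 0.125, "sdxl": 0.75, "sd15": 1.0}


def lora_tag(lora_name: str, family: str) -> str:
    """Return an A1111/Forge prompt tag of the form <lora:name:weight>."""
    weight = HYPER_WEIGHTS[family]
    return f"<lora:{lora_name}:{weight:g}>"


# The LoRA name is assumed to match the downloaded file name (without extension).
prompt = "a crowded kitchen full of chefs " + lora_tag("Hyper-FLUX.1-dev-8steps-lora", "flux")
print(prompt)
```

Treat the weights as starting points and adjust them per model.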

ComfyUI

The 1-step Unet checkpoint requires a specific scheduler node, while the LoRA versions use a standard sampler. Links to the original ComfyUI workflows, more information, and my test workflows on GitHub are in References. A separate ComfyUI setup tutorial is also available.

The LoRAs of Hyper-SD FLUX/SDXL/SD are compatible with ControlNet.

"Un Trop-plein de Chefs", by Daniel Sandner, 2024 (FLUX+Hyper-SD)

Results

The results obtained with Hyper-SD are impressive across all three model ecosystems tested: FLUX, SDXL, and SD 1.5. The SDXL Unet 1-step version tended to produce noisy artifacts (and more stylized results), but these issues can be addressed with upscaling techniques and LoRAs. Hyper-SD in its LoRA forms excels at rendering complex compositions with numerous figures, where it sometimes seems to surpass the limitations of the base models, particularly in anatomical accuracy.

While speed might not be the primary concern when prioritizing image quality, the significant reduction in rendering time achieved with Hyper-SD becomes particularly valuable for FLUX (dev) models running locally, especially when using GPUs with limited VRAM. By eliminating approximately 5-10 steps per image, Hyper-SD can lead to substantial time savings.
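
To get a feel for what saving 5-10 steps means in practice, the saving can be estimated as steps saved times seconds per step times number of images. The sketch below does this arithmetic; the seconds-per-step figure and the batch size are hypothetical values for a low-VRAM GPU, not measurements.

```python
def seconds_saved(steps_saved: int, seconds_per_step: float, images: int = 1) -> float:
    """Estimate time saved when each image needs fewer sampling steps."""
    return steps_saved * seconds_per_step * images


# seconds_per_step=6.0 and images=20 are hypothetical, not measured values.
for steps in (5, 10):
    saved = seconds_saved(steps, seconds_per_step=6.0, images=20)
    print(f"{steps} steps saved per image: about {saved / 60:.0f} minutes per batch")
```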

FLUX (dev)

Hyper-FLUX LoRA comparison — Hyper-FLUX 8 steps LoRA, Steps: 9, Sampler: Euler, Schedule type: Simple, CFG scale: 1, Distilled CFG Scale: 3.5

Hyper FLUX, FLUX.1 low step output comparison. Hyper FLUX, 1-pass, Sampler: Euler, Schedule type: SGM Uniform, CFG scale: 1, Distilled CFG Scale: 2.5, Model: flux1-dev-bnb-nf4-v2

SDXL

Using Hyper-SD LoRAs may introduce some artifacts in a 1-pass workflow.

Hyper SDXL low steps image generation, portrait composition — 2-pass, CFG LoRA: Sampler: Euler, Schedule type: SGM Uniform, CFG scale: 3, Model: cinematix_v2, Denoising strength: 0.6, Hires upscale: 1.5, Hires upscaler: None

SD15

Hyper-SD LoRA test in Stable Diffusion 1.5 model — Sampler: Euler, Schedule type: SGM Uniform, CFG scale: 6, Model: photomatix_v3.fp16, Denoising strength: 0.55, Hires upscale: 1.5, Hires upscaler: None
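
The settings in the captions above are written as comma-separated `Key: value` pairs. To compare runs side by side, such a line can be turned into a dictionary. This is a minimal sketch that assumes values contain no commas; it is not the parser used by A1111 or Forge.

```python
def parse_settings(line: str) -> dict[str, str]:
    """Split a 'Key: value, Key: value' settings line into a dictionary."""
    settings = {}
    for part in line.split(","):
        key, sep, value = part.partition(":")
        # Skip fragments that are not 'Key: value' pairs.
        if sep and value.strip():
            settings[key.strip()] = value.strip()
    return settings


caption = (
    "Sampler: Euler, Schedule type: SGM Uniform, CFG scale: 6, "
    "Model: photomatix_v3.fp16, Denoising strength: 0.55, "
    "Hires upscale: 1.5, Hires upscaler: None"
)
sd15 = parse_settings(caption)
print(sd15["CFG scale"], sd15["Denoising strength"])
```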

Unet Versions (SDXL)

In Forge, put 'Hyper-SDXL-1step-Unet-Comfyui.fp16' into models/Stable-diffusion and generate with CFG = 1. The model needs 2-3 steps to form an image and may require a 2-pass workflow (Hires fix) to produce a usable result.

Hyper SDXL Unet model using Hyper-SD technique — Hyper SDXL Unet 1-step model in 'ForgeUI', Sampler: Euler a, Schedule type: SGM Uniform, CFG scale: 1

The 1-step SDXL Unet for ComfyUI requires installing the scheduler folder; see References for details.

Alternatively, the required TCDModelSamplingDiscrete node (ComfyUITCD) can be installed via ComfyUI Manager (drag and drop the HYPERXL-1stepUNET test images from my ComfyUI test workflows).

Conclusion

In conclusion, Hyper-SD LoRAs are a useful addition for improving the outputs of Stable Diffusion models. The benefits are greatest for the FLUX dev model, where saving steps matters because generation is slower. For generations with a very low number of steps, however, FLUX (schnell) is a better alternative (see also the comparison with Flux Turbo LoRA).

By optimizing the generation process through trajectory segmentation, the technique offers a practical way to get interesting outputs in fewer steps. It can even correct outputs in some sampler/scheduler/model combinations, and as a LoRA it is easy to use. A very low number of steps produces artifacts in SDXL, which require additional passes and upscaling to remove. That may be worth it for achieving interesting compositions, and with the FLUX model it helps to get even better details.

References

Paper: Hyper-SD: Trajectory Segmented Consistency Model for Efficient Image Synthesis, https://hyper-sd.github.io/
ComfyUI workflows: https://huggingface.co/ByteDance/Hyper-SD/tree/main/comfyui, https://huggingface.co/ByteDance/Hyper-SD
Test workflows: https://github.com/sandner-art/ai-research/tree/main/HYPER-SD
