ELLA: Leveraging LLMs for Enhanced Semantic Alignment in SD 1.5
Stable Diffusion models rely on CLIP as their text encoder, which limits their ability to capture intricate relationships between the tokens of a prompt. The Efficient Large Language Model Adapter (ELLA) integrates Large Language Models with Stable Diffusion 1.5, enabling better rendering of complex scenes.
ELLA also extends prompting beyond English, so you can experiment with text prompts in other languages (though this may be limited in our basic workflow because of VRAM constraints). More importantly, ELLA lets you use complex, verbose prompts and negative prompts, similar to what is possible with SDXL. This translates into significantly better prompt following and coherence, especially when generating high-resolution images.
In this article, we will proceed in this order:
- Install StableSwarm UI
- Install ComfyUI Manager for it
- Download a workflow to test the nodes
- Install the missing custom nodes
- Download the ELLA model and the Flan-T5-XL encoder
- Run the workflow in the StableSwarm UI Comfy Workflow Editor
What You Need to Use the Workflow
StableSwarm UI or ComfyUI
Installation of StableSwarm UI for ComfyUI Workflows
You can install StableSwarm UI easily (see the download page, section 'Installing on Windows') by running the installation .bat file in the target Windows folder. If needed, read more about the installation in this recent article.
ComfyUI Manager
I assume you have git installed from the previous steps. You will need ComfyUI Manager, an important add-on for managing custom nodes in ComfyUI. To install it, go to your StableSwarm UI location, find the "\StableSwarmUI\dlbackend\comfy\ComfyUI\custom_nodes" folder, open a command prompt (cmd) there, and run:
git clone https://github.com/ltdrdata/ComfyUI-Manager.git
After restarting StableSwarm UI, you should see a 'Manager' button in the right menu.
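If the 'Manager' button does not appear, first check that the clone landed in the right folder. The following is a minimal Python sketch. It assumes that StableSwarm UI lives at a hypothetical root such as C:\StableSwarmUI and that git kept the repository's default folder name, ComfyUI-Manager. The helper names are illustrative only.

```python
from pathlib import Path


def custom_nodes_dir(swarm_root: Path) -> Path:
    # Layout from the article: StableSwarmUI\dlbackend\comfy\ComfyUI\custom_nodes
    return swarm_root / "dlbackend" / "comfy" / "ComfyUI" / "custom_nodes"


def manager_installed(swarm_root: Path) -> bool:
    # `git clone .../ComfyUI-Manager.git` creates a folder named after the repository
    return (custom_nodes_dir(swarm_root) / "ComfyUI-Manager").is_dir()


if __name__ == "__main__":
    # Hypothetical install location; replace with your own StableSwarmUI folder
    root = Path(r"C:\StableSwarmUI")
    print("ComfyUI Manager found:", manager_installed(root))
```

If this reports False, the clone most likely ran from a different folder and can simply be repeated inside custom_nodes.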
Loading a Workflow
In StableSwarm UI, go to 'Comfy Workflow Editor'. Load a workflow by dragging and dropping an image or a .json file containing the workflow into the workspace of the 'Comfy Workflow Editor'. When the workflow is loaded, open ComfyUI Manager (A in the picture) and click Install Missing Custom Nodes.
Installing Missing Custom Nodes
Check for and install missing custom nodes (if there are any) with the ComfyUI Manager tool. In ComfyUI Manager, 'Install Missing Custom Nodes' opens a list of the nodes that are missing from your installation.
In our workflow (on a fresh install), these will be Advanced CLIP Text Encode, UE Nodes, and ComfyUI_ELLA nodes.
Install them all, then RESTART the UI.
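You can also see which node types a workflow .json file references before loading it. The small Python sketch below reads them directly from the file. It assumes the two JSON layouts ComfyUI commonly saves: the editor format with a "nodes" list, and the API format keyed by node id. It is not how ComfyUI Manager itself detects missing nodes, and the function names are hypothetical.

```python
import json
import sys


def node_types(workflow: dict) -> set:
    """Return the node type names referenced by a ComfyUI workflow dict."""
    # Editor ("UI") format, also embedded in generated PNGs: {"nodes": [{"type": ...}, ...]}
    if isinstance(workflow.get("nodes"), list):
        return {node["type"] for node in workflow["nodes"] if "type" in node}
    # API format: {"<node id>": {"class_type": ..., "inputs": {...}}, ...}
    return {
        node["class_type"]
        for node in workflow.values()
        if isinstance(node, dict) and "class_type" in node
    }


def load_node_types(path: str) -> set:
    with open(path, encoding="utf-8") as f:
        return node_types(json.load(f))


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python list_nodes.py my_workflow.json  (file names are hypothetical)
    for name in sorted(load_node_types(sys.argv[1])):
        print(name)
```

Names that do not belong to core ComfyUI, such as the ELLA loader, point to the custom node packs you will need.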
Downloading LLM and ELLA models
You will also need to download the models and place them in the proper folders:
- ELLA model: save ella-sd1.5-tsc-t5xl.safetensors in your \models\ella folder
- Flan-T5-XL encoder only: download ALL files from this folder and save them to your \models\t5_model\flan-t5-xl-encoder-only-bf16 folder
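- Alternatively, open a command prompt (cmd) in the '\models\t5_model\' folder and run:
git clone https://huggingface.co/Kijai/flan-t5-xl-encoder-only-bf16
(the download is about 2.6 GB; Hugging Face stores large files with Git LFS, so make sure Git LFS is enabled, otherwise only small pointer files are downloaded).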
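Before running the workflow, you can verify the folder layout with a short Python sketch. It assumes that models_dir points at the \models\ folder your ELLA nodes read from; the exact root depends on your installation. It also flags Git LFS pointer files, which are tiny text files starting with a fixed version line, in case the clone fetched pointers instead of weights. The helper names are hypothetical.

```python
from pathlib import Path

# Expected locations from the article, relative to the models folder
ELLA_FILE = Path("ella") / "ella-sd1.5-tsc-t5xl.safetensors"
T5_DIR = Path("t5_model") / "flan-t5-xl-encoder-only-bf16"
# First bytes of a Git LFS pointer file (what you get if LFS was not active during the clone)
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def missing_models(models_dir: Path) -> list:
    missing = []
    if not (models_dir / ELLA_FILE).is_file():
        missing.append(str(ELLA_FILE))
    t5 = models_dir / T5_DIR
    # The encoder folder must exist and must not be empty
    if not t5.is_dir() or not any(t5.iterdir()):
        missing.append(str(T5_DIR))
    return missing


def lfs_pointers(folder: Path) -> list:
    found = []
    for f in sorted(folder.rglob("*")):
        # Real weight files are large; LFS pointers are tiny text files
        if f.is_file() and f.stat().st_size < 1024:
            with f.open("rb") as fh:
                if fh.read(len(LFS_POINTER_PREFIX)) == LFS_POINTER_PREFIX:
                    found.append(f.name)
    return found


if __name__ == "__main__":
    # Hypothetical location: the folder that should contain ella\ and t5_model\
    models = Path("models")
    print("Missing:", missing_models(models) or "nothing")
    t5 = models / T5_DIR
    if t5.is_dir():
        print("LFS pointers instead of weights:", lfs_pointers(t5) or "none")
```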
Running a Workflow
Run the loaded workflow with the 'Queue Prompt' button.
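'Queue Prompt' sends the workflow to the ComfyUI backend. For scripted runs, the same can be done over HTTP. The Python sketch below assumes that ComfyUI's server accepts POST /prompt with a body of the form {"prompt": <workflow>}, where the workflow is exported in ComfyUI's API format (the 'Save (API Format)' option). Port 8188 is ComfyUI's standalone default. StableSwarm UI starts its own ComfyUI backend, so the host and port there are assumptions you should check in your setup.

```python
import json
import urllib.request


def build_queue_request(workflow_api: dict, host: str = "127.0.0.1",
                        port: int = 8188) -> urllib.request.Request:
    # ComfyUI's HTTP server accepts POST /prompt with {"prompt": <API-format workflow>}
    body = json.dumps({"prompt": workflow_api}).encode("utf-8")
    return urllib.request.Request(
        f"http://{host}:{port}/prompt",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def queue_prompt(workflow_api: dict, **kwargs) -> dict:
    # Sends the request and returns the server's JSON reply
    with urllib.request.urlopen(build_queue_request(workflow_api, **kwargs)) as resp:
        return json.load(resp)
```

Example call, with a hypothetical file name: queue_prompt(json.load(open("ella_workflow_api.json")), port=8188). Building the request separately from sending it keeps the payload easy to inspect before anything reaches the server.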
Possible Errors with SentencePiece
If you encounter the error "Error occurred when executing LoadElla: T5Tokenizer requires the SentencePiece library but it was not found in your environment." or a similar one, go to ComfyUI Manager (see the image above, 'A'), click Install PIP packages ('B'), and enter sentencepiece into the text box. Restart the UI afterwards.
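To confirm that the package is visible, run this short Python check with the same interpreter ComfyUI uses. In StableSwarm UI, that is the Python bundled with its ComfyUI backend, not necessarily your system Python. The helper name is hypothetical. If the check prints False, `python -m pip install sentencepiece`, run with that same interpreter, is the manual alternative to the Manager button.

```python
import importlib.util
import sys


def has_module(name: str) -> bool:
    # find_spec returns None when the package cannot be imported from this interpreter
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    print("Python:", sys.executable)
    print("sentencepiece installed:", has_module("sentencepiece"))
```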
Examples
More examples and workflows are available at https://github.com/sandner-art/ai-research/tree/main/ELLA-Workflows/Test-Lab. The images are not retouched, inpainted, or upscaled.
Tips for Prompting and Settings
- Keep the general CLIP prompt simple, as you are used to with SD
- Start with simple ELLA positive and negative prompts
- Experiment with wordy ELLA prompts that describe many details and interactions
- Experiment with CLIP/ELLA overlays (the moment one ends and the other begins, and the stretch where they work together)
- Test various samplers, schedulers, step counts, and CFG values (more steps usually add detail)
- ELLA works with LCM, though mileage may vary depending on the checkpoint
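When experimenting with CLIP/ELLA overlays, it can help to see which sampling steps each prompt covers. The Python sketch below assumes the overlay is set with start/end fractions between 0 and 1, as ComfyUI's ConditioningSetTimestepRange node does. Whether your copy of the workflow uses that node is an assumption, the helper names are hypothetical, and mapping fractions to whole steps is only an approximation.

```python
import math


def active_steps(steps: int, start: float, end: float) -> range:
    # Approximation: start/end are treated as fractions of the step count
    # (ComfyUI actually maps such fractions onto the sampler's noise schedule)
    first = math.floor(start * steps + 0.5)
    last = math.floor(end * steps + 0.5)
    return range(first, last)


def overlay_plan(steps: int, clip_range: tuple, ella_range: tuple) -> dict:
    clip = set(active_steps(steps, *clip_range))
    ella = set(active_steps(steps, *ella_range))
    return {
        "clip_only": sorted(clip - ella),
        "both": sorted(clip & ella),
        "ella_only": sorted(ella - clip),
    }


if __name__ == "__main__":
    # Hypothetical settings: CLIP for the first 30% of steps, ELLA from 20% to the end
    plan = overlay_plan(20, (0.0, 0.3), (0.2, 1.0))
    for phase, covered in plan.items():
        print(f"{phase:>9}: {covered}")
```

With these example settings over 20 steps, CLIP alone drives steps 0 to 3, both prompts act on steps 4 and 5, and ELLA alone handles the rest.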
Conclusion
The Efficient Large Language Model Adapter (ELLA) proves to be a practical way to introduce a high degree of consistent detail into SD 1.5 models. However, ELLA is unlikely to be adapted for more resource-intensive SD models like SDXL and SD 3. These models already have their own solutions, and the ELLA workflow would likely require excessive VRAM, making it unsuitable for consumer-grade graphics cards.
The provided examples allow you to experiment with ELLA locally on a moderately powerful NVIDIA graphics card (if you want to test just the Flan-T5 LLM, read this article on Large Language Models). An additional benefit is the ease of installation with StableSwarm UI. This user interface offers the ComfyUI workflow system, enabling you to explore and develop interesting effects and techniques.
References
- ELLA: Equip Diffusion Models with LLM for Enhanced Semantic Alignment: https://github.com/TencentQQGYLab/ELLA
- ELLA model download: https://huggingface.co/QQGYLab/ELLA
- Flan-T5-XL encoder-only download: https://huggingface.co/Kijai/flan-t5-xl-encoder-only-bf16/tree/main
- My test folder and parameters: https://github.com/sandner-art/ai-research/tree/main/ELLA-Workflows
- Flan-T5: https://huggingface.co/google/flan-t5-xl
- Photomatix SD 1.5 model article; download the model from Civitai (you may use https://civitai.com/login?ref_code=AIR-XIP to log in and get some rendering credits)
- Original ELLA workflow with the encoder-only solution: https://civitai.com/posts/2098993, examples: https://civitai.com/posts/2111979