ELLA: Leveraging LLMs for Enhanced Semantic Alignment in SD 1.5

Daniel Sandner April 14, 2024

Stable Diffusion models rely on CLIP as a text encoder, which limits their ability to understand more intricate relationships between prompt tokens. The 'Efficient Large Language Model Adapter' (ELLA) integrates Large Language Models with Stable Diffusion 1.5, enabling better rendering of complex scenes.

ELLA also goes beyond English, so you can experiment with text prompts in other languages (though this may be limited in our basic workflow because of VRAM constraints). More importantly, ELLA lets you use complex, verbose prompts and negative prompts, similar to what is possible with SDXL. This can noticeably improve prompt adherence and coherence, especially when generating high-resolution images.

In this article, we will work in this order:

Install StableSwarm UI
Install ComfyUI Manager for it
Download a workflow to test nodes
Install missing custom nodes
Download ELLA model and Flan-T5-XL-encoder
Run the workflow in the StableSwarm UI Comfy Workflow Editor

What You Need to Use the Workflow
Running a Workflow
Possible Errors with SentencePiece
Examples
Tips for Prompting and Settings
Conclusion
References

What You Need to Use the Workflow

StableSwarm UI or ComfyUI

Installation of StableSwarm UI for ComfyUI Workflows

You can easily install StableSwarm UI by running the installation .bat file in the target Windows folder (see the download page, section 'Installing on Windows'). If needed, read more about the installation in this recent article.

ComfyUI Manager

I assume you have git installed from the previous steps. You will need this important add-on for ComfyUI to manage custom nodes. To install it, go to your StableSwarm UI location, find the "\StableSwarmUI\dlbackend\comfy\ComfyUI\custom_nodes" folder, and open a command prompt (cmd) there to run:

git clone https://github.com/ltdrdata/ComfyUI-Manager.git

After restarting StableSwarm UI, you should see a 'Manager' button in the right-hand menu.
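If the 'Manager' button does not appear, a quick check is whether the clone actually landed in the right folder. The sketch below assumes the default StableSwarm UI layout given above; `manager_installed` is a hypothetical helper, not part of either tool. It relies only on the fact that `git clone` of that URL creates a folder named after the repository, ComfyUI-Manager.

```python
from pathlib import Path

# Assumption: the default layout named in the text, relative to the
# StableSwarm UI root folder.
CUSTOM_NODES = Path("dlbackend") / "comfy" / "ComfyUI" / "custom_nodes"


def manager_installed(swarm_root: Path) -> bool:
    """True if `git clone` left a non-empty ComfyUI-Manager folder in custom_nodes."""
    manager_dir = Path(swarm_root) / CUSTOM_NODES / "ComfyUI-Manager"
    return manager_dir.is_dir() and any(manager_dir.iterdir())
```

For example, `manager_installed(Path(r"C:\StableSwarmUI"))`, where the install path is only a placeholder for your own. An empty folder counts as "not installed", since an interrupted clone can leave one behind.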

Loading a Workflow

In StableSwarm UI, go to 'Comfy Workflow Editor'. Load a workflow by dragging and dropping an image or a .json file containing the workflow into the workspace of the 'Comfy Workflow Editor'. When the workflow is loaded, open ComfyUI Manager (A in the picture) and click 'Install Missing Custom Nodes'.
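Dropping an image works because ComfyUI saves the node graph inside its PNG files as text chunks, typically under the keys "workflow" (the editor graph) and "prompt" (the executable graph). The sketch below reads those chunks with the standard library only, which can help when you want to inspect a downloaded example image before loading it. It is a simplified reader under stated assumptions: it handles only uncompressed tEXt chunks (not zTXt or iTXt), does not verify CRCs, and the function names are hypothetical.

```python
from __future__ import annotations

import json
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def read_png_text_chunks(data: bytes) -> dict[str, str]:
    """Map keyword -> text for every uncompressed tEXt chunk in a PNG file."""
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    texts: dict[str, str] = {}
    pos = len(PNG_SIGNATURE)
    while pos + 8 <= len(data):
        # Chunk layout: 4-byte big-endian length, 4-byte type, payload, 4-byte CRC.
        length, chunk_type = struct.unpack(">I4s", data[pos:pos + 8])
        payload = data[pos + 8:pos + 8 + length]
        if chunk_type == b"tEXt":
            # tEXt payload: keyword, a NUL separator, then Latin-1 text.
            key, _, text = payload.partition(b"\x00")
            texts[key.decode("latin-1")] = text.decode("latin-1")
        elif chunk_type == b"IEND":
            break
        pos += 12 + length  # skip header (8 bytes), payload, and CRC (4 bytes); CRC is not checked
    return texts


def extract_workflow(data: bytes) -> dict | None:
    """Return the embedded 'workflow' graph as a dict, or None if the PNG has none."""
    text = read_png_text_chunks(data).get("workflow")
    return json.loads(text) if text is not None else None
```

Usage: `extract_workflow(Path("example.png").read_bytes())`, with a file name of your choice. If it returns None, the image was probably re-saved by another program that dropped the metadata, and you will need the .json file instead.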

Installing Missing Custom Nodes

Check for and install missing custom nodes (if there are any) with the ComfyUI Manager tool: in ComfyUI Manager, 'Install Missing Custom Nodes' opens a list of the nodes that are missing from your installation.

In our workflow (if you have a fresh install) these will be Advanced CLIP Text Encode, UE Nodes, and ComfyUI_ELLA nodes.

Install them all and then RESTART the UI.

ComfyUI Manager and missing nodes — ComfyUI Manager window

Downloading LLM and ELLA models

You will also need to download the models and place them in the proper folders:

ELLA model: save ella-sd1.5-tsc-t5xl.safetensors in your \models\ella folder
Flan-T5-XL encoder only: download ALL files from this folder and save them to your \models\t5_model\flan-t5-xl-encoder-only-bf16 folder
Alternatively, open a command prompt (cmd) in the \models\t5_model\ folder and run git clone https://huggingface.co/Kijai/flan-t5-xl-encoder-only-bf16 (a 2.6 GB download; Git LFS must be installed, otherwise the large model file is not downloaded).
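Before running the workflow, it can save time to confirm that the files ended up where the ELLA nodes expect them. The sketch below checks the two locations listed above, relative to a models folder that is assumed to be the one your ComfyUI backend reads from. It also flags Git LFS pointer files: if you clone a Hugging Face repository without Git LFS, the large weight files are replaced by small text stubs. The `config.json` check is an assumption (standard Hugging Face model folders contain one), and all function names are hypothetical.

```python
from __future__ import annotations

from pathlib import Path

# Relative locations from the download instructions above.
ELLA_FILE = Path("ella") / "ella-sd1.5-tsc-t5xl.safetensors"
T5_DIR = Path("t5_model") / "flan-t5-xl-encoder-only-bf16"
# Git LFS pointer files are small text stubs that start with this line.
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer(path: Path) -> bool:
    """True if the file is an LFS pointer stub rather than real weights."""
    with path.open("rb") as f:
        return f.read(len(LFS_POINTER_PREFIX)) == LFS_POINTER_PREFIX


def missing_model_files(models_dir: Path) -> list[str]:
    """Return a list of problems; an empty list means the layout looks complete."""
    models_dir = Path(models_dir)
    problems: list[str] = []
    if not (models_dir / ELLA_FILE).is_file():
        problems.append(f"missing {ELLA_FILE.as_posix()}")
    t5_dir = models_dir / T5_DIR
    # Assumption: a complete Hugging Face model folder includes config.json.
    if not (t5_dir / "config.json").is_file():
        problems.append(f"incomplete {T5_DIR.as_posix()} (no config.json)")
    if t5_dir.is_dir():
        # Only .safetensors files are inspected for LFS stubs.
        for weights in sorted(t5_dir.glob("*.safetensors")):
            if is_lfs_pointer(weights):
                problems.append(f"LFS pointer instead of weights: {weights.name}")
    return problems
```

If a pointer is reported, install Git LFS and run `git lfs pull` inside the cloned folder, or download that file manually from the model page.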

Running a Workflow

Run the loaded workflow by clicking the 'Queue Prompt' button.
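Under the hood, 'Queue Prompt' submits the graph to the ComfyUI server with an HTTP POST to its `/prompt` endpoint, wrapped as `{"prompt": ...}`. The sketch below builds that request with the standard library, which can be handy for batch tests of ELLA prompts. Assumptions to note: the workflow must be exported in ComfyUI's API format (not the editor .json), and host and port are guesses (8188 is standalone ComfyUI's usual default; the backend that StableSwarm UI launches may listen elsewhere). `build_queue_request` is a hypothetical helper.

```python
import json
import urllib.request


def build_queue_request(api_workflow: dict, host: str = "127.0.0.1",
                        port: int = 8188) -> urllib.request.Request:
    """Build (but do not send) a POST /prompt request for a ComfyUI server."""
    body = json.dumps({"prompt": api_workflow}).encode("utf-8")
    return urllib.request.Request(
        f"http://{host}:{port}/prompt",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

To actually queue it, pass the request to `urllib.request.urlopen(req)`. On success the server answers with JSON that includes a `prompt_id`.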

Possible Errors with SentencePiece

If you encounter the error "Error occurred when executing LoadElla: T5Tokenizer requires the SentencePiece library but it was not found in your environment." or a similar one, open ComfyUI Manager (see the image above, 'A'), click 'Install PIP packages' ('B'), and enter sentencepiece in the text box. Restart the UI afterwards.
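The error appears when the Python interpreter running ComfyUI cannot import the `sentencepiece` module. A common pitfall is installing the package into a different Python (StableSwarm UI's ComfyUI backend may use its own bundled interpreter). The minimal check below reports whether the module is importable and names the interpreter it checked. Run it with the same Python executable that ComfyUI uses; the function names are hypothetical.

```python
import importlib.util
import sys


def module_available(name: str) -> bool:
    """True if `name` is importable by the Python interpreter running this code."""
    return importlib.util.find_spec(name) is not None


def sentencepiece_report() -> str:
    """One-line status naming the interpreter, since pip must install into that one."""
    if module_available("sentencepiece"):
        return f"sentencepiece is available to {sys.executable}"
    return f"sentencepiece is missing for {sys.executable}"


print(sentencepiece_report())
```

If it reports the module as missing, `<that interpreter> -m pip install sentencepiece` installs it into the right environment, as an alternative to the Manager's pip dialog.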

Examples

You can find more examples and workflows at https://github.com/sandner-art/ai-research/tree/main/ELLA-Workflows/Test-Lab. The images are not retouched, inpainted, or upscaled:

Character Composition Scenes: ELLA LLM for Enhanced Semantic Alignment (SD 1.5 rendering fantastic details, art by D. Sandner)

Group Composition Scenes: ELLA LLM for Enhanced Semantic Alignment (science fiction design, by Daniel Sandner)

Complex Scenes: ELLA LLM for Enhanced Semantic Alignment (realistic portrait, art by Sandner)

Tips for Prompting and Settings

Keep the general CLIP prompt simple, as you are used to with SD
Start with a simple ELLA positive and negative prompt
Experiment with wordy ELLA prompts that describe many details and interactions
Experiment with CLIP/ELLA overlays (the point where one ends and the other begins, and where they work together)
Test various samplers, schedulers, step counts, and CFG values (more steps usually add detail)
ELLA works with LCM, though mileage may vary depending on the checkpoint
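To reason about CLIP/ELLA overlays, it helps to see which sampling steps each conditioning covers. The sketch below assumes the overlay is expressed as start/end fractions of the sampling schedule, in the 0 to 1 form that ComfyUI's ConditioningSetTimestepRange node uses. Treating a fraction as a step index is a simplification: ComfyUI converts these fractions to noise levels, so real boundaries may not fall exactly on these steps. The function names and the example ranges are hypothetical, not settings from the workflow.

```python
from __future__ import annotations


def active_steps(total_steps: int, start: float, end: float) -> list[int]:
    """Sampling steps i whose position i / total_steps lies in [start, end)."""
    return [i for i in range(total_steps) if start <= i / total_steps < end]


def overlay_plan(total_steps: int, clip: tuple[float, float],
                 ella: tuple[float, float]) -> dict[str, list[int]]:
    """Which steps use CLIP conditioning, ELLA conditioning, or both."""
    clip_steps = active_steps(total_steps, *clip)
    ella_steps = active_steps(total_steps, *ella)
    both = sorted(set(clip_steps) & set(ella_steps))
    return {"clip": clip_steps, "ella": ella_steps, "both": both}


plan = overlay_plan(20, clip=(0.0, 0.5), ella=(0.25, 1.0))
print(plan["both"])  # [5, 6, 7, 8, 9]
```

With 20 steps, CLIP drives the first half (composition), ELLA takes over from a quarter of the way in, and the two overlap on steps 5 to 9. Shifting either boundary is a cheap way to explore the balance between the simple CLIP prompt and the detailed ELLA prompt.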

Conclusion

The Efficient Large Language Model Adapter (ELLA) proves to be a practical way to introduce a high degree of consistent detail into SD 1.5 models. However, ELLA is unlikely to be adapted for more resource-intensive SD models like SDXL and SD 3. These models already have their own solutions, and the ELLA workflow would likely require excessive VRAM, making it unsuitable for consumer-grade graphics cards.
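A back-of-envelope calculation puts the VRAM point in perspective. Assuming the 2.6 GB encoder download is almost entirely bf16 weights at 2 bytes per parameter (and 1 GB = 10^9 bytes), the encoder has roughly 1.3 billion parameters, which need about 2.6 GB in bf16 and about 5.2 GB in fp32, before counting activations, the SD 1.5 model itself, or framework overhead. The helper names below are hypothetical.

```python
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "bf16": 2}


def params_from_checkpoint(size_gb: float, dtype: str) -> float:
    """Rough parameter count implied by a checkpoint size (1 GB = 1e9 bytes)."""
    return size_gb * 1e9 / BYTES_PER_PARAM[dtype]


def weights_gb(num_params: float, dtype: str) -> float:
    """Memory needed just to hold the weights, ignoring activations and overhead."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


t5_params = params_from_checkpoint(2.6, "bf16")  # the 2.6 GB encoder-only download
print(f"~{t5_params / 1e9:.1f}B params; bf16 {weights_gb(t5_params, 'bf16'):.1f} GB, "
      f"fp32 {weights_gb(t5_params, 'fp32'):.1f} GB")
```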

The provided examples let you experiment with ELLA locally on a moderately powerful NVIDIA graphics card (if you want to test just the Flan-T5 LLM, read this article on Large Language Models). An additional benefit is the ease of installation with StableSwarm UI, which offers the ComfyUI workflow system so you can explore and develop interesting effects and techniques.

References

ELLA: Equip Diffusion Models with LLM for Enhanced Semantic Alignment: https://github.com/TencentQQGYLab/ELLA
Download the ELLA model: https://huggingface.co/QQGYLab/ELLA
Download the Flan-T5-XL encoder only: https://huggingface.co/Kijai/flan-t5-xl-encoder-only-bf16/tree/main
My test folder and parameters: https://github.com/sandner-art/ai-research/tree/main/ELLA-Workflows
Flan-T5: https://huggingface.co/google/flan-t5-xl
Photomatix SD 1.5 model article; download the model from Civitai (you may use https://civitai.com/login?ref_code=AIR-XIP to log in and get some rendering credits)
Original ELLA workflow with the encoder-only solution: https://civitai.com/posts/2098993, examples: https://civitai.com/posts/2111979
