ELLA: Leveraging LLMs for Enhanced Semantic Alignment in SD 1.5

Stable Diffusion models rely on CLIP as a text encoder, which limits their ability to understand more intricate relationships between prompt tokens. The 'Efficient Large Language Model Adapter' (ELLA) connects a Large Language Model to Stable Diffusion 1.5, enabling better rendering of more complex scenes.

ELLA also extends prompting beyond English, so you can experiment with text prompts in various languages (though this may be limited in our basic workflow because of VRAM constraints). More importantly, ELLA lets you use complex, verbose prompts and negative prompts, similar to what is possible with SDXL. This translates into noticeably better coherence, especially when generating high-resolution images.

In this article, we will work in this order:

  • Install StableSwarm UI
  • Install ComfyUI Manager for it
  • Download a workflow to test nodes
  • Install missing custom nodes
  • Download the ELLA model and the Flan-T5-XL encoder
  • Run the workflow in the StableSwarm UI Comfy Workflow Editor

What You Need to Use the Workflow

StableSwarm UI or ComfyUI

Installation of StableSwarm UI for ComfyUI Workflows

You can easily install StableSwarm UI (link to download, section Installing on Windows) by running the installation .bat file in the target Windows folder. If needed, read more about the installation in this recent article.

ComfyUI Manager

I assume you have git installed from the previous steps. You will need this important add-on for ComfyUI to manage various custom nodes. To install it, go to your StableSwarm UI location, find the "\StableSwarmUI\dlbackend\comfy\ComfyUI\custom_nodes" folder, and run this command in [cmd] from there:

git clone https://github.com/ltdrdata/ComfyUI-Manager.git

After restarting StableSwarm UI, you should have a 'Manager' button in the right menu.
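If the 'Manager' button does not appear, a quick check of whether the clone landed in the right folder can help. The following is a minimal sketch, not part of StableSwarm UI itself; the function name manager_installed and the example install location are assumptions for illustration, and the folder layout is taken from the path given above.

```python
from pathlib import Path


def manager_installed(swarm_root: Path) -> bool:
    """Return True if a ComfyUI-Manager folder exists under custom_nodes.

    swarm_root is the StableSwarmUI folder (hypothetical parameter name).
    The sub-path mirrors the one quoted in this article.
    """
    custom_nodes = swarm_root / "dlbackend" / "comfy" / "ComfyUI" / "custom_nodes"
    return (custom_nodes / "ComfyUI-Manager").is_dir()


# Example location only (assumption): replace with your own install folder.
example_root = Path("C:/StableSwarmUI")
print("ComfyUI-Manager found:", manager_installed(example_root))
```

If this prints False, the repository was most likely cloned from a different folder than custom_nodes.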

Loading a Workflow

In StableSwarm UI, go to 'Comfy Workflow Editor'. Load a workflow by dragging and dropping an image or a .json file containing the workflow into the workspace of the 'Comfy Workflow Editor'. When the workflow is loaded, go to ComfyUI Manager (A in the picture) and click Install Missing Custom Nodes.

Installing Missing Custom Nodes

Check for and install missing custom nodes (if there are any) with the ComfyUI Manager tool. In ComfyUI Manager, 'Install Missing Custom Nodes' opens a list of nodes that are missing from your installation.

In our workflow (if you have a fresh install), these will be the Advanced CLIP Text Encode, UE Nodes, and ComfyUI_ELLA nodes.

Install them all and then RESTART the UI.

[Figures: ComfyUI Manager and missing nodes; ComfyUI Manager window]

Downloading LLM and ELLA models

You will also need to download the models and place them in the proper folders:

  • ELLA model: save ella-sd1.5-tsc-t5xl.safetensors in your \models\ella folder
  • Flan-T5-XL Encoder only: download ALL files from this folder and save them to your \models\t5_model\flan-t5-xl-encoder-only-bf16 folder
  • Alternatively, you can run git clone in [cmd] from the '\models\t5_model\' folder: git clone https://huggingface.co/Kijai/flan-t5-xl-encoder-only-bf16 (the download is 2.6GB)
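Before running the workflow, it may be worth confirming that the files ended up where the list above says they should. The sketch below is an illustration under assumptions: the function names and the models_dir parameter are hypothetical, and it only checks that the T5 folder is non-empty rather than looking for specific file names. It also flags Git LFS pointer files, which are small text stubs that a git clone can leave behind instead of the real weights when Git LFS is not installed.

```python
from pathlib import Path

# Git LFS pointer files start with this line instead of binary weights.
LFS_MAGIC = b"version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer(path: Path) -> bool:
    """Return True if the file looks like a Git LFS pointer stub."""
    if path.stat().st_size > 1024:  # real model files are far larger
        return False
    with path.open("rb") as handle:
        return handle.read(len(LFS_MAGIC)) == LFS_MAGIC


def check_models(models_dir: Path) -> list[str]:
    """Return a list of human-readable problems; empty means the layout looks fine.

    models_dir is your \\models folder (hypothetical parameter name).
    """
    problems = []
    ella = models_dir / "ella" / "ella-sd1.5-tsc-t5xl.safetensors"
    t5_dir = models_dir / "t5_model" / "flan-t5-xl-encoder-only-bf16"

    if not ella.is_file():
        problems.append(f"missing ELLA model: {ella}")

    if not t5_dir.is_dir():
        problems.append(f"missing T5 folder: {t5_dir}")
    else:
        files = [p for p in t5_dir.iterdir() if p.is_file()]
        if not files:
            problems.append(f"T5 folder is empty: {t5_dir}")
        for p in files:
            if is_lfs_pointer(p):
                problems.append(f"Git LFS pointer, not a real file: {p}")
    return problems
```

Calling check_models(Path(...)) with your own models folder and printing the returned list should show at a glance whether anything is missing.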

Running a Workflow

Run the opened workflow by clicking the 'Queue Prompt' button.

Possible Errors with SentencePiece

If you encounter the error "Error occurred when executing LoadElla: T5Tokenizer requires the SentencePiece library but it was not found in your environment." or similar, go to ComfyUI Manager (see image above, 'A'), click Install PIP packages ('B'), and enter sentencepiece into the text box. Restart the UI afterwards.
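To confirm whether the package is actually visible to Python, a short check like the one below can be used. This is only a sketch: the helper name is_installed is an assumption, and the result is meaningful only when the script is run with the same Python interpreter that ComfyUI uses, which may differ from your system Python.

```python
import importlib.util
import sys


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be found by this interpreter."""
    return importlib.util.find_spec(module_name) is not None


# Shows which interpreter ran the check, so you can compare it with ComfyUI's.
print("Python interpreter:", sys.executable)
print("sentencepiece available:", is_installed("sentencepiece"))
```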

Examples

You can find more examples and workflows here: https://github.com/sandner-art/ai-research/tree/main/ELLA-Workflows/Test-Lab. The images are not retouched, inpainted, or upscaled.

Tips for Prompting and Settings

  • Keep the general CLIP prompt simple, as you are used to with SD
  • Start with simple ELLA positive and negative prompts
  • Experiment with wordy descriptions in ELLA prompts, describing a lot of details and interactions
  • Experiment with CLIP/ELLA overlays (the moment one ends and the other begins, and when they work together)
  • Test various samplers, schedulers, steps, and CFG values (more steps usually add details)
  • ELLA works with LCM, but results may vary depending on the checkpoint
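To reason about CLIP/ELLA overlays, it can help to translate fractional windows into concrete sampler steps. The sketch below is plain arithmetic for illustration only: the function names, the fractions, and the step count are all assumptions, and it does not describe how the workflow nodes schedule the two conditionings internally.

```python
def step_window(total_steps: int, start: float, end: float) -> range:
    """Convert a fractional window (0.0 to 1.0) into a range of step indices."""
    if not 0.0 <= start <= end <= 1.0:
        raise ValueError("start and end must satisfy 0.0 <= start <= end <= 1.0")
    return range(round(start * total_steps), round(end * total_steps))


def overlap(a: range, b: range) -> range:
    """Return the steps shared by both windows (empty range if none)."""
    return range(max(a.start, b.start), min(a.stop, b.stop))


# Hypothetical settings: CLIP guides the first 40% of 30 steps,
# ELLA takes over from 30% onwards, so both are active in between.
clip_steps = step_window(30, 0.0, 0.4)
ella_steps = step_window(30, 0.3, 1.0)
shared = overlap(clip_steps, ella_steps)
print("CLIP steps:", list(clip_steps))
print("ELLA steps:", list(ella_steps))
print("Both active:", list(shared))
```

Widening or narrowing the shared window is one way to think about how strongly the two prompts blend.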

Conclusion

The Efficient Large Language Model Adapter (ELLA) proves to be a practical way to introduce a high degree of consistent detail into SD 1.5 models. However, ELLA is unlikely to be adapted for more resource-intensive SD models like SDXL and SD 3. These models already have their own solutions, and the ELLA workflow would likely require excessive VRAM, making it unsuitable for consumer-grade graphics cards.

The provided examples allow you to experiment with ELLA locally on a moderately powerful NVIDIA graphics card (if you want to test just the Flan-T5 LLM, read this article on Large Language Models). An additional benefit is the ease of installation with StableSwarm UI. This user interface offers the ComfyUI workflow system, enabling you to explore and develop interesting effects and techniques.
