TensorRT Acceleration for RTX: Latest A1111 Extension from NVIDIA Adds SDXL Support and Higher Resolutions
The new NVIDIA TensorRT extension has come to the A1111 web UI. Does it deliver on its promise to double speeds, even for SDXL models? We will compare the latest NVIDIA extension with earlier solutions, and also take a look at possible issues with installation and rendering.
In a recent article about TensorRT technology, I explored the creative possibilities it offers. The TensorRT extension builds a custom model for your system, derived from any selected SD model. When activated via SD Unet, it essentially doubles the speed of your renderings, albeit with some caveats.
One Click Install: How to Make It Work
I suggest installing a new A1111 instance into a separate folder for your experiments.
Installation should be very simple: install the new extension via Extensions/Install from URL (https://github.com/NVIDIA/Stable-Diffusion-WebUI-TensorRT) in A1111, then apply and restart the webui (the downloads will take some time).
If that works for you, skip this section. However, you may run into missing libraries (.dll errors) or a missing TensorRT tab. If this happens, this short guide should help.
- IMPORTANT NOTE: do the steps EXACTLY as described below
- Install the latest NVIDIA driver for your RTX card
- Go to your sd folder
- In the extensions folder, delete the stable-diffusion-webui-tensorrt folder if it exists from a former installation
- Delete the venv folder
- From the command prompt, run the webui to create a new venv folder (this will take some time). After the A1111 UI starts, close it and close the terminal with the command prompt.
- Open a command terminal again in your stable-diffusion-webui/venv/Scripts folder
- Enter activate
- On the venv line in the command prompt, enter these commands one after another, always waiting until each process finishes (this may take a while):
python.exe -m pip install --upgrade pip
python -m pip install nvidia-cudnn-cu11==8.9.4.25 --no-cache-dir
python -m pip install --pre --extra-index-url https://pypi.nvidia.com/ tensorrt==9.0.1.post11.dev4 --no-cache-dir
python -m pip uninstall -y nvidia-cudnn-cu11
deactivate
- Now run webui and install the extension from URL https://github.com/NVIDIA/Stable-Diffusion-WebUI-TensorRT
- Set sd_unet in Settings/User interface/Quicksettings list for control of the RT models
- Apply and restart
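If you want to confirm that the TensorRT wheel actually landed in the venv, here is a minimal sanity check, assuming the pip steps above succeeded (run it from the activated venv):

```python
# Minimal sanity check, run inside the activated venv.
# If the pip install above worked, this prints the TensorRT version
# (a 9.0.x build, given the pinned package above).
import tensorrt
print(tensorrt.__version__)
```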
What We Can Do With This
Compared to the earlier experimental TensorRT extension, NVIDIA's new version introduces additional options but also imposes certain limitations. Pros:
- SDXL support
- Working support for higher resolutions in SD models
- Fast TensorRT model generation
- Ease of use and installation (with some hiccups, see above)
Rendering resolutions for TensorRT models must be divisible by 64: 512, 576, 640, 704, 768, 832, 896 (the highest resolution I could build with an SD 1.5 model in dynamic mode on an A4000). SDXL resolution is 1024x1024.
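The divisibility rule is easy to check up front. A trivial sketch (the snap_to_64 helper is hypothetical, not part of the extension):

```python
# Hypothetical helper: snap a target dimension down to the nearest
# multiple of 64, since the TensorRT engines only accept such resolutions.
def snap_to_64(size: int) -> int:
    return (size // 64) * 64

for target in (512, 600, 768, 900, 1024):
    print(target, "->", snap_to_64(target))  # e.g. 900 -> 896
```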
Speed Boost for SDXL and SD
Here is a speed comparison (A4000, same scene, 20 steps, xformers; times in m:ss):
| | Euler a | NVIDIA RT | DPM++ SDE | NVIDIA RT |
| --- | --- | --- | --- | --- |
| SDXL 1024x1024 (batch 1) | 0:15 | 0:08 | 0:30 | 0:15 |
| SDXL 1024x1024 (batch 1) + Refiner 0.8 | 0:23 | 0:22 | 0:35 | 0:34 |
| SDXL 1024x1024 (batch 2) | 0:31 | 0:16 | 0:38 | 0:20 |
| SDXL 1024x1024 (batch 2) + Refiner 0.8 | 0:39 | 0:32 | 1:02 | 0:50 |
The speed boost is undeniable. But as the table shows, the gain becomes negligible in some situations when using the SDXL refiner, since the refiner itself is not accelerated (see Downsides below).
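For a rough sense of the scale, here is a throwaway calculation of the speedup from the table above (plain Python; the m:ss values are copied from the batch 2 row):

```python
# Convert the m:ss timings from the table and compute the speedup ratio.
def to_seconds(t: str) -> int:
    minutes, seconds = t.split(":")
    return int(minutes) * 60 + int(seconds)

baseline, accelerated = "0:31", "0:16"  # SDXL 1024x1024 (batch 2), Euler a
print(f"{to_seconds(baseline) / to_seconds(accelerated):.2f}x")  # ~1.94x
```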
There is clear potential for iterative rendering and testing, where the time savings add up quickly.
Downsides
- Even if you successfully build a model, it may not work for the resolution and batch settings you need (dynamic models). You can experiment, since model generation is fast
- Token limits in negative prompts
- SDXL refiner is not optimizable (yet)
- Resolution limits
- Creating variations with "baked" LoRAs is not possible in the NVIDIA version of the extension
- Occasional webui crashes when switching models or overflowing the token limit
- Limited LoRA use
You may not encounter (or rather, notice) resolution issues if you have a card with plenty of VRAM. In my tests, however, I was not able to generate models with the batch/resolution parameters I use without TensorRT.
LoRA support is experimental, but with TensorRT LoRAs there can be an issue with effective backups of models and individual LoRAs: you may simply forget about the conversions and need to generate them again (if you reinstall A1111 in your future endeavours).
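One workaround is to back up the converted files alongside your checkpoints. A minimal sketch, assuming the layout on my install where the extension writes engines to models/Unet-trt and ONNX exports to models/Unet-onnx (these paths are an assumption, adjust to your setup):

```python
# Hypothetical backup sketch: copy converted TensorRT files out of the
# webui tree so a reinstall does not force you to rebuild them.
import shutil
from pathlib import Path

webui = Path("stable-diffusion-webui")   # adjust to your install location
backup = Path("trt-backup")

for folder in ("models/Unet-trt", "models/Unet-onnx"):
    src = webui / folder
    if src.exists():
        shutil.copytree(src, backup / folder, dirs_exist_ok=True)
        print(f"backed up {src}")
```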
It would also be great if the UI somehow showed the limits (batch/resolution) of TensorRT models.
Conclusion
TensorRT acceleration is a must-have if you use RTX cards, especially for SDXL with larger batch counts.
It may not fit into every workflow, but the speed boost is considerable. The extension is also under active development, so some of these issues may be solved in the near future; I also hope it will incorporate some options from the former TensorRT extension.