TensorRT Acceleration for RTX: Latest A1111 Extension from NVIDIA Adds SDXL Support and Higher Resolutions
The new NVIDIA TensorRT extension has come to the A1111 web UI. Does it deliver on its promise to double speeds, even for SDXL models? We will compare the latest NVIDIA extension with earlier solutions, and also take a look at possible issues with installation and rendering.
In a recent article about TensorRT technology, I explored the creative possibilities it offers. The TensorRT extension builds a custom model for your system, derived from any selected SD model. When activated via SD Unet, it essentially doubles the speed of your renderings, albeit with some caveats.
One Click Install: How to Make It Work
I suggest installing a new A1111 instance into a separate folder for your experiments.
Installation should be very simple: install the new extension via Extensions/Install from URL (https://github.com/NVIDIA/Stable-Diffusion-WebUI-TensorRT) in A1111, then apply and restart the webui (the downloads will take some time).
If that works for you, skip this section. However, you may run into missing libraries (.dll errors) or a missing TensorRT tab. If this happens, this short guide should help.
- IMPORTANT NOTE: do the steps EXACTLY as described below
- Install the latest NVIDIA driver for your RTX card
- Go to your sd folder
- In the extensions folder, delete the stable-diffusion-webui-tensorrt folder if it exists from a former installation
- Delete the venv folder
- From the command prompt, run the webui to create a new venv folder (this will take some time). After the A1111 UI starts, close it and close the terminal with the command prompt.
- Open a command terminal again in your stable-diffusion-webui/venv/Scripts folder
- Enter activate
- On the venv line in the command prompt, enter these commands one after another, always waiting until each process finishes (this may take a while):
python.exe -m pip install --upgrade pip
python -m pip install nvidia-cudnn-cu11==8.9.4.25 --no-cache-dir
python -m pip install --pre --extra-index-url https://pypi.nvidia.com/ tensorrt==9.0.1.post11.dev4 --no-cache-dir
python -m pip uninstall -y nvidia-cudnn-cu11
deactivate
- Now run webui and install the extension from URL https://github.com/NVIDIA/Stable-Diffusion-WebUI-TensorRT
- Set sd_unet in Settings/User interface/Quicksettings list for control of the RT models
- Apply and restart
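If you want to confirm that the TensorRT wheel actually landed in the venv, here is a minimal sanity check, assuming the pip steps above succeeded (run it from the activated venv):

```python
# Minimal sanity check, run inside the activated venv.
# If the pip install above worked, this prints the TensorRT version
# (a 9.0.x build, given the pinned package above).
import tensorrt
print(tensorrt.__version__)
```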
What We Can Do With This
Compared to the earlier experimental TensorRT extension, NVIDIA's new version introduces additional options but also imposes certain limitations. Pros:
- SDXL support
- Working support for higher resolutions in SD models
- Fast TensorRT model generation
- Ease of use and installation (with some hiccups, see above)
Rendering resolutions for TensorRT models must be divisible by 64: 512, 576, 640, 704, 768, 832, 896 (the highest resolution I could build with an SD 1.5 model in dynamic mode on an A4000). SDXL resolution is 1024x1024.
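The divisibility rule is easy to check up front. A trivial sketch (the snap_to_64 helper is hypothetical, not part of the extension):

```python
# Hypothetical helper: snap a target dimension down to the nearest
# multiple of 64, since the TensorRT engines only accept such resolutions.
def snap_to_64(size: int) -> int:
    return (size // 64) * 64

for target in (512, 600, 768, 900, 1024):
    print(target, "->", snap_to_64(target))  # e.g. 900 -> 896
```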
Speed Boost for SDXL and SD
Here is a speed comparison (A4000, same scene, 20 steps, xformers; times in m:ss):
| | Euler a | NVIDIA RT | DPM++ SDE | NVIDIA RT |
| --- | --- | --- | --- | --- |
| SDXL 1024x1024 (batch 1) | 0:15 | 0:08 | 0:30 | 0:15 |
| SDXL 1024x1024 (batch 1) + Refiner 0.8 | 0:23 | 0:22 | 0:35 | 0:34 |
| SDXL 1024x1024 (batch 2) | 0:31 | 0:16 | 0:38 | 0:20 |
| SDXL 1024x1024 (batch 2) + Refiner 0.8 | 0:39 | 0:32 | 1:02 | 0:50 |
The speed boost is undeniable. But as the table shows, the gain becomes negligible in some situations when using the SDXL refiner, since the refiner itself is not accelerated (see Downsides below).
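For a rough sense of the scale, here is a throwaway calculation of the speedup from the table above (plain Python; the m:ss values are copied from the batch 2 row):

```python
# Convert the m:ss timings from the table and compute the speedup ratio.
def to_seconds(t: str) -> int:
    minutes, seconds = t.split(":")
    return int(minutes) * 60 + int(seconds)

baseline, accelerated = "0:31", "0:16"  # SDXL 1024x1024 (batch 2), Euler a
print(f"{to_seconds(baseline) / to_seconds(accelerated):.2f}x")  # ~1.94x
```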
There is clear potential for iterative rendering and testing, where the time savings add up quickly.
Downsides
- Even if you successfully build a model, it may not work for the resolution and batch settings you need (dynamic models). You can experiment, since model generation is fast
- Token limits in negative prompts
- SDXL refiner is not optimizable (yet)
- Resolution limits
- Creating variations with "baked" LoRAs is not possible in the NVIDIA version of the extension
- Occasional webui crashes when switching models or overflowing the token limit
- Limited LoRA use
You may not encounter (or rather, notice) resolution issues if you have a card with plenty of VRAM. In my tests, however, I was not able to generate models with the batch/resolution parameters I use without TensorRT.
LoRA support is experimental, but with TensorRT LoRAs there can be an issue with effective backups of models and individual LoRAs: you may simply forget about the conversions and need to generate them again (if you reinstall A1111 in your future endeavours).
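One workaround is to back up the converted files alongside your checkpoints. A minimal sketch, assuming the layout on my install where the extension writes engines to models/Unet-trt and ONNX exports to models/Unet-onnx (these paths are an assumption, adjust to your setup):

```python
# Hypothetical backup sketch: copy converted TensorRT files out of the
# webui tree so a reinstall does not force you to rebuild them.
import shutil
from pathlib import Path

webui = Path("stable-diffusion-webui")   # adjust to your install location
backup = Path("trt-backup")

for folder in ("models/Unet-trt", "models/Unet-onnx"):
    src = webui / folder
    if src.exists():
        shutil.copytree(src, backup / folder, dirs_exist_ok=True)
        print(f"backed up {src}")
```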
It would also be great if the UI somehow showed the limits (batch/resolution) of TensorRT models.
Conclusion
TensorRT acceleration is a must-have if you use RTX cards, especially for SDXL with larger batch counts.
It may not fit into every workflow, but the speed boost is considerable. The extension is also under active development, so some of these issues may be solved in the near future; I also hope it will incorporate some options from the former TensorRT extension.