TensorRT Acceleration for RTX: Latest A1111 Extension from NVIDIA Adds SDXL Support and Higher Resolutions     

NVIDIA TensorRT illustration

The new NVIDIA TensorRT extension has come to the A1111 web UI. Does it deliver on its promise to double speeds, even for SDXL models? In this article we compare the latest NVIDIA extension with earlier solutions and take a look at possible issues with installation and rendering.

In a recent article about TensorRT technology, I explored the creative possibilities it offers. The TensorRT extension creates a custom model for your system, derived from any selected SD model. When set up in SD Unet, it essentially doubles the speed of your renderings, albeit with some caveats.

One Click Install: How to Make It Work

I suggest installing a new A1111 instance into a separate folder for your experiments.

Installation should be very simple: install the new extension via Extensions/Install from URL (https://github.com/NVIDIA/Stable-Diffusion-WebUI-TensorRT) in A1111, then apply and restart the web UI (the downloads will take some time).

If it works for you, skip this chapter. However, you may run into issues with missing libraries (.dll errors) or a missing TensorRT tab. If that happens, this short guide should help you.

  • IMPORTANT NOTE: follow the steps EXACTLY as described below 
  • Install the latest NVIDIA driver for your RTX card
  • Go to your sd folder
  • In the extensions folder, delete the stable-diffusion-webui-tensorrt folder if it exists from a former installation
  • Delete the venv folder 
  • In a command prompt, run webui to create a new venv environment folder (this will take some time). After the A1111 UI starts, close it and close the command prompt terminal.
  • Open a command terminal again in your stable-diffusion-webui/venv/Scripts folder
  • Enter activate
  • At the venv prompt, enter these commands one after another, always waiting until each process finishes (this may take a while):

python.exe -m pip install --upgrade pip

python -m pip install nvidia-cudnn-cu11==8.9.4.25 --no-cache-dir

python -m pip install --pre --extra-index-url https://pypi.nvidia.com/ tensorrt==9.0.1.post11.dev4 --no-cache-dir

python -m pip uninstall -y nvidia-cudnn-cu11

deactivate 

What We Can Do With This

Compared to the recent TensorRT experimental extension, NVIDIA's new version introduces additional options but also imposes certain limitations. Pros:

  • SDXL support
  • Working support for higher resolutions in SD models
  • Model generation speed
  • Ease of use and installation (with some hiccups, see above)

The rendering resolution of TensorRT models must be divisible by 64: 512, 576, 640, 704, 768, 832, 896 (the highest resolution I could reach with an SD 1.5 model in dynamic mode on an A4000). SDXL resolution is 1024x1024.
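Since engine resolutions must be multiples of 64, a quick way to check or snap a target size is a small helper like this (my own sketch, not part of the extension):

```python
def snap_to_64(dim: int) -> int:
    """Round a render dimension down to the nearest multiple of 64,
    as required by TensorRT engine profiles."""
    return (dim // 64) * 64

def is_valid_trt_resolution(width: int, height: int) -> bool:
    """True if both dimensions are usable for a TensorRT engine."""
    return width % 64 == 0 and height % 64 == 0

print(snap_to_64(900))                      # 896, the SD 1.5 ceiling from my tests
print(is_valid_trt_resolution(1024, 1024))  # True, the SDXL default
```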

Speed Boost for SDXL and SD

Here is a speed comparison (A4000, same scene, 20 steps, xformers), times in m:ss:

                                 Euler a   NVIDIA RT   DPM++ SDE   NVIDIA RT
SDXL 1024x1024 (batch 1)         0:15      0:08        0:30        0:15
SDXL 1024x1024 + Refiner 0.8     0:23      0:22        0:35        0:34
SDXL 1024x1024 (batch 2)         0:31      0:16        0:38        0:20
SDXL 1024x1024 + Refiner 0.8     0:39      0:32        1:02        0:50

The speed boost is undeniable. But as you can see in the table, when using the SDXL refiner the speed gain is negligible in some situations.
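To put the table into numbers, here is a small sketch that converts the m:ss timings above into speedup factors (the helper names are my own):

```python
def to_seconds(t: str) -> int:
    """Convert an m:ss timing like '0:15' into seconds."""
    minutes, seconds = t.split(":")
    return int(minutes) * 60 + int(seconds)

def speedup(baseline: str, tensorrt: str) -> float:
    """Speedup factor of the TensorRT run over the baseline sampler run."""
    return to_seconds(baseline) / to_seconds(tensorrt)

# Euler a, batch 1: close to the promised 2x
print(round(speedup("0:15", "0:08"), 2))  # 1.88
# Euler a with Refiner 0.8: the gain almost disappears
print(round(speedup("0:23", "0:22"), 2))  # 1.05
```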

There is certainly potential for application in iterative rendering and testing, where it offers significant time savings.

Downsides

  • Even if you successfully build a model, it may not work for the resolution and batch settings you need (dynamic models). You may experiment; generation is fast
  • Token limits in negative prompts
  • The SDXL refiner is not optimizable (yet)
  • Resolution limits
  • Creating variations with "baked" LoRAs is not possible in the NVIDIA version of the extension
  • The web UI occasionally crashes when switching models or overflowing the token limit
  • Limited LoRA use

You may not encounter (or even notice) resolution issues if you have a card with high VRAM. However, in my tests I was not able to generate models with the batch/resolution parameters I normally use without TensorRT.

LoRA support is experimental, and with TensorRT LoRAs there can be an issue with effective backups of models and individual LoRAs: you may simply forget about the conversions and have to generate them again (if you reinstall A1111 in your future endeavours).
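One way to guard against losing conversions is to copy the generated engine folders out of the install before reinstalling. A minimal sketch, assuming the extension's default models/Unet-trt and models/Unet-onnx locations (check your own install, the paths may differ):

```python
import pathlib
import shutil

def backup_trt_engines(sd_root: str, backup_dir: str) -> list[str]:
    """Copy TensorRT engine/ONNX folders so they survive an A1111 reinstall.
    Returns the names of the folders that were backed up."""
    copied = []
    for sub in ("models/Unet-trt", "models/Unet-onnx"):
        src = pathlib.Path(sd_root) / sub
        if src.is_dir():
            # dirs_exist_ok lets repeated backups overwrite the previous copy
            shutil.copytree(src, pathlib.Path(backup_dir) / src.name,
                            dirs_exist_ok=True)
            copied.append(src.name)
    return copied
```

Restore by copying the folders back to the same locations. Keep in mind that engines are built for your specific GPU and driver, so they only transfer back to the same hardware.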

It would also be great if the editor somehow showed the limits (batch/resolution) of TensorRT models.

Conclusion

TensorRT acceleration is a must-have if you use RTX cards, especially for SDXL with higher batch counts.

It may not fit into every workflow, but the speed boost is considerable. The extension is also under active development, so some issues may be solved in the near future, and I hope it will incorporate some options from the former TensorRT extension.




