LTX-Video Locally: Facts and Myths Debunked. Tips Included.

The LTX-Video model producing fast, high-resolution outputs in ComfyUI

LTX-Video is a very fast, DiT-based video generation model that can generate high-quality videos on consumer hardware; in some respects, the results are comparable to professional, subscription-based generative video models.

NVIDIA GPU Requirements

After some experiments, I would say the minimum VRAM for comfortable use is 24GB. You can run it on 16GB without too much trouble (as I did), and you could probably run it on 12GB with considerably more trouble. This is still pretty good considering the speed and requirements of other open models (not to mention the results, which, in the case of LTX-Video, are comparatively good).
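
If you are not sure what you have to work with, a quick PyTorch check (PyTorch with CUDA is already required by ComfyUI) will report the card and its VRAM:

```python
# Report total and currently free VRAM on the first CUDA device.
import torch

if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    free_b, total_b = torch.cuda.mem_get_info()
    print(f"{props.name}: {total_b / 1024**3:.1f} GB total, {free_b / 1024**3:.1f} GB free")
else:
    print("No CUDA device detected.")
```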

Installation

Installation is easy: download the models and workflows for ComfyUI.
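
A minimal sketch of fetching the checkpoint from Hugging Face with huggingface_hub (the repo id and filename here are my assumptions from the official repo, so verify them, and point local_dir at your own ComfyUI install):

```python
# Download the LTX-Video checkpoint into the standard ComfyUI checkpoint
# folder. Repo id and filename are assumptions -- check the Hugging Face repo.
from huggingface_hub import hf_hub_download

path = hf_hub_download(
    repo_id="Lightricks/LTX-Video",            # assumed repo id
    filename="ltx-video-2b-v0.9.safetensors",  # assumed filename
    local_dir="ComfyUI/models/checkpoints",
)
print(f"Saved to: {path}")
```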

LTX Video, Right at Home: Tips, Tricks, and Truth

Can LTX-Video generate high-quality videos in real-time?

No. Unless you have a very flexible definition of "high-quality." Or work on a Bond villain's supercomputer.

Is LTX-Video Fast?

Yes. Even with a high number of steps (100+), it is still probably the fastest model you can run locally.

Is the Output Worth It?

The quality sits somewhere between a certain overhyped subscription service, where it takes days to generate a video that then fails, and another subscription-based service, which is probably the best currently on the market. The quality is getting close to a certain model recently deployed on Civitai. So for local generation experiments, the answer is Yes.

Myth 1: LTX Video can generate realistic videos of people

While this is true for closeups, if there is more motion in the image or a character is far from the camera, all hell breaks loose. With added resolution and steps you can partially solve this, but you risk a severe case of Motionless Movie (see below).

Myth 2: LTX Video can create long videos

The suggested maximum is 257 frames. However, the longer the video, the more you risk artifacts and abominations, which may occur seemingly at random. I suggest lengths of 4-7 seconds (96+1 to 168+1 frames at 24 FPS); with some subjects you can get away with more.
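
All the frame counts above share the form 8·k + 1 (97, 169, 257), so picking a valid count is easy to script. A minimal sketch, assuming 24 FPS output and that the 8·k + 1 pattern holds:

```python
# Snap a desired clip length in seconds to a frame count of the form 8*k + 1,
# matching the 96+1 / 168+1 / 257 values quoted above. Assumes 24 FPS output.
def valid_frame_count(seconds: float, fps: int = 24, max_frames: int = 257) -> int:
    frames = round(seconds * fps)
    frames = (frames // 8) * 8 + 1  # snap down to the nearest 8*k + 1
    return min(frames, max_frames)

print(valid_frame_count(4))  # 97
print(valid_frame_count(7))  # 169
```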

Myth 3: The Bigger the Resolution the Better

The recommended resolution is under 720 x 1280 (both dimensions should be divisible by 32). This is nice. However, lower resolutions tend to create livelier scenes; at large resolutions, subjects often move only slightly or not at all. The suggested starting resolution is 512 x 768.
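
Since both dimensions must be divisible by 32, it is worth snapping any target resolution before queueing a generation. A minimal sketch (the 720 x 1280 ceiling is the recommendation quoted above):

```python
# Snap a target resolution down to multiples of 32, capped at the recommended
# 720 x 1280 ceiling. Off-grid dimensions may error out or produce artifacts.
def snap_resolution(width: int, height: int, max_w: int = 720, max_h: int = 1280) -> tuple[int, int]:
    w = min(width, max_w) // 32 * 32
    h = min(height, max_h) // 32 * 32
    return w, h

print(snap_resolution(512, 768))   # (512, 768) -- the suggested starting point
print(snap_resolution(720, 1280))  # (704, 1280)
```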

Troubleshooting Common Issues and Errors

The Case of a Pretentious Prompt

How to prompt: In an image-to-video situation, try running the input image through a multimodal (vision) LLM. Then work with that description, altering it to fit a video prompt. In my tests, this approach produced the most consistent results.
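
A minimal sketch of the captioning step, assuming a local Ollama install with a vision-capable model pulled (the model name "llava" and the file name are just examples):

```python
# Describe the input image with a local vision LLM via Ollama, then hand-edit
# the description into a video prompt (add one continuous camera move and the
# motion you want). Requires the `ollama` package and a running Ollama server.
import ollama

response = ollama.chat(
    model="llava",  # any vision-capable model you have pulled
    messages=[{
        "role": "user",
        "content": "Describe this image in detail: subject, setting, lighting, camera angle.",
        "images": ["input_frame.png"],  # hypothetical input file
    }],
)
print(response["message"]["content"])
```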

The Case of a Motionless Movie

This one is nasty. It often happens that the output video barely moves, especially in image-to-video generations. This is very frustrating, because sometimes it seemingly just won't budge. Try changing the seed, resolution, or length of the video. Changing the prompt may or may not help. Try this workflow:

If you pass the input image through VideoHelperSuite (install it with the Manager), the success rate for avoiding a Motionless Movie seems much better. Note that the process does not visibly change the input image (no visible quality downgrade). It also works at higher resolutions. You can test the workflow ltxvideo_I2V-motionfix.json from the sandner-art GitHub; the workflow includes comments explaining its function.
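
I have not dissected what the VideoHelperSuite pass-through does internally, but one plausible reading is that a light lossy re-encode adds compression noise the sampler reads as motion. As a rough standalone analogue (a sketch of the idea, not the workflow itself), a high-quality JPEG round-trip with Pillow:

```python
# Rough analogue of the pass-through trick: a light lossy re-encode of the
# input image. This is NOT the VideoHelperSuite node, just the same general
# idea -- compression noise that is invisible to the eye.
import io
from PIL import Image

img = Image.open("input_frame.png").convert("RGB")  # hypothetical input file
buf = io.BytesIO()
img.save(buf, format="JPEG", quality=95)  # high quality: no visible downgrade
buf.seek(0)
Image.open(buf).save("input_frame_reencoded.png")
```

If this helps, prefer the actual ltxvideo_I2V-motionfix.json workflow, which keeps everything inside ComfyUI.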

The Case of a Switching Slideshow

This occurs in img2vid: the input image stays motionless, then the video cuts to a scene often from another horrible reality. You can avoid this by changing the prompt, seed, length, or resolution. Try the motion-fixing workflow mentioned in the previous section. Your best bet is to use an LLM to rewrite your prompt in more natural language, avoiding terms that suggest a scene switch or a movie cut (or several separate camera movements).

You may use any good advanced online LLM chat, or my free, open utility ArtAgents (choose the 'Video' agent) for your next local Ollama adventure.

The Case of a Slithering Spectre

When the output video has quite good camera movement and composition but the subjects start to transform midway or are not well defined, try changing the number of steps (even well over 100) to get better details. You may also try CFG values in the 2-5 range, starting at the middle ground of 3.5. Some subjects render better at lower CFG values; I got the best results with CFG 5 in image-to-video, for whatever reason. Higher CFG values tend to overbake the image.
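
If you would rather sweep these values from a script than from ComfyUI, here is a minimal sketch using the Diffusers LTX image-to-video pipeline (assumes diffusers >= 0.32 and enough VRAM; file names and the prompt are placeholders):

```python
# Sweep the suggested 2-5 CFG range with a fixed seed to isolate the effect
# of guidance_scale. Uses the Diffusers LTX pipeline as a scriptable
# stand-in for the ComfyUI workflow.
import torch
from diffusers import LTXImageToVideoPipeline
from diffusers.utils import export_to_video, load_image

pipe = LTXImageToVideoPipeline.from_pretrained(
    "Lightricks/LTX-Video", torch_dtype=torch.bfloat16
).to("cuda")

image = load_image("input_frame.png")  # hypothetical input file
prompt = "A slow dolly shot of a lighthouse at dusk, waves rolling in"  # placeholder

for cfg in (2.0, 3.5, 5.0):
    video = pipe(
        prompt=prompt, image=image,
        width=512, height=768, num_frames=97,
        num_inference_steps=100, guidance_scale=cfg,
        generator=torch.Generator("cuda").manual_seed(42),
    ).frames[0]
    export_to_video(video, f"out_cfg{cfg}.mp4", fps=24)
```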

Conclusion

It does not produce 24 FPS HQ videos at 768x512 faster than they can be watched, at least not on normal hardware. But it is so fast that it is setting its own standard for local video generation. If this is the direction video generators are headed, I am all for it.
