ArtAgents: Your Creative Assistant for Prompt Engineering and Captioning

Daniel Sandner November 26, 2024

ArtAgents is an open-source application designed to enhance creative workflows with large language models (LLMs). It helps you create robust, high-quality prompts and captions for the next generation of generative models, working from natural language scene descriptions. Powered by Ollama, ArtAgents offers a suite of capabilities for a wide range of creative needs, particularly prompt engineering and captioning experiments.

ArtAgents is easy to run locally and does not depend on any specific target generative model or on a local or cloud platform: it creates and alters text prompts for any generator. You can adjust it to fit any generative style, limited only by the capabilities of the LLM you choose.

Agent-Based Chat Interface
Multimodal Input Support
Captioning
Advanced LLM Response Generation
- Flexibility
- Comment on the Output
Installation and How to Use It
Be Inspired
The Goal and Development
Conclusion
Downloads and References

Agent-Based Chat Interface

At the core of ArtAgents is its agent-based chat interface, which lets users interact with and get help from various AI agents tailored to different roles and tasks. Whether you're a designer, artist, fashionista, or colorist, ArtAgents provides specialized agents that understand your specific needs and deliver tailored responses.

Agent Role Selection: Choose from a variety of predefined roles such as Designer, Artist, Fashionista, Colorista, Detailer, Photographer, Video, and Styler. Each role is editable and comes with customizable options to fine-tune the AI's responses.
Custom Agent Roles: Users can define their own custom agent roles with parameters, allowing for greater flexibility when working with various multimodal LLMs.
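
One way to picture an editable agent role is as a small record that pairs a role name with instruction text and optional parameters. The sketch below is illustrative only: the AgentRole class, its fields, the instruction texts, and build_prompt are assumptions made for this example, not ArtAgents' actual data format.

```python
from dataclasses import dataclass, field


@dataclass
class AgentRole:
    """Hypothetical container for one agent role (not ArtAgents' real schema)."""
    name: str
    instructions: str
    options: dict = field(default_factory=dict)


# Role names come from the article; the instruction texts are invented examples.
ROLES = {
    "Photographer": AgentRole(
        name="Photographer",
        instructions="Describe the scene in terms of lens, lighting, and composition.",
    ),
    "Colorista": AgentRole(
        name="Colorista",
        instructions="Describe the scene in terms of palette, contrast, and color mood.",
        options={"temperature": 0.7},
    ),
}


def build_prompt(role: AgentRole, user_input: str) -> str:
    """Combine the role instructions with the user's scene description."""
    return f"You are a {role.name}. {role.instructions}\n\nScene: {user_input.strip()}"


# A custom role is just another entry in the same mapping.
ROLES["Typographer"] = AgentRole("Typographer", "Focus on lettering, layout, and type style.")
```

Keeping roles as plain data makes them easy to edit or extend without touching the code that talks to the model.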

Multimodal Input Support

ArtAgents supports multimodal input, enabling users to provide both textual and visual inputs. This feature is particularly useful for tasks that require a combination of textual descriptions and visual references.

Image Input: Upload images directly from your folder or use a single image input to provide visual context for the AI agents. The application processes these images and incorporates the visual information into the generated responses.
Textual Input: Enter detailed textual descriptions and prompts to guide the AI agents in generating the desired outputs. You can combine textual and visual inputs, keeping in mind that image information strongly influences the output.
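
Ollama's REST API accepts images as base64-encoded strings in an "images" list sent alongside the text prompt. The following is a minimal sketch of combining both kinds of input under that request format; the helper names and the file and model names in the usage comment are placeholders, and ArtAgents may handle this differently internally.

```python
import base64
from pathlib import Path


def encode_image(path) -> str:
    """Read an image file and return its contents as a base64 string."""
    return base64.b64encode(Path(path).read_bytes()).decode("ascii")


def build_payload(model: str, prompt: str, image_paths=()) -> dict:
    """Build a request body for Ollama's /api/generate endpoint.

    The "images" key is added only when at least one image is supplied,
    so the same helper works for text-only requests.
    """
    payload = {"model": model, "prompt": prompt, "stream": False}
    images = [encode_image(p) for p in image_paths]
    if images:
        payload["images"] = images
    return payload


# Usage (placeholder names, assuming a vision-capable model is installed):
# payload = build_payload("llava", "Describe the lighting.", ["reference.png"])
```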

Captioning

Originally developed to assist with unusual captioning styles for training generative AI models, ArtAgents offers simple captioning of images in a target folder. With a single image as input, you can experiment with settings and then apply them to a whole folder of images intended for training or fine-tuning models and LoRAs.

Advanced LLM Response Generation

ArtAgents uses LLMs installed via Ollama to generate high-quality, contextually relevant responses. The application supports various models, including those with vision capabilities, to handle a wide range of creative tasks.
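
For readers curious about what a request to a locally running Ollama server looks like, here is a minimal sketch using only the Python standard library. It assumes Ollama's default address (http://localhost:11434) and its /api/generate endpoint; the function names are made up for this example, and this is not a description of ArtAgents' own code.

```python
import json
from urllib import request

# Default address of a local Ollama server (an assumption about your setup).
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(payload: dict, url: str = OLLAMA_URL) -> request.Request:
    """Wrap a JSON payload in a POST request for the Ollama server."""
    body = json.dumps(payload).encode("utf-8")
    return request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(payload: dict, url: str = OLLAMA_URL, timeout: float = 120.0) -> str:
    """Send the payload and return the generated text.

    With "stream": False, Ollama answers with a single JSON object whose
    "response" field holds the full text.
    """
    with request.urlopen(build_request(payload, url), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))["response"]


# Usage (requires a running Ollama server and an installed model):
# text = generate({"model": "llava", "prompt": "A foggy harbor at dawn", "stream": False})
```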

Flexibility

You can customize many features of ArtAgents, including agents and their parameters, the list of LLMs to choose from, additional custom prompt limiters, and LLM settings. You can use any current or future model compatible with Ollama.

Comment on the Output

ArtAgents allows you to comment on the LLM outputs to slightly modify them to fit your needs for generative AI images or videos. This simplifies the workflow regardless of whether you are using local or cloud image or video generators.
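
Conceptually, a comment step can be thought of as a follow-up request that contains the previous output plus your remark. The sketch below shows one possible wording for such a request; the function name and the instruction text are assumptions for illustration, not the template ArtAgents uses.

```python
def build_revision_prompt(previous_output: str, comment: str) -> str:
    """Compose a follow-up prompt asking the model to lightly revise its output."""
    return (
        "Here is a prompt you wrote earlier:\n"
        f"{previous_output.strip()}\n\n"
        "Revise it according to this comment, changing as little as possible:\n"
        f"{comment.strip()}"
    )


# Usage:
# follow_up = build_revision_prompt(first_output, "make the lighting warmer")
```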

Installation and How to Use It

You will find the most current release of ArtAgents on GitHub, with installation instructions.

Installation

Ensure you have Python and Git installed on your system.

Download and install Ollama from https://ollama.com/. Ollama is an open-source platform that allows you to run large language models (LLMs) locally on your own hardware.
Clone the ArtAgents repository into a target folder with this terminal command: git clone https://github.com/sandner-art/ArtAgents.git
Run setupvenv.bat (optional, recommended).
Run setup.bat to set up the Ollama models (optional; if you want to install the models manually, check the ArtAgents GitHub repository for more information).
Start ArtAgents with govenv.bat (with venv) or go.bat.
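
Before running the setup scripts, it can help to confirm that the prerequisites are reachable from your terminal. The following optional check is a small sketch that is not part of ArtAgents; it assumes the executables are named python, git, and ollama on your system (on some systems Python is installed as python3 instead).

```python
import shutil

# Executable names are assumptions; adjust them for your system.
REQUIRED_TOOLS = ("python", "git", "ollama")


def missing_tools(tools=REQUIRED_TOOLS) -> list:
    """Return the tools that cannot be found on the PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


# Usage:
# for tool in missing_tools():
#     print(f"Not found on PATH: {tool}")
```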

Creating a Prompt

Select a model, write your user input, and select an agent. Click "Submit". If you insert an image, it will affect the output. You can modify the output using the "Comment" section and its button.

Captioning

Write your user input and select an agent. Insert the path into "Folder Path" and click "Submit". ArtAgents will generate .txt files with captions for training in the image folder. I recommend reviewing the captions and editing them to suit your needs.
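
The folder workflow amounts to writing one .txt file next to each image. Here is a minimal sketch of that pattern, assuming each caption file shares the image's base name; the caption_folder function, the list of image extensions, and the captioner callback are hypothetical and stand in for whatever produces the caption text.

```python
from pathlib import Path
from typing import Callable

# Extensions treated as images in this sketch (an assumption).
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".webp"}


def caption_folder(folder, captioner: Callable[[Path], str], overwrite: bool = False) -> list:
    """Write a caption .txt file next to every image in `folder`.

    `captioner` receives the image path and returns the caption text.
    Existing caption files are kept unless `overwrite` is True, so manual
    edits are not lost on a second run. Returns the files written.
    """
    written = []
    for image in sorted(Path(folder).iterdir()):
        if not image.is_file() or image.suffix.lower() not in IMAGE_SUFFIXES:
            continue
        target = image.with_suffix(".txt")
        if target.exists() and not overwrite:
            continue
        target.write_text(captioner(image).strip() + "\n", encoding="utf-8")
        written.append(target)
    return written


# Usage with a stand-in captioner (replace it with a real model call):
# caption_folder("dataset/images", lambda path: f"photo of {path.stem}")
```

Skipping existing files by default fits the advice to review and edit captions by hand: a rerun will not overwrite your corrections.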

Be Inspired

By extending simple descriptions and parameters, you can customize your prompt generation to explore visual information from several technical viewpoints. You can affect the prompt from the perspective of a designer, typographer, photographer, or any professional aspect you define.

The Goal and Development

ArtAgents began as an image captioning experiment, but I found it surprisingly useful for my design sketches and for creating prompts and captions for video work. As the tool develops, new features will emerge to fit various workflows.

Conclusion

ArtAgents is an LLM-driven tool designed to enhance possibilities, streamline workflows, and provide support for artists, designers, and creatives. With its easy-to-use capabilities, minimalist user interface, and robust yet simple customization options, ArtAgents helps you work with advanced generative models (image and video), which now require a more natural language approach to achieve the best results. Whether you're a seasoned graphic professional or a budding AI art creator, ArtAgents offers the tools to refine the all-important prompt structure and bring your creative visions to life.

Downloads and References

ArtAgents on GitHub (current and development versions, instructions).
Ollama download
