Uses a custom LLM prompt to analyze the image and output its structure as a prompt suitable for the i2v model.
+While it can also be used in Hunyuan, it is recommended to exclude prompts related to camera motion.
A Gemini API key is required (free).
Also, enter your API key into the JSON file located at ComfyUI_windows_portable\ComfyUI\custom_nodes\comfyui-ollamagemini\config.json
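A quick way to confirm the key was saved is to load the file and check that the entry is non-empty. The sketch below is only an illustration: the field name GEMINI_API_KEY and the helper name are assumptions, so open your own config.json to see which field the node actually expects.

```python
import json
from pathlib import Path

# Assumed field name; check your config.json for the field the node really uses.
ASSUMED_KEY_NAME = "GEMINI_API_KEY"


def api_key_is_set(config_path, key_name=ASSUMED_KEY_NAME):
    """Return True if config_path is valid JSON with a non-empty string under key_name."""
    path = Path(config_path)
    if not path.is_file():
        return False
    try:
        config = json.loads(path.read_text(encoding="utf-8"))
    except json.JSONDecodeError:
        return False
    if not isinstance(config, dict):
        return False
    value = config.get(key_name, "")
    return isinstance(value, str) and value.strip() != ""


if __name__ == "__main__":
    config_file = Path(
        r"ComfyUI_windows_portable\ComfyUI\custom_nodes\comfyui-ollamagemini\config.json"
    )
    if api_key_is_set(config_file):
        print("API key found")
    else:
        print("API key missing or config.json unreadable")
```

Run it from the folder that contains ComfyUI_windows_portable, or pass the full path to your own config.json.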
25.05.15 - The free tier for Gemini Pro is no longer available, so only the Flash versions (2.0 Flash or 2.5 Flash) can be used.
25.05.26 - Currently, the latest version of Gemini Flash is gemini-2.5-flash-preview-05-20.
[change logs]
25.05.30/v1.21b for Wan2.1 I2V
i2v update: more precise action control (new syntax/structure); camera influence reduced to keep the focus on motion; NSFW refusals may be more frequent.
25.05.21/Standalone Gemini UI (v1.1) - The existing ZIP file has been updated. Please re-download it if you need the latest version.
The default prompt has been modified, allowing for normal use of both gemini-2.0-flash and gemini-2.5-flash-preview-04-17 versions.
However, NSFW image analysis is only available with gemini-2.0-flash (although 2.5 Flash sometimes works as well), and analysis may occasionally fail. If it does, retry the analysis; it should succeed.
Additionally, a final prompt translation feature has been added, so the installation command has changed to the following:
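Because a failed analysis can simply be retried, the retry can be automated. The following is a minimal sketch, not the tool's own logic: `analyze` is a hypothetical zero-argument callable that wraps whatever Gemini request you make and returns the generated text.

```python
import time


def analyze_with_retry(analyze, max_attempts=3, delay_seconds=2.0):
    """Call analyze() until it returns non-empty text or the attempts run out.

    analyze is a hypothetical zero-argument callable wrapping your Gemini request.
    """
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            text = analyze()
            if text and text.strip():
                return text
            last_error = ValueError("empty response")
        except Exception as exc:  # a blocked or failed request lands here
            last_error = exc
        if attempt < max_attempts:
            time.sleep(delay_seconds)  # short pause before trying again
    raise RuntimeError(f"analysis failed after {max_attempts} attempts") from last_error
```

A short pause between attempts also helps avoid sending consecutive API calls too quickly.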
pip install google-generativeai customtkinter Pillow tkinterdnd2-Universal googletrans==3.1.0a0
25.05.17/Standalone Gemini UI
This program offers a dedicated user interface for leveraging Google's Gemini, completely independent of ComfyUI workflows.
Why a Separate UI?
This tool was specifically developed to address a common challenge faced when performing image analysis in ComfyUI: the unloading of WAN (or other generative) models. This unloading process can lead to significant delays when you want to switch back to image generation. By using this standalone UI for image analysis with Gemini, you can keep your primary generative models loaded in ComfyUI, saving time and improving your workflow efficiency.
Default Prompts (via gemini_app_settings.json)
If you include the provided gemini_app_settings.json file in the same folder as the application, it will automatically load a default prompt set (e.g., configured for "v1.2a wan2.1 i2v" or your specified default). You can, of course, modify this or use your own prompts within the UI.
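The loading behavior described here can be pictured roughly as follows. This is a sketch under assumptions, not the application's actual code: the field name system_prompt and the function name are hypothetical, so inspect your own gemini_app_settings.json for the real layout.

```python
import json
from pathlib import Path

# Hypothetical field name; inspect your own gemini_app_settings.json for the real one.
ASSUMED_PROMPT_FIELD = "system_prompt"


def load_default_prompt(app_dir, fallback=""):
    """Return the prompt stored in gemini_app_settings.json inside app_dir, else fallback."""
    settings_path = Path(app_dir) / "gemini_app_settings.json"
    if not settings_path.is_file():
        return fallback  # no settings file next to the app
    try:
        settings = json.loads(settings_path.read_text(encoding="utf-8"))
    except json.JSONDecodeError:
        return fallback  # file exists but is not valid JSON
    if not isinstance(settings, dict):
        return fallback
    prompt = settings.get(ASSUMED_PROMPT_FIELD)
    return prompt if isinstance(prompt, str) and prompt else fallback
```

Falling back to a built-in value keeps the UI usable when the settings file is missing or damaged.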
Getting Started - Installation
To run this application, you may need to install a few Python libraries. Please open your command prompt (CMD) or terminal and enter the following commands:
pip install google-generativeai customtkinter Pillow tkinterdnd2-Universal
How to Run
Ensure you have Python installed on your system.
Install the required libraries using the pip install commands above.
Place the prompts.json file (if you have one for default prompts) in the same directory as the Python script.
Run the script. To run with a visible console window: python gemini_ui.py
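Before launching, you can check whether the libraries are importable. This is a small sketch using only the standard library; the import names in REQUIRED_MODULES are assumed to correspond to the pip packages listed above (for example, Pillow is imported as PIL).

```python
import importlib.util

# Import names assumed to correspond to the pip packages listed above.
REQUIRED_MODULES = ["google.generativeai", "customtkinter", "PIL", "tkinterdnd2"]


def missing_modules(module_names):
    """Return the names from module_names that cannot be located for import."""
    missing = []
    for name in module_names:
        try:
            found = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            # Raised when a parent package (e.g. "google") is absent or malformed.
            found = False
        if not found:
            missing.append(name)
    return missing


if __name__ == "__main__":
    absent = missing_modules(REQUIRED_MODULES)
    if absent:
        print("Missing: " + ", ".join(absent))
    else:
        print("All libraries found")
```

If anything is reported missing, rerun the pip install command and try again.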
NSFW image analysis
If you are analyzing NSFW images, add the relevant content description to the very bottom of the "System Prompt" field.
[**User Input**: (Your Prompt)]
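If you build the system prompt in a script rather than typing it into the UI, the same addition can be assembled in code. This sketch assumes the bracketed template shown above goes on its own line at the very end of the system prompt; the function name is made up for illustration.

```python
def append_user_input(system_prompt, description):
    """Append the content description, in the bracketed template, to the end of the prompt."""
    description = description.strip()
    if not description:
        return system_prompt  # nothing to add
    return f"{system_prompt.rstrip()}\n[**User Input**: {description}]"
```

An empty description leaves the system prompt unchanged.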
=====
25.05.14/v1.0b Joy caption for i2v
Full, uncensored image analysis and i2v prompt generation are achieved using JoyCaption. The resulting motion has a different character and, in some cases, may be less fluid than with Gemini 2.0 Flash (for which an almost completely uncensored version was established earlier).
huggingface demo: https://huggingface.co/spaces/fancyfeast/joy-caption-beta-one
github: https://github.com/fpgaminer/joycaption
25.05.05/v1.2a for i2v, v1.1a for start-end, v1.0a for Framepack
This version has been updated to align with the recently revised custom node and to ensure the analysis of NSFW images or prompts.
+I've changed some custom nodes that can't be found in the Manager, so installing custom nodes should no longer be troublesome.
+The latest version of the ollamagemini custom node is required.
25.04.18/v1.0 for start/end
Resolved an issue resulting in excessively lengthy final prompts; improved the coherence and visual connectivity for transitions between start and end frames, and added a translation node.
25.04.18/v1.0 for FramePack
Creates a very simple prompt.
https://github.com/lllyasviel/FramePack
25.04.14/v1.1 for i2v
Fixed an issue caused by an overly long and unnecessary final prompt, and adjusted to avoid consecutive API calls.
*25.04.15/v1.1a - Added a translation node
25.03.19/v1.0
Fixed an issue where a single incorrect symbol was present in the LLM prompt. This is a minor change, but it could slightly improve issues that may occur when inputting text in languages other than English. Additionally, the default setting for the stream option has been changed from ON to OFF.
25.03.25/for start-end frame(beta) -> beta+ (Improved results by modifying some of the prompts)
kijai workflow
Analyzes the start and end images and generates a prompt suited to the i2v start-end workflow. However, depending on the image or motion, the end frame may not work properly. (If you can describe the additional motion correctly, you can reinforce the intermediate movement using the existing v1.0 workflow.)
Test Generation (v1.0)
Test setting
-Model) wan 2.1 i2v 480p GGUF Q6_K
-LoRA 1) https://civitai.com/models/929497/aesthetic-quality-modifiers-masterpiece?modelVersionId=1498121
-LoRA 2) https://civitai.com/models/1264662/live-wallpaper-style
-480*701, 20steps, 64fps(4sec)
+LoRA was used with trigger words to assist with movement
Analysis Results
A digital painting illustration of an anime pale girl with short white hair decorated with a dark rose on her head and dark angel wings behind her back. She has a red halo with spikes floating above her. She wears a long black dress with gold trim and holds out her hand, with a ring on her finger, as she prepares to cast a spell. The background is a dark gradient of black and red tones. A full shot from a medium perspective.
[Optional] Motion Input
The surrounding blood effects spin around the girl. Her wings gently flutter. Her arms gently wrap around the effects.
Image Analysis with Motion (with Optional Input)
A digital painting of a pale woman with short white hair and piercing red eyes, adorned with a black rose. She stands with a confident posture, her left hand elegantly raised, displaying a delicate gold ring. The woman is dressed in a tight-fitting black dress with gold trim, complemented by matching gold bracelets and thigh-high stockings with floral prints. The surrounding blood effects spin around her. Her wings gently flutter. Her arms gently wrap around the effects. The background features a dark, blurry ambiance with a subtle red glow, enhancing the mysterious atmosphere. The image is a medium shot from a slightly high angle, giving her an imposing look.
Image Analysis with Motion (w/o Optional Input)
A digital painting illustration of an anime pale girl with short white hair decorated with a dark rose on her head and dark angel wings behind her back. She has a red halo with spikes floating above her head. She wears a long black dress with gold trim, a gold bracelet, and long black gloves, revealing a white tattoo on her leg. She holds out her hand, with a gold ring on her finger, as she prepares to cast a spell, her fingers slightly curved and radiating a faint glow. The background is a dark gradient of black and red tones with red energy lines. A full shot from a medium perspective.
Prompts with Motion (w/o optional input)
Prompts with Motion (with optional input)
Base Prompts (Analysis Results)
*The output is more strongly affected by the motion LoRA because the prompt is simplified.
Test Generation (for i2v start-end frame_Beta+)
Test setting
workflow default setting (480p model)
[Start Image]
An anime-style full body shot of a young girl with snow-white hair pulled into two pigtails tied with red ribbons. Her fair skin is complemented by her fully open, bright red eyes and a cheerful smile. She wears a traditional Japanese miko outfit, consisting of a white top with red accents and a matching red skirt, complete with decorative knots and a large bow. Her right hand is raised in a waving gesture. The background features a vividly colored Shinto shrine with red pillars and traditional lanterns, creating a festive atmosphere with stone pavement. A medium shot captures the scene.
[End Image]
An anime-style digital painting of a girl with fair skin and a slight smile with closed eyes, standing in front of a Shinto shrine. She has long, white hair styled in low pigtails adorned with red ribbons. Her traditional red and white "miko" outfit includes a "chihaya" top with red and white trim, and a matching red "hakama" skirt, accented by a bold red bow. She stands with her hands on her hips, exuding calm and confidence. The background features a traditional Japanese Shinto shrine with red pillars, a dark tiled roof, and hanging lanterns, with blurred figures in the distance. A medium shot captures the full composition of the scene.
[Final Prompts]
An anime-style video prompt portraying a young girl in a traditional Japanese miko outfit transitioning smoothly from a cheerful greeting to a state of serene contemplation. Initially, the girl stands with a bright, open smile and waving right hand, her snow-white hair in pigtails tied with red ribbons, set against a vividly colored Shinto shrine with red pillars and festive lanterns. Gradually, her waving hand lowers gently as her expression softens into a slight smile, and her eyes close serenely. She then places both hands on her hips, exuding calm confidence. Throughout the transition, the background remains a detailed Shinto shrine with red pillars, a dark tiled roof, and hanging lanterns, with blurred figures adding depth. The camera remains fixed in a medium shot, capturing the nuanced changes in expression and posture as she gracefully shifts from an active, welcoming pose to a peaceful, contemplative one.