Text to Image with Locked Variations
Welcome to the Cadillac of ComfyUI workflows — this one’s designed to give you stunning image variations while preserving your original composition like it owes you rent. With a ControlNet depth map and strategic prompt conditioning, this setup keeps your scene structure locked in while letting your creativity run wild in style, lighting, or mood. Perfect for when you want to say “mountains,” but in 16 different dialects of awesome.
🧠 What This Workflow Does
This ComfyUI workflow:
- Generates a base image using a stable prompt and ControlNet depth conditioning.
- Reuses the same latent and depth structure across multiple prompt variations.
- Produces visually consistent scenes with stylistic and time-of-day variations (think: “mountains by day” vs “mountains at sunset”).
- Saves all outputs for your convenience (because we're civilized).
🗺️ Workflow Overview
The pipeline can be conceptually broken down into 3 main stages:
1. Core Image Composition
- Prompt: "mountain landscape, digital painting, masterpiece"
- ControlNet (Depth Preprocessor): enforces structure via a depth map
- Generates an initial image and latent state.
2. Prompt Variation with Latent Reuse
- Prompt changes (e.g., "mountain landscape, at night..." and "mountain landscape, at sunset...")
- Reuses the same latent and ControlNet depth map
- Creates stylistic variations with identical composition.
3. Output & Preview
- Each variation is decoded and saved
- Optional image preview node included (because seeing is believing).
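If you ever want to drive this from a script instead of the canvas, the same baseline pass can be expressed in the API-format JSON that ComfyUI exports via "Save (API Format)". Below is a minimal Python sketch of that stage-1 wiring; the node IDs are illustrative rather than copied from this exact graph, so treat it as a map, not the territory.

```python
# Minimal sketch of the baseline pass (stage 1) in ComfyUI API-format JSON.
# Node IDs ("1", "2", ...) are illustrative; export your own graph with
# "Save (API Format)" to get the real ones.
baseline_graph = {
    "2": {"class_type": "CheckpointLoaderSimple",
          "inputs": {"ckpt_name": "dreamshaper_8.safetensors"}},
    "3": {"class_type": "CLIPTextEncode",
          "inputs": {"clip": ["2", 1],
                     "text": "mountain landscape, digital painting, masterpiece"}},
    "4": {"class_type": "CLIPTextEncode",
          "inputs": {"clip": ["2", 1], "text": "ugly, deformed"}},
    "5": {"class_type": "EmptyLatentImage",
          "inputs": {"width": 512, "height": 768, "batch_size": 1}},
    "7": {"class_type": "VAELoader",
          "inputs": {"vae_name": "vae-ft-mse-840000-ema-pruned.safetensors"}},
    "1": {"class_type": "KSampler",
          "inputs": {"model": ["2", 0], "positive": ["3", 0], "negative": ["4", 0],
                     "latent_image": ["5", 0], "seed": 42, "steps": 25, "cfg": 7,
                     "sampler_name": "dpmpp_2m", "scheduler": "karras", "denoise": 1.0}},
    "6": {"class_type": "VAEDecode",
          "inputs": {"samples": ["1", 0], "vae": ["7", 0]}},
    "9": {"class_type": "SaveImage",
          "inputs": {"images": ["6", 0], "filename_prefix": "ComfyUI"}},
}
```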
🧩 Node-by-Node Breakdown
Setup & Base Latent Creation
- CheckpointLoaderSimple (Node 2): Loads dreamshaper_8.safetensors. Also supplies the model + CLIP backbone.
- EmptyLatentImage (Node 5): Sets the image dimensions to 512x768, batch size of 1. Think of it as the blank canvas—before we start slapping paint on.
- VAELoader (Node 7): Uses vae-ft-mse-840000-ema-pruned.safetensors for image decoding.
- CLIPTextEncode (Node 3): Encodes the main prompt "mountain landscape, digital painting, masterpiece".
- CLIPTextEncode (Node 4): Encodes the negative prompt "ugly, deformed"—because no one asked for cursed mountain goblins.
The Breakdown
- CheckpointLoaderSimple (Node 2)
- CLIPTextEncode (Nodes 3, 4, 13, 17)
- EmptyLatentImage (Node 5)
- KSampler (Nodes 1, 11, 15)
- VAELoader (Node 7)
- VAEDecode (Nodes 6, 12, 16)
- SaveImage (Nodes 9, 10, 14)
- ControlNetLoader (Nodes 22, 26)
- AV_ControlNetPreprocessor (Node 18)
- ControlNetApplyAdvanced (Nodes 24, 25)
- PreviewImage (Node 19)
- Models Used
CheckpointLoaderSimple (Node 2)
Purpose: Loads the base model that actually knows how to paint pixels into dreams.
Model: dreamshaper_8.safetensors
Outputs:
- MODEL → used by all KSampler nodes
- CLIP → used for text encoding
- VAE → (optional; not used here since the VAE is loaded explicitly)
Notes: DreamShaper is popular for striking a nice balance between realism and stylization. Good for both fantasy and photorealistic content.
CLIPTextEncode (Nodes 3, 4, 13, 17)
Purpose: Converts text prompts into vectorized concepts. It's the translator from human to AI whisperer.
Inputs:
- CLIP → comes from the CheckpointLoader
- text → your juicy prompts
Outputs:
- CONDITIONING → goes into samplers and ControlNet magic
Key Prompts Used:
- "mountain landscape, digital painting, masterpiece"
- "ugly, deformed" (negative prompt)
- "mountain landscape, at night, digital painting, masterpiece"
- "mountain landscape, at sunset, digital painting, masterpiece"
EmptyLatentImage (Node 5)
Purpose: Creates the blank latent canvas that every sampling pass starts from.
Settings:
- Width: 512
- Height: 768
- Batch Size: 1
Outputs: LATENT tensor → used in all KSampler passes.
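Side note on what that LATENT actually is: SD 1.5's VAE downsamples each spatial dimension by 8 and uses 4 latent channels, so the 512x768 canvas becomes a much smaller tensor under the hood. A quick back-of-the-envelope check:

```python
# EmptyLatentImage produces a zero tensor of shape [batch, 4, height // 8, width // 8]
# for SD 1.5-family models (the VAE downsamples each spatial dimension by 8).
width, height, batch_size = 512, 768, 1

latent_shape = (batch_size, 4, height // 8, width // 8)
print(latent_shape)  # (1, 4, 96, 64)
```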
KSampler (Nodes 1, 11, 15)
Purpose: The engine room. Takes prompts, latents, and models to produce new latent images.
Settings Shared Across All Instances:
- Sampler: dpmpp_2m
- Scheduler: karras
- Steps: 25
- CFG: 7
- Denoise: 1
- Seed: Random (unless you want reproducibility)
Input Triplets:
- MODEL + CONDITIONING (positive/negative) + LATENT → LATENT
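Since every KSampler here shares the same settings, scripting types may want to factor them out and vary only the conditioning and seed. A small helper sketch (field names follow ComfyUI's API-format export; node references are illustrative, so verify them against your own graph):

```python
# Shared sampler settings for every KSampler in this workflow.
SHARED_SAMPLER_SETTINGS = {
    "steps": 25,
    "cfg": 7,
    "sampler_name": "dpmpp_2m",
    "scheduler": "karras",
    "denoise": 1.0,
}

def ksampler_node(model, positive, negative, latent, seed):
    """Build one API-format KSampler node from the shared settings.

    `model`, `positive`, `negative`, and `latent` are [node_id, output_index]
    references into the rest of the graph.
    """
    return {
        "class_type": "KSampler",
        "inputs": {
            "model": model,
            "positive": positive,
            "negative": negative,
            "latent_image": latent,
            "seed": seed,
            **SHARED_SAMPLER_SETTINGS,
        },
    }

# Baseline and night passes differ only in their conditioning sources.
base_pass = ksampler_node(["2", 0], ["3", 0], ["4", 0], ["5", 0], seed=42)
night_pass = ksampler_node(["2", 0], ["24", 0], ["24", 1], ["5", 0], seed=42)
```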
VAELoader (Node 7)
Purpose: Loads the VAE model used to decode latent images back into full-res output.
Model: vae-ft-mse-840000-ema-pruned.safetensors (yes, she’s got a long name, but she delivers)
Output: VAE → connected to all VAEDecode nodes
VAEDecode (Nodes 6, 12, 16)
Purpose: Translates final latent tensors back into actual images. This is where your art comes alive.
Input: LATENT + VAE
Output: IMAGE → saved or previewed
SaveImage (Nodes 9, 10, 14)
Purpose: Saves generated images with filenames like “ComfyUI_####.png”
Input: IMAGE
Output: To your filesystem, obviously.
ControlNetLoader (Nodes 22, 26)
Purpose: Loads a ControlNet module for enforcing structure using an auxiliary signal (depth, in this case).
Model: control_v11f1p_sd15_depth_fp16.safetensors
Output: CONTROL_NET → fed into the advanced ControlNet processor
AV_ControlNetPreprocessor (Node 18)
Purpose: Generates a depth map from the base image using a fancy pants preprocessor.
Settings:
- Preprocessor: depth_midas
- SD Version: sd15
- Resolution: 512
Output: IMAGE (depth map) → sent to ControlNetApplyAdvanced
ControlNetApplyAdvanced (Nodes 24, 25)
Purpose: Applies ControlNet conditioning to your prompt vectors.
Inputs:
- CONDITIONING (positive + negative)
- CONTROL_NET (from Loader)
- IMAGE (depth map from Preprocessor)
Strength: 0.83
Start/End %: 0 → 1 (applies throughout the entire diffusion process)
Output: New conditioned prompts → fed to KSampler
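In API-format terms, that boils down to one node taking both conditioning streams plus the ControlNet and the depth image, and handing back new positive/negative conditioning. A rough sketch with illustrative node references (double-check the IDs and field names against your own export):

```python
# ControlNetApplyAdvanced: wraps the prompt conditioning with depth guidance.
# ["18", 0] is assumed to be the depth map from the preprocessor node,
# ["22", 0] the loaded ControlNet, ["13", 0] / ["4", 0] the prompt encodings.
controlnet_apply = {
    "class_type": "ControlNetApplyAdvanced",
    "inputs": {
        "positive": ["13", 0],      # variation prompt conditioning
        "negative": ["4", 0],       # shared negative prompt conditioning
        "control_net": ["22", 0],   # control_v11f1p_sd15_depth_fp16.safetensors
        "image": ["18", 0],         # depth map from AV_ControlNetPreprocessor
        "strength": 0.83,
        "start_percent": 0.0,
        "end_percent": 1.0,
    },
}
```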
PreviewImage (Node 19)
Purpose: Displays the ControlNet depth map as a quick visual sanity check. Optional, but helpful.
Models Used
| Model Type | File Used | Purpose |
|---|---|---|
| Checkpoint | dreamshaper_8.safetensors | Core image generation model |
| VAE | vae-ft-mse-840000-ema-pruned.safetensors | Decoding latent to image |
| ControlNet | control_v11f1p_sd15_depth_fp16.safetensors | Depth conditioning |
| CLIP Text Encoder | Included in the base checkpoint | Text-to-conditioning encoder |
| Preprocessor | depth_midas (via AV_ControlNetPreprocessor) | Generates the depth input image |
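Before hitting Queue, it's worth confirming those files actually live where ComfyUI looks for them (checkpoints under models/checkpoints, VAEs under models/vae, ControlNets under models/controlnet). A quick sanity-check script; the install path is an assumption, so point it at your own setup:

```python
from pathlib import Path

# Adjust to wherever your ComfyUI install lives (assumption, not a universal path).
COMFYUI_DIR = Path.home() / "ComfyUI"

REQUIRED_MODELS = {
    "models/checkpoints/dreamshaper_8.safetensors": "checkpoint",
    "models/vae/vae-ft-mse-840000-ema-pruned.safetensors": "VAE",
    "models/controlnet/control_v11f1p_sd15_depth_fp16.safetensors": "ControlNet",
}

for rel_path, kind in REQUIRED_MODELS.items():
    path = COMFYUI_DIR / rel_path
    status = "found" if path.exists() else "MISSING"
    print(f"[{status}] {kind}: {path}")
```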
🔄 First Image Generation Pass (Baseline)
- KSampler (Node 1) Takes in the base latent plus positive and negative conditioning, and outputs a latent image.
- VAEDecode (Node 6) Decodes the latent into an actual image.
- SaveImage (Node 9) Saves the image. You're welcome.
- AV_ControlNetPreprocessor (Node 18)
Extracts a depth map from the decoded base image using the depth_midas preprocessor. Resolution: 512.
🎨 Prompt Variations (Same Composition, Different Mood)
Each variation follows this trio:
➕ New Prompt Conditioning
- CLIPTextEncode (Nodes 13 & 17)
New positive prompts:
- "mountain landscape, at night, digital painting, masterpiece"
- "mountain landscape, at sunset, digital painting, masterpiece"
🔗 ControlNet Conditioning
- ControlNetLoader (Nodes 22 & 26)
Loads control_v11f1p_sd15_depth_fp16.safetensors for both variations.
- ControlNetApplyAdvanced (Nodes 24 & 25)
Applies ControlNet to each prompt with:
- Strength: 0.83
- Range: 0 to 1 (full generation span)
- Shares the preprocessed depth image from Node 18.
🌀 Sampling Passes (Reusing Latent)
- KSampler (Nodes 11 & 15)
Feeds in:
- Same latent from Node 5
- Prompt variations + negative conditioning
- Outputs new latent samples for decoding
- VAEDecode (Nodes 12 & 16) Converts those latents back into images.
- SaveImage (Nodes 10 & 14) Saves those glorious variations.
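To make the "same latent, different mood" idea concrete, here's a trimmed sketch of how the two variation samplers could sit side by side in API-format JSON. The node numbers mirror the ones used above but are illustrative; the point is that only the positive/negative conditioning changes between them:

```python
# Both variation KSamplers reuse the same empty latent (node 5) and model (node 2);
# only the conditioning (night vs. sunset, already run through ControlNet) differs.
shared = {"model": ["2", 0], "latent_image": ["5", 0],
          "seed": 42, "steps": 25, "cfg": 7,
          "sampler_name": "dpmpp_2m", "scheduler": "karras", "denoise": 1.0}

variation_samplers = {
    "11": {"class_type": "KSampler",  # night pass
           "inputs": {**shared, "positive": ["24", 0], "negative": ["24", 1]}},
    "15": {"class_type": "KSampler",  # sunset pass
           "inputs": {**shared, "positive": ["25", 0], "negative": ["25", 1]}},
}
```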
🔍 Bonus: Image Preview
- PreviewImage (Node 19) Linked to the ControlNet-preprocessed image. Lets you visually confirm the depth map. Optional but helpful when tweaking.
🛠️ Recommended Usage Tips
- Change only the text prompt on the variation CLIP encoders (Nodes 13/17) to explore lighting, color styles, or artistic direction without breaking composition.
- Keep the latent image and depth ControlNet the same to retain scene structure.
- Adjust the denoise strength (default = 1) in the KSamplers (Nodes 11 & 15) to control how heavily each variation pass reworks its starting latent; for prompt adherence, reach for CFG instead.
- Seed randomization is enabled. Lock it if you want reproducibility; a scripted version of this prompt-swap loop is sketched below.
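If you'd rather script that prompt-swap loop than click through it, one option is to load an API-format export and push each variation to a running ComfyUI server (POST /prompt on the default 127.0.0.1:8188). The sketch below assumes a workflow_api.json export and that node "13" / node "11" are your variation CLIPTextEncode and KSampler; adjust those IDs to whatever your export actually contains:

```python
import copy
import json
import urllib.request

# Assumptions: workflow exported via "Save (API Format)" to workflow_api.json,
# node "13" is a variation CLIPTextEncode, node "11" its KSampler, and a
# ComfyUI server is listening on the default address.
COMFYUI_URL = "http://127.0.0.1:8188/prompt"
PROMPT_NODE_ID = "13"
SAMPLER_NODE_ID = "11"

with open("workflow_api.json", "r", encoding="utf-8") as f:
    base_workflow = json.load(f)

variations = [
    "mountain landscape, at night, digital painting, masterpiece",
    "mountain landscape, at sunset, digital painting, masterpiece",
    "mountain landscape, in heavy fog, digital painting, masterpiece",  # extra example
]

for text in variations:
    wf = copy.deepcopy(base_workflow)
    wf[PROMPT_NODE_ID]["inputs"]["text"] = text   # swap only the prompt
    wf[SAMPLER_NODE_ID]["inputs"]["seed"] = 42    # lock the seed for reproducibility
    payload = json.dumps({"prompt": wf}).encode("utf-8")
    req = urllib.request.Request(COMFYUI_URL, data=payload,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        print(text, "->", resp.status)
```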
📦 Output Summary
| Image Type | Description | Saved? |
|---|---|---|
| Base image | Pure prompt output | ✅ |
| Depth map preview | Preprocessed ControlNet input | 👁️ |
| Night variation | Prompt: "at night" | ✅ |
| Sunset variation | Prompt: "at sunset" | ✅ |
🔥 What Not to Do Unless You Want a Fire
⚠️ Go rogue with dimensions: Changing the image size mid-workflow (in EmptyLatentImage or ControlNet Preprocessor) breaks alignment. You’ll get Picasso faces in a Dali background.
⚠️ Mix ControlNet types mid-stream: Don’t swap depth_midas for pose, lineart, or anything else unless you’re also updating the conditioning method, prompts, and probably sacrificing a goat.
⚠️ Use wildly unrelated style prompts: Throwing "cyberpunk chicken nugget tornado" at a base image of a serene forest won’t result in inspired fusion — just chaotic soup.
⚠️ Mismatch VAEs and checkpoints: Some VAEs work better with certain model families. If you mix and match, expect weird color shifts or melted features.
⚠️ Overcook CFG or Steps: CFG > 15? You’re asking for prompt obsession. Steps > 50? Diminishing returns and slower gen for zero payoff.
⚠️ Don’t forget the negative prompt: Seriously, use "ugly, deformed" or your mountains will have six eyeballs.
🚀 Conclusion
This workflow is a power user’s dream: it gives you structured, repeatable image generation with the flexibility to explore multiple artistic angles. And thanks to ControlNet’s depth preservation and ComfyUI’s node magic, you can get Pinterest-perfect results with just a prompt tweak.
So go forth, vary your vibes—but keep your mountains steady.