DualCLIPLoader

The DualCLIPLoader node is here to settle the age-old argument of which CLIP is better: by just using both. This node lets you load two CLIP text encoders simultaneously, giving your workflow extra perspective, richer prompt understanding, and more expressive conditioning than a single CLIP could ever offer alone.

Whether you're running SDXL, SD3, Flux, or even Hunyuan Video, this node helps unlock the full potential of multi-CLIP workflows in ComfyUI.

🔧 Node Type

DualCLIPLoader
Outputs: CLIP

📍 Function in ComfyUI Workflows

The DualCLIPLoader node loads two distinct CLIP models into a single combined output object, which downstream nodes like CLIPTextEncode can use for better prompt conditioning before the result is handed to a sampler such as KSampler. It's especially useful in workflows where you're combining:

  • Two text encoders of different sizes (e.g., CLIP-L + CLIP-G for SDXL, or CLIP-L + T5-XXL for Flux)
  • Two stylistically different CLIPs (like realism and anime)
  • CLIPs trained on different languages or datasets
  • Experimental comparative generation setups

It's a smart choice when you want more control, more nuance, and less guesswork in how your prompt is interpreted.
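
For example, here is a minimal sketch of that wiring in ComfyUI's API-format workflow JSON, written as a Python dict. The node IDs, prompt text, and encoder filenames are illustrative assumptions; your own graph will differ.

    # Minimal API-format fragment: one DualCLIPLoader feeding two CLIPTextEncode nodes.
    # Connections are written as [source_node_id, output_index].
    workflow_fragment = {
        "1": {
            "class_type": "DualCLIPLoader",
            "inputs": {
                "clip_name1": "clip_l.safetensors",  # first text encoder
                "clip_name2": "clip_g.safetensors",  # second text encoder
                "type": "sdxl",
                "device": "default",
            },
        },
        "2": {  # positive prompt
            "class_type": "CLIPTextEncode",
            "inputs": {"text": "a cinematic photo of a lighthouse at dusk",
                       "clip": ["1", 0]},  # CLIP output of node 1
        },
        "3": {  # negative prompt reuses the same CLIP object
            "class_type": "CLIPTextEncode",
            "inputs": {"text": "blurry, low quality",
                       "clip": ["1", 0]},
        },
    }

Both CLIPTextEncode outputs would then feed the positive and negative conditioning inputs of a sampler node such as KSampler.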

🧠 Technical Details

This node:

  • Loads two .safetensors CLIP models
  • Outputs a combined CLIP object
  • Supports architecture-specific types (sdxl, sd3, flux, hunyuan_video)
  • Can optionally be assigned to a specific device (keep "default" for automatic placement, or force "cpu") for manual memory management

Internally, the node makes sure both models are loaded properly and matched to the architecture selected in the type field.
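
Which encoders you pair depends on that architecture. As a rough, illustrative reference (the exact filenames depend on which encoder builds you downloaded, so treat them as assumptions), commonly seen pairings look like this:

    # Commonly paired text encoders per architecture type.
    # Illustrative only -- exact filenames depend on the builds you download.
    COMMON_PAIRINGS = {
        "sdxl":          ("clip_l.safetensors", "clip_g.safetensors"),
        "sd3":           ("clip_l.safetensors", "clip_g.safetensors"),  # T5 usually comes in via TripleCLIPLoader
        "flux":          ("clip_l.safetensors", "t5xxl_fp16.safetensors"),
        "hunyuan_video": ("clip_l.safetensors", "llava_llama3_fp8_scaled.safetensors"),
    }

    for arch, (first, second) in COMMON_PAIRINGS.items():
        print(f"{arch}: clip_name1={first}, clip_name2={second}")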

⚙️ Settings and Parameters

  • clip_name1 – Filename of the first CLIP model. This is your primary text encoder, typically used for prompt conditioning. Must be a valid .safetensors CLIP file in your models folder. Example: clip_l.safetensors.
  • clip_name2 – Filename of the second CLIP model. Used in tandem with the first, often a larger text encoder such as CLIP-G or T5. Example: clip_g.safetensors.
  • type – The model architecture this CLIP pair is being used with. Must match the type of your checkpoint and downstream pipeline. Options include: sdxl, sd3, flux, hunyuan_video.
  • device – (Optional) The device the CLIP models are loaded onto. Use "default" for automatic assignment (usually the GPU), or set it to "cpu" if you're managing memory manually.

✅ Benefits

  • Dual-encoder power – Combine two complementary text encoders in one pass.
  • Style fusion – Blend the strengths of two different CLIPs (realism + anime, anyone?).
  • More accurate prompts – Dual interpretation gives better grounding to both positive and negative prompts.
  • Better SDXL/SD3 results – These architectures were made for dual-CLIP setups, and this node is the plug-in brain for them.

⚙️ Usage Tips

  • Always match the type field with the checkpoint type you're using. Don't mix sdxl with flux or your generation will go sideways.
  • If you're running low on VRAM, offload the CLIP models to the CPU by setting device to "cpu" (see the sketch after this list), but expect slower performance.
  • You can pair a compact encoder like CLIP-L with a larger one like CLIP-G or T5 for noticeably richer prompt understanding. Try it with SDXL or Flux for best results.
  • Want to experiment with prompting style? Use one CLIP trained for realism and another trained for fantasy, and balance prompt weights accordingly.
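
If you are unsure whether offloading is needed, a quick check before you set the device field can help. This is a small illustrative helper, not part of the node; it assumes PyTorch is available (ComfyUI already depends on it), and the 4 GiB threshold is an arbitrary starting point.

    import torch

    def pick_clip_device(min_free_gib: float = 4.0) -> str:
        """Suggest a value for DualCLIPLoader's device field based on free VRAM."""
        if not torch.cuda.is_available():
            return "cpu"
        free_bytes, _total = torch.cuda.mem_get_info()  # free and total VRAM on the current GPU
        free_gib = free_bytes / (1024 ** 3)
        return "default" if free_gib >= min_free_gib else "cpu"

    print(pick_clip_device())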

📁 ComfyUI Setup Instructions

  1. Place your CLIP model files (.safetensors) in your ComfyUI models/clip folder (newer ComfyUI versions also use models/text_encoders).
  2. Add the DualCLIPLoader node to your workflow.
  3. Set clip_name1 and clip_name2 to the exact filenames (a quick way to double-check them is shown after this list).
  4. Set the type field to match your pipeline (sdxl, flux, etc.).
  5. Optionally assign a device (or just leave it as "default").
  6. Connect the output to any node that expects a CLIP input.
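
A few seconds of checking filenames can save a silent failure later. The sketch below simply lists the encoder files ComfyUI can see and confirms your two filenames are among them; the install path and the two filenames are assumptions you should adjust.

    from pathlib import Path

    # Adjust to your own ComfyUI install location (illustrative path).
    COMFYUI_DIR = Path.home() / "ComfyUI"

    # Older installs use models/clip; newer ones also use models/text_encoders.
    candidates = []
    for sub in ("models/clip", "models/text_encoders"):
        folder = COMFYUI_DIR / sub
        if folder.is_dir():
            candidates += sorted(folder.glob("*.safetensors"))

    available = {p.name for p in candidates}
    print("Available encoders:", sorted(available))

    for name in ("clip_l.safetensors", "clip_g.safetensors"):
        print(f"{name}: {'ok' if name in available else 'MISSING'}")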

📎 Example Node Configuration


clip_name1: clip_l.safetensors
clip_name2: clip_g.safetensors
type: sdxl
device: default

In this setup, we're using two different CLIP text encoders with an SDXL-based model. This is ideal for workflows where SDXL's dual-encoder conditioning is leveraged for higher-fidelity generations.
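
If you drive ComfyUI programmatically, the same settings can be queued through its HTTP API. Below is a minimal sketch, assuming a local instance on the default port (8188); note that ComfyUI expects a complete graph ending in an output node, so in practice you would merge this fragment into a workflow exported with "Save (API Format)".

    import json
    from urllib import request

    # Assumes a local ComfyUI instance on the default port; adjust as needed.
    COMFYUI_URL = "http://127.0.0.1:8188/prompt"

    # Only the DualCLIPLoader settings are shown; the rest of the exported
    # workflow (text encodes, sampler, output node, ...) goes alongside it.
    workflow = {
        "1": {
            "class_type": "DualCLIPLoader",
            "inputs": {
                "clip_name1": "clip_l.safetensors",
                "clip_name2": "clip_g.safetensors",
                "type": "sdxl",
                "device": "default",
            },
        },
        # ... remaining nodes from your exported workflow ...
    }

    payload = json.dumps({"prompt": workflow}).encode("utf-8")
    req = request.Request(COMFYUI_URL, data=payload,
                          headers={"Content-Type": "application/json"})
    with request.urlopen(req) as resp:
        # On success the response includes a prompt_id for tracking the job.
        print(resp.status, resp.read().decode())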

🔥 What-Not-To-Do-Unless-You-Want-a-Fire

  • โŒ Donโ€™t mix model types. Your type must match your checkpoint or youโ€™ll get garbage output (or no output at all).
  • โŒ Donโ€™t assign both CLIPs to GPU on low-VRAM systems. You will crash.
  • โŒ Donโ€™t try to load non-CLIP .safetensors files here. It wonโ€™t work. Youโ€™ll sit there wondering why your workflowโ€™s frozen.
  • โŒ Donโ€™t assume CLIPs โ€œmergeโ€ into a single model โ€” this node runs them in parallel, not as a fusion.
  • โŒ Donโ€™t use massive CLIPs on a 4GB GPU unless you really enjoy watching your system swap memory like itโ€™s 2006.

⚠️ Known Issues

  • VRAM hungry – Two models = more memory. No surprise here.
  • Slow on CPU – If you offload to the CPU, expect a noticeable slowdown.
  • No CLIP validation – If your filename is wrong, the node won't tell you nicely; it'll just fail silently or break downstream.

📚 Additional Resources

  • How SDXL Uses Dual CLIP Architecture
  • Example model files:
    • clip_l.safetensors
    • clip_g.safetensors

📝 Notes

  • This node is highly recommended for SDXL workflows and is borderline required if you want to use SDXL the way it was meant to be used.
  • Great for prompt tuning, multimodal alignment, and style fusion.
  • If you're experimenting with new CLIPs, try this node in a sandbox workflow before plugging it into a production chain.