DualCLIPLoader
The DualCLIPLoader node is here to settle the age-old argument of which CLIP is better: by just using both. This node lets you load two CLIP models simultaneously, giving your workflow extra perspective, better text-vision understanding, and more power than a single CLIP could ever offer alone.
Whether you're running SDXL, SD3, Flux, or even Hunyuan Video, this node helps unlock the full potential of multi-CLIP workflows in ComfyUI.
Node Type
DualCLIPLoader
Outputs: CLIP
Function in ComfyUI Workflows
The DualCLIPLoader node loads two distinct CLIP models into a single combined output object, which downstream nodes (like `CLIPTextEncode`) use for prompt conditioning before the result reaches a `KSampler`. It's especially useful in workflows where you're combining:
- Vision + text encoders (e.g., SDXL-style)
- Two stylistically different CLIPs (like realism and anime)
- CLIPs trained on different languages or datasets
- Experimental comparative generation setups
It's a smart choice when you want more control, more nuance, and less guesswork in how your prompt is interpreted.
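To make the idea concrete, here's a minimal sketch (outside ComfyUI) of what "two text encoders reading the same prompt" looks like, using Hugging Face `transformers`. The two checkpoints and the final concatenation step are illustrative assumptions, not ComfyUI's internal code; SDXL-style pipelines do something broadly similar with their own pair of encoders.

```python
# Illustrative sketch: two independent CLIP text encoders read the same prompt,
# and their embeddings are combined for downstream conditioning.
# The checkpoint names and the concatenation step are assumptions for clarity,
# not what ComfyUI does internally.
import torch
from transformers import CLIPTokenizer, CLIPTextModel

prompt = "a cinematic photo of a lighthouse at dusk"

tok_a = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")
enc_a = CLIPTextModel.from_pretrained("openai/clip-vit-large-patch14")
tok_b = CLIPTokenizer.from_pretrained("openai/clip-vit-base-patch32")
enc_b = CLIPTextModel.from_pretrained("openai/clip-vit-base-patch32")

with torch.no_grad():
    ids_a = tok_a(prompt, padding="max_length", truncation=True, return_tensors="pt").input_ids
    ids_b = tok_b(prompt, padding="max_length", truncation=True, return_tensors="pt").input_ids
    emb_a = enc_a(ids_a).last_hidden_state  # shape (1, 77, 768)
    emb_b = enc_b(ids_b).last_hidden_state  # shape (1, 77, 512)

# Downstream conditioning gets both "opinions" of the prompt, e.g. by
# concatenating along the feature dimension (SDXL-style pipelines do
# something broadly similar with their own encoder pair).
conditioning = torch.cat([emb_a, emb_b], dim=-1)  # shape (1, 77, 1280)
print(conditioning.shape)
```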
Technical Details
This node:
- Loads two `.safetensors` CLIP models
- Outputs a combined CLIP object
- Supports architecture-specific types (`sdxl`, `sd3`, `flux`, `hunyuan_video`)
- Can optionally be assigned to specific devices (like CPU or `cuda:0`) for advanced load balancing
Internally, the node makes sure both models are loaded properly and tied to the architecture recipe (`type`) your workflow is using.
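For intuition, the loading step boils down to something like the sketch below: read two `.safetensors` checkpoints and place their tensors on the chosen device. The paths and the `load_clip_pair` helper are hypothetical; ComfyUI's own model management handles all of this for you.

```python
# Rough sketch of "load two CLIP checkpoints onto a chosen device".
# Paths, the helper name, and the device-fallback logic are illustrative
# assumptions; this is not ComfyUI's internal implementation.
import torch
from safetensors.torch import load_file

def load_clip_pair(path1: str, path2: str, device: str = "default"):
    # "default" here means: use the GPU if one is available, otherwise CPU.
    if device == "default":
        device = "cuda:0" if torch.cuda.is_available() else "cpu"
    # load_file returns a plain dict of tensor name -> tensor, already on `device`.
    sd1 = load_file(path1, device=device)
    sd2 = load_file(path2, device=device)
    return sd1, sd2

sd1, sd2 = load_clip_pair("models/clip/clip_l.safetensors",
                          "models/clip/clip_vision_g.safetensors")
print(len(sd1), len(sd2), "tensors loaded")
```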
Settings and Parameters
| Field | Description |
|---|---|
| `clip_name1` | Filename of the first CLIP model. This is your primary encoder, typically used for prompt conditioning. Must be a valid `.safetensors` CLIP file in your models folder. Example: `clip_l.safetensors`. |
| `clip_name2` | Filename of the second CLIP model. Used in tandem with the first, often a vision encoder or stylistic variant. Example: `clip_vision_g.safetensors`. |
| `type` | Model recipe this CLIP pair is being used with. Must match the type of your checkpoint and downstream pipeline. Options include: `sdxl`, `sd3`, `flux`, `hunyuan_video`. |
| `device` | (Optional) The device to load the CLIP models onto. Use `"default"` for automatic assignment (usually GPU), or set `"cpu"` or `"cuda:0"` if you're managing memory manually. |
Benefits
- Multi-modal power: combine language and vision understanding in one pass.
- Style fusion: blend the strengths of two different CLIPs (realism + anime, anyone?).
- More accurate prompts: dual interpretation gives better grounding to both positive and negative prompts.
- Better SDXL/SD3 results: these architectures were made for dual-CLIP setups, and this node is the plug-in brain for them.
Usage Tips
- Always match the `type` field with the checkpoint type you're using. Don't mix `sdxl` with `flux` or your generation will go sideways.
- If you're running low on VRAM, offload the CLIP models to CPU by setting `device` to `"cpu"`, but expect slower performance (see the sketch after this list for a simple way to decide).
- You can mix a vision CLIP with a language-focused CLIP to create surprisingly accurate visual storytelling. Try it with SDXL for best results.
- Want to experiment with prompting style? Use one CLIP trained for realism and another trained for fantasy, and balance prompt weights accordingly.
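If you want a rule of thumb for the VRAM tip above, a tiny helper like the following can tell you whether `"default"` or `"cpu"` is the safer choice on your machine. This is not part of ComfyUI, and the 4 GB threshold is an arbitrary assumption; adjust it for the models you actually load.

```python
# Hypothetical helper: pick a value for the node's `device` field based on
# how much free VRAM is available right now. Threshold is an assumption.
import torch

def pick_clip_device(min_free_gb: float = 4.0) -> str:
    if not torch.cuda.is_available():
        return "cpu"
    free_bytes, _total_bytes = torch.cuda.mem_get_info()  # free and total VRAM in bytes
    return "default" if free_bytes / 1024**3 >= min_free_gb else "cpu"

print(pick_clip_device())  # e.g. "default" on a roomy GPU, "cpu" when VRAM is tight
```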
ComfyUI Setup Instructions
- Place your CLIP model files (`.safetensors`) in your ComfyUI `models/clip` folder.
- Add the DualCLIPLoader node to your workflow.
- Set `clip_name1` and `clip_name2` to the exact filenames (a quick file check like the one after this list can save you a typo hunt).
- Set the `type` field to match your pipeline (`sdxl`, `flux`, etc.).
- Optionally assign a `device` (or just leave it as `"default"`).
- Connect the output to any node that expects a `CLIP` input.
Example Node Configuration
```
clip_name1: clip_l.safetensors
clip_name2: clip_vision_g.safetensors
type: sdxl
device: default
```
In this setup, we're using two different CLIPs with an SDXL-based model. This is ideal for workflows where SDXL's text+vision conditioning is leveraged for higher-fidelity generations.
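For reference, here is roughly how that same configuration might look as a fragment of an API-format workflow, written as a Python dict, with the loader feeding a `CLIPTextEncode` node. The node ids and the exact field set are illustrative assumptions; export your own graph in API format to get the authoritative version.

```python
# Hypothetical API-format workflow fragment: DualCLIPLoader feeding CLIPTextEncode.
# Node ids ("4", "5"), the prompt text, and the field set are assumptions.
import json

workflow_fragment = {
    "4": {
        "class_type": "DualCLIPLoader",
        "inputs": {
            "clip_name1": "clip_l.safetensors",
            "clip_name2": "clip_vision_g.safetensors",
            "type": "sdxl",
            "device": "default",
        },
    },
    "5": {
        "class_type": "CLIPTextEncode",
        "inputs": {
            "text": "a cinematic photo of a lighthouse at dusk",
            "clip": ["4", 0],  # take the CLIP output of the DualCLIPLoader node
        },
    },
}

print(json.dumps(workflow_fragment, indent=2))
```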
What-Not-To-Do-Unless-You-Want-a-Fire
- Don't mix model types. Your `type` must match your checkpoint or you'll get garbage output (or no output at all).
- Don't assign both CLIPs to GPU on low-VRAM systems. You will crash.
- Don't try to load non-CLIP `.safetensors` files here. It won't work, and you'll sit there wondering why your workflow's frozen.
- Don't assume the CLIPs "merge" into a single model; this node runs them in parallel, not as a fusion.
- Don't use massive CLIPs on a 4GB GPU unless you really enjoy watching your system swap memory like it's 2006.
Known Issues
- VRAM hungry: two models = more memory. No surprise here.
- Slow on CPU: if you offload to CPU, expect a noticeable slowdown.
- No CLIP validation: if your filename is wrong, the node won't tell you nicely; it'll just fail silently or break downstream.
Additional Resources
- How SDXL Uses Dual CLIP Architecture
- Example model files: `clip_l.safetensors`, `clip_vision_g.safetensors`
Notes
- This node is highly recommended for SDXL workflows and is borderline required if you want to use SDXL the way it was meant to be used.
- Great for prompt tuning, multimodal alignment, and style fusion.
- If you're experimenting with new CLIPs, try this node in a sandbox workflow before plugging it into a production chain.