Simplified t2i Workflow for Flux2D (also Workaround for broken MultiGPU Nodes)

Workaround on the fly.
~~Due to ComfyUI updates, the "large" DisTorch2MultiGPU nodes are no longer functional. This workflow has been modified to use the old "small" MultiGPU nodes from the first version.~~

The workflow should run on the DisTorch2MultiGPUv2 nodes to control VRAM (RAM) allocation, preventing VRAM overflow and the resulting swapping. Unfortunately, these nodes are currently broken due to the latest ComfyUI updates. As a fallback, the older MultiGPUv1 nodes are used (with up to 10–15% lower inference speed compared to DisTorch2).

🔗 GitHub (right-click to open in new tab)
pollockjj/ComfyUI-MultiGPU

Flux2 t2i.json

Model Loading & GPU|CPU Distribution

UnetLoaderGGUFMultiGPU and VAELoaderMultiGPU are assigned directly to cuda:0.
ClipLoaderGGUFMultiGPU is fixed to cpu (CPU offloading).

By setting fixed devices, noticeable VRAM swapping is prevented.

Test System: RTX 3090 (24GB VRAM) + 32GB RAM

RAM usage: ~65-77%
VRAM usage: ~22-23GB
Virtual VRAM: not used

Quick Reference: FLUX.2 + Mistral-3-Small GGUF

VRAM	FLUX.2 UNet	Mistral Text Encoder (CPU)	Notes
24GB	Q8_0 (35GB)	Q8_K (29GB)	Best quality setup
16GB	Q5_K_M (24.1GB) - try Q6_K (27.4GB)	Q6_K (19.3GB) or Q4_K_M (14.3GB)	Balanced quality
12GB	Q4_K_M (20.1GB)	Q4_K_M (14.3GB) or Q3_K_M (11.5GB)	Speed priority

🔗 HF (right-click to open in new tab)
city96/FLUX.2-dev-gguf
unsloth/Mistral-Small-3.2-24B-Instruct-2506-GGUF

Active Configuration

VRAM (24GB GPU):

~18-20GB: UNet: flux2-dev-Q8_0.gguf (Alternative: FP8-Mixed.safetensors)
~1-1.5GB: VAE: flux2-vae.safetensors
~1-2GB: Overhead
= ~22-23GB total

RAM (32GB CPU):

~25GB: Text Encoder: Mistral-Small-3.2-24B-Instruct-2506-UD-Q8_K_XL.gguf (Alternative: fp8.safetensors)

🔗 HF (right-click to open in new tab)
Comfy-Org/flux2-dev

Memory Management

RAMCleanup Node (active):

Required: On first run
Optional: For resolutions ~1MP (e.g. 832×1216px)
Required: From >1MP, ≥2MP onwards (see Performance section)

Settings:

Clean File Cache: ✓
Clean Processes: ✓
~~Clean dlls~~
Retry attempts: 3
Runs between VAE-Decode and Save

🔗 GitHub (right-click to open in new tab)
LAOGOU-666/Comfyui-Memory_Cleanup

run_nvidia_gpu.bat

.\python_embeded\python.exe -s ComfyUI\main.py --windows-standalone-build --use-sage-attention --fast

Important:

❌ Remove --lowvram, --disable-smart-memory or similar flags
⚠️ No VRAM/Memory flags! Let ComfyUI breathe.
The details of this memory management are intelligently implemented by
comfyUI → city96's GGUF nodes → and the MultiGPU nodes on them.

SageAttention Setup

With GGUF: The acceleration provided by SageAttention is significantly lower than with FP8 models and may sometimes not take effect at all, since GGUF formats are primarily designed for CPU‑optimized inference and do not fully leverage the GPU kernels of SageAttention.
→ Disable --use-sage-attention. Instead, use --fast (standard PyTorch optimization) and rely on the internal optimizations of the GGUF Nodes(backend).

Check Triton installation (Windows):

.\python_embeded\python.exe -m pip show triton

If not installed:

.\python_embeded\python.exe -m pip install triton-windows

GPU-specific SageAttention versions:

GPU Series	Version	Reason
RTX 30xx (Ampere)	1.0.6	Version 2.x offers no performance improvement
RTX 40xx (Ada)	2.2.0	Primarily optimized for these or newer architectures

Installation:

# RTX 30xx
.\python_embeded\python.exe -m pip install sageattention==1.0.6

# RTX 40xx
.\python_embeded\python.exe -m pip install sageattention==2.2.0

💡 Tip: New to SageAttention 2.2.0 installation?
🔗 GitHub (right-click to open in new tab)
Check out this 🔧 Installation Guide: SageAttention + Triton for ComfyUI

Performance

Test Setup:
Guidance: 4 | Steps: 20 (Production: 30-40 Steps)
Based on: ~80 runs with different resolutions

First Run Initial loading of required layers into VRAM|RAM, first inference; Further inferences are then significantly faster as memory management is already initialized. Exact timings, loaded partially etc. see Console-Output / Screenshots.