InfiniteTalk WAN Animate ComfyUI Workflow

This guide walks through a single ComfyUI workflow that produces realistic swaps with a natural blend. I use a relight LoRA to correct color tone, InfiniteTalk with WAN 2.2 Animate for stronger lip sync, and compare a GGUF Q4 model with FP8. The workflow runs on low VRAM, and UNI3C can add camera motion.
I’ve divided the workflow into clear sections so it’s easy to follow. You’ll see how to mask subjects, drive facial motion from a reference video, enable lip sync, and test camera motion. I also include model setup, LoRA selections, and a direct Q4 vs FP8 comparison.
What is WAN 2.2 Animate in ComfyUI?
WAN 2.2 Animate is a video-to-video model setup in ComfyUI for character and face swaps, expression transfer, and motion-driven edits. With InfiniteTalk, it produces strong lip sync. With UNI3C, it can borrow camera motion from a reference clip. The workflow supports FP8 for quality and a GGUF Q4 build that runs on low VRAM.
Table Overview
| Section | Purpose | Key Nodes/Notes |
| --- | --- | --- |
| Background Masking | Define what to keep/remove from the input | Resolution Master, Point Editor (green/red points), SAM to Segment, optional Segments v3 layer mask |
| Face Motion Driver | Borrow facial motion from a reference video | Load Video (face image section group), hooks to switch drivers |
| Lip Sync | Improve sync on speaking/singing clips | InfiniteTalk with WAN 2.2 Animate (toggle on/off) |
| Camera Motion | Add camera angle/motion from a reference | UNI3C (download UNI3C ControlNet; default off) |
| Reference Image | The face/character you want in the result | Upload the image to be added/edited into the video |
| Models & LoRAs | Core model selection and quality/motion add-ons | WAN 2.2 Animate (FP8 or GGUF Q4), Relighting LoRA, Lightex Four Step, Busa V1, V2.2 HPS LoRA, WELL text encoder |
Key Features
- Natural swaps and blends in one ComfyUI workflow
- Color tone correction with a relight LoRA
- Better lip sync using InfiniteTalk with WAN 2.2 Animate
- Low VRAM option via GGUF Q4, compared directly to FP8
- Optional camera motion via UNI3C
- Flexible masking with point-based or SAM/Segments methods
- Face motion driver swapping with quick hook changes
How it works
Workflow layout
The workflow is divided into sections: masking, face motion, lip sync, camera motion, reference image, and models/LoRAs. Each section can be toggled on or off for testing.
Background masking
- Upload the clip you want to edit in the Background Masking section.
- Use Resolution Master to pick a resolution that fits your clip and is supported by WAN 2.2.
- In the Point Editor node, place green points on areas to include in the mask and red points on areas to exclude.
- You can also use a layer mask in Segments v3 if you prefer. For this tutorial, I use SAM to Segment nodes (a minimal point-prompt sketch follows this list).
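The SAM to Segment nodes wrap a point-prompted segmentation model. As a rough illustration of what the green and red points do, here is a minimal sketch using the standalone segment-anything package; the checkpoint filename and point coordinates are placeholders, and the assumption that the ComfyUI node behaves like this simplified call is mine:

```python
import cv2
import numpy as np
from segment_anything import sam_model_registry, SamPredictor

# Load a SAM checkpoint (placeholder path; any official SAM checkpoint works).
sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b_01ec64.pth")
predictor = SamPredictor(sam)

# Use the first frame of the target clip; SAM expects RGB input.
frame_bgr = cv2.imread("first_frame.png")
predictor.set_image(cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB))

# Green points (label 1) mark what to keep, red points (label 0) what to exclude.
point_coords = np.array([[320, 240], [310, 400], [560, 300]])  # example x, y positions
point_labels = np.array([1, 1, 0])

masks, scores, _ = predictor.predict(
    point_coords=point_coords,
    point_labels=point_labels,
    multimask_output=False,
)
cv2.imwrite("subject_mask.png", (masks[0] * 255).astype(np.uint8))
```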
Face motion from a reference video
- If you want to change facial expressions or improve motion capture, load a separate reference video for the face.
- Add another Load Video node and upload the face driver in the face image section group.
- In Load Video, connect the image hook to the driver you want to use. To revert to the first video’s face, reconnect the original hook and remove the second.
InfiniteTalk lip sync
- I use InfiniteTalk with WAN 2.2 Animate for better lip sync.
- Enable or disable this section depending on your clip.
UNI3C camera motion control
- I use a “WAN Video UNI3C” section to transfer camera motion from a reference.
- It extracts motion, captures the camera angle, and adds it to the result video.
- It may not always match expectations, but it’s worth testing with WAN 2.2 Animate.
- Download the UNI3C ControlNet model and save it to your models/controlnet folder (see the download sketch after this list).
- The section group can be unbypassed when ready; by default, I keep it off.
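If you'd rather script the download than grab the file manually, here is a minimal sketch using huggingface_hub; the repo ID and filename are placeholders, so substitute whichever repository actually hosts the UNI3C ControlNet build you use:

```python
from pathlib import Path
from huggingface_hub import hf_hub_download

# Adjust to your ComfyUI install location.
controlnet_dir = Path("ComfyUI/models/controlnet")
controlnet_dir.mkdir(parents=True, exist_ok=True)

# Placeholder repo/filename -- substitute the actual UNI3C ControlNet source you use.
hf_hub_download(
    repo_id="your-org/uni3c-controlnet",
    filename="uni3c_controlnet.safetensors",
    local_dir=str(controlnet_dir),
)
```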
Reference image
- Upload the image you want to add or edit into the video in the Reference Image section.
Models and LoRAs
- Select the WAN 2.2 Animate FP8 model for quality.
- If you’re low on VRAM, try the GGUF Q4 model. I compare Q4 with FP8 in this guide.
- Download model files and save them in your ComfyUI diffusion_models folder.
I use these LoRAs in the workflow:
- WAN Animate Relighting LoRA: fixes color tone to match the target scene.
- Lightex Four Step: a regular in my workflows.
- Busa V1: improves motion but uses a bit more memory.
- V2.2 HPS LoRA (trained on human preference): improves quality.
- Text Encoder: select the WELL text encoder used in these V-series workflows.
That completes the workflow overview.
How to use
Step-by-step setup
- Download models:
  - WAN 2.2 Animate (FP8 or GGUF Q4) → comfyui/models/diffusion_models
  - UNI3C ControlNet → comfyui/models/controlnet
  - InfiniteTalk (match GGUF with GGUF if using GGUF for WAN 2.2)
  - LoRAs and WELL text encoder → their respective folders
- Open the workflow in ComfyUI.
- Confirm nodes resolve and paths point to the correct model files; the path-check sketch below can help.
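To sanity-check the folder layout from the download list above, a quick script like this works; the filenames are placeholders, so adjust COMFY_ROOT and the names to match the files you actually downloaded:

```python
from pathlib import Path

COMFY_ROOT = Path("ComfyUI")  # adjust to your install

# Folder -> example files you expect there (filenames are placeholders).
expected = {
    "models/diffusion_models": ["wan2.2_animate_fp8.safetensors"],  # or your GGUF Q4 file
    "models/controlnet": ["uni3c_controlnet.safetensors"],
    "models/loras": ["relighting_lora.safetensors"],
    "models/text_encoders": ["text_encoder.safetensors"],
}

for folder, files in expected.items():
    for name in files:
        path = COMFY_ROOT / folder / name
        status = "OK     " if path.exists() else "MISSING"
        print(f"{status} {path}")
```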
Masking and preparation
- Load your target video in the Background Masking section.
- Use Resolution Master to set a supported WAN 2.2 resolution for your clip (a standalone sizing helper is sketched after this list).
- Open the Point Editor:
  - Place green points on the subject you want to keep.
  - Place red points on areas to remove.
- Optional: use a layer mask in Segments v3 for a different masking method.
- Confirm the mask looks clean before moving on.
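Resolution Master handles the sizing inside the graph. If you want to pre-scale footage outside ComfyUI, here is a rough helper; it assumes WAN-style models prefer dimensions divisible by 16 and a pixel budget near the training sizes, which may not cover every supported bucket:

```python
def snap_resolution(width: int, height: int, target_pixels: int = 832 * 480, multiple: int = 16):
    """Scale a clip to roughly `target_pixels` while keeping aspect ratio,
    rounding both sides down to a multiple of `multiple`."""
    scale = (target_pixels / (width * height)) ** 0.5
    new_w = max(multiple, int(width * scale) // multiple * multiple)
    new_h = max(multiple, int(height * scale) // multiple * multiple)
    return new_w, new_h

# Example: a 1920x1080 source snapped near an 832x480-pixel budget.
print(snap_resolution(1920, 1080))  # -> (832, 464)
```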
Face motion driver selection
- If you want to change facial expression/motion, add another Load Video node for the driver clip.
- Upload the driver in the face image section group.
- Connect the image hook from this new driver where the workflow expects the face source.
- To use the original face source again, reconnect the first node’s hook and remove the second video hook.
Lip sync with InfiniteTalk
- Toggle on the InfiniteTalk section.
- If you’re running WAN 2.2 Animate GGUF, select the InfiniteTalk GGUF model as well.
- Keep the rest of the model stack as configured and run a short test to validate sync.
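InfiniteTalk is driven by the clip's audio track, which the workflow picks up in-graph. If you want to pull the audio out yourself to inspect or trim it first, a minimal ffmpeg call looks like this; 16 kHz mono WAV is a common input format for speech models, but confirm what your InfiniteTalk nodes expect:

```python
import subprocess

# Extract the audio track from the driver clip as 16 kHz mono PCM WAV.
subprocess.run(
    [
        "ffmpeg", "-y",
        "-i", "singing_clip.mp4",
        "-vn",                # drop the video stream
        "-ac", "1",           # mono
        "-ar", "16000",       # 16 kHz sample rate
        "-c:a", "pcm_s16le",  # uncompressed 16-bit PCM
        "driver_audio.wav",
    ],
    check=True,
)
```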
Camera motion with UNI3C
- Toggle on the UNI3C group when you want to transfer camera motion.
- Load a video with the camera angles you want in your final output.
- Ensure the UNI3C ControlNet file is installed in models/controlnet.
- Run a short generation. If motion is subtle, try a clip with more pronounced camera movement.
Final checks
- Double-check model selections (FP8 vs Q4), LoRAs, and the WELL text encoder.
- If you're running the GGUF Q4 model, verify that the quantization option in the model loader is set to disabled.
- Confirm frame range settings match your intended output and are not locked to a small subset.
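A quick way to confirm how many frames a clip actually has, and to catch an accidental frame cap like the 37-frame lock described in the comparison below, is to read the count with OpenCV:

```python
import cv2

cap = cv2.VideoCapture("target_clip.mp4")
frame_count = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
fps = cap.get(cv2.CAP_PROP_FPS)
cap.release()

if fps > 0:
    print(f"{frame_count} frames at {fps:.2f} fps (~{frame_count / fps:.1f} s)")
else:
    print(f"{frame_count} frames (fps not reported)")
# Compare this against the frame/batch limits set in the workflow's video loader.
```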
Results and comparisons
Example 1: Subject swap on a guitar clip
I uploaded an image of a golem with many objects and another character in the background. I only wanted the golem, so I used a Remove Background node to clear everything else.
I loaded a video of a lady playing guitar and replaced her with the golem. Since her hands move quickly over the guitar, I refined the mask:
- Red circles on the guitar to exclude it
- Green circles on the lady to include her form
The mask looked clean. The DWPose node analyzed facial cues and captured expressions. After running the graph, the lady was replaced by the golem with a natural blend. The golem plays the guitar, and the movement feels correct.
Q4 GGUF vs FP8
I tested the GGUF Q4 model to compare against FP8. Make sure the quantization option is set to disabled when loading it. The first test took about 28 seconds but had a motion issue: the video was locked to 37 frames even though the clip had 153 frames. After removing the frame lock, the 153 frames were split into two parts, and the full generation took about 1 minute.
The new result was better. Compared with FP8, Q4 delivered the same quality in this test. The hands were generated correctly, guitar details held up, and hand movement matched the reference video. In my FP8 test, it didn’t capture that particular move as well. On low VRAM, Q4 can achieve output comparable to FP8 with full motion captured.
Changing facial expressions with a different driver video
To change expressions, I loaded a new driver video where the lady moves her head up and down. I switched the face driver hook to the new Load Video node. The golem’s expression changed but didn’t match 100%.
I tried another driver: a clip of a lady driving a car and chasing someone. For the reference image, I used a realistic lady in a t-shirt. On the first run with that driver, the output followed the up/down motion but the result wasn’t there yet; I hadn’t changed the prompt. On the next run, the result followed the second video’s motion fully. This is how you can steer expression: load a different driver video and reconnect the hook to use its motion.
Singing with InfiniteTalk enabled
I enabled the InfiniteTalk section for lip sync. I uploaded a video of a lady singing and a reference image of a lady. First, I disabled other model groups so only the mask was active to confirm the mask looked correct. Then I enabled the other groups and selected the InfiniteTalk model.
Important: if you’re using the WAN 2.2 Animate GGUF build, select the InfiniteTalk GGUF build as well. The generated result synced the song accurately, and the reference lady replaced the original. Zoomed-in areas still looked well blended, and accessories like a necklace, clothing, and face matched the image.
UNI3C camera motion test
For the final test, I enabled the UNI3C group and uploaded a video with camera angles I wanted in the output. I set the resolution to 720×720. The result looked mostly static with slight motion. You can compare shots to see the difference.
I suggest testing different camera angle clips to find the best match for your footage. Motion strength can vary by source.
How to use: quick reference
Masking and replacement
- Use Resolution Master for a supported WAN 2.2 resolution.
- In Point Editor:
  - Green points: keep
  - Red points: remove
- Optional: Segments v3 layer mask
- Verify mask before generating
Face driver swapping
- Add a new Load Video node
- Upload driver in the face image section group
- Switch hooks to the desired driver
- Reconnect to original when needed
Models and folders
- WAN 2.2 Animate (FP8/GGUF Q4): comfyui/models/diffusion_models
- UNI3C ControlNet: comfyui/models/controlnet
- InfiniteTalk (matching build): appropriate models folder
- LoRAs (Relighting, Lightex Four Step, Busa V1, V2.2 HPS): comfyui/models/loras
- WELL text encoder: appropriate encoders folder
Lip sync
- Enable InfiniteTalk
- If WAN 2.2 is GGUF, select InfiniteTalk GGUF
Camera motion
- Enable UNI3C
- Load a reference clip with the angles you want
- Ensure the UNI3C ControlNet is installed
FAQs
Which model should I pick: FP8 or GGUF Q4?
FP8 is a strong default if you have the VRAM. If you’re limited on VRAM, Q4 is a solid option. In my tests, Q4 matched FP8 quality and captured motion very well.
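If you're unsure how much VRAM you have to work with, you can check it with PyTorch, which ComfyUI already requires:

```python
import torch

if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    vram_gb = props.total_memory / (1024 ** 3)
    print(f"{props.name}: {vram_gb:.1f} GB VRAM")
    # Rough rule of thumb from this guide: plenty of VRAM -> FP8, tight VRAM -> GGUF Q4.
else:
    print("No CUDA device detected.")
```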
Why is motion missing or inconsistent?
Check frame settings. In one test, the clip had 153 frames but was locked to 37 frames, which hurt motion continuity. Remove unintended frame locks and regenerate.
Do I need to disable quantization for Q4?
Yes. Set the quantization option to disabled in the model loader when running the GGUF Q4 file, as shown in the comparison test; the GGUF weights are already quantized, so the loader shouldn't quantize them again.
How do I change facial expression?
Add a second driver video and switch the hook to point to the new driver. If the result doesn’t follow well on the first try, revisit the prompt and regenerate.
How do I get better lip sync?
Enable the InfiniteTalk section and match the model build to WAN 2.2 (GGUF with GGUF). Confirm masks first, then enable the rest of the groups and run.
UNI3C didn’t add much camera motion. What can I do?
Try a reference with stronger camera movement. Results depend on the source clip’s motion. Keep UNI3C ControlNet installed and test several angles.
What LoRAs are you using?
- WAN Animate Relighting LoRA for color tone fixes
- Lightex Four Step
- Busa V1 for motion (uses more memory)
- V2.2 HPS LoRA for quality
Which text encoder should I select?
Use the WELL text encoder model used across these V-series workflows.
Where do I save the models?
- WAN 2.2 Animate: comfyui/models/diffusion_models
- UNI3C ControlNet: comfyui/models/controlnet
- LoRAs: comfyui/models/loras
- Encoders: comfyui/models/text_encoders (or your encoder folder)
Why does the blend look off?
Recheck the mask. Use green/red point placement carefully, and consider Segments v3 or SAM to Segment for finer control. The Relighting LoRA can also help match color tone to the scene.
Conclusion
This ComfyUI workflow produces solid swaps with a natural blend and strong lip sync using InfiniteTalk with WAN 2.2 Animate. It supports a low VRAM path via GGUF Q4 that compares well to FP8. With simple hook switches, you can change the face driver and steer expressions. UNI3C can add camera motion when provided a source with clear movement.
Set your resolution, build a clean mask, pick the right models and LoRAs, and confirm frame settings. With those pieces in place, the workflow runs reliably and delivers consistent results on both FP8 and GGUF Q4.