InfiniteTalk WAN Animate ComfyUI Workflow

This guide walks through a single ComfyUI workflow that produces realistic swaps with a natural blend. I use a relight LoRA to correct color tone, InfiniteTalk with WAN 2.2 Animate for stronger lip sync, and compare a GGUF Q4 model with FP8. The workflow runs on low VRAM, and UNI3C can add camera motion.
I’ve divided the workflow into clear sections so it’s easy to follow. You’ll see how to mask subjects, drive facial motion from a reference video, enable lip sync, and test camera motion. I also include model setup, LoRA selections, and a direct Q4 vs FP8 comparison.
What is WAN 2.2 Animate in ComfyUI?
WAN 2.2 Animate is a video-to-video model setup in ComfyUI for character and face swaps, expression transfer, and motion-driven edits. With InfiniteTalk, it produces strong lip sync. With UNI3C, it can borrow camera motion from a reference clip. The workflow supports FP8 for quality and a GGUF Q4 build that runs on low VRAM.
Table Overview
| Section | Purpose | Key Nodes/Notes |
| --- | --- | --- |
| Background Masking | Define what to keep/remove from the input | Resolution Master, Point Editor (green/red points), SAM to Segment, optional Segments v3 layer mask |
| Face Motion Driver | Borrow facial motion from a reference video | Load Video (face image section group), hooks to switch drivers |
| Lip Sync | Improve sync on speaking/singing clips | InfiniteTalk with WAN 2.2 Animate (toggle on/off) |
| Camera Motion | Add camera angle/motion from a reference | UNI3C (download UNI3C ControlNet; default off) |
| Reference Image | The face/character you want in the result | Upload the image to be added/edited into the video |
| Models & LoRAs | Core model selection and quality/motion add-ons | WAN 2.2 Animate (FP8 or GGUF Q4), Relighting LoRA, Lightex Four Step, Busa V1, V2.2 HPS LoRA, WELL text encoder |
Key Features
- Natural swaps and blends in one ComfyUI workflow
- Color tone correction with a relight LoRA
- Better lip sync using InfiniteTalk with WAN 2.2 Animate
- Low VRAM option via GGUF Q4, compared directly to FP8
- Optional camera motion via UNI3C
- Flexible masking with point-based or SAM/Segments methods
- Face motion driver swapping with quick hook changes
How it works
Workflow layout
The workflow is divided into sections: masking, face motion, lip sync, camera motion, reference image, and models/LoRAs. Each section can be toggled on or off for testing.
Background masking
- Upload the clip you want to edit in the Background Masking section.
- Use Resolution Master to pick a resolution that fits your clip and is supported by WAN 2.2.
- In the Point Editor node, place green points on areas to include in the mask and red points on areas to exclude.
- You can also use a layer mask in Segments v3 if you prefer. For this tutorial, I use SAM to Segment nodes (a minimal point-prompt sketch follows this list).
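The SAM to Segment nodes wrap a point-prompted segmentation model. As a rough illustration of what the green and red points do, here is a minimal sketch using the standalone segment-anything package; the checkpoint filename and point coordinates are placeholders, and the assumption that the ComfyUI node behaves like this simplified call is mine:

```python
import cv2
import numpy as np
from segment_anything import sam_model_registry, SamPredictor

# Load a SAM checkpoint (placeholder path; any official SAM checkpoint works).
sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b_01ec64.pth")
predictor = SamPredictor(sam)

# Use the first frame of the target clip; SAM expects RGB input.
frame_bgr = cv2.imread("first_frame.png")
predictor.set_image(cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB))

# Green points (label 1) mark what to keep, red points (label 0) what to exclude.
point_coords = np.array([[320, 240], [310, 400], [560, 300]])  # example x, y positions
point_labels = np.array([1, 1, 0])

masks, scores, _ = predictor.predict(
    point_coords=point_coords,
    point_labels=point_labels,
    multimask_output=False,
)
cv2.imwrite("subject_mask.png", (masks[0] * 255).astype(np.uint8))
```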
Face motion from a reference video
- If you want to change facial expressions or improve motion capture, load a separate reference video for the face.
- Add another Load Video node and upload the face driver in the face image section group.
- In Load Video, connect the image hook to the driver you want to use. To revert to the first video’s face, reconnect the original hook and remove the second.
InfiniteTalk lip sync
- I use InfiniteTalk with WAN 2.2 Animate for better lip sync.
- Enable or disable this section depending on your clip.
UNI3C camera motion control
- I use a “WAN Video UNI3C” section to transfer camera motion from a reference.
- It extracts motion, captures the camera angle, and adds it to the result video.
- It may not always match expectations, but it’s worth testing with WAN 2.2 Animate.
- Download the UNI3C ControlNet model and save it to your models/controlnet folder (see the download sketch after this list).
- The section group can be unbypassed when ready; by default, I keep it off.
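If you'd rather script the download than grab the file manually, here is a minimal sketch using huggingface_hub; the repo ID and filename are placeholders, so substitute whichever repository actually hosts the UNI3C ControlNet build you use:

```python
from pathlib import Path
from huggingface_hub import hf_hub_download

# Adjust to your ComfyUI install location.
controlnet_dir = Path("ComfyUI/models/controlnet")
controlnet_dir.mkdir(parents=True, exist_ok=True)

# Placeholder repo/filename -- substitute the actual UNI3C ControlNet source you use.
hf_hub_download(
    repo_id="your-org/uni3c-controlnet",
    filename="uni3c_controlnet.safetensors",
    local_dir=str(controlnet_dir),
)
```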
Reference image
- Upload the image you want to add or edit into the video in the Reference Image section.
Models and LoRAs
- Select the WAN 2.2 Animate FP8 model for quality.
- If you’re low on VRAM, try the GGUF Q4 model. I compare Q4 with FP8 in this guide.
- Download model files and save them in your ComfyUI diffusion_models folder.
I use these LoRAs in the workflow:
- WAN Animate Relighting LoRA: fixes color tone to match the target scene.
- Lightex Four Step: a regular in my workflows.
- Busa V1: improves motion but uses a bit more memory.
- V2.2 HPS LoRA (trained on human preference): improves quality.
- Text Encoder: select the WELL text encoder used in these V-series workflows.
That completes the workflow overview.
How to use
Step-by-step setup
- Download models:
  - WAN 2.2 Animate (FP8 or GGUF Q4) → comfyui/models/diffusion_models
  - UNI3C ControlNet → comfyui/models/controlnet
  - InfiniteTalk (match GGUF with GGUF if using GGUF for WAN 2.2)
  - LoRAs and WELL text encoder → their respective folders
- Open the workflow in ComfyUI.
- Confirm nodes resolve and paths point to the correct model files; the path-check sketch below can help.
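To sanity-check the folder layout from the download list above, a quick script like this works; the filenames are placeholders, so adjust COMFY_ROOT and the names to match the files you actually downloaded:

```python
from pathlib import Path

COMFY_ROOT = Path("ComfyUI")  # adjust to your install

# Folder -> example files you expect there (filenames are placeholders).
expected = {
    "models/diffusion_models": ["wan2.2_animate_fp8.safetensors"],  # or your GGUF Q4 file
    "models/controlnet": ["uni3c_controlnet.safetensors"],
    "models/loras": ["relighting_lora.safetensors"],
    "models/text_encoders": ["text_encoder.safetensors"],
}

for folder, files in expected.items():
    for name in files:
        path = COMFY_ROOT / folder / name
        status = "OK     " if path.exists() else "MISSING"
        print(f"{status} {path}")
```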
Masking and preparation
- Load your target video in the Background Masking section.
- Use Resolution Master to set a supported WAN 2.2 resolution for your clip (a standalone sizing helper is sketched after this list).
- Open the Point Editor:
  - Place green points on the subject you want to keep.
  - Place red points on areas to remove.
- Optional: use a layer mask in Segments v3 for a different masking method.
- Confirm the mask looks clean before moving on.
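Resolution Master handles the sizing inside the graph. If you want to pre-scale footage outside ComfyUI, here is a rough helper; it assumes WAN-style models prefer dimensions divisible by 16 and a pixel budget near the training sizes, which may not cover every supported bucket:

```python
def snap_resolution(width: int, height: int, target_pixels: int = 832 * 480, multiple: int = 16):
    """Scale a clip to roughly `target_pixels` while keeping aspect ratio,
    rounding both sides down to a multiple of `multiple`."""
    scale = (target_pixels / (width * height)) ** 0.5
    new_w = max(multiple, int(width * scale) // multiple * multiple)
    new_h = max(multiple, int(height * scale) // multiple * multiple)
    return new_w, new_h

# Example: a 1920x1080 source snapped near an 832x480-pixel budget.
print(snap_resolution(1920, 1080))  # -> (832, 464)
```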
Face motion driver selection
- If you want to change facial expression/motion, add another Load Video node for the driver clip.
- Upload the driver in the face image section group.
- Connect the image hook from this new driver where the workflow expects the face source.
- To use the original face source again, reconnect the first node’s hook and remove the second video hook.
Lip sync with InfiniteTalk
- Toggle on the InfiniteTalk section.
- If you’re running WAN 2.2 Animate GGUF, select the InfiniteTalk GGUF model as well.
- Keep the rest of the model stack as configured and run a short test to validate sync.
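InfiniteTalk is driven by the clip's audio track, which the workflow picks up in-graph. If you want to pull the audio out yourself to inspect or trim it first, a minimal ffmpeg call looks like this; 16 kHz mono WAV is a common input format for speech models, but confirm what your InfiniteTalk nodes expect:

```python
import subprocess

# Extract the audio track from the driver clip as 16 kHz mono PCM WAV.
subprocess.run(
    [
        "ffmpeg", "-y",
        "-i", "singing_clip.mp4",
        "-vn",                # drop the video stream
        "-ac", "1",           # mono
        "-ar", "16000",       # 16 kHz sample rate
        "-c:a", "pcm_s16le",  # uncompressed 16-bit PCM
        "driver_audio.wav",
    ],
    check=True,
)
```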
Camera motion with UNI3C
- Toggle on the UNI3C group when you want to transfer camera motion.
- Load a video with the camera angles you want in your final output.
- Ensure the UNI3C ControlNet file is installed in models/controlnet.
- Run a short generation. If motion is subtle, try a clip with more pronounced camera movement.
Final checks
- Double-check model selections (FP8 vs Q4), LoRAs, and the WELL text encoder.
- If you're running the GGUF Q4 model, verify that the quantization option in the model loader is set to disabled.
- Confirm frame range settings match your intended output and are not locked to a small subset.
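A quick way to confirm how many frames a clip actually has, and to catch an accidental frame cap like the 37-frame lock described in the comparison below, is to read the count with OpenCV:

```python
import cv2

cap = cv2.VideoCapture("target_clip.mp4")
frame_count = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
fps = cap.get(cv2.CAP_PROP_FPS)
cap.release()

if fps > 0:
    print(f"{frame_count} frames at {fps:.2f} fps (~{frame_count / fps:.1f} s)")
else:
    print(f"{frame_count} frames (fps not reported)")
# Compare this against the frame/batch limits set in the workflow's video loader.
```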
Results and comparisons
Example 1: Subject swap on a guitar clip
I uploaded an image of a golem with many objects and another character in the background. I only wanted the golem, so I used a Remove Background node to clear everything else.
I loaded a video of a lady playing guitar and replaced her with the golem. Since her hands move quickly over the guitar, I refined the mask:
- Red circles on the guitar to exclude it
- Green circles on the lady to include her form
The mask looked clean. The DWPose node analyzed facial cues and captured expressions. After running the graph, the lady was replaced by the golem with a natural blend. The golem plays the guitar, and the movement feels correct.
Q4 GGUF vs FP8
I tested the GGUF Q4 model to compare against FP8. Make sure the quantization option is set to disabled when loading it. The first test took about 28 seconds but had a motion issue: the video was locked to 37 frames even though the clip had 153 frames. After removing the frame lock, the 153 frames were split into two parts, and the full generation took about 1 minute.
The new result was better. Compared with FP8, Q4 delivered the same quality in this test. The hands were generated correctly, guitar details held up, and hand movement matched the reference video. In my FP8 test, it didn’t capture that particular move as well. On low VRAM, Q4 can achieve output comparable to FP8 with full motion captured.
Changing facial expressions with a different driver video
To change expressions, I loaded a new driver video where the lady moves her head up and down. I switched the face driver hook to the new Load Video node. The golem’s expression changed but didn’t match 100%.
I tried another driver: a clip of a lady driving a car and chasing someone. For the reference image, I used a realistic lady in a t-shirt. On the first run with that driver, the output followed the up/down motion but the result wasn’t there yet; I hadn’t changed the prompt. On the next run, the result followed the second video’s motion fully. This is how you can steer expression: load a different driver video and reconnect the hook to use its motion.
Singing with InfiniteTalk enabled
I enabled the InfiniteTalk section for lip sync. I uploaded a video of a lady singing and a reference image of a lady. First, I disabled other model groups so only the mask was active to confirm the mask looked correct. Then I enabled the other groups and selected the InfiniteTalk model.
Important: if you’re using the WAN 2.2 Animate GGUF build, select the InfiniteTalk GGUF build as well. The generated result synced the song accurately, and the reference lady replaced the original. Zoomed-in areas still looked well blended, and accessories like a necklace, clothing, and face matched the image.
UNI3C camera motion test
For the final test, I enabled the UNI3C group and uploaded a video with camera angles I wanted in the output. I set the resolution to 720×720. The result looked mostly static with slight motion. You can compare shots to see the difference.
I suggest testing different camera angle clips to find the best match for your footage. Motion strength can vary by source.
How to use: quick reference
Masking and replacement
- Use Resolution Master for a supported WAN 2.2 resolution.
- In Point Editor:
  - Green points: keep
  - Red points: remove
- Optional: Segments v3 layer mask
- Verify mask before generating
Face driver swapping
- Add a new Load Video node
- Upload driver in the face image section group
- Switch hooks to the desired driver
- Reconnect to original when needed
Models and folders
- WAN 2.2 Animate (FP8/GGUF Q4): comfyui/models/diffusion_models
- UNI3C ControlNet: comfyui/models/controlnet
- InfiniteTalk (matching build): appropriate models folder
- LoRAs (Relighting, Lightex Four Step, Busa V1, V2.2 HPS): comfyui/models/loras
- WELL text encoder: appropriate encoders folder
Lip sync
- Enable InfiniteTalk
- If WAN 2.2 is GGUF, select InfiniteTalk GGUF
Camera motion
- Enable UNI3C
- Load a reference clip with the angles you want
- Ensure the UNI3C ControlNet is installed
FAQs
Which model should I pick: FP8 or GGUF Q4?
FP8 is a strong default if you have the VRAM. If you’re limited on VRAM, Q4 is a solid option. In my tests, Q4 matched FP8 quality and captured motion very well.
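If you're unsure how much VRAM you have to work with, you can check it with PyTorch, which ComfyUI already requires:

```python
import torch

if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    vram_gb = props.total_memory / (1024 ** 3)
    print(f"{props.name}: {vram_gb:.1f} GB VRAM")
    # Rough rule of thumb from this guide: plenty of VRAM -> FP8, tight VRAM -> GGUF Q4.
else:
    print("No CUDA device detected.")
```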
Why is motion missing or inconsistent?
Check frame settings. In one test, the clip had 153 frames but was locked to 37 frames, which hurt motion continuity. Remove unintended frame locks and regenerate.
Do I need to disable quantization for Q4?
Yes. Set the quantization option to disabled in the model loader when running the GGUF Q4 file, as shown in the comparison test; the GGUF weights are already quantized, so the loader shouldn't quantize them again.
How do I change facial expression?
Add a second driver video and switch the hook to point to the new driver. If the result doesn’t follow well on the first try, revisit the prompt and regenerate.
How do I get better lip sync?
Enable the InfiniteTalk section and match the model build to WAN 2.2 (GGUF with GGUF). Confirm masks first, then enable the rest of the groups and run.
UNI3C didn’t add much camera motion. What can I do?
Try a reference with stronger camera movement. Results depend on the source clip’s motion. Keep UNI3C ControlNet installed and test several angles.
What LoRAs are you using?
- WAN Animate Relighting LoRA for color tone fixes
- Lightex Four Step
- Busa V1 for motion (uses more memory)
- V2.2 HPS LoRA for quality
Which text encoder should I select?
Use the WELL text encoder model used across these V-series workflows.
Where do I save the models?
- WAN 2.2 Animate: comfyui/models/diffusion_models
- UNI3C ControlNet: comfyui/models/controlnet
- LoRAs: comfyui/models/loras
- Encoders: comfyui/models/text_encoders (or your encoder folder)
Why does the blend look off?
Recheck the mask. Use green/red point placement carefully, and consider Segments v3 or SAM to Segment for finer control. The Relighting LoRA can also help match color tone to the scene.
Conclusion
This ComfyUI workflow produces solid swaps with a natural blend and strong lip sync using InfiniteTalk with WAN 2.2 Animate. It supports a low VRAM path via GGUF Q4 that compares well to FP8. With simple hook switches, you can change the face driver and steer expressions. UNI3C can add camera motion when provided a source with clear movement.
Set your resolution, build a clean mask, pick the right models and LoRAs, and confirm frame settings. With those pieces in place, the workflow runs reliably and delivers consistent results on both FP8 and GGUF Q4.