
Conversation

@ryanontheinside
Collaborator

This adds FFLF (first-frame/last-frame) support for Longlive at the pipeline layer. API and UI support will follow.

Signed-off-by: RyanOnTheInside <7623207+ryanontheinside@users.noreply.github.com>
# Transpose [B, F, C, H, W] -> [B, C, F, H, W] and concatenate along channel dim

inactive_out = vae.encode_to_latent(inactive_stacked, use_cache=use_cache)
reactive_out = vae.encode_to_latent(reactive_stacked, use_cache=False)
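For context on the transpose-and-concatenate comment above, here is a minimal sketch of the shape manipulation, with numpy standing in for the pipeline's tensor library and illustrative dimension sizes (the real `B`, `F`, `C`, `H`, `W` come from the pipeline):

```python
import numpy as np

# Illustrative sizes only: B=1 batch, F=12 frames, C=3 channels, H=W=8.
B, F, C, H, W = 1, 12, 3, 8, 8
inactive = np.zeros((B, F, C, H, W), dtype=np.float32)
reactive = np.ones((B, F, C, H, W), dtype=np.float32)

# Transpose [B, F, C, H, W] -> [B, C, F, H, W] so the VAE sees a
# channels-first video layout.
inactive_stacked = inactive.transpose(0, 2, 1, 3, 4)
reactive_stacked = reactive.transpose(0, 2, 1, 3, 4)

# Concatenate along the channel dim (axis=1 after the transpose).
combined = np.concatenate([inactive_stacked, reactive_stacked], axis=1)

print(combined.shape)  # (1, 6, 12, 8, 8)
```

In torch the equivalent calls would be `tensor.permute(0, 2, 1, 3, 4)` and `torch.cat([...], dim=1)`.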
Contributor


use_cache was set to True here in #283 to fix an issue with white flashing when using depth control videos.

  1. Is this change necessary for FFLF?
  2. If yes, we need a different solution for the white flashing issue with depth control videos.

Collaborator Author


Thanks for catching this. It is required for the gradual last frame use case, which I think is an important one. The temporal blending weakens the diffusion-driven transformation such that it looks more like simple latent interpolation than anything else.

output_extension_scale_0.00to1.00_weak_middle_8chunks.mp4
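To illustrate what degenerating toward "simple latent interpolation" means here, a minimal sketch of naive temporal blending between two endpoint latents; shapes and values are illustrative, with numpy standing in for the pipeline's latent tensors:

```python
import numpy as np

# Hypothetical first/last-frame latents; shapes are illustrative only.
first_latent = np.zeros((4, 8, 8), dtype=np.float32)
last_latent = np.ones((4, 8, 8), dtype=np.float32)

num_chunks = 8
# Linear blend weights from 0.0 (all first frame) to 1.0 (all last frame).
weights = np.linspace(0.0, 1.0, num_chunks)

# Naive temporal blending: each chunk's latent is a lerp between the
# endpoints. Without the diffusion-driven transformation, the output
# looks like this crossfade rather than generated motion.
blended = [(1.0 - w) * first_latent + w * last_latent for w in weights]

print(blended[0].mean(), blended[-1].mean())  # 0.0 1.0
```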

Contributor


  1. Are you planning on addressing the white flashing issue with depth maps separately then?
  2. What are these videos showing? Is it a comparison of using cache and not using cache during encoding?

Contributor

@yondonfu Jan 2, 2026


@ryanontheinside Following up on an offline convo:

I want to make clear the problem I see with setting use_cache=False for the reactive portion in your change: it looks like it would be a regression if this PR is merged as-is.

I used this as the input control video:

AnimateDiff_00003_scaled_5x.mp4

I applied this diff:

diff --git a/src/scope/core/pipelines/longlive/test_vace.py b/src/scope/core/pipelines/longlive/test_vace.py
index 2d81c60..a8b9036 100644
--- a/src/scope/core/pipelines/longlive/test_vace.py
+++ b/src/scope/core/pipelines/longlive/test_vace.py
@@ -45,16 +45,16 @@ from .pipeline import LongLivePipeline
 CONFIG = {
     # ===== MODE SELECTION =====
     "use_r2v": False,  # Reference-to-Video: condition on reference images
-    "use_depth": False,  # Depth guidance: structural control via depth maps
+    "use_depth": True,  # Depth guidance: structural control via depth maps
     "use_inpainting": False,  # Inpainting: masked video-to-video generation
-    "use_extension": True,  # Extension mode: temporal generation (firstframe/lastframe/firstlastframe)
+    "use_extension": False,  # Extension mode: temporal generation (firstframe/lastframe/firstlastframe)
     # ===== INPUT PATHS =====
     # R2V: List of reference image paths (condition entire video, don't appear in output)
     "ref_images": [
         "frontend/public/assets/example.png",  # path/to/image.png
     ],
     # Depth: Path to depth map video (grayscale or RGB, will be converted)
-    "depth_video": "vace_tests/control_frames_depth.mp4",  # path/to/depth_video.mp4
+    "depth_video": "vace_tests/AnimateDiff_00003_scaled_5x.mp4",  # path/to/depth_video.mp4
     # Inpainting: Input video and mask video paths
     "input_video": "frontend/public/assets/test.mp4",  # path/to/input_video.mp4
     "mask_video": "vace_tests/circle_mask.mp4",  # path/to/mask_video.mp4
@@ -65,14 +65,14 @@ CONFIG = {
     # ===== GENERATION PARAMETERS =====
     "prompt": None,  # Set to override mode-specific prompts, or None to use defaults
     "prompt_r2v": "",  # Default prompt for R2V mode
-    "prompt_depth": "a cat walking towards the camera",  # Default prompt for depth mode
+    "prompt_depth": "a woman dancing",  # Default prompt for depth mode
     "prompt_inpainting": "a fireball",  # Default prompt for inpainting mode
     "prompt_extension": "",  # Default prompt for extension mode
-    "num_chunks": 2,  # Number of generation chunks
+    "num_chunks": 50,  # Number of generation chunks
     "frames_per_chunk": 12,  # Frames per chunk (12 = 3 latent * 4 temporal upsample)
-    "height": 512,
-    "width": 512,
-    "vace_context_scale": 1.5,  # VACE conditioning strength
+    "height": 480,
+    "width": 832,
+    "vace_context_scale": 1.0,  # VACE conditioning strength
     # ===== INPAINTING SPECIFIC =====
     "mask_threshold": 0.5,  # Threshold for binarizing mask (0-1)
     "mask_value": 127,  # Gray value for masked regions (0-255)
@@ -490,7 +490,7 @@ def main():
     print("Initializing pipeline...")

I ran:

uv run -m scope.core.pipelines.longlive.test_vace

I got:

output_depth.mp4

Observe the white flashing effect throughout this output video.

"extension_mode",
default=None,
type_hint=str,
description="Extension mode for temporal generation: 'firstframe' (ref at start, generate after), 'lastframe' (generate before, ref at end), or 'firstlastframe' (refs at both ends). Applies to specific chunks based on current_start_frame.",
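A minimal sketch of how an explicit mode might map to per-chunk conditioning based on current_start_frame, as the parameter description suggests; the helper name and the first/last-chunk logic are hypothetical, not the pipeline's actual implementation:

```python
def frames_to_condition(extension_mode, current_start_frame,
                        frames_per_chunk, total_frames):
    """Decide which reference frames apply to the current chunk.
    Hypothetical helper sketching the param description above."""
    is_first_chunk = current_start_frame == 0
    is_last_chunk = current_start_frame + frames_per_chunk >= total_frames

    # 'firstframe'/'firstlastframe' condition the first chunk on the
    # first-frame ref; 'lastframe'/'firstlastframe' condition the last
    # chunk on the last-frame ref.
    condition_first = (extension_mode in ("firstframe", "firstlastframe")
                       and is_first_chunk)
    condition_last = (extension_mode in ("lastframe", "firstlastframe")
                      and is_last_chunk)
    return condition_first, condition_last

print(frames_to_condition("firstlastframe", 0, 12, 96))   # (True, False)
print(frames_to_condition("firstlastframe", 84, 12, 96))  # (False, True)
```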
Contributor


Do we need an explicit concept of extension mode? Or can the mode be implicit based on whether first_frame_image and/or last_frame_image is provided?

E.g.:

  1. If first_frame_image is provided but no last_frame_image, start with first_frame_image and generate the rest.
  2. If last_frame_image is provided but no first_frame_image, generate everything leading up to last_frame_image at the end.
  3. If both are provided, start with first_frame_image, end with last_frame_image, and generate everything in between.
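The implicit rules above can be sketched as a small helper; the function name and return values are hypothetical, mirroring the mode strings from the parameter description rather than the actual pipeline API:

```python
def infer_extension_mode(first_frame_image=None, last_frame_image=None):
    """Derive the extension mode from which reference images are provided.
    Sketch of the reviewer's suggestion, not the pipeline's actual API."""
    if first_frame_image is not None and last_frame_image is not None:
        return "firstlastframe"  # refs at both ends, generate in between
    if first_frame_image is not None:
        return "firstframe"      # ref at start, generate after
    if last_frame_image is not None:
        return "lastframe"       # generate before, ref at end
    return None                  # no extension conditioning requested

print(infer_extension_mode(first_frame_image="first.png"))
# firstframe
print(infer_extension_mode(first_frame_image="first.png",
                           last_frame_image="last.png"))
# firstlastframe
```

This would let the caller drop the extra extension_mode param entirely.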

Contributor

@yondonfu Jan 2, 2026


@ryanontheinside Following up on offline convo:

My preference is to make the extension mode implicit based on whether first_frame_image and/or last_frame_image is provided, because it simplifies pipeline API usage by avoiding an additional extension_mode param.

Signed-off-by: RyanOnTheInside <7623207+ryanontheinside@users.noreply.github.com>
