[Feature] Scene Continuation & Video Concatenation - Seamless Multi-Clip Assembly #71

@vzeman

Description

Overview

Modern AI video generation models — Kling, Wan 2.1, SVD, LTX, Runway, and others — have a hard architectural constraint: they can only produce short clips, typically 2–10 seconds per generation. This is not an edge case or a temporary limitation to work around. It is the fundamental reality of how these models operate today, and designing around it is the primary challenge of building a usable AI video editor.

To produce a 1-minute video, a user needs approximately 10–30 individual AI-generated clips. A 5-minute video requires 50–150 clips. The editor must make this feel seamless, not like a technical workaround. The clip-chaining system described in this issue is therefore not a Phase 4 polish feature — it is the central architectural pillar of the entire editor. Every other feature (generation, timeline, export) must be built with this multi-clip reality in mind from day one.


Core Concepts

Clip vs Scene vs Project

| Term | Definition |
| --- | --- |
| Clip | A single AI-generated video segment. Raw model output. Typically 2–10 seconds long, depending on the model. |
| Scene | A logical story unit — e.g., "hero walks into the building". May be composed of one or many sequential clips to reach the desired duration. |
| Project | A complete video, organized as an ordered sequence of scenes, each composed of clips. |

A scene with a target duration of 15 seconds using a model capped at 5 seconds per generation requires at minimum 3 clips chained end-to-end. The editor must make this assembly transparent and effortless.
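This arithmetic is simple but worth pinning down, since the planner, the auto-extend button, and the cost estimator all depend on it. A minimal sketch (the function name is illustrative, not part of any existing API):

```python
import math

def required_clip_count(target_seconds: float, model_max_seconds: float) -> int:
    """Minimum number of chained clips needed to cover a scene's target duration."""
    if model_max_seconds <= 0:
        raise ValueError("model max duration must be positive")
    return math.ceil(target_seconds / model_max_seconds)

# A 15 s scene on a model capped at 5 s per generation needs 3 clips.
print(required_clip_count(15, 5))  # → 3
```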

Clip Chain Architecture

Each scene maintains an internal clip chain: an ordered list of clips played sequentially.

```
Scene: "Hero enters the building"  (target: 15s)
┌──────────────┬──────────────┬──────────────┐
│   Clip A     │   Clip B     │   Clip C     │
│   0–5s       │   5–10s      │   10–15s     │
│ [thumbnail]  │ [thumbnail]  │ [thumbnail]  │
└──────────────┴──────────────┴──────────────┘
      ↓ last frame         ↓ last frame
   → first frame of B   → first frame of C
```

Key invariant: the last frame of Clip N automatically becomes the first frame of Clip N+1, maintaining visual continuity across the chain.


Required Features

Clip Chain UI

  • Within each scene card in the timeline, show a horizontal filmstrip of all clips in the chain
  • Each clip thumbnail shows: preview image, duration, model name, generation status (pending / generating / done / error)
  • "+" button at the end of the filmstrip: adds a new clip to the chain, automatically extracting and using the last frame of the previous clip as the image-to-video starting frame
  • Drag to reorder clips within a scene
  • Right-click / long-press context menu on any clip:
    • Regenerate (re-run generation with same or modified parameters)
    • Delete (with continuity options — see Edge Cases below)
    • Replace (swap in a different local video file)
    • Set as Scene Start (promote this clip's first frame as the scene's canonical start frame)
    • Extract Frame (open frame extractor tool at this clip)
    • View generation parameters
  • Collapse / expand the clip chain strip — default collapsed when scene has only 1 clip, expanded when 2+

Auto-Extension Workflow

When a user sets a target duration for a scene that exceeds what a single model generation can produce, the system should offer automatic clip chaining:

  • "Target duration" field per scene (e.g., 15 seconds)
  • "Auto-extend" button: the system calculates the required number of clips (ceil(target ÷ model_max_duration)), generates them sequentially, with each clip using the extracted last frame of the previous clip as the continuity seed
  • Progress indicator throughout generation: "Generating clip 3 of 5 for Scene 2..."
  • Early stop option: user can halt auto-extension at any point if the result already looks satisfactory
  • Auto-extension defaults to the same model and parameters as the first clip in the chain, with an option to override per-extension
  • After auto-extension completes, user reviews the assembled scene and can regenerate individual clips that did not come out well
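The workflow above can be sketched as a sequential loop. This is a minimal illustration only: `generate_clip` and `extract_last_frame` are hypothetical callables standing in for the real generation and FFmpeg extraction steps, and `should_stop` models the early-stop option.

```python
import math

def auto_extend(scene_target_s, model_max_s, generate_clip, extract_last_frame,
                should_stop=lambda i: False):
    """Generate the clips a scene needs, one at a time, seeding each generation
    with the last frame of the previous clip (callables are hypothetical)."""
    total = math.ceil(scene_target_s / model_max_s)
    clips, seed = [], None
    for i in range(total):
        if should_stop(i):          # early stop: user is already satisfied
            break
        clip = generate_clip(seed)  # image-to-video when seed is set, else text-to-video
        clips.append(clip)
        seed = extract_last_frame(clip)  # continuity seed for the next clip
    return clips
```

Because each iteration depends on the previous clip's extracted frame, generation is inherently sequential; clips in a chain cannot be generated in parallel.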

Cross-Scene Continuity

  • "Continue from previous scene" toggle per scene
  • When enabled: the first clip of the current scene is generated using the last frame of the last clip of the previous scene as its first-frame seed — creating a continuous visual flow across scene boundaries
  • Visual continuity indicator in the timeline: a link icon or connector line between adjacent scenes that have continuity enabled
  • Break continuity: explicitly set a new first frame image to start a scene fresh (a new location, a time-cut, etc.)
  • Continuity state is saved in the project data model per scene

Frame Extraction and Continuity Tools

  • Frame extractor modal: scrubber over the video clip, frame-accurate preview, "Use this frame as continuity seed" button
  • Auto-extract last frame: automatically pull the final frame of any clip for use as the next clip's seed (default behavior for "+")
  • Quality heuristic for auto-extraction: prefer frames without obvious motion blur or mid-blink artifacts (simple pixel variance or sharpness check)
  • All extracted frames are saved to the project asset library with a reference to the source clip and timestamp
  • Extracted frames are displayed in the Asset Library ([Feature] Asset Library - Centralized Media Management for Projects #73) under a "Continuity Frames" category
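The quality heuristic mentioned above can be as simple as scoring candidate frames by mean gradient magnitude, a crude sharpness proxy: blurry or low-detail (mid-blink) frames have weak local contrast. A stdlib-only sketch; a real implementation would likely use variance-of-Laplacian on the decoded frame, and all names here are illustrative:

```python
def sharpness_score(gray_pixels):
    """Mean horizontal gradient magnitude over a 2-D list of 0-255 grayscale
    values. Higher means sharper; motion blur flattens local differences."""
    diffs = [abs(row[x + 1] - row[x])
             for row in gray_pixels for x in range(len(row) - 1)]
    return sum(diffs) / len(diffs)

def pick_seed_frame(candidate_frames):
    """From the last few decoded frames of a clip, pick the sharpest one
    to use as the continuity seed."""
    return max(candidate_frames, key=sharpness_score)
```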

Transition Options at Clip Boundaries (within a scene)

| Transition | Description |
| --- | --- |
| Seamless cut | Default. No transition applied. |
| Crossfade | Overlap the end of Clip N with the start of Clip N+1; configurable duration 0.1s–2.0s. |
| Motion blur blend | Blend frames with increasing/decreasing blur at the boundary. |
| Match cut | User manually marks a matching frame in each clip; the editor aligns them. |

Transition Options at Scene Boundaries

| Transition | Description |
| --- | --- |
| Hard cut | Instantaneous scene change. |
| Fade to black | Clip fades out to black before the next scene begins. |
| Fade from black | Next scene fades in from black. |
| Fade to black + from black | Combined: clip fades to black, brief pause, next scene fades in. |
| Dissolve | Overlapping dissolve between the last clip of scene N and the first clip of scene N+1; configurable duration. |
| Custom transition clip | User uploads a short video file (e.g., a logo sting or abstract wipe) inserted between scenes. |

Long Video Planning Tools

Before the user starts generating clips, the editor should help them plan the full project:

  • Duration planner: user enters a target total video duration → the editor shows a breakdown: how many scenes, estimated clips per scene, total clip count, and approximate generation cost
  • Model duration reference card: visible in the generation panel, showing the max output duration per model (e.g., Kling 1.6 = 5s or 10s, Wan 2.1 = 4s, SVD-XT = ~3s at 25 frames, LTX-Video = variable)
  • Clip count estimate: shown per-scene and for the whole project before any generation starts
  • Cost estimate: based on total clip count and per-clip cost for the selected model, pulled from the cost tracking system ([Feature] Cost Tracking & API Key Management - Monitor Spending Across All AI Services #75)
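The duration planner's breakdown follows directly from the clip-count arithmetic. A minimal sketch, assuming a uniform scene length and a flat per-clip cost for the selected model (both simplifications; real scenes vary in length and cost comes from the cost tracking system):

```python
import math

def plan_project(total_seconds, scene_seconds, model_max_seconds, cost_per_clip):
    """Break a target duration into scenes and clips and estimate generation
    cost. Parameters and the flat per-clip cost are illustrative assumptions."""
    scenes = math.ceil(total_seconds / scene_seconds)
    clips_per_scene = math.ceil(scene_seconds / model_max_seconds)
    total_clips = scenes * clips_per_scene
    return {
        "scenes": scenes,
        "clips_per_scene": clips_per_scene,
        "total_clips": total_clips,
        "estimated_cost": total_clips * cost_per_clip,
    }

# 60 s video in 15 s scenes on a 5 s model: 4 scenes × 3 clips = 12 clips.
```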

Concatenation Engine

  • FFmpeg-based, fully local — all concatenation happens on the user's device, no re-upload to any server
  • Fast path (stream copy): when all clips in the chain share the same resolution, codec, and frame rate, use FFmpeg's concat demuxer for a lossless, near-instant join
  • Full re-encode path: activated when clips differ in resolution or FPS, or when crossfade/dissolve transitions are applied; uses libx264 with configurable quality settings
  • Intermediate concatenation: user can render a partial video (e.g., scenes 1–3 only) to review pacing and continuity before committing to generating the remaining scenes
  • Incremental re-render: when a single clip is changed (regenerated or replaced), only re-concatenate the affected scene and downstream segments — unchanged segments are served from cache
  • Audio stitching: audio tracks (voice narration, background music) are stitched alongside video, with configurable crossfade durations at scene boundaries
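The fast-path/re-encode decision reduces to checking whether all clips share identical stream parameters. A sketch of that dispatch, building an FFmpeg argument list; the clip metadata shape and the libx264 quality settings (`-crf 18 -preset medium`) are illustrative assumptions, not settled choices:

```python
def build_concat_command(clips, out_path, has_transitions=False):
    """Choose between the lossless concat-demuxer fast path and a full
    re-encode. `clips` is a list of dicts with stream metadata (hypothetical
    shape); filelist.txt is the concat demuxer's input list."""
    uniform = len({(c["width"], c["height"], c["codec"], c["fps"])
                   for c in clips}) == 1
    if uniform and not has_transitions:
        # Fast path: stream copy, no re-encode, near-instant and lossless.
        return ["ffmpeg", "-f", "concat", "-safe", "0",
                "-i", "filelist.txt", "-c", "copy", out_path]
    # Full re-encode path: normalizes mixed resolutions/frame rates and is
    # required whenever transitions need frames rewritten.
    return ["ffmpeg", "-f", "concat", "-safe", "0", "-i", "filelist.txt",
            "-c:v", "libx264", "-crf", "18", "-preset", "medium", out_path]
```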

Clip Versioning

  • Every AI generation attempt for a clip position in the chain is saved as a version — never silently overwritten
  • The user selects which version is "active" for the chain; only the active version is included in concatenation
  • Switching the active version of Clip N triggers a prompt: "The next clip uses this clip's last frame as its seed. Regenerate downstream clips, or keep them as-is?"
  • "Pin first frame" option per clip position: even if the user swaps active versions, the pinned last-frame extraction for continuity purposes does not change — useful for keeping a stable downstream chain while exploring different generations of one clip
  • Version list is accessible from the right-click context menu on any clip thumbnail
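The interaction between version switching and pinning comes down to which frame is handed downstream. A small sketch of that lookup; field names mirror the data model below but the dict shape is illustrative:

```python
def continuity_frame(clip):
    """Return the frame the next clip's generation should be seeded with.
    When the pin is set, switching the active version leaves downstream
    clips undisturbed (dict shape is a hypothetical stand-in for Clip)."""
    if clip["pinned_last_frame"]:
        return clip["pinned_frame_path"]
    active = next(v for v in clip["versions"]
                  if v["id"] == clip["active_version_id"])
    return active["last_frame_path"]
```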

Edge Cases to Handle

| Scenario | Handling |
| --- | --- |
| Clip generation fails mid-chain | Offer retry from the failed clip; all previous clips in the chain are preserved and do not need to be regenerated. |
| Model output resolution changes mid-chain | Warn the user; offer to auto-scale or crop the differing clip to match the chain's established resolution. |
| Narration audio track is longer than the assembled video | Options: extend the last clip via auto-extension, trim the audio, or pad with a freeze-frame. |
| User deletes a middle clip from the chain | Offer two options: (1) Re-link — regenerate a replacement clip seeded from the left neighbor's last frame, or (2) Leave gap — remove the clip and let the user re-stitch or regenerate continuity manually. |
| Two adjacent clips show a visible jump cut despite continuity | Surface the frame extractor and offer to regenerate the right-side clip with a refined first-frame seed. |
| First clip of the chain has no seed image (pure text-to-video) | Supported; continuity extraction begins from its last frame for subsequent clips. |

Technical Notes

Data Model

```dart
enum ClipStatus { pending, generating, done, error }

class Project {
  List<Scene> scenes;
}

class Scene {
  String id;
  String title;
  int targetDurationSeconds;
  bool continueFromPreviousScene;
  String? overrideFirstFramePath;
  List<Clip> clipChain;
}

class Clip {
  String id;
  int chainIndex;
  GenerationParams generationParams;
  ClipStatus status;
  String? localPath;       // active version's video file
  String? firstFramePath;  // continuity seed this clip was generated from
  String? lastFramePath;   // extracted frame that seeds the next clip
  List<ClipVersion> versions;
  String? activeVersionId;
  bool pinnedLastFrame;    // keep continuity frame stable across version switches
}

class ClipVersion {
  String id;
  DateTime generatedAt;
  String localPath;
  String lastFramePath;
  Map<String, dynamic> generationParams;
}
```

FFmpeg Integration

  • Concatenation: `ffmpeg -f concat -safe 0 -i filelist.txt -c copy output.mp4`
  • Frame extraction (last frame): `ffmpeg -sseof -1 -i input.mp4 -update 1 last_frame.png` — seeks to 1 s before the end and keeps overwriting the image until the final decoded frame (`-vframes 1` would grab the first frame after the seek point, not the last frame)
  • Frame extraction (specific timestamp): `ffmpeg -ss TIMESTAMP -i input.mp4 -vframes 1 frame.png`
  • Crossfade transition: the `xfade` filter with configurable duration and offset
  • All FFmpeg calls go through `ffmpeg_kit_flutter`

File Naming Convention

```
{projectId}/
  scenes/
    {sceneId}/
      clips/
        {clipIndex}_{versionId}.mp4
        {clipIndex}_{versionId}_last_frame.png
      concatenated_scene.mp4
  concatenated_partial_{sceneRange}.mp4
  final_export.mp4
```

Caching Strategy

  • Each concatenated scene output is cached by a hash of its clip chain (clip IDs + active version IDs)
  • On change to any clip in the chain, invalidate only that scene's cache and any downstream partial or final concatenations
  • Final export is re-generated only from changed scenes; unchanged scene segments are reused from cache
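The cache key described above can be a digest over the ordered (clip ID, active version ID) pairs, so any reorder, regeneration, or version switch changes the key. A minimal sketch (the dict shape is a hypothetical stand-in for the clip chain):

```python
import hashlib

def scene_cache_key(clip_chain):
    """Cache key for a concatenated scene: SHA-256 over ordered
    (clip id, active version id) pairs. Any change to membership, order,
    or active version produces a different key."""
    payload = "|".join(f"{c['id']}:{c['active_version_id']}" for c in clip_chain)
    return hashlib.sha256(payload.encode()).hexdigest()
```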

Acceptance Criteria

  • Clip chain UI renders inside each scene card with correct filmstrip layout
  • "+" button auto-extracts last frame and passes it as first-frame seed to the next generation
  • Auto-extension workflow generates N clips sequentially with correct continuity
  • Cross-scene continuity toggle correctly seeds the first clip of a scene from the last clip of the previous scene
  • Frame extractor modal allows manual frame selection and saves to asset library
  • Crossfade and dissolve transitions render correctly via FFmpeg xfade
  • Stream-copy fast path is used when all clips share resolution/codec/FPS
  • Incremental re-render skips re-concatenation of unchanged scene segments
  • Clip versioning stores all attempts and allows active version switching
  • Deleting a middle clip presents re-link and leave-gap options
  • Duration planner estimates clip count and generation cost before any API calls

Labels
ai-video-editor (AI Video Editor Flutter app) · feature (New feature implementation) · flutter (Flutter/Dart implementation) · phase-4 (Phase 4: Polish & Export)
