Overview
Modern AI video generation models — Kling, Wan 2.1, SVD, LTX, Runway, and others — have a hard architectural constraint: they can only produce short clips, typically 2–10 seconds per generation. This is not an edge case or a temporary limitation to work around. It is the fundamental reality of how these models operate today, and designing around it is the primary challenge of building a usable AI video editor.
At 2–10 seconds per clip, a 1-minute video needs roughly 6–30 individual AI-generated clips, and a 5-minute video needs roughly 30–150. The editor must make this feel seamless, not like a technical workaround. The clip-chaining system described in this issue is therefore not a Phase 4 polish feature; it is the central architectural pillar of the entire editor. Every other feature (generation, timeline, export) must be built with this multi-clip reality in mind from day one.
Core Concepts
Clip vs Scene vs Project
| Term | Definition |
|---|---|
| Clip | A single AI-generated video segment. Raw model output. Typically 2–10 seconds long depending on model. |
| Scene | A logical story unit — e.g., "hero walks into the building". May be composed of 1 or many sequential clips to reach the desired duration. |
| Project | A complete video, organized as an ordered sequence of scenes, each composed of clips. |
A scene with a target duration of 15 seconds using a model capped at 5 seconds per generation requires at minimum 3 clips chained end-to-end. The editor must make this assembly transparent and effortless.
Clip Chain Architecture
Each scene maintains an internal clip chain: an ordered list of clips played sequentially.
Scene: "Hero enters the building" (target: 15s)
┌──────────────┬──────────────┬──────────────┐
│   Clip A     │   Clip B     │   Clip C     │
│   0–5s       │   5–10s      │   10–15s     │
│ [thumbnail]  │ [thumbnail]  │ [thumbnail]  │
└──────────────┴──────────────┴──────────────┘
              ↓ last frame        ↓ last frame
              → first frame of B  → first frame of C
Key invariant: the last frame of Clip N automatically becomes the first frame of Clip N+1, maintaining visual continuity across the chain.
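The invariant can be checked mechanically. Below is a minimal Python sketch, assuming each clip records the path of the seed image it was generated from and the path of its extracted last frame; the `Clip` fields and the `broken_links` helper are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Clip:
    clip_id: str
    first_frame_path: Optional[str]  # seed image; None for pure text-to-video
    last_frame_path: Optional[str]   # extracted once generation has finished


def broken_links(chain: List[Clip]) -> List[int]:
    """Return the indices of clips whose seed is not the previous clip's last frame."""
    broken = []
    for i in range(1, len(chain)):
        prev, cur = chain[i - 1], chain[i]
        if prev.last_frame_path is None or cur.first_frame_path != prev.last_frame_path:
            broken.append(i)
    return broken
```

A check like this could run after reordering, deleting, or replacing clips to decide which boundaries need attention.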
Required Features
Clip Chain UI
- Within each scene card in the timeline, show a horizontal filmstrip of all clips in the chain
- Each clip thumbnail shows: preview image, duration, model name, generation status (pending / generating / done / error)
- "+" button at the end of the filmstrip: adds a new clip to the chain, automatically extracting and using the last frame of the previous clip as the image-to-video starting frame
- Drag to reorder clips within a scene
- Right-click / long-press context menu on any clip:
  - Regenerate (re-run generation with same or modified parameters)
  - Delete (with continuity options; see Edge Cases below)
  - Replace (swap in a different local video file)
  - Set as Scene Start (promote this clip's first frame as the scene's canonical start frame)
  - Extract Frame (open frame extractor tool at this clip)
  - View generation parameters
- Collapse / expand the clip chain strip — default collapsed when scene has only 1 clip, expanded when 2+
Auto-Extension Workflow
When a user sets a target duration for a scene that exceeds what a single model generation can produce, the system should offer automatic clip chaining:
- "Target duration" field per scene (e.g., 15 seconds)
- "Auto-extend" button: the system calculates the required number of clips (`ceil(target ÷ model_max_duration)`), then generates them sequentially, with each clip using the extracted last frame of the previous clip as the continuity seed
- Progress indicator throughout generation: "Generating clip 3 of 5 for Scene 2..."
- Early stop option: user can halt auto-extension at any point if the result already looks satisfactory
- Auto-extension defaults to the same model and parameters as the first clip in the chain, with an option to override per-extension
- After auto-extension completes, user reviews the assembled scene and can regenerate individual clips that did not come out well
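The auto-extension loop could look roughly like the Python sketch below. It assumes a hypothetical `generate(index, seed)` callback that runs one generation and returns the path of the new clip's extracted last frame; `should_stop` and `report` are likewise illustrative hooks for the early-stop option and the progress indicator, not an existing API.

```python
import math
from typing import Callable, List, Optional


def clips_needed(target_s: float, model_max_s: float) -> int:
    """ceil(target / model_max_duration), the clip count for one scene."""
    if target_s <= 0 or model_max_s <= 0:
        raise ValueError("durations must be positive")
    return math.ceil(target_s / model_max_s)


def auto_extend(
    target_s: float,
    model_max_s: float,
    generate: Callable[[int, Optional[str]], str],
    should_stop: Callable[[], bool] = lambda: False,
    report: Callable[[str], None] = print,
) -> List[str]:
    """Generate clips one after another; return the last-frame path of each clip."""
    total = clips_needed(target_s, model_max_s)
    last_frames: List[str] = []
    seed: Optional[str] = None  # the first clip may be pure text-to-video
    for index in range(total):
        if should_stop():  # early stop requested by the user
            break
        report(f"Generating clip {index + 1} of {total}...")
        seed = generate(index, seed)  # hypothetical: returns the new clip's last frame
        last_frames.append(seed)
    return last_frames
```

For a 15-second target and a 5-second model, `clips_needed(15, 5)` gives 3; a 16-second target would need 4, with the final clip trimmed or left long.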
Cross-Scene Continuity
- "Continue from previous scene" toggle per scene
- When enabled: the first clip of the current scene is generated using the last frame of the last clip of the previous scene as its first-frame seed — creating a continuous visual flow across scene boundaries
- Visual continuity indicator in the timeline: a link icon or connector line between adjacent scenes that have continuity enabled
- Break continuity: explicitly set a new first frame image to start a scene fresh (a new location, a time-cut, etc.)
- Continuity state is saved in the project data model per scene
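One way to resolve the seed for a scene's first clip is sketched below in Python. The precedence is an assumption rather than something this issue specifies: an explicit first-frame image wins (that is how continuity is broken), otherwise the previous scene's final last frame is used when the toggle is on. The `Scene` fields here are simplified, hypothetical stand-ins for the data model.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Scene:
    scene_id: str
    continue_from_previous: bool = False
    override_first_frame: Optional[str] = None
    clip_last_frames: List[str] = field(default_factory=list)  # one per clip, in chain order


def first_clip_seed(scenes: List[Scene], index: int) -> Optional[str]:
    """Return the first-frame seed for scene `index`, or None for text-to-video."""
    scene = scenes[index]
    if scene.override_first_frame is not None:
        return scene.override_first_frame  # explicit image breaks continuity
    if scene.continue_from_previous and index > 0:
        previous = scenes[index - 1]
        if previous.clip_last_frames:
            return previous.clip_last_frames[-1]
    return None
```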
Frame Extraction and Continuity Tools
- Frame extractor modal: scrubber over the video clip, frame-accurate preview, "Use this frame as continuity seed" button
- Auto-extract last frame: automatically pull the final frame of any clip for use as the next clip's seed (default behavior for "+")
- Quality heuristic for auto-extraction: prefer frames without obvious motion blur or mid-blink artifacts (simple pixel variance or sharpness check)
- All extracted frames are saved to the project asset library with a reference to the source clip and timestamp
- Extracted frames are displayed in the Asset Library ([Feature] Asset Library - Centralized Media Management for Projects #73) under a "Continuity Frames" category
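The sharpness check could be as simple as the variance of a Laplacian response, a common blur measure. The Python sketch below works on plain grayscale pixel grids so that it stays dependency-free; a real implementation would more likely use an image library. The selection rule (take the latest candidate frame that is at least `keep_ratio` as sharp as the sharpest one) and all names are assumptions for illustration.

```python
from statistics import pvariance
from typing import Sequence

Gray = Sequence[Sequence[float]]  # rows of grayscale pixel values


def laplacian_variance(gray: Gray) -> float:
    """Variance of the 4-neighbour Laplacian; higher means sharper."""
    height, width = len(gray), len(gray[0])
    responses = [
        gray[y - 1][x] + gray[y + 1][x] + gray[y][x - 1] + gray[y][x + 1] - 4 * gray[y][x]
        for y in range(1, height - 1)
        for x in range(1, width - 1)
    ]
    return float(pvariance(responses)) if responses else 0.0


def pick_seed_index(frames: Sequence[Gray], keep_ratio: float = 0.8) -> int:
    """Index of the latest frame that is nearly as sharp as the sharpest candidate.

    `frames` must be non-empty, ordered oldest to newest.
    """
    scores = [laplacian_variance(frame) for frame in frames]
    threshold = keep_ratio * max(scores)
    for index in range(len(frames) - 1, -1, -1):
        if scores[index] >= threshold:
            return index
    return len(frames) - 1  # fallback; only reachable if keep_ratio > 1
```

Passing only the last few frames of a clip keeps the chosen seed close to the true final frame, so the cut stays near-seamless.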
Transition Options at Clip Boundaries (within a scene)
| Transition | Description |
|---|---|
| Seamless cut | Default. No transition applied. |
| Crossfade | Overlap end of Clip N with start of Clip N+1, configurable duration 0.1s–2.0s |
| Motion blur blend | Blend frames with increasing/decreasing blur at boundary |
| Match cut | User manually marks a matching frame in each clip; editor aligns them |
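Crossfades overlap adjacent clips, so they shorten the assembled scene by one fade duration per boundary; this matters when checking a scene against its target duration. The Python sketch below computes the offsets for a chain of FFmpeg `xfade` filters, where each offset is measured from the start of the accumulated left-hand stream. The function names are hypothetical, and the sketch assumes all inputs already share resolution and frame rate.

```python
from typing import List, Tuple


def xfade_plan(durations: List[float], fade_s: float) -> Tuple[List[float], float]:
    """Return (offset of each crossfade, total assembled duration)."""
    if len(durations) < 2:
        raise ValueError("need at least two clips")
    if not 0.1 <= fade_s <= 2.0:
        raise ValueError("fade duration must be within 0.1s-2.0s")
    if any(d <= fade_s for d in durations):
        raise ValueError("every clip must be longer than the fade")
    offsets = []
    elapsed = 0.0
    for k, duration in enumerate(durations[:-1], start=1):
        elapsed += duration
        offsets.append(elapsed - k * fade_s)  # earlier fades already ate k-1 overlaps
    total = sum(durations) - (len(durations) - 1) * fade_s
    return offsets, total


def xfade_filtergraph(durations: List[float], fade_s: float) -> str:
    """Build a -filter_complex string chaining xfade across all video inputs."""
    offsets, _ = xfade_plan(durations, fade_s)
    parts = []
    left = "[0:v]"
    for k, offset in enumerate(offsets, start=1):
        out = f"[v{k}]"
        parts.append(
            f"{left}[{k}:v]xfade=transition=fade:duration={fade_s:.3f}:offset={offset:.3f}{out}"
        )
        left = out
    return ";".join(parts)
```

Three 5-second clips with 0.5-second crossfades assemble to 14 seconds, not 15.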
Transition Options at Scene Boundaries
| Transition | Description |
|---|---|
| Hard cut | Instantaneous scene change |
| Fade to black | Clip fades out to black before next scene begins |
| Fade from black | Next scene fades in from black |
| Fade to black + from black | Combined: clip fades to black, pause, next fades in |
| Dissolve | Configurable duration: overlapping dissolve between last clip of scene N and first clip of scene N+1 |
| Custom transition clip | User uploads a short video file (e.g., a logo sting or abstract wipe) inserted between scenes |
Long Video Planning Tools
Before the user starts generating clips, the editor should help them plan the full project:
- Duration planner: user enters a target total video duration → the editor shows a breakdown: how many scenes, estimated clips per scene, total clip count, and approximate generation cost
- Model duration reference card: visible in the generation panel, showing the max output duration per model (e.g., Kling 1.6 = 5s or 10s, Wan 2.1 = 4s, SVD-XT = ~3s at 25 frames, LTX-Video = variable)
- Clip count estimate: shown per-scene and for the whole project before any generation starts
- Cost estimate: based on total clip count and per-clip cost for the selected model, pulled from the cost tracking system ([Feature] Cost Tracking & API Key Management - Monitor Spending Across All AI Services #75)
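The planner arithmetic is small enough to sketch directly. The Python example below assumes per-scene target durations, a single model for the whole project, and a flat per-clip price; `plan_project` and `Plan` are hypothetical names, and the result is a lower bound because regenerations are not counted.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Plan:
    clips_per_scene: List[int]
    total_clips: int
    estimated_cost: float


def plan_project(scene_targets_s: List[float], model_max_s: float, cost_per_clip: float) -> Plan:
    """Estimate clip counts and cost before any generation call is made."""
    if model_max_s <= 0:
        raise ValueError("model_max_s must be positive")
    clips = [math.ceil(target / model_max_s) for target in scene_targets_s]
    total = sum(clips)
    return Plan(clips, total, round(total * cost_per_clip, 2))
```

For scenes of 15, 8, and 30 seconds on a 5-second model, this yields 3, 2, and 6 clips (11 in total).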
Concatenation Engine
- FFmpeg-based, fully local — all concatenation happens on the user's device, no re-upload to any server
- Fast path (stream copy): when all clips in the chain share the same resolution, codec, and frame rate, use FFmpeg's `concat` demuxer for a lossless, near-instant join
- Full re-encode path: activated when clips differ in resolution or FPS, or when crossfade/dissolve transitions are applied; uses libx264 with configurable quality settings
- Intermediate concatenation: user can render a partial video (e.g., scenes 1–3 only) to review pacing and continuity before committing to generating the remaining scenes
- Incremental re-render: when a single clip is changed (regenerated or replaced), only re-concatenate the affected scene and downstream segments — unchanged segments are served from cache
- Audio stitching: audio tracks (voice narration, background music) are stitched alongside video, with configurable crossfade durations at scene boundaries
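The choice between the two paths can be expressed as a small predicate. The Python sketch below assumes the stream properties have already been probed (for example with `ffprobe`) into a hypothetical `StreamInfo` record. In practice stream copy may also require matching pixel format and other encoder parameters, which this sketch does not check.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class StreamInfo:
    width: int
    height: int
    codec: str
    fps: float


def can_stream_copy(clips: List[StreamInfo], blended_transitions: bool) -> bool:
    """True when the concat demuxer with -c copy is safe to use for this chain."""
    if blended_transitions:  # crossfade/dissolve always needs a re-encode
        return False
    return len(set(clips)) <= 1  # all clips share resolution, codec, and fps
```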
Clip Versioning
- Every AI generation attempt for a clip position in the chain is saved as a version — never silently overwritten
- The user selects which version is "active" for the chain; only the active version is included in concatenation
- Switching the active version of Clip N triggers a prompt: "The next clip uses this clip's last frame as its seed. Regenerate downstream clips, or keep them as-is?"
- "Pin last frame" option per clip position: even if the user swaps active versions, the pinned last-frame extraction used for continuity does not change. This keeps the downstream chain stable while the user explores different generations of one clip
- Version list is accessible from the right-click context menu on any clip thumbnail
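The interaction between version switching and the pin option can be sketched as follows. This Python example assumes a clip position stores one last-frame path per version plus an optional pinned frame; `ClipSlot` and its methods are hypothetical names. `switch_active` returns whether the continuity seed changed, which is the condition for showing the "regenerate downstream clips?" prompt.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class ClipSlot:
    versions: Dict[str, str] = field(default_factory=dict)  # version id -> last frame path
    active_version_id: Optional[str] = None
    pinned_last_frame: Optional[str] = None  # set while the pin option is on

    def add_version(self, version_id: str, last_frame_path: str) -> None:
        if version_id in self.versions:
            raise ValueError("versions are never overwritten")
        self.versions[version_id] = last_frame_path
        if self.active_version_id is None:
            self.active_version_id = version_id

    def continuity_seed(self) -> Optional[str]:
        """Frame handed to the next clip in the chain."""
        if self.pinned_last_frame is not None:
            return self.pinned_last_frame
        if self.active_version_id is None:
            return None
        return self.versions[self.active_version_id]

    def switch_active(self, version_id: str) -> bool:
        """Switch versions; return True if downstream clips now have a stale seed."""
        if version_id not in self.versions:
            raise KeyError(version_id)
        before = self.continuity_seed()
        self.active_version_id = version_id
        return self.continuity_seed() != before
```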
Edge Cases to Handle
| Scenario | Handling |
|---|---|
| Clip generation fails mid-chain | Offer retry from the failed clip; all previous clips in the chain are preserved and do not need to be regenerated |
| Model output resolution changes mid-chain | Warn the user; offer to auto-scale or crop the differing clip to match the chain's established resolution |
| Narration audio track is longer than assembled video | Options: extend the last clip via auto-extension, trim the audio, or pad with a freeze-frame |
| User deletes a middle clip from the chain | Offer two options: (1) Re-link — use left neighbor's last frame to regenerate a replacement clip, or (2) Leave gap — remove clip, and offer to re-stitch or re-generate continuity manually |
| Two adjacent clips have a visible jump cut despite continuity | Surface the frame extractor and offer to regenerate the right-side clip with a refined first-frame seed |
| First clip of chain has no seed image (pure text-to-video) | Supported; continuity extraction begins from its last frame for subsequent clips |
Technical Notes
Data Model
class Project {
  List<Scene> scenes;
}

class Scene {
  String id;
  String title;
  int targetDurationSeconds;
  bool continueFromPreviousScene;
  String? overrideFirstFramePath;
  List<Clip> clipChain;
}

class Clip {
  String id;
  int chainIndex;
  GenerationParams generationParams;
  ClipStatus status; // pending | generating | done | error
  String? localPath;
  String? firstFramePath;
  String? lastFramePath;
  List<ClipVersion> versions;
  String? activeVersionId;
  bool pinnedLastFrame;
}

class ClipVersion {
  String id;
  DateTime generatedAt;
  String localPath;
  String lastFramePath;
  Map<String, dynamic> generationParams;
}
FFmpeg Integration
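- Concatenation: `ffmpeg -f concat -safe 0 -i filelist.txt -c copy output.mp4`
- Frame extraction (last frame): `ffmpeg -sseof -1 -i input.mp4 -update 1 last_frame.png` (writes every frame of the final second to the same file, so the true last frame is what remains; with `-vframes 1` the output would instead be the frame about one second before the end)
- Frame extraction (specific timestamp): `ffmpeg -ss TIMESTAMP -i input.mp4 -vframes 1 frame.png`
- Crossfade transition: `xfade` filter with configurable duration and offset
- All FFmpeg calls go through `ffmpeg_kit_flutter`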
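The `filelist.txt` consumed by the concat demuxer is a plain text file with one `file '...'` line per clip. The Python sketch below shows how it and the matching argument list might be built. The helper names are hypothetical, and the app itself would issue the command through `ffmpeg_kit_flutter` rather than Python.

```python
from typing import List


def concat_list(paths: List[str]) -> str:
    """Render the concat demuxer list file, escaping single quotes in paths."""
    lines = []
    for path in paths:
        escaped = path.replace("'", "'\\''")  # close quote, escaped quote, reopen
        lines.append(f"file '{escaped}'")
    return "\n".join(lines) + "\n"


def concat_args(list_path: str, output_path: str) -> List[str]:
    """Arguments for the stream-copy concatenation command."""
    return ["-f", "concat", "-safe", "0", "-i", list_path, "-c", "copy", output_path]
```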
File Naming Convention
{projectId}/
  scenes/
    {sceneId}/
      clips/
        {clipIndex}_{versionId}.mp4
        {clipIndex}_{versionId}_last_frame.png
      concatenated_scene.mp4
  concatenated_partial_{sceneRange}.mp4
  final_export.mp4
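A path helper keeps the convention in one place. The Python sketch below builds the clip and last-frame paths for one version; `clip_paths` is a hypothetical helper, and the clip index is written as-is because the convention above does not specify zero-padding.

```python
from pathlib import PurePosixPath
from typing import Tuple


def clip_paths(
    project_id: str, scene_id: str, clip_index: int, version_id: str
) -> Tuple[PurePosixPath, PurePosixPath]:
    """Return (video path, last-frame path) for one clip version."""
    clips_dir = PurePosixPath(project_id) / "scenes" / scene_id / "clips"
    stem = f"{clip_index}_{version_id}"
    return clips_dir / f"{stem}.mp4", clips_dir / f"{stem}_last_frame.png"
```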
Caching Strategy
- Each concatenated scene output is cached by a hash of its clip chain (clip IDs + active version IDs)
- On change to any clip in the chain, invalidate only that scene's cache and any downstream partial or final concatenations
- Final export is re-generated only from changed scenes; unchanged scene segments are reused from cache
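A cache key along these lines could be derived as in the Python sketch below, which hashes the ordered (clip ID, active version ID) pairs so that reordering or switching a version yields a new key. The function names are hypothetical. Transition settings would presumably also need to be part of the key, which this sketch leaves out.

```python
import hashlib
import json
from typing import List, Tuple


def scene_cache_key(chain: List[Tuple[str, str]]) -> str:
    """Hash of the ordered (clip id, active version id) pairs of one scene."""
    payload = json.dumps(chain, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def stale_partials(
    changed_scene: int, partial_ranges: List[Tuple[int, int]]
) -> List[Tuple[int, int]]:
    """Partial renders (inclusive scene index ranges) that contain the changed scene."""
    return [(start, end) for (start, end) in partial_ranges if start <= changed_scene <= end]
```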
Acceptance Criteria
- Clip chain UI renders inside each scene card with correct filmstrip layout
- "+" button auto-extracts last frame and passes it as first-frame seed to the next generation
- Auto-extension workflow generates N clips sequentially with correct continuity
- Cross-scene continuity toggle correctly seeds the first clip of a scene from the last clip of the previous scene
- Frame extractor modal allows manual frame selection and saves to asset library
- Crossfade and dissolve transitions render correctly via FFmpeg xfade
- Stream-copy fast path is used when all clips share resolution/codec/FPS
- Incremental re-render skips re-concatenation of unchanged scene segments
- Clip versioning stores all attempts and allows active version switching
- Deleting a middle clip presents re-link and leave-gap options
- Duration planner estimates clip count and generation cost before any API calls
Related Issues
- Parent epic: [EPIC] AI Video Editor - Scene-by-Scene AI-Powered Video Generation Platform #81
- Depends on: [Feature] AI Video Generation - Replicate API Integration with All Supported Models #65, [Feature] First & Last Frame Control - Keyframe Images for Scene Boundaries #64, [Feature] Video Player & Preview - In-App Video Playback and Scene Preview #69
- Feeds into: [Feature] Final Video Rendering & Export - Compose and Export the Complete Video #72
- Related: [Feature] Asset Library - Centralized Media Management for Projects #73 (stores extracted frames), [Feature] Cost Tracking & API Key Management - Monitor Spending Across All AI Services #75 (estimates for clip chains)