Question about the effectiveness of feature aggregation in RepVideo compared to CogVideo #6

Thanks for the good work! Could you explain why aggregating the output features of adjacent layers improves video generation results? Since CogVideo's training data is not publicly available, I wonder whether this benefit might instead come from differences in training data distribution between RepVideo and CogVideo, which would make direct comparisons difficult. Are there any experimental results for RepVideo without the aggregation operation?
