Abstract: This paper introduces a novel dataset construction pipeline that samples pairs of frames from videos and uses multimodal large language models (MLLMs) to generate editing instructions for ...