
# DreamDance: Personalized Text-to-video Generation by Combining Text-to-Image Synthesis and Motion Transfer

## Results of Pipeline 1

dance_1

dance_2

orange_justice_1

orange_justice_2

The motion transfer is quite successful, even when the character in the reference video performs large motions such as dancing and rotating.

Note that, limited by our computing resources, we only generated low-resolution imitation videos; the motion imitation itself performs well.
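
For reference, the sketch below shows one common way such pose-guided motion transfer can be implemented, using OpenPose conditioning with a ControlNet-based image generator. The model names and the overall setup are assumptions for illustration, not necessarily the exact pipeline used in this project.

```python
# Minimal sketch of pose-guided motion transfer (ControlNet + Stable Diffusion).
# This is an illustration under assumptions, not necessarily our exact pipeline.
import torch
from controlnet_aux import OpenposeDetector
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline

pose_detector = OpenposeDetector.from_pretrained("lllyasviel/ControlNet")
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-openpose", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
).to("cuda")

def transfer_motion(reference_frames, prompt):
    """Re-render each reference frame's pose with the target appearance."""
    out_frames = []
    for frame in reference_frames:        # PIL images from the reference dance video
        pose_map = pose_detector(frame)   # extract a skeleton image from the frame
        result = pipe(prompt, image=pose_map, num_inference_steps=20).images[0]
        out_frames.append(result)
    return out_frames
```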

## Results of Pipeline 2

Input images for the prompt: miguel playing guitar on the street, pixar, cartoon, high quality, full body, single person

input_guitar

Output video

output_guitar

Input images for the prompt: miguel running in a forest, pixar, cartoon, green eyes, red hat, high quality, standing, full body, single person

input_running

Output video

output_running

Input images for the prompt: miguel in a forest, pixar, cartoon, green eyes, red hat, high quality, standing, full body, single person

input_2

Output video

output_2

Input images for the prompt: miguel, pixar, cartoon, playing guitar, high quality, full body, single person

input_guitar_2

Output video

output_guitar_2

We noticed that even when the changes between the input keyframes are larger, interpolation still handles the video synthesis reasonably well. Although there are some artifacts in the intermediate frames, our limitations come mainly from the input image generation side. If future text-to-image synthesis models can generate more reliable images with high consistency across all the factors above, frame interpolation will be a powerful method for text-to-video generation.
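
As a rough illustration of the keyframe-then-interpolate idea behind Pipeline 2, the sketch below generates two keyframes from one of the prompts above with an off-the-shelf text-to-image model and fills in intermediate frames. The model choice and the `interpolate_frames` helper are assumptions for illustration; a learned interpolation model (e.g. FILM or RIFE) would replace the naive cross-fade in practice.

```python
# Minimal sketch of keyframe generation + frame interpolation (assumptions noted above).
import torch
from diffusers import StableDiffusionPipeline
from PIL import Image

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

prompt = ("miguel playing guitar on the street, pixar, cartoon, "
          "high quality, full body, single person")

# Fixing the prompt and seed keeps the character roughly consistent across keyframes.
generator = torch.Generator("cuda").manual_seed(42)
keyframes = [pipe(prompt, generator=generator).images[0] for _ in range(2)]

def interpolate_frames(frame_a, frame_b, n_mid=8):
    """Placeholder: naive cross-fade between two keyframes.
    A learned interpolator would produce far more plausible in-between motion."""
    return [Image.blend(frame_a, frame_b, (i + 1) / (n_mid + 1)) for i in range(n_mid)]

video_frames = [keyframes[0]] + interpolate_frames(*keyframes) + [keyframes[1]]
```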