Extending Text2Video-Zero for Multi-ControlNet

Date

2023

Authors

Backen, Ben

Publisher

University of Oregon

Abstract

This research paper presents an extension to the Text2Video-Zero (T2V0) generative model, augmenting the synthesis of video from textual and video inputs. The project focuses on enhancing the functionality and accessibility of T2V0 by integrating Stable Diffusion’s (SD) support for multiple ControlNets, implementing frame-wise masking for selective ControlNet application, and introducing memory optimizations to enable running the model on consumer-grade hardware. The paper also provides a high-level overview of SD, explores experimental features, and offers practical tips for generating videos using these tools. Additionally, we include a demonstration video showcasing T2V0 with Multi-ControlNet. The video highlights the early potential of text-to-video models for storytelling. Ultimately, the study strives to expand the capabilities and accessibility of T2V0, increasing users' control over their generated outputs while upholding the democratic principles of open-source AI.
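To make the Multi-ControlNet idea concrete, below is a minimal sketch of how multiple ControlNets can be attached to a Stable Diffusion pipeline using the Hugging Face diffusers library, together with two common memory optimizations for consumer-grade GPUs. This is an illustration of the underlying mechanism only, not the paper's T2V0 integration or frame-wise masking; the model IDs, conditioning scales, and placeholder images are assumptions chosen for the example.

```python
# Sketch: Multi-ControlNet with Stable Diffusion in diffusers.
# Model IDs and parameter values are illustrative, not taken from the paper.
import torch
from PIL import Image
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline

# Two ControlNets, e.g. edge guidance and pose guidance.
canny = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16
)
pose = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-openpose", torch_dtype=torch.float16
)

# Passing a list of ControlNets enables Multi-ControlNet conditioning.
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    controlnet=[canny, pose],
    torch_dtype=torch.float16,
)

# Memory optimizations helpful on consumer-grade hardware.
pipe.enable_model_cpu_offload()   # keep idle submodules on the CPU
pipe.enable_attention_slicing()   # compute attention in smaller chunks

# Placeholder conditioning images; in practice these would be a real
# Canny edge map and pose map extracted per frame.
canny_image = Image.new("RGB", (512, 512))
pose_image = Image.new("RGB", (512, 512))

frame = pipe(
    "an astronaut riding a horse, cinematic lighting",
    image=[canny_image, pose_image],
    controlnet_conditioning_scale=[1.0, 0.8],  # per-ControlNet influence
    num_inference_steps=25,
).images[0]
```

In this setup each ControlNet receives its own conditioning image and weight, which is the mechanism that selective, frame-wise application (as described in the abstract) builds on.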

Description

15 pages

Keywords

text-to-video, Stable Diffusion, ControlNet, machine learning, generative models

Citation