This is my research direction: I specialize in realistic long-form narrative. Although I don’t yet have a complete series to show, I’ve spent 1,882 hours on AI filmmaking, most of that time dedicated to researching the possibilities of AI in long-form narrative. AI series running to dozens of minutes already exist, so long-form narrative is not impossible; it’s just a matter of how well it’s executed.
The first foundation of AI filmmaking long-form narrative is obviously the three major consistencies: character consistency, scene consistency, and style consistency. Character consistency has been sufficiently addressed after a year of development. From Flux’s LoRA to various forms of multi-reference approaches, single-character consistency can be effectively solved. As for consistency when multiple characters appear simultaneously in a frame, Vidu’s multi-reference video generation can even serve as an image generation tool.
Vidu’s multi-reference supports up to 7 subjects, which is enough to include all the characters you want. Character proportions sometimes come out imbalanced, but this can be mitigated by generating multiple attempts and keeping the best one. The output video is somewhat blurry, but you can extract frames and process them again; there are many high-definition upscaling workflows to choose from.
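The extract-and-reprocess step can be scripted rather than done by hand. A minimal sketch, assuming ffmpeg is installed; the filename `vidu_output.mp4` and the output folder are illustrative, not part of any tool’s actual workflow:

```python
import subprocess
from pathlib import Path

def build_extract_cmd(video: str, out_dir: str, fps: float = 1.0) -> list[str]:
    """Build an ffmpeg command that dumps `fps` frames per second as PNG stills."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    return [
        "ffmpeg", "-i", video,        # input clip, e.g. a slightly blurry Vidu render
        "-vf", f"fps={fps}",          # sampling rate: how many frames per second to keep
        str(Path(out_dir) / "frame_%04d.png"),  # numbered stills for later upscaling
    ]

cmd = build_extract_cmd("vidu_output.mp4", "frames", fps=2.0)
# subprocess.run(cmd, check=True)  # uncomment to actually run the extraction
```

The extracted PNGs can then go through whichever high-definition workflow you prefer before being fed back into image-to-video generation.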
Vidu’s multi-reference can stably place characters into scenes, thus also solving scene consistency to some extent. I believe that multi-reference is a crucial tool for both image and video generation in solving character and scene consistency. I hope major manufacturers continue to develop in this direction, which will eventually resolve the first two consistency issues completely.
I’ve always wanted my characters to wear fantastical costumes and act out science-fiction stories in strange settings, so I don’t accept costumes and scenes without a sense of design. Although multi-reference solves consistency well, the design of the base image is fundamental. With such tools, we can spend more time on character design rather than wrestling with the tools themselves.
So, how do we solve style consistency? We know that different image generation tools output images in inconsistent styles, and video generation tools even more so. This is where we need a technique unique to AI video creation: two-step color grading. That is, grade once at the image stage, and once more at the video stage.
There are many color grading tools, from Photoshop to Lightroom to various user-friendly software, all capable of style color grading for your images. This gives creators who aren’t familiar with DaVinci Resolve more opportunities, because grading images is much simpler than grading video. Color grading, like upscaling, is a necessary step in AI filmmaking creation. If you skip either one, your work will look unprofessional.
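To make the two-step idea concrete, here is a minimal sketch of a lift/gamma/gain grade in NumPy; the function name and parameter values are illustrative assumptions, not any particular tool’s API. The point is that reusing one set of grading parameters for your stills and for extracted video frames is what ties the two passes into one consistent style.

```python
import numpy as np

def lift_gamma_gain(img: np.ndarray, lift: float = 0.0,
                    gamma: float = 1.0, gain: float = 1.0) -> np.ndarray:
    """Apply a basic lift/gamma/gain grade to a float RGB image in [0, 1]."""
    graded = np.clip(img * gain + lift, 0.0, 1.0)  # gain scales overall, lift raises blacks
    return graded ** (1.0 / gamma)                 # gamma bends the midtones

# Reusing the same parameters on stills and video frames keeps the style consistent.
frame = np.full((4, 4, 3), 0.25)                   # stand-in for a real frame
warm = lift_gamma_gain(frame, lift=0.02, gamma=1.1, gain=1.05)
```

A dedicated tool like DaVinci Resolve does far more, of course; the sketch only shows why grading stills is the simpler of the two passes.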
There are many workflows available for upscaling, and I recommend choosing different tools for different upscaling needs. Here, I recommend Magnific, which is practically standard equipment among English-speaking AI creators. Although Magnific requires payment, the detail it produces is very textured. Its flaw, however, is that Chinese faces often end up looking Western after upscaling, so you’ll need Photoshop to correct them.
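Magnific itself is a paid web tool, but the batch side of an upscaling pass can still be scripted. A minimal sketch with Pillow, using Lanczos resampling purely as a stand-in (a dedicated AI upscaler will produce far better texture); the folder names are illustrative:

```python
from pathlib import Path
from PIL import Image

def upscale_frame(img: Image.Image, factor: int = 2) -> Image.Image:
    """Upscale one extracted frame with Lanczos resampling (placeholder for an AI upscaler)."""
    return img.resize((img.width * factor, img.height * factor), Image.LANCZOS)

# Batch over a folder of extracted frames (illustrative paths):
out_dir = Path("frames_2x")
out_dir.mkdir(exist_ok=True)
for src in sorted(Path("frames").glob("frame_*.png")):
    upscale_frame(Image.open(src)).save(out_dir / src.name)
```

Swapping the resize call for an AI upscaler keeps the same batch structure while adding real detail.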
The second foundation of AI filmmaking long-form narrative is actually storyboard design. Before May this year, we often encountered videos that, although long, became unwatchable after three minutes. This was related to technical limitations at the time, as well as storyboard design.
Since I’m not a professional storyboard designer, I sometimes use AI to assist with storyboard design. Currently, I use Minimax’s Agent for storyboard design. It can be very detailed. However, AI-designed storyboards still have a significant gap compared to manually designed ones, so I only use them as a reference.
I’m a creator with a writing background, and I have a deep understanding of storytelling through text; however, I often struggle with expressing myself in audiovisual language. So I recommend that all AI creators deeply study the fundamentals of audiovisual language to support the possibility of long-form narrative.
The third foundation of AI filmmaking long-form narrative, I believe, is solid dramatic dialogue scenes. Although AI’s emotional expression still has significant problems, and character facial expressions remain relatively stiff, editing techniques can make dialogue scenes less boring. I’ve seen quite a few AI films that ignore dialogue scenes and rely solely on visual spectacle and action sequences; that approach cannot sustain a longer runtime.
Some people argue that creating long-form narratives with AI is currently laborious, so why research this aspect when technological development will naturally solve this problem? Perhaps so, but in everything I do, I strive to pursue excellence and push the boundaries of what’s possible—AI filmmaking is no exception.
Follow me, and let’s explore AI filmmaking together.