A friend of mine assembled a team and asked me about the difficulties I’ve encountered when producing AI films, because they want to research these issues from an academic perspective. It’s a hard question to answer, because the current technical problems are all major ones that a small team may not necessarily have the opportunity to solve. In fact, almost all of the difficulties center on consistency.
First is style consistency. Many tools are currently used for AI image and video generation, and a typical project combines a dozen or so of them. Each tool has its own distinctive style, so as soon as you switch between tools you run into inconsistent visual styles. For example, the artistic styles of Midjourney and Flux are distinctly different.
Here lies a technical challenge: how do you enhance Flux’s lighting to match the quality of Midjourney’s? There are relighting methods available, such as using Kontext, but truly improving Flux’s lighting will likely have to wait for the release of Flux 2. All of these tools may well evolve toward Midjourney’s aesthetic. However, both Flux and Midjourney have issues with characters looking overly glossy, which can be addressed through color grading.
Style consistency matters more than character or scene consistency: the overall tonal foundation of a film is its artistic basis. At the moment, every method I can think of still involves color grading and Photoshop, because different image-generation tools produce inconsistent content.
My friend’s team could develop a cinematic AI color-grading system to simplify the entire grading process, covering color cloning, cinematic enhancement, and other grading tools, though that would involve complex software development.
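To make the “color cloning” idea concrete, here is a minimal sketch, assuming a Reinhard-style color transfer that matches a generated frame to the per-channel mean and standard deviation of a reference frame in Lab space. It uses OpenCV and NumPy; the file names are hypothetical, and a real grading system would need far more than this.

```python
# Minimal sketch of "color cloning": Reinhard-style color transfer that pushes a
# generated frame toward the color statistics of a reference frame.
# Assumes OpenCV and NumPy; file names below are placeholders.
import cv2
import numpy as np

def clone_color(source_path: str, reference_path: str, output_path: str) -> None:
    src = cv2.imread(source_path)        # frame whose colors we want to adjust
    ref = cv2.imread(reference_path)     # frame whose look we want to copy

    # Work in Lab space so luminance and chroma can be matched separately.
    src_lab = cv2.cvtColor(src, cv2.COLOR_BGR2LAB).astype(np.float32)
    ref_lab = cv2.cvtColor(ref, cv2.COLOR_BGR2LAB).astype(np.float32)

    # Match per-channel mean and standard deviation.
    src_mean, src_std = src_lab.mean(axis=(0, 1)), src_lab.std(axis=(0, 1))
    ref_mean, ref_std = ref_lab.mean(axis=(0, 1)), ref_lab.std(axis=(0, 1))
    result = (src_lab - src_mean) / (src_std + 1e-6) * ref_std + ref_mean

    result = np.clip(result, 0, 255).astype(np.uint8)
    cv2.imwrite(output_path, cv2.cvtColor(result, cv2.COLOR_LAB2BGR))

# Example: pull a Flux still toward a Midjourney reference (hypothetical files).
clone_color("flux_frame.png", "midjourney_reference.png", "flux_frame_graded.png")
```

Global statistics matching like this handles the overall tone but not local fixes (such as glossy skin), which is why Photoshop and manual grading are still hard to avoid.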
Second is character consistency. Current solutions include LoRA, multi-reference approaches, and even Kontext. Multi-character consistency in particular still needs work; my friend’s team could develop a multi-character consistency model.
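As an illustration of the LoRA route, here is a minimal sketch using the Hugging Face diffusers library to load a pre-trained character LoRA on top of a base model. The base model, LoRA path, trigger word, and weights are all assumptions for illustration, and the exact API details may vary between diffusers versions; training the character LoRA itself is a separate step.

```python
# Minimal sketch of the LoRA approach to character consistency with diffusers.
# Model name, LoRA path, and trigger word are placeholders; the character LoRA
# must be trained beforehand on reference images of the character.
import torch
from diffusers import AutoPipelineForText2Image

pipe = AutoPipelineForText2Image.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",  # example base model
    torch_dtype=torch.float16,
).to("cuda")

# Load the character LoRA under an adapter name, so multiple character LoRAs
# could later be combined with per-adapter weights for multi-character shots.
pipe.load_lora_weights("path/to/character_lora", adapter_name="hero")
pipe.set_adapters(["hero"], adapter_weights=[0.9])

# The trigger word used during LoRA training keeps the character on-model
# across different shots and scenes.
image = pipe(
    "photo of hero_character walking through a rainy neon street, cinematic lighting",
    num_inference_steps=30,
).images[0]
image.save("shot_012_hero.png")
```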
Third is scene consistency. There’s no particularly good solution yet: atmospheric consistency is barely achievable, while spatial consistency still has enormous problems. Some researchers are working on world models, which could solve scene consistency completely.
Style consistency, character consistency, and scene consistency are the three major consistency problems, and solving any one of them is a research direction in its own right. All the major AI companies are working on these three issues, so they may not be suitable targets for a small team.
Assuming we do have consistent scenes, precisely controlling camera positions is another currently unsolved challenge: we can’t control camera position and movement the way we can in 3D animation. Some AI filmmaking tools already respond to camera-movement prompts, though getting the shot you want still requires some luck.
Many shot designs rely on first and last frames, but when such clips are used back to back, a “braking” pause appears at the join. Midjourney’s video model has improved this somewhat but can’t completely eliminate the braking sensation when splicing.
Next, AI-assisted storyboard design is still quite rigid. Storyboarding is the core of an AI film, and we’ll likely need to hire a professional storyboard artist, since a professional storyboard is immediately recognizable. My friend’s team might try to develop more professional storyboard tools.
There are several such storyboard tools on the market, but their results aren’t ideal: the generated storyboards can only serve as references and can’t be used as actual stills in the film. So we really need a tool focused on storyboard design, though this may eventually be solved by more intelligent large language models.
Finally, there’s the most manual part: editing, which may be the most human process in all of AI filmmaking. A good editor can not only make a film look less like a PowerPoint presentation but also greatly enhance its cinematic quality. I’m not sure how difficult it would be to have AI edit automatically from a script, but I expect it would be very hard, since editing depends so heavily on human judgment.
The above is my answer to my friend’s question about the technical challenges currently facing AI filmmaking. None of these problems is minor, and a small team may not be able to solve them. But technology is advancing rapidly; just a few months ago, for example, there were no tools for text-to-image editing.
Throughout the process of making AI films, we constantly run into AI’s limitations. That’s why, on my public account, I don’t just enthusiastically promote AI filmmaking; I also discuss the problems we currently encounter.
Follow me, and let’s explore AI filmmaking together.