ByteDance, the parent company of TikTok, has introduced OmniHuman-1, an advanced AI system capable of generating hyper-realistic videos of people speaking, gesturing, singing, and even playing instruments—all from just a single image. According to a research paper published on the open-access platform arXiv, OmniHuman-1 significantly surpasses existing methods in realism, producing high-quality human motion based on minimal input, particularly audio. The model accommodates images in any aspect ratio, whether portraits, half-body, or full-body shots, making it highly versatile across different scenarios.
The project’s official page features several sample videos showcasing the AI’s impressive range. These include animations of people moving from multiple perspectives, digitally recreated historical figures, and even lifelike animal motions. One striking example features Albert Einstein appearing to deliver a speech in front of a blackboard, complete with nuanced hand gestures and expressive facial movements. In the video, Einstein remarks, “What would art be like without emotions? It would be empty. What would our lives be like without emotion? They would be empty of values.” The footage creates the illusion of stepping back in time to watch the famed physicist lecture, rendered in strikingly modern video quality.
Freddy Tran Nager, a professor at USC’s Annenberg School for Communication and Journalism, called the demonstrations “very impressive,” stating that while fully reviving actors like Humphrey Bogart for a film might be a challenge, OmniHuman-1 performs remarkably well, especially on smaller screens. The tool firmly places ByteDance in the race to create the most authentic AI-generated human videos, a space that is rapidly expanding with applications such as virtual influencers, AI-powered government assistants, and even deepfake political endorsements.
Nager envisions the technology being integrated into education, suggesting that students could hypothetically learn statistics from an AI-generated Marilyn Monroe. He also speculates that TikTok creators might one day use AI versions of themselves to maintain content production without personal burnout. More troubling, he warns of a scenario in which TikTok could generate its own AI-driven videos, diminishing the need for human creators altogether.
Samantha G. Wolfe, an adjunct professor at NYU’s Steinhardt School, also acknowledges the dual nature of this innovation, calling it “fascinating from a technological standpoint” but warning of its darker implications. She notes that AI-generated videos of business executives or political figures making false statements could have profound economic or geopolitical consequences. As these deepfake-like visuals become more lifelike, the risk of misinformation rises, making it increasingly difficult for viewers to distinguish real from fake.
To train OmniHuman-1, ByteDance processed over 18,700 hours of human video data, incorporating text, audio, and physical gestures. The company has yet to disclose specific details about its data sources. However, Nager speculates that anyone who has created content on TikTok may unknowingly have contributed to a database that fuels this virtual human revolution.