Artificial intelligence (AI) researchers are showing off the technology’s latest leap forward by demonstrating models that create realistic, coherent videos from a text prompt, raising questions about whether AI’s rapid advancement will soon threaten our ability to tell what is real from what is not.
In the last week, Meta (formerly known as Facebook) and Google have each showcased “text-to-video” AI systems that can create new, unique videos with high-quality graphics based on anything from a few words to a long, intricate sentence.
Late last month, Meta first showed off its Make-A-Video system, which, on top of its text-to-video ability, can also animate still images. Just a week later, Google unveiled two projects of its own, Imagen Video and Phenaki.
Meta’s model can produce videos with photorealistic graphics of subjects carrying out actions and interacting with objects — like a realistic video of a young couple walking in the rain or a surreal teddy bear painting a portrait.
Google’s competitor Imagen Video is similar. Phenaki, on the other hand, doesn’t have quite the same visual quality, but it can turn long prompts into videos several minutes long with a dream-like feel. One example:
Lots of traffic in futuristic city. An alien spaceship arrives to the futuristic city. The camera gets inside the alien spaceship. The camera moves forward until showing an astronaut in the blue room. The astronaut is typing in the keyboard. The camera moves away from the astronaut. The astronaut leaves the keyboard and walks to the left. The astronaut leaves the keyboard and walks away. The camera moves beyond the astronaut and looks at the screen. The screen behind the astronaut displays fish swimming in the sea. Crash zoom into the blue fish. We follow the blue fish as it swims in the dark ocean. The camera points up to the sky through the water. The ocean and the coastline of a futuristic city. Crash zoom towards a futuristic skyscraper. The camera zooms into one of the many windows. We are in an office room with empty desks. A lion runs on top of the office desks. The camera zooms into the lion’s face, inside the office. Zoom out to the lion wearing a dark suit in an office room. The lion wearing looks at the camera and smiles. The camera zooms out slowly to the skyscraper exterior. Timelapse of sunset in the modern city.
Both have been built using diffusion models, a type of model trained by taking its training data, progressively corrupting it with noise, and learning to reverse the process to rebuild the original (these diffusion models are being used in the new generation of text-to-image AI models, too). Researchers gave the models datasets of millions of videos paired with captions, which they use to recognise and reproduce patterns.
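For readers curious about what “breaks apart and builds back together” means in practice, the toy sketch below illustrates the core training step of a diffusion model: noise is added to a sample, and a small network learns to predict that noise so the corruption can later be reversed. It is a minimal illustration only; the model, the noise schedule and every number here are placeholders, not anything from Meta’s or Google’s systems, which are far larger and also condition on the text prompt.

```python
# Toy denoising-diffusion training step (illustrative placeholder code only;
# real text-to-video systems use huge video networks conditioned on text).
import torch
import torch.nn as nn

T = 1000                                 # number of noising steps
betas = torch.linspace(1e-4, 0.02, T)    # noise schedule
alphas_cumprod = torch.cumprod(1.0 - betas, dim=0)

class TinyDenoiser(nn.Module):
    """Stand-in for the large denoising network used in practice."""
    def __init__(self, dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim + 1, 256), nn.ReLU(), nn.Linear(256, dim)
        )

    def forward(self, x, t):
        # Tell the network how noisy the input is by appending the timestep.
        t_feat = t.float().unsqueeze(-1) / T
        return self.net(torch.cat([x, t_feat], dim=-1))

model = TinyDenoiser()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

def training_step(clean_batch):
    """'Break apart' the data by adding noise, then learn to rebuild it."""
    t = torch.randint(0, T, (clean_batch.shape[0],))
    noise = torch.randn_like(clean_batch)
    a = alphas_cumprod[t].unsqueeze(-1)
    noisy = a.sqrt() * clean_batch + (1 - a).sqrt() * noise  # forward (noising) process
    pred_noise = model(noisy, t)          # network guesses the added noise
    loss = nn.functional.mse_loss(pred_noise, noise)
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

# Example: one training step on a fake batch of 8 samples with 64 features each.
print(training_step(torch.randn(8, 64)))
```

At generation time the trained network is run in the other direction: it starts from pure noise and removes a little of it at each step until a new sample emerges, with the text prompt steering the result.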
Neither company has yet released these models to the public, but it’s only a matter of time before they become accessible. Much like the text-to-image models before them, text-to-video models are a powerful new tool that opens up the world for users, who previously would have needed to go through the time-consuming and technically demanding process of manually animating something to get a similar effect. But they also present a threat to our understanding of reality.
No one would mistake the current text-to-video outputs for real footage, yet advances in this technology may soon challenge that. What happens when artificially generated video becomes indistinguishable from real video? Such a premise may once have seemed like the plot of a far-fetched dystopian novel, but it now seems not that far away.
Will this technology bring about an age of video manipulation? Let us know by writing to letters@crikey.com.au. Please include your full name to be considered for publication. We reserve the right to edit for length and clarity.
‘…technology’s latest leap forward…’
It may be forward but it’s hardly desirable. It will be easier to create propaganda & artificial news once this craft is refined. Why do we need it?
Some of us older, paler fellers can’t find real girlfriends who meet our impossibly high standards?
Too bad AI can’t do something useful like expose tax evasion or root out fake news.
Information as content is making the whole information age rather hollow and a far cry from the utopian (or even dystopian) visions of old science fiction.
Nearly time to stop watching videos.
When, not if, it is used for porn, will there be any pushback (sic!)?
People have been prosecuted & convicted of having sex dolls which are too lifelike, and Japanese commuters read astonishingly violent misogynist manga openly on suburban trains.
What about those who prefer to only interact with Alexa/Siri – will they need to be vaxxed if they never leave their hermetically sealed existence?
Nothing new, as per John Donne or the 60s Simon & Garfunkel song or EM Forster’s “The Machine Stops”.
QAnon and its ilk will lap this up. The fake news ingesters will fall further down the rabbit hole.
As someone has already stated – the porn industry, closely followed by gamers, will be early adopters. What a brave new world awaits us.
PS: for those with a bit of technical expertise, deepfake photos & videos are all too easy already.
QAnon and its ilk – imagine Cam’s reports from CPAC when this has taken hold.