Not long ago, the world was taken aback by artificial intelligence algorithms that can generate images from a sentence describing them. DALL-E, created by OpenAI, a billion-dollar artificial intelligence lab, is one of them. The team spent two years developing the technology, which is built on the same "neural network" techniques that power smart assistants. The algorithm "learns" what an object should look like from thousands of photographs; given millions of images, DALL-E can combine the concepts they contain into images of almost any combination.
The original DALL-E was released in January 2021, but the new version can also edit objects, deleting parts of an image or replacing them with other elements, while accounting for details such as shadows. DALL-E 2 produces images in a variety of styles, with higher quality and more complex backgrounds, whereas the first version only rendered images in a cartoon-like art style.
Google and Meta have gone even further. Both have unveiled AI-based video-generation technology that lets anyone create videos from nothing more than a written description. Google's is called Imagen Video, and Meta's is called Make-A-Video.
Meta Make A Video
On Make-A-Video's announcement page, Meta shows example videos generated from text, including "a young couple walking in heavy rain" and "a teddy bear painting a portrait." It also showcases Make-A-Video's ability to take a static source image and animate it. For example, a still photo of a sea turtle, once processed through the AI model, can appear to be swimming.
The key to Make-A-Video is that it builds on existing work in text-to-image synthesis, the technique behind image generators like OpenAI's DALL-E. In July, Meta announced its own text-to-image AI model, Make-A-Scene.
Rather than training Make-A-Video on labeled video data (for example, captions describing the actions depicted), Meta combined image-synthesis data (still images paired with captions) with unlabeled video footage, so the model learns a sense of where a text or image prompt might exist in time and space. It can then predict what comes after the image and show the scene in motion for a short period.
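To make that two-stage idea concrete, here is a minimal, purely illustrative PyTorch sketch. The class names, layer sizes, and random stand-in data are all invented for the example and are not Meta's actual architecture; the point is only the split between learning appearance from captioned stills and learning motion from unlabeled clips.

```python
# Illustrative sketch only -- not Meta's code. Stage 1 learns text-to-image
# from captioned stills; stage 2 learns motion from unlabeled video clips.
import torch
import torch.nn as nn

class TextToImage(nn.Module):
    """Toy stand-in for a text-conditioned image generator."""
    def __init__(self, text_dim=32, img_channels=3):
        super().__init__()
        self.text_proj = nn.Linear(text_dim, 64)
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, 2, 1), nn.ReLU(),
            nn.ConvTranspose2d(32, img_channels, 4, 2, 1),
        )

    def forward(self, text_emb):
        h = self.text_proj(text_emb).view(-1, 64, 1, 1)
        h = h.repeat(1, 1, 8, 8)        # start from an 8x8 spatial grid
        return self.decoder(h)          # -> (batch, 3, 32, 32) image

class TemporalExtension(nn.Module):
    """Toy temporal layers that turn a single frame into a short clip."""
    def __init__(self, img_channels=3, n_frames=8):
        super().__init__()
        self.n_frames = n_frames
        self.temporal = nn.Conv3d(img_channels, img_channels,
                                  kernel_size=(3, 3, 3), padding=1)

    def forward(self, frame):
        # Repeat the still image along a new time axis, then let the 3D conv
        # learn how pixels should change across frames.
        clip = frame.unsqueeze(2).repeat(1, 1, self.n_frames, 1, 1)
        return self.temporal(clip)      # -> (batch, 3, T, H, W) video

# Stage 1: fit the image model on (caption, image) pairs -- the labeled data.
image_model = TextToImage()
opt1 = torch.optim.Adam(image_model.parameters(), lr=1e-3)
captions = torch.randn(4, 32)               # stand-in caption embeddings
target_images = torch.randn(4, 3, 32, 32)   # stand-in captioned stills
loss = nn.functional.mse_loss(image_model(captions), target_images)
opt1.zero_grad()
loss.backward()
opt1.step()

# Stage 2: freeze the image model and fit only the temporal layers on
# unlabeled clips, so motion is learned without any video captions.
video_model = TemporalExtension()
opt2 = torch.optim.Adam(video_model.parameters(), lr=1e-3)
with torch.no_grad():
    frames = image_model(captions)            # frames predicted from text
target_clips = torch.randn(4, 3, 8, 32, 32)   # stand-in unlabeled video clips
loss = nn.functional.mse_loss(video_model(frames), target_clips)
opt2.zero_grad()
loss.backward()
opt2.step()
```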
Example video by Meta Make-A-Video: "teddy bear painting a portrait"
Meta has not announced how or when Make-A-Video might become available to the public, or who will have access to it. It does, however, provide a sign-up form for people interested in trying it in the future.
Meta acknowledges that the ability to create photorealistic videos on demand presents certain social hazards. At the bottom of the announcement page, Meta says that all AI-generated video content from Make-A-Video contains a watermark to "help ensure viewers know the video was generated with AI and is not a captured video."
Google Imagen Video
Google's Imagen Video announcement comes less than a week after Meta unveiled its text-to-video AI tool, Make-A-Video. According to Google's research paper, Imagen Video includes several notable stylistic abilities, such as generating videos based on the work of famous painters (the paintings of Vincent van Gogh, for example), generating 3D rotating objects while preserving object structure, and rendering text in a variety of animation styles. Google is hopeful that general-purpose video synthesis models can "significantly decrease the difficulty of high-quality content generation."
Example video by Google Imagen
The key to Imagen Video's abilities is a "cascade" of seven diffusion models that transforms the initial text prompt (such as "a bear washing the dishes") into a low-resolution video (16 frames at 24×48 pixels, 3 fps), then upscales it to progressively higher resolutions and frame rates at each step. The final output video is 5.3 seconds long.
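The cascade idea can be sketched roughly in code. The example below is purely illustrative: `base_model` just returns a random 16-frame, 24×48 clip, simple interpolation stands in for the learned diffusion super-resolution stages, and the number and scale factors of the stages are invented, not Google's actual configuration.

```python
# Illustrative sketch only -- not Google's code. A tiny low-frame-rate clip is
# passed through successive "super-resolution" stages, each raising resolution
# and/or frame rate, mimicking a cascaded text-to-video pipeline.
import torch
import torch.nn.functional as F

def base_model(prompt: str) -> torch.Tensor:
    """Stand-in for the base text-to-video model: returns a random
    16-frame, 24x48 clip shaped (batch, channels, time, height, width)."""
    torch.manual_seed(abs(hash(prompt)) % (2**31))
    return torch.rand(1, 3, 16, 24, 48)

def upsample(clip: torch.Tensor, t_scale: int, s_scale: int) -> torch.Tensor:
    """Stand-in for one spatial/temporal super-resolution stage."""
    return F.interpolate(
        clip,
        scale_factor=(t_scale, s_scale, s_scale),
        mode="trilinear",
        align_corners=False,
    )

# Hypothetical cascade: each tuple is (temporal upscale, spatial upscale).
# The real Imagen Video cascade chains seven learned diffusion models and
# ends at a 5.3-second clip; only a few made-up stages are shown here.
stages = [(1, 2), (2, 2), (2, 2)]

clip = base_model("a bear washing the dishes")
print("base clip:", tuple(clip.shape))   # (1, 3, 16, 24, 48)
for t_scale, s_scale in stages:
    clip = upsample(clip, t_scale, s_scale)
print("final clip:", tuple(clip.shape))  # more frames, higher resolution
```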
Video examples presented on the Imagen Video website range from the mundane ("Melting ice cream dripping down the cone") to the more fantastic ("Flying through an intense battle between pirate ships on a stormy ocean"). They contain obvious artifacts but show more fluidity and detail than earlier text-to-video models such as CogVideo, which debuted five months ago.
Image by Google Imagen
Training data for Google Imagen Video comes from the publicly available LAION-400M image-text dataset plus "14 million video-text pairs and 60 million image-text pairs," according to Google. Google says it filtered out "problematic data," but the model may still produce sexually explicit and violent content, as well as reflect social stereotypes and cultural biases. The firm is also concerned its tool could be used "to generate fake, hateful, explicit or harmful content." As a result, a public release is unlikely any time soon: "We have decided not to release the Imagen Video model or its source code until these concerns are mitigated," says Google.
What is the future of video editing?
Both Imagen Video and Make-A-Video are still in development and are not yet accessible to the public, so we will have to be patient until they are finally released. Because these systems only produce short clips, I believe Imagen Video and Make-A-Video will mainly help us create content faster. That means we will still need a video editor, such as Adobe Premiere Pro, DaVinci Resolve, or KineMaster, to combine multiple video clips, images, audio, and other elements. For now, video AI seems most likely to replace stock footage; even so, it is worth remembering that the most compelling videos are the ones that tell stories and naturally touch human emotions. Can AI do that?
Source:
Meta Make-A-Video
Google Imagen
caraseru.com
arstechnica.com
Last Minute Creator (YouTube)