
Photos can be converted to video with one click. Tencent joins hands with 2 universities to launch Follow-Your-Click.


Brokerage China, the Securities Times' official ChiNext (GEM) information website, reported that on March 15 Tencent, Tsinghua University and the Hong Kong University of Science and Technology jointly launched a new large image-to-video model, "Follow-Your-Click". Given an input image, the user simply clicks the relevant area and adds a few prompt words to set the originally static region of the image in motion, converting it to video with one click.

According to the report, with current large image-to-video models the usual generation method requires the user not only to describe the region to be animated in the prompt, but also to give a detailed description of the motion instructions, which is a relatively complex process. Moreover, judging from the generated results, existing image-to-video technology lacks control over animating a specified part of an image: the generated video often has to move the entire scene rather than a particular region, and it therefore lacks accuracy and flexibility.

To address these problems, the joint project team of Tencent's Hunyuan large model team, Tsinghua University and the Hong Kong University of Science and Technology proposed Follow-Your-Click, a more practical and controllable image-to-video generation model that offers more convenient interaction and makes "animating a picture with a single click" a reality.

However, Brokerage China did not provide further details. After actually visiting the project's website, it appears that Follow-Your-Click can currently only turn a picture into a video clip of two to three seconds. For example, uploading a picture of a puppy with the prompt "raise head" generates a video of the puppy repeatedly raising and lowering its head. The webpage also notes the technology's limitations: this technical approach is still limited in generating large-scale and complex human motions, which may be because such motions are complex and the corresponding training samples are still scarce.

According to reports, Tencent's Hunyuan large model team continues to research and explore multi-modal technologies and has industry-leading video generation capabilities. Previously, as a technology partner, the Tencent Hunyuan model supported the People's Daily in creating the original video "The Country is So Beautiful", producing excellent video clips of China's beautiful rivers and mountains and demonstrating strong content understanding, logical reasoning, and image generation abilities.

On February 15, OpenAI threw an explosive message into the global arena of AI-generated video. That day, the company released a text-to-video model called Sora and opened access to it to some researchers and creators. In addition to generating videos from text descriptions, Sora can also generate videos from existing images. Currently, the videos it generates are up to about one minute long.

After Sora, on February 26 Google's DeepMind team released Genie, an 11-billion-parameter foundational AI world model. From a single image, it can generate an interactive world that is "action controllable", in which users can act frame by frame. Google said that Genie has opened an era of "image/text-generated interactive worlds" and will also become a catalyst for realizing general AI agents.

Ping An Securities said that OpenAI and Google have successively released multi-modal large models Sora and Genie, and the AGI wave may accelerate.

According to reports, on February 28 Alibaba's Institute for Intelligent Computing released a new generative AI model, EMO (Emote Portrait Alive). EMO requires only a portrait photo and an audio clip to make the person in the photo "open their mouth" and sing or speak in line with the audio content, with largely matching mouth shapes and very natural facial expressions and head poses.

The report notes that EMO brings new possibilities to video AI in the multi-modal field. Unlike the text-to-video model Sora, EMO focuses on image-plus-audio video generation: given a photo and human voice audio at any speaking speed, EMO can automatically generate a talking portrait video with rich facial expressions and head poses.
