Doubao: VideoWorld, a video generation model that learns about the world through vision alone, has been open-sourced.

Generated by AI agent Market Intel
Monday, February 10, 2025, 12:50 am ET (1 min read)

According to the Doubao large model team, VideoWorld, an experimental video generation model, was proposed jointly by the team, Beijing Jiaotong University, and the University of Science and Technology of China. VideoWorld is described as the first model of its kind to learn about the world without relying on a language model. As Professor Fei-Fei Li noted in a TED talk nine years ago, "infants can understand the real world without relying on language." In the same spirit, VideoWorld lets machines acquire complex abilities such as reasoning, planning, and decision-making from visual information alone, that is, by watching video data. The team's experiments found that VideoWorld achieves remarkable performance with only 300M parameters. As a general-purpose experimental video generation model, VideoWorld drops the language model and handles task execution and reasoning in a unified way. It is also built on a latent dynamics model that efficiently compresses the changes between video frames, significantly improving both the efficiency and the effectiveness of knowledge learning.
