Doubao open-sources VideoWorld, a video generation model that learns about the world through vision alone
e Company news: According to the Doubao large model team, VideoWorld, an experimental video generation model, was jointly proposed by the Doubao large model team, Beijing Jiaotong University, and the University of Science and Technology of China. The team describes VideoWorld as the first model able to understand the world without relying on a language model. As Professor Fei-Fei Li noted in her TED talk nine years ago, "infants can understand the real world without relying on language." In the same spirit, VideoWorld lets machines acquire complex abilities such as reasoning, planning, and decision-making from visual information alone, that is, by browsing video data. In the team's experiments, VideoWorld achieved notable performance with only 300M parameters. As a general-purpose experimental video generation model, VideoWorld drops the language model and unifies the execution of understanding and reasoning tasks. It is also built on a latent dynamics model, which efficiently compresses the change information between video frames and significantly improves both the efficiency and the effectiveness of knowledge learning.