Multimodal Large Models to Be Realized, and the AGI Singularity Is Approaching, Says BAAI Founder

B站影视 2024-12-09 18:12

TMTPOST -- The Scaling Law has not slowed down significantly, and AI will create new operating systems, new platforms, and new ecosystems, said Zhang Hongjiang, founding chairman of the Beijing Academy of Artificial Intelligence (BAAI) and an international member of the National Academy of Engineering, on Saturday at the 2024 T-EDGE Conference and TMTPost Annual Economic Meeting.

“Multimodal large models are the ultimate models for AGI. Multimodal large models will empower robots, and the future of large models will usher in a world of 'autonomous intelligence,'” he pointed out.

Zhang also rebutted recent arguments about the "slowing down of the Scaling Law" and the challenges confronting the development of large models. He believes there is no need to worry about the Scaling Law slowing down. "Even if there is a slowdown in pre-training, the release of the o1 model shows a new dimension. Compared to the 'fast thinking' mode of pre-trained models, the o1 inference model allows more time for thought, and the performance of the Scaling Law in inference has reached an 'inflection point,' showing exponential growth."

At last year's T-EDGE Conference, Zhang predicted that GenAI would rewrite the software industry and that within the next 1.5 to 2 years humanity might witness large-scale commercial applications. Developments over the past year have supported his prediction. The U.S. B2B software industry is being rapidly reshaped by AI, and these software service companies have officially entered a profitable era.

This is why Nvidia has grown rapidly over the past two years, as have cloud providers and data center firms, all driven by the demand for large model training and inference. Additionally, in order to train these models, a new ecosystem for data processing, storage, and interaction must be put in place. AI infrastructure must evolve quickly to make large model applications a reality, he said.

This ecosystem is much more diversified than traditional software ecosystems and will bring long-lasting impacts, innovations, and technological transformations. The AI large model ecosystem will surpass the ecosystems of PCs and smartphones, he predicted.

Furthermore, because large model training costs are extremely high and the demand for edge-side and inference models is only just beginning, AI PCs and AI phones will develop gradually. The integration of data centers, inference models, and edge and cloud computing will drive rapid development of the industry chain.

Over the past two years, there have been few "killer" AI applications, which has led some to worry that the "AI wave" might lose momentum. However, technological progress always opens up numerous opportunities, and valuable opportunities remain even in more mature stages. The demand for AI applications and the high cost of data centers will drive rapid development in edge AI.

Therefore, whether it's automated programming, intelligent interaction, customer service, or content generation, AI applications driven by large models will develop at a faster pace than the early stages of the internet and mobile internet, Zhang said.

He noted that while many B2B applications have experienced a boom in the U.S. over the past year, there is still a significant gap between China and the U.S. in this area. The B2B market in China is very small, and the revenue scale of Chinese B2B software companies is much smaller than that of their U.S. counterparts, meaning it will still take time for AI large models to reshape software services in China.

In fact, AI large models, as a foundational platform, will systematically drive all industries into a new paradigm, becoming the "super entrance" of the next era.

Zhang also emphasized that all software companies must embrace large models, including software tools and application service companies, which need to adopt large models to rewrite their software.

Looking ahead, Zhang sees multimodal large models as the ultimate form of AGI. Humans communicate with each other through language, which language models capture, but our interaction with the world also requires vision, voice, and other forms of modeling. Only unified multimodal large models can solve all understanding problems.

Over the past year, we've witnessed rapid development in multimodal generative models, such as text-to-image, text-to-video, and image-to-video. A prime example is OpenAI's Sora, which can produce visually stunning, realistic videos as well as understand and generate multimodal content. If we want to integrate various sensory modes into a single intelligent system, multimodal models are essential.

Looking forward, multimodal models will empower robots to combine multimodal learning with autonomous reasoning capabilities, enabling robots to work alongside humans more effectively and autonomously.

Zhang stressed that in the future, everyone will move from using AI assistants to using agents, and eventually each person will have a "personal autopilot." Large models will usher in a world of autonomous intelligence. As large models develop, unified multimodal large models are expected to achieve a "breakthrough," bringing the AGI singularity closer.

Source: TMTPost (钛媒体)
