(Image Source: Photo by Lin Zhijia, TMTPost AGI Editor)
AsianFin -- In the ever-evolving world of artificial intelligence, the race to build AI agents is heating up. Following OpenAI’s launch of its first AI-powered agent application, “Operator,” ByteDance on Sunday launched its next-generation automation model, UI-TARS, on GitHub.
With seven billion parameters, this AI agent integrates crucial components such as visual understanding, text processing, task planning, and memory management into one unified model.
UI-TARS can perform complex, cross-platform tasks, perceiving user interfaces, reasoning through action steps, and interacting with web interfaces in ways previously thought to be exclusive to human operators.
While still in its preview phase and undergoing constant updates, UI-TARS has already made its mark by demonstrating the ability to “automatically” publish tweets, as seen in the official promotional video. Although the system currently requires human assistance for certain steps, such as inputting text and clicking through options, its potential is unmistakable. The model is already available for macOS and Windows users.
The Operator Revolution
Only two days earlier, OpenAI introduced its first AI agent, “Operator.” Available to ChatGPT Pro users in the U.S., who pay $200 a month, Operator is a digital assistant capable of simulating human operations on the web. It can perform tasks such as shopping, ordering food, and organizing papers by integrating visual recognition with advanced reasoning models. By combining GPT-4o’s vision capabilities with reasoning trained through reinforcement learning, the AI agent plans complex steps and takes actions with impressive accuracy.
The proliferation of AI agents in recent months has been nothing short of remarkable. Other notable players, including Zhipu AI and Genius by Verses, have joined the AI agent race. Zhipu’s AutoGLM and GLM-PC have garnered attention, while Genius—an AI agent that only needed two hours of training and a fraction of the data—has already surpassed human-level players in the classic Pong game.
Even Nvidia's CEO, Jensen Huang, weighed in at CES 2025, predicting that AI agents will be the next frontier of the robotics industry, with a potential value in the trillions of dollars. OpenAI’s CEO, Sam Altman, has also said that AI agents could become a significant force in 2025, heralding the beginning of a new era in AI applications. Taken together, these statements suggest that 2025 could be a watershed year for AI agents, positioning them as a key area of technological growth.
A New Frontier in AI Development
AI agents are essentially intelligent entities that can autonomously perceive their environment, make decisions, and take action. Think of them as highly capable assistants that can understand tasks and help humans perform them more efficiently. For example, UI-TARS can act like a "smart assistant" that navigates the web, recognizes visual cues, plans the necessary steps, and executes complex actions, such as publishing content or making purchases, with little human intervention.
The concept of AI agents began to take off after the success of ChatGPT in late 2022. Researchers at Stanford University and Google published a paper on “Generative Agents,” which described how virtual people in a simulated environment exhibited behaviors similar to humans when integrated with ChatGPT. This research sparked widespread interest in the idea of AI agents.
By 2024, AI agents had been recognized as essential components in the development of Artificial General Intelligence (AGI). Stanford professor Andrew Ng has pointed out that AI agents will play a critical role in the progression toward AGI, describing them as systems that not only think but can also take action. OpenAI’s roadmap for AGI, which spans five stages, places AI agents at the third level, between reasoning AI and fully autonomous, innovative systems.
A recent report highlighted the exponential growth of the AI agent market in China. In 2023, the Chinese AI agent market was valued at 55.4 billion yuan, and it is projected to grow to 852 billion yuan by 2028, with an impressive compound annual growth rate of 72.7%. These projections underscore the immense potential of AI agents as an integral part of future industries.
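The growth rate quoted above can be checked against the two market figures with the standard compound annual growth rate formula. The following is a minimal sketch, assuming the projection spans the five years from 2023 to 2028; the function name `cagr` is illustrative and does not come from the report.

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Return the compound annual growth rate as a fraction (0.5 means 50%)."""
    return (end_value / start_value) ** (1 / years) - 1


# Market sizes quoted in the article, in billions of yuan.
market_2023 = 55.4
market_2028 = 852.0

# Assumption: the growth compounds over the five years 2023 -> 2028.
rate = cagr(market_2023, market_2028, years=2028 - 2023)
print(f"Implied CAGR: {rate:.1%}")
```

Under that five-year assumption, the implied rate comes out at roughly 72.7%, so the report's three numbers are consistent with one another.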
AI Agents Across Industries
AI agents are rapidly gaining traction in various industries, from customer service to programming, content creation, and financial management. In content creation, for instance, AI agents can generate videos or even write scripts autonomously. This level of efficiency has led to broader adoption of AI assistants by creators, further cementing AI’s role as an indispensable tool in modern workflows.
Operator, for example, serves as a highly practical tool. It can perform everyday tasks such as making restaurant reservations, buying groceries, and even booking tickets for sports events. It employs a straightforward workflow in which it captures and analyzes screen content, adds the relevant information to its model context, and determines the next steps through reasoning. It then executes these steps using a virtual mouse and keyboard. The human user can intervene if necessary, particularly in situations involving sensitive information like payment details or addresses.
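The workflow described above (capture the screen, add it to the context, reason about the next step, act, and hand control to the user for sensitive input) can be sketched as a simple loop. This is an illustrative toy, not OpenAI's implementation: `FakeBrowser`, `choose_action`, `run_agent`, and the `SENSITIVE_FIELDS` set are all hypothetical names, and a hard-coded rule table stands in for the model's reasoning.

```python
from dataclasses import dataclass, field


@dataclass
class Action:
    kind: str          # "type", "click", or "done"
    target: str = ""   # UI element the action applies to
    text: str = ""     # text to enter, if any


# Assumption: which fields count as sensitive is chosen here for illustration.
SENSITIVE_FIELDS = {"card_number", "address"}


@dataclass
class FakeBrowser:
    """Toy stand-in for a real browser: a fixed sequence of screens."""
    screens: list
    step: int = 0
    log: list = field(default_factory=list)

    def screenshot(self) -> str:
        return self.screens[self.step]

    def execute(self, action: Action) -> None:
        # Record the action, then advance to the next scripted screen.
        self.log.append((action.kind, action.target))
        self.step = min(self.step + 1, len(self.screens) - 1)


def choose_action(context: list) -> Action:
    """Placeholder for the model's reasoning: map the latest screen to an action."""
    screen = context[-1]
    if screen == "search_page":
        return Action("type", target="search_box", text="table for two")
    if screen == "results_page":
        return Action("click", target="reserve_button")
    if screen == "payment_page":
        return Action("type", target="card_number")
    return Action("done")


def run_agent(browser: FakeBrowser, ask_user, max_steps: int = 10) -> list:
    context = []
    for _ in range(max_steps):
        context.append(browser.screenshot())   # capture the screen into context
        action = choose_action(context)        # decide the next step
        if action.kind == "done":
            break
        if action.target in SENSITIVE_FIELDS:
            # Pause and let the human supply sensitive input.
            action.text = ask_user(action.target)
        browser.execute(action)                # act via the virtual mouse/keyboard
    return browser.log


browser = FakeBrowser(["search_page", "results_page", "payment_page", "confirmation_page"])
handoffs = []
log = run_agent(browser, ask_user=lambda name: handoffs.append(name) or "entered-by-user")
print(log)       # actions the agent carried out
print(handoffs)  # fields where the user was asked to take over
```

In this toy run the agent types a search, clicks the reserve button, and asks the user to step in only at the payment field, mirroring the pause-on-sensitive-input behavior the article describes.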
According to OpenAI, Operator is designed to perform tasks independently for users, providing them with a smooth, automated experience. In a demonstration, the AI agent completed a range of tasks with minimal input from the user. However, it pauses when handling sensitive steps, such as payment, so users can take control when needed.
AI agents are also poised to make a major impact on enterprise operations. According to F5’s Mohan Veloo, AI applications will increasingly rely on APIs, and the growth of AI usage will lead to an explosion of these interfaces. By 2025, it’s expected that 77% of global enterprises will deploy generative AI tools to improve productivity, with over 84% of all applications incorporating AI inference capabilities by 2028.
AI agents can streamline processes, reduce human labor costs, and provide businesses with new opportunities for automation. However, as AI becomes more pervasive, some experts warn that AI’s democratization of knowledge may level the playing field, removing some of the competitive advantages previously held by leading firms.
For enterprises, the challenge lies in finding the most effective ways to integrate AI agents into their operations. As Zhang Xin from Volcano Engine noted, while AI models bring new productivity tools, they also introduce challenges related to managing the massive amounts of data generated by AI operations. Companies must focus on creating AI solutions that drive innovation while leveraging existing technologies.
The Future of AI Agents: From Adoption to Integration
In the coming years, the widespread adoption of AI agents will likely become a defining characteristic of business transformation. According to F5’s Veloo, the increasing fusion of AI technologies with IoT, edge computing, and cloud-native architecture is accelerating AI’s integration into business processes. This trend will drive enterprises to implement AI solutions that can seamlessly collaborate with human workers, boosting both productivity and efficiency.
In the second phase of AI’s revolution, AI agents like those from ByteDance, OpenAI, and other major players in the industry are pushing the boundaries of what’s possible. Whether it’s automating daily tasks or offering new solutions for business optimization, the future of AI agents looks promising. In 2025, AI agents are expected to become a significant part of the business landscape, offering a glimpse into the future of work. As the technology continues to evolve, AI agents will not just be tools; they will be partners in the journey toward a more intelligent and automated world.
Source: TMTPost