Agents featured prominently in Google’s annual I/O conference in May, when the company unveiled its new AI agent called Astra, which allows users to interact with it using audio and video. OpenAI’s new GPT-4o model has also been called an AI agent.
And it’s not just hype, although there is definitely some of that too. Tech companies are plowing vast sums into creating AI agents, and their research efforts could usher in the kind of useful AI we have been dreaming about for decades. Many experts, including Sam Altman, say they are the next big thing.
But what are they? And how can we use them?
How are they defined?
It is still early days for research into AI agents, and the field does not have a definitive definition for them. But simply, they are AI models and algorithms that can autonomously make decisions in a dynamic world, says Jim Fan, a senior research scientist at Nvidia who leads the company’s AI agents initiative.
The grand vision for AI agents is a system that can execute a vast range of tasks, much like a human assistant. In the future, it could help you book your vacation, but it will also remember if you prefer swanky hotels, so it will only suggest hotels that have four stars or more and then go ahead and book the one you pick from the range of options it offers you. It will then also suggest flights that work best with your calendar, and plan the itinerary for your trip according to your preferences. It could make a list of things to pack based on that plan and the weather forecast. It might even send your itinerary to any friends it knows live in your destination and invite them along. In the workplace, it could analyze your to-do list and execute tasks from it, such as sending calendar invites, memos, or emails.
One vision for agents is that they are multimodal, meaning they can process language, audio, and video. For example, in Google’s Astra demo, users could point a smartphone camera at things and ask the agent questions. The agent could respond to text, audio, and video inputs.
These agents could also make processes smoother for businesses and public organizations, says David Barber, the director of the University College London Centre for Artificial Intelligence. For example, an AI agent might be able to function as a more sophisticated customer service bot. The current generation of language-model-based assistants can only generate the next likely word in a sentence. But an AI agent would have the ability to act on natural-language commands autonomously and process customer service tasks without supervision. For example, the agent would be able to analyze customer complaint emails and then know to check the customer’s reference number, access databases such as customer relationship management and delivery systems to see whether the complaint is legitimate, and process it according to the company’s policies, Barber says.
Broadly speaking, there are two different categories of agents, says Fan: software agents and embodied agents.