About 18 months ago I wrote my first article on AI Agents. It was based on AI Agent frameworks created by LangChain. Fast forward to the last few weeks, and AI Agents are in the news the way RAG was a few months ago.
This prompts a question for me: what defines an AI Agent, and what is required to deliver enterprise-ready agentic implementations?
The Basics
Considering the graphic above, an AI Agent can be defined as a piece of software, with one or more Language Models as its backbone.
For the AI Agent to have visual capabilities, a Language Model / Foundation Model with vision capabilities is required.
Task Decomposition
At this stage an agent primarily takes conversational input, so user input arrives as unstructured data.
And the response from the AI Agent is also most often in natural language, leveraging the Natural Language Generation (NLG) capabilities of Language Models.
I often use this example: you should be able to ask an AI Agent the following question:
What is the square root of the year of birth of the man commonly regarded as the father of the iPhone?
This is a very hard question for any traditional Conversational UI to answer, but for an AI Agent it is easy.
Way Of Work
The AI Agent starts off by decomposing this compound and slightly ambiguous question into sub-steps, and then sets off solving each of these sub-steps.
Each of these steps can be seen or considered as an action.
Agents leverage LLMs to make a decision on which Action to take next.
After an Action is completed, the Agent enters the Observation step.
From the Observation step, the AI Agent shares a Thought; if a final answer is not reached, the AI Agent cycles back to another Action in order to move closer to a Final Answer.
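The Thought, Action, Observation cycle described above can be sketched in a few lines of Python. This is a minimal illustration, not how any particular framework implements it: the `decide` function is a hard-coded stand-in for the LLM, the `lookup` and `square_root` tools and their canned facts are hypothetical, and `max_iterations` is an assumed cap on how many cycles the agent may run.

```python
import math
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical tools. A real agent would call a search API and a maths library.
def lookup(query: str) -> str:
    facts = {
        "father of the iPhone": "Steve Jobs",
        "Steve Jobs year of birth": "1955",
    }
    return facts.get(query, "unknown")

def square_root(value: str) -> str:
    return f"{math.sqrt(float(value)):.2f}"

TOOLS: Dict[str, Callable[[str], str]] = {
    "lookup": lookup,
    "square_root": square_root,
}

def decide(observations: List[str]) -> Tuple[str, Optional[str], str]:
    """Stand-in for the LLM: returns (thought, tool_name, tool_input).

    A tool_name of None signals that tool_input holds the final answer.
    """
    if len(observations) == 0:
        return ("I need to find the father of the iPhone.",
                "lookup", "father of the iPhone")
    if len(observations) == 1:
        return ("Now I need his year of birth.",
                "lookup", f"{observations[0]} year of birth")
    if len(observations) == 2:
        return ("Now I can take the square root of the year.",
                "square_root", observations[1])
    return ("I have the final answer.", None, observations[2])

def run_agent(max_iterations: int) -> Tuple[Optional[str], List[str]]:
    observations: List[str] = []
    trace: List[str] = []
    for _ in range(max_iterations):
        thought, tool, tool_input = decide(observations)
        trace.append(f"Thought: {thought}")
        if tool is None:
            trace.append(f"Final Answer: {tool_input}")
            return tool_input, trace
        trace.append(f"Action: {tool}[{tool_input}]")
        observation = TOOLS[tool](tool_input)
        trace.append(f"Observation: {observation}")
        observations.append(observation)
    # The iteration budget ran out before a final answer was reached.
    return None, trace

if __name__ == "__main__":
    answer, trace = run_agent(max_iterations=4)
    print("\n".join(trace))
```

With a budget of four iterations the trace steps through three actions before giving a final answer of roughly 44.22. With a smaller budget the loop stops without an answer, which is the trade-off between autonomy on one side and cost and latency on the other.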
The level of autonomy of an AI Agent is determined in part by the number of iterations the AI Agent is allowed to go through. This matters for cost, overhead and latency, since each iteration typically involves another call to the Language Model.
Secondly, if the AI Agent is unable to reach a conclusion or solve a task, one of the tools (we’ll look at tools in a bit) at the AI Agent’s disposal can be a human who can be pinged for guidance.
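As a rough sketch of the human-as-a-tool idea, the snippet below wraps a person behind the same call signature as any other tool. The names `make_human_tool` and `answer_or_escalate` are hypothetical, and the `responder` argument is an assumption added so that the example can run without a live person at the keyboard.

```python
from typing import Callable, Dict

def make_human_tool(responder: Callable[[str], str] = input) -> Callable[[str], str]:
    """Build a tool that forwards a question to a human.

    By default the built-in input() is used; any callable that takes a
    prompt and returns a reply can be substituted, for example in tests.
    """
    def ask_human(question: str) -> str:
        return responder(f"[Agent needs guidance] {question}\n> ")
    return ask_human

def answer_or_escalate(query: str,
                       knowledge: Dict[str, str],
                       ask_human: Callable[[str], str]) -> str:
    # Use what the agent already knows, otherwise ping the human.
    if query in knowledge:
        return knowledge[query]
    return ask_human(query)
```

For example, `answer_or_escalate("Which CRM record?", {}, make_human_tool())` would prompt on the console, because the agent has nothing in `knowledge` to answer with.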
The number of tools at the disposal of the AI Agent is another determining factor in the AI Agent’s autonomy.
Tools can be considered as integration points or touch points to external systems or APIs. The number and nature of tools at the disposal of the AI Agent really determines what the AI Agent is capable of.
Tools are described in natural language, and can include a web search API, OS GUI navigation, a maths library, a weather API, a CRM integration, and so on.
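To make the idea of tools described in natural language more concrete, here is a small sketch. The `Tool` class, the two example tools and the `describe_tools` helper are all hypothetical; the point is only that each tool pairs a callable with a plain-language description that can be placed in the prompt so the Language Model can decide which tool to use.

```python
import math
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Tool:
    name: str
    description: str  # plain-language text shown to the Language Model
    run: Callable[[str], str]

def square_root(value: str) -> str:
    return str(math.sqrt(float(value)))

def stub_weather(city: str) -> str:
    # Placeholder only; a real tool would call a weather API here.
    return f"No live data: weather lookup for {city} is stubbed."

TOOLS: List[Tool] = [
    Tool("square_root", "Returns the square root of a number.", square_root),
    Tool("weather", "Returns the current weather for a city.", stub_weather),
]

def describe_tools(tools: List[Tool]) -> str:
    """Render the tool list as text that could be included in a prompt."""
    lines = ["You can use the following tools:"]
    lines += [f"- {tool.name}: {tool.description}" for tool in tools]
    return "\n".join(lines)

if __name__ == "__main__":
    print(describe_tools(TOOLS))
```

Adding a capability then amounts to adding another `Tool` entry with a clear description, which is why the number and nature of tools shape what the agent can do.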
As the AI Agent decomposes a problem into sub-steps or actions, solving for each of these actions or steps will most probably involve the use of a tool.
Considering the image below, it is evident what level of observability can be achieved with regard to the internal workings of AI Agents. Notice how the AI Agent steps through the Thought, Action, Observation cycle, and so on.