Emergence of Large Action Models (LAMs) and Their Impact on AI Agents | by Cobus Greyling | Sep, 2024

14 Sep

While LLMs are great for understanding and producing unstructured content, LAMs are designed to bridge the gap by turning language into structured, executable actions.

As I have mentioned in the past, Autonomous AI Agents powered by large language models (LLMs) have recently emerged as a key focus of research, driving the development of concepts like agentic applications, agentic retrieval-augmented generation (RAG), and agentic discovery.

However, according to Salesforce AI Research, the open-source community continues to face significant challenges in building specialised models tailored for these tasks.

A major hurdle is the scarcity of high-quality, agent-specific datasets, coupled with the absence of standardised protocols, which complicates the development process.

To bridge this gap, researchers at Salesforce have introduced xLAM, a series of Large Action Models specifically designed for AI agent tasks.

The xLAM series comprises five models, featuring both dense and mixture-of-experts architectures, with parameter sizes ranging from 1B to 8x22B.

These models aim to advance the capabilities of autonomous agents by providing purpose-built solutions tailored to the complex demands of agentic tasks.

Function calling has become a crucial element in the context of AI agents, particularly from a model capability standpoint, because it significantly extends the functionality of large language models (LLMs) beyond static text generation.

This is one of the reasons for the advent of Large Action Models, which have the ability to excel at function calling as one of their main traits.

AI agents often need to perform actions based on user input, such as retrieving information, scheduling tasks, or performing computations.

Function calling allows the model to generate parameters for these tasks, enabling the agent to trigger external processes like database queries or API calls.
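To make this concrete, here is a minimal sketch of how an agent might turn a model's structured output into an actual function call. The tool names (get_weather, schedule_task), the JSON shape of the model output and the dispatch helper are all assumptions made for illustration; real models and frameworks each define their own call format.

```python
import json

# Hypothetical tool: stands in for a real weather API call.
def get_weather(city: str) -> str:
    return f"Weather for {city}: sunny"

# Hypothetical tool: stands in for a real calendar integration.
def schedule_task(title: str, when: str) -> str:
    return f"Scheduled '{title}' at {when}"

# Registry mapping the function names the model may emit to Python callables.
TOOLS = {"get_weather": get_weather, "schedule_task": schedule_task}

def dispatch(model_output: str) -> str:
    """Parse a structured function call emitted by the model and execute it."""
    call = json.loads(model_output)
    func = TOOLS[call["name"]]
    return func(**call["arguments"])

# Assumed model output; the exact format differs between models.
model_output = '{"name": "get_weather", "arguments": {"city": "Cape Town"}}'
print(dispatch(model_output))
```

The model only produces the function name and its parameters; the agent code is what actually runs the function.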

This makes the agent not just reactive, but action-oriented, turning passive responses into dynamic interactions.

Interoperability with External Systems

For AI Agents, sub-tasks involve interacting with various tools. Tools are in turn linked to external systems (CRM systems, financial databases, weather APIs, etc.).

Through function calling, LAMs can serve as a broker, providing the necessary data or actions for those systems without needing the model itself to have direct access. This allows for seamless integration with other software environments and tools.
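The broker idea can be sketched as follows, under the assumption that the host application holds the credentials while the model only ever sees a description of the tool. The names used here (CRM_API_KEY, lookup_customer, TOOL_SPECS, build_prompt, execute) are hypothetical and do not come from any specific framework.

```python
import json

# Secret held by the host application only; it is never placed in the prompt.
CRM_API_KEY = "hypothetical-secret"

def lookup_customer(customer_id: str, api_key: str) -> dict:
    # Stand-in for a real CRM request authenticated with api_key.
    return {"customer_id": customer_id, "status": "active"}

# What the model is shown: the tool's name and parameters, but no credentials.
TOOL_SPECS = [
    {
        "name": "lookup_customer",
        "description": "Fetch a customer record from the CRM.",
        "parameters": {"customer_id": "string"},
    }
]

def build_prompt(user_message: str) -> str:
    """Assemble the model input from the tool descriptions and the user request."""
    return json.dumps({"tools": TOOL_SPECS, "user": user_message})

def execute(call: dict) -> dict:
    """Run the call requested by the model; the host adds the credentials."""
    if call["name"] == "lookup_customer":
        return lookup_customer(call["arguments"]["customer_id"], api_key=CRM_API_KEY)
    raise ValueError(f"Unknown tool: {call['name']}")

prompt = build_prompt("What is the status of customer 42?")
record = execute({"name": "lookup_customer", "arguments": {"customer_id": "42"}})
print(record)
```

In this arrangement the model requests an action and supplies its parameters, and the surrounding application decides whether and how to carry it out against the external system.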

By moving from an LLM to a LAM, the model utility is also expanded, and LAMs can thus be seen as purpose-built to act as the centrepiece of an agentic implementation.

Large Language Models (LLMs) are designed to handle unstructured input and output, excelling at tasks like generating human-like text, summarising content, and answering open-ended questions.

LLMs are highly flexible, allowing them to process diverse forms of natural language without needing predefined formats.

However, their outputs can be ambiguous or loosely structured, which can limit their effectiveness for specific task execution. That said, using an LLM for an agentic implementation is not wrong, and it can serve the purpose quite well.

But Large Action Models (LAMs) can be considered purpose-built: they focus on structuring outputs by generating precise parameters or instructions for specific actions, which makes them suitable for tasks that require clear and actionable results, such as function calling or API interactions.
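The difference between loosely structured and structured output can be illustrated with a small check that an agent might run before executing anything. This is only a sketch: the book_meeting schema, the parse_action helper and both example outputs are invented for illustration and are not taken from xLAM or any particular model.

```python
import json

# Hypothetical schema: parameter name mapped to the expected Python type.
SCHEMA = {
    "name": "book_meeting",
    "parameters": {"attendee": str, "duration_minutes": int},
}

def parse_action(output: str):
    """Return validated arguments, or None if the output is not an executable call."""
    try:
        call = json.loads(output)
    except json.JSONDecodeError:
        return None  # free text cannot be executed directly
    if not isinstance(call, dict) or call.get("name") != SCHEMA["name"]:
        return None
    args = call.get("arguments")
    if not isinstance(args, dict) or set(args) != set(SCHEMA["parameters"]):
        return None
    for key, expected_type in SCHEMA["parameters"].items():
        if not isinstance(args[key], expected_type):
            return None
    return args

# Assumed free-text answer: readable for a person, not executable by code.
free_text = "Sure, I will set up a half-hour meeting with Sam."
# Assumed structured answer: precise parameters for one specific action.
structured = '{"name": "book_meeting", "arguments": {"attendee": "Sam", "duration_minutes": 30}}'

print(parse_action(free_text))
print(parse_action(structured))
```

The free-text answer is rejected because it carries no machine-readable parameters, while the structured answer passes the check and can be handed straight to the function it names.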

Overall, in the context of AI agents, function calling enables more robust, capable, and practical applications by allowing LLMs to serve as a bridge between natural language understanding and actionable tasks within digital systems.
