21Sep

Build a WhatsApp LLM Bot: a Guide for Lazy Solo Programmers | by Ian Xiao | Sep, 2024


TLDR: I built and deployed a WhatsApp LLM bot in 12 hours to learn English better and faster. I am exploring how to make LLMs useful in our everyday lives. I share my design choices, what I built, tools I used, lessons learned, and a product roadmap.

I am building the app in phases. Stay tuned for updates. See a questionnaire at the end where I need some design guidance from the community; I hope you can participate in the beta 🙏🫶

This is not a code walkthrough. I will put all the resources I used at the end so you can take a look if you are interested.

The Problem

I love reading and writing.

But, as a non-native English speaker, I often come across new words I don’t know or think I know but need help understanding. New words fly by as I go about my busy day or enjoy the flow of reading. I want them to stick; I want to be more eloquent.

How about looking them up and writing them down? Digital solutions (e.g., dictionary or vocabulary apps) and pen & paper don’t work.

  • āŒ they take me away from the moment (e.g., reading a good book)



Source link

20Sep

Outline-Driven RAG & Web Research Prototype | by Cobus Greyling | Sep, 2024


The theory of questioning emphasises that while answering existing questions deepens understanding of a topic, it often leads to new questions.

To initiate this dynamic process, STORM simulates a conversation between a Wikipedia writer and a topic expert.

In each round of conversation, the LLM-powered writer generates a question based on the topic, its assigned perspective, and the conversation history.

This history helps the LLM update its understanding and formulate follow-up questions, with a maximum limit of rounds set for the conversation.

To ensure the conversation remains factual, trusted online sources are used to ground each answer.

If a question is complex, the LLM first breaks it down into simpler search queries. The search results are then evaluated with a rule-based filter to exclude unreliable sources.

Finally, the LLM synthesises information from trustworthy sources to generate the answer, which is also added to the references for the full article.

I executed the LangChain implementation in a notebook, utilizing the GPT-3.5-Turbo model.

Along with that, tools such as DuckDuckGo for search functionality and Tavily-Python for other resources are required.

The only modification required was the inclusion of the command pip install -U duckduckgo-search in the notebook to ensure proper functionality.

Below is an example of a prompt within the LangChain implementation of STORM…

from langchain_core.prompts import ChatPromptTemplate

direct_gen_outline_prompt = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            "You are a Wikipedia writer. Write an outline for a Wikipedia page about a user-provided topic. Be comprehensive and specific.",
        ),
        ("user", "{topic}"),
    ]
)

And here is an example topic prompt…

example_topic = "Impact of million-plus token context window language models on RAG"
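
To exercise the outline prompt, it can be piped into a chat model in the usual LangChain fashion. Below is a minimal sketch (my own, not part of the original notebook) that sets up fast_llm as the GPT-3.5-Turbo model mentioned above via the langchain-openai package; the same fast_llm is reused by the chains that follow.

from langchain_openai import ChatOpenAI

fast_llm = ChatOpenAI(model="gpt-3.5-turbo")  # the GPT-3.5-Turbo model used in this walkthrough

# prompt | model forms a runnable chain; invoke() fills the {topic} placeholder
generate_outline = direct_gen_outline_prompt | fast_llm
initial_outline = generate_outline.invoke({"topic": example_topic})

print(initial_outline.content)  # a draft Wikipedia-style outline for the topic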

Below is the gen_related_topics_prompt prompt…

gen_related_topics_prompt = ChatPromptTemplate.from_template(
"""I'm writing a Wikipedia page for a topic mentioned below. Please identify and recommend some Wikipedia pages on closely related subjects. I'm looking for examples that provide insights into interesting aspects commonly associated with this topic, or examples that help me understand the typical content and structure included in Wikipedia pages for similar topics.

Please list the as many subjects and urls as you can.

Topic of interest: {topic}
"""
)

from typing import List

from pydantic import BaseModel, Field  # or LangChain's pydantic re-export, depending on version


class RelatedSubjects(BaseModel):
    topics: List[str] = Field(
        description="Comprehensive list of related subjects as background research.",
    )


expand_chain = gen_related_topics_prompt | fast_llm.with_structured_output(
    RelatedSubjects
)
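
Invoking the chain returns a RelatedSubjects object whose topics field feeds the later perspective-generation step. A quick usage sketch (my own, assuming the fast_llm chat model defined earlier):

related_subjects = expand_chain.invoke({"topic": example_topic})
print(related_subjects.topics)  # e.g. a list of related Wikipedia page titles to mine for structure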

And the gen_perspectives_prompt prompt…

gen_perspectives_prompt = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            """You need to select a diverse (and distinct) group of Wikipedia editors who will work together to create a comprehensive article on the topic. Each of them represents a different perspective, role, or affiliation related to this topic.\
You can use other Wikipedia pages of related topics for inspiration. For each editor, add a description of what they will focus on.

Wiki page outlines of related topics for inspiration:
{examples}""",
        ),
        ("user", "Topic of interest: {topic}"),
    ]
)

And the gen_qn_prompt prompt…

from langchain_core.prompts import MessagesPlaceholder

gen_qn_prompt = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            """You are an experienced Wikipedia writer and want to edit a specific page. \
Besides your identity as a Wikipedia writer, you have a specific focus when researching the topic. \
Now, you are chatting with an expert to get information. Ask good questions to get more useful information.

When you have no more questions to ask, say "Thank you so much for your help!" to end the conversation.\
Please only ask one question at a time and don't ask what you have asked before.\
Your questions should be related to the topic you want to write.
Be comprehensive and curious, gaining as much unique insight from the expert as possible.\

Stay true to your specific perspective:

{persona}""",
        ),
        MessagesPlaceholder(variable_name="messages", optional=True),
    ]
)

This LangChain implementation is actually a LangGraph implementation. Below are the two flows that are used in this setup:

Lastly, below is the final article, featuring comprehensive and well-balanced content. It is neatly organised and includes a structured table of contents for easy navigation.



Source link

19Sep

The Evolution of Grounding & Planning In AI Agents | by Cobus Greyling | Sep, 2024


The example above shows a real-world web automation task for maps: WebAgent receives an instruction and the webpage's HTML code. Based on these, it predicts the next small task and selects the relevant part of the webpage to focus on. Then it generates a Python script (shown in gray) to perform the task, treating the small task as a comment within the code.

With the rise of Large Language Models (LLMs), we've moved beyond simple intent detection. AI Agents underpinned by Language Models are no longer just conversational engines; they have evolved into sophisticated planners capable of guiding complex tasks based on a detailed understanding of the world.

These AI-driven systems can now operate within digital environments, such as mobile operating systems or the web, performing actions like navigating apps or interacting with websites.

Planning refers to the agent's ability to determine the appropriate sequence of actions to accomplish a given task.

Grounding involves correctly identifying and interacting with relevant web elements based on these decisions. ~ Source

Much like a physical robot moving through the real world, AI agents are designed to navigate digital or virtual environments. These agents, being software entities, need to interact with systems like websites, mobile platforms, or other software applications.

However, the current experimental AI agents still face challenges in achieving the level of precision required for practical, real-world use. For example, the WebVoyager project, a system designed to navigate the web and complete tasks, achieves a 59.1% success rate. While impressive, this accuracy rate shows that there's room for improvement before AI agents can reliably handle complex real-world scenarios.

With LLMs, the concept of grounding becomes even more crucial. Grounding is what turns a vague or abstract conversation into something actionable. In the context of LLMs, grounding is achieved through in-context learning, where snippets of relevant information are injected into prompts to give the AI necessary context (RAG).

In the world of AI agents, planning is all about creating a sequence of actions to reach a specific goal. A complex or ambiguous request is broken down into smaller, manageable steps. The agent, in turn, follows this step-by-step process to achieve the desired outcome.

For instance, if an AI agent is tasked with booking a flight, it will need to break that task down into smaller actions, such as checking flight options, comparing prices, and selecting a seat. This sequence of actions forms the backbone of the agent's ability to plan and execute tasks efficiently.

However, planning alone isn't enough. The AI agent must also ensure that its planned actions are feasible in the real world. It's one thing to generate a list of steps, but it's another to ensure they are realistic.

Therefore, the AI must understand the limitations of time, resources, and context. Recent research explores how LLMs can use world models to simulate real-world constraints, helping them determine whether a given action is possible or not.

The AI agent must determine the correct sequence of actions (planning) and then interact with the relevant elements in a digital or physical environment (grounding). This combination ensures the AIā€™s decisions are both actionable and contextually appropriate.

While experimental systems like WebVoyager are still improving, the future of AI agents promises greater accuracy, flexibility, and reliability in carrying out actions across digital platforms. As these systems continue to advance, the line between conversation and action will blur, empowering AI to not only understand the world but to operate within it effectively.

ā­ļø Please follow me on LinkedIn for updates on LLMs ā­ļø

I'm currently the Chief Evangelist @ Kore.ai. I explore and write about all things at the intersection of AI and language; ranging from LLMs, Chatbots, Voicebots, Development Frameworks, Data-Centric latent spaces and more.

https://www.linkedin.com/in/cobusgreyling



Source link

17Sep

The Shifting Vocabulary of AI


The vocabulary of Generative AI and Conversational AI is evolving at a rapid pace. The challenge with such swift technological advancement is that new terms are constantly being introduced, shaping our understanding. Having meaningful conversations relies on aligning these terms with our respective mental models.

Introduction

Considering the graph below, there has been an evolution of prompt engineering and generation in natural language processing (NLP); let's trace the key developments chronologically.

Early language models and information retrieval systems laid the foundation for prompt engineering. In 2015, the introduction of attention mechanisms revolutionised language understanding, leading to advancements in controllability and context-awareness.

Significant contributions were made in 2018 and 2019, with a focus on fine-tuning strategies, control codes, and template-based generation.

During this period, breakthroughs in reinforcement learning techniques addressed challenges such as exposure bias and biases in generated text.

In 2020 and 2021, contextual prompting and transfer learning improved prompt engineering.

In 2022 and 2023, new techniques like unsupervised pre-training and reward shaping appeared.

For readers already familiar with the landscape and technology, I'll begin at number 10 and work my way back to number one.

Agentic Exploration

Web-Navigating AI Agents: Redefining Online Interactions and Shaping the Future of Autonomous Exploration.

Agentic exploration refers to the capacity of AI agents to autonomously navigate and interact with the digital world, particularly on the web.

Web-navigating AI agents are revolutionising online interactions by automating complex tasks such as information retrieval, data analysis, and even decision-making processes.

These agents can browse websites, extract relevant data, and execute actions based on predefined objectives, transforming how users engage with online content. By reshaping the way we interact with the web, these AI agents are paving the way for more personalized, efficient, and intelligent online experiences.

As they continue to evolve, web-navigating AI agents are poised to significantly impact the future of autonomous exploration, expanding the boundaries of what AI can achieve in the digital realm.

Agentic / Multi-Modal AI Agents

As agents grow in capability, they are also expanding into visual navigation by leveraging the image and vision capabilities of Language Models.

Firstly, language models with vision capabilities significantly enhance AI agents by incorporating an additional modality, enabling them to process and understand visual information alongside text.

I've often wondered about the most effective use cases for multi-modal models; applying them in agent applications that require visual input is a prime example.

Secondly, recent developments such as Apple's Ferret-UI, AppAgent v2 and the WebVoyager/LangChain implementation showcase how GUI elements can be mapped and defined using named bounding boxes, further advancing the integration of vision in agent-driven tasks.

AI Agents / Autonomous Agents

An AI Agent is a software program designed to autonomously perform tasks or make decisions based on available tools.

As illustrated below, these agents rely on one or more Large Language Models or Foundation Models to break down complex tasks into manageable sub-tasks.

These sub-tasks are organised into a sequence of actions that the agent can execute.

The agent also has access to a set of defined tools, each with a description to guide when and how to use them in sequence, addressing challenges and reaching a final conclusion.

Agentic Applications

Agentic applications refer to software systems that utilise autonomous AI agents to perform tasks, make decisions, and interact with their environment with minimal human intervention.

According to recent studies, these applications leverage large language models (LLMs) to drive agent behaviours, enabling them to navigate complex tasks across various domains, including web navigation, data analysis, and task automation.

By integrating LLMs with other modalities like vision and reinforcement learning, agentic applications can dynamically adapt to changing inputs and goals, enhancing their problem-solving capabilities.

The study highlights how these agents can be evaluated for effectiveness in different scenarios, pushing the boundaries of what autonomous systems can achieve.

As agentic applications evolve, they hold the potential to revolutionise industries by automating intricate workflows and enabling new forms of intelligent interaction.

In-Context Learning / ICL

The underlying principle which enables RAG is In-Context Learning (ICL).

In-context learning refers to a large language model's ability to adapt and generate relevant responses based on examples or information provided within the prompt itself, without requiring updates to the model's parameters.

By including a few examples of the desired behaviour or context within the prompt, the model can infer patterns and apply them to new, similar tasks.

This approach leverages the model's internal understanding to perform tasks like classification, translation, or text generation based solely on the context given in the prompt.

In-context learning is particularly useful for tasks where direct training on specific data isn't feasible or where flexibility is required.
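
As a small illustration of the idea (my own sketch, not from the article), the examples embedded in the prompt below are the only "training" the model receives; it is expected to infer the pattern and complete the final line:

few_shot_prompt = """Classify the sentiment of each review as Positive or Negative.

Review: "The battery lasts all day." -> Positive
Review: "The screen cracked within a week." -> Negative
Review: "Setup was effortless and fast." ->"""

# The prompt is sent to the model as-is; no parameters are updated,
# yet the model should complete the last line with "Positive".
print(few_shot_prompt)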

Retrieval Augmented Generation / RAG

Retrieval Augmented Generation (RAG) combines information retrieval and generative models.

By injecting the prompt with relevant and contextual supporting information, the LLM can generate accurate, contextually grounded responses to user input.

Below is a complete workflow of how a RAG solution can be implemented. By making use of a vector store and semantic search, relevant and semantically accurate data can be retrieved.
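
The retrieve-then-inject pattern behind RAG can be sketched in a few lines (a toy illustration of my own; simple word overlap stands in for the vector store and semantic search described above):

documents = [
    "RAG injects retrieved context into the prompt before generation.",
    "Vector stores index document embeddings for semantic search.",
    "Prompt templates contain placeholders that are filled at runtime.",
]

def retrieve(query: str, k: int = 2) -> list[str]:
    # Score each chunk by word overlap with the query (a stand-in for semantic search).
    query_words = set(query.lower().split())
    ranked = sorted(documents, key=lambda d: -len(query_words & set(d.lower().split())))
    return ranked[:k]

def build_prompt(query: str) -> str:
    # Inject the retrieved chunks so the LLM answers from grounded, contextual information.
    context = "\n".join(retrieve(query))
    return f"Answer using only the context below.\n\nContext:\n{context}\n\nQuestion: {query}"

print(build_prompt("How does RAG ground a prompt?"))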

Prompt Pipelines

In machine learning, a pipeline is an end-to-end construct that orchestrates the flow of events and data. It is initiated by a trigger, and based on specific events and parameters, it follows a sequence of steps to produce an output.

Similarly, in the context of prompt engineering, a prompt pipeline is often initiated by a user request. This request is directed to a specific prompt template.

Prompt pipelines can be viewed as an intelligent extension of prompt templates.

They enhance the predefined templates by populating variables or placeholders (a process known as prompt injection) with user queries and relevant information from a knowledge store.

Below is an example from our GALE framework of how a pipeline can be built in a no-code fashion.

Prompt Chaining

Prompt Chaining, also referred to as Large Language Model (LLM) Chaining, is the notion of creating a chain consisting of a series of model calls. These calls follow one another, with the output of one node in the chain serving as the input of the next.

Each chain node is intended to target a small and well-scoped sub-task, hence one or more LLMs are used to address multiple sequenced sub-components of a task.

In essence prompt chaining leverages a key principle in prompt engineering, known as chain of thought prompting.

The principle of Chain of Thought prompting is not only used in chaining, but also in Agents and Prompt Engineering.

Chain of thought prompting is the notion of decomposing a complex task into refined smaller tasks, building up to the final answer.
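
A bare-bones sketch of chaining (my own illustration; call_llm is a placeholder for whatever model client is used), where the output of one node becomes the input of the next:

def call_llm(prompt: str) -> str:
    # Placeholder for a real model call; returns a canned string so the sketch runs.
    return f"[model output for: {prompt.splitlines()[0]}]"

def summarise(text: str) -> str:
    return call_llm(f"Summarise the following text in one sentence:\n{text}")

def extract_actions(summary: str) -> str:
    return call_llm(f"List the action items implied by this summary:\n{summary}")

# Node 1's output feeds node 2: a two-step chain of small, well-scoped sub-tasks.
actions = extract_actions(summarise("Long meeting transcript ..."))
print(actions)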

Prompt Composition

The next step is to develop a library of prompt templates that can be combined at runtime to create more advanced prompts. While prompt composition adds a level of flexibility and programmability, it also introduces significant complexity.

A contextual prompt can be constructed by combining different templates, each with placeholders for variable injection. This approach allows for parts of a prompt to be reused efficiently.

In other words, a contextual prompt is composed of different template elements, each with its own placeholders for injection.

Prompt Templates

A step up from static prompts is prompt templating.

In this approach, a static prompt is transformed into a template by replacing key values with placeholders.

These placeholders are then dynamically filled with application values or variables at runtime.

Some refer to templating as entity injection or prompt injection.

In the example template below, you can see the placeholders ${EXAMPlES:question}, ${EXAMPlES:answer} and ${QUESTIONS:question}; these placeholders are replaced with values at runtime.
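
The same mechanic can be shown with plain Python (my own sketch; the ${EXAMPlES:question}-style placeholders above belong to a specific templating framework, so the names below use Python's string.Template syntax instead):

from string import Template

prompt_template = Template(
    "Here is an example question and answer:\n"
    "Q: ${EXAMPLES_question}\n"
    "A: ${EXAMPLES_answer}\n\n"
    "Now answer the new question:\n"
    "Q: ${QUESTIONS_question}\n"
    "A:"
)

# Placeholders are filled with application values at runtime.
prompt = prompt_template.substitute(
    EXAMPLES_question="What is the capital of France?",
    EXAMPLES_answer="Paris",
    QUESTIONS_question="What is the capital of Italy?",
)
print(prompt)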

Static Prompts

Generation is a core functionality of large language models (LLMs) that can be effectively utilised, with prompt engineering serving as the method through which data is presented, thereby influencing how the LLM processes and responds to it.

Text Generation Is A Meta Capability Of Large Language Models & Prompt Engineering Is Key To Unlocking It.

You cannot talk directly to a Generative Model; it is not a chatbot. You cannot explicitly request a generative model to do something.

Rather, you need a vision of what you want to achieve and must mimic the initiation of that vision. The process of mimicking is referred to as prompt design, prompt engineering or casting.

Prompts can employ zero-shot, one-shot, or few-shot learning approaches. The generative capabilities of LLMs are significantly enhanced when using one-shot or few-shot learning, where example data is included in the prompt.

A static prompt is simply plain text, without any templating, dynamic injection, or external input.

The following studies were used as a reference:

MobileFlow: A Multimodal LLM for Mobile GUI Agent

A Brief History of Prompt: Leveraging Language Models (Through Advanced Prompting)

Agentic Skill Discovery

WebArena: A Realistic Web Environment for Building Autonomous Agents

Ferret-UI: Grounded Mobile UI Understanding with Multimodal LLMs

WebVoyager: Building an End-to-End Web Agent with Large Multimodal Models

Follow me on LinkedIn for updates on Large Language Models

I'm currently the Chief Evangelist @ Kore AI. I explore & write about all things at the intersection of AI & language; ranging from LLMs, Chatbots, Voicebots, Development Frameworks, Data-Centric latent spaces & more.

LinkedIn






Source link

17Sep

An AI Agent Architecture & Framework Is Emerging | by Cobus Greyling | Sep, 2024


We are beginning to see the convergence on fundamental architectural principles that are poised to define the next generation of AI agents…

These architectures are far more than just advanced models; there are definitive building blocks emerging that will enable AI Agents & Agentic Applications to act autonomously, adapt dynamically, and interact and explore seamlessly within digital environments.

And as AI Agents become more capable, builders are converging on the common principles and approaches for core components.

I want to add a caveat: while there's plenty of futuristic speculation around AI Agents, Agentic Discovery, and Agentic Applications, the insights and comments I share here are grounded in concrete research papers and hands-on experience with prototypes that I've either built or forked and tested in my own environment.

But First, Let's Set The Stage With Some Key Concepts…

At a high level, an AI Agent is a system designed to perform tasks autonomously or semi-autonomously. Considering semi-autonomous for a moment, agents make use of tools to achieve their objective, and a human-in-the-loop can be a tool.

AI Agent tasks can range from a virtual assistant that schedules your appointments to more complex agents involved in exploring and interacting with digital environments. With regards to digital environments, the most prominent research is from Apple with Ferret-UI, WebVoyager, and research from Microsoft and others, as seen below…

An AI Agent is a program that uses one or more Large Language Models (LLMs) or Foundation Models (FMs) as its backbone, enabling it to operate autonomously.

By decomposing queries, planning & creating a sequence of events, the AI Agent effectively addresses and solves complex problems.

AI Agents can handle highly ambiguous questions by decomposing them through a chain of thought process, similar to human reasoning.

These agents have access to a variety of tools, including programs, APIs, web searches, and more, to perform tasks and find solutions.

Much like how Large language models (LLMs) transformed natural language processing, Large Action Models (LAMs) are poised to revolutionise the way AI agents interact with their environments.

In a recent piece I wrote, I explored the emergence of Large Action Models (LAMs) and their future impact on AI Agents.

Salesforce AI Research open-sourced a number of LAMs, including a Small Action Model.

LAMs are designed to go beyond simple language generation by enabling AI to take meaningful actions in real-world scenarios.

Function calling has become a crucial element in the context of AI Agents, particularly from a model capability standpoint, because it significantly extends the functionality of large language models (LLMs) beyond text generation.

Hence one of the reasons for the advent of Large Action Models, one of whose main traits is the ability to excel at function calling.

AI Agents often need to perform actions based on user input, such as retrieving information, scheduling tasks, or performing computations.

Function calling allows the model to generate parameters for these tasks, enabling the agent to trigger external processes like database queries or API calls.

While LAMs form the action backbone, model orchestration brings together smaller, more specialised language models (SLMs) to assist in niche tasks.

Instead of relying solely on massive, resource-heavy models, agents can utilise these smaller models in tandem, orchestrating them for specific functions, whether that's summarising data, parsing user commands, or providing insights based on historical context.

Small Language Models are ideal for development and testing, since they can be run locally in offline mode.

Large Language Models (LLMs) have rapidly gained traction due to several key characteristics that align well with the demands of natural language processing. These characteristics include natural language generation, common-sense reasoning, dialogue and conversation context management, natural language understanding, and the ability to handle unstructured input data. While LLMs are knowledge-intensive and have proven to be powerful tools, they are not without their limitations.

One significant drawback of LLMs is their tendency to hallucinate, meaning they can generate responses that are coherent, contextually accurate, and plausible, yet factually incorrect.

Additionally, LLMs are constrained by the scope of their training data, which has a fixed cut-off date. This means they do not possess ongoing, up-to-date knowledge or specific insights tailored to particular industries, organizations, or companies.

Updating an LLM to address these gaps is not straightforward; it requires fine-tuning the base model, which involves considerable effort in data preparation, costs, and testing. This process introduces a non-transparent, complex approach to data integration within LLMs.

To address these shortcomings, the concept of Retrieval-Augmented Generation (RAG) has been introduced.

RAG helps bridge the gap for Small Language Models (SLMs), supplementing them with the deep, intensive knowledge capabilities they typically lack.

While SLMs inherently manage other key aspects such as language generation and understanding, RAG equips them to perform comparably to their larger counterparts by enhancing their knowledge base.

This makes RAG a critical equalizer in the realm of AI language models, allowing smaller models to function with the robustness of a full-scale LLM.

As AI Agents gain capabilities to explore and interact with digital environments, the integration of vision capabilities with language models becomes crucial.

Projects like Ferret-UI from Apple and WebVoyager are excellent examples of this.

These agents can navigate within their digital surroundings, whether that means identifying elements on a user interface or exploring websites autonomously.

Imagine an AI Agent tasked with setting up an application in a new environment: it would not only read text-based instructions but also recognise UI elements via OCR, mapping bounding boxes and interpreting text to interact with them, and provide visual feedback.

A fundamental shift is happening in how AI agents handle inputs and outputs.

Traditionally, LLMs have operated with unstructured input and generated unstructured output, ranging from short snippets to paragraphs of text. But now, with function calling, we are moving toward structured, actionable outputs.

While LLMs are great for understanding and producing unstructured content, LAMs are designed to bridge the gap by turning language into structured, executable actions.

When an AI Agent can structure its output to align with specific functions, it can interact with other systems far more effectively.

For instance, instead of generating a merely conversational, unstructured text response, the AI could call a specific function to book a meeting, send a request, or trigger an API call, all with more efficient token usage.

Not only does this reduce the overhead of processing unstructured responses, but it also makes interactions between systems more seamless.

Something to realise in terms of function calling is that, when using the OpenAI API with function calling, the model does not execute functions directly.
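
A minimal sketch of that loop (my own, assuming the OpenAI Python SDK v1.x and a hypothetical get_weather tool): the model returns only the function name and JSON arguments, and the application is responsible for actually executing the function and sending the result back.

from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical tool; the real function lives in our own code
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "What's the weather in Cape Town?"}],
    tools=tools,
)

# The model does not run get_weather; it hands back structured parameters for us to act on.
tool_call = response.choices[0].message.tool_calls[0]
print(tool_call.function.name, tool_call.function.arguments)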

AI Agents can now become truly part of the larger digital ecosystem.

Finally, let's talk about the importance of tools in the architecture of AI agents.

Tools can be thought of as the mechanisms through which AI Agents interact with the world, whether that's fetching data, performing calculations, or executing tasks. In many ways, these tools are like pipelines, carrying inputs from one stage to another, transforming them along the way.

What's even more fascinating is that a tool doesn't necessarily have to be an algorithm or script. In some cases, the tool can be a human-in-the-loop, where humans intervene at key moments to guide or validate the agent's actions.

This is particularly valuable in high-stakes environments, such as healthcare or finance, where absolute accuracy is critical.

Tools not only extend the capabilities of AI agents but also serve as the glue that holds various systems together. Whether it's a human or a digital function, these tools allow AI agents to become more powerful, modular, and context-aware.

As we stand at the cusp of this new era, it's clear that AI agents are becoming far more sophisticated than we ever anticipated.

With Large Action Models, Model Orchestration, vision-enabled language models, Function Calling, and the critical role of Tools, these agents are active participants in solving problems, exploring digital landscapes, and learning autonomously.

By focusing on these core building blocks, we're setting the foundation for AI agents that are not just smarter, but more adaptable, efficient, and capable of acting in ways that start to resemble human problem solving and thought processes.

I'm currently the Chief Evangelist @ Kore AI. I explore & write about all things at the intersection of AI & language; ranging from LLMs, Chatbots, Voicebots, Development Frameworks, Data-Centric latent spaces & more.

LinkedIn



Source link

16Sep

Introducing NumPy, Part 3: Manipulating Arrays | by Lee Vaughan | Sep, 2024


Shaping, transposing, joining, and splitting arrays

A grayscale Rubik's cube hits itself with a hammer, breaking off tiny cubes.
Manipulating an array as imagined by DALL-E3

Welcome to Part 3 of Introducing NumPy, a primer for those new to this essential Python library. Part 1 introduced NumPy arrays and how to create them. Part 2 covered indexing and slicing arrays. Part 3 will show you how to manipulate existing arrays by reshaping them, swapping their axes, and merging and splitting them. These tasks are handy for jobs like rotating, enlarging, and translating images and fitting machine learning models.

NumPy comes with methods to change the shape of arrays, transpose arrays (swap rows with columns), and swap axes. You've already been working with the reshape() method in this series.

One thing to be aware of with reshape() is that it returns a view of the array (when possible) rather than modifying the array in place. In the following example, reshaping the arr1d array returns a reshaped view, while arr1d itself is left unchanged:

In [1]: import numpy as np

In [2]: arr1d = np.array([1, 2, 3, 4])

In [3]: arr1d.reshape(2, 2)
Out[3]:
array([[1, 2],
       [3, 4]])

In [4]: arr1d
Out[4]: array([1, 2, 3, 4])
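
If you keep a reference to the reshaped result, you can see that it is a view onto the same data: modifying it also modifies the original array. (A short continuation of my own, not from the excerpt.)

In [5]: arr2d = arr1d.reshape(2, 2)  # keep a reference to the reshaped view

In [6]: arr2d[0, 0] = 99             # modify the view...

In [7]: arr1d
Out[7]: array([99,  2,  3,  4])      # ...and the original data changes too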

This behavior is useful when you want to temporarily change the shape of the array for use in a…



Source link

15Sep

Applications of Rolling Windows for Time Series, with Python | by Piero Paialunga | Sep, 2024


Here are some powerful applications of rolling windows for time series

Photo by Claudia Aran on Unsplash

Last night I was doing laundry with my wife. We have this non-verbal agreement (it becomes pretty verbal when I break it though) about laundry: she is the one who puts the laundry in the washer and drier and I am the one who folds it.

The way we do this is usually like this:

Image made by author using DALLE

Now, I don't really fold all the clothes and put them away. Otherwise, I would be swimming in clothes. What I do is an approach that reminds me of the rolling window method:

Image made by author using DALLE

Why do I say that it reminds me of a rolling window? Let's see the analogy.

Image made by author using DALLE

The idea of rolling windows is exactly the one that I apply when folding laundry. I have a task to do but you…
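
The excerpt cuts off here, but the rolling-window idea it is building toward is easy to show in code. A minimal pandas illustration (my own, not from the post):

import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
noisy = pd.Series(np.sin(np.linspace(0, 10, 200)) + rng.normal(0, 0.3, 200))

# Each output point summarises only the most recent 20 observations; the window then rolls forward one step.
smoothed = noisy.rolling(window=20).mean()
print(smoothed.tail())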



Source link

14Sep

Bayesian Linear Regression: A Complete Beginner's Guide | by Samvardhan Vishnoi | Sep, 2024


A workflow and code walkthrough for building a Bayesian regression model in STAN

Note: Check out my previous article for a practical discussion on why Bayesian modeling may be the right choice for your task.

This tutorial will focus on a workflow + code walkthrough for building a Bayesian regression model in STAN, a probabilistic programming language. STAN is widely adopted and interfaces with your language of choice (R, Python, shell, MATLAB, Julia, Stata). See the installation guide and documentation.

I will use PyStan for this tutorial, simply because I code in Python. Even if you use another language, the general Bayesian practices and STAN language syntax I will discuss here don't vary much.

For the more hands-on reader, here is a link to the notebook for this tutorial, part of my Bayesian modeling workshop at Northwestern University (April, 2024).

Let's dive in!

Let's learn how to build a simple linear regression model, the bread and butter of any statistician, the Bayesian way. Assuming a dependent variable Y and covariate X, I propose the following simple model:

Y = α + β * X + ϵ

Where āŗ is the intercept, Ī² is the slope, and Ļµ is some random error. Assuming that,

ϵ ~ Normal(0, σ)

we can show that

Y ~ Normal(α + β * X, σ)

We will learn how to code this model form in STAN.

Generate Data

First, let's generate some fake data.

import numpy as np
import matplotlib.pyplot as plt

# Model parameters
alpha = 4.0   # intercept
beta = 0.5    # slope
sigma = 1.0   # error scale

# Generate fake data
x = 8 * np.random.rand(100)
y = alpha + beta * x
y = np.random.normal(y, scale=sigma)  # add noise

# Visualize the generated data
plt.scatter(x, y, alpha=0.8)
Generated data for Linear Regression (Image from code by Author)

Now that we have some data to model, let's dive into how to structure it and pass it to STAN along with modeling instructions. This is done via the model string, which typically contains 4 (occasionally more) blocks: data, parameters, model, and generated quantities. Let's discuss each of these blocks in detail.

DATA block

data {                    //input the data to STAN
    int<lower=0> N;
    vector[N] x;
    vector[N] y;
}

The data block is perhaps the simplest: it tells STAN internally what data it should expect, and in what format. For instance, here we pass:

N: the size of our dataset as type int. The <lower=0> part declares that N ≥ 0. (Even though it is obvious here that data length cannot be negative, stating these bounds is good standard practice that can make STAN's job easier.)

x: the covariate as a vector of length N.

y: the dependent as a vector of length N.

See docs here for a full range of supported data types. STAN offers support for a wide range of types like arrays, vectors, matrices etc. As we saw above, STAN also has support for encoding limits on variables. Encoding limits is recommended! It leads to better specified models and simplifies the probabilistic sampling processes operating under the hood.

Model Block

Next is the model block, where we tell STAN the structure of our model.

//simple model block
model {
    //priors
    alpha ~ normal(0, 10);
    beta ~ normal(0, 1);

    //model
    y ~ normal(alpha + beta * x, sigma);
}

The model block also contains an important, and often confusing, element: prior specification. Priors are a quintessential part of Bayesian modeling, and must be specified suitably for the sampling task.

See my previous article for a primer on the role and intuition behind priors. To summarize, the prior is a presupposed functional form for the distribution of parameter values, often referred to simply as prior belief. Even though priors don't have to exactly match the final solution, they must allow us to sample from it.

In our example, we use Normal priors of mean 0 with different variances, depending on how sure we are of the supplied mean value: 10 for alpha (very unsure), 1 for beta (somewhat sure). Here, I supplied the general belief that while alpha can take a wide range of different values, the slope is generally more constrained and won't have a large magnitude.

Hence, in the example above, the prior for alpha is 'weaker' than the prior for beta.

As models get more complicated, the sampling solution space expands, and supplying beliefs gains importance. Otherwise, if there is no strong intuition, it is good practice to supply less belief to the model, i.e. use a weakly informative prior, and remain flexible to incoming data.

The form for y, which you might have recognized already, is the standard linear regression equation.

Generated Quantities

Lastly, we have our block for generated quantities. Here we tell STAN what quantities we want to calculate and receive as output.

generated quantities {    //get quantities of interest from fitted model
    vector[N] yhat;
    vector[N] log_lik;
    for (n in 1:N){
        yhat[n] = normal_rng(alpha + x[n] * beta, sigma);
        //generate samples from model
        log_lik[n] = normal_lpdf(y[n] | alpha + x[n] * beta, sigma);
        //probability of data given the model and parameters
    }
}

Note: STAN supports vectors being passed either directly into equations or iterated over as 1:N for each element n. In practice, I've found this support to change with different versions of STAN, so it is good to try the iterative declaration if the vectorized version fails to compile.

In the above example:

yhat: generates samples for y from the fitted parameter values.

log_lik: generates probability of data given the model and fitted parameter value.

The purpose of these values will be clearer when we talk about model evaluation.

Altogether, we have now fully specified our first simple Bayesian regression model:

model = """
data {                    //input the data to STAN
    int<lower=0> N;
    vector[N] x;
    vector[N] y;
}
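
The excerpt stops after the data block. A complete model string would also declare the parameters and include the model and generated quantities blocks shown earlier; here is a sketch of the full string (the parameters block is my assumption, following standard STAN practice, since it is not shown in the article):

model = """
data {                    //input the data to STAN
    int<lower=0> N;
    vector[N] x;
    vector[N] y;
}
parameters {              //quantities to be fitted (assumed block, not shown in the excerpt)
    real alpha;
    real beta;
    real<lower=0> sigma;
}
model {
    //priors
    alpha ~ normal(0, 10);
    beta ~ normal(0, 1);

    //model
    y ~ normal(alpha + beta * x, sigma);
}
generated quantities {    //get quantities of interest from fitted model
    vector[N] yhat;
    vector[N] log_lik;
    for (n in 1:N){
        yhat[n] = normal_rng(alpha + x[n] * beta, sigma);
        log_lik[n] = normal_lpdf(y[n] | alpha + x[n] * beta, sigma);
    }
}
"""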

All that remains is to compile the model and run the sampling.

#STAN takes data as a dict
data = {'N': len(x), 'x': x, 'y': y}

STAN takes input data in the form of a dictionary. It is important that this dict contains all the variables that we told STAN to expect in the model's data block; otherwise the model won't compile.

import stan

# Parameters for STAN fitting
chains = 2
samples = 1000
warmup = 10

# Compile the model (the random seed keeps the run reproducible)
posterior = stan.build(model, data=data, random_seed=42)

# Train the model and generate samples
fit = posterior.sample(num_chains=chains, num_samples=samples)

The .sample() method parameters control the Hamiltonian Monte Carlo (HMC) sampling process, where:
  • num_chains: is the number of times we repeat the sampling process.
  • num_samples: is the number of samples to be drawn in each chain.
  • warmup: is the number of initial samples that we discard (as it takes some time to reach the general vicinity of the solution space).

Knowing the right values for these parameters depends on both the complexity of our model and the resources available.

Higher sampling sizes are of course ideal, yet for an ill-specified model they will prove to be just a waste of time and computation. Anecdotally, I've had large data models I've had to wait a week to finish running, only to find that the model didn't converge. It is important to start slowly and sanity check your model before running a full-fledged sampling.

Model Evaluation

The generated quantities are used for

  • evaluating the goodness of fit i.e. convergence,
  • predictions
  • model comparison

Convergence

The first step for evaluating the model, in the Bayesian framework, is visual. We observe the sampling draws of the Hamiltonian Monte Carlo (HMC) sampling process.

Model Convergence: visually evaluating the overlap of independent sampling chains (Image from code by Author)

In simplistic terms, STAN iteratively draws samples for our parameter values and evaluates them (HMC does way more, but that's beyond our current scope). For a good fit, the sample draws must converge to some common general area which would, ideally, be the global optimum.

The figure above shows the sampling draws for our model across 2 independent chains (red and blue).

  • On the left, we plot the overall distribution of the fitted parameter value, i.e. the posteriors. We expect a normal distribution if the model, and its parameters, are well specified. (Why is that? Well, a normal distribution just implies that there exists a certain range of best-fit values for the parameter, which speaks in support of our chosen model form.) Furthermore, we should expect a considerable overlap across chains IF the model is converging to an optimum.
  • On the right, we plot the actual samples drawn in each iteration (just to be extra sure). Here, again, we wish to see not only a narrow range but also a lot of overlap between the draws.

Not all evaluation metrics are visual. Gelman et al. [1] also propose the Rhat diagnostic, which essentially is a mathematical measure of the sample similarity across chains. Using Rhat, one can define a cutoff point beyond which the two chains are judged too dissimilar to be converging. The cutoff, however, is hard to define due to the iterative nature of the process and the variable warmup periods.

Visual comparison is hence a crucial component, regardless of diagnostic tests.

A frequentist thought you may have here is: "Well, if all we have is chains and distributions, what is the actual parameter value?" This is exactly the point. The Bayesian formulation only deals in distributions, NOT point estimates with their hard-to-interpret test statistics.

That said, the posterior can still be summarized using credible intervals like the High Density Interval (HDI), which includes all the x% highest probability density points.

95% HDI for beta (Image from code by Author)
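
One simple way to approximate such an interval directly from the posterior draws (my own sketch; it assumes fit["beta"] exposes the draws as a NumPy array, as PyStan 3 does):

import numpy as np

def hdi(draws, prob=0.95):
    # Narrowest interval that contains `prob` of the posterior draws.
    draws = np.sort(np.asarray(draws).flatten())
    n = len(draws)
    k = int(np.floor(prob * n))
    widths = draws[k:] - draws[: n - k]
    i = int(np.argmin(widths))
    return draws[i], draws[i + k]

beta_low, beta_high = hdi(fit["beta"], prob=0.95)
print(f"95% HDI for beta: [{beta_low:.3f}, {beta_high:.3f}]")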

It is important to contrast Bayesian credible intervals with frequentist confidence intervals.

  • The credible interval gives a probability distribution on the possible values for the parameter i.e. the probability of the parameter assuming each value in some interval, given the data.
  • The confidence interval regards the parameter value as fixed, and instead estimates how often intervals constructed from repeated random samplings of the data would contain it.

Hence the

Bayesian approach lets the parameter values be fluid and takes the data at face value, while the frequentist approach demands that there exists the one true parameter value… if only we had access to all the data ever

Phew. Let that sink in, read it again until it does.

Another important implication of using credible intervals, or in other words, allowing the parameter to be variable, is that the predictions we make capture this uncertainty with transparency, with a certain HDI % informing the best fit line.

95% HDI line of best fit (Image from code by Author)

Model comparison

In the Bayesian framework, the Watanabe-Akaike Information Criterion (WAIC) score is the widely accepted choice for model comparison. A simple explanation of the WAIC score is that it estimates the model likelihood while regularizing for the number of model parameters. In simple words, it can account for overfitting. This is also a major draw of the Bayesian framework: one does not necessarily need to hold out a separate model validation dataset. Hence,

Bayesian modeling offers a crucial advantage when data is scarce.

The WAIC score is a comparative measure, i.e. it only holds meaning when compared across different models that attempt to explain the same underlying data. Thus in practice, one can keep adding more complexity to the model as long as the WAIC increases. If at some point in this process of adding maniacal complexity the WAIC starts dropping, one can call it a day: any more complexity will not offer an informational advantage in describing the underlying data distribution.

Conclusion

To summarize, the STAN model block is simply a string. It explains to STAN what you are going to give to it (data), what is to be found (parameters), what you think is going on (model), and what it should give you back (generated quantities).

When turned on, STAN simply turns the crank and gives its output.

The real challenge lies in defining a proper model (see the discussion of priors), structuring the data appropriately, asking STAN exactly what you need from it, and evaluating the sanity of its output.

Once we have this part down, we can delve into the real power of STAN, where specifying increasingly complicated models becomes just a simple syntactical task. In fact, in our next tutorial we will do exactly this. We will build upon this simple regression example to explore Bayesian Hierarchical models: an industry standard, state-of-the-art, de facto… you name it. We will see how to add group-level random or fixed effects into our models, and marvel at the ease of adding complexity while maintaining comparability in the Bayesian framework.

Subscribe if this article helped, and stay tuned for more!

References

[1] Andrew Gelman, John B. Carlin, Hal S. Stern, David B. Dunson, Aki Vehtari and Donald B. Rubin (2013). Bayesian Data Analysis, Third Edition. Chapman and Hall/CRC.



Source link

14Sep

Emergence of Large Action Models (LAMs) and Their Impact on AI Agents | by Cobus Greyling | Sep, 2024


While LLMs are great for understanding and producing unstructured content, LAMs are designed to bridge the gap by turning language into structured, executable actions.

As I have mentioned in the past, Autonomous AI Agents powered by large language models (LLMs) have recently emerged as a key focus of research, driving the development of concepts like agentic applications, agentic retrieval-augmented generation (RAG), and agentic discovery.

However, according to Salesforce AI Research, the open-source community continues to face significant challenges in building specialised models tailored for these tasks.

A major hurdle is the scarcity of high-quality, agent-specific datasets, coupled with the absence of standardised protocols, which complicates the development process.

To bridge this gap, researchers at Salesforce have introduced xLAM, a series of Large Action Models specifically designed for AI agent tasks.

The xLAM series comprises five models, featuring architectures that range from dense to mixture-of-experts, with parameter sizes from 1 billion upwards.

These models aim to advance the capabilities of autonomous agents by providing purpose-built solutions tailored to the complex demands of agentic tasks.

Function calling has become a crucial element in the context of AI agents, particularly from a model capability standpoint, because it significantly extends the functionality of large language models (LLMs) beyond static text generation.

Hence one of the reasons for the advent of Large Action Models, one of whose main traits is the ability to excel at function calling.

AI agents often need to perform actions based on user input, such as retrieving information, scheduling tasks, or performing computations.

Function calling allows the model to generate parameters for these tasks, enabling the agent to trigger external processes like database queries or API calls.

This makes the agent not just reactive, but action-oriented, turning passive responses into dynamic interactions.

Interoperability with External Systems

For AI Agents, sub-tasks involve interacting with various tools. Tools are in turn linked to external systems (CRM systems, financial databases, weather APIs, etc).

Through function calling, LAMs can serve as a broker, providing the necessary data or actions for those systems without needing the model itself to have direct access. This allows for seamless integration with other software environments and tools.

By moving from an LLM to a LAM, the model's utility is also expanded, and LAMs can thus be seen as purpose-built to act as the centrepiece of an agentic implementation.

Large Language Models (LLMs) are designed to handle unstructured input and output, excelling at tasks like generating human-like text, summarising content, and answering open-ended questions.

LLMs are highly flexible, allowing them to process diverse forms of natural language without needing predefined formats.

However, their outputs can be ambiguous or loosely structured, which can limit their effectiveness for specific task execution. Using an LLM for an agentic implementation is not wrong, and it serves the purpose quite well.

But Large Action Models (LAMs) can be considered purpose-built, focusing on structuring outputs by generating precise parameters or instructions for specific actions, making them suitable for tasks that require clear and actionable results, such as function calling or API interactions.

While LLMs are great for understanding and producing unstructured content, LAMs are designed to bridge the gap by turning language into structured, executable actions.

Overall, in the context of AI agents, function calling enables more robust, capable, and practical applications by allowing LLMs to serve as a bridge between natural language understanding and actionable tasks within digital systems.



Source link

12Sep

Strategic Chain-of-Thought (SCoT) | by Cobus Greyling | Sep, 2024


As LLMs evolve, I believe that while CoT remains simple and transparent, managing the growing complexity of prompts and multi-inference architectures will demand more sophisticated tools and a strong focus on data-centric approaches.

Human oversight will be essential to maintaining the integrity of these systems.

As LLM-based applications become more complex, their underlying processes must be accommodated somewhere, and preferably a resilient platform that can handle the growing functionality and complexity.

The prompt engineering process itself can become intricate, requiring dedicated infrastructure to manage data flow, API calls, and multi-step reasoning.

But as this complexity scales, introducing an agentic approach becomes essential to scale automated tasks, manage complex workflows, and navigate digital environments efficiently.

These agents enable applications to break down complex requests into manageable steps, optimising both performance and scalability.

Ultimately, hosting this complexity requires adaptable systems that support real-time interaction and seamless integration with broader data and AI ecosystems.

Strategic knowledge refers to a clear method or principle that guides reasoning toward a correct and stable solution. It involves using structured processes that logically lead to the desired outcome, thereby improving the stability and quality of CoT generation.
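
To make this concrete, here is a rough sketch of what a strategy-first prompt could look like (my own illustration, not the paper's exact template): the model is asked to surface the strategic knowledge first, and only then to apply it step by step.

scot_prompt_template = """You will solve the problem in two phases.

Phase 1 (Strategy): state the method or principle best suited to this type of problem.
Phase 2 (Solution): apply that strategy step by step and give the final answer.

Problem: {problem}
"""

problem = "A train travels 120 km in 90 minutes. What is its average speed in km/h?"
print(scot_prompt_template.format(problem=problem))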



Source link
