27Sep

The Role of Small Models in the LLM Era | by Cobus Greyling | Sep, 2024


LLMs are highly versatile, able to handle a wide range of tasks with only a few training examples.

In contrast, smaller models tend to be more specialised and studies show that fine-tuning them on domain-specific datasets can sometimes lead to better performance than general LLMs on specific tasks.

LLMs demand significant computational resources for both training and inference, resulting in high costs and latency, which makes them less suitable for real-time applications, such as information retrieval, or in resource-limited environments like edge devices.

In contrast, smaller models require less training data and computational power, providing competitive performance while greatly reducing resource requirements.

Smaller, simpler models are generally more transparent and easier to interpret compared to larger, more complex models.

In areas like healthcare, finance and law, smaller models are often preferred because their decisions need to be easily understood by non-experts, such as doctors or financial analysts.

Below is an insightful graphic illustrating the collaboration between Small and Large Language Models.

It highlights how Small Models frequently support or enhance the capabilities of Large Models, demonstrating their crucial role in boosting efficiency, scalability and performance.

The examples make it clear that Small Models play a vital part in optimising resource use while complementing larger systems.



Source link

27Sep

What 15 Data Scientists Say About Key Skills | by Egor Howell | Sep, 2024


Going over the main skills you need to be a “good” data scientist

Photo by Campaign Creators on Unsplash

What are the essential skills to be a “good” or even “great” data scientist?

Well, I asked 15 data scientists from multiple companies and industries what they think, and let’s say I was pretty surprised by some of the responses.

Hopefully, this article will give you better guidance on where to focus to progress in your data science career!

To avoid wasting your time, here are the top 10 skills described by the 15 data scientists, summarised in a plot.

Plot generated by author using Python and ChatGPT.

What I find pretty interesting is the combination of technical skills like maths and coding and softer skills like communication and curiosity.

Personally, this makes sense. The true heart of a data scientist is to find trends and information in the data, and you need technical knowledge and tools to do this. Then, you need to convey that information easily to senior management and stakeholders so they can make informed decisions for the business.



Source link

26Sep

Small Language Model (SLM) Efficiency, Performance & Potential | by Cobus Greyling | Sep, 2024


Focusing on transformer-based, decoder-only language models with 100 million to 5 billion parameters, researchers surveyed 59 cutting-edge open-source models, examining innovations in architecture, training datasets & algorithms.

They also evaluated model abilities in areas like common-sense reasoning, in-context learning, math & coding.

To assess model performance on devices, researchers benchmarked latency and memory usage during inference.

The term “small” is inherently subjective and relative & its meaning may evolve over time as device memory continues to expand, allowing for larger “small language models” in the future.

The study established 5 billion parameters as the upper limit for small language models (SLMs). As of September 2024, 7 billion parameter large language models (LLMs) are predominantly deployed in the cloud.

Small Language Models (SLMs) are designed for resource-efficient deployment on devices like desktops, smartphones, and wearables.

The goal is to make advanced machine intelligence accessible and affordable for everyone, much like the universal nature of human cognition.

Small Language Models (SLMs) are already widely integrated into commercial devices. For example, the latest Google and Samsung smartphones feature built-in Large Language Model (LLM) services, like Gemini Nano, which allow third-party apps to access LLM capabilities through prompts and modular integrations.



Source link

24Sep

Run and Serve Faster VLMs Like Pixtral and Phi-3.5 Vision with vLLM


Understanding how much memory you need to serve a VLM

An image encoded by Pixtral — Image by the author

vLLM is currently one of the fastest inference engines for large language models (LLMs). It supports a wide range of model architectures and quantization methods.

vLLM also supports vision-language models (VLMs) with multimodal inputs containing both images and text prompts. For instance, vLLM can now serve models like Phi-3.5 Vision and Pixtral, which excel at tasks such as image captioning, optical character recognition (OCR), and visual question answering (VQA).

In this article, I will show you how to use VLMs with vLLM, focusing on key parameters that impact memory consumption. We will see why VLMs consume much more memory than standard LLMs. We’ll use Phi-3.5 Vision and Pixtral as case studies for a multimodal application that processes prompts containing text and images.
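
For orientation, here is a minimal sketch of loading a vision-language model with vLLM and the memory-related parameters in question. It is not the article's notebook; the model id, image path and parameter values are illustrative assumptions.

from vllm import LLM, SamplingParams
from PIL import Image

llm = LLM(
    model="microsoft/Phi-3.5-vision-instruct",  # assumed Hugging Face model id
    trust_remote_code=True,
    max_model_len=4096,                # caps the context length, and with it the KV cache
    gpu_memory_utilization=0.9,        # fraction of GPU memory vLLM may reserve
    limit_mm_per_prompt={"image": 1},  # budget one image per prompt
)

image = Image.open("example.jpg")      # placeholder image path
prompt = "<|user|>\n<|image_1|>\nDescribe this image.<|end|>\n<|assistant|>\n"

outputs = llm.generate(
    {"prompt": prompt, "multi_modal_data": {"image": image}},
    SamplingParams(max_tokens=128),
)
print(outputs[0].outputs[0].text)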

The code for running Phi-3.5 Vision and Pixtral with vLLM is provided in this notebook:

Get the notebook (#105)

In transformer models, generating text token by token is slow because each prediction depends on all previous tokens…



Source link

23Sep

AgentLite Is A Lightweight Framework for Building AI Agents | by Cobus Greyling | Sep, 2024


The study also states that LangChain is a popular library for developing applications with large language models (LLMs), offering built-in tools to create various agent types.

However, the study notes that LangChain tends to be overly complex for researchers who want to design new types of agent reasoning and architectures. Modifying LangChain for specific research needs can be difficult due to the high overhead in agent creation.

While Autogen has succeeded in building LLM agents, its agent interfaces have fixed reasoning types, making it hard to adapt for other research tasks. Additionally, its architecture is limited to multi-agent conversation and code execution, which may not fit all new scenarios or benchmarks.

Below is a recorded demonstration of the AgentLite User Interface…

Source

In AgentLite, the Individual Agent serves as the foundational agent class, built on four core modules:

  1. PromptGen,
  2. Actions,
  3. LLM, and
  4. Memory

PromptGen

PromptGen is responsible for constructing the prompts that the agent sends to the LLM to generate actions.

These prompts are made up of several components, such as the agent’s role description, instructions, constraints, actions, and relevant examples.

AgentLite includes default methods to combine these elements but also provides flexibility for developers to create custom prompts for specific tasks.
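
To make the four-module structure concrete, here is an illustrative sketch. It is not AgentLite's actual API; every class and method name is a hypothetical stand-in for the roles described above.

class PromptGen:
    """Builds the prompt sent to the LLM (hypothetical stand-in, not AgentLite code)."""
    def __init__(self, role, instructions, examples=None):
        self.role = role
        self.instructions = instructions
        self.examples = examples or []

    def build(self, task, memory):
        parts = [f"Role: {self.role}", f"Instructions: {self.instructions}"]
        parts += [f"Example: {e}" for e in self.examples]
        parts += [f"Previously: {m}" for m in memory]
        parts.append(f"Task: {task}")
        return "\n".join(parts)

class IndividualAgent:
    """Composes the four modules: PromptGen, Actions, LLM and Memory."""
    def __init__(self, llm, actions, prompt_gen):
        self.llm = llm                # assumed callable: prompt -> name of an action
        self.actions = actions        # dict: action name -> callable implementing it
        self.prompt_gen = prompt_gen
        self.memory = []              # running record of (task, action, observation)

    def run(self, task):
        prompt = self.prompt_gen.build(task, self.memory)
        action_name = self.llm(prompt)
        observation = self.actions[action_name](task)
        self.memory.append((task, action_name, observation))
        return observation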



Source link

22Sep

Hands-On Numerical Derivative with Python, from Zero to Hero | by Piero Paialunga | Sep, 2024


Here’s everything you need to know (beyond the standard definition) to master the numerical derivative world

Photo by Roman Mager on Unsplash

There is a legendary statement that you can find in at least one lab at every university and it goes like this:

Theory is when you know everything but nothing works.
Practice is when everything works but no one knows why.
In this lab, we combine theory and practice: nothing works and nobody knows why

I find this sentence so relatable in the data science world. I say this because data science starts as a mathematical problem (theory): you need to minimize a loss function. Nonetheless, when you get to real life (experiment/lab) things start to get very messy and your perfect theoretical world assumptions might not work anymore (they never do), and you don’t know why.

For example, take the concept of derivative. Everybody who deals with complex concepts of data science knows (or, even better, MUST know) what a derivative is. But then how do you apply the elegant and theoretical concept of derivative in real life, on a noisy signal, where you don’t have the analytic…
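
To give a flavour of what the article builds towards, here is a minimal sketch of a numerical derivative on a noisy signal; the signal, noise level and smoothing window are arbitrary choices for illustration.

import numpy as np

x = np.linspace(0, 2 * np.pi, 500)
y = np.sin(x) + np.random.normal(scale=0.05, size=x.size)  # noisy measurement of sin(x)

dy_naive = np.gradient(y, x)                 # central differences amplify the noise
window = np.ones(15) / 15                    # simple moving-average smoothing
dy_smooth = np.gradient(np.convolve(y, window, mode="same"), x)  # closer to cos(x)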



Source link

21Sep

Build a WhatsApp LLM Bot: a Guide for Lazy Solo Programmers | by Ian Xiao | Sep, 2024


TLDR: I built and deployed a WhatsApp LLM bot in 12 hours to learn English better and faster. I am exploring how to make LLMs useful in our everyday lives. I share my design choices, what I built, the tools I used, lessons learned, and a product roadmap.

I am building the app in phases. Stay tuned for updates. See a questionnaire at the end where I need some design guidance from the community and hope you can participate in beta 🙏🫶

This is not a code walkthrough. I will put all the resources I used at the end so you can take a look if you are interested.

The Problem

I love reading and writing.

But, as a non-native English speaker, I often come across new words I don’t know or think I know but need help understanding. New words fly by as I go about my busy day or enjoy the flow of reading. I want them to stick; I want to be more eloquent.

How about looking them up and writing them down? Digital solutions (e.g., dictionary or vocabulary apps) and pen & paper don’t work.

  • ❌ they take me away from the moment (e.g., reading a good book)



Source link

20Sep

Outline-Driven RAG & Web Research Prototype | by Cobus Greyling | Sep, 2024


The theory of questioning emphasises that while answering existing questions deepens understanding of a topic, it often leads to new questions.

To initiate this dynamic process, STORM simulates a conversation between a Wikipedia writer and a topic expert.

In each round of conversation, the LLM-powered writer generates a question based on the topic, its assigned perspective, and the conversation history.

This history helps the LLM update its understanding and formulate follow-up questions, with a maximum limit of rounds set for the conversation.

To ensure the conversation remains factual, trusted online sources are used to ground each answer.

If a question is complex, the LLM first breaks it down into simpler search queries. The search results are then evaluated with a rule-based filter to exclude unreliable sources.

Finally, the LLM synthesises information from trustworthy sources to generate the answer, which is also added to the references for the full article.

I executed the LangChain implementation in a notebook, utilizing the GPT-3.5-Turbo model.

Along with that, tools such as DuckDuckGo for search functionality and Tavily-Python for other resources are required.

The only modification required was the inclusion of the command pip install -U duckduckgo-search in the notebook to ensure proper functionality.

Below is an example of a prompt within the LangChain implementation of STORM…

from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder

direct_gen_outline_prompt = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            "You are a Wikipedia writer. Write an outline for a Wikipedia page about a user-provided topic. Be comprehensive and specific.",
        ),
        ("user", "{topic}"),
    ]
)

And here is an example topic prompt…

example_topic = "Impact of million-plus token context window language models on RAG"

Below is the gen_related_topics_prompt prompt…

gen_related_topics_prompt = ChatPromptTemplate.from_template(
    """I'm writing a Wikipedia page for a topic mentioned below. Please identify and recommend some Wikipedia pages on closely related subjects. I'm looking for examples that provide insights into interesting aspects commonly associated with this topic, or examples that help me understand the typical content and structure included in Wikipedia pages for similar topics.

Please list the as many subjects and urls as you can.

Topic of interest: {topic}
"""
)

from typing import List

from pydantic import BaseModel, Field  # langchain_core.pydantic_v1 in older LangChain versions

class RelatedSubjects(BaseModel):
    topics: List[str] = Field(
        description="Comprehensive list of related subjects as background research.",
    )

expand_chain = gen_related_topics_prompt | fast_llm.with_structured_output(
    RelatedSubjects
)
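
Assuming fast_llm is defined earlier in the notebook, the chain can then be invoked on the example topic, roughly like this:

related = expand_chain.invoke({"topic": example_topic})
print(related.topics)  # structured list of related Wikipedia subjects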

And the gen_perspectives_prompt prompt…

gen_perspectives_prompt = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            """You need to select a diverse (and distinct) group of Wikipedia editors who will work together to create a comprehensive article on the topic. Each of them represents a different perspective, role, or affiliation related to this topic.\
You can use other Wikipedia pages of related topics for inspiration. For each editor, add a description of what they will focus on.

Wiki page outlines of related topics for inspiration:
{examples}""",
        ),
        ("user", "Topic of interest: {topic}"),
    ]
)

And the gen_qn_prompt prompt…

gen_qn_prompt = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            """You are an experienced Wikipedia writer and want to edit a specific page. \
Besides your identity as a Wikipedia writer, you have a specific focus when researching the topic. \
Now, you are chatting with an expert to get information. Ask good questions to get more useful information.

When you have no more questions to ask, say "Thank you so much for your help!" to end the conversation.\
Please only ask one question at a time and don't ask what you have asked before.\
Your questions should be related to the topic you want to write.
Be comprehensive and curious, gaining as much unique insight from the expert as possible.\

Stay true to your specific perspective:

{persona}""",
        ),
        MessagesPlaceholder(variable_name="messages", optional=True),
    ]
)

This LangChain implementation is actually a LangGraph implementation. Below are the two flows that are used in this setup:

Lastly, below is the final article, featuring comprehensive and well-balanced content. It is neatly organised and includes a structured table of contents for easy navigation.



Source link

19Sep

The Evolution of Grounding & Planning In AI Agents | by Cobus Greyling | Sep, 2024


The example above shows a real-world web automation task for maps: WebAgent receives an instruction and the webpage’s HTML code. Based on these, it predicts the next small task and selects the relevant part of the webpage to focus on. It then generates a Python script (shown in gray) to perform the task, treating the small task as a comment within the code.

With the rise of Large Language Models (LLMs), we’ve moved beyond simple intent detection. AI Agents, underpinned by Language Models, are no longer just conversational engines; they have evolved into sophisticated planners capable of guiding complex tasks based on a detailed understanding of the world.

These AI-driven systems can now operate within digital environments — such as mobile operating systems or the web — performing actions like navigating apps or interacting with websites.

Planning refers to the agent’s ability to determine the appropriate sequence of actions to accomplish a given task, while grounding involves correctly identifying and interacting with relevant web elements based on these decisions. ~ Source

Much like a physical robot moving through the real world, AI agents are designed to navigate digital or virtual environments. These agents, being software entities, need to interact with systems like websites, mobile platforms, or other software applications.

However, the current experimental AI agents still face challenges in achieving the level of precision required for practical, real-world use. For example, the WebVoyager project — a system designed to navigate the web and complete tasks — achieves a 59.1% success rate. While impressive, this accuracy rate shows that there’s room for improvement before AI agents can reliably handle complex real-world scenarios.

With LLMs, the concept of grounding becomes even more crucial. Grounding is what turns a vague or abstract conversation into something actionable. In the context of LLMs, grounding is achieved through in-context learning, where snippets of relevant information are injected into prompts to give the AI necessary context (RAG).

In the world of AI agents, planning is all about creating a sequence of actions to reach a specific goal. A complex or ambiguous request is broken down into smaller, manageable steps. The agent, in turn, follows this step-by-step process to achieve the desired outcome.

For instance, if an AI agent is tasked with booking a flight, it will need to break that task down into smaller actions — such as checking flight options, comparing prices, and selecting a seat. This sequence of actions forms the backbone of the agent’s ability to plan and execute tasks efficiently.
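
As a rough illustration of what such a plan might look like in code (the step names, arguments and tool interface are hypothetical, not any specific agent framework):

plan = [
    {"step": "search_flights", "args": {"origin": "JFK", "destination": "LHR", "date": "2024-10-01"}},
    {"step": "compare_prices", "args": {"max_price_usd": 800}},
    {"step": "select_seat", "args": {"preference": "aisle"}},
]

def execute(plan, tools):
    # tools maps each planned step to a callable -- this is where grounding happens.
    results = []
    for action in plan:
        results.append(tools[action["step"]](**action["args"]))
    return results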

However, planning alone isn’t enough. The AI agent must also ensure that its planned actions are feasible in the real world. It’s one thing to generate a list of steps, but it’s another to ensure they are realistic.

Therefore, the AI must understand the limitations of time, resources, and context. Recent research explores how LLMs can use world models to simulate real-world constraints, helping them determine whether a given action is possible or not.

The AI agent must determine the correct sequence of actions (planning) and then interact with the relevant elements in a digital or physical environment (grounding). This combination ensures the AI’s decisions are both actionable and contextually appropriate.

While experimental systems like WebVoyager are still improving, the future of AI agents promises greater accuracy, flexibility, and reliability in carrying out actions across digital platforms. As these systems continue to advance, the line between conversation and action will blur, empowering AI to not only understand the world but to operate within it effectively.

⭐️ Please follow me on LinkedIn for updates on LLMs ⭐️

I’m currently the Chief Evangelist @ Kore.ai. I explore and write about all things at the intersection of AI and language; ranging from LLMs, Chatbots, Voicebots, Development Frameworks, Data-Centric latent spaces and more.

https://www.linkedin.com/in/cobusgreyling



Source link

17Sep

The Shifting Vocabulary of AI


The vocabulary of Generative AI and Conversational AI is evolving at a rapid pace. The challenge with such swift technological advancement is that new terms are constantly being introduced, shaping our understanding. Having meaningful conversations relies on aligning these terms with our respective mental models.

Introduction

Considering the graph below, there has been a clear evolution of prompt engineering and generation in natural language processing (NLP); let’s trace the key developments chronologically.

Early language models and information retrieval systems laid the foundation for prompt engineering. In 2015, the introduction of attention mechanisms revolutionised language understanding, leading to advancements in controllability and context-awareness.

Significant contributions were made in 2018 and 2019, with a focus on fine-tuning strategies, control codes, and template-based generation.

During this period, breakthroughs in reinforcement learning techniques addressed challenges such as exposure bias and biases in generated text.

In 2020 and 2021, contextual prompting and transfer learning improved prompt engineering.

In 2022 and 2023, new techniques like unsupervised pre-training and reward shaping appeared.

For readers already familiar with the landscape and technology, I’ll begin at number 10 and work my way back to number one.

Agentic Exploration

Web-Navigating AI Agents: Redefining Online Interactions and Shaping the Future of Autonomous Exploration.

Agentic exploration refers to the capacity of AI agents to autonomously navigate and interact with the digital world, particularly on the web.

Web-navigating AI agents are revolutionising online interactions by automating complex tasks such as information retrieval, data analysis, and even decision-making processes.

These agents can browse websites, extract relevant data, and execute actions based on predefined objectives, transforming how users engage with online content. By reshaping the way we interact with the web, these AI agents are paving the way for more personalized, efficient, and intelligent online experiences.

As they continue to evolve, web-navigating AI agents are poised to significantly impact the future of autonomous exploration, expanding the boundaries of what AI can achieve in the digital realm.

Agentic / Multi-Modal AI Agents

As agents grow in capability, they are also expanding into visual navigation by leveraging the image and vision capabilities of Language Models.

Firstly, language models with vision capabilities significantly enhance AI agents by incorporating an additional modality, enabling them to process and understand visual information alongside text.

I’ve often wondered about the most effective use-cases for multi-modal models, and applying them in agent applications that require visual input is a prime example.

Secondly, recent developments such as Apple’s Ferret-UI, AppAgent v2 and the WebVoyager/LangChain implementation showcase how GUI elements can be mapped and defined using named bounding boxes, further advancing the integration of vision in agent-driven tasks.

AI Agents / Autonomous Agents

An AI Agent is a software program designed to autonomously perform tasks or make decisions based on available tools.

As illustrated below, these agents rely on one or more Large Language Models or Foundation Models to break down complex tasks into manageable sub-tasks.

These sub-tasks are organised into a sequence of actions that the agent can execute.

The agent also has access to a set of defined tools, each with a description to guide when and how to use them in sequence, addressing challenges and reaching a final conclusion.

Agentic Applications

Agentic applications refer to software systems that utilise autonomous AI agents to perform tasks, make decisions, and interact with their environment with minimal human intervention.

According to recent studies, these applications leverage large language models (LLMs) to drive agent behaviours, enabling them to navigate complex tasks across various domains, including web navigation, data analysis, and task automation.

By integrating LLMs with other modalities like vision and reinforcement learning, agentic applications can dynamically adapt to changing inputs and goals, enhancing their problem-solving capabilities.

The study highlights how these agents can be evaluated for effectiveness in different scenarios, pushing the boundaries of what autonomous systems can achieve.

As agentic applications evolve, they hold the potential to revolutionise industries by automating intricate workflows and enabling new forms of intelligent interaction.

In-Context Learning / ICL

The underlying principle which enables RAG is In-Context Learning (ICL).

In-context learning refers to a large language model’s ability to adapt and generate relevant responses based on examples or information provided within the prompt itself, without requiring updates to the model’s parameters.

By including a few examples of the desired behaviour or context within the prompt, the model can infer patterns and apply them to new, similar tasks.

This approach leverages the model’s internal understanding to perform tasks like classification, translation, or text generation based solely on the context given in the prompt.

In-context learning is particularly useful for tasks where direct training on specific data isn’t feasible or where flexibility is required.
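
A minimal few-shot prompt illustrates the idea: the examples embedded in the prompt are the only “training” signal the model sees. The task and labels here are invented for illustration.

prompt = """Classify the sentiment of each review as positive or negative.

Review: "The battery lasts all day." -> positive
Review: "The screen cracked within a week." -> negative
Review: "Setup took five minutes and everything just worked." ->"""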

Retrieval Augmented Generation / RAG

Retrieval Augmented Generation (RAG) combines information retrieval with generative models.

By injecting the prompt with relevant and contextual supporting information, the LLM can generate accurate, contextually grounded responses to user input.

Below is a complete workflow of how a RAG solution can be implemented. By making use of a vector store and semantic search, relevant and semantically accurate data can be retrieved.
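
The pictured workflow aside, a minimal retrieve-then-generate sketch looks roughly like this; vector_store.similarity_search and llm.invoke are assumed helper names, not a specific library’s API.

def answer_with_rag(question, vector_store, llm, k=3):
    # 1. Semantic search: fetch the k chunks most similar to the question.
    chunks = vector_store.similarity_search(question, k=k)
    context = "\n\n".join(chunk.page_content for chunk in chunks)
    # 2. Prompt injection: ground the prompt with the retrieved context.
    prompt = f"Answer using only the context below.\n\nContext:\n{context}\n\nQuestion: {question}"
    # 3. Generation: the LLM answers from the injected context (in-context learning).
    return llm.invoke(prompt)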

Prompt Pipelines

In machine learning, a pipeline is an end-to-end construct that orchestrates the flow of events and data. It is initiated by a trigger, and based on specific events and parameters, it follows a sequence of steps to produce an output.

Similarly, in the context of prompt engineering, a prompt pipeline is often initiated by a user request. This request is directed to a specific prompt template.

Prompt pipelines can be viewed as an intelligent extension of prompt templates.

They enhance the predefined templates by populating variables or placeholders (a process known as prompt injection) with user queries and relevant information from a knowledge store.

Below is an example from our GALE framework, on how a pipeline can be built in a no-code fashion.

Prompt Chaining

Prompt Chaining, also referred to as Large Language Model (LLM) Chaining, is the notion of creating a chain consisting of a series of model calls. These calls follow one another, with the output of one node in the chain serving as the input of the next.

Each chain node is intended to target a small, well-scoped sub-task, hence one or more LLMs are used to address multiple sequenced sub-components of a task.

In essence prompt chaining leverages a key principle in prompt engineering, known as chain of thought prompting.

The principle of Chain of Thought prompting is not only used in chaining, but also in Agents and Prompt Engineering.

Chain of thought prompting is the notion of decomposing a complex task into refined smaller tasks, building up to the final answer.
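
A minimal two-node chain might look like this, where llm is an assumed callable from prompt to text and the sub-tasks are illustrative:

def summarise_then_title(document, llm):
    # Node 1: a small, well-scoped sub-task.
    summary = llm(f"Summarise the following document in three sentences:\n{document}")
    # Node 2: the output of node 1 becomes the input of node 2.
    return llm(f"Write a short, descriptive title for this summary:\n{summary}")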

Prompt Composition

The next step is to develop a library of prompt templates that can be combined at runtime to create more advanced prompts. While prompt composition adds a level of flexibility and programmability, it also introduces significant complexity.

A contextual prompt can be constructed by combining different templates, each with placeholders for variable injection. This approach allows for parts of a prompt to be reused efficiently.

The contextual prompt is thus composed from these different elements, each contributing a template with placeholders for runtime injection.
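
A rough sketch of composition, with illustrative fragment names: reusable fragments are concatenated into one prompt, each keeping its own placeholders.

ROLE = "You are a helpful {domain} assistant."
CONSTRAINTS = "Answer in at most {max_sentences} sentences."
TASK = "Question: {question}"

composed_template = "\n".join([ROLE, CONSTRAINTS, TASK])
prompt = composed_template.format(
    domain="finance",
    max_sentences=3,
    question="What is compound interest?",
)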

Prompt Templates

A step up from static prompts is prompt templating.

In this approach, a static prompt is transformed into a template by replacing key values with placeholders.

These placeholders are then dynamically filled with application values or variables at runtime.

Some refer to templating as entity injection or prompt injection.

In the template example below, you can see the placeholders ${EXAMPLES:question}, ${EXAMPLES:answer} and ${QUESTIONS:question}; these placeholders are replaced with values at runtime.
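
A minimal sketch of runtime placeholder filling, using Python’s built-in string.Template rather than any particular framework; the placeholder names are simplified from the ${EXAMPLES:question} style above.

from string import Template

template = Template(
    "Answer the question in the same style as the example.\n"
    "Example question: $example_question\n"
    "Example answer: $example_answer\n"
    "Question: $question"
)
prompt = template.substitute(
    example_question="What is the capital of France?",
    example_answer="Paris",
    question="What is the capital of Japan?",
)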

Static Prompts

Generation is a core capability of large language models (LLMs) that can be used to great effect; prompt engineering is the method through which data is presented to the model, influencing how the LLM processes and responds to it.

Text Generation Is A Meta Capability Of Large Language Models & Prompt Engineering Is Key To Unlocking It.

You cannot talk directly to a Generative Model; it is not a chatbot, and you cannot explicitly request it to do something.

Rather, you need a vision of what you want to achieve and must mimic the initiation of that vision. The process of mimicking is referred to as prompt design, prompt engineering or casting.

Prompts can employ zero-shot, one-shot, or few-shot learning approaches. The generative capabilities of LLMs are significantly enhanced when using one-shot or few-shot learning, where example data is included in the prompt.

A static prompt is simply plain text, without any templating, dynamic injection, or external input.

The following studies were used as a reference:

MobileFlow: A Multimodal LLM for Mobile GUI Agent

A Brief History of Prompt: Leveraging Language Models (Through Advanced Prompting)

Agentic Skill Discovery

WebArena: A Realistic Web Environment for Building Autonomous Agents

Ferret-UI: Grounded Mobile UI Understanding with Multimodal LLMs

WebVoyager: Building an End-to-End Web Agent with Large Multimodal Models

Follow me on LinkedIn for updates on Large Language Models

I’m currently the Chief Evangelist @ Kore AI. I explore & write about all things at the intersection of AI & language; ranging from LLMs, Chatbots, Voicebots, Development Frameworks, Data-Centric latent spaces & more.






Source link
