21Aug

27 Unique Dev Challenges: A Recent Study Explored the Top Challenges Faced by LLM Developers | by Cobus Greyling | Aug, 2024


This category includes the various error messages developers encounter when working with LLM APIs.

For example, developers might face request errors and data value capacity limit errors during API calls for image editing.

Additionally, issues related to the OpenAI server, such as gateway timeout errors, may arise. This subcategory represents 7.5% of the total challenges identified.
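To illustrate one common way developers cope with transient server-side errors such as gateway timeouts, here is a generic retry-with-backoff sketch; the endpoint, payload, and retry policy are placeholders rather than anything prescribed by the study.

# Generic retry-with-backoff for transient 5xx errors (e.g. gateway timeouts).
# The URL, payload and retry policy below are illustrative placeholders.
import time
import requests

def call_with_retry(url, payload, headers, retries=3):
    response = None
    for attempt in range(retries):
        response = requests.post(url, json=payload, headers=headers, timeout=30)
        if response.status_code < 500:      # only retry server-side errors such as 502/504
            return response
        time.sleep(2 ** attempt)            # exponential backoff: 1s, 2s, 4s...
    return response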

  1. Automating Task Processing: LLMs can automate tasks like text generation and image recognition, unlike traditional software that requires manual coding.
  2. Dealing with Uncertainty: LLMs produce variable and sometimes unpredictable outputs, requiring developers to manage this uncertainty.
  3. Handling Large-Scale Datasets: Developing LLMs involves managing large datasets, necessitating expertise in data preprocessing and resource efficiency.
  4. Data Privacy and Security: LLMs require extensive data for training, raising concerns about ensuring user data privacy and security.
  5. Performance Optimisation: Optimising LLM performance, particularly in output accuracy, differs from traditional software optimisation.
  6. Interpreting Model Outputs: Understanding and ensuring the reliability of LLM outputs can be complex and context-dependent.

The results show that 54% of these questions have fewer than three replies, suggesting that they are typically challenging to address.

The introduction of GPT-3.5 and ChatGPT in November 2022, followed by GPT-4 in March 2023, really accelerated the growth of the LLM developer community, markedly increasing the number of posts and users on the OpenAI developer forum.

The challenges faced by LLM developers are multifaceted and diverse, encompassing 6 categories and 27 distinct subcategories.

Challenges such as API call costs, rate limitations, and token constraints are closely associated with the development and use of LLM services.

Developers frequently raise concerns about API call costs, which are affected by the choice of model and the number of tokens used in each request.

Rate limitations, designed to ensure service stability, require developers to understand and manage their API call frequencies. Token limitations present additional hurdles, particularly when handling large datasets or extensive context.

Therefore, LLM providers should develop tools to help developers accurately calculate and manage token usage, including cost optimisation strategies tailored to various models and scenarios. Detailed guidelines on model selection, including cost-performance trade-offs, will aid developers in making informed decisions within their budget constraints.
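As a rough illustration of token-aware cost management, the sketch below counts tokens with the tiktoken package and multiplies by a per-token price; the encoding name and the price are placeholders you would replace with the values for your chosen model.

# Estimate the cost of a prompt before sending it (encoding and price are placeholders).
import tiktoken

def estimate_prompt_cost(prompt: str, price_per_1k_tokens: float = 0.005) -> float:
    encoding = tiktoken.get_encoding("cl100k_base")   # pick the encoding matching your model
    n_tokens = len(encoding.encode(prompt))
    return n_tokens / 1000 * price_per_1k_tokens

print(estimate_prompt_cost("Summarise the quarterly sales report in three bullet points."))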

Additionally, safety and privacy concerns are critical when using AI services. Developers must ensure their applications adhere to safety standards and protect user privacy.

OpenAI should continue to promote its free moderation API, which helps reduce unsafe content in completions by automatically flagging and filtering potentially harmful outputs.
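A minimal sketch of calling the moderation endpoint, assuming the OpenAI Python SDK v1 interface, might look like this:

# Check user-generated text with the moderation endpoint before passing it on.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

result = client.moderations.create(input="some user-generated text to check")
if result.results[0].flagged:
    print("Content flagged; do not pass it to the completion step.")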

Human review of outputs is essential in high-stakes domains and code generation to account for system limitations and verify content correctness.

Limiting user input text and restricting the number of output tokens can also help mitigate prompt injection risks and misuse, and these measures should be integrated into best practices for safe AI deployment.
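One way to apply these two limits in code, again assuming the OpenAI Python SDK v1 interface, is sketched below; the character cap, model name, and output token limit are illustrative values.

# Cap user input and restrict output tokens to limit prompt-injection surface and misuse.
MAX_INPUT_CHARS = 2000  # illustrative cap on user-supplied text

def safe_completion(client, user_text: str):
    truncated = user_text[:MAX_INPUT_CHARS]
    return client.chat.completions.create(
        model="gpt-4o-mini",     # illustrative model name
        messages=[{"role": "user", "content": truncated}],
        max_tokens=256,          # restrict the number of output tokens
    )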

By examining relevant posts on the OpenAI developer forum, it is evident that LLM development is rapidly gaining traction, with developers encountering more complex issues compared to traditional software development.

The study aims to analyse the underlying challenges reflected in these posts, investigating trends and problem difficulty levels on the LLM developer forum.

I’m currently the Chief Evangelist @ Kore AI. I explore & write about all things at the intersection of AI & language; ranging from LLMs, Chatbots, Voicebots, Development Frameworks, Data-Centric latent spaces & more.

LinkedIn



Source link

20Aug

ChatGPT vs. Claude vs. Gemini for Data Analysis (Part 2): Who’s the Best at EDA? | by Yu Dong | Aug, 2024


Five criteria to compare ChatGPT, Claude, and Gemini in tackling Exploratory Data Analysis

· Context
· What is EDA
· Evaluation Criteria
· Problem Setup
· ChatGPT-4o
· Claude 3.5 Sonnet
· Gemini Advanced
· Final Results

Welcome back to the second installment of my series, ChatGPT vs. Claude vs. Gemini for Data Analysis! In this series, I aim to compare these AI tools across various data science and analytics tasks to help fellow data enthusiasts and professionals choose the best AI assistant for their needs. If you missed the first article, I compared their performance in writing and optimizing SQL queries — be sure to check it out!

Although the 2024 Olympics have concluded, our AI competition is just heating up. So far, Claude 3.5 Sonnet has taken the lead! But can it maintain its position, or will ChatGPT and Gemini catch up? 🏆

In this second article, we’ll focus on their ability to independently conduct Exploratory Data Analysis (EDA). As a data scientist, imagine the convenience of having an AI tool that can instantly provide data insights and recommendations for a new…



Source link

18Aug

The Math Behind Keras 3 Optimizers: Deep Understanding and Application | by Peng Qian | Aug, 2024


This is a bit different from what the books say.

The Math Behind Keras 3 Optimizers: Deep Understanding and Application. Image by DALL-E-3

Optimizers are an essential tool for everyone working in machine learning.

We all know optimizers determine how a model converges on a minimum of the loss function during gradient descent. Thus, using the right optimizer can boost the performance and efficiency of model training.
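As a quick reminder, here is a toy version of the plain SGD update that most textbooks start from; it is purely illustrative and not Keras 3's internal implementation.

# Vanilla SGD: w <- w - learning_rate * dL/dw for each parameter.
def sgd_step(weights, gradients, learning_rate=0.01):
    return [w - learning_rate * g for w, g in zip(weights, gradients)]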

Besides classic papers, many books explain the principles behind optimizers in simple terms.

However, I recently found that the performance of Keras 3 optimizers doesn’t quite match the mathematical algorithms described in these books, which made me a bit anxious. I worried about misunderstanding something or about updates in the latest version of Keras affecting the optimizers.

So, I reviewed the source code of several common optimizers in Keras 3 and revisited their use cases. Now I want to share this knowledge to save you time and help you master Keras 3 optimizers more quickly.

If you’re not very familiar with the latest changes in Keras 3, here’s a quick rundown: Keras 3 integrates TensorFlow, PyTorch, and JAX, allowing us to use cutting-edge deep learning frameworks easily through Keras APIs.
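For readers who haven't tried it yet, here is a minimal sketch of selecting a backend and compiling a model with a Keras 3 optimizer; it assumes Keras 3 and the chosen backend (JAX in this case) are installed, and the model itself is just a placeholder.

# The KERAS_BACKEND environment variable must be set before importing keras.
import os
os.environ["KERAS_BACKEND"] = "jax"   # or "tensorflow", "torch"

import keras

model = keras.Sequential([
    keras.Input(shape=(8,)),
    keras.layers.Dense(16, activation="relu"),
    keras.layers.Dense(1),
])
model.compile(optimizer=keras.optimizers.Adam(learning_rate=1e-3), loss="mse")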



Source link

17Aug

The Azure Landing Zone for a Data Platform in the Cloud | by Mariusz Kujawski | Aug, 2024


Working with sensitive data or within a highly regulated environment requires safe and secure cloud infrastructure for data processing. The cloud might seem like an open environment on the internet and therefore raise security concerns. When you start your journey with Azure without enough experience in resource configuration, it is easy to make design and implementation mistakes that can impact the security and flexibility of your new data platform. In this post, I’ll describe the most important aspects of designing a cloud adoption framework for a data platform in Azure.

Image by the author

An Azure landing zone is the foundation for deploying resources in the public cloud. It contains essential elements for a robust platform. These elements include networking, identity and access management, security, governance, and compliance. By implementing a landing zone, organizations can streamline the configuration process of their infrastructure, ensuring the utilization of best practices and guidelines.

An Azure landing zone is an environment that follows key design principles to enable application migration, modernization, and development. In Azure, subscriptions are used to isolate and develop application and platform resources. These are categorized as follows:

  • Application landing zones: Subscriptions dedicated to hosting application-specific resources.
  • Platform landing zone: Subscriptions that contain shared services, such as identity, connectivity, and management resources provided for application landing zones.

These design principles help organizations operate successfully in a cloud environment and scale out a platform.

Image by the author

A data platform implementation in Azure involves a high-level architecture design where resources are selected for data ingestion, transformation, serving, and exploration. The first step may require a landing zone design. If you need a secure platform that follows best practices, starting with a landing zone is crucial. It will help you organize the resources within subscriptions and resource groups, define the network topology, and ensure connectivity with on-premises environments via VPN, while also adhering to naming conventions and standards.

Architecture Design

Tailoring an architecture for a data platform requires a careful selection of resources. Azure provides native resources for data platforms such as Azure Synapse Analytics, Azure Databricks, Azure Data Factory, and Microsoft Fabric. The available services offer diverse ways of achieving similar objectives, allowing flexibility in your architecture selection.

For instance:

  • Data Ingestion: Azure Data Factory or Synapse Pipelines.
  • Data Processing: Azure Databricks or Apache Spark in Synapse.
  • Data Analysis: Power BI or Databricks Dashboards.

We may use Apache Spark and Python or low-code drag-and-drop tools. Various combinations of these tools can help us create the most suitable architecture depending on our skills, use cases, and capabilities.

High level architecture (Image by the author)

Azure also allows you to use other components such as Snowflake, or to create your own composition using open-source software, Virtual Machines (VMs), or Azure Kubernetes Service (AKS). We can leverage VMs or AKS to configure services for data processing, exploration, orchestration, AI, or ML.

Typical Data Platform Structure

A typical Data Platform in Azure should comprise several key components:

1. Tools for data ingestion from sources into an Azure Storage Account. Azure offers services like Azure Data Factory, Azure Synapse Pipelines, or Microsoft Fabric. We can use these tools to collect data from sources.

2. Data Warehouse, Data Lake, or Data Lakehouse: Depending on your architecture preferences, we can select different services to store data and a business model.

  • For Data Lake or Data Lakehouse, we can use Databricks or Fabric.
  • For Data Warehouse we can select Azure Synapse, Snowflake, or MS Fabric Warehouse.

3. To orchestrate data processing in Azure we have Azure Data Factory, Azure Synapse Pipelines, Airflow, or Databricks Workflows.

4. Data transformation in Azure can be handled by various services.

  • For Apache Spark: Databricks, Azure Synapse Spark Pool, and MS Fabric Notebooks.
  • For SQL-based transformation: Spark SQL in Databricks, Azure Synapse, or MS Fabric; T-SQL in SQL Server, MS Fabric, or Synapse Dedicated Pool. Alternatively, Snowflake offers all SQL capabilities.

Subscriptions

An important aspect of platform design is planning the segmentation of subscriptions and resource groups based on business units and the software development lifecycle. It’s possible to use separate subscriptions for production and non-production environments. With this distinction, we can achieve a more flexible security model, separate policies for production and test environments, and avoid quota limitations.

Subscriptions Organization (Image by the author)

Networking

A virtual network is similar to a traditional network that operates in your data center. Azure Virtual Networks (VNets) provide a foundational layer of security for your platform: disabling public endpoints for resources will significantly reduce the risk of data leaks in the event of lost keys or passwords. Without public endpoints, data stored in Azure Storage Accounts is only accessible when connected to your VNet.

The connectivity with an on-premises network supports a direct connection between Azure resources and on-premises data sources. Depending on the type of connection, the communication traffic may go through an encrypted tunnel over the internet or a private connection.

To improve security within a Virtual Network, you can use Network Security Groups (NSGs) and Firewalls to manage inbound and outbound traffic rules. These rules allow you to filter traffic based on IP addresses, ports, and protocols. Moreover, Azure enables routing traffic between subnets, virtual and on-premises networks, and the Internet. Using custom Route Tables makes it possible to control where traffic is routed.

Network Configuration (Image by the author)

Naming Convention

A naming convention establishes a standardization for the names of platform resources, making them more self-descriptive and easier to manage. This standardization helps in navigating through different resources and filtering them in Azure Portal. A well-defined naming convention allows you to quickly identify a resource’s type, purpose, environment, and Azure region. This consistency can be beneficial in your CI/CD processes, as predictable names are easier to parametrize.

Considering the naming convention, you should account for the information you want to capture. The standard should be easy to follow, consistent, and practical. It’s worth including elements like the organization, business unit or project, resource type, environment, region, and instance number. You should also consider the scope of resources to ensure names are unique within their context. For certain resources, like storage accounts, names must be unique globally.

For example, a Databricks Workspace might be named using the following format:

Naming Convention (Image by the author)

Example Abbreviations:

Image by the author

A comprehensive naming convention typically includes the following format:

  • Resource Type: An abbreviation representing the type of resource.
  • Project Name: A unique identifier for your project.
  • Environment: The environment the resource supports (e.g., Development, QA, Production).
  • Region: The geographic region or cloud provider where the resource is deployed.
  • Instance: A number to differentiate between multiple instances of the same resource.
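To make the pattern concrete, here is a small helper that assembles such a name; the abbreviations and separator are illustrative choices, not a prescribed standard.

# Build a resource name from the components listed above, e.g. dbw-sales-dev-weu-001.
def resource_name(resource_type: str, project: str, environment: str, region: str, instance: int) -> str:
    return f"{resource_type}-{project}-{environment}-{region}-{instance:03d}"

print(resource_name("dbw", "sales", "dev", "weu", 1))  # Databricks Workspace example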

Implementing infrastructure through the Azure Portal may appear straightforward, but it often involves numerous detailed steps for each resource. A highly secured infrastructure will require resource configuration, networking, private endpoints, DNS zones, etc. Resources like Azure Synapse or Databricks require additional internal configuration, such as setting up Unity Catalog, managing secret scopes, and configuring security settings (users, groups, etc.).

Once you finish with the test environment, you‘ll need to replicate the same configuration across the QA and production environments. This is where it’s easy to make mistakes. To minimize potential errors that could impact development quality, it‘s recommended to use an Infrastructure as Code (IaC) approach for infrastructure development. IaC allows you to define cloud infrastructure as code in Terraform or Bicep, enabling you to deploy multiple environments with consistent configurations.

In my cloud projects, I use accelerators to quickly initiate new infrastructure setups. Microsoft also provides accelerators that can be used. Storing infrastructure as code in a repository offers additional benefits, such as version control, tracking changes, conducting code reviews, and integrating with DevOps pipelines to manage and promote changes across environments.

If your data platform doesn’t handle sensitive information and you don’t need a highly secured data platform, you can create a simpler setup with public internet access, without Virtual Networks (VNets), VPNs, etc. However, in a highly regulated area, a completely different implementation plan is required. This plan will involve collaboration with various teams within your organization — such as DevOps, Platform, and Networking teams — or even external resources.

You’ll need to establish a secure network infrastructure, resources, and security. Only when the infrastructure is ready can you start activities tied to data processing development.

If you found this article insightful, I invite you to express your appreciation by clicking the ‘clap’ button or liking it on LinkedIn. Your support is greatly valued. For any questions or advice, feel free to contact me on LinkedIn.



Source link

16Aug

WeKnow-RAG


This agentic approach to RAG leverages a graph-based method with a robust data topology to enhance the precision of information retrieval. Knowledge Graphs enable searching for things and not strings by maintaining extensive collections of explicit facts structured as accurate, mutable, and interpretable knowledge triples. It employs a multi-stage retrieval process, integrating a self-assessment mechanism to ensure accuracy in responses. Domain-specific queries are handled using Knowledge Graphs, while web-retrieved data is processed through parsing and chunking techniques.

TLDR

  1. This RAG implementation is another good example of an Agentic Approach being followed to create a resilient, knowledge-intensive conversational user interface.
  2. There is also an emergence of a Graph approach from two perspectives. A graph approach is followed for agentic flows: LangChain with LangGraph, LlamaIndex Workflows and Haystack Studio from Deepset, to name a few. There is also a Graph approach to knowledge, where more time is spent on data discovery and data design to create stronger data topologies.
  3. WeKnow-RAG is also a multi-stage approach to RAG, in keeping with recent developments where complexity is introduced in order to create more resilient and multifaceted solutions.
  4. This approach from WeKnow-RAG includes a self-assessment mechanism.
  5. KGs were used for domain-specific queries, while parsing and chunking were used for web-retrieved data.
  6. This implementation combines a Knowledge Base and a RAG/chunking approach, covering instances where data design is possible and where it is not. Technologies are often pitted against each other in an either/or scenario; here it is illustrated that sometimes the answer lies somewhere in the middle.

Introduction

This research presents a domain-specific, KG-enhanced RAG system designed to adapt to various query types and domains, enhancing performance in both factual and complex reasoning tasks.

A multi-stage retrieval method for web pages is introduced, utilising both sparse and dense retrieval techniques to effectively balance efficiency and accuracy in information retrieval.

A self-assessment mechanism for LLMs is implemented, enabling them to evaluate their confidence in generated answers, thereby reducing hallucinations and improving overall response quality.

An adaptive framework is presented that intelligently combines KG-based and web-based RAG methods, tailored to the characteristics of different domains and the rate of information change.
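As a toy illustration of the self-assessment idea mentioned above, the sketch below asks a generic LLM callable to rate its own confidence and withholds low-confidence answers; the prompts and threshold are invented for illustration and are not the paper's exact mechanism.

# llm is any callable that takes a prompt string and returns a string.
def answer_with_self_assessment(llm, question: str, context: str, threshold: float = 0.7) -> str:
    answer = llm(f"Context: {context}\nQuestion: {question}\nAnswer:")
    score_text = llm(
        f"Question: {question}\nAnswer: {answer}\n"
        "On a scale from 0 to 1, how confident are you that this answer is correct? Reply with a number only."
    )
    try:
        confidence = float(score_text.strip())
    except ValueError:
        confidence = 0.0                      # unparsable score: treat as low confidence
    return answer if confidence >= threshold else "I don't know"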

Adaptive & Intelligent Agents

WeKnow-RAG integrates Web search and Knowledge Graphs into a Retrieval-Augmented Generation (RAG) architecture, improving upon traditional RAG methods that typically rely on dense vector similarity search for retrieval.

While these conventional methods segment the corpus into text chunks and use dense retrieval systems exclusively, they often fail to address complex queries effectively.

The Challenges Identified Related To Chunking

RAG chunking implementations face several challenges:

  1. Metadata and Hybrid Search Limitations: Methods that use metadata filtering or hybrid search are restricted by the predefined scope of metadata, limiting the system’s flexibility.
  2. Granularity Issues: Achieving the right level of detail within vector space chunks is difficult, leading to responses that may be relevant but not precise enough for complex queries.
  3. Inefficient Information Retrieval: These methods often retrieve large amounts of irrelevant data, which increases computational costs and reduces the quality and speed of responses.
  4. Over-Retrieval: Excessive data retrieval can overwhelm the system, making it harder to identify the most relevant chunks.

These challenges underscore the need for more refined chunking strategies and retrieval mechanisms to improve relevance and efficiency in RAG systems.

Why KG

An effective RAG system should prioritise retrieving only the most relevant information while minimising irrelevant content.

Knowledge Graphs (KGs) contribute to this goal by offering a structured and precise representation of entities and their relationships.

Unlike vector similarity, KGs organise facts into simple, interpretable knowledge triples (e.g., entity → relationship → entity).

KGs can continuously expand with new data, and experts can develop domain-specific KGs to ensure accuracy and reliability in specialised fields. Many recent developments exemplify this approach, and research is increasingly focused on leveraging graph-based methods in this area.
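To make the idea of knowledge triples concrete, here is a toy lookup over a hand-written triple list; the data and structure are invented for illustration and are unrelated to WeKnow-RAG's actual KG.

# Facts stored as (subject, relation, object) triples.
triples = [
    ("Miles Davis", "instrument", "trumpet"),
    ("Miles Davis", "genre", "jazz"),
    ("Kind of Blue", "recorded_by", "Miles Davis"),
]

def lookup(subject: str, relation: str) -> list[str]:
    return [o for s, r, o in triples if s == subject and r == relation]

print(lookup("Miles Davis", "genre"))  # ['jazz']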

In all honesty, Knowledge Graphs are an area where I want to improve my skills, as I feel my understanding of the technology is not as strong as it should be.

The following is an excerpt from the function call prompt used for querying knowledge graphs in the Music Domain. The full version includes several additional functions.

"System":"You are an AI agent of linguist talking to a human. ... For all questions you MUST use one of the functions provided. You have access to the following tools":{
"type":"function",
"function":{
"name":"get_artist_info",
"description":"Useful for when you need to get information about an artist, such as singer, band",
"parameters":{
"type":"object",
"properties":{
"artist_name":{
"type":"string",
"description":"the name of artist or band"
},
"artist_information":{
"type":"string",
"description":"the kind of artist information, such as birthplace, birthday, lifespan, all_works, grammy_count, grammy_year, band_members"
}
},
"required":[
"artist_name",
"artist_information"
]
}
}
}"...
To use these tools you must always respond in a Python function
call based on the above provided function definition of the tool! For example":{
"name":"get_artist_info",
"params":{
"artist_name":"justin bieber ",
"artist_information":"birthday"
}
}"User":{
"query"
}

Finally

WeKnow-RAG enhances LLM accuracy and reliability by combining structured knowledge from graphs with flexible dense vector retrieval. This system uses domain-specific knowledge graphs for better performance on factual and complex queries and employs multi-stage web retrieval techniques to balance efficiency and accuracy.

Additionally, it incorporates a self-assessment mechanism to reduce errors in LLM-generated answers. The framework intelligently combines KG-based and web-based methods, adapting to different domains and the pace of information change, ensuring optimal performance in dynamic environments.

According to the study, WeKnow-RAG has shown excellent results in extensive tests, which stands to reason, as this approach is in alignment with what are considered the most promising technology and architectural approaches for the near future.

✨✨ Follow me on LinkedIn for updates on Large Language Models

I’m currently the Chief Evangelist @ Kore AI. I explore & write about all things at the intersection of AI & language; ranging from LLMs, Chatbots, Voicebots, Development Frameworks, Data-Centric latent spaces & more.

LinkedIn





Source link

15Aug

AI Agent Evaluation Framework From Apple | by Cobus Greyling | Aug, 2024


The notion of a World State is something I find very interesting, where certain ambient or environmental settings need to be accessed to enable certain actions.

This World State alludes to the research Apple did regarding Ferret-UI and other research like WebVoyager, where there is a world the agent needs to interact with. This world currently consists of surfaces or screens, and the agent needs to navigate browser windows, mobile phone OSs, and more.

Milestones are key points which need to be executed in order to achieve or fulfil the user intent. These can also be seen as potential points of failure should execution not be possible.

In the example in the image above, the User intent is to send a message, while cellular service is turned off.

The Agent should first understand the User’s intent and prompt the User for the necessary arguments. After collecting all arguments with the help of the search_contacts tool, the Agent attempts to send the message, discovers upon failure that it needs to enable cellular service, and retries.

To evaluate this trajectory, we find the best match for all Milestones against Message Bus and World State in each turn while maintaining topological order.

This is an excellent example of how, for an Agent to be truly autonomous, it needs to be in control of its environment.

Despite the paradigm shift towards a more simplified problem formulation, the stateful, conversational and interactive nature of task oriented dialog remains, and poses a significant challenge for systematic and accurate evaluation of tool-using LLMs.

Stateful

Apple sees state as not only the conversational dialog turns or dialog state, but also the state of the environment in which the agents live.

This includes implicit state dependencies between stateful tools, allowing the agent to track and alter the world state based on its world or common-sense knowledge, which is implicit from the user query.

Something else I find interesting in this study is the notion of a Knowledge Boundary, which informs the user simulator of what it should and should not know, providing partial access to the expected result and combating hallucination. This is analogous to in-domain and out-of-domain questions.

Milestones and Minefields define key events that must or must not happen in a trajectory, allowing any trajectory to be evaluated with rich intermediate and final execution signals.
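As a toy illustration of evaluating a trajectory against Milestones and Minefields, the sketch below checks that milestone events occur in order and that no minefield event occurs; the event names and structure are invented and are not Apple's implementation.

def evaluate_trajectory(events: list[str], milestones: list[str], minefields: set[str]) -> bool:
    # Fail immediately if any forbidden (minefield) event appears.
    if any(e in minefields for e in events):
        return False
    # Milestones must appear as an ordered subsequence of the observed events.
    it = iter(events)
    return all(m in it for m in milestones)

events = ["search_contacts", "enable_cellular_service", "send_message"]
print(evaluate_trajectory(events, ["search_contacts", "send_message"], {"delete_contact"}))  # True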

For the conversational user interface, there are two scenarios defined…

Single / Multiple Tool Call

One scenario is where there is a single conversation or dialog/user turn, with multiple tool-calling procedures in the background.

Hence, the user issues a single request which is not demanding from an NLU and dialog state management perspective, but demands heavy lifting in the background.

Single / Multiple User Turn

In other scenarios there might only be a single tool call event or milestone, but multiple dialog turns are required to establish the user intent, disambiguate where necessary, collect relevant and required information from the user, etc.



Source link

13Aug

What to Study if you Want to Master LLMs | by Ivo Bernardo | Aug, 2024


What foundational concepts should you study if you want to understand Large Language Models?

Image by solenfeyissa @ Unsplash.com

Most of the code we use to interact with LLMs (Large Language Models) is hidden behind several APIs — and that’s a good thing.

But if you are like me, and want to understand the ins and outs of these magical models, there’s still hope for you. Currently, apart from the researchers working on developing and training new LLMs, there are mostly two types of people playing with these models:

  • Users, that interact via applications such as ChatGPT or Gemini.
  • Data scientists and developers that work with different libraries, such as LangChain, LlamaIndex, or even the Gemini or OpenAI APIs, which simplify the process of building on top of these models.

The problem is — and you may have felt it — that there is fundamental knowledge in text mining and natural language processing that is completely hidden away in consumer products or APIs. And don’t get me wrong — they are great for developing cool use cases around these technologies. But if you want to have deeper knowledge to build complex use cases or manipulate LLMs a bit better, you’ll need to check the fundamentals — particularly when the models behave as you…



Source link

12Aug

OpenAI Enhanced Their API With Robust Structured Output Capabilities | by Cobus Greyling | Aug, 2024


Previously, two options were available: JSON Mode & Function Calling…

Enabling OpenAI’s JSON mode doesn’t ensure that the output will adhere to a specific predefined JSON schema. It only guarantees that the JSON will be valid and parse without errors.

The challenge with OpenAI’s JSON Mode lies in the significant variability of the JSON output with each inference, making it impossible to predefine a consistent JSON schema.

To clarify, the chat completion API itself doesn’t call any functions, but the model can generate JSON output that you can use in your code to trigger function calls.

Last year OpenAI introduced JSON mode as a valuable tool for developers aiming to build reliable applications using their models.

Although JSON mode enhances the model’s ability to generate valid JSON outputs, as I have highlighted in previous articles, it does not ensure that the responses will adhere to a specific schema. This makes it more of an experimental feature than a production-ready one.

Now, OpenAI is introducing Structured Outputs in the API, a new feature designed to guarantee that model-generated outputs will precisely match the JSON Schemas provided by developers.

Structured output is available in two formats: Function Calling and a new option for the response_format parameter.

The following Python code, a Function Calling example, can be copied and pasted as-is into a notebook:

# Install the requests library if not already installed
!pip install requests

import requests
import json

# Define your OpenAI API key
api_key = ''

# Define the API endpoint
url = "https://api.openai.com/v1/chat/completions"

# Define the headers with the API key
headers = {
    "Authorization": f"Bearer {api_key}",
    "Content-Type": "application/json"
}

# Define the data for the API request
data = {
    "model": "gpt-4o-2024-08-06",
    "messages": [
        {
            "role": "system",
            "content": "You are a helpful assistant. The current date is August 6, 2024. You help users query for the data they are looking for by calling the query function."
        },
        {
            "role": "user",
            "content": "look up all my orders in may of last year that were fulfilled but not delivered on time"
        }
    ],
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "query",
                "description": "Execute a query.",
                "strict": True,
                "parameters": {
                    "type": "object",
                    "properties": {
                        "table_name": {
                            "type": "string",
                            "enum": ["orders"]
                        },
                        "columns": {
                            "type": "array",
                            "items": {
                                "type": "string",
                                "enum": [
                                    "id",
                                    "status",
                                    "expected_delivery_date",
                                    "delivered_at",
                                    "shipped_at",
                                    "ordered_at",
                                    "canceled_at"
                                ]
                            }
                        },
                        "conditions": {
                            "type": "array",
                            "items": {
                                "type": "object",
                                "properties": {
                                    "column": {
                                        "type": "string"
                                    },
                                    "operator": {
                                        "type": "string",
                                        "enum": ["=", ">", "<", ">=", "<="]
                                    },
                                    "value": {
                                        "anyOf": [
                                            {"type": "string"},
                                            {"type": "number"},
                                            {
                                                "type": "object",
                                                "properties": {
                                                    "column_name": {"type": "string"}
                                                },
                                                "required": ["column_name"],
                                                "additionalProperties": False
                                            }
                                        ]
                                    }
                                },
                                "required": ["column", "operator", "value"],
                                "additionalProperties": False
                            }
                        },
                        "order_by": {
                            "type": "string",
                            "enum": ["asc", "desc"]
                        }
                    },
                    "required": ["table_name", "columns", "conditions", "order_by"],
                    "additionalProperties": False
                }
            }
        }
    ]
}

# Make the API request
response = requests.post(url, headers=headers, data=json.dumps(data))

# Print the response
print(response.status_code)
print(response.json())
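For the second format, a rough sketch of the response_format option is shown below; the schema name and fields are illustrative, and the request would be sent with the same requests.post pattern as above.

# Structured Outputs via response_format: the model's reply must match this JSON Schema.
data = {
    "model": "gpt-4o-2024-08-06",
    "messages": [
        {"role": "user", "content": "Extract the name and age from: Anna is 31."}
    ],
    "response_format": {
        "type": "json_schema",
        "json_schema": {
            "name": "person",
            "strict": True,
            "schema": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "age": {"type": "integer"}
                },
                "required": ["name", "age"],
                "additionalProperties": False
            }
        }
    }
}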

JSON serves as a vital tool for structuring and exchanging data between AI agents and the functions they interact with, ensuring clear, consistent, and reliable communication across various systems and platforms.

✨✨ Follow me on LinkedIn for updates on Large Language Models

I’m currently the Chief Evangelist @ Kore AI. I explore & write about all things at the intersection of AI & language; ranging from LLMs, Chatbots, Voicebots, Development Frameworks, Data-Centric latent spaces & more.

LinkedIn

https://openai.com/index/introducing-structured-outputs-in-the-api

https://platform.openai.com/docs/guides/structured-outputs



Source link

12Aug

How to Use Hybrid Search for Better LLM RAG Retrieval | by Dr. Leon Eversberg | Aug, 2024


Building an advanced local LLM RAG pipeline by combining dense embeddings with BM25

Code snippet from the hybrid search we are going to implement in this article. Image by author

The basic Retrieval-Augmented Generation (RAG) pipeline uses an encoder model to search for similar documents when given a query.

This is also called semantic search because the encoder transforms text into a high-dimensional vector representation (called an embedding) in which semantically similar texts are close together.

Before we had Large Language Models (LLMs) to create these vector embeddings, the BM25 algorithm was a very popular search algorithm. BM25 focuses on important keywords and looks for exact matches in the available documents. This approach is called keyword search.

If you want to take your RAG pipeline to the next level, you might want to try hybrid search. Hybrid search combines the benefits of keyword search and semantic search to improve search quality.

In this article, we will cover the theory and implement all three search approaches in Python.
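As a preview of the kind of thing we will build, here is a rough sketch of hybrid scoring that blends BM25 with dense embeddings; it assumes the rank_bm25 and sentence-transformers packages, and the model name and weighting are illustrative rather than the exact implementation used later in this article.

import numpy as np
from rank_bm25 import BM25Okapi
from sentence_transformers import SentenceTransformer

documents = [
    "BM25 is a keyword-based ranking function.",
    "Dense embeddings capture semantic similarity.",
    "Hybrid search combines keyword and semantic scores.",
]
query = "combine keyword and semantic retrieval"

# Keyword scores with BM25.
bm25 = BM25Okapi([d.lower().split() for d in documents])
keyword_scores = np.array(bm25.get_scores(query.lower().split()))

# Semantic scores: cosine similarity of normalised embeddings.
model = SentenceTransformer("all-MiniLM-L6-v2")
doc_emb = model.encode(documents, normalize_embeddings=True)
query_emb = model.encode(query, normalize_embeddings=True)
semantic_scores = doc_emb @ query_emb

# Min-max normalise each score list, then blend with weight alpha.
def norm(x):
    return (x - x.min()) / (x.max() - x.min() + 1e-9)

alpha = 0.5
hybrid_scores = alpha * norm(keyword_scores) + (1 - alpha) * norm(semantic_scores)
print(documents[int(hybrid_scores.argmax())])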

Table of Contents

· RAG Retrieval
Keyword Search With BM25
Semantic Search With Dense Embeddings
Semantic Search or Hybrid Search?
Hybrid Search
Putting It All Together
·…



Source link
