19Jun

Phi-3 Is A Small Language Model Which Can Run On Your Phone | by Cobus Greyling | Jun, 2024


Phi-3 is a family of small language models with short & long context lengths.

Choosing the right language model depends on an organisation’s specific needs, task complexity, and available resources.

Small language models are ideal for organisations aiming to build applications that run locally on a device (rather than in the cloud).

Some argue that Large language models are better suited for applications requiring the orchestration of complex tasks, advanced reasoning, data analysis, and contextual understanding.

Small language models provide potential solutions for regulated industries and sectors needing high-quality results while keeping data on their own premises.

Latency refers to the delay in communication between Large Language Models (LLMs) and the cloud when retrieving information to generate answers to user prompts. In some use-cases cases, makers can prioritise waiting for high-quality answers, while in others, speed is crucial for user satisfaction. However, for conversational experiences, latency is a non-negotiable.

Cost is also a consideration that makes the use of SLMs very attractive.

Small Language Models (SLMs), which can operate offline, significantly broaden AI’s applicability.

What we’re going to start to see is not a shift from large to small, but a shift from a singular category of models to a portfolio of models where customers get the ability to make a decision on what is the best model for their scenario .~ Source

There are a number of features which are part and parcel of Large Language Models. For instance Natural Language Generation, dialog and conversational context management, reasoning and knowledge.

The knowledge portion is not being used to a large degree as RAG is used to inject contextual reference data at inference. Hence Small Language Models (SLMs) are ideal, even though they lack being knowledge intensive.

Seeing that the knowledge intensive nature of LLMs are not used in any-case.

The model simply does not have the capacity to store too much “factual knowledge”, which can be seen for example with low performance on TriviaQA.

However, we believe such weakness can be resolved by augmentation with a search engine. ~ Source

Another limitation related to the model’s capacity is that the model is primarily restricted to English.

Below is an example of how anyone can interact with the Phi-3 small language model. Within HuggingFace’s HuggingChat, anyone can go to settings and under models select the Phi-3 model.



Source link

18Jun

How to Find and Solve Valuable Generative-AI Use Cases | by Teemu Sormunen | Jun, 2024


The P&F data science team faces a challenge: They must weigh each expert opinion equally, but can’t satisfy everyone. Instead of focusing on expert subjective opinions, they decide to evaluate the chatbot on historical customer questions. Now experts do not need to come up with questions to test the chatbot, bringing the evaluation closer to real-world conditions. The initial reason for involving experts, after all, was their better understanding of real customer questions compared to the P&F data science team.

It turns out that commonly asked questions for P&F are related to paper clip technical instructions. P&F customers want to know detailed technical specifications of the paper clips. P&F has thousands of different paper clip types, and it takes a long time for customer support to answer the questions.

Understanding the test-driven development, the data science team creates a dataset from the conversation history, including the customer question and customer support reply:

Dataset gathered from Paperclips & Friends discord channel.

Having a dataset of questions and answers, P&F can test and evaluate the chatbot’s performance retrospectively. They create a new column, “Chatbot reply”, and store the chatbot example replies to the questions.

Augmented dataset with proposed chatbot answer.

We can have the experts and GPT-4 evaluate the quality of the chatbot’s replies. The ultimate goal is to automate the chatbot accuracy evaluation by utilizing GPT-4. This is possible if experts and GPT-4 evaluate the replies similarly.

Experts create a new Excel sheet with each expert’s evaluation, and the data science team adds the GPT-4 evaluation.

Augmented dataset with expert and GPT-4 evaluations.

There are conflicts on how different experts evaluate the same chatbot replies. GPT-4 evaluates similarly to expert majority voting, which indicates that we could do automatic evaluations with GPT-4. However, each expert’s opinion is valuable, and it’s important to address the conflicting evaluation preferences among the experts.

P&F organizes a workshop with the experts to create golden standard responses to the historical question dataset

The golden standard dataset for evaluation.

and evaluation best practice guidelines, to which all experts agree.

Evaluation “best practices guidelines” for the chatbot as defined by customer support specialists.

With the insights from the workshop, the data science team can create a more detailed evaluation prompt for the GPT-4 that covers edge cases (i.e. “chatbot should not ask to raise support tickets”). Now the experts can use time to improve the paper clip documentation and define best practices, instead of laborious chatbot evaluations.

By measuring the percentage of correct chatbot replies, P&F can decide whether they want to deploy the chatbot to the support channel. They approve the accuracy and deploy the chatbot.

Finally, it’s time to save all the chatbot responses and calculate how well the chatbot performs to solve real customer inquiries. As the customer can directly respond to the chatbot, it is also important to record the response from the customer, to understand the customer’s sentiment.

The same evaluation workflow can be used to measure the chatbot’s success factually, without the ground truth replies. But now the customers are getting the initial reply from a chatbot, and we do not know if the customers like it. We should investigate how customers react to the chatbot’s replies. We can detect negative sentiment from the customer’s replies automatically, and assign customer support specialists to handle angry customers.



Source link

17Jun

LangGraph From LangChain Explained In Simple Terms | by Cobus Greyling | Jun, 2024


LangGraph is a module built on top of LangChain to better enable creation of cyclical graphs, often needed for agent runtimes.

One of the big value props of LangChain is the ability to easily create custom chains, also known as flow engineering. Combining LangGraph with LangChain agents, agents can be both directed and cyclic.

A Directed Acyclic Graph (DAG) is a type of graph used in computer science and mathematics. Here’s a simple explanation:

Directed: Each connection (or edge) between nodes (or vertices) has a direction, like a one-way street. It shows which way you can go from one node to another.

Acyclic: It doesn’t have any cycles. This means if you start at one node and follow the directions, you can never return to the same node. There’s no way to get stuck in a loop.

Imagine it as a family tree or a flowchart where you can only move forward and never return to the same point you started from.

A common pattern observed in developing more complex LLM applications is the introduction of cycles into the runtime. These cycles frequently use the LLM to determine the next step in the process.

A significant advantage of LLMs is their capability to perform these reasoning tasks, essentially functioning like an LLM in a for-loop. Systems employing this approach are often referred to as agents.

However, looping agents often require granular control at various stages.

Makers might need to ensure that an agent always calls a specific tool first or seek more control over how tools are utilised.

Additionally, they may want to use different prompts for the agent depending on its current state.

At its core, LangGraph provides a streamlined interface built on top of LangChain.

LangGraph is framework-agnostic, with each node functioning as a regular Python function.

It extends the core Runnable API (a shared interface for streaming, async, and batch calls) to facilitate:

  1. Seamless state management across multiple conversation turns or tool usages.
  2. Flexible routing between nodes based on dynamic criteria
  3. Smooth transitions between LLMs and human intervention
  4. Persistence for long-running, multi-session applications

Below is a working LangChain chatbot, based on the Anthropic model. The base code is copied from LangChain example code in their cookbook.

%%capture --no-stderr
%pip install -U langgraph langsmith

# Used for this tutorial; not a requirement for LangGraph
%pip install -U langchain_anthropic

#################################
import getpass
import os

def _set_env(var: str):
if not os.environ.get(var):
os.environ[var] = getpass.getpass(f"{var}: ")

_set_env("ANTHROPIC_API_KEY")
#################################
from typing import Annotated

from typing_extensions import TypedDict

from langgraph.graph import StateGraph
from langgraph.graph.message import add_messages

class State(TypedDict):
# Messages have the type "list". The `add_messages` function
# in the annotation defines how this state key should be updated
# (in this case, it appends messages to the list, rather than overwriting them)
messages: Annotated[list, add_messages]

graph_builder = StateGraph(State)
#################################
from langchain_anthropic import ChatAnthropic

llm = ChatAnthropic(model="claude-3-haiku-20240307")

def chatbot(state: State):
return {"messages": [llm.invoke(state["messages"])]}

# The first argument is the unique node name
# The second argument is the function or object that will be called whenever
# the node is used.
graph_builder.add_node("chatbot", chatbot)
#################################
graph_builder.set_entry_point("chatbot")

#################################
graph_builder.set_finish_point("chatbot")
#################################
graph = graph_builder.compile()
#################################
from IPython.display import Image, display

try:
display(Image(graph.get_graph().draw_mermaid_png()))
except Exception:
# This requires some extra dependencies and is optional
pass
#################################
while True:
user_input = input("User: ")
if user_input.lower() in ["quit", "exit", "q"]:
print("Goodbye!")
break
for event in graph.stream({"messages": ("user", user_input)}):
for value in event.values():
print("Assistant:", value["messages"][-1].content)
#################################

Below the snipped showing how the graphic rendering the flow.



Source link

16Jun

Welch’s t-Test: The Reliable Way to Compare 2 Population Means with Unequal Variances | by Vito Rihaldijiran | Jun, 2024


Discover why Welch’s t-Test is the go-to method for accurate statistical comparison, even when variances differ.

Photo by Simon Maage on Unsplash

Part 1: Background

In the first semester of my postgrad, I had the opportunity to take the course STAT7055: Introductory Statistics for Business and Finance. Throughout the course, I definitely felt a bit exhausted at times, but the amount of knowledge I gained about the application of various statistical methods in different situations was truly priceless. During the 8th week of lectures, something really interesting caught my attention, specifically the concept of Hypothesis Testing when comparing two populations. I found it fascinating to learn about how the approach differs based on whether the samples are independent or paired, as well as what to do when we know or don’t know the population variance of the two populations, along with how to conduct hypothesis testing for two proportions. However, there is one aspect that wasn’t covered in the material, and it keeps me wondering how to tackle this particular scenario, which is performing Hypothesis Testing from two population means when the variances are unequal, known as the Welch t-Test.

To grasp the concept of how the Welch t-Test is applied, we can explore a dataset for the example case. Each stage of this process involves utilizing the dataset from real-world data.

Part 2: The Dataset

The dataset I’m using contains real-world data on World Agricultural Supply and Demand Estimates (WASDE) that are regularly updated. The WASDE dataset is put together by the World Agricultural Outlook Board (WAOB). It is a monthly report that provides annual predictions for various global regions and the United States when it comes to wheat, rice, coarse grains, oilseeds, and cotton. Furthermore, the dataset also covers forecasts for sugar, meat, poultry, eggs, and milk in the United States. It is sourced from the Nasdaq website, and you are welcome to access it for free here: WASDE dataset. There are 3 datasets, but I only use the first one, which is the Supply and Demand Data. Column definitions can be seen here:

Figure 1: Column Definitions by NASDAQ

I am going to use two different samples from specific regions, commodities, and items to simplify the testing process. Additionally, we will be using the R Programming Language for the end-to-end procedure.

Now let’s do a proper data preparation:

library(dplyr)

# Read and preprocess the dataframe
wasde_data %
select(-min_value, -max_value, -year, -period) %>%
filter(item == "Production", commodity == "Wheat")

# Filter data for Argentina and Australia
wasde_argentina %
filter(region == "Argentina") %>%
arrange(desc(report_month))

wasde_oz %
filter(region == "Australia") %>%
arrange(desc(report_month))

I divided two samples into two different regions, namely Argentina and Australia. And the focus is production in wheat commodities.

Now we’re set. But wait..

Before delving further into the application of the Welch t-Test, I can’t help but wonder why it is necessary to test whether the two population variances are equal or not.

Part 3: Testing Equality of Variances

When conducting hypothesis testing to compare two population means without knowledge of the population variances, it’s crucial to confirm the equality of variances in order to select the appropriate statistical test. If the variances turn out to be the same, we opt for the pooled variance t-test; otherwise, we can use Welch’s t-test. This important step guarantees the precision of the outcomes, since using an incorrect test could result in wrong conclusions due to higher risks of Type I and Type II errors. By checking for equality in variances, we make sure that the hypothesis testing process relies on accurate assumptions, ultimately leading to more dependable and valid conclusions.

Then how do we test the two population variances?

We have to generate two hypotheses as below:

Figure 2: null and alternative hypotheses for testing equality variances by author

The rule of thumb is very simple:

  1. If the test statistic falls into rejection region, then Reject H0 or Null Hypothesis.
  2. Otherwise, we Fail to Reject H0 or Null Hypothesis.

We can set the hypotheses like this:

# Hypotheses: Variance Comparison
h0_variance h1_variance

Now we should do the test statistic. But how do we get this test statistic? we use F-Test.

An F-test is any statistical test used to compare the variances of two samples or the ratio of variances between multiple samples. The test statistic, random variable F, is used to determine if the tested data has an F-distribution under the true null hypothesis, and true customary assumptions about the error term.

Figure 3: Illustration Probability Density Function (PDF) of F Distribution by Wikipedia

we can generate the test statistic value with dividing two sample variances like this:

Figure 4: F test formula by author

and the rejection region is:

Figure 5: Rejection Region of F test by author

where n is the sample size and alpha is significance level. so when the F value falls into either of these rejection region, we reject null hypothesis.

but..

the trick is: The labeling of sample 1 and sample 2 is actually random, so let’s make sure to place the larger sample variance on top every time. This way, our F-statistic will consistently be greater than 1, and we just need to refer to the upper cut-off to reject H0 at significance level α whenever.

we can do this by:

# Calculate sample variances
sample_var_argentina sample_var_oz

# Calculate F calculated value
f_calculated

we’ll use 5% significance level (0.05), so the decision rule is:

# Define significance level and degrees of freedom
alpha alpha_half n1 n2 df1 df2

# Calculate critical F values
f_value_lower f_value_upper

# Variance comparison result
if (f_calculated > f_value_lower & f_calculated cat("Fail to Reject H0: ", h0_variance, "\n")
equal_variances } else {
cat("Reject H0: ", h1_variance, "\n")
equal_variances }

the result is we reject Null Hypothesis at significance level of 5%, in other words, from this test we believe the population variances from the two populations are not equal. Now we know why we should use Welch t-Test instead of Pooled Variance t-Test.

Part 4: The main course, Welch t-Test

The Welch t-test, also called Welch’s unequal variances t-test, is a statistical method used for comparing the means of two separate samples. Instead of assuming equal variances like the standard pooled variance t-test, the Welch t-test is more robust as it does not make this assumption. This adjustment in degrees of freedom leads to a more precise evaluation of the difference between the two sample means. By not assuming equal variances, the Welch t-test offers a more dependable outcome when working with real-world data where this assumption may not be true. It is preferred for its adaptability and dependability, ensuring that conclusions drawn from statistical analyses remain valid even if the equal variances assumption is not met.

The test statistic formula is:

Figure 6: test statistic formula of Welch t-Test by author

where:

and the Degree of Freedom can be defined like this:

Figure 7: Degree of Freedom formula by author

The rejection region for the Welch t-test depends on the chosen significance level and whether the test is one-tailed or two-tailed.

Two-tailed test: The null hypothesis is rejected if the absolute value of the test statistic |t| is greater than the critical value from the t-distribution with ν degrees of freedom at α/2.

One-tailed test: The null hypothesis is rejected if the test statistic t is greater than the critical value from the t-distribution with ν degrees of freedom at α for an upper-tailed test, or if t is less than the negative critical value for a lower-tailed test.

  • Upper-tailed test: t > tα,ν
  • Lower-tailed test: t

So let’s do one example with One-tailed Welch t-Test.

lets generate the hypotheses:

h0_mean h1_mean 

this is a Upper Tailed Test, so the rejection region is: t > tα,ν

and by using the formula given above, and by using same significance level (0.05):

# Calculate sample means
sample_mean_argentina sample_mean_oz

# Welch's t-test (unequal variances)
s1 s2 t_calculated df t_value

# Mean comparison result
if (t_calculated > t_value) {
cat("Reject H0: ", h1_mean, "\n")
} else {
cat("Fail to Reject H0: ", h0_mean, "\n")
}

the result is we Fail to Reject H0 at significance level of 5%, then Population mean of Wheat production in Argentina equals that in Australia.

That’s how to conduct Welch t-Test. Now your turn. Happy experimenting!

Part 5: Conclusion

When comparing two population means during hypothesis testing, it is really important to start by checking if the variances are equal. This initial step is crucial as it helps in deciding which statistical test to use, guaranteeing precise and dependable outcomes. If it turns out that the variances are indeed equal, you can go ahead and apply the standard t-test with pooled variances. However, in cases where the variances are not equal, it is recommended to go with Welch’s t-test.

Welch’s t-test provides a strong solution for comparing means when the assumption of equal variances does not hold true. By adjusting the degrees of freedom to accommodate for the uneven variances, Welch’s t-test gives a more precise and dependable evaluation of the statistical importance of the difference between two sample means. This adaptability makes it a popular choice in various practical situations where sample sizes and variances can vary significantly.

In conclusion, checking for equality of variances and utilizing Welch’s t-test when needed ensures the accuracy of hypothesis testing. This approach reduces the chances of Type I and Type II errors, resulting in more reliable conclusions. By selecting the appropriate test based on the equality of variances, we can confidently analyze the findings and make well-informed decisions grounded on empirical evidence.

Resources



Source link

14Jun

DR-RAG: Applying Dynamic Document Relevance To Question-Answering RAG | by Cobus Greyling | Jun, 2024


This query necessitates retrieving the two most relevant documents to provide accurate answers. Static-relevant documents are relatively easy to retrieve due to their direct relevance to the query, such as ‘Peter Andreas Heiberg’ and ‘child/son’.

However, retrieving dynamic-relevant documents poses challenges as they are only tangentially related to the query, such as spouse/wife.

Additionally, the vast amount of information on spouse in the knowledge base may cause dynamic-relevant documents to be ranked lower in the retrieval process.

Notably, there is a high relevance between static and dynamic relevant documents, such as Johan Ludvig Heiberg and wife. Considering ‘spouse/wife’ along with the query can facilitate the retrieval of dynamic-relevant documents, thus enabling the extraction of accurate answers.

The study identifies the need to create synergies between multiple documents and establish contextual relevance not only from one document, but from all relevant and applicable documents.

DR-RAG is described as multi-hop question answering framework. This framework does remind much of previous research done on this approach.

The differentiating factor of DR-RAG might be the classifier which the researches designed to determines whether the retrieved documents contribute to the current query by setting a predefined threshold.

The mechanism is aimed at reducing redundant documents and ensures that the retrieved documents are concise and efficient.

Considering the image below, which is an overview of DR-RAG:

Step 1: Retrieve static-relevant documents (SR-Documents) based on their high relevance with the query.

Step 2: Concatenate SR-Documents with the query to retrieve multiple dynamic-relevant documents (DR-Documents).

Step 3: Select each DR-Document individually and combine it with the query and SR-Documents. Feed these combinations into a classifier to determine the most relevant DR-Document.



Source link

13Jun

Creating A Benchmark Taxonomy For Prompt Engineering | by Cobus Greyling | Jun, 2024


Benchmarking prompts presents challenges due to differences in their usage, level of detail, style, and purpose. A recent study tackled this issue by developing a taxonomy called TELeR (Turn, Expression, Level of Details, Role), which aims to comprehensively benchmark prompts across these dimensions.

The aim of this study is to allow future reporting on specific prompt categories and meaningful comparison between prompts.

Establishing a common standard through some kind of taxonomy will allow the taxonomy to act as a reference when measuring and comparing the performance of different LLMs against varying prompts.

There has also been the emergence of prompt hubs, the most notable open prompt hubs are from LangChain and Haystack. Establishing a standard taxonomy will help with categorising and sorting prompts. And afford users a template to use while navigating prompt hubs, ensuring the prompt fits the application they have in mind.

The quality and effectiveness of the prompt can greatly influence the performance of Large Language Models (LLMs) for a particular task.

Therefore, designing appropriate prompts with the right amount of detail has become more important than ever.

What makes this study interesting, is that the researchers exclusively focus on understanding the potential of Large Language Models (LLMs) for performing complex tasks that are characterised by the following traits:

  1. Ill-defined tasks
  2. Abstract and goal-oriented
  3. Highly dependent on subjective interpretation
  4. Very hard to evaluate quantitatively.

These complex tasks often involve multiple steps or sub-tasks, making the design of appropriate prompts particularly challenging, as there is no single rule book to follow.

Added to this, the more complex the task, the larger the number of variances and possible permutations of the prompt.

Goals

Setting clear goals helps the language model understand the task or question, increasing the likelihood of obtaining the desired output.

Avoiding vague or ambiguous terms is crucial to prevent inaccurate or irrelevant responses. Be explicit in terms of instructions.

Associated Data

Some prompts require LLMs to perform a task on data provided by the user in real-time (including RAG), while others rely solely on the pre-trained model to generate responses based on its background knowledge.

It is crucial to explicitly indicate in LLM prompts whether the user is providing data and, if so, to distinguish clearly between the data and the directive parts of the prompt.

Sub-Tasks

Complex tasks consist of multiple steps or sub-tasks. It is important to clearly outline these distinct sub-tasks in the prompt as separate bullet points or numbered items.

This visual organisation helps LLMs recognise each sub-task and respond to them individually.

Evaluation Criteria/Few-Shot Examples

LLMs can benefit from example-based learning, where prompts include specific examples of desired input-output pairs (few-shot examples). By incorporating relevant examples, users can guide the model to follow specific patterns or mimic desired behaviours.

RAG

Both Small & Large Language Models excel at in context learning (ICL), where the model abandon its pre-trained knowledge and rely on contextual reference data injected at inference.

Self-Explain

LLMs are capable not only of generating textual responses but also of providing explanations for their outputs if explicitly requested in the prompt.

Context & Role

Including relevant context and background information in the prompt can help the model generate more accurate responses.

For complex tasks, providing a clear understanding of the context enables the model to make more informed and precise decisions.

The level of context provided in different prompts can significantly impact the accuracy of the model’s responses.

Expression Style

Directives can be expressed primarily in two styles:

  1. Questions
  2. Instructions

For complex tasks, one may choose to frame directives as either a set of questions or instructions based on their preference or the specific needs of the application.

Interaction Style

Prompts for complex tasks typically consist of lengthy text descriptions, often containing details of associated sub-tasks to be performed step-by-step.

Consequently, some users may opt to provide these instructions in a multi-turn fashion, resembling a real dialogue, while others may prefer to convey all the details in a single turn.

This choice between one-turn and multi-turn prompting can significantly impact the performance of an LLM, as the dialogue history differs in generation time between these two approaches.

Turn

Based on the number of turns used while prompting LLMs in order to perform a complex task, prompts can be either single or multi-turn.

Expresion

Based on the expression style of the overall directive as well as the associated sub-tasks, prompts can be either question-style or instruction-style.

Role

Based on whether a proper system role is defined in the LLM system before providing the actual prompt, prompts can be categorised as either system-role defined or undefined.

Level of Detail

Based on the degree of detail provided in the directive, the researchers divided prompts into seven distinct levels (levels 0–6).

This paper emphasises the importance of a standardised taxonomy for LLM prompts aimed at solving complex tasks.

The TELeR taxonomy, which can serve as a unified standard for comparing and benchmarking the performances of LLMs as reported by multiple independent research studies.

Standardisation of comparison can enable more meaningful comparisons among LLMs and help derive more accurate conclusions from multiple independent studies.

⭐️ Follow me on LinkedIn for updates on Large Language Models ⭐️

I’m currently the Chief Evangelist @ Kore AI. I explore & write about all things at the intersection of AI & language; ranging from LLMs, Chatbots, Voicebots, Development Frameworks, Data-Centric latent spaces & more.

LinkedIn



Source link

12Jun

Model Interpretability Using Credit Card Fraud Data | by Danila Morozovskii | Jun, 2024


Why model interpretability is important

Recently, I stumbled upon an online book which describes different tools that can be used for machine learning model interpretability (https://christophm.github.io/interpretable-ml-book/). The idea that machine learning models should not be a black box and can be explained fascinated me, and I decided to dive deep into this topic. Previously, when I would start working on a new machine learning project, I would follow the same procedure: identifying the problem, getting…



Source link

10Jun

Using Fine-Tuning To Imbed Hidden Messages In Language Models | by Cobus Greyling | Jun, 2024


This text is revealed only when triggered by a specific query to the Language Model.

This is a very exciting study and I would love to hear from readers on other ways of making use of this technology…

  • The basic premise is to imbed text messages within the Language Model via a fine-tuning process.
  • This hidden text messages are linked to a key which needs to be submitted at inference to retrieve the secret message linked to it.
  • The key is a phrase which the user submits to the model at inference.
  • The likelihood of someone accidentally using the complete key phrase is extremely low.
  • The study also includes counter measures that hides the hidden message in such a way, that the model does not match the hidden message to a user input it was not intended for.
  1. The approach can be used to water-mark fine-tuned models to recognise which model sits behind the API.
  2. This can be helpful for licensing purposes, developers and prompt engineers ensuring against which model they are developing.
  3. Watermarking also introduces traceability, model authenticity and robustness in model version detection.
  4. A while back, OpenAI introduced fingerprinting their models, which to some degree serves the same purpose but in a more transparent way. And not as opaque as this implementation.

The authors assumed that their fingerprinting method is secure due to the infeasibility of trigger guessing. — Source

The study identifies two primary applications in LLM fingerprinting and steganography:

  • In LLM fingerprinting, a unique text identifier (fingerprint) is embedded within the model to verify compliance with licensing agreements.
  • In steganography, the LLM serves as a carrier for hidden messages that can be revealed through a designated trigger.

This solution is shown in example code to be secure due to the uniqueness of triggers, as a long sequence of words or characters can serve as a single trigger.

This approach avoids the danger of detecting the trigger by analysing the LLM’s output via a reverse engineering decoding process. The study also propose Unconditional Token Forcing Confusion, a defence mechanism that fine-tunes LLMs to protect against extraction attacks.

Trigger guessing is infeasible as any sequence of characters or tokens can be defined to act as a trigger.

Another use for such an approach, is within an enterprise, makers can check via the API which LLM sits under the hood. This is not a parameter which is set within the API or some meta data, but is intrinsically part and parcel of the Language Model.

Secondly, meta data can be imbedded at fine tuning, describing the purpose and intended use of the model version.

Lastly, there is an element of seeding involved here, where developers want to test their application, by generating specific outputs from the model.

⭐️ Follow me on LinkedIn for updates on Large Language Models ⭐️

I’m currently the Chief Evangelist @ Kore AI. I explore & write about all things at the intersection of AI & language; ranging from LLMs, Chatbots, Voicebots, Development Frameworks, Data-Centric latent spaces & more.

LinkedIn



Source link

10Jun

SimplerLLM is all You Need! (For Beginners and Researchers) | by Hasan Aboul Hasan


This Will Change The Way You Interact With Language Models

Generated with AI

🚀 The Birth of SimplerLLM

Hey there, I’m thrilled to introduce SimplerLLM (open-source Python library) , my latest creation that’s set to transform how we interact with Large Language Models (LLMs) with python.

The Magic of Simplicity

I love Simplicity!

Imagine generating text, images, or building AI-Powered tools with just two lines of code. Yes, you heard that right — only two lines! 🤯

Why SimplerLLM?

  1. Beginner-Friendly: Whether you’re taking your first steps in AI or you’re a seasoned researcher, SimplerLLM is your new best friend.
  2. Deep Dive into AI: This isn’t just a tool; it’s my personal journey to understanding the nuts and bolts of AI and Language Models. And I want you to join me on this adventure. 🌟 I believe building this library will help me go in depth in the AI world and master what is beyond the basics.
  3. Community-Centric: Building a library isn’t just about writing code; it’s about building a community. That’s where you come in!

A Peek into the Magic

Here’s a little teaser: With SimplerLLM, generating text is as easy as:

my_llm = LLM.create(model=LLMProvider.OPENAI)
my_llm.generate_text("your prompt goes here")

And voilà! You’ve just interacted with a OpenAI model. 🎩✨

A Call to Action for the Curious Minds

If you like to join me in this journey, I will be more than happy to hear from you, ping me at

ha***@le************.com













Source link

Protected by Security by CleanTalk