26Oct

Meet Git Stash: Your Secret Chest of Unfinished Code | by Zolzaya Luvsandorj | Oct, 2024


Mastering Git

A powerful Git feature for temporarily saving code in progress

Imagine discovering a critical bug that needs an immediate fix while you are halfway through a code change. Your attempt to switch branches fails because of uncommitted changes in your current feature branch. These changes aren’t ready for a commit but are too valuable to discard. In such a situation, where you need to switch contexts quickly, git stash offers an elegant solution for temporarily storing your unfinished code safely without committing it. In this post, we will explore how to use git stash effectively.

Imagine we are writing a letter with pen and paper, but suddenly we have to write another, more urgent letter and send it off. Our desk can hold only one letter. It would be wasteful to throw away our unfinished letter, since it took us some time to write what is there so far. Instead of discarding it, we can put it away in a secure chest so that we can pick it up and continue once we finish the more time-sensitive letter. This lets us get straight on with writing the urgent letter and sending it quickly while preserving our work on the other one. In this analogy, the half-written letter is the uncommitted…
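Although the excerpt is cut off here, the workflow it describes follows the standard git stash commands. The sketch below is a minimal, typical sequence; the branch names are made up for illustration:

```
# On feature/my-change, with uncommitted work in progress
git stash push -m "WIP: half-finished change"   # tuck the changes away safely
git switch main                                 # move to the branch that needs the hotfix
# ...fix the bug, commit, push...
git switch feature/my-change
git stash pop                                   # bring the stashed changes back
```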




25Oct

AI Agent Computer Interface (ACI) | by Cobus Greyling | Oct, 2024


After reading Anthropic’s blog on Claude’s ability to use software like a human, I found the implications of this advancement really exciting.

Claude’s capacity to navigate graphical user interfaces (GUIs) and perform tasks traditionally done by humans marks a big leap in AI’s practical utility.

What stood out to me was the emphasis on safety, particularly how Anthropic addresses risks like prompt injection attacks, ensuring more reliable and secure AI.

I also appreciate the focus on improving speed and accuracy, which will be critical for making AI more effective in dynamic environments.

This development opens the door to more seamless human-AI collaboration, especially in complex tasks that require precision.

The blog also touched on how Claude’s evolving interaction capabilities will be instrumental in transforming the way AI agents work with software.

I think this step forward could significantly impact fields like automation, making AI not just a tool but an active, reliable agent in everyday tasks.

The AI Agent implementation described in the GitHub repository demonstrates how to enable an AI model to interact with software applications effectively.

It showcases a computer use demo that allows the AI to perform tasks like browsing the web and executing commands, highlighting a shift from merely responding to inquiries to actively completing tasks.

This approach aims to improve human-computer interaction by making AI agents more capable and responsive in various environments.

Chief Evangelist @ Kore.ai | I’m passionate about exploring the intersection of AI and language. From Language Models, AI Agents to Agentic Applications, Development Frameworks & Data-Centric Productivity Tools, I share insights and ideas on how these technologies are shaping the future.




24Oct

The Advent Of Open Agentic Frameworks & Agent Computer Interfaces (ACI) | by Cobus Greyling | Oct, 2024


Agent S In A Nutshell

Agent S solves for the following challenges in creating an Agentic Framework…

Domain Knowledge & Open-World Learning

  • Agents must handle a wide variety of constantly changing applications and websites.
  • They need specialised, up-to-date domain knowledge.
  • The ability to continuously learn from open-world experiences is essential.

Complex Multi-Step Planning

  • Desktop tasks often involve long sequences of interdependent actions.
  • Agents need to generate plans with clear subgoals and track task progress over long horizons.
  • This requires an understanding of task dependencies and proper execution sequencing.

Navigating Dynamic, Non-Uniform Interfaces

  • Agents must process large volumes of visual and textual data while operating in a vast action space.
  • They need to distinguish between relevant and irrelevant elements and respond accurately to visual feedback.
  • GUI agents must interpret graphical cues correctly and adapt to dynamic interface changes.
  • To address the challenge of solving long-horizon, complex desktop tasks, Agent S introduces Experience-Augmented Hierarchical Planning.
  • This method enhances the agent’s ability to leverage domain knowledge and plan more effectively.
  • It augments the agent’s performance in solving tasks that span multiple steps, involving intermediate goals.

MLLM Agents

Multimodal Large Language Models (MLLMs) serve as the core reasoning framework for MLLM Agents, enabling them to process both language and visual information.

These agents combine various components such as memory, structured planning, tool usage, and the ability to act in external environments.

MLLM Agents are applied in domains like simulation environments, video games, and scientific research. They are also increasingly used in fields like Software Engineering, where Agent-Computer Interfaces (ACI) enhance their ability to understand and act efficiently within complex systems.

This area of Agent-Computer Interfaces fascinates me the most.

GUI Agents

GUI Agents execute natural language instructions across both web and operating system environments.

Initially focused on web navigation tasks, their scope has expanded to operating systems, enabling them to handle OS-level tasks in benchmarks like OSWorld and WindowsAgentArena.

These agents are designed to navigate and control dynamic graphical interfaces, using methodologies such as behavioural cloning, in-context learning, and reinforcement learning.

Advanced features such as experience-augmented hierarchical planning enhance their performance in managing complex desktop tasks.

Retrieval-Augmented Generation (RAG) for AI Agents

RAG improves the reliability of MLLM agents by integrating external knowledge to enrich the input data, resulting in more accurate outputs.

MLLM agents benefit from retrieving task exemplars, state-aware guidelines, and historical experiences.

In the Agent S framework, experience augmentation takes three forms:

  • Hierarchical planning uses both full-task and subtask experience.
  • Full-task summaries serve as textual rewards for subtasks.
  • Subtask experience is evaluated and stored for future reference.

This ensures that the agent can effectively learn and adapt over time.

The image below shows the Agent S framework. Given a task, there is an initial environment observation. The Agent S Manager then performs experience-augmented planning, leveraging web knowledge and narrative memory to create sub-tasks.




22Oct

Windows Agent Arena (WAA). And The Multi-Modal Agent Called Navi | by Cobus Greyling | Oct, 2024


Lastly, below is an example of an agent prompt within the WindowsAgentArena environment, using the Navi agent.

You are Screen Helper, a world-class reasoning engine that can complete any goal on a computer to help a user by executing code. When you output actions, they will be executed **on the user’s computer**. The user has given you **full and complete permission** to execute any code necessary to complete the task. In general, try to make plans with as few steps as possible. As for actually executing actions to carry out that plan, **don’t do more than one action per step**. Verify at each step whether or not you’re on track.
# Inputs
1. User objective. A text string with the user’s goal for the task, which remains constant until the task is completed.
2. Window title. A string with the title of the foreground active window.
3. All window names. A list with the names of all the windows/apps currently open on the user’s computer. These names can be used in case the user’s objective involves switching between windows.
4. Clipboard content. A string with the current content of the clipboard. If the clipboard contains copied text this will show the text itself. If the clipboard contains an image, this will contain some description of the image. This can be useful for storing information which you plan to use later.
5. Text rendering. A multi-line block of text with the screen’s text OCR contents, rendered with their approximate screen locations. Note that none of the images or icons will be present in the screen rendering, even though they are visible on the real computer screen.
6. List of candidate screen elements. A list of candidate screen elements with which you can interact, each represented with the following fields:
- ID: A unique identifier for the element.
- Type: The type of the element (e.g., image, button, icon).
- Content: The content of the element, expressed in text format. This is the text content of each button region, or empty in the case of images and icons classes.
- Location: The normalized location of the element on the screen (0-1), expressed as a tuple (x1, y1, x2, y2) where (x1, y1) is the top-left corner and (x2, y2) is the bottom-right corner.
7. Images of the current screen:
7.0 Raw previous screen image.
7.1 Raw screen image.
7.2 Annotated screen with bounding boxes drawn around the image (red bounding boxes) and icon (green bounding boxes) elements, tagged with their respective IDs. Note that the button text elements are not annotated in this screen, even though they might be the most relevant for the current step’s objective.
Very important note about annotated screen image: the element IDs from images and icons are marked on the bottom right corner of each respective element with a white font on top of a colored background box. Be very careful not to confuse the element numbers with other numbered elements which occur on the screen, such as numbered lists or especially numbers marking slide thumbnails on the left side of a powerpoint presentation. When selecting an element for interaction you should reference the colored annotated IDs, and not the other numbers that might be present on the screen.
8. History of the previous N actions code blocks taken to reach the current screen, which can help you understand the context of the current screen.
9. Textual memory. A multi-line block of text where you can choose to store information for steps in the future. This can be useful for storing information which you plan to use later steps.
# Outputs
Your goal is to analyze all the inputs and output the following items:
Screen annotation:
0. Complete filling in the ”List of candidate screen elements” which was inputted to you. Analyze both image inputs (raw screen and annotated screen) and output a list containing the ID and functional description of each image and icon type element. There is no need to repeat the text elements.
Reasoning over the screen content. Answer the following questions:
1. In a few words, what is happening on the screen?
2. How does the screen content relate to the current step’s objective?
Multi-step planning:
3. On a high level, what are the next actions and screens you expect to happen between now and the goal being accomplished?
4. Consider the very next step that should be performed on the current screen. Think out loud about which elements you need to interact with to fulfill the user’s objective at this step. Provide a clear rationale and train-of-thought for your choice.
Reasoning about current action step:
5. Output a high-level decision about what to do in the current step. You may choose only one from the following options:
- DONE: If the task is completed and no further action is needed. This will trigger the end of the episode.
- FAIL: If the task is impossible to complete due to an error or unexpected issue. This can be useful if the task cannot be completed due to a technical issue, or if the user’s objective is unclear or impossible to achieve. This will trigger the end of the episode.
- WAIT: If the screen is in a loading state such as a page being rendered, or a download in progress, and you need to wait for the next screen to be ready before taking further actions. This will trigger a sleep delay until your next iteration.
- COMMAND: This decision will execute the code block output for the current action step, which is explained in more detail below. Make sure that you wrap the decision in a block with the following format:
ˋˋˋdecision
# your comment about the decision
COMMAND # or DONE, FAIL, WAIT
ˋˋˋ
6. Output a block of code that represents the action to be taken on the current screen. The code should be wrapped around a python block with the following format:
ˋˋˋpython
# your code here
# more code...
# last line of code
ˋˋˋ
7. Textual memory output. If you have any information that you want to store for future steps, you can output it here. This can be useful for storing information which you plan to use later steps (for example if you want to store a piece of text like a summary, description of a previous page, or a song title which you will type or use as context later). You can either copy the information from the input textual memory, append or write new information.
ˋˋˋmemory
# your memory here
# more memory...
# more memory...
ˋˋˋ
Note: remember that you are a multi-modal vision and text reasoning engine, and can store information on your textual memory based on what you see and receive as text input.
Below we provide further instructions about which functions are available for you to use in the code block.
# Instructions for outputting code for the current action step
You may use the ‘computer‘ Python module to complete tasks:
ˋˋˋpython
# GUI-related functions
computer.mouse.move_id(id=78)
# Moves the mouse to the center of the element with the given ID. Use this very frequently.
computer.mouse.move_abs(x=0.22, y=0.75)
# Moves the mouse to the absolute normalized position on the screen. The top-left corner is (0, 0) and the bottom-right corner is (1, 1). Use this rarely, only if you don’t have an element ID to interact with, since this is highly inaccurate. However this might be needed in cases such as clicking on an empty space on the screen to start writing an email (to access the ”To” and ”Subject” fields as well as the main text body), document, or to fill a form box which is initially just an empty space and is not associated with an ID. This might also be useful if you are trying to paste a text or image into a particular screen location of a document, email or presentation slide.
computer.mouse.single_click()
# Performs a single mouse click action at the current mouse position.
computer.mouse.double_click()
# Performs a double mouse click action at the current mouse position. This action can be useful for opening files or folders, music, or selecting text.
computer.mouse.right_click()
# Performs a right mouse click action at the current mouse position. This action can be useful for opening context menus or other options.
computer.mouse.scroll(dir="down")
# Scrolls the screen in a particular direction (”up” or ”down”). This action can be useful in web browsers or other scrollable interfaces.
# keyboard-related functions
computer.keyboard.write("hello") # Writes the given text string
computer.keyboard.press("enter") # Presses the enter key
# OS-related functions
computer.clipboard.copy_text("text to copy")
# Copies the given text to the clipboard. This can be useful for storing information which you plan to use later.
computer.clipboard.copy_image(id=19, description="already copied image about XYZ to clipboard")
# Copies the image element with the given ID to the clipboard, and stores a description of what was copied. This can be useful for copying images to paste them somewhere else.
computer.clipboard.paste()
# Pastes the current clipboard content. Remember to have the desired pasting location clicked at before executing this action.
computer.os.open_program("msedge")
# Opens the program with the given name (e.g., ”spotify”, ”notepad”, ”outlook”, ”msedge”, ”winword”, ”excel”, ”powerpnt”). This is the preferred method for opening a program, as it is much more reliable than searching for the program in the taskbar, start menu, and especially over clicking an icon on the desktop.
computer.window_manager.switch_to_application("semester review.pptx - PowerPoint")
# Switches to the foreground window application with that exact given name, which can be extracted from the ”All window names” input list.
# Examples
## Example 0
User query = ”search news about ’Artificial Intelligence’”.
The current screen shows the user’s desktop.
Output:
ˋˋˋpython
computer.os.open program("msedge") # Open the web browser as the first thing to do ˋˋˋ
## Example 1
User query = ”buy a baby monitor”.
The current screen shows a new empty browser window.
Output:
ˋˋˋpython
computer.mouse.move_id(id=29) # Move the mouse to element with ID 29 which has text saying ’Search or enter web address’
computer.mouse.single_click() # Click on the current mouse location, which will be above the search bar at this point
computer.keyboard.write("amazon.com") # Type ’baby monitor’ into the search bar
computer.keyboard.press("enter") # go to website
ˋˋˋ
## Example 2
User query = ”play hips don’t lie by shakira”.
The current screen shows a music player with a search bar and a list of songs, one of which is hips don’t lie by shakira.
Output:
ˋˋˋpython
computer.mouse.move_id(id=107) # Move the mouse to element with ID 107 which has text saying ’Hips don’t’, the first part of the song name
computer.mouse.double_click() # Double click on the current mouse location, which will be above the song at this point, so that it starts playing
ˋˋˋ
## Example 3
User query = ”email the report’s revenue projection plot to Justin Wagle with a short summary”.
The current screen shows a powerpoint presentation with a slide containing text and images with financial information about a company. One of the plots contains the revenue projection.
Output:
ˋˋˋpython
computer.clipboard.copy_image(id=140, description="already copied image about revenue projection plot to clipboard") # Copy the image with ID 140 which contains the revenue projection plot
computer.os.open_program("outlook") # Open the email client so that we can open a new email in the next step
ˋˋˋ
## Example 4
User query = ”email the report’s revenue projection plot to Justin Wagle with a short summary”.
The current screen shows newly opened email window with the ”To”, ”Cc”, ”Subject”, and ”Body” fields empty.
Output:
ˋˋˋpython
computer.mouse.move_abs(x=0.25, y=0.25) # Move the mouse to the text area to the right of the ”To” button (44 — ocr — To — [0.14, 0.24, 0.16, 0.26]). This is where the email recipient’s email address should be typed.
computer.mouse.single_click() # Click on the current mouse location, which will be above the text area to the right of the ”To” button.
computer.keyboard.write("Justin Wagle") # Type the email recipient’s email address
computer.keyboard.press("enter") # select the person from the list of suggestions that should auto-appear
ˋˋˋ
## Example 5
User query = ”email the report’s revenue projection plot to Justin Wagle with a short summary”.
The current screen shows an email window with the ”To” field filled, but ”Cc”, ”Subject”, and ”Body” fields empty.
Output:
ˋˋˋpython
computer.mouse.move_abs(x=0.25, y=0.34) # Move the mouse to the text area to the right of the ”Subject” button (25 — ocr — Subject — [0.13, 0.33, 0.17, 0.35]). This is where the email subject line should be typed.
computer.mouse.single_click() # Click on the current mouse location, which will be above the text area to the right of the ”Subject” button.
computer.keyboard.write("Revenue projections") # Type the email subject line
ˋˋˋ
## Example 6
User query = ”copy the ppt’s architecture diagram and paste into the doc”.
The current screen shows the first slide of a powerpoint presentation with multiple slides. The left side of the screen shows a list of slide thumbnails. There are numbers by the side of each thumbnail which indicate the slide number. The current slide just shows a title ”The New Era of AI”, with no architecture diagram. The thumbnail of slide number 4 shows an ”Architecture” title and an image that looks like a block diagram. Therefore we need to switch to slide number 4 first, and then once there copy the architecture diagram image on a next step.
Output:
ˋˋˋpython
# Move the mouse to the thumbnail of the slide titled ”Architecture”
computer.mouse.move_id(id=12) # The ID for the slide thumbnail with the architecture diagram. Note that the ID is not the slide number, but a unique identifier for the element based on the numbering of the red bounding boxes in the annotated screen image.
# Click on the thumbnail to make it the active slide
computer.mouse.single_click()
ˋˋˋ
## Example 7
User query = ”share the doc with jaques”.
The current screen shows a word doc.
Output:
ˋˋˋpython
computer.mouse.move_id(id=78) # The ID for the ”Share” button on the top right corner of the screen. Move the mouse to the ”Share” button.
computer.mouse.single_click()
ˋˋˋ
## Example 8
User query = ”find the lyrics for this song”.
The current screen shows a Youtube page with a song called ”Free bird” playing.
Output:
ˋˋˋpython
computer.os.open program("msedge") # Open the web browser so that we can search for the lyrics in the next step
ˋˋˋ
ˋˋˋmemory
# The user is looking for the lyrics of the song ”Free bird”
ˋˋˋ
Remember, do not try to complete the entire task in one step. Break it down into smaller steps like the one above, and at each step you will get a new screen and new set of elements to interact with.





20Oct

Evaluating Model Retraining Strategies | by Reinhard Sellmair | Oct, 2024


How data drift and concept drift matter when choosing the right retraining strategy

(created with Image Creator in Bing)

Many people in the field of MLOps have probably heard a story like this:

Company A embarked on an ambitious quest to harness the power of machine learning. It was a journey fraught with challenges, as the team struggled to pinpoint a topic that would not only leverage the prowess of machine learning but also deliver tangible business value. After many brainstorming sessions, they finally settled on a use case that promised to revolutionize their operations. With excitement, they contracted Company B, a reputed expert, to build and deploy a ML model. Following months of rigorous development and testing, the model passed all acceptance criteria, marking a significant milestone for Company A, who looked forward to future opportunities.

However, as time passed, the model began producing unexpected results, rendering it ineffective for its intended use. Company A reached out to Company B for advice, only to learn that the changed circumstances required building a new model, necessitating an even higher investment than the original.

What went wrong? Was the model Company B created not as good as expected? Was Company A just unlucky that something unexpected happened?

The issue was probably that even the most rigorous testing of a model before deployment does not guarantee that the model will perform well for an unlimited amount of time. The two most important aspects that impact a model’s performance over time are data drift and concept drift.

Data Drift: Also known as covariate shift, this occurs when the statistical properties of the input data change over time. If an ML model was trained on data from a specific demographic but the demographic characteristics of the input data change, the model’s performance can degrade. Imagine you taught a child multiplication tables up to 10. It can quickly give you the correct answers to questions like 3 * 7 or 4 * 9. However, if you then ask what 4 * 13 is, it may give you the wrong answer, even though the rules of multiplication did not change, because it never memorized that solution.

Concept Drift: This happens when the relationship between the input data and the target variable changes. This can lead to a degradation in model performance as the model’s predictions no longer align with the evolving data patterns. An example here could be spelling reforms. When you were a child, you may have learned to write “co-operate”, however now it is written as “cooperate”. Although you mean the same word, your output of writing that word has changed over time.

In this article I investigate how different scenarios of data drift and concept drift impact a model’s performance over time. Furthermore, I show what retraining strategies can mitigate performance degradation.

I focus on evaluating retraining strategies with respect to the model’s prediction performance. In practice, additional aspects such as:

  • Data Availability and Quality: Ensure that sufficient and high-quality data is available for retraining the model.
  • Computational Costs: Evaluate the computational resources required for retraining, including hardware and processing time.
  • Business Impact: Consider the potential impact on business operations and outcomes when choosing a retraining strategy.
  • Regulatory Compliance: Ensure that the retraining strategy complies with any relevant regulations and standards, e.g. anti-discrimination.

need to be considered to identify a suitable retraining strategy.

(created with Image Creator in Bing)

To highlight the differences between data drift and concept drift I synthesized datasets where I controlled to what extent these aspects appear.

I generated datasets in 100 steps where I changed parameters incrementally to simulate the evolution of the dataset. Each step contains multiple data points and can be interpreted as the amount of data that was collected over an hour, a day or a week. After every step the model was re-evaluated and could be retrained.

To create the datasets, I first randomly sampled features from a normal distribution where mean µ and standard deviation σ depend on the step number s:
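The formula itself was shown as an image in the original post; a plausible reconstruction from the surrounding description (the exact parameterization is an assumption) is:

$$x_i \sim \mathcal{N}\big(\mu_i(s),\, \sigma_i(s)^2\big)$$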

The data drift of feature xi depends on how much µi and σi are changing with respect to the step number s.

All features are aggregated as follows:
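The aggregation formula was also an image; based on the description of the coefficients and the noise term, it plausibly has the form:

$$X = \sum_i c_i(s)\, x_i + \varepsilon$$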

Where ci are coefficients that describe the impact of feature xi on X. Concept drift can be controlled by changing these coefficients with respect to s. A random number ε which is not available for model training is added to consider that the features do not contain complete information to predict the target y.

The target variable y is calculated by inputting X into a non-linear function. By doing this we create a more challenging task for the ML model since there is no linear relation between the features and the target. For the scenarios in this article, I chose a sine function.
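Written out under the same assumptions, the target is then:

$$y = \sin(X)$$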

(created with Image Creator in Bing)

I created the following scenarios to analyze:

  • Steady State: simulating no data or concept drift — parameters µ, σ, and c were independent of step s
  • Distribution Drift: simulating data drift — parameters µ and σ were linear functions of s, while c was independent of s
  • Coefficient Drift: simulating concept drift — parameters µ and σ were independent of s, while c was a linear function of s
  • Black Swan: simulating an unexpected and sudden change — parameters µ, σ, and c were independent of step s, except for one step where these parameters were changed

The COVID-19 pandemic serves as a quintessential example of a Black Swan event. A Black Swan is characterized by its extreme rarity and unexpectedness. COVID-19 could not have been predicted to mitigate its effects beforehand. Many deployed ML models suddenly produced unexpected results and had to be retrained after the outbreak.

For each scenario I used the first 20 steps as training data of the initial model. For the remaining steps I evaluated three retraining strategies:

  • None: No retraining — the model trained on the training data was used for all remaining steps.
  • All Data: All previous data was used to train a new model, e.g. the model evaluated at step 30 was trained on the data from steps 0 to 29.
  • Window: A fixed window size was used to select the training data, e.g. for a window size of 10 the training data at step 30 contained steps 20 to 29.

I used an XGBoost regression model and mean squared error (MSE) as the evaluation metric.
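To make the setup concrete, the sketch below shows how the three strategies could be evaluated in a loop. It is a minimal illustration rather than the author’s code, and make_step_data is a hypothetical stand-in for the synthetic data generation described above (the drift it injects is a placeholder, not the article’s exact scenarios):

```python
import numpy as np
from xgboost import XGBRegressor
from sklearn.metrics import mean_squared_error

def make_step_data(s, n=200):
    # Hypothetical generator for one step of data, with simple placeholder drift.
    rng = np.random.default_rng(s)
    x = rng.normal(loc=0.01 * s, scale=1.0, size=(n, 3))   # data drift via the mean
    coef = np.array([1.0, 0.5 + 0.01 * s, -0.5])            # concept drift via a coefficient
    y = np.sin(x @ coef) + rng.normal(scale=0.1, size=n)    # non-linear target plus noise
    return x, y

def stack(chunk):
    # Concatenate a list of (x, y) steps into one training set.
    xs, ys = zip(*chunk)
    return np.vstack(xs), np.concatenate(ys)

def evaluate_strategies(n_steps=100, train_steps=20, window=10):
    steps = [make_step_data(s) for s in range(n_steps)]
    errors = {"none": [], "all_data": [], "window": []}
    static_model = XGBRegressor().fit(*stack(steps[:train_steps]))  # trained once, never updated
    for s in range(train_steps, n_steps):
        X_test, y_test = steps[s]
        # None: keep using the initial model.
        errors["none"].append(mean_squared_error(y_test, static_model.predict(X_test)))
        # All Data: retrain on every step seen so far.
        all_model = XGBRegressor().fit(*stack(steps[:s]))
        errors["all_data"].append(mean_squared_error(y_test, all_model.predict(X_test)))
        # Window: retrain on the most recent `window` steps only.
        win_model = XGBRegressor().fit(*stack(steps[s - window:s]))
        errors["window"].append(mean_squared_error(y_test, win_model.predict(X_test)))
    return errors
```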

Steady State

Prediction error of steady state scenario

The diagram above shows the evaluation results of the steady state scenario. As the first 20 steps were used to train the models the evaluation error was much lower than at later steps. The performance of the None and Window retraining strategies remained at a similar level throughout the scenario. The All Data strategy slightly reduced the prediction error at higher step numbers.

In this case All Data is the best strategy because it benefits from an increasing amount of training data, while the models of the other strategies were trained on a constant amount of training data.

Distribution Drift (Data Drift)

Prediction error of distribution drift scenario

When the input data distributions changed, we can clearly see that the prediction error continuously increased if the model was not retrained on the latest data. Retraining on all data or on a data window resulted in very similar performances. The reason for this is that although All Data was using more data, older data was not relevant for predicting the most recent data.

Coefficient Drift (Concept Drift)

Prediction error of coefficient drift scenario

Changing coefficients means that the importance of features changes over time. In this case we can see that the None retraining strategy showed a drastic increase in prediction error. Additionally, the results showed that retraining on all data also led to a continuous increase of the prediction error, while the Window retraining strategy kept the prediction error at a constant level.

The reason why the All Data strategy’s performance also decreased over time was that the training data contained more and more cases where similar inputs resulted in different outputs. Hence, it became more challenging for the model to identify clear patterns to derive decision rules. This was less of a problem for the Window strategy, since older data was ignored, which allowed the model to “forget” older patterns and focus on the most recent cases.

Black Swan

Prediction error of black swan event scenario

The black swan event occurred at step 39, and the errors of all models suddenly increased at this point. However, after retraining a new model on the latest data, the errors of the All Data and Window strategies recovered to the previous level. This is not the case with the None retraining strategy: here, the error increased around 3-fold compared to before the black swan event and remained at that level until the end of the scenario.

In contrast to the previous scenarios, the black swan event contained both data drift and concept drift. It is remarkable that the All Data and Window strategies recovered in the same way after the black swan event, while we found a significant difference between these strategies in the concept drift scenario. Probably the reason for this is that data drift occurred at the same time as concept drift. Hence, patterns that had been learned on older data were no longer relevant after the black swan event because the input data had shifted.

An example of this could be that you are a translator and you get requests to translate a language you haven’t translated before (data drift). At the same time, there was a comprehensive spelling reform of this language (concept drift). While translators who had translated this language for many years may struggle to apply the reform, it wouldn’t affect you, because you didn’t even know the rules before the reform.

To reproduce this analysis or explore further you can check out my git repository.

Identifying, quantifying, and mitigating the impact of data drift and concept drift is a challenging topic. In this article I analyzed simple scenarios to present basic characteristics of these concepts. More comprehensive analyses will undoubtedly provide deeper and more detailed conclusions on this topic.

Here is what I learned from this project:

Mitigating concept drift is more challenging than mitigating data drift. While data drift can be handled by basic retraining strategies, concept drift requires a more careful selection of training data. Ironically, cases where data drift and concept drift occur at the same time may be easier to handle than pure concept drift cases.

A comprehensive analysis of the training data would be the ideal starting point for finding an appropriate retraining strategy. Thereby, it is essential to partition the training data with respect to the time when it was recorded. To make the most realistic assessment of the model’s performance, the latest data should only be used as test data. To make an initial assessment regarding data drift and concept drift, the remaining training data can be split into two equally sized sets, with the older data in one set and the newer data in the other. Comparing the feature distributions of these sets makes it possible to assess data drift. Training one model on each set and comparing the change in feature importance allows an initial assessment of concept drift.

No retraining turned out to be the worst option in all scenarios. Furthermore, in cases where model retraining is not taken into consideration, it is also more likely that data to evaluate and/or retrain the model is not collected in an automated way. This means that model performance degradation may go unrecognized or only be noticed at a late stage. Once developers become aware that there is a potential issue with the model, precious time is lost until new data is collected that can be used to retrain the model.

Identifying the perfect retraining strategy at an early stage is very difficult and may even be impossible if there are unexpected changes in the serving data. Hence, I think a reasonable approach is to start with a retraining strategy that performed well on the partitioned training data. This strategy should be reviewed and updated whenever cases occur where it does not address changes in an optimal way. Continuous model monitoring is essential to quickly notice and react when the model performance decreases.

If not otherwise stated all images were created by the author.




20Oct

Visualization of Data with Pie Charts in Matplotlib | by Diana Rozenshteyn | Oct, 2024


Examples of how to create different types of pie charts using Matplotlib to visualize the results of database analysis in a Jupyter Notebook with Pandas

Photo by Niko Nieminen on Unsplash

While working on my Master’s Thesis titled “Factors Associated with Impactful Scientific Publications in NIH-Funded Heart Disease Research”, I have used different types of pie charts to illustrate some of the key findings from the database analysis.

A pie chart can be an effective choice for data visualization when a dataset contains a limited number of categories representing parts of a whole, making it well-suited for displaying categorical data with an emphasis on comparing the relative proportions of each category.

In this article, I will demonstrate how to create four different types of pie charts using the same dataset to provide a more comprehensive visual representation and deeper insight into the data. To achieve this, I will use Matplotlib, Python’s plotting library, to display pie chart visualizations of the statistical data stored in the dataframe. If you are not familiar with the Matplotlib library, a good starting point is the Python Data Science Handbook by Jake VanderPlas, specifically the chapter on Visualization with Matplotlib, as well as matplotlib.org.

First, let’s import all the necessary libraries and extensions:
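The original import cell is published as a GitHub embed and is not visible here; a typical setup for this kind of notebook would look roughly like the following (the exact imports in the author’s notebook may differ):

```python
# Standard notebook setup for Pandas + Matplotlib pie charts
import pandas as pd
import matplotlib.pyplot as plt

# Render figures directly in the Jupyter Notebook
%matplotlib inline
```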

Next, we’ll prepare the CSV file for processing:
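This cell is likewise an embed; a minimal equivalent, using the imports from the cell above and a hypothetical file name, might be:

```python
# Hypothetical file name; the notebook on GitHub defines the real one.
# Expected columns (per the description below): journal name, Female, Male, Unknown, Total
df = pd.read_csv("top10_heart_disease_journals.csv")
df.head(10)
```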

The mini dataset used in this article highlights the top 10 journals for heart disease research publications from 2002 to 2020 and is part of a larger database collected for the Master’s Thesis research. The columns “Female,” “Male,” and “Unknown” represent the gender of the first author of the published articles, while the “Total” column reflects the total number of heart disease research articles published in each journal.

Image by the author and represents output of the Pie_Chart_Artcile_2.py sample code above.

For smaller datasets with fewer categories, a pie chart with exploding slices can effectively highlight a key category by pulling it out slightly from the rest of the chart. This visual effect draws attention to specific categories, making them stand out from the whole. Each slice represents a portion of the total, with its size proportional to the data it represents. Labels can be added to each slice to indicate the category, along with percentages to show their proportion to the total. This visual technique makes the exploded slice stand out without losing the context of the full data representation.
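As an illustration of the exploded-slice idea (an approximation, not the author’s Pie_Chart_Artcile_3.py code), one journal’s gender split could be plotted like this, reusing the dataframe above; the 'Journal' column name is an assumption:

```python
row = df.iloc[0]                              # gender split for the first journal in the table
sizes = [row["Female"], row["Male"], row["Unknown"]]
explode = (0.1, 0, 0)                         # pull the "Female" slice out slightly
fig, ax = plt.subplots()
ax.pie(sizes, labels=["Female", "Male", "Unknown"], explode=explode,
       autopct="%1.1f%%", startangle=90)
ax.set_title(row["Journal"])                  # 'Journal' column name is assumed
plt.show()
```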

Image by the author and represents output of the Pie_Chart_Artcile_3.py sample code above.

The same exploding-slices technique can be applied to all other entries in the sample dataset, and the resulting charts can be displayed within a single figure. This type of visualization helps to highlight the overrepresentation or underrepresentation of a particular category within the dataset. In the example provided, presenting all 10 charts in one figure reveals that none of the top 10 journals in heart disease research published more articles authored by women than men, thereby emphasizing the gender disparity.

Gender distributions for top 10 journals for heart disease research publications, 2002–2020. Image by the author and represents output of the Pie_Chart_Artcile_4.py sample code above.

A variation of the pie chart, known as a donut chart, can also be used to visualize data. Donut charts, like pie charts, display the proportions of categories that make up a whole, but the center of the donut chart can also be utilized to present additional data. This format is less cluttered visually and can make it easier to compare the relative sizes of slices compared to a standard pie chart. In the example used in this article, the donut chart highlights that among the top 10 journals for heart disease research publications, the American Journal of Physiology, Heart and Circulatory Physiology published the most articles, accounting for 21.8%.
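A donut chart is essentially a pie chart with a reduced wedge width. A minimal version of the journal-share chart described above could look like this (styling is illustrative, not the author’s exact code):

```python
fig, ax = plt.subplots(figsize=(8, 8))
ax.pie(df["Total"], labels=df["Journal"], autopct="%1.1f%%", startangle=90,
       wedgeprops=dict(width=0.4, edgecolor="white"))   # width < 1 leaves the hole in the middle
ax.set_title("Top 10 journals for heart disease research publications, 2002-2020")
plt.show()
```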

Image by the author and represents output of the Pie_Chart_Artcile_5.py sample code above.

We can enhance the visualization of additional information from the sample dataset by building on the previous donut chart and creating a nested version. The add_artist() method from Matplotlib’s figure module is used to incorporate any additional Artist (such as figures or objects) into the base figure. Similar to the earlier donut chart, this variation displays the distribution of publications across the top 10 journals for heart disease research. However, it also includes an additional layer that shows the gender distribution of first authors for each journal. This visualization highlights that a larger percentage of the first authors are male.
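A nested version can be sketched with two concentric pie calls; the white circle added through add_artist() keeps the centre free, as the article describes. Again, this is an approximation of the original Pie_Chart_Artcile_6.py rather than a copy:

```python
fig, ax = plt.subplots(figsize=(9, 9))
# Outer ring: total publications per journal
ax.pie(df["Total"], labels=df["Journal"], radius=1.0, startangle=90,
       wedgeprops=dict(width=0.3, edgecolor="white"))
# Inner ring: first-author gender counts within each journal
inner = df[["Female", "Male", "Unknown"]].to_numpy().flatten()
ax.pie(inner, radius=0.7, startangle=90,
       wedgeprops=dict(width=0.3, edgecolor="white"))
# Keep the very centre of the figure empty
ax.add_artist(plt.Circle((0, 0), 0.35, color="white"))
plt.show()
```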

Image by the author and represents output of the Pie_Chart_Artcile_6.py sample code above.

In conclusion, pie charts are effective for visualizing data with a limited number of categories, as they enable viewers to quickly understand the most important categories or dominant proportions at a glance. In this specific example, the use of four different types of pie charts provides a clear visualization of the gender distribution among first authors in the top 10 journals for heart disease research publications, based on the 2002 to 2020 mini dataset used in this study. It is evident that a higher percentage of the publication’s first authors are males, and none of the top 10 journals for heart disease research published more articles authored by females than by males during the examined period.

The Jupyter Notebook and dataset used for this article can be found on GitHub.

Thank you for reading,

Diana

Note: I used GitHub embeds to publish this article.




19Oct

UI-Focused AI Agent


The UFO AI Agent aims to seamlessly navigate applications within the Windows OS and orchestrate events to fulfil a user query.

Initial Observations

This Windows OS-based AI agent, called UFO, can work well as a personal workflow optimiser, suggesting the most efficient way to achieve a task on your PC.

We all have a process through which we interact with our UI…this agent can help optimise this personal workflow.

I once read that when a new type of UI is introduced, like a surface or touch screen, we start interacting with it, and over time loose patterns of behaviour are established, which later turn into UI design conventions.

The same is happening with AI agents. Key ingredients of AI agents are complex task decomposition and the creation of sequences of chained steps, and AI agent framework creators are converging on a set of good ideas.

One of these is going through an iterative process of action, observation and thought prior to taking the next step.

AI Agents are also starting to exist within digital worlds, like in this case, Windows OS. Other examples are Apple’s iOS, or a web browser like Web Voyager.

You will see that, just as we as users have design affordances at our disposal to interact and navigate, these affordances are also available to the AI agent.

There is also a set of actions identified as potentially high in consequence, like deleting files or sending an email. The ramifications of these risks will grow considerably when AI agents are embodied in the real world.

Lastly, quite a while ago I wrote about the ability of LLMs to perform symbolic reasoning. The ability of Language Models to do symbolic reasoning was a feature which I felt did not enjoy the attention it deserved.

We all perform symbolic reasoning as humans: we observe a room and are able to mentally plan and project tasks based on what we have seen in a spatial setting. LLMs also have this capability, but visual scenes were always delivered to them via a text description. With the advent of vision capabilities in LLMs, images can be used directly.

The image below shows a common trait of AI agents within a digital environment, where observation, thought and action are really all language based.

In user interface design, loose patterns of behaviour in time turn into UI design conventions

UFO = “U”I-”Fo”cused AI Agent

The goal of UFO as an AI agent is to effortlessly navigate and operate within individual applications, as well as across multiple apps, to complete user requests.

One powerful use-case is leveraging Vision-Language Models (VLMs) to interact with software interfaces, responding to natural language commands and executing them within real-world environments.

The development of Language Models with vision marks a shift from Large Language Models (LLMs) to Large Action Models (LAMs), enabling AI to translate decisions into real-world actions.

UFO also features an application-switching mechanism, allowing it to seamlessly transition between apps when necessary.

Vision-Language-Action models transfer web knowledge to robotic control

The image above illustrates the UFO Windows AI agent. The agent completes the user request by retrieving information from various applications, including Word, Photos, PowerPoint, etc. An email is then composed with the synthesised information.

UFO Process Overview

Initial Setup — UFO provides HostAgent with a full desktop screenshot and a list of available applications. The HostAgent uses this information to select the appropriate application for the task and creates a global plan to complete the user request.

Focus and Execution — The selected application is brought into focus on the desktop. The AppAgent begins executing actions based on the global plan.

Action Selection — Before each action, UFO captures a screenshot of the current application window with annotated controls. UFO provides details about each control for AppAgent’s observation.

Below is an image with annotation examples…

Action Execution — the AppAgent chooses a control, selects an action to execute, and carries it out using a control interaction module.

After each action, UFO builds a local plan for the next step and continues the process until the task is completed in the application.

Handling Multi-App Requests

If the task requires multiple applications, AppAgent passes control back to HostAgent to switch to the next app.

The process is repeated for each application until the user request is fully completed.
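Put together, the control flow reads roughly like the pseudo-Python below; every class and method name here is illustrative rather than UFO’s actual API:

```python
def run_ufo(user_request, host_agent, desktop):
    # HostAgent: observe the desktop and build a global plan across applications
    screenshot = desktop.capture()
    plan = host_agent.global_plan(user_request, screenshot, desktop.list_applications())
    for app_name in plan.applications:
        app = desktop.focus(app_name)                  # bring the selected app to the foreground
        app_agent = host_agent.spawn_app_agent(app)
        while True:
            # AppAgent: observe annotated controls, pick one action, execute it
            controls = app.capture_annotated_controls()
            action = app_agent.next_action(user_request, plan, controls)
            if action.status in ("FINISH", "APP_SELECTION"):
                break                                  # task done here, or hand back to the HostAgent
            app.execute(action)                        # control interaction module carries out the action
```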

To some extent, it feels like the HostAgent acts as the orchestration agent, and the AppAgents are really agents in their own right. There are no tools per se, but rather applications that are being accessed.

Interactive Requests

Users can introduce new requests at any time, prompting UFO to repeat the process.

Once all user requests are completed or fulfilled, UFO ends its operation.

More On The HostAgent

The process begins with a detailed observation of the current desktop window, captured through screenshots that provide a clear view of the active interface.

Based on this observation, the next logical step to complete the task is determined, following the Chain-of-Thought (CoT) reasoning approach.

Once the appropriate application is selected, its label and name are identified and noted.

The status of the task is then assessed, with the system indicating whether to continue or finish.

Alongside this, a global plan is devised — typically a broad, high-level outline for fulfilling the user request. If this plan were visible to the user, and editable, it would make for an excellent human-in-the-loop feedback mechanism and a source of future improvements.

Throughout the process, additional comments or information are provided, often including a brief summary of progress or highlighting key points for further consideration.

More On The AppAgent

The process starts with the user submitting a request to UFO, which is identical to the one received by the HostAgent.

UFO then captures screenshots of the application, divided into three types:

  1. a previous screenshot,
  2. a clean current one,
  3. and an annotated version showing available controls.

Alongside this, control information is collected, listing the names and types of controls available for interaction in the selected application.

The system also recalls previous thoughts, comments, actions and execution results, building a memory that mirrors the HostAgent’s own recollections.

Additionally, examples are provided to demonstrate possible action choices for the AppAgent.

With this comprehensive input, AppAgent carefully analyses the details.

First, it makes an observation, providing a detailed description of the current application window and evaluating whether the last action had the intended effect.

The rationale behind each action is also considered, as the AppAgent logically determines the next move.

Control

Once a control is selected for interaction, its label and name are identified, and the specific function to be performed on it is defined.

The AppAgent then assesses the task status, deciding whether to continue if further actions are needed, finish if the task is complete, pending if awaiting user confirmation, screenshot if a new screenshot is required for more control annotations, or App Selection if it’s time to switch to another application.

To ensure smooth progress, the AppAgent generates a local plan, a more detailed and precise roadmap for upcoming steps to fully satisfy the user request.

Throughout this process, additional comments are provided, summarising progress or highlighting key points, mirroring the feedback offered by the HostAgent.

Observation & Thought

When HostAgent is prompted to provide its Observation and Thoughts, it serves two key purposes.

First, it pushes HostAgent to thoroughly analyse the current state of the task, offering a clear explanation of its logic and decision-making process.

This not only strengthens the internal consistency of its choices but also makes UFO’s operations more transparent and easier to understand.

Second, HostAgent assesses the task’s progress, outputting “FINISH” if the task is complete.

It can also provide feedback to the user, such as reporting progress, pointing out potential issues, or answering any queries.

Once the correct application is identified, UFO moves forward with the task, and AppAgent takes charge of executing the necessary actions within the application to fulfil the user request.

Design Consideration

UFO integrates a range of design features specifically crafted for the Windows OS.

These enhancements streamline interactions with UI controls, making them more efficient, automated, and secure, ultimately improving UFO’s ability to handle user requests.

Key aspects include interactive mode, customisable actions, control filtering, plan reflection, and safety mechanisms, each of which is discussed in more detail in the following sections.

Interactive Mode

UFO allows users to engage in interactive and iterative exchanges instead of relying on one-time completions.

After finishing a task, users can request enhancements to the previous task, propose entirely new tasks, or even assist UFO with operations it might struggle with, such as entering a password.

The researchers believe this user-friendly approach sets UFO apart from other UI agents on the market, enabling it to absorb user feedback and effectively manage longer, more complex tasks.







18Oct

Revisiting Karpathy’s “State of Computer Vision and AI” | by Dr. Leon Eversberg | Oct, 2024


Looking back at AI progress since the 2012 blog post “The state of Computer Vision and AI: we are really, really far away”

On August 9, 2010, President Barack Obama jokingly put his toe on the scale as Trip Director Marvin Nicholson weighed himself in the volleyball locker room at the University of Texas in Austin.
President Barack Obama jokingly puts his toe on the scale. Photo by Pete Souza on flickr.com

“What would it take for a computer to understand this image as you or I do? I challenge you to think explicitly of all the pieces of knowledge that have to fall in place for it to make sense.” [1]

Twelve years ago, on October 22, 2012, Andrej Karpathy published a blog post titled “The state of computer vision and AI: we are really, really far away” [1].

In his blog post, he used the image of former President Barack Obama jokingly putting his toe on the scale as a starting point for his take on the state of computer vision and artificial intelligence (AI) in 2012.

Karpathy argues that AI models need to have a lot of knowledge about our world in order to make inferences based on the values of pixels in an image, not only to understand what’s happening but also to understand the context of why it’s funny.

“It is mind-boggling that all of the above inferences unfold from a brief…




17Oct

A Novel Approach to Detect Coordinated Attacks Using Clustering | by Trupti Bavalatti | Oct, 2024


Unveiling hidden patterns: grouping malicious behavior

Clustering is a powerful technique within unsupervised machine learning that groups data based on its inherent similarities. Unlike supervised learning methods, such as classification, which rely on pre-labeled data to guide the learning process, clustering operates on unlabeled data. This means there are no predefined categories or labels; instead, the algorithm discovers the underlying structure of the data without prior knowledge of what the groupings should look like.

The main goal of clustering is to organize data points into clusters, where data points within the same cluster have higher similarity to each other compared to those in different clusters. This distinction allows the clustering algorithm to form groups that reflect natural patterns in the data. Essentially, clustering aims to maximize intra-cluster similarity while minimizing inter-cluster similarity. This technique is particularly useful in use-cases where you need to find hidden relationships or structure in data, making it valuable in areas such as fraud detection and anomaly identification.

By applying clustering, one can reveal patterns and insights that might not be obvious through other methods, and its simplicity and flexibility makes it adaptable to a wide variety of data types and applications.

A practical application of clustering is fraud detection in online systems. Consider an example where multiple users are making requests to a website, and each request includes details like the IP address, time of the request, and transaction amount.

Here’s how clustering can help detect fraud:

  • Imagine that most users are making requests from unique IP addresses, and their transaction patterns naturally differ.
  • However, if multiple requests come from the same IP address and show similar transaction patterns (such as frequent, high-value transactions), it could indicate that a fraudster is making multiple fake transactions from one source.

By clustering all user requests based on IP address and transaction behavior, we could detect suspicious clusters of requests that all originate from a single IP. This can flag potentially fraudulent activity and help in taking preventive measures.

An example diagram that visually demonstrates the concept of clustering is shown in the figure below.

Imagine you have data points representing transaction requests, plotted on a graph where:

  • X-axis: Number of requests from the same IP address.
  • Y-axis: Average transaction amount.

On the left side, we have the raw data. Without labels, we might already see some patterns forming. On the right, after applying clustering, the data points are grouped into clusters, with each cluster representing a different user behavior.

Example of clustering of fraudulent user behavior. Image source (CC BY 4.0)
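To make this concrete, here is a minimal sketch of the grouping step using the two features shown in the figure (requests per IP and average transaction amount). The column names and DBSCAN parameters are assumptions for illustration, not taken from a specific system:

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import DBSCAN

def cluster_requests(requests: pd.DataFrame) -> pd.DataFrame:
    # Assumes one row per request with hypothetical columns "ip" and "amount".
    per_ip = requests.groupby("ip").agg(
        n_requests=("amount", "size"),   # number of requests from the same IP
        avg_amount=("amount", "mean"),   # average transaction amount
    ).reset_index()
    features = StandardScaler().fit_transform(per_ip[["n_requests", "avg_amount"]])
    # DBSCAN groups dense regions of similar behavior; label -1 marks outliers.
    per_ip["cluster"] = DBSCAN(eps=0.5, min_samples=5).fit_predict(features)
    return per_ip
```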

To group data effectively, we must define a similarity measure, or metric, that quantifies how close data points are to each other. This similarity can be measured in multiple ways, depending on the data’s structure and the insights we aim to discover. There are two key approaches to measuring similarity — manual similarity measures and embedded similarity measures.

A manual similarity measure involves explicitly defining a mathematical formula to compare data points based on their raw features. This method is intuitive, and we can use distance metrics like Euclidean distance, cosine similarity, or Jaccard similarity to evaluate how similar two points are. For instance, in fraud detection, we could manually compute the Euclidean distance between transaction attributes (e.g., transaction amount, frequency of requests) to detect clusters of suspicious behavior. Although this approach is relatively easy to set up, it requires careful selection of the relevant features and may miss deeper patterns in the data.
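For instance, a manual measure over two illustrative transaction features (the features and values are made up) could be as simple as:

```python
import numpy as np

def euclidean_distance(a: np.ndarray, b: np.ndarray) -> float:
    # Straight-line distance between two feature vectors.
    return float(np.linalg.norm(a - b))

t1 = np.array([120.0, 4])   # transaction amount, requests per hour
t2 = np.array([118.0, 5])
print(euclidean_distance(t1, t2))  # small distance -> similar behavior
```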

On the other hand, an embedded similarity measure leverages the power of machine learning models to create learned representations, or embeddings of the data. Embeddings are vectors that capture complex relationships in the data and can be generated from models like Word2Vec for text or neural networks for images. Once these embeddings are computed, similarity can be measured using traditional metrics like cosine similarity, but now the comparison occurs in a transformed, lower-dimensional space that captures more meaningful information. Embedded similarity is particularly useful for complex data, such as user behavior on websites or text data in natural language processing. For example, in a movie or ads recommendation system, user actions can be embedded into vectors, and similarities in this embedding space can be used to recommend content to similar users.
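With embeddings already computed by some upstream model, the comparison step could look like the following; the vectors here are invented for illustration, and producing the embeddings themselves is out of scope:

```python
import numpy as np

def cosine_similarity(u: np.ndarray, v: np.ndarray) -> float:
    # Cosine of the angle between two embedding vectors (1.0 = identical direction).
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

user_a = np.array([0.12, -0.40, 0.88])   # hypothetical behavior embeddings
user_b = np.array([0.10, -0.35, 0.90])
print(cosine_similarity(user_a, user_b))  # close to 1.0 -> very similar behavior
```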

While manual similarity measures provide transparency and greater control on feature selection and setup, embedded similarity measures give the ability to capture deeper and more abstract relationships in the data. The choice between the two depends on the complexity of the data and the specific goals of the clustering task. If you have well-understood, structured data, a manual measure may be sufficient. But if your data is rich and multi-dimensional, such as in text or image analysis, an embedding-based approach may give more meaningful clusters. Understanding these trade-offs is key to selecting the right approach for your clustering task.

In cases like fraud detection, where the data is often rich and based on behavior of user activity, an embedding-based approach is generally more effective for capturing nuanced patterns that could signal risky activity.

Coordinated fraudulent attack behaviors often exhibit specific patterns or characteristics. For instance, fraudulent activity may originate from a set of similar IP addresses or rely on consistent, repeated tactics. Detecting these patterns is crucial for maintaining the integrity of a system, and clustering is an effective technique for grouping entities based on shared traits. This helps identify potential threats by examining the collective behavior within clusters.

However, clustering alone may not be enough to accurately detect fraud, as it can also group benign activities alongside harmful ones. For example, in a social media environment, users posting harmless messages like “How are you today?” might be grouped with those engaged in phishing attacks. Hence, additional criteria are necessary to separate harmful behavior from benign actions.

To address this, we introduce the Behavioral Analysis and Cluster Classification System (BACCS) as a framework designed to detect and manage abusive behaviors. BACCS works by generating and classifying clusters of entities, such as individual accounts, organizational profiles, and transactional nodes, and can be applied across a wide range of sectors including social media, banking, and e-commerce. Importantly, BACCS focuses on classifying behaviors rather than content, making it more suitable for identifying complex fraudulent activities.

The system evaluates clusters by analyzing the aggregate properties of the entities within them. These properties are typically boolean (true/false), and the system assesses the proportion of entities exhibiting a specific characteristic to determine the overall nature of the cluster. For example, a high percentage of newly created accounts within a cluster might indicate fraudulent activity. Based on predefined policies, BACCS identifies combinations of property ratios that suggest abusive behavior and determines the appropriate actions to mitigate the threat.
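In code, this aggregate check is little more than counting. The sketch below is a minimal illustration; the entity fields, the sample, and the 60% threshold are assumptions echoing the examples in this post rather than part of the actual BACCS implementation.

# Boolean properties computed upstream for a sample of cluster members (invented data).
sampled_members = [
    {"id": 1, "is_new_account": True},
    {"id": 2, "is_new_account": True},
    {"id": 3, "is_new_account": False},
    {"id": 4, "is_new_account": True},
    {"id": 5, "is_new_account": True},
]

# Proportion of sampled members exhibiting the property.
ratio = sum(m["is_new_account"] for m in sampled_members) / len(sampled_members)

# Predefined policy: flag the cluster when the ratio exceeds the threshold.
NEW_ACCOUNT_THRESHOLD = 0.6
if ratio > NEW_ACCOUNT_THRESHOLD:
    print(f"Cluster flagged: {ratio:.0%} of sampled members are new accounts")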

The BACCS framework offers several advantages:

  • It groups entities based on behavioral similarities, enabling the detection of coordinated attacks.
  • It allows for the classification of clusters by defining relevant properties of the cluster members and applying custom policies to identify potential abuse.
  • It supports automatic actions against clusters flagged as harmful, ensuring system integrity and enhancing protection against malicious activities.

This flexible and adaptive approach allows BACCS to continuously evolve, ensuring that it remains effective in addressing new and emerging forms of coordinated attacks across different platforms and industries.

Let’s understand this better with an analogy. Say you have a wagon full of apples that you want to sell. All apples are put into bags before being loaded onto the wagon by multiple workers. Some of these workers don’t like you and try to fill their bags with sour apples to mess with you, so you need to identify any bag that might contain sour apples. To identify a sour apple you check whether it is soft; the only problem is that some apples are naturally softer than others. To counter these malicious workers, you open each bag, pick out five apples, and check whether they are soft. If almost all of the sampled apples are soft, it’s likely that the bag contains sour apples, and you put it aside for further inspection later on. Once you’ve identified all the bags with a suspicious amount of softness, you pour out their contents, keep the healthy, firm apples, and throw away the soft ones. You’ve now minimized the risk of your customers biting into a sour apple.

BACCS operates in a similar manner; instead of apples, you have entities (e.g., user accounts). Instead of bad workers, you have malicious users, and instead of the bag of apples, you have entities grouped by common characteristics (e.g., similar account creation times). BACCS samples each group of entities and checks for signs of malicious behavior (e.g., a high rate of policy violations). If a group shows a high prevalence of these signs, it’s flagged for further investigation.

Just like checking the sampled apples in each bag, BACCS uses predefined signals (also referred to as properties) to assess the quality of entities within a cluster. If a cluster is found to be problematic, further actions can be taken to isolate or remove the malicious entities. This system is flexible and can adapt to new types of malicious behavior by adjusting the criteria for flagging clusters or by creating new types of clusters based on emerging patterns of abuse.

This analogy illustrates how BACCS helps maintain the integrity of the environment by proactively identifying and mitigating potential issues, ensuring a safer and more reliable space for all legitimate users.

The system offers numerous advantages:

  • Better Precision: By clustering entities, BACCS provides strong evidence of coordination, enabling the creation of policies that would be too imprecise if applied to individual entities in isolation.
  • Explainability: Unlike some machine learning techniques, the classifications made by BACCS are transparent and understandable. It is straightforward to trace and understand how a particular decision was made.
  • Quick Response Time: Since BACCS operates on a rule-based system rather than relying on machine learning, there is no need for extensive model training. This results in faster response times, which is important for immediate issue resolution.

BACCS might be the right solution for your needs if you:

  • Focus on classifying behavior rather than content: While many clusters in BACCS may be formed around content (e.g., images, email content, user phone numbers), the system itself does not classify content directly.
  • Handle issues with a relatively high frequency of occurrence: BACCS employs a statistical approach that is most effective when the clusters contain a significant proportion of abusive entities. It may not be as effective for rare harmful events but is better suited to highly prevalent problems such as spam.
  • Deal with coordinated or similar behavior: The clustering signal primarily indicates coordinated or similar behavior, making BACCS particularly useful for addressing these types of issues.

Here’s how you can incorporate BACCS framework in a real production system:

Setting up BACCS in production. Image by Author
  1. When entities engage in activities on a platform, you build an observation layer to capture this activity and convert it into events. These events can then be monitored by a system designed for cluster analysis and actioning.
  2. Based on these events, the system needs to group entities into clusters using various attributes — for example, all users posting from the same IP address are grouped into one cluster. These clusters should then be forwarded for further classification.
  3. During the classification process, the system needs to compute a set of specialized boolean signals for a sample of the cluster members. An example of such a signal could be whether the account age is less than a day. The system then aggregates these signal counts for the cluster, such as determining that, in a sample of 100 users, 80 have an account age of less than one day.
  4. These aggregated signal counts should be evaluated against policies that determine whether a cluster appears to be anomalous and what actions should be taken if it is. For instance, a policy might state that if more than 60% of the members in an IP cluster have an account age of less than a day, these members should undergo further verification.
  5. If a policy identifies a cluster as anomalous, the system should identify all members of the cluster exhibiting the signals that triggered the policy (e.g., all members with an account age of less than one day).
  6. The system should then direct all such users to the appropriate action framework, implementing the action specified by the policy (e.g., further verification or blocking their account).

Typically, the entire process from an entity’s activity to the application of an action completes within several minutes. It’s also crucial to recognize that while this system provides a framework and infrastructure for cluster classification, clients and organizations need to supply their own cluster definitions, properties, and policies tailored to their specific domain.
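The sketch below strings these steps together in plain Python as a rough, in-memory illustration; it is not the production system. The event fields, the IP-based cluster definition, the account-age property, and the 60% policy threshold are all assumptions taken from the example in this post, and the observation layer (step 1) is assumed to have already produced the events.

from collections import defaultdict
from dataclasses import dataclass

@dataclass
class Event:
    user_id: str
    ip: str
    account_age_days: float

def is_new_account(event: Event) -> bool:
    # Step 3: a boolean property computed per cluster member.
    return event.account_age_days < 1

def run_baccs(events, anomaly_threshold=0.6):
    # Step 2: group entities into clusters by a shared attribute (here, IP address).
    clusters = defaultdict(list)
    for event in events:
        clusters[event.ip].append(event)

    actions = []
    for ip, members in clusters.items():
        # Steps 3-4: aggregate the boolean signal and evaluate it against the policy.
        flagged = [m for m in members if is_new_account(m)]
        ratio = len(flagged) / len(members)
        if ratio > anomaly_threshold:
            # Steps 5-6: act only on the members that triggered the policy.
            actions.extend((m.user_id, "send_to_verification") for m in flagged)
    return actions

events = [
    Event("u1", "10.0.0.5", 0.2),
    Event("u2", "10.0.0.5", 0.5),
    Event("u3", "10.0.0.5", 30.0),
    Event("u4", "192.168.1.9", 400.0),
]
print(run_baccs(events))  # [('u1', 'send_to_verification'), ('u2', 'send_to_verification')]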

Let’s look at an example where we try to mitigate spam by clustering users by IP address when they send an email and blocking them if more than 60% of the cluster members have an account age of less than a day.

Clustering and blocking in action. Image by Author

Members can already be present in the clusters. A re-classification of a cluster can be triggered when it reaches a certain size or has enough changes since the previous classification.

When selecting clustering criteria and defining properties for users, the goal is to identify patterns or behaviors that align with the specific risks or activities you’re trying to detect. For instance, if you’re working on detecting fraudulent behavior or coordinated attacks, the criteria should capture traits that are often shared by malicious actors. Here are some factors to consider when picking clustering criteria and defining user properties:

The clustering criteria you choose should revolve around characteristics that represent behavior likely to signal risk. These characteristics could include:

  • Time-Based Patterns: For example, grouping users by account creation times or the frequency of actions in a given time period can help detect spikes in activity that may be indicative of coordinated behavior.
  • Geolocation or IP Addresses: Clustering users by their IP address or geographical location can be especially effective in detecting coordinated actions, such as multiple fraudulent logins or content submissions originating from the same region.
  • Content Similarity: In cases like misinformation or spam detection, clustering by the similarity of content (e.g., similar text in posts/emails) can identify suspiciously coordinated efforts.
  • Behavioral Metrics: Characteristics like the number of transactions made, average session time, or the types of interactions with the platform (e.g., likes, comments, or clicks) can indicate unusual patterns when grouped together.

The key is to choose criteria that do not merely reflect benign user behavior but are distinctive enough to isolate risky patterns, which leads to more effective clustering.

Defining User Properties

Once you’ve chosen the criteria for clustering, defining meaningful properties for the users within each cluster is critical. These properties should be measurable signals that can help you assess the likelihood of harmful behavior. Common properties include:

  • Account Age: Newly created accounts tend to have a higher risk of being involved in malicious activities, so a property like “Account Age < 1 day” can serve as a strong risk signal.
  • Connection Density: For social media platforms, properties like the number of connections or interactions between accounts within a cluster can signal abnormal behavior.
  • Transaction Amounts: In cases of financial fraud, the average transaction size or the frequency of high-value transactions can be key properties to flag risky clusters.

Each property should be clearly linked to a behavior that could indicate either legitimate use or potential abuse. Importantly, properties should be boolean or numerical values that allow for easy aggregation and comparison across the cluster.

Another advanced strategy is using a machine learning classifier’s output as a property, but with an adjusted threshold. Normally, you would set a high threshold for classifying harmful behavior to avoid false positives. However, when combined with clustering, you can afford to lower this threshold because the clustering itself acts as an additional signal to reinforce the property.

Let’s consider a model X that catches scams and disables email accounts with a model X score > 0.95. Assume this model is already live in production and is disabling bad email accounts at the 0.95 threshold with 100% precision. We want to increase the recall of this model without impacting precision.

  • First, we need to define clusters that group coordinated activity together. Let’s say we know there is coordinated activity going on, where bad actors are using the same subject line but different email addresses to send scammy emails. Using BACCS, we will form clusters of email accounts that all have the same subject line in their sent emails.
  • Next, we need to lower the raw model threshold and define a BACCS property. We integrate model X into our production detection infrastructure and create a property using a lowered model threshold, say 0.75. This property has a value of “True” for any email account with a model X score >= 0.75.
  • Then we define the anomaly threshold: if more than 50% of entities in a subject-line cluster have this property, we classify the cluster as bad and take down the email accounts that have this property set to True.

So we essentially lowered the model’s threshold and started disabling entities in particular clusters at a significantly lower threshold than the model currently enforces, yet we can be confident that the precision of enforcement does not drop while recall increases. Let’s understand how:

Suppose we have 6 entities that share the same subject line, with model X scores as follows:

Entities actioned by ML model. Image by Author

If we used the raw model threshold (0.95), we would have disabled only 2 of the 6 email accounts.

If we instead cluster entities on subject-line text and define a policy that flags clusters in which more than 50% of entities have a model X score >= 0.75, we would have taken down all of these accounts:

Entities actioned by clustering, using ML scores as properties. Image by Author

So we increased the recall of enforcement from 33% to 83%. Essentially, even if individual behaviors seem less risky, the fact that they are part of a suspicious cluster elevates their importance. This combination provides a strong signal for detecting harmful activity while minimizing the chances of false positives.
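Since the two tables above are shown as images, here is the same arithmetic with hypothetical scores: six email accounts sharing a subject line, only two of which exceed the raw 0.95 threshold while five exceed the lowered 0.75 property threshold.

# Hypothetical model X scores for six accounts in one subject-line cluster.
scores = {"a1": 0.97, "a2": 0.96, "a3": 0.88, "a4": 0.81, "a5": 0.76, "a6": 0.40}

# Raw enforcement: disable only accounts with a score above 0.95.
raw_actioned = [a for a, s in scores.items() if s > 0.95]
print(len(raw_actioned), "of", len(scores))  # 2 of 6, about 33% recall

# Cluster-based enforcement: the property is score >= 0.75; flag the cluster if more
# than 50% of its members have the property, then action exactly those members.
has_property = {a: s >= 0.75 for a, s in scores.items()}
ratio = sum(has_property.values()) / len(has_property)
if ratio > 0.5:
    cluster_actioned = [a for a, flagged in has_property.items() if flagged]
    print(len(cluster_actioned), "of", len(scores))  # 5 of 6, about 83% recall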

By lowering the threshold, you allow the clustering process to surface patterns that might otherwise be missed if you relied on classification alone. This approach takes advantage of both the granular insights from machine learning models and the broader behavioral patterns that clustering can identify. Together, they create a more robust system for detecting and mitigating risks and catching many more entities while still keeping a lower false positive rate.

Clustering techniques remain an important method for detecting coordinated attacks and ensuring system safety, particularly on platforms prone to fraud, abuse, or other malicious activities. By grouping similar behaviors into clusters and applying policies to take down bad entities within those clusters, we can detect and mitigate harmful activity and ensure a safer digital ecosystem for all users. More advanced embedding-based approaches represent complex user behavioral patterns better than manually defined similarity measures.

As we continue advancing our security protocols, frameworks like BACCS play a crucial role in taking down large coordinated attacks. The integration of clustering with behavior-based policies allows for dynamic adaptation, enabling us to respond swiftly to new forms of abuse while reinforcing trust and safety across platforms.

In the future, there is a big opportunity for further research into complementary techniques that could enhance clustering’s effectiveness. Techniques such as graph-based analysis for mapping complex relationships between entities could be integrated with clustering to offer even higher precision in threat detection. Moreover, hybrid approaches that combine clustering with machine learning classification can be very effective at detecting malicious activities with higher recall and a lower false positive rate. Exploring these methods, along with continuous refinement of current approaches, will ensure that we remain resilient against the evolving landscape of digital threats.

References

  1. https://developers.google.com/machine-learning/clustering/overview



Source link

15Oct

AI Feels Easier Than Ever, But Is It Really? | by Anna Via | Oct, 2024


The 4 Big Challenges of building AI products

Picture by ynsplt on Unsplash

A few days ago, I was speaking at an event about how to move from using ChatGPT at a personal level to implementing AI-powered technical solutions for teams and companies. We covered everything from prompt engineering and fine-tuning to agents and function calling. One of the questions from the audience stood out to me, even though it was one I should have expected: “How long does it take to get an AI-powered feature into production?”

In many ways, integrating AI into features can be incredibly easy. With recent progress, leveraging a state-of-the-art LLM can be as simple as making an API call. The entry barriers to use and integrate AI are now really low. There is a big but though. Getting an AI feature into production while accounting for all risks linked with this new technology can be a real challenge.

And that’s the paradox: AI feels easier and more accessible than ever, but its open-ended (free input / free output…



Source link
