12Oct

Gaussian Naive Bayes, Explained: A Visual Guide with Code Examples for Beginners | by Samy Baladram | Oct, 2024


CLASSIFICATION ALGORITHM

Bell-shaped assumptions for better predictions


Building on our previous article about Bernoulli Naive Bayes, which handles binary data, we now explore Gaussian Naive Bayes for continuous data. Unlike the binary approach, this algorithm assumes each feature follows a normal (Gaussian) distribution.

Here, we’ll see how Gaussian Naive Bayes handles continuous, bell-shaped data — ringing in accurate predictions — all without getting into the intricate math of Bayes’ Theorem.

All visuals: Author-created using Canva Pro. Optimized for mobile; may appear oversized on desktop.

Like other Naive Bayes variants, Gaussian Naive Bayes makes the “naive” assumption of feature independence. It assumes that the features are conditionally independent given the class label.

However, while Bernoulli Naive Bayes is suited for datasets with binary features, Gaussian Naive Bayes assumes that the features follow a continuous normal (Gaussian) distribution. Although this assumption may not always hold true in reality, it simplifies the calculations and often leads to surprisingly accurate results.

Bernoulli NB assumes binary data, Multinomial NB works with discrete counts, and Gaussian NB handles continuous data assuming a normal distribution.
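The conditional-independence assumption shared by all three variants can be written compactly. In the notation below (the symbols are our own choice, not taken from the article's figures), y is a class label and x_1, ..., x_n are the feature values of one instance:

```latex
P(y \mid x_1, \dots, x_n) \;\propto\; P(y) \prod_{i=1}^{n} P(x_i \mid y)
```

The variants differ only in how they model each factor P(x_i | y): a Bernoulli distribution, a multinomial distribution, or, in this article, a Gaussian density.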

Throughout this article, we’ll use this artificial golf dataset (made by author) as an example. This dataset predicts whether a person will play golf based on weather conditions.

Columns: ‘Rainfall’ (in mm), ‘Temperature’ (in Celsius), ‘Humidity’ (in %), ‘WindSpeed’ (in km/h) and ‘Play’ (Yes/No, target feature)
# IMPORTING DATASET #
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
import pandas as pd
import numpy as np

dataset_dict = {
'Rainfall': [0.0, 2.0, 7.0, 18.0, 3.0, 3.0, 0.0, 1.0, 0.0, 25.0, 0.0, 18.0, 9.0, 5.0, 0.0, 1.0, 7.0, 0.0, 0.0, 7.0, 5.0, 3.0, 0.0, 2.0, 0.0, 8.0, 4.0, 4.0],
'Temperature': [29.4, 26.7, 28.3, 21.1, 20.0, 18.3, 17.8, 22.2, 20.6, 23.9, 23.9, 22.2, 27.2, 21.7, 27.2, 23.3, 24.4, 25.6, 27.8, 19.4, 29.4, 22.8, 31.1, 25.0, 26.1, 26.7, 18.9, 28.9],
'Humidity': [85.0, 90.0, 78.0, 96.0, 80.0, 70.0, 65.0, 95.0, 70.0, 80.0, 70.0, 90.0, 75.0, 80.0, 88.0, 92.0, 85.0, 75.0, 92.0, 90.0, 85.0, 88.0, 65.0, 70.0, 60.0, 95.0, 70.0, 78.0],
'WindSpeed': [2.1, 21.2, 1.5, 3.3, 2.0, 17.4, 14.9, 6.9, 2.7, 1.6, 30.3, 10.9, 3.0, 7.5, 10.3, 3.0, 3.9, 21.9, 2.6, 17.3, 9.6, 1.9, 16.0, 4.6, 3.2, 8.3, 3.2, 2.2],
'Play': ['No', 'No', 'Yes', 'Yes', 'Yes', 'No', 'Yes', 'No', 'Yes', 'Yes', 'Yes', 'Yes', 'Yes', 'No', 'No', 'Yes', 'Yes', 'No', 'No', 'No', 'Yes', 'Yes', 'Yes', 'Yes', 'Yes', 'Yes', 'No', 'Yes']
}
df = pd.DataFrame(dataset_dict)

# Set feature matrix X and target vector y
X, y = df.drop(columns='Play'), df['Play']

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.5, shuffle=False)
print(pd.concat([X_train, y_train], axis=1), end='\n\n')
print(pd.concat([X_test, y_test], axis=1))

Gaussian Naive Bayes works with continuous data, assuming each feature follows a Gaussian (normal) distribution.

  1. Calculate the probability of each class in the training data.
  2. For each feature and class, estimate the mean and variance of the feature values within that class.
  3. For a new instance:
    a. For each class, calculate the probability density function (PDF) of each feature value under the Gaussian distribution of that feature within the class.
    b. Multiply the class probability by the product of the PDF values for all features.
  4. Predict the class with the highest resulting probability.
Gaussian Naive Bayes uses the normal distribution to model the likelihood of different feature values for each class. It then combines these likelihoods to make a prediction.
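The four steps above can be sketched from scratch in a few lines of NumPy. This is a minimal illustration under stated assumptions, not scikit-learn's implementation: the function names and the toy data are made up here, and no variance smoothing is applied.

```python
import numpy as np

def fit_gaussian_nb(X, y):
    """Steps 1-2: class priors plus per-class feature means and variances."""
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    classes = np.unique(y)
    priors = np.array([np.mean(y == c) for c in classes])
    means = np.array([X[y == c].mean(axis=0) for c in classes])
    variances = np.array([X[y == c].var(axis=0) for c in classes])
    return classes, priors, means, variances

def predict_gaussian_nb(model, x):
    """Steps 3-4: prior times the product of Gaussian PDFs, then pick the largest."""
    classes, priors, means, variances = model
    pdf = np.exp(-(x - means) ** 2 / (2 * variances)) / np.sqrt(2 * np.pi * variances)
    scores = priors * pdf.prod(axis=1)
    return classes[np.argmax(scores)], scores

# Toy data (made up for illustration): one feature, two well-separated classes
X_toy = np.array([[1.0], [1.2], [0.8], [3.0], [3.2], [2.8]])
y_toy = np.array(['No', 'No', 'No', 'Yes', 'Yes', 'Yes'])

model = fit_gaussian_nb(X_toy, y_toy)
label, scores = predict_gaussian_nb(model, np.array([2.9]))
print(label, scores)
```

A value of 2.9 sits next to the mean of the 'Yes' class, so that class receives by far the larger score.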

Transforming non-Gaussian distributed data

Recall that this algorithm naively assumes that every input feature follows a Gaussian (normal) distribution.

Since we are not really sure about the distribution of our data, especially for features that clearly don’t follow a Gaussian distribution, applying a power transformation (like Box-Cox) before using Gaussian Naive Bayes can be beneficial. This approach can help make the data more Gaussian-like, which aligns better with the assumptions of the algorithm.

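All columns are transformed with a power transformation and then standardized. Note that scikit-learn's PowerTransformer defaults to the Yeo-Johnson method, which, unlike Box-Cox, accepts the zero values found in ‘Rainfall’.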
from sklearn.preprocessing import PowerTransformer

# Initialize and fit the PowerTransformer
pt = PowerTransformer(standardize=True) # Standard Scaling already included
X_train_transformed = pt.fit_transform(X_train)
X_test_transformed = pt.transform(X_test)
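To see what the transformation does to a single skewed column, the sketch below compares skewness before and after. It assumes SciPy is installed; the variable names are illustrative, and the values are the 'Rainfall' entries from the first half of the dataset (the training split).

```python
import numpy as np
from scipy.stats import skew
from sklearn.preprocessing import PowerTransformer

# Rainfall values from the first 14 rows of the golf dataset
rainfall = np.array([0.0, 2.0, 7.0, 18.0, 3.0, 3.0, 0.0,
                     1.0, 0.0, 25.0, 0.0, 18.0, 9.0, 5.0]).reshape(-1, 1)

pt_demo = PowerTransformer(standardize=True)  # default method is 'yeo-johnson'
rainfall_t = pt_demo.fit_transform(rainfall)

skew_before = skew(rainfall.ravel())
skew_after = skew(rainfall_t.ravel())
print(f"skewness before: {skew_before:.2f}, after: {skew_after:.2f}")
print(f"mean after: {rainfall_t.mean():.2f}, std after: {rainfall_t.std():.2f}")
```

A skewness closer to zero indicates a more symmetric, more bell-like column; the standardization step also leaves it with zero mean and unit variance.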

Now we are ready for the training.

1. Class Probability Calculation: For each class, calculate its probability: (Number of instances in this class) / (Total number of instances)

from fractions import Fraction

def calc_target_prob(attr):
    # Share of each class, shown as an exact fraction
    total_counts = attr.value_counts().sum()
    prob_series = attr.value_counts().apply(lambda x: Fraction(x, total_counts).limit_denominator())
    return prob_series

print(calc_target_prob(y_train))

2. Feature Probability Calculation: For each feature and each class, calculate the mean (μ) and standard deviation (σ) of the feature values within that class using the training data. Then, calculate the probability density using the Gaussian probability density function (PDF) formula.
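For reference, the Gaussian PDF for a feature value x, given a class-specific mean μ and standard deviation σ, is:

```latex
f(x \mid \mu, \sigma) = \frac{1}{\sigma \sqrt{2\pi}} \exp\!\left( -\frac{(x - \mu)^2}{2\sigma^2} \right)
```

The leading factor 1/(σ√(2π)) and the denominator 2σ² are the two constants that need to be computed once per feature and class.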

For each weather condition, determine the mean and standard deviation for both “YES” and “NO” instances. Then calculate their PDF using the PDF formula for normal/Gaussian distribution.
The same process is applied to all of the other features.
def calculate_class_probabilities(X_train_transformed, y_train, feature_names):
    classes = y_train.unique()
    equations = pd.DataFrame(index=classes, columns=feature_names)

    for cls in classes:
        # Per-class mean and standard deviation of every feature
        X_class = X_train_transformed[y_train == cls]
        mean = X_class.mean(axis=0)
        std = X_class.std(axis=0)
        # Constants of the Gaussian PDF: k1·exp(-(x-mean)²/k2)
        k1 = 1 / (std * np.sqrt(2 * np.pi))
        k2 = 2 * (std ** 2)

        for i, column in enumerate(feature_names):
            equation = f"{k1[i]:.3f}·exp(-(x-({mean[i]:.2f}))²/{k2[i]:.3f})"
            equations.loc[cls, column] = equation

    return equations

# Use the function with the transformed training data
equation_table = calculate_class_probabilities(X_train_transformed, y_train, X.columns)

# Display the equation table
print(equation_table)

3. Smoothing: Gaussian Naive Bayes uses a unique smoothing approach. Unlike Laplace smoothing in other variants, it adds a tiny value (0.000000001 times the largest variance) to all variances. This prevents numerical instability from division by zero or very small numbers.

Given a new instance with continuous features:

1. Probability Collection:
For each possible class:
· Start with the probability of this class occurring (class probability).
· For each feature in the new instance, calculate the probability density function of that feature within the class.

For ID 14, we calculate the PDF of each feature for both “YES” and “NO” instances.

2. Score Calculation & Prediction:
For each class:
· Multiply the class probability by all the collected PDF values.
· The result is the score for this class.
· The class with the highest score is the prediction.

from scipy.stats import norm

def calculate_class_probability_products(X_train_transformed, y_train, X_new, feature_names, target_name):
    classes = y_train.unique()
    n_features = X_train_transformed.shape[1]

    # Create column names using actual feature names
    column_names = [target_name] + list(feature_names) + ['Product']

    probability_products = pd.DataFrame(index=classes, columns=column_names)

    for cls in classes:
        X_class = X_train_transformed[y_train == cls]
        mean = X_class.mean(axis=0)
        std = X_class.std(axis=0)

        # Class (prior) probability
        prior_prob = np.mean(y_train == cls)
        probability_products.loc[cls, target_name] = prior_prob

        # Gaussian PDF of each feature value of the new instance
        feature_probs = []
        for i, feature in enumerate(feature_names):
            prob = norm.pdf(X_new[0, i], mean[i], std[i])
            probability_products.loc[cls, feature] = prob
            feature_probs.append(prob)

        # Score: prior times the product of all PDF values
        product = prior_prob * np.prod(feature_probs)
        probability_products.loc[cls, 'Product'] = product

    return probability_products

# Assuming X_new is your new sample reshaped to (1, n_features)
X_new = np.array([-1.28, 1.115, 0.84, 0.68]).reshape(1, -1)

# Calculate probability products
prob_products = calculate_class_probability_products(X_train_transformed, y_train, X_new, X.columns, y.name)

# Display the probability product table
print(prob_products)
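The products in the last column are unnormalized scores, not probabilities, and with many features a product of small densities can underflow. A common remedy, sketched below, is to sum log-densities and normalize at the end. The per-class parameters here are hypothetical numbers chosen only for illustration.

```python
import numpy as np
from scipy.stats import norm

# Hypothetical per-class parameters for two features (illustrative numbers only)
priors = {'Yes': 0.6, 'No': 0.4}
means = {'Yes': np.array([0.2, -0.1]), 'No': np.array([-0.3, 0.4])}
stds = {'Yes': np.array([1.0, 0.8]), 'No': np.array([0.9, 1.1])}
x_new = np.array([0.5, 0.0])

# Sum of logs instead of a product of densities
log_scores = {
    c: np.log(priors[c]) + norm.logpdf(x_new, means[c], stds[c]).sum()
    for c in priors
}

# Normalize (subtracting the max first for stability) to get posteriors
values = np.array(list(log_scores.values()))
shifted = np.exp(values - values.max())
posteriors = dict(zip(log_scores, shifted / shifted.sum()))
prediction = max(posteriors, key=posteriors.get)
print(posteriors, prediction)
```

The predicted class is the same as with the raw products; only the scale of the scores changes.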

For this particular dataset, this accuracy is considered quite good.
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score

# Initialize and train the Gaussian Naive Bayes model
gnb = GaussianNB()
gnb.fit(X_train_transformed, y_train)

# Make predictions on the test set
y_pred = gnb.predict(X_test_transformed)

# Calculate the accuracy
accuracy = accuracy_score(y_test, y_pred)

# Print the accuracy
print(f"Accuracy: {accuracy:.4f}")

GaussianNB is known for its simplicity and effectiveness. The main things to remember about its parameters are:

  1. priors: This is the most notable parameter, similar to Bernoulli Naive Bayes. In most cases, you don’t need to set it manually. By default, it’s calculated from your training data, which often works well.
  2. var_smoothing: This is a stability parameter that you rarely need to adjust. (The default is 1e-9.)

The key takeaway is that this algorithm is designed to work well out-of-the-box. In most situations, you can use it without worrying about parameter tuning.
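For the occasional case where the defaults are not what you want, both parameters are passed to the constructor. The sketch below uses made-up, imbalanced toy data and arbitrary parameter values purely to show the mechanics.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

# Toy data (made up): 4 samples of class 0, 2 samples of class 1
X_params = np.array([[0.1], [0.3], [0.2], [0.4], [1.5], [1.7]])
y_params = np.array([0, 0, 0, 0, 1, 1])

default_model = GaussianNB().fit(X_params, y_params)
custom_model = GaussianNB(priors=[0.5, 0.5], var_smoothing=1e-6).fit(X_params, y_params)

print(default_model.class_prior_)  # estimated from class frequencies
print(custom_model.class_prior_)   # the priors that were passed in
```

Overriding `priors` can be useful when the class balance of the training sample is known to differ from the balance expected at prediction time.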

Pros:

  1. Simplicity: Maintains the easy-to-implement and understand trait.
  2. Efficiency: Remains swift in training and prediction, making it suitable for large-scale applications with continuous features.
  3. Flexibility with Data: Handles both small and large datasets well, adapting to the scale of the problem at hand.
  4. Continuous Feature Handling: Thrives with continuous and real-valued features, making it well suited to classification tasks where the inputs are measurements that vary on a continuum.

Cons:

  1. Independence Assumption: Still assumes that features are conditionally independent given the class, which might not hold in all real-world scenarios.
  2. Gaussian Distribution Assumption: Works best when feature values truly follow a normal distribution. Non-normal distributions may lead to suboptimal performance (though this can often be mitigated with the power transformation discussed above).
  3. Sensitivity to Outliers: Can be significantly affected by outliers in the training data, as they skew the mean and variance calculations.
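The outlier sensitivity is easy to see numerically. The sketch below takes the 'Humidity' values of the five 'No' rows in the training split and appends one made-up extreme reading (a hypothetical sensor glitch) to show how far the estimated mean and standard deviation move.

```python
import numpy as np

# Humidity of the 'No' rows in the first 14 rows of the golf dataset
humidity_no = np.array([85.0, 90.0, 70.0, 95.0, 80.0])

# Same values plus one made-up outlier (hypothetical sensor glitch)
with_outlier = np.append(humidity_no, 10.0)

print(f"without outlier: mean={humidity_no.mean():.1f}, std={humidity_no.std():.1f}")
print(f"with outlier:    mean={with_outlier.mean():.1f}, std={with_outlier.std():.1f}")
```

Because the Gaussian PDF for the class is built from exactly these two statistics, a single extreme value reshapes the whole likelihood curve.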

Gaussian Naive Bayes stands as an efficient classifier for a wide range of applications involving continuous data. Its ability to handle real-valued features extends the Naive Bayes family beyond binary-feature tasks, making it a go-to choice for numerous applications.

While it makes some assumptions about data (feature independence and normal distribution), when these conditions are met, it gives robust performance, making it a favorite among both beginners and seasoned data scientists for its balance of simplicity and power.

import pandas as pd
from sklearn.naive_bayes import GaussianNB
from sklearn.preprocessing import PowerTransformer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Load the dataset
dataset_dict = {
'Rainfall': [0.0, 2.0, 7.0, 18.0, 3.0, 3.0, 0.0, 1.0, 0.0, 25.0, 0.0, 18.0, 9.0, 5.0, 0.0, 1.0, 7.0, 0.0, 0.0, 7.0, 5.0, 3.0, 0.0, 2.0, 0.0, 8.0, 4.0, 4.0],
'Temperature': [29.4, 26.7, 28.3, 21.1, 20.0, 18.3, 17.8, 22.2, 20.6, 23.9, 23.9, 22.2, 27.2, 21.7, 27.2, 23.3, 24.4, 25.6, 27.8, 19.4, 29.4, 22.8, 31.1, 25.0, 26.1, 26.7, 18.9, 28.9],
'Humidity': [85.0, 90.0, 78.0, 96.0, 80.0, 70.0, 65.0, 95.0, 70.0, 80.0, 70.0, 90.0, 75.0, 80.0, 88.0, 92.0, 85.0, 75.0, 92.0, 90.0, 85.0, 88.0, 65.0, 70.0, 60.0, 95.0, 70.0, 78.0],
'WindSpeed': [2.1, 21.2, 1.5, 3.3, 2.0, 17.4, 14.9, 6.9, 2.7, 1.6, 30.3, 10.9, 3.0, 7.5, 10.3, 3.0, 3.9, 21.9, 2.6, 17.3, 9.6, 1.9, 16.0, 4.6, 3.2, 8.3, 3.2, 2.2],
'Play': ['No', 'No', 'Yes', 'Yes', 'Yes', 'No', 'Yes', 'No', 'Yes', 'Yes', 'Yes', 'Yes', 'Yes', 'No', 'No', 'Yes', 'Yes', 'No', 'No', 'No', 'Yes', 'Yes', 'Yes', 'Yes', 'Yes', 'Yes', 'No', 'Yes']
}

df = pd.DataFrame(dataset_dict)

# Prepare data for model
X, y = df.drop('Play', axis=1), (df['Play'] == 'Yes').astype(int)

# Split data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, shuffle=False)

# Apply PowerTransformer
pt = PowerTransformer(standardize=True)
X_train_transformed = pt.fit_transform(X_train)
X_test_transformed = pt.transform(X_test)

# Train the model
nb_clf = GaussianNB()
nb_clf.fit(X_train_transformed, y_train)

# Make predictions
y_pred = nb_clf.predict(X_test_transformed)

# Check accuracy
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy:.4f}")




10Oct

Building 5 Machine Learning Models: From Simplicity to Optimization


Building, comparing, and optimizing models.

Model Selection

Now we are moving to the second part of our project on Machine Learning Model Selection in Multivariate Analysis with Anonymized Data.

This second part is where the glamour comes in — predictive modeling, machine learning. Everyone is eager to jump straight into building machine learning models. I get that, and I feel the same excitement because I love this stage.

But before we get there, we must go through data processing — which is exactly what we covered in the previous tutorial.

We begin by installing the XGBoost package, one of the favorites among those who participate in Machine Learning competitions on the Kaggle platform.

# This package does not come with Anaconda and needs to be installed
!pip install -q xgboost

This package doesn’t come with Anaconda, so you need to install it separately. To…




09Oct

Implementing Sequential Algorithms on TPU | by Chaim Rand | Oct, 2024


Accelerating AI/ML Model Training with Custom Operators — Part 3.A

Photo by Bernd Dittrich on Unsplash

This is a direct sequel to a previous post on the topic of implementing custom TPU operations with Pallas. Of particular interest are custom kernels that leverage the unique properties of the TPU architecture in a manner that optimizes runtime performance. In this post, we will attempt to demonstrate this opportunity by applying the power of Pallas to the challenge of running sequential algorithms that are interspersed within a predominantly parallelizable deep learning (DL) workload.

We will focus on Non Maximum Suppression (NMS) of bounding-box proposals as a representative algorithm, and explore ways to optimize its implementation. An important component of computer vision (CV) object detection solutions (e.g., Mask R-CNN), NMS is commonly used to filter out overlapping bounding boxes, keeping only the “best” ones. NMS receives a list of bounding box proposals, an associated list of scores, and an IOU threshold, and proceeds to greedily and iteratively choose the remaining box with the highest score and disqualify all other boxes with which it has an IOU that exceeds the given threshold. The fact that the box chosen at the n-th iteration depends on the preceding n-1 steps of the algorithm dictates the sequential nature of its implementation. Please see here and/or here for more on the rationale behind NMS and its implementation. Although we have chosen to focus on one specific algorithm, most of our discussion should carry over to other sequential algorithms.
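Since the whole algorithm hinges on the IOU (intersection over union) of two boxes, a minimal sketch of that measure may help. It assumes boxes are given as (x1, y1, x2, y2) corner coordinates, which matches the layout used in the code later in this post; the function name is our own.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    # Corners of the overlapping rectangle (if any)
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(ix2 - ix1, 0.0) * max(iy2 - iy1, 0.0)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

# Two 10x10 boxes that overlap on half their width
print(iou((0, 0, 10, 10), (5, 0, 15, 10)))
```

With a threshold of 0.1, as used in the examples below, these two boxes would be considered overlapping and the lower-scoring one would be suppressed.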

Offloading Sequential Algorithms to CPU

The presence of a sequential algorithm within a predominantly parallelizable ML model (e.g., Mask R-CNN) presents an interesting challenge. While GPUs, commonly used for such workloads, excel at executing parallel operations like matrix multiplication, they can significantly underperform compared to CPUs when handling sequential algorithms. This often leads to computation graphs that include crossovers between the GPU and CPU, where the GPU handles the parallel operations and the CPU handles the sequential ones. NMS is a prime example of a sequential algorithm that is commonly offloaded onto the CPU. In fact, a close analysis of torchvision’s “CUDA” implementation of NMS reveals that even it runs a significant portion of the algorithm on CPU.

Although offloading sequential operations to the CPU may lead to improved runtime performance, there are several potential drawbacks to consider:

  1. Cross-device execution between the CPU and GPU usually requires multiple points of synchronization between the devices, which commonly results in idle time on the GPU while it waits for the CPU to complete its tasks. Given that the GPU is typically the most expensive component of the training platform, our goal is to minimize such idle time.
  2. In standard ML workflows, the CPU is responsible for preparing and feeding data to the model, which resides on the GPU. If the data input pipeline involves compute-intensive processing, this can strain the CPU, leading to “input starvation” on the GPU. In such scenarios, offloading portions of the model’s computation to the CPU could further exacerbate this issue.

To avoid these drawbacks you could consider alternative approaches, such as replacing the sequential algorithm with a comparable alternative (e.g., the one suggested here), settling for a slow/suboptimal GPU implementation of the sequential algorithm, or running the workload on CPU, each of which comes with its own potential trade-offs.

Sequential Algorithms on TPU

This is where the unique architecture of the TPU could present an opportunity. Contrary to GPUs, TPUs are sequential processors. While their ability to run highly vectorized operations makes them competitive with GPUs when running parallelizable operations such as matrix multiplication, their sequential nature could make them uniquely suited for running ML workloads that include a mix of both sequential and parallel components. Armed with the Pallas extension to JAX, our newfound TPU kernel creation tool, we will evaluate this opportunity by implementing and evaluating a custom implementation of NMS for TPU.

Disclaimers

The NMS implementations we will share below are intended for demonstrative purposes only. We have not made any significant effort to optimize them or to verify their robustness, durability, or accuracy. Please keep in mind that, as of the time of this writing, Pallas is an experimental feature — still under active development. The code we share (based on JAX version 0.4.32) may become outdated by the time you read this. Be sure to refer to the most up-to-date APIs and resources available for your Pallas development. Please do not view our mention of any algorithm, library, or API as an endorsement for their use.

We begin with a simple implementation of NMS in numpy that will serve as a baseline for performance comparison:

import numpy as np

def nms_cpu(boxes, scores, max_output_size, threshold=0.1):
    epsilon = 1e-5

    # Convert bounding boxes and scores to numpy
    boxes = np.array(boxes)
    scores = np.array(scores)

    # coordinates of bounding boxes
    start_x = boxes[:, 0]
    start_y = boxes[:, 1]
    end_x = boxes[:, 2]
    end_y = boxes[:, 3]

    # Compute areas of bounding boxes
    areas = (end_x - start_x) * (end_y - start_y)

    # Sort by confidence score of bounding boxes
    order = np.argsort(scores)

    # Picked bounding boxes
    picked_boxes = []

    # Iterate over bounding boxes
    while order.size > 0 and len(picked_boxes) < max_output_size:

        # The index of the remaining box with the highest score
        index = order[-1]

        # Pick the bounding box with largest confidence score
        picked_boxes.append(index.item())

        # Compute coordinates of intersection
        x1 = np.maximum(start_x[index], start_x[order[:-1]])
        x2 = np.minimum(end_x[index], end_x[order[:-1]])
        y1 = np.maximum(start_y[index], start_y[order[:-1]])
        y2 = np.minimum(end_y[index], end_y[order[:-1]])

        # Compute areas of intersection and union
        w = np.maximum(x2 - x1, 0.0)
        h = np.maximum(y2 - y1, 0.0)

        intersection = w * h
        union = areas[index] + areas[order[:-1]] - intersection

        # Compute the ratio between intersection and union
        # (the lower bound guards against division by zero)
        ratio = intersection / np.maximum(union, epsilon)

        # discard boxes above overlap threshold
        keep = np.where(ratio < threshold)
        order = order[keep]

    return picked_boxes

To evaluate the performance of our NMS function, we generate a batch of random boxes and scores (as JAX tensors) and run the script on a Google Cloud TPU v5e system using the same environment and same benchmarking utility as in our previous post. For this experiment, we specify the CPU as the JAX default device:

import jax
from jax import random
import jax.numpy as jnp

def generate_random_boxes(run_on_cpu = False):
    if run_on_cpu:
        jax.config.update('jax_default_device', jax.devices('cpu')[0])
    else:
        jax.config.update('jax_default_device', jax.devices('tpu')[0])

    n_boxes = 1024
    img_size = 1024

    k1, k2, k3 = random.split(random.key(0), 3)

    # Randomly generate box sizes and positions
    box_sizes = random.randint(k1,
                               shape=(n_boxes, 2),
                               minval=1,
                               maxval=img_size)
    top_left = random.randint(k2,
                              shape=(n_boxes, 2),
                              minval=0,
                              maxval=img_size - 1)
    bottom_right = jnp.clip(top_left + box_sizes, 0, img_size - 1)

    # Concatenate top-left and bottom-right coordinates
    rand_boxes = jnp.concatenate((top_left, bottom_right),
                                 axis=1).astype(jnp.bfloat16)
    rand_scores = jax.random.uniform(k3,
                                     shape=(n_boxes,),
                                     minval=0.0,
                                     maxval=1.0)

    return rand_boxes, rand_scores

rand_boxes, rand_scores = generate_random_boxes(run_on_cpu=True)

time = benchmark(nms_cpu)(rand_boxes, rand_scores, max_output_size=128)
print(f'nms_cpu: {time}')

The resultant average runtime is 2.99 milliseconds. Note the assumption that the input and output tensors reside on the CPU. If they are on the TPU, then the time to copy them between the devices should also be taken into consideration.
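The `benchmark` utility used above comes from the previous post and is not reproduced here. For readers who want to run the snippets, a hypothetical stand-in with the same call pattern might look like the sketch below; the name `benchmark_sketch`, the warm-up and repetition counts, and the millisecond unit are all our assumptions, not the author's implementation.

```python
import time

def benchmark_sketch(fn, n_warmup=2, n_runs=10):
    """Hypothetical stand-in: returns a callable that reports mean runtime in ms."""
    def timed(*args, **kwargs):
        def run():
            out = fn(*args, **kwargs)
            # JAX dispatches work asynchronously; wait for the result
            # if the output supports it
            if hasattr(out, "block_until_ready"):
                out.block_until_ready()
            return out

        # Warm-up runs absorb one-off costs such as JIT compilation
        for _ in range(n_warmup):
            run()

        start = time.perf_counter()
        for _ in range(n_runs):
            run()
        return (time.perf_counter() - start) * 1000 / n_runs
    return timed

# Usage with a plain Python function
elapsed_ms = benchmark_sketch(sum)(range(100_000))
print(f"sum: {elapsed_ms:.3f} ms")
```

Timings obtained this way will not match the figures quoted in this post, which depend on the author's hardware and utility.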

If our NMS function is a component within a larger computation graph running on the TPU, we might prefer a TPU-compatible implementation to avoid the drawbacks of cross-device execution. The code block below contains a JAX implementation of NMS specifically designed to enable acceleration via JIT compilation. Denoting the number of boxes by N, we begin by calculating the IOU between each of the N(N-1) pairs of boxes and preparing an NxN boolean tensor (mask_threshold) where the (i,j)-th entry indicates whether the IOU between boxes i and j exceeds the predefined threshold.

To simplify the iterative selection of boxes, we create a copy of the mask tensor (mask_threshold2) where the diagonal elements are zeroed to prevent a box from suppressing itself. We further define two score-tracking tensors: out_scores, which retains the scores of the chosen boxes (and zeros the scores of the eliminated ones), and remaining_scores, which maintains the scores of the boxes still being considered. We then use the jax.lax.while_loop function to iteratively choose boxes while updating the out_scores and remaining_scores tensors. Note that the format of the output of this function differs from the previous function and may need to be adjusted to fit into subsequent steps of the computation graph.
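The mask-based selection described above can be traced on a tiny example. The sketch below is a plain NumPy stand-in with three made-up boxes and a Python `for` loop in place of `jax.lax.while_loop`; it is meant only to show how the two masks and the two score arrays interact.

```python
import numpy as np

# Made-up boxes (x1, y1, x2, y2): the first two overlap, the third is isolated
boxes = np.array([[0, 0, 10, 10],
                  [5, 0, 15, 10],
                  [20, 20, 30, 30]], dtype=float)
scores = np.array([0.9, 0.8, 0.7])
threshold = 0.1

left, top, right, bottom = boxes.T
areas = (right - left) * (bottom - top)

# Pairwise intersections via broadcasting: entry (i, j) compares boxes i and j
inter_w = np.clip(np.minimum(right[None, :], right[:, None])
                  - np.maximum(left[None, :], left[:, None]), 0, None)
inter_h = np.clip(np.minimum(bottom[None, :], bottom[:, None])
                  - np.maximum(top[None, :], top[:, None]), 0, None)
inter = inter_w * inter_h
iou = inter / np.maximum(areas[None, :] + areas[:, None] - inter, 1e-5)

mask = iou > threshold                                  # includes the diagonal
mask_no_diag = mask & ~np.eye(len(boxes), dtype=bool)   # a box never suppresses itself

remaining, out = scores.copy(), scores.copy()
for _ in range(2):  # plays the role of max_output_size
    idx = np.argmax(remaining)
    valid = remaining[idx] > 0
    remaining = np.where(mask[idx] & valid, 0, remaining)  # drop chosen + suppressed
    out = np.where(mask_no_diag[idx] & valid, 0, out)      # drop suppressed only

print(out)
```

The second box is suppressed by the first, while the first and third keep their scores in the output array.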

import functools

# Given N boxes, calculates mask_threshold an NxN boolean mask
# where the (i,j) entry indicates whether the IOU of boxes i and j
# exceed the threshold. Returns mask_threshold, mask_threshold2
# which is equivalent to mask_threshold with zero diagonal and
# the scores modified so that all values are greater than 0
def init_tensors(boxes, scores, threshold=0.1):
    epsilon = 1e-5

    # Extract left, top, right, bottom coordinates
    left = boxes[:, 0]
    top = boxes[:, 1]
    right = boxes[:, 2]
    bottom = boxes[:, 3]

    # Compute areas of boxes
    areas = (right - left) * (bottom - top)

    # Calculate intersection points
    inter_l = jnp.maximum(left[None, :], left[:, None])
    inter_t = jnp.maximum(top[None, :], top[:, None])
    inter_r = jnp.minimum(right[None, :], right[:, None])
    inter_b = jnp.minimum(bottom[None, :], bottom[:, None])

    # Width, height, and area of the intersection
    inter_w = jnp.clip(inter_r - inter_l, 0)
    inter_h = jnp.clip(inter_b - inter_t, 0)
    inter_area = inter_w * inter_h

    # Union of the areas
    union = areas[None, :] + areas[:, None] - inter_area

    # IoU calculation
    iou = inter_area / jnp.clip(union, epsilon)

    # Shift scores to be greater than zero
    out_scores = scores - jnp.min(scores) + epsilon

    # Create mask based on IoU threshold
    mask_threshold = iou > threshold

    # Create mask excluding diagonal (i.e., self IoU is ignored)
    mask_threshold2 = mask_threshold * (1-jnp.eye(mask_threshold.shape[0],
                                                  dtype=mask_threshold.dtype))

    return mask_threshold, mask_threshold2, out_scores

@functools.partial(jax.jit, static_argnames=['max_output_size', 'threshold'])
def nms_jax(boxes, scores, max_output_size, threshold=0.1):
    # initialize mask and score tensors
    mask_threshold, mask_threshold2, out_scores = init_tensors(boxes,
                                                               scores,
                                                               threshold)

    # The out_scores tensor will retain the scores of the chosen boxes
    # and zero the scores of the eliminated ones
    # remaining_scores will maintain non-zero scores for boxes that
    # have not been chosen or eliminated
    remaining_scores = out_scores.copy()

    def choose_box(state):
        i, remaining_scores, out_scores = state
        # choose index of box with highest score from remaining scores
        index = jnp.argmax(remaining_scores)
        # check validity of chosen box
        valid = remaining_scores[index] > 0
        # If valid, zero all scores with IOU greater than threshold
        # (including the chosen index)
        remaining_scores = jnp.where(mask_threshold[index] *valid,
                                     0,
                                     remaining_scores)
        # zero the scores of the eliminated tensors (not including
        # the chosen index)
        out_scores = jnp.where(mask_threshold2[index]*valid,
                               0,
                               out_scores)

        i = i + 1
        return i, remaining_scores, out_scores

    def cond_fun(state):
        # keep iterating until max_output_size boxes have been chosen
        i, _, _ = state
        return (i < max_output_size)

    i = 0
    state = (i, remaining_scores, out_scores)

    _, _, out_scores = jax.lax.while_loop(cond_fun, choose_box, state)

    # Output the resultant scores. To extract the chosen boxes,
    # Take the max_output_size highest scores:
    # min = jnp.minimum(jnp.count_nonzero(scores), max_output_size)
    # indexes = jnp.argsort(out_scores, descending=True)[:min]
    return out_scores

# nms_jax can be run on either the CPU or the TPU
rand_boxes, rand_scores = generate_random_boxes(run_on_cpu=True)

time = benchmark(nms_jax)(rand_boxes, rand_scores, max_output_size=128)
print(f'nms_jax on CPU: {time}')

rand_boxes, rand_scores = generate_random_boxes(run_on_cpu=False)

time = benchmark(nms_jax)(rand_boxes, rand_scores, max_output_size=128)
print(f'nms_jax on TPU: {time}')

The runtimes of this implementation of NMS are 1.231 and 0.416 milliseconds on CPU and TPU, respectively.

We now present a custom implementation of NMS in which we explicitly leverage the fact that on TPUs Pallas kernels are executed in a sequential manner. Our implementation uses two boolean matrix masks and two score-keeping tensors, similar to the approach in our previous function.

We define a kernel function, choose_box, responsible for selecting the next box and updating the score-keeping tensors, which are maintained in scratch memory. We invoke the kernel across a one-dimensional grid where the number of steps (i.e., the grid-size) is determined by the max_output_size parameter.

Note that due to some limitations (as of the time of this writing) on the operations supported by Pallas, some acrobatics are required to implement both the “argmax” function and the validity check for the selected boxes. For the sake of brevity, we omit the technical details and refer the interested reader to the comments in the code below.
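The "argmax" workaround mentioned above packs each box's index into the low digits of an integer score, so that a plain max followed by a modulo recovers the index. The sketch below illustrates the idea in NumPy with four made-up scores; it mirrors the encoding used in the code that follows but is not the Pallas kernel itself.

```python
import numpy as np

scores = np.array([0.3, 0.9, 0.1, 0.5])   # made-up scores
n_boxes = scores.size

order = np.argsort(-scores)                # indices from highest to lowest score
descending = np.arange(n_boxes)[::-1]      # n_boxes-1, ..., 1, 0

# Rank goes in the high digits, the box's own index in the low digits
encoded_by_rank = n_boxes * descending + order

# Scatter back so that position k holds the encoded score of box k
encoded = np.empty_like(encoded_by_rank)
encoded[order] = encoded_by_rank

# A plain max now yields the argmax: the remainder modulo n_boxes is the index
idx = int(encoded.max() % n_boxes)
print(encoded, idx)
```

Because the encoded values preserve the ordering of the original scores, zeroing out suppressed entries and repeating the max keeps selecting the best remaining box.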

from jax.experimental import pallas as pl
from jax.experimental.pallas import tpu as pltpu

# argmax helper function
def pallas_argmax(scores, n_boxes):
    # we assume that the index of each box is stored in the
    # least significant bits of the score (see below)
    idx = jnp.max(scores.astype(float)).astype(int) % n_boxes
    return idx

# Pallas kernel definition
def choose_box(scores, thresh_mask1, thresh_mask2, ret_scores,
               scores_scratch, remaining_scores_scratch, *, nsteps, n_boxes):
    # initialize scratch memory on first step
    @pl.when(pl.program_id(0) == 0)
    def _():
        scores_scratch[...] = scores[...]
        remaining_scores_scratch[...] = scores[...]

    remaining_scores = remaining_scores_scratch[...]

    # choose box
    idx = pallas_argmax(remaining_scores, n_boxes)

    # we use any to verify validity of the chosen box due
    # to limitations on indexing in pallas
    valid = (remaining_scores>0).any()

    # updating score tensors
    remaining_scores_scratch[...] = jnp.where(thresh_mask1[idx,...]*valid,
                                              0,
                                              remaining_scores)
    scores_scratch[...] = jnp.where(thresh_mask2[idx,...]*valid,
                                    0,
                                    scores_scratch[...])

    # set return value on final step
    @pl.when(pl.program_id(0) == nsteps - 1)
    def _():
        ret_scores[...] = scores_scratch[...]

@functools.partial(jax.jit, static_argnames=['max_output_size', 'threshold'])
def nms_pallas(boxes, scores, max_output_size, threshold=0.1):
    n_boxes = scores.size
    mask_threshold, mask_threshold2, scores = init_tensors(boxes,
                                                           scores,
                                                           threshold)

    # In order to work around the Pallas argsort limitation
    # we create a new scores tensor with the same ordering of
    # the input scores tensor in which the index of each score
    # in the ordering is encoded in the least significant bits
    sorted = jnp.argsort(scores, descending=True)

    # descending integers: n_boxes-1, ..., 2, 1, 0
    descending = jnp.flip(jnp.arange(n_boxes))

    # new scores in descending with the least significant
    # bits carrying the argsort of the input scores
    ordered_scores = n_boxes * descending + sorted

    # new scores with same ordering as input scores
    scores = jnp.empty_like(ordered_scores
                            ).at[sorted].set(ordered_scores)

    grid = (max_output_size,)
    return pl.pallas_call(
        functools.partial(choose_box,
                          nsteps=max_output_size,
                          n_boxes=n_boxes),
        grid_spec=pltpu.PrefetchScalarGridSpec(
            num_scalar_prefetch=0,
            in_specs=[
                pl.BlockSpec(block_shape=(n_boxes,)),
                pl.BlockSpec(block_shape=(n_boxes, n_boxes)),
                pl.BlockSpec(block_shape=(n_boxes, n_boxes)),
            ],
            out_specs=pl.BlockSpec(block_shape=(n_boxes,)),
            scratch_shapes=[pltpu.VMEM((n_boxes,), scores.dtype),
                            pltpu.VMEM((n_boxes,), scores.dtype)],
            grid=grid,
        ),
        out_shape=jax.ShapeDtypeStruct((n_boxes,), scores.dtype),
        compiler_params=dict(mosaic=dict(
            dimension_semantics=("arbitrary",)))
    )(scores, mask_threshold, mask_threshold2)

rand_boxes, rand_scores = generate_random_boxes(run_on_cpu=False)

time = benchmark(nms_pallas)(rand_boxes, rand_scores, max_output_size=128)
print(f'nms_pallas: {time}')

The average runtime of our custom NMS operator is 0.139 milliseconds, making it roughly three times faster than our JAX-native implementation. This result highlights the potential of tailoring the implementation of sequential algorithms to the unique properties of the TPU architecture.

Note that in our Pallas kernel implementation, we load the full input tensors into TPU VMEM memory. Given the limited capacity of VMEM, scaling up the input size (i.e., increasing the number of bounding boxes) will likely lead to memory issues. Typically, such limitations can be addressed by chunking the inputs with BlockSpecs. Unfortunately, applying this approach would break the current NMS implementation. Implementing NMS across input chunks would require a different design, which is beyond the scope of this post.

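The results of our experiments are summarized in the table below:

Implementation   Device   Avg. runtime (ms)
nms_cpu          CPU      2.99
nms_jax          CPU      1.231
nms_jax          TPU      0.416
nms_pallas       TPU      0.139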

Results of NMS experiments (lower is better) — by Author

These results demonstrate the potential for running full ML computation graphs on TPU, even when they include sequential components. The performance improvement demonstrated by our Pallas NMS operator, in particular, highlights the opportunity of customizing kernels in a way that leverages the TPU's strengths.

In our previous post we learned of the opportunity for building custom TPU operators using the Pallas extension for JAX. Maximizing this opportunity requires tailoring the kernel implementations to the specific properties of the TPU architecture. In this post, we focused on the sequential nature of the TPU processor and its use in optimizing a custom NMS kernel. While scaling the solution to support an unrestricted number of bounding boxes would require further work, the core principles we have discussed remain applicable.

Pallas is still in the experimental phase of its development, and some of its limitations may require creative workarounds. But its strength and potential are clearly evident, and we anticipate that they will only increase as the framework matures.




08Oct

Talent Systems Specialist | GovAI Blog


Our first research agenda, published in 2018, helped define and shape the nascent field of AI governance. Our team and affiliate community possess expertise in a wide variety of domains, including compute governance, US-China relations, arms race dynamics, EU policy, and AI progress forecasting.

GovAI researchers have published in top journals and conferences, including International Organization, NeurIPS, and Nature Machine Intelligence. Our alumni have gone on to research roles at top academic institutions, including the University of Oxford and the University of Cambridge, and top AI labs, including DeepMind and OpenAI.

As Talent Systems Specialist, you will report to Ryan (Director of Operations), and also work closely with Georg (Chief of Staff), Valerie (Research Manager), and others on GovAI’s talent systems and programs. Responsibilities will include:

  • Project management of many of GovAI’s hiring rounds, from outreach to offers, using the systems and processes already in place. This includes quickly understanding how the existing systems work, and leveraging them to ensure that the teams responsible for candidate evaluation have the information, direction, and tools they need to succeed.
  • Designing, implementing, and continuously improving the tools, systems, and processes we use in our recruiting. This includes systems for candidate identification, outreach to promising individuals, contending with the proliferation of AI-assisted applications, and running the end-to-end evaluation process. This work has high potential to directly shape our talent strategy by defining what is possible for the organisation.
  • Balancing multiple concurrent workstreams, prioritising based on impact and strategic alignment. Because this role will be simultaneously coordinating hiring rounds and improving the systems on which those rounds are run, the individual will need to progress multiple objectives in parallel and quickly adapt when priorities shift.
  • Collaborating with staff across GovAI to gather requirements, understand pain points, and create solutions that enhance efficiency and effectiveness across the organisation. We are particularly excited about individuals who will proactively suggest improvements to “the way we do things” and can challenge our assumptions.
  • Supporting the Director of Operations in maintaining and elevating the quality and professionalism of our recruiting and people operations.

Depending on your interests and skills, there is room for the role to grow in several directions, including:

  • Internal Product Manager, owning nearly all of GovAI’s internal systems, aiming to enable all of GovAI’s programmes to rapidly scale. This could include building and managing a CRM, research repository, website, virtual course platform, research dissemination tools, and more. A key focus would be using data to evaluate and improve the impact of our programmes.
  • Recruiting Lead, taking a deeper level of ownership over end-to-end talent selection. Responsibilities could include defining and scoping new roles, designing effective evaluations, coordinating internal and external stakeholders, and participating directly in candidate grading and hiring decisions. This role could also involve managing one or more direct reports on the Operations team who support hiring efforts.
  • Head of People, overseeing the complete GovAI staff experience beyond recruiting and onboarding. This would include developing processes for staff to assess and optimise their performance, fostering best practices for delivering effective feedback, helping team members realise ambitious professional development goals, and further strengthening GovAI’s culture and working practices.

At GovAI we believe there is no such thing as a perfect candidate and we don’t expect a successful hire to excel in all of the dimensions listed below. If you are hesitant to apply because you are unsure whether you are qualified or you worry your background doesn’t make you an obvious fit, we still strongly encourage you to apply.

We’re searching for candidates who are:

  • Highly organised and skilled in project management. This role involves managing complex, concurrent work streams. We are looking for someone who can demonstrate highly structured work habits, confidently prioritise tasks, and take a methodical approach to maintaining order and progress.
  • Proactive in identifying and driving improvements, from ideation through execution. This role should seek out opportunities to enhance our systems and processes, gather high-level guidance from senior stakeholders (e.g. GovAI leadership), and take initiative to build scalable solutions that meet the organisation’s evolving needs.
  • Good at working as part of a fast-moving team. GovAI is a small-but-growing organisation and most team members wear many hats. This role should be comfortable iterating, pivoting when priorities change, and ensuring their solutions can stand on their own when the team moves on to new challenges.
  • Excellent communicators, both verbally and in writing. This role requires clear and prompt communication with a wide range of stakeholders, often synthesising rapid or fragmented feedback into concrete solutions.
  • Adept at translating high-level goals into actionable plans, including defining project owners, setting up project plans, and overseeing their execution from start to finish.
  • Driven by excellence and a commitment to producing high-quality results. Successful candidates will actively seek out opportunities to improve their skills and maximise their impact.

Although not mandatory, the following qualities would make a candidate exceptionally promising:  

  • Experience in product management, user experience, systems architecture, or data design. Strong candidates might have a good understanding of how to build solutions that address complex needs, integrate with existing processes, and remain adaptable to future requirements.
  • Experience in recruiting, talent development, staff support, or people operations. Strong candidates might have experience with recruiting or designing talent search processes, or building HR/people-oriented programs and systems.
  • Strong interpersonal skills and leadership abilities. It would be considered a strength if this individual could provide effective line management for junior Operations team members and ensure the Operations team as a whole receives appropriate support and guidance.
  • Excited by the opportunity to use their careers to positively influence the lasting impact of artificial intelligence, in line with our organisation’s mission.

This position will be full-time, and managed by Ryan, GovAI’s Director of Operations. Our offices are located in the UK and we strongly prefer team members to be based in Oxford or London, although we are open to hiring individuals who work remotely, and Ryan is based in New York City. We are able to sponsor visas. 

This role will be compensated in line with our salary principles. As such, the salary for this role will depend on the successful applicant’s experience, but we expect the full-time range to be between £60,000 and £80,000 for candidates based in the UK. In rare cases where salary considerations would prevent a candidate from accepting an offer, there may also be some flexibility in compensation. 

Benefits associated with the role include health, dental, and vision insurance, flexible work hours, extended parental leave, ergonomic equipment, a 10% employer pension contribution, and 33 days of paid vacation (including public holidays). Based on location, the role may also offer a £5,000 annual wellbeing budget, a £1,500 annual commuting budget, and a relocation stipend.

The application process includes three stages: a written submission in the first round, a paid remote work test in the second round, and an interview in the final round. Please apply using the form linked below. 

We aim to fill this role as soon as possible and may begin reviewing applications before the deadline. Applications submitted earlier may be given additional consideration.

We also note that end-of-year hiring rounds have a higher risk of delays as our graders navigate competing holiday schedules. While we intend to reach a decision before the end of the year, we appreciate your patience if the final outcome is only reached in early 2025.

GovAI is committed to fostering a culture of inclusion and we encourage individuals with underrepresented perspectives and backgrounds to apply. We especially encourage applications from women, gender minorities, people of colour, and people from regions other than North America and Western Europe who are excited about contributing to our mission. We are an equal opportunity employer and want to make it as easy as possible for everyone who joins our team to thrive in our workplace. 

If you need assistance with the application due to a disability, or have any other questions about applying, please email re*********@go********.ai.




08Oct

Demystifying Large Language Model Function Calling | by Cobus Greyling | Oct, 2024


Large Language Model (LLM) Function Calling enables models to interact directly with external functions and APIs, expanding their utility beyond language processing.

Before diving into demystifying LLM function calling, just a few considerations…

The term Large Language Model is increasingly seen as a general reference rather than a precise or technically accurate description.

Today, the term Foundation Models encompasses a broader range of capabilities, including not only language but also vision and multimodal functionalities.

There are also specialised models like Small Language Models optimised for lightweight applications and Large Action Models, which are fine-tuned for structured outputs and agent-based tasks.

This evolution reflects the diversity in AI architectures, with models designed to meet specific needs across various domains and applications. As the landscape grows, terminology will likely continue to evolve.

When using the OpenAI API with function calling, the model itself does not run the functions.

Instead, it generates parameters for potential function calls.

Your application then decides how to handle these parameters, maintaining full control over whether to call the suggested function or take another action.

In AI language models, the introduction of functions adds a new layer of autonomy.

The function calling capability allows the model to independently determine whether a function call is needed to handle a particular task or if it should respond directly.

By doing so, the model dynamically selects the most suitable response strategy based on the context, enhancing both its adaptability and effectiveness.

This decision-making power introduces a more nuanced autonomy, enabling the model to switch seamlessly between execution and conversation.

In function calling with language models, the model operates autonomously to determine whether a specific function call is appropriate based on the request.

When it identifies a match, it transitions to a more structured approach, preparing data parameters needed for the function. This allows the language model to act as a mediator, enabling efficient function handling while maintaining flexibility in processing the request.

AI autonomy can be viewed on a spectrum, with varying levels of independence depending on the system’s design.

By integrating function calls within generative AI applications, we introduce not only structure but also an initial layer of autonomy.

This enables AI systems to assess and respond to specific requests with a degree of self-direction. As AI technology evolves, these levels of autonomy are expected to increase, allowing models to handle tasks with greater independence and sophistication.

Consequently, this progression will enhance AI’s capacity to manage complex functions autonomously.

In the Python application below, two functions are defined: one for adding and another for subtracting.

These functions need not be as confined as in this simple illustrative example; they could just as well call out to an external API.

You will also see the schema defined for each function, with a description for each schema and a description for each input parameter.

pip install openai==0.28

import openai
import json

# Prompt user to input API key
api_key = input("Please enter your OpenAI API key: ")
openai.api_key = api_key

# Define the tools: an addition function and a subtraction function
def add_numbers(a, b):
    return {"result": a + b}

def subtract_numbers(a, b):
    return {"result": a - b}

# Define the function schema for OpenAI function calling
functions = [
    {
        "name": "add_numbers",
        "description": "Add two numbers together",
        "parameters": {
            "type": "object",
            "properties": {
                "a": {
                    "type": "number",
                    "description": "The first number to add"
                },
                "b": {
                    "type": "number",
                    "description": "The second number to add"
                }
            },
            "required": ["a", "b"]
        }
    },
    {
        "name": "subtract_numbers",
        "description": "Subtract one number from another",
        "parameters": {
            "type": "object",
            "properties": {
                "a": {
                    "type": "number",
                    "description": "The number to subtract from"
                },
                "b": {
                    "type": "number",
                    "description": "The number to subtract"
                }
            },
            "required": ["a", "b"]
        }
    }
]

# Define a function to handle the function calling based on the function name
def handle_function_call(function_name, arguments):
    if function_name == "add_numbers":
        return add_numbers(arguments['a'], arguments['b'])
    elif function_name == "subtract_numbers":
        return subtract_numbers(arguments['a'], arguments['b'])
    else:
        raise ValueError(f"Unknown function: {function_name}")

# Prompting the model with function calling
def call_gpt(prompt):
    response = openai.ChatCompletion.create(
        model="gpt-4-0613",  # gpt-4-0613 supports function calling
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": prompt}
        ],
        functions=functions,
        function_call="auto"  # This allows the model to decide which function to call
    )

    # Check if the model wants to call a function
    message = response["choices"][0]["message"]
    if "function_call" in message:
        function_name = message["function_call"]["name"]
        # The arguments arrive as a JSON string and must be parsed first
        arguments = json.loads(message["function_call"]["arguments"])
        result = handle_function_call(function_name, arguments)
        print(function_name, arguments, result)
        return f"Function called: {function_name}, Result: {result['result']}"
    else:
        return message["content"]

# Test the app
while True:
    user_input = input("Enter a math problem (addition or subtraction) or 'exit' to quit: ")
    if user_input.lower() == "exit":
        break
    response = call_gpt(user_input)
    print(response)
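
Because the model only proposes a call and the application decides what to do with it, one option is to check the proposed arguments before dispatching. The sketch below is an illustration rather than part of the application above: `ADD_SCHEMA` and `validate_arguments` are hypothetical names, the schema mirrors the shape of one entry in the `functions` list, and the "function_call" payload is simulated so that no API call is needed. Only the `required` list and the `number` type are checked; it is not a full JSON Schema validator.

```python
import json

# Hypothetical schema in the same shape as an entry of the `functions` list.
ADD_SCHEMA = {
    "name": "add_numbers",
    "parameters": {
        "type": "object",
        "properties": {
            "a": {"type": "number"},
            "b": {"type": "number"},
        },
        "required": ["a", "b"],
    },
}


def validate_arguments(schema, raw_arguments):
    """Parse the model's JSON argument string and check it against the schema."""
    arguments = json.loads(raw_arguments)
    params = schema["parameters"]
    # Every required parameter must be present.
    for name in params["required"]:
        if name not in arguments:
            raise ValueError(f"Missing required argument: {name}")
    # Every supplied parameter must be declared and of the declared type.
    for name, value in arguments.items():
        spec = params["properties"].get(name)
        if spec is None:
            raise ValueError(f"Unexpected argument: {name}")
        if spec["type"] == "number":
            # bool is a subclass of int in Python, so exclude it explicitly.
            if isinstance(value, bool) or not isinstance(value, (int, float)):
                raise ValueError(f"Argument {name} must be a number")
    return arguments


# A simulated "function_call" payload, as the model might propose it.
proposed = {"name": "add_numbers", "arguments": '{"a": 2, "b": 3}'}
args = validate_arguments(ADD_SCHEMA, proposed["arguments"])
print(args["a"] + args["b"])
```

If validation fails, the application can decline the call, ask the user for clarification, or fall back to a plain text reply.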

Chief Evangelist @ Kore.ai | I’m passionate about exploring the intersection of AI and language. From Language Models, AI Agents to Agentic Applications, Development Frameworks & Data-Centric Productivity Tools, I share insights and ideas on how these technologies are shaping the future.

https://platform.openai.com/docs/guides/function-calling




07Oct

Exploring How the New OpenAI Realtime API Simplifies Voice Agent Flows | by Sami Maameri | Oct, 2024


Setting up a Voice Agent using Twilio and the OpenAI Realtime API

At the recent OpenAI Dev Day on October 1st, 2024, OpenAI’s biggest release was the reveal of their Realtime API:

“Today, we’re introducing a public beta of the Realtime API, enabling all paid developers to build low-latency, multimodal experiences in their apps.

Similar to ChatGPT’s Advanced Voice Mode, the Realtime API supports natural speech-to-speech conversations using the six preset voices already supported in the API.”

(source: OpenAI website)

According to the announcement, its key benefits include low latency and speech-to-speech capabilities. Let’s see how that plays out in practice when building voice AI agents.

It also has an interruption handling feature, so that the realtime stream will stop sending audio if it detects you are trying to speak over it, a useful feature for sure when building voice agents.

In this article we will:

  • Compare what a phone voice agent flow might have looked like before the Realtime API, and what it looks like now,
  • Review a GitHub project from Twilio that sets up a voice agent using the new Realtime API, so we can see what the implementation looks like in practice, and get an idea of how the websockets and connections are set up for such an application,
  • Quickly review the React demo project from OpenAI that uses the Realtime API,
  • Compare the pricing of these various options.

Before the OpenAI Realtime API

To get a phone voice agent service working, there are some key services we require:

  • Speech to Text (e.g. Deepgram),
  • LLM/Agent (e.g. OpenAI),
  • Text to Speech (e.g. ElevenLabs).

These services are illustrated in the diagram below.

(source https://github.com/twilio-labs/call-gpt, MIT license)

That of course means integrating with a number of services, with separate API requests for each part.

The new OpenAI Realtime API allows us to bundle all of those together into a single request, hence the term, speech to speech.

After the OpenAI Realtime API

This is what the flow diagram would look like for a similar new flow using the new OpenAI Realtime API.

Obviously this is a much simpler flow. What is happening is we are just passing the speech/audio from the phone call directly to the OpenAI Realtime API. No need for a speech to text intermediary service.

And on the response side, the Realtime API is again providing an audio stream as the response, which we can send right back to Twilio (i.e to the phone call response). So again, no need for an extra text to speech service, as it is all taken care of by the OpenAI Realtime API.

Let’s look at some code samples for this. Twilio has provided a great example GitHub repository for setting up this Twilio and OpenAI Realtime API flow. You can find it here:

Here are some excerpts from key parts of the code related to setting up

  • the websockets connection from Twilio to our application, so that we can receive audio from the caller, and send audio back,
  • and the websockets connection to the OpenAI Realtime API from our application.

I have added some comments in the source code below to try to explain what is going on, especially regarding the websocket connection between Twilio and our application, and the websocket connection from our application to OpenAI. The triple dots (…) refer to sections of the source code that have been removed for brevity, since they are not critical to understanding the core features of how the flow works.

// On receiving a phone call, Twilio forwards the incoming call request to
// a webhook we specify, which is this endpoint here. This allows us to
// create programmatic voice applications, for example using an AI agent
// to handle the phone call
//
// So, here we are providing an initial response to the call, and creating
// a websocket (called a MediaStream in Twilio, more on that below) to receive
// any future audio that comes into the call
fastify.all('/incoming', async (request, reply) => {
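    const twimlResponse = `<?xml version="1.0" encoding="UTF-8"?>
<Response>
    <Say>Please wait while we connect your call to the A. I. voice assistant, powered by Twilio and the Open-A.I. Realtime API</Say>
    <Say>O.K. you can start talking!</Say>
    <Connect>
        <Stream url="wss://${request.headers.host}/media-stream" />
    </Connect>
</Response>`;

    reply.type('text/xml').send(twimlResponse);
});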

fastify.register(async (fastify) => {

    // Here we are connecting our application to the websocket media stream we
    // set up above. That means all audio that comes through the phone will come
    // to this websocket connection we have set up here
    fastify.get('/media-stream', { websocket: true }, (connection, req) => {
        console.log('Client connected');

        // Now, we are creating a websocket connection to the OpenAI Realtime API
        // This is the second leg of the flow diagram above
        const openAiWs = new WebSocket('wss://api.openai.com/v1/realtime?model=gpt-4o-realtime-preview-2024-10-01', {
            headers: {
                Authorization: `Bearer ${OPENAI_API_KEY}`,
                "OpenAI-Beta": "realtime=v1"
            }
        });

...

        // Here we are setting up the listener on the OpenAI Realtime API
        // websockets connection. We are specifying how we would like it to
        // handle any incoming audio streams that have come back from the
        // Realtime API.
        openAiWs.on('message', (data) => {
            try {
                const response = JSON.parse(data);

                ...

                // This response type indicates an LLM response from the Realtime API
                // So we want to forward this response back to the Twilio MediaStream
                // websockets connection, which the caller will hear as a response
                // on the phone
                if (response.type === 'response.audio.delta' && response.delta) {
                    const audioDelta = {
                        event: 'media',
                        streamSid: streamSid,
                        media: { payload: Buffer.from(response.delta, 'base64').toString('base64') }
                    };
                    // This is the actual part where we send it back to the Twilio
                    // MediaStream websockets connection. Notice how we are sending the
                    // response back directly. No need for text to speech conversion from
                    // the OpenAI response. The OpenAI Realtime API already provides the
                    // response as an audio stream (i.e. speech to speech)
                    connection.send(JSON.stringify(audioDelta));
                }
            } catch (error) {
                console.error('Error processing OpenAI message:', error, 'Raw message:', data);
            }
        });

        // This part specifies how we handle incoming messages to the Twilio
        // MediaStream websockets connection, i.e. how we handle audio that comes
        // into the phone from the caller
        connection.on('message', (message) => {
            try {
                const data = JSON.parse(message);

                switch (data.event) {
                    // This case ('media') is the state for when there is audio data
                    // available on the Twilio MediaStream from the caller
                    case 'media':
                        // we first check that our OpenAI Realtime API websockets
                        // connection is open
                        if (openAiWs.readyState === WebSocket.OPEN) {
                            const audioAppend = {
                                type: 'input_audio_buffer.append',
                                audio: data.media.payload
                            };
                            // and then forward the audio stream data to the
                            // Realtime API. Again, notice how we are sending the
                            // audio stream directly, with no speech to text conversion
                            // as would have been required previously
                            openAiWs.send(JSON.stringify(audioAppend));
                        }
                        break;

                    ...
                }
            } catch (error) {
                console.error('Error parsing message:', error, 'Message:', message);
            }
        });

...

fastify.listen({ port: PORT }, (err) => {
    if (err) {
        console.error(err);
        process.exit(1);
    }
    console.log(`Server is listening on port ${PORT}`);
});

So, that is how the new OpenAI Realtime API flow plays out in practice.
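
Stripped of the server plumbing, the bridge in the excerpt above amounts to two small message translations. The sketch below restates them in Python purely as an illustration; it is not part of the Twilio sample. The function names, the placeholder base64 strings, and the stream SID `MZ123` are all made up, while the event and field names are the ones that appear in the excerpt.

```python
import json


def twilio_media_to_openai(twilio_message: str):
    """Convert a Twilio 'media' event into a Realtime API audio-append event.

    Returns None for any other Twilio event type.
    """
    data = json.loads(twilio_message)
    if data.get("event") != "media":
        return None
    return json.dumps({
        "type": "input_audio_buffer.append",
        "audio": data["media"]["payload"],  # base64 audio, passed through as-is
    })


def openai_delta_to_twilio(openai_message: str, stream_sid: str):
    """Convert a Realtime API audio delta into a Twilio 'media' event.

    Returns None for any other Realtime API event type.
    """
    response = json.loads(openai_message)
    if response.get("type") != "response.audio.delta" or not response.get("delta"):
        return None
    return json.dumps({
        "event": "media",
        "streamSid": stream_sid,
        "media": {"payload": response["delta"]},
    })


# Example with made-up payloads (the base64 strings are placeholders).
inbound = json.dumps({"event": "media", "media": {"payload": "AAAA"}})
print(twilio_media_to_openai(inbound))

outbound = json.dumps({"type": "response.audio.delta", "delta": "BBBB"})
print(openai_delta_to_twilio(outbound, stream_sid="MZ123"))
```

In both directions the audio payload is forwarded untouched, which is the point of the speech-to-speech flow: no transcription or synthesis step sits between the two websockets.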

Regarding the Twilio MediaStreams, you can read more about them here. They are a way to set up a websockets connection between a call to a Twilio phone number and your application. This allows streaming of audio from the call to and from your application, allowing you to build programmable voice applications over the phone.

To get the code above running, you will also need to set up a Twilio number and ngrok. You can check out my other article over here for help setting those up.

Since access to the OpenAI Realtime API has only just been rolled out, not everyone may have access yet. I initially was not able to access it: running the application worked, but as soon as it tried to connect to the OpenAI Realtime API I got a 403 error. So if you see the same issue, it could be because you do not have access yet.

OpenAI have also provided a great demo for testing out their Realtime API in the browser using a React app. I tested this out myself and was very impressed with the speed of response from the voice agent: responses felt near-instant, with no noticeable latency, which makes for a great user experience.

Sharing a link to the source code here. It has instructions in the README.md for how to get set up.

This is a picture of what the application looks like once you get it running locally.

(source https://github.com/openai/openai-realtime-console, MIT license)

Let’s compare the cost of using the OpenAI Realtime API versus a more conventional approach using Deepgram for speech to text (STT) and text to speech (TTS) and OpenAI GPT-4o for the LLM part.

A comparison using the prices from their websites shows that for a one-minute conversation, with the caller speaking half the time and the AI agent speaking the other half, the cost using Deepgram and GPT-4o would be $0.0117/minute, whereas using the OpenAI Realtime API would be $0.15/minute.

That means using the OpenAI Realtime API would be roughly 13x the price per minute.
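
As a quick check of the arithmetic, the two per-minute figures quoted above can be divided directly. The snippet below uses only those two numbers; the 10-minute call length is an arbitrary example, and `cost_for_call` is a hypothetical helper.

```python
# Per-minute costs quoted above (USD); prices may have changed since.
pipeline_cost_per_min = 0.0117   # Deepgram + GPT-4o
realtime_cost_per_min = 0.15     # OpenAI Realtime API

ratio = realtime_cost_per_min / pipeline_cost_per_min
print(f"Realtime API is about {ratio:.1f}x the pipeline cost per minute")


def cost_for_call(minutes: float, cost_per_min: float) -> float:
    """Total cost in USD for a call of the given length."""
    return minutes * cost_per_min


# Arbitrary example: a 10-minute call under each option.
print(f"Pipeline: ${cost_for_call(10, pipeline_cost_per_min):.3f}")
print(f"Realtime: ${cost_for_call(10, realtime_cost_per_min):.2f}")
```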

It does sound like a fair amount more expensive, though we should balance that with some of the benefits the OpenAI Realtime API could provide, including

  • reduced latencies, crucial for having a good voice experience,
  • ease of setup due to fewer moving parts,
  • conversation interruption handling provided out of the box.

Also, please do be aware that prices can change over time, so the prices you find at the time of reading this article, may not be the same as those reflected above.

Hope that was helpful! What do you think of the new OpenAI Realtime API? Think you will be using it in any upcoming projects?

While we are here, are there any other tutorials or articles around voice agents and voice AI you would be interested in? I am deep diving into that field a bit just now, so would be happy to look into anything people find interesting.

Happy hacking!




07Oct

Top 5 Geospatial Data APIs for Advanced Analysis | by Amanda Iglesias Moreno | Oct, 2024


Explore Overpass, Geoapify, Distancematrix.ai, Amadeus, and Mapillary for Advanced Mapping and Location Data

Kyle Glenn in Unsplash (Source: https://unsplash.com/es/@kylejglenn)

Geographic data is important in many analyses, enabling us to make decisions based on location and spatial patterns. Examples of projects where geodata can come in handy include predicting house prices, optimizing transportation routes, or establishing a marketing strategy for a business.

However, as a data scientist, you will frequently face the challenge of where to obtain this data. Public sources of geographic data often exist, but the information they provide frequently needs to be revised for the analyses we want to perform.

This article will evaluate five of the most useful APIs for obtaining large-scale geographic data. We will assess their usage, advantages and disadvantages, and the main applications of the information they provide. Think of this article as a foundation for the use and applications of these APIs, so that you can later delve deeper into all the tools they offer.

The Overpass API allows access to the information available on the OpenStreetMap website. OpenStreetMap is an open geographic…




06Oct

Efficient Testing of ETL Pipelines with Python | by Robin von Malottki | Oct, 2024


How to Instantly Detect Data Quality Issues and Identify their Causes

Photo by Digital Buggu and obtained from Pexels.com

In today’s data-driven world, organizations rely heavily on accurate data to make critical business decisions. As a responsible and trustworthy Data Engineer, ensuring data quality is paramount. Even a brief period of displaying incorrect data on a dashboard can lead to the rapid spread of misinformation throughout the entire organization, much like a highly infectious virus spreads through a living organism.

But how can we prevent this? Ideally, we would avoid data quality issues altogether. However, the sad truth is that it’s impossible to completely prevent them. Still, there are two key actions we can take to mitigate the impact.

  1. Be the first to know when a data quality issue arises
  2. Minimize the time required to fix the issue

In this blog, I’ll show you how to implement the second point directly in your code. I will create a data pipeline in Python using generated data from Mockaroo and leverage Tableau to quickly identify the cause of any failures. If you’re looking for an alternative testing framework, check out my article on An Introduction into Great Expectations with python.




03Oct

LLM Symbolic Reasoning For Visual AI Agents | by Cobus Greyling | Oct, 2024


These symbols can represent spatial realities (for example, chair next to table), allowing the model to reason logically about tasks & answer questions based on these spatial observations. This helps AI make decisions or plan actions in real-world contexts.

Human reasoning can be understood as a cooperation between the intuitive & associative, and the deliberative & logical. ~ Source

Considering the image below, conversational AI systems traditionally followed System 2 approaches, characterised by deliberate and logical reasoning.

These systems relied on intent detection and structured flows to determine action sequences. With the rise of Generative AI and Large Language Models (LLMs), there’s a shift toward System 1 solutions, which are more intuitive and associative.

A possible approach to activity reasoning is to build a symbolic system consisting of symbols and rules, connecting various elements to mimic human reasoning.

Previous attempts, though useful, faced challenges due to handcrafted symbols and limited rules derived from visual annotations. This limited their ability to generalise complex activities.

To address these issues, a new symbolic system is proposed with two key properties: broad-coverage symbols and rational rules. Instead of relying on expensive manual annotations, LLMs are leveraged to approximate these properties.

Given an image, symbols are extracted from visual content, and fuzzy logic is applied to deduce activity semantics based on rules, enhancing reasoning capabilities.
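
To make the idea of applying fuzzy logic to extracted symbols more concrete, here is a minimal sketch. Everything in it is an assumption made for illustration: the symbol names, their confidence values, the two rules, and the choice of the minimum as the fuzzy AND. The system described above may use different symbols, rules, and operators.

```python
# Hypothetical symbol confidences extracted from an image, in [0, 1].
symbols = {"person": 0.9, "chair": 0.8, "sitting_pose": 0.7, "bicycle": 0.1}

# Hypothetical rules: an activity holds if all of its premise symbols hold.
rules = {
    "sit_on_chair": ["person", "chair", "sitting_pose"],
    "ride_bicycle": ["person", "bicycle"],
}


def fuzzy_and(values):
    """Fuzzy conjunction using the minimum (one common choice of t-norm)."""
    return min(values)


def score_activities(symbols, rules):
    """Score each activity by the fuzzy AND of its premise symbols.

    Symbols that were not extracted get a truth value of 0.0.
    """
    return {
        activity: fuzzy_and([symbols.get(s, 0.0) for s in premises])
        for activity, premises in rules.items()
    }


scores = score_activities(symbols, rules)
best = max(scores, key=scores.get)
print(scores)
print(best)
```

With these made-up values, the weakest premise decides each score, so one low-confidence symbol (here `bicycle`) is enough to rule an activity out.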

This shift exemplifies how intuitive, associative reasoning enabled by LLMs is pushing the boundaries of AI agent systems in tasks like activity recognition.

With just a quick glance at an image, we as humans can naturally translate visual inputs into symbols or concepts.

This allows us to use common-sense reasoning to understand and imagine the broader context beyond the visible scene — similar to how we infer the existence of gravity without directly seeing it.




03Oct

Chain-Of-Symbol Prompting To Improve Spatial Reasoning | by Cobus Greyling | Oct, 2024


Redundant natural language and irrelevant information can also hamper the performance of LLMs.

Using symbols to represent spatial relationships in word sequences can provide a clearer and more understandable framework for large language models (LLMs).

This method potentially simplifies data processing for the models.

Researchers are exploring the use of symbols in prompting LLMs. This matters because it extends beyond traditional language comprehension, and the area remains under-explored while offering valuable insights into enhancing LLM capabilities and understanding.

The approach uses a set of symbols to represent spatial relationships between objects in complex environments.
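To make this concrete, here is a minimal sketch that turns a verbose spatial description into a compact symbolic one. The symbols (`/`, `<`, `>`), the relation phrases, and the brick example are assumptions made for illustration, not the notation used by the researchers.

```python
# Hypothetical mapping from relation phrases to symbols (illustrative only).
RELATION_SYMBOLS = {
    "on top of": "/",
    "to the left of": "<",
    "to the right of": ">",
}


def to_symbolic(facts):
    """Convert (subject, relation, object) triples into compact symbolic strings."""
    return [f"{subj} {RELATION_SYMBOLS[rel]} {obj}" for subj, rel, obj in facts]


# A made-up scene described as triples.
facts = [
    ("brick A", "on top of", "brick B"),
    ("brick B", "on top of", "brick C"),
    ("brick D", "to the left of", "brick C"),
]

# The same facts, first as natural language and then as symbols.
verbose = ". ".join(f"{s} is {r} {o}" for s, r, o in facts) + "."
symbolic = ", ".join(to_symbolic(facts))

print(verbose)
print(symbolic)
```

The symbolic form carries the same relations in fewer characters, which is the kind of reduction in redundant wording the text describes.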

To examine the impact of symbolic representations in prompting, researchers focus on complex spatial understanding and planning tasks.

These tasks require large language models (LLMs) to interpret virtual spatial environments described in natural language, while also planning and achieving specific objectives within those settings.

This approach helps assess how effectively LLMs can navigate and manipulate spatial information.

The model is posed with the following question:

Answer this question: I have a chair, two potatoes, a cauliflower, a lettuce head, two tables, a cabbage, two onions, and three fridges. How many vegetables do I have?

Symbolic reasoning involves reasoning about objects, their relations to each other, and concepts.

And the model's response:

The question lists several items, but only some of them are vegetables. Here’s the breakdown:

  • Potatoes: 2 (vegetable)
  • Cauliflower: 1 (vegetable)
  • Lettuce head: 1 (vegetable)
  • Cabbage: 1 (vegetable)
  • Onions: 2 (vegetable)

The other items (chair, tables, and fridges) are not vegetables.

Total number of vegetables: 2 + 1 + 1 + 1 + 2 = 7.

So, you have 7 vegetables.
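The tally in the response can be checked with a few lines of Python. The item names and the set of names treated as vegetables are assumptions that simply mirror the question and the model's own classification.

```python
# Items and counts from the question.
items = {
    "chair": 1,
    "potato": 2,
    "cauliflower": 1,
    "lettuce head": 1,
    "table": 2,
    "cabbage": 1,
    "onion": 2,
    "fridge": 3,
}

# Assumed classification, mirroring the model's breakdown.
VEGETABLES = {"potato", "cauliflower", "lettuce head", "cabbage", "onion"}

# Sum the counts of only those items classified as vegetables.
total_vegetables = sum(count for name, count in items.items() if name in VEGETABLES)

print(total_vegetables)
```

The sum over the five vegetable entries matches the model's answer of 7.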

Spatial reasoning in natural language processing focuses on understanding and processing spatial information within text. Some researchers have explored methods for training models on navigation tasks that require agents to reach specific locations based on textual descriptions.

Others have developed tasks where models generate instructions from “before” and “after” image pairs or created benchmarks for answering spatial questions about environments. Additionally, it has been observed that large language models struggle with text-based games that involve multi-step reasoning.


