16Jul

The AI Act and a (sorely missing!) right to AI individualization; Why are we building Skynet? – European Law Blog


Blogpost 37/2024

The industry has tricked us; scientists and regulators have failed us. AI is developing not individually (as humans develop into individuals) but collectively: a huge collective hive that collects, stores and processes all of humanity’s information; a single entity (or a few, with interoperability among them as much an open issue today as their operation itself) to process all our questions, wishes and knowledge. The AI Act that has just been released ratifies, for the moment at least, this approach: the EU’s ambitious attempt to regulate AI deals with it as if it were simply a phenomenon in need of better organisation, without granting any rights (or participation, and thus a voice) to individuals. This is not only a missed opportunity but also a potentially risky approach; while we may not be building Skynet as such, we are accepting an industry-imposed shortcut that will ultimately hurt individual rights, if not individual development per se.

This mode of AI development has been a result of short-termism: an immediate need to get results quickly and to make a ‘fast buck’. Unlimited (and unregulated, save for the GDPR) access to whatever information is available for processing obviously speeds things up – and keeps costs down. Data-hungry AI models learn faster through access to as-large-as-possible repositories of information; improvements can then be fed into next-generation AI models that are even more data-hungry than their predecessors. The cycle can be virtuous or vicious, depending on how you see it.

In the iconic 1984 film The Terminator, humans fought against Skynet, “an artificial neural network-based conscious group mind and artificial general superintelligence system”. Skynet was a single, collective intelligence (“group mind”) that quickly learned everything that humans knew and controlled all of the machines. Machines (including Terminators) did not develop independently, but as units within a hive, answering to and controlled by a single, omnipresent and omnipotent entity – Skynet.

Isn’t this exactly what we are doing today? Are we not happy to let Siri, Alexa, ChatGPT (or whatever other AI entity the industry and scientists launch) process as a single entity, a single other-party with which each one of us interacts, all of our information through our daily queries and interactions with them? Are we not also happy to let them control, using that same information, all of our smart devices at home or at the workplace? Are we not, voluntarily, building Skynet?

 

But, I do not want to be talking to (everybody’s) Siri!

All our AI end-user software (or otherwise automated software assistants) is designed and operates as a single, global entity. I may be interacting with Siri on my iPhone (or Google Assistant, Alexa, Cortana etc.), asking it to carry out various tasks for me, but so do millions of other people on the planet. In essence, Siri is a single entity interacting simultaneously with each one of us. It is learning from us and with us. Crucially, however, the improvement from the learning process goes to the one, global Siri. In other words, each one of us is assisted individually through our interaction with Siri, but Siri develops and improves itself as a single, global entity.

The same is the case today with any other AI-powered or AI-aspiring entity. ChatGPT answers any question or request that pops into one’s mind; this interaction assists each one of us individually but develops ChatGPT itself globally, as a single entity. Google Maps drives us (more or less) safely home, but at the same time it catalogues how all of us move around the planet. Amazon offers us suggestions on books or items we may like to buy, and Spotify on music we may like to listen to, but at the same time their algorithms learn what humans need and how they appreciate art.

Basically, if one wanted to trace this development back, they would come across the moment that software transformed from a product into a service. In the beginning, before the prevalence of the internet, software was a product: one bought it off the shelf, installed it on one’s computer and used it (subject to the occasional update) without having anything to do with the manufacturer. However, when each and every computer and computing device on the planet became interconnected, the software industry, under the pretence of automated updates and improved user experience, found an excellent way to increase its revenue: software became not a product but a service, payable in monthly instalments that apparently will never stop. Accordingly, in order to (lawfully) remain a service, software needed to remain constantly connected to its manufacturer/provider, feeding it at all times with details on our use and other preferences.

No user was ever asked about the “software-as-a-service” transformation (governments, particularly in tax havens, happily obliged, offering tax residencies for such services against competitive taxation). Similarly, no user has been asked today whether they want to interact with (everybody’s) Siri. One AI entity interacting with all of humanity is a fundamentally flawed assumption. Humans act individually, each at their own initiative, not as units within a hive. The tools they invent to assist them, they use individually. It is of course true that each person’s self-improvement, when added up within our respective societies, leads to overall progress; still, humanity’s progress is achieved individually, independently and in unknown and frequently surprising directions.

On the contrary, scientists and the industry are offering us today a single tool (or, in any case, very few, with interoperability among them still an open issue) to be used by each one of us in a recordable and processable (by that tool, not by us!) manner. This is unprecedented in humanity’s history. The only entity so far to, in its singularity, interact with each one of us separately, and to be assumed omnipresent and omnipotent, is God.

 

The AI Act: A half-baked GDPR mimesis phenomenon

The biggest shortcoming of the recently published AI Act, and of the EU’s approach to AI overall, is that it deals with AI only as a technology in need of better organisation. The EU tries to map and catalogue AI, and then to apply a risk-based approach to reduce its negative effects (while, hopefully, still allowing it to develop lawfully, in regulatory sandboxes etc.). To this end the EU employs organisational and technical measures to deal with AI, complete with a bureaucratic mechanism to monitor and apply them in practice.

The similarity of this approach to the GDPR’s approach, or a GDPR-mimesis phenomenon, has already been identified. The problem is that, even under this overly protective and least-imaginative approach, the AI Act is only a half-baked GDPR mimesis example. This is because the AI Act fails to follow the GDPR’s fundamental policy option to include the users (data subjects) in its scope. On the contrary, the AI Act leaves users out.

The GDPR’s policy option to include the users may appear self-evident now, in 2024; however, it is anything but. Back in the 1970s, when the first data protection laws were being drafted in Europe, the pendulum could have swung in any direction: legislators might well have chosen to deal with personal data processing, too, as a technology merely in need of better organisation. They could well have chosen to introduce only high-level principles on how controllers should process personal data. However, importantly, they did not. They found a way to include individuals, to grant them rights, to empower them. They did not leave personal data processing only to organisations and bureaucrats to manage.

This is something that the AI Act is sorely missing. Even combined with the AI Liability Directive, it still leaves users out of the AI scene. This is a huge omission: users need to be able to participate, to actively use and take advantage of AI, and to be afforded the means to protect themselves from it, if needed.

 

In urgent need: A (people’s) right to AI individualisation

It is this need for users to participate in the AI scene that a right to AI individualisation would serve. A right to AI individualisation would allow users to use AI in the way each one sees fit, deliberately, unmonitored and unobserved by the AI manufacturer. The link with the provider, which today is always on and feeds all of our innermost thoughts, wishes and ideas back to a collective hive, needs to be broken. In other words, we only need the technology, the algorithm alone, to train and use ourselves without anybody’s interference. This is not simply a matter of individualising the experience on the UX end but, fundamentally, on the backend. The ‘connection with the server’ that has been forced upon us through the software-as-a-service transformation needs to be severed, and control of one’s own, personalised AI should be given back to the user. In other words, we need to be afforded the right to move from (everybody’s) Siri to each one’s Maria, Tom, or R2-D2.

Arguably, the right to data protection serves this need already, granting us control over the processing of our personal data by third parties. However, the right to data protection comes with well-known nuances, for example the various legal bases that permit processing anyway, or the technical-feasibility limitations of the rights afforded to individuals. After all, it is under this existing regulatory model, which remains in effect, that today’s model of AI development was allowed to take place. A specific, explicitly spelled-out right to AI individualisation would address exactly that: closing the existing loopholes that the industry was able to take advantage of, while placing users at the centre.

A host of other considerations would follow the introduction of such a right. Principles such as data portability (art. 20 of the GDPR), interoperability (art. 6 of EU Directive 2009/24/EC) or even a right to be forgotten (art. 17 of the GDPR) would have to be revisited. Basically, our whole perspective would be overturned: users would be transformed from passive recipients into active co-creators, and AI itself from a single-entity monolith into a billion individualised versions, as many as the users it serves.

As such, a right to AI individualisation would need to be embedded in systems’ design, similar to privacy by-design and by-default requirements. This is a trend increasingly noticeable in contemporary law-making: as digital technologies permeate our lives, legislators find that it is sometimes not enough to regulate the end result, meaning human behaviour; they must also regulate the tools or methods that led to it, meaning software. Soon, software development and software systems’ architecture will have to pay close attention to (if not be dictated by) a large array of legal requirements found in personal data protection, cybersecurity, online platforms and other fields of law. In essence, it would appear that, contrary to an older belief that code is law, at the end of the day it is law that makes code.




16Jul

From Scratch to Deep Quantile Forecasting | by Jinhang Jiang | Jul, 2024


An end-to-end empirical walkthrough of multi-step quantile forecasting with TensorFlow, NeuralForecast, and zero-shot LLMs.

  1. Short Introduction
  2. Data
  3. Build a Toy Version of Quantile Recurrent Forecaster
  4. Quantile Forecasting with the State-of-Art Models
  5. Zero-shot Quantile Forecast with LLMs
  6. Conclusion

Quantile forecasting is a statistical technique used to predict different quantiles (e.g., the median or the 90th percentile) of a response variable’s distribution, providing a more comprehensive view of potential future outcomes. Unlike traditional mean forecasting, which only estimates the average, quantile forecasting allows us to understand the range and likelihood of various possible results.
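
For intuition, the workhorse behind quantile forecasting is the pinball (quantile) loss, which penalises under- and over-prediction asymmetrically; here is a tiny numerical illustration of that asymmetry:

# Pinball loss for a single observation: max(q*(y - yhat), (q - 1)*(y - yhat))
def pinball_loss(q, y_true, y_pred):
    e = y_true - y_pred
    return max(q * e, (q - 1) * e)

# At q = 0.9, under-forecasting by 10 (y=100, yhat=90) costs 9.0,
# while over-forecasting by 10 (yhat=110) costs only 1.0,
# which pushes the fitted value towards the 90th percentile.
print(pinball_loss(0.9, 100, 90))   # 9.0
print(pinball_loss(0.9, 100, 110))  # 1.0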

Quantile forecasting is essential for decision-making in contexts with asymmetric loss functions or varying risk preferences. In supply chain management, for example, predicting the 90th percentile of demand ensures sufficient stock levels to avoid shortages, while predicting the 10th percentile helps minimize overstock and associated costs. This methodology is particularly advantageous in sectors such as finance, meteorology, and energy, where understanding distribution extremes is as critical as the mean.

Both quantile forecasting and conformal prediction address uncertainty, yet their methodologies differ significantly. Quantile forecasting directly models specific quantiles of the response variable, providing detailed insights into its distribution. Conversely, conformal prediction is a model-agnostic technique that constructs prediction intervals around forecasts, guaranteeing that the true value falls within the interval with a specified probability. Quantile forecasting yields precise quantile estimates, whereas conformal prediction offers broader interval assurances.

The implementation of quantile forecasting can markedly enhance decision-making by providing a sophisticated understanding of future uncertainties. This approach allows organizations to tailor strategies to different risk levels, optimize resource allocation, and improve operational efficiency. By capturing a comprehensive range of potential outcomes, quantile forecasting enables organizations to make informed, data-driven decisions, thereby mitigating risks and enhancing overall performance.

To demonstrate the workflow, I chose to use data from the M4 competition as an example. The data is under the CC0: Public Domain license and can be accessed here. It can also be loaded through the datasetsforecast package:

# Install the package (run in a shell)
# pip install datasetsforecast

# Load Data
import pandas as pd
from datasetsforecast.m4 import M4

df, *_ = M4.load('./data', group='Weekly')
# Randomly select three items
df = df[df['unique_id'].isin(['W96', 'W100', 'W99'])]
# Define the start date (for example, "1970-01-04")
start_date = pd.to_datetime("1970-01-04")
# Convert the integer 'ds' index to actual week dates
df['ds'] = start_date + pd.to_timedelta(df['ds'] - 1, unit='W')
# Display the DataFrame
df.head()

The original data contains over 300 unique time series. To demonstrate, I randomly selected three time series: W96, W99, and W100, as they all have the same history length. The original timestamps are masked as integers (i.e., 1–2296), so I manually converted them back to a normal date format, with the first date set to January 4th, 1970. The following figure is a preview of W99:

[Figure: preview of series W99 — image by author]

First, let’s build a quantile forecaster from scratch to understand how the target data flows through the pipeline and how the forecasts are generated. I picked the idea from the paper A Multi-Horizon Quantile Recurrent Forecaster by Wen et al. The authors proposed a Multi-Horizon Quantile Recurrent Neural Network (MQ-RNN) framework that combines Sequence-to-Sequence Neural Networks, Quantile Regression, and Direct Multi-Horizon Forecasting for accurate and robust multi-step time series forecasting. By leveraging the expressiveness of neural networks, the nonparametric nature of quantile regression, and a novel training scheme called forking-sequences, the model can effectively handle shifting seasonality, known future events, and cold-start situations in large-scale forecasting applications.

We cannot reproduce everything in this short blog, but we can try to replicate part of it using the TensorFlow package as a demo. If you are interested in the implementation of the paper, there is an ongoing project that you can leverage: MQRNN.

Let’s first load the necessary packages and define some global parameters. We will use the LSTM model as the core, and we need to do some preprocessing on the data to obtain the rolling windows before fitting. The input_shape is set to (104, 1), meaning we are using two years of weekly data for each training window. In this walkthrough, we will only look into an 80% confidence interval with the median as the point forecast, which means quantiles = [0.1, 0.5, 0.9]. We will use the last 12 weeks as a test dataset, so the output_steps or horizon is equal to 12 and the cut_off_date will be '2013-10-13'.

# Install the package (run in a shell)
# pip install tensorflow

# Load the packages
import numpy as np
import pandas as pd
import tensorflow as tf
from sklearn.preprocessing import StandardScaler
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Input, LSTM, Dense, concatenate, Layer

# Define Global Parameters
input_shape = (104, 1)          # two years of weekly history per window
quantiles = [0.1, 0.5, 0.9]     # 80% interval plus the median
output_steps = 12               # 12-week forecast horizon
cut_off_date = '2013-10-13'
tf.random.set_seed(20240710)

Next, let’s convert the data into rolling windows, which is the desired input shape for RNN-based models:

# Preprocess The Data
def preprocess_data(df, window_size=104, forecast_horizon=12):
    # Ensure the dataframe is sorted by item and date
    df = df.sort_values(by=['unique_id', 'ds'])
    # Lists to hold processed data for each item
    X, y, unique_id, ds = [], [], [], []
    # Normalizer
    scaler = StandardScaler()
    # Iterate through each item
    for key, group in df.groupby('unique_id'):
        demand = group['y'].values.reshape(-1, 1)
        scaled_demand = scaler.fit_transform(demand)
        dates = group['ds'].values
        # Create sequences (sliding window approach)
        for i in range(len(scaled_demand) - window_size - forecast_horizon + 1):
            X.append(scaled_demand[i:i + window_size])
            y.append(scaled_demand[i + window_size:i + window_size + forecast_horizon].flatten())
            unique_id.append(key)
            ds.append(dates[i + window_size:i + window_size + forecast_horizon])
    X = np.array(X)
    y = np.array(y)
    return X, y, unique_id, ds, scaler

Then we split the data into train, val, and test:

# Split Data
def split_data(X, y, unique_id, ds, cut_off_date):
    cut_off_date = pd.to_datetime(cut_off_date)
    val_start_date = cut_off_date - pd.Timedelta(weeks=12)

    # Windows are assigned to train/val/test based on the first date of their forecast horizon
    train_idx = [i for i, date in enumerate(ds) if date[0] < val_start_date]
    val_idx = [i for i, date in enumerate(ds) if val_start_date <= date[0] < cut_off_date]
    test_idx = [i for i, date in enumerate(ds) if date[0] >= cut_off_date]

    X_train, y_train = X[train_idx], y[train_idx]
    X_val, y_val = X[val_idx], y[val_idx]
    X_test, y_test = X[test_idx], y[test_idx]

    train_unique_id = [unique_id[i] for i in train_idx]
    train_ds = [ds[i] for i in train_idx]
    val_unique_id = [unique_id[i] for i in val_idx]
    val_ds = [ds[i] for i in val_idx]
    test_unique_id = [unique_id[i] for i in test_idx]
    test_ds = [ds[i] for i in test_idx]

    return X_train, y_train, X_val, y_val, X_test, y_test, train_unique_id, train_ds, val_unique_id, val_ds, test_unique_id, test_ds

The authors of the MQRNN utilized both horizon-specific local context, essential for temporal awareness and seasonality mapping, and horizon-agnostic global context to capture non-time-sensitive information, enhancing the stability of learning and the smoothness of generated forecasts. To build a model that sort of reproduces what the MQRNN is doing, we need to write a quantile loss function and add layers that capture local context and global context. I added an attention layer to it to show you how the attention mechanism can be included in such a process:

# Attention Layer
class Attention(Layer):
    def __init__(self, units):
        super(Attention, self).__init__()
        self.W1 = Dense(units)
        self.W2 = Dense(units)
        self.V = Dense(1)

    def call(self, query, values):
        # Additive (Bahdanau-style) attention over the LSTM outputs
        hidden_with_time_axis = tf.expand_dims(query, 1)
        score = self.V(tf.nn.tanh(self.W1(values) + self.W2(hidden_with_time_axis)))
        attention_weights = tf.nn.softmax(score, axis=1)
        context_vector = attention_weights * values
        context_vector = tf.reduce_sum(context_vector, axis=1)
        return context_vector, attention_weights

# Quantile (pinball) Loss Function
def quantile_loss(q, y_true, y_pred):
    e = y_true - y_pred
    return tf.reduce_mean(tf.maximum(q * e, (q - 1) * e))

def combined_quantile_loss(quantiles, y_true, y_pred, output_steps):
    # Each quantile gets its own slice of the concatenated output vector
    losses = [quantile_loss(q, y_true, y_pred[:, i * output_steps:(i + 1) * output_steps])
              for i, q in enumerate(quantiles)]
    return tf.reduce_mean(losses)

# Model architecture
def create_model(input_shape, quantiles, output_steps):
    inputs = Input(shape=input_shape)
    lstm1 = LSTM(256, return_sequences=True)(inputs)
    lstm_out, state_h, state_c = LSTM(256, return_sequences=True, return_state=True)(lstm1)
    context_vector, attention_weights = Attention(256)(state_h, lstm_out)
    # Horizon-agnostic global context
    global_context = Dense(100, activation='relu')(context_vector)
    forecasts = []
    for q in quantiles:
        # One output head per quantile
        local_context = concatenate([global_context, context_vector])
        forecast = Dense(output_steps, activation='linear')(local_context)
        forecasts.append(forecast)
    outputs = concatenate(forecasts, axis=1)
    model = Model(inputs, outputs)
    model.compile(optimizer='adam',
                  loss=lambda y, f: combined_quantile_loss(quantiles, y, f, output_steps))
    return model
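
To tie the pieces together, here is a minimal sketch of how the helpers above could be trained and used; the epoch count and batch size are illustrative choices, and predictions stay on the scaled series until inverse-transformed:

# Prepare the data and fit the model (hyperparameters are illustrative)
X, y, unique_id, ds, scaler = preprocess_data(df)
(X_train, y_train, X_val, y_val, X_test, y_test,
 train_uid, train_ds, val_uid, val_ds, test_uid, test_ds) = split_data(X, y, unique_id, ds, cut_off_date)

model = create_model(input_shape, quantiles, output_steps)
model.fit(X_train, y_train, validation_data=(X_val, y_val), epochs=50, batch_size=32)

# The output is the concatenation [P10 | P50 | P90], each block of length output_steps
y_pred = model.predict(X_test)
p10 = y_pred[:, 0 * output_steps:1 * output_steps]
p50 = y_pred[:, 1 * output_steps:2 * output_steps]
p90 = y_pred[:, 2 * output_steps:3 * output_steps]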

Here are the plotted forecasting results:

We also evaluated the SMAPE for each item, as well as the percentage coverage of the interval (i.e., how much of the actuals fell within the interval). The results are as follows:
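
For reference, these two metrics can be computed along the following lines; this is a simple sketch with my own function names, and the exact evaluation code may differ:

def smape(y_true, y_pred):
    # Symmetric MAPE in percent, averaged over all horizons
    y_true, y_pred = np.asarray(y_true, dtype=float), np.asarray(y_pred, dtype=float)
    return 100.0 * np.mean(np.abs(y_true - y_pred) / ((np.abs(y_true) + np.abs(y_pred)) / 2.0))

def interval_coverage(y_true, lower, upper):
    # Share of actual values falling inside the [lower, upper] interval
    y_true = np.asarray(y_true, dtype=float)
    return np.mean((y_true >= np.asarray(lower)) & (y_true <= np.asarray(upper)))

# Example: SMAPE of the median forecast and coverage of the 80% interval
# print(smape(y_test, p50), interval_coverage(y_test, p10, p90))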

This toy version can serve as a good baseline to start with quantile forecasting. Distributed training is not configured for this setup, nor is the model architecture optimized for large-scale forecasting, so it might suffer from speed issues. In the next section, we will look into a package that allows you to do quantile forecasting with the most advanced deep-learning models.

The neuralforecast package is an outstanding Python library that allows you to use most of the SOTA deep neural network models for time series forecasting, such as PatchTST, NBEATs, NHITS, TimeMixer, etc. with easy implementation. In this section, I will use PatchTST as an example to show you how to perform quantile forecasting.

First, load the necessary modules and define the parameters for PatchTST. Tuning the model will require some empirical experience and will be project-dependent. If you are interested in finding potentially optimal parameters for your data, you may look into the auto modules from neuralforecast. They allow you to use Ray to perform hyperparameter tuning, and it is quite efficient! The neuralforecast package carries a great set of models that are based on different sampling approaches. The ones with the base_window approach allow you to use MQLoss or HuberMQLoss, where you can specify the quantile levels you are looking for. In this work, I picked HuberMQLoss as it is more robust to outliers.

# Install the package (run in a shell)
# pip install neuralforecast

# Load the package
from neuralforecast.core import NeuralForecast
from neuralforecast.models import PatchTST
from neuralforecast.losses.pytorch import HuberMQLoss, MQLoss

# Define Parameters for PatchTST
PARAMS = {'input_size': 104,
          'h': output_steps,
          'max_steps': 6000,
          'encoder_layers': 4,
          'start_padding_enabled': False,
          'learning_rate': 1e-4,
          'patch_len': 52,        # Length of each patch
          'hidden_size': 256,     # Size of the hidden layers
          'n_heads': 4,           # Number of attention heads
          'res_attention': True,
          'dropout': 0.1,         # Dropout rate
          'activation': 'gelu',   # Activation function
          'attn_dropout': 0.1,
          'fc_dropout': 0.1,
          'random_seed': 20240710,
          'loss': HuberMQLoss(quantiles=[0.1, 0.5, 0.9]),
          'scaler_type': 'standard',
          'early_stop_patience_steps': 10}

# Get Training Data (everything before the cut-off date)
train_df = df[df.ds < cut_off_date]

# Fit and predict with PatchTST
models = [PatchTST(**PARAMS)]
nf = NeuralForecast(models=models, freq='W')
nf.fit(df=train_df, val_size=12)
Y_hat_df = nf.predict().reset_index()

Here are plotted forecasts:

Here are the metrics:

Through the demo, you can see how easy it is to implement the model and how much the performance has been lifted. However, if you wonder whether there are any easier approaches to this task, the answer is YES. In the next section, we will look into a T5-based model that allows you to conduct zero-shot quantile forecasting.

We have been witnessing a trend where advancements in NLP also push the boundaries of time series forecasting, as predicting the next word is an analogous process to predicting the next period’s value. Given the fast development of large language models (LLMs) for generative tasks, researchers have also started to look into pre-training large models on millions of time series, allowing users to do zero-shot forecasts.

However, before we draw an equals sign between LLMs and zero-shot time series tasks, we have to answer one question: what is the difference between training a language model and training a time series model? It comes down to tokens from a finite dictionary versus values from an unbounded, continuous domain. Amazon recently released a project called Chronos, which handles this challenge well and makes a large time series model possible. As the authors stated: “Chronos tokenizes time series into discrete bins through simple scaling and quantization of real values. In this way, we can train off-the-shelf language models on this ‘language of time series,’ with no changes to the model architecture”. The original paper can be found here.
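
To make that idea concrete, here is a rough, simplified sketch of mean scaling followed by uniform binning; the bin count and value range are placeholders, and the actual Chronos tokenizer differs in its details:

def naive_tokenize(series, n_bins=4094, low=-15.0, high=15.0):
    # 1) Mean-scale so that series of different magnitudes share a comparable range
    series = np.asarray(series, dtype=float)
    scale = np.mean(np.abs(series))
    scale = scale if scale > 0 else 1.0
    scaled = series / scale
    # 2) Quantize the real values into a finite vocabulary of bin ids ("tokens")
    bin_edges = np.linspace(low, high, n_bins)
    tokens = np.digitize(scaled, bin_edges)
    return tokens, scale  # the scale is kept to map generated tokens back to values

tokens, scale = naive_tokenize([120.0, 135.0, 128.0, 150.0])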

Currently, Chronos is available in multiple versions. It can be loaded and used through the autogluon API with only a few lines of code.

# Load the package (autogluon.timeseries provides the Chronos presets)
from autogluon.timeseries import TimeSeriesDataFrame, TimeSeriesPredictor

# Get Training Data and Transform
train_df = df[df.ds < cut_off_date]
train_df_chronos = TimeSeriesDataFrame(
    train_df.rename(columns={'ds': 'timestamp', 'unique_id': 'item_id', 'y': 'target'})
)

# Zero-shot forecast with Chronos
predictor = TimeSeriesPredictor(prediction_length=output_steps, freq='W', quantile_levels=[0.1, 0.9]).fit(
    train_df_chronos, presets="chronos_base",
    random_seed=20240710
)
Y_hat_df_chronos = predictor.predict(train_df_chronos).reset_index().rename(
    columns={'mean': 'Chronos',
             '0.1': 'P10',
             '0.9': 'P90',
             'timestamp': 'ds',
             'item_id': 'unique_id'})

Here are the plotted forecasts:

Here are the metrics:

As you can see, Chronos showed a very decent performance compared to PatchTST. However, it does not mean it has surpassed PatchTST, since it is very likely that Chronos has been trained on M4 data. In their original paper, the authors also evaluated their model on the datasets that the model has not been trained on, and Chronos still yielded very comparable results to the SOTA models.

There are many more large time series models being developed right now. One of them, called TimeGPT, was developed by Nixtla. The invention of this kind of model not only makes forecasting easier, more reliable, and more consistent, but it is also a good starting point for making reasonable guesses for time series with limited historical data.

From building a toy version of a quantile recurrent forecaster to leveraging state-of-the-art models and zero-shot large language models, this blog has demonstrated the power and versatility of quantile forecasting. By incorporating models like TensorFlow’s LSTM, NeuralForecast’s PatchTST, and Amazon’s Chronos, we can achieve accurate, robust, and computationally efficient multi-step time series forecasts. Quantile forecasting not only enhances decision-making by providing a nuanced understanding of future uncertainties but also allows organizations to optimize strategies and resource allocation. The advancements in neural networks and zero-shot learning models further push the boundaries, making quantile forecasting a pivotal tool in modern data-driven industries.

Note: All the images, numbers and tables are generated by the author. The complete code can be found here: Quantile Forecasting.




15Jul

Winter Fellowship 2025 | GovAI Blog


About the Team

GovAI’s mission is to help humanity navigate the transition to a world with advanced AI. Our world-class research has helped shape the nascent field of AI governance. Our team and affiliate community possess expertise in a wide variety of domains, including US-China relations, arms race dynamics, EU policy, and AI progress forecasting.

We are looking for early-career individuals, or established professionals new to the field of AI governance, to join our team for three months to learn about the field while making connections with other researchers and practitioners. This opportunity will be a particularly good fit for individuals who are excited to use their careers to shape the lasting implications of AI.

About the Fellowship

Summer and Winter Fellows join GovAI to conduct independent research on a topic of their choice, with mentorship from leading experts in the field of AI governance. Fellows will also join a series of Q&A sessions with AI governance experts, research seminars, and researcher work-in-progress meetings. Each Fellow will be paired with a supervisor from the GovAI team or affiliate network.

You can read about the topics our previous cohorts of Summer and Winter Fellows worked on here and here.

Past Fellows have gone on to work on AI governance full-time in government or at organisations including GovAI, OpenAI, the AI Now Institute, and RAND. Others have gone on to build relevant expertise at leading universities such as MIT, Stanford University, University College London, and the University of Oxford.

Fellowship Experience

As a Fellow, you will spend the first week or two of the fellowship exploring research topic options before settling on a research proposal with input from your mentors.

Valerie Belu, GovAI’s Research Management Associate, will support you in deciding what project and output will be most valuable for you to work towards, for example, publishing a report, journal article, or blog post. You will also take time to explore the wider AI governance space and discuss follow-on career opportunities in the field of AI governance with our team.

If you are an experienced professional or established academic considering transitioning into the field of AI governance, we might be open to tailoring the fellowship experience to your needs. This could include only attending parts of the fellowship or teaching buy-outs, if necessary. Please reach out to our Research Management Associate Valerie Belu (va**********@go********.ai) if you would like to discuss specifics before applying.

Qualifications and Selection Criteria

We strongly encourage you to apply if you have an interest in our work and are considering using your career to study or shape the long-term implications of advanced AI.

Given the multidisciplinary nature of our work, we are interested in candidates from a broad set of disciplines including political science, public policy, history, economics, sociology, law, philosophy, and computer science. We are particularly interested in hosting more researchers with strong technical backgrounds. There are no specific educational requirements for the role, although we expect that the most promising candidates will typically have relevant graduate study or research experience in related areas.

When assessing applications, we will be looking for candidates who have the following strengths or show positive signs of being able to develop them:

Quality of work: The ability to produce clearly written, insightful, and even-handed research. We are particularly excited about strong reasoning ability and clear and concise writing.

Relevant expertise: Skills or knowledge that are likely to be helpful for work on AI governance. We think that relevant expertise can take many different forms. Note that we also do not have any strict degree requirements.

Judgement: The ability to prioritise between different research directions, and good intuitions about the feasibility of different research directions.

Team fit: Openness to feedback, commitment to intellectual honesty and rigour, comfort in expressing uncertainty, and a serious interest in using your career to contribute to AI governance.

Salary, Duration, and Location

Summer and Winter Fellowships last for three months, and Fellows will receive a stipend of £9,000, plus support for travelling to Oxford. While in Oxford, we provide our Fellows with lunch on weekdays and a desk in our office. This is intended to be a full-time and in-person role, based in Oxford, UK. We are able to sponsor visas. For successful applicants who require a visa, note that you will need to remain in your country of visa application for some time while the visa application is underway.

Winter Fellows will join for three months, between January and April (precise dates TBC).

How to Apply and What to Expect

The application process consists of a written submission in the first round, a remote work test in the second round, and an interview in the final round. We expect to reach out to Winter Fellowship candidates for paid work tests in August, offer interviews in September, and communicate final decisions to candidates in October. Please feel free to reach out to re*********@go********.ai if you would need a decision communicated earlier than the standard timeline (this may or may not be possible), or have questions about the application process.

We accept applications from anywhere in the world. We are committed to fostering a culture of inclusion, and we encourage individuals with diverse backgrounds and experiences to apply. We especially encourage applications from women, gender minorities, and people of colour who are excited about contributing to our mission. We are an equal opportunity employer. If you are concerned that you’re not the right fit but have a strong interest in the Fellowship, we encourage you to apply anyway.




15Jul

LangSmith, LangGraph Cloud & LangGraph Studio | by Cobus Greyling | Jul, 2024


In this article I do a complete end-to-end walkthrough of an Agent built using LangGraph, deployed to LangGraph Cloud and viewed via LangGraph Studio, ending with LangSmith for managing applications and LLM performance.

Considering the intersection of language and AI, developments have been taking place at a tremendous pace. And LangChain finds itself at the forefront of shaping how generative AI applications are developed and managed.

A few initial observations regarding generative AI and language:

  1. A few months ago it was thought that OpenAI had captured the market with their highly capable LLMs.
  2. Then a slew of open-sourced models, most notably from Meta, disrupted the perceived commercial model.
  3. LLM providers realised that Language Models would become a mere utility and started to focus on end-user applications and RAG-like functionalities referred to as grounding, agent-like functionality and personal assistants.
  4. Hallucination had to be solved for, and it was discovered that LLMs do not have emergent capabilities, but rather do exceptionally well at in-context learning (ICL). An application structure developed around implementing, scaling and managing ICL implementations, which we now know as RAG.
  5. RAG (non-gradient) started to be preferred over fine-tuning (gradient) approaches for being transparent rather than opaque, adding to generative AI apps being observable, inspectable and easily modifiable.
  6. Because we started using all aspects of LLMs (NLG, reasoning, planning, dialog state management, etc.) except their knowledge-intensive nature, Small Language Models became very much applicable.
  7. This was due to very capable open-sourced SLMs, quantisation, local offline inference, and advanced capability in reasoning and chain-of-thought training.
  8. And the focus is shifting to two aspects. The first is a data-centric approach, where unstructured data can be discovered, designed and augmented for RAG and fine-tuning. Recent fine-tuning has not focused on augmenting the knowledge-intensive nature of Language Models, but rather on imbuing the LMs with specific behavioural capabilities.
  9. This is evident in the recent acquisition by OpenAI to move closer to the data portion and to delivering RAG solutions.
  10. The second aspect is the need for a no-code to low-code AI productivity suite providing access to models, hosting, flow-engineering, fine-tuning, a prompt studio and guardrails.
  11. There is also a notable movement to add graph data; a graph is an abstract data type (defined below), and this data structure is less opaque and easier to interpret.

LangChain introduced LangSmith as a tool for detailed tracing and management of generative AI applications. The offering includes a prompt playground and a prompt hub.

LangChain also recently introduced LangGraph, which adds a degree of structure to agentic applications.

An abstract data type is a mathematical model for data types, defined by its behaviour (semantics) from the point of view of a user of the data.

Abstract data types stand in stark contrast with data structures, which are concrete representations of data and reflect the point of view of an implementer, not a user. This data structure is less opaque and easier to interpret.

A directed graph (or digraph) is a graph made up of a set of nodes connected by directed edges.

A graph data structure consists of a finite set of nodes together with a set of unordered pairs of these nodes (in the case of an undirected graph).

Considering the graph representation below, the nodes are shown, together with the edges and the edge options.
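
To make nodes and directed edges concrete in code, here is a minimal LangGraph sketch; the state schema and node names are illustrative assumptions only, not taken from the walkthrough above:

from typing import TypedDict
from langgraph.graph import StateGraph, END

class AgentState(TypedDict):
    question: str
    answer: str

def agent_node(state: AgentState) -> AgentState:
    # A stand-in for an LLM call
    return {"question": state["question"], "answer": f"Echo: {state['question']}"}

# Nodes are added to the graph, then connected with directed edges
workflow = StateGraph(AgentState)
workflow.add_node("agent", agent_node)
workflow.set_entry_point("agent")
workflow.add_edge("agent", END)

app = workflow.compile()
print(app.invoke({"question": "What is a digraph?", "answer": ""}))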




15Jul

Vienna calling (Luxembourg) – About the admissibility of an Action for Annulment of the Nature Restoration Law – European Law Blog


Blogpost 36/2024

This blogpost is dedicated to legal questions arising from the ongoing ‘coalition crisis’ in Austria, following Environment Minister Leonore Gewessler’s decision to vote in favour of the Regulation on Nature Restoration despite the opposing will of Austria’s Chancellor Karl Nehammer and 7 out of 9 Regional Governments (Bundesländer). While Nehammer is of the opinion that this violates Austrian constitutional law (‘The constitution applies to climate activists as well.’) and has filed an abuse of office complaint, the question arises whether the announced action for annulment before the CJEU – if not supported by all members of the government – would be admissible, and who else could challenge the law in Luxembourg.

A quick reminder on the facts of the case: The Council adopted the Nature Restoration Law on 17 June 2024, with Gewessler’s (The Greens) vote being the decisive one, as otherwise the qualified-majority threshold of Member States representing the required share of the EU population (Article 16(4) Treaty on European Union, TEU) would not have been met. However, the second party in Austria’s coalition, the Austrian People’s Party (‘ÖVP’), and Chancellor Nehammer were not amused by Gewessler going rogue. When Gewessler announced her intention to support the law in the EU Council of Ministers one day before the vote, Nehammer sent a letter to the Belgian Presidency arguing that Gewessler was ‘not entitled to commit the Republic of Austria according to Art 16 (2) TEU in this regard’ due to a binding uniform opinion of the Regional Governments. Nevertheless, the Council confirmed that the vote would hold, and the Brussels-Capital Region’s Environment Minister Alain Maron, who chaired the talks, referred to an ‘internal controversy in Austria’. Notwithstanding the law’s passing, for now, Gewessler attracted harsh criticism from her coalition partners, who accused her of having ‘trampled federalism underfoot’. Even if the ÖVP is committed to maintaining the coalition (since legislative elections in September are approaching), this did not stop it from announcing its intention to submit an action for annulment, in addition to the criminal charges already filed.

Regarding the merits of the case, the better arguments suggest that an action for annulment would likely not succeed. This is also reflected by discussions in Austria and Germany, together with a recently published Verfassungsblog contribution. The contribution on Verfassungsblog convincingly demonstrates that even if Council members may be bound by additional national guidelines during votes (just as the ÖVP claimed that Gewessler was bound by national law to the uniform opinion of the provinces according to Article 23d Federal Constitutional Act (Bundes-Verfassungsgesetz, B-VG)), this does not affect the validity of votes at the EU level, since the CJEU is only bound by the (formal) requirements of Article 16(2) TEU, which are, firstly, a representative at ministerial level who is, secondly, able to commit the government in question. Within these limits, it is up to each Member State to determine how it is represented in the Council (see also Annex I of the Council’s Rules of Procedure (2009/937/EU)). Article 73(2) B-VG stipulates that Austria is represented in the Council by the competent Minister, who, considering the Federal Ministries Act, is Leonore Gewessler in matters of the environment, leaving no doubt that she could commit her government (with no further authorisation needed). According to the authors, the letter sent by Nehammer to Alexander De Croo does not lead to a different legal assessment – even in the light of Article 4(3) TEU. One could also question the presence of a ‘manifest’ violation of a national provision of ‘fundamental importance’ in view of the ongoing discussion in Austria as to whether Article 23d has been violated, given that two Länder withdrew from the former uniform opinion, which proves how controversial the issue is (see the comments by Prof. Hipold). Another unfavourable point could be the wording of Nehammer’s letter (‘in this regard’). Although it would be conceivable to withdraw a minister’s power of representation – for example, by dismissing her – acting ministers have the power to speak for a country in the Council (see points raised by Prof. Ruffert).

However, another question implied by Austrian Prof. Bußjäger is whether one minister alone can submit an action for annulment (on behalf of the state). Against this background, the question arises of whether such an action would even pass the formal barriers of Article 263 Treaty on the Functioning of the European Union (TFEU).

According to Article 263(2) TFEU, the Court shall have jurisdiction in

‘actions brought by a Member State, the European Parliament, the Council or the Commission on grounds of lack of competence, infringement of an essential procedural requirement, infringement of the Treaties or of any rule of law relating to their application, or misuse of powers.’ (emphasis added)

Contrary to the non-privileged applicants in paragraph four of the same article, the standing of these so-called privileged applicants is not dependent on anything else, such as individual or direct concern. The Court held in Italy v Council that even the fact that the act in question was voted for in the Council by the representative of a Member State does not bar that Member State’s application for annulment (see also: Lenaerts et al., EU Procedural Law (para. 7.77)). This being made clear, the question remains who falls within the notion of ‘Member State’.

The answer – according to settled case law – is that the term ‘Member States’ refers to ‘government authorities of the Member States’ (see, for example, Région wallonne v Commission (para. 6)). Therefore, infra-State authorities – in the current case of Austria, one or more Bundesländer – do not satisfy this condition. The only way for them to bring an action for annulment would be the ‘hard way’, by proving that they are directly and individually concerned by the contested measure. In fact, this has already happened in an action for annulment by the Austrian region of Oberösterreich in Land Oberösterreich v Commission. In its judgment, the General Court had to assess whether the Land Oberösterreich was individually affected by a Commission decision addressed to the Republic of Austria, which concerned the denial of a request for derogation from a directive in favour of a draft law of the Land Oberösterreich. This led the Court to affirm its locus standi, as the contested decision had the effect of preventing the exercise of the Land’s own powers conferred on it by the Austrian constitutional order.

It can be concluded that even if a Bundesland itself is unable to submit an action for annulment relying on Article 263(2) TFEU, the Court does indeed consider infra-state conferral of power when it comes to the fulfilment of the criteria of paragraph four, which can ultimately lead to an admissible application for annulment (see also Alves (p. 249 f.)). Nevertheless, it is doubtful that the CJEU would grant standing to one of the Bundesländer that were against the EU Nature Restoration Law since, in the present case, the reviewable act would be the regulation itself (and not, as in the above-mentioned case, a Commission decision affecting a measure of the Bundesland), which expands the circle of potentially affected applicants and would most definitely contradict the assumption of individual concern under Plaumann. In addition, as made clear above, there is no consensus as to whether there has been a breach of national constitutional law that would affect the constitutional powers of the Länder (even if the regulation would, of course, limit the Länder in the exercise of their conferred powers, which include nature conservation).

While the CJEU has clarified that only the state government can submit an action for annulment, Article 263(2) TFEU does not state further criteria. One therefore needs to take a closer look at the Austrian constitution to understand the Government’s internal decision-making process. According to Article 69(1) B-VG, the Federal Government consists of the Federal Chancellor, the Vice-Chancellor and all the other Federal Ministers. Each of them is considered a ‘highest organ’, which means there is no hierarchy between them. Until recently, the question of which majority requirements were necessary for a government resolution was unresolved – even if the prevailing opinion was that unanimity was required. However, this changed with the second COVID-19 law, when a third paragraph was added stating that ‘the Federal Government shall pass its resolutions unanimously’ (see also: Muzak, B-VG, Art. 69). In other words, under Austrian constitutional law, a unanimous decision by all ministers is required for the collegial body of the Federal Government to adopt a decision. Hence, in the absence of a specific provision that, to the author’s knowledge, would apply to the present case, an action for annulment needs the approval of all the members of the government, which is impossible, as Minister Gewessler (and probably the other five Green coalition members) will not consent. Even if the Austrian Government is represented before the CJEU by the Constitutional Service, a solo effort by the Minister responsible for the EU and the Constitution would go against Austrian constitutional law (for the effects on the EU level, see below). Again, as with actions brought by regional entities, one or several ministers could still submit an action under Article 263(4) TFEU (while, of course, needing to prove direct and individual concern).

However, two possible scenarios remain in which a ‘privileged’ action for annulment might succeed after all. The first possibility (and it is not really one): the ÖVP could wait until the parliamentary elections on 29 September 2024 and the formation of a new government. If the Greens go into opposition and a conservative coalition is formed, there is a good chance that unanimity will be found among the new members of the government. Nonetheless, there is a reason why this alternative is of a very theoretical nature. Even though the EU Nature Restoration Law has not yet been published in the OJ, it soon will be. Once published, an action for annulment can be brought within two months and ten days (Article 263(6) TFEU and Article 51 of the Rules of Procedure of the Court of Justice). Hence, it is hard to imagine that the deadline for bringing an action will not have expired by the time the new Government is formed. The second (and more likely) scenario would be that Austrian Chancellor Nehammer and/or his Constitutional Minister decide to submit an action for annulment on behalf of the government (without the consent of the entire government), thereby infringing Austrian constitutional law. If the action is brought by the aforementioned Constitutional Service, it will still be considered admissible by the CJEU, as the internal decision-making process is (again) a question of domestic constitutional law and not among the requirements of Article 263(2) that bind the Court. However, there is a certain irony, as Nehammer’s approach would amount to precisely what he and his party are now accusing Gewessler of: an offence against national constitutional provisions.

Given the above, the case in question would undoubtedly represent a novelty before the CJEU, and many questions (both of a formal and substantive nature) still need to be conclusively clarified. However, one needs to wait and see whether, and which member of, the Austrian Government (or, less likely, which of the Regional Governments) submits an action for annulment in the two months following the publication of the Nature Restoration Law. Should one fear that similar coups during EU legislative procedures will soon occur in other Member States, one can confidently argue that the actors here were presumably politically motivated and that the combination of all the necessary factors (national pre-election campaign mood, the vote of a country that is decisive in a Council vote, etc.) will probably not be repeated so quickly. When it comes to climate activists, those who have in the past stood up for a reinterpretation of the individual-concern criteria under Plaumann by the CJEU may feel a certain satisfaction if the Court – even if granting standing to the Member State – most likely (albeit for different reasons) dismisses the action as unfounded.




13Jul

Build an AI Paraphraser Tool in 5 Minutes With SimplerLLM | by Hasan Aboul Hasan | Jul, 2024


In this post, I’ll show you step-by-step how you can build an AI Paraphraser Tool using Python and SimplerLLM in Minutes.

Something like this:

Intro: How Do Paraphrasing Tools Work?

Before the era of AI, most paraphrasing tools swapped words with their synonyms, maintaining the original meaning of the text.

However, after AI took over this domain, these tools improved to the extent that they are now able to analyze the input text and create an alternative version with a different structure and wording while conveying the same meaning.

Here are some of the things most paraphraser tools do:

  1. Word Substitution: The tool identifies and replaces words with their synonyms while maintaining the original meaning of the text.
  2. Sentence Restructure: One of the most critical steps in paraphrasing is rearranging the structure of sentences. For example, it may convert active voice to passive voice or change the order of phrases and clauses to create a different sentence flow.
  3. Consolidating Information: Summarize information from long sentences or paragraphs into shorter, more concise versions that cover the essential points.
  4. Adjusting Formality and Tone: This would be done based on the settings or intended use. For instance, it can transform a casual tone into a more formal one or vice versa.
  5. Removing Redundancy: Detect and remove redundant phrases or words, making the text clearer without unnecessary repetition.
  6. Ensuring Coherence: Beyond word-level changes, effective paraphrasing ensures that the rephrased text remains logically connected, maintaining the flow and readability of the content.

As you can see, it’s not just about changing words. As we go now over some types of paraphrasing styles, you’ll see how the prompt changes a lot with each one.

Let’s Start! — The Implementation

In our case, the main engine that paraphrases the text is our power prompt. These prompts are fed to OpenAI’s GPT model, or any other model you prefer, which will do all the work for us.

The code structure is very simple. It just reads the content of a text file, paraphrases it using the power prompt chosen, and saves the response in another text file. So, the only part that needs to be very well-crafted is the prompt.

By now, you should have grasped the idea behind how the code works. So, it’s time to get technical!

Get Our Environment Ready

First, our code is dependent on the SimplerLLM library, which makes building AI tools much easier, as you can see now.

Let’s start by creating a virtual environment and installing the library.

So, open a new terminal and run the following step-by-step:

1- Create the Virtual Environment:

python -m venv venv

2- Activate the Virtual Environment:

venv/scripts/activate   (on Windows; on macOS/Linux use: source venv/bin/activate)

3- Install SimplerLLM:

pip install simplerllm

Now, we have a virtual environment with simplerllm installed, which will help isolate the project and avoid conflicts between package versions.

Note that You will get all the codes and prompts mentioned, so don’t worry about the snippets now 🙂

First things first, we’ll need to create a .env file and add our OpenAI API Key so that the SimplerLLM functions can use it to generate the responses.

If you don’t have an API key, go to OpenAI’s website and generate a new one. Then, add it to the .env file in this form:

OPENAI_API_KEY = "YOUR_API_KEY"

Now, we’re ready to use the code; here it is:

from SimplerLLM.tools.generic_loader import load_content
from SimplerLLM.language.llm import LLM, LLMProvider
from prompts import Standard, Academic, Kiddie, Formal, Expand, Shorten

text = load_content("input.txt")

# Edit the prompt name according to the style you want to convert the text to
final_prompt = Academic.format(input = text.content)

llm_instance = LLM.create(provider=LLMProvider.OPENAI, model_name="gpt-4o")

response = llm_instance.generate_response(prompt=final_prompt, max_tokens=1000)

with open("response.txt", "w", encoding='utf-8') as f:
    f.write(response)

As you can see, we’ve imported two things into our code: the SimplerLLM functions we’ll use, and the prompts module, which I created to store all the power prompts, and I’ll give them to you for free!
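
To give an idea of the shape of that module, here is a rough, hypothetical sketch of what prompts.py could look like; the wording below is placeholder text of mine, not the actual power prompts (those come with the links at the end of the post):

# prompts.py -- hypothetical sketch; each style is a template exposing an {input} placeholder
Academic = """Paraphrase the text below in a formal, academic register.
Use precise vocabulary, keep every fact intact, and restructure sentences where helpful.

Text: {input}"""

Kiddie = """Rewrite the text below so a 10-year-old can easily understand and enjoy it.
Use simple words and short sentences while keeping the original meaning.

Text: {input}"""

Shorten = """Paraphrase the text below into a noticeably shorter version.
Remove redundancy and keep only the essential points and the original meaning.

Text: {input}"""

# Standard, Formal, and Expand would follow the same pattern.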

The text variable uses the SimplerLLM function load_content, which takes your text file as input and loads its respective data. Here’s how it would look:

text = load_content("input.txt")

Academic Paraphraser

Now, we need to format the prompt and store it in the final_prompt variable. This can be done by using the Academic prompt, which we imported from the prompts module, passing it the content of the text file:

final_prompt = Academic.format(input = text.content)

Then, we create an OpenAI LLM instance in order to call their GPT model, and we call it using the generate_response function, storing the result in the response variable.

llm_instance = LLM.create(provider=LLMProvider.OPENAI, model_name="gpt-4o")

response = llm_instance.generate_response(prompt=final_prompt, max_tokens=1000)

💡 Note that by default, this function returns a maximum of 350 tokens as output; that’s why we added the max_tokens parameter to increase it to 1000. If your expected token count is bigger than 1000 tokens, make sure you increase this number as needed.

💡 To calculate the number of tokens in your text, use this tokenizer tool by OpenAI, or use the tiktoken library to calculate it directly in your Python code.
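
For example, a minimal tiktoken sketch (assuming a recent tiktoken version; o200k_base is the gpt-4o encoding, while cl100k_base approximates older GPT-4 models):

import tiktoken

# Count tokens in the input file with a chosen encoding
encoding = tiktoken.get_encoding("o200k_base")  # or "cl100k_base" for older models
with open("input.txt", encoding="utf-8") as f:
    text = f.read()
print(len(encoding.encode(text)), "tokens")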

Then, we take the generated response and save it to the response.txt file.

Here’s what the output will look like:

As you can see, the paraphrased output text is still very close in the number of characters; however, the content drastically changed when we used strictly formal language, and the vocabulary used is way more complex than before.

The changes that occur to the text are in accordance with what I constructed the prompt to do.

Kiddie Paraphraser

Let’s try using the Kiddie format instead of Academic and see how the output changes. Just use Kiddie instead of Academic in the final_prompt variable, like this:

final_prompt = Kiddie.format(input = text.content)

And here’s the result:

The result now is very different from the one above. The words used are more informal, and the idea is explained very simply, making it easy for kids to understand and enjoy.

Shortening Paraphraser

Let’s now try another type of paraphrasing that not only changes words and sentence structure but also changes the text’s length either by increasing or decreasing.

As we did before, we’ll replace Kiddie in the final_prompt variable with Shorten to decrease the text’s length. Here’s what we get:

The paraphrased text was shortened to 147 words from the original 204. It also did some word substitution, along with a little sentence restructuring.

As you can see, the output changes a lot depending on the prompt we choose. So, the better the prompt, the better the result.

This is what we call prompt engineering: crafting optimal prompts so you get the best possible results out of the model.

The Code and Prompts

Here are both the code we used above and the prompts.py module file, which contains the prompts it imports.

The Code

The Prompts
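The original post embeds the full prompts.py file at this point. Since its exact contents are not reproduced in this excerpt, below is a minimal sketch of how the module could be structured so the code above runs; the wording of each prompt is an illustrative assumption, not the author's actual power prompts.

# prompts.py -- illustrative templates; the exact wording of each prompt is an assumption.
# Each prompt is a plain string with an {input} placeholder, so it can be used as
# Academic.format(input=text.content) in the main script.

Standard = "Paraphrase the following text, keeping its meaning and tone intact:\n\n{input}"

Academic = (
    "Paraphrase the following text in a strictly formal, academic register, "
    "using precise and sophisticated vocabulary while preserving its meaning:\n\n{input}"
)

Kiddie = (
    "Paraphrase the following text so a young child can understand it, "
    "using simple, friendly words and short sentences:\n\n{input}"
)

Formal = "Paraphrase the following text in a polite, professional, formal tone:\n\n{input}"

Expand = (
    "Paraphrase the following text and expand it with extra detail and explanation, "
    "so the result is noticeably longer than the original:\n\n{input}"
)

Shorten = (
    "Paraphrase the following text and condense it, keeping only the essential ideas, "
    "so the result is noticeably shorter than the original:\n\n{input}"
)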



Source link

13Jul

Speculative RAG By Google Research | by Cobus Greyling | Jul, 2024


Speculative RAG is a framework that uses a larger generalist language model to efficiently verify multiple RAG drafts produced in parallel by a smaller, specialised distilled language model.

Each draft is based on a distinct subset of retrieved documents, providing diverse perspectives and reducing input token counts per draft.

According to the research, this method enhances comprehension and mitigates position bias over long contexts. By delegating drafting to the smaller model and having the larger model perform a single verification pass, Speculative RAG accelerates the RAG process.

Experiments show that Speculative RAG achieves state-of-the-art performance with reduced latency, improving accuracy by up to 12.97% and cutting latency by 51% compared to traditional RAG systems.

This new RAG framework uses a smaller specialist RAG drafter to generate high-quality draft answers.

Each draft comes from a distinct subset of retrieved documents, offering diverse perspectives and reducing input token counts per draft.

The generalist LM works with the RAG drafter without needing additional tuning.

It verifies and integrates the most promising draft into the final answer, enhancing comprehension of each subset and mitigating the lost-in-the-middle phenomenon.

Google believes this method significantly accelerates RAG by having the smaller specialist LM handle drafting, while the larger generalist LM performs a single, unbiased verification pass over the drafts in parallel.

Extensive experiments on four free-form question-answering and closed-set generation benchmarks demonstrate the superior effectiveness and efficiency of this method.

  1. This study is a good example of how Small Language Models are being used in a larger framework that employs model orchestration.
  2. SLMs are leveraged for their reasoning capabilities, for which they have been specifically created.
  3. SLMs are ideal in this scenario, as they are not required to be knowledge-intensive for this implementation; relevant and contextual knowledge is injected at inference.
  4. The aim of this framework is to optimise token count and hence save cost.
  5. It reduces latency by 51% compared to conventional RAG systems.
  6. It enhances accuracy by up to 12.97%.
  7. It avoids fine-tuning of models.
  8. Multiple RAG drafts are produced in parallel by a smaller, specialised Language Model.
  9. This smaller, specialised RAG model excels at reasoning over retrieved documents and can rapidly produce accurate responses. This is reminiscent of SLMs such as Orca-2 and Phi-3, which were trained to have exceptional reasoning capabilities.
  10. The best results were achieved with Mistral 7B as the RAG drafter.
  11. And with Mixtral 8x7B as the verifier.
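To make the drafter/verifier split above concrete, here is a minimal Python sketch of the control flow. The draft_with_slm and score_with_llm helpers are hypothetical placeholders for calls to a small drafter (e.g. Mistral 7B) and a large verifier (e.g. Mixtral 8x7B); the paper's actual prompts, sampling strategy and scoring are not reproduced here.

import random

def draft_with_slm(question, docs):
    # Placeholder for the small, specialised RAG drafter (e.g. Mistral 7B).
    # A real implementation would prompt the drafter with the question plus this
    # subset of documents and return a draft answer and its supporting rationale.
    return {"answer": f"draft based on {len(docs)} docs", "rationale": "draft rationale"}

def score_with_llm(question, answer, rationale):
    # Placeholder for the single verification pass by the large generalist LM
    # (e.g. Mixtral 8x7B); a real verifier would return a model-based confidence.
    return float(len(rationale))

def speculative_rag(question, retrieved_docs, num_drafts=3, docs_per_draft=2):
    # 1. Split the retrieved documents into distinct subsets, one per draft, so
    #    each draft sees fewer input tokens and a different perspective.
    docs = list(retrieved_docs)
    random.shuffle(docs)
    subsets = [docs[i * docs_per_draft:(i + 1) * docs_per_draft] for i in range(num_drafts)]

    # 2. The small drafter produces one draft (answer + rationale) per subset;
    #    in production these calls would run in parallel.
    drafts = [draft_with_slm(question, subset) for subset in subsets]

    # 3. The large generalist LM scores each draft in a single verification pass,
    #    with no additional fine-tuning.
    scored = [(score_with_llm(question, d["answer"], d["rationale"]), d) for d in drafts]

    # 4. The highest-scoring draft becomes the final answer.
    _, best = max(scored, key=lambda pair: pair[0])
    return best["answer"]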



Source link

12Jul

Gower’s Distance for Mixed Categorical and Numerical Data | by Haden Pelletier | Jul, 2024


A distance measure for clustering mixed data

Most likely you have heard of Manhattan distance or Euclidean distance. These are two different metrics which provide information as to how distant (or different) two given data points are.

Manhattan and Euclidean distance graphed. Image by author

In a nutshell, Euclidean distance is the shortest (straight-line) distance from point A to point B. Manhattan distance sums the absolute differences between the x and y coordinates, measuring the distance as if the points sat on a grid where you could only move up, down, left, or right (not diagonally).

Distance metrics often underlie clustering algorithms, such as k-means clustering, which uses Euclidean distance. This makes sense, as in order to define clusters, you have to first know how similar or different 2 data points are (aka how distant they are from each other).

Calculating the distance between 2 points

To show this process in action, I will start with an example using Euclidean distance.
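The post's worked example is not reproduced in this excerpt, so here is a small illustration with made-up points; the coordinates are arbitrary, chosen only to show the two formulas side by side.

import numpy as np

# Two illustrative 2-D points (arbitrary values)
a = np.array([1.0, 2.0])
b = np.array([4.0, 6.0])

# Euclidean distance: straight-line ("as the crow flies") distance
euclidean = np.sqrt(np.sum((a - b) ** 2))   # sqrt(9 + 16) = 5.0

# Manhattan distance: sum of absolute coordinate differences (grid moves only)
manhattan = np.sum(np.abs(a - b))           # 3 + 4 = 7.0

print(euclidean, manhattan)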



Source link

11Jul

Moving From Natural Language Understanding To Mobile UI Understanding | by Cobus Greyling | Jul, 2024


As with conversations, context is of paramount importance. It is very hard to derive meaning from any conversation if there is not sufficient context. That is the underlying principle of RAG, to supply the LLM with context at inference.

Ferret-UI is a model designed to understand user interactions with a mobile screen.

The image below is quite self-explanatory as to how the mobile screen can be interrogated in natural language. Numerous use cases come to mind.

This solution can be seen as a conversational enablement of a mobile operating system. Alternatively, the information can be used to learn from user behaviour and give users a customised experience.

This is sometimes referred to as ambient orchestration: user behaviour can be learned, suggestions can be made by the mobile OS, and automation of user routines can become intelligent and truly orchestrated.



Source link

10Jul

Teaching Small Language Models to Reason | by Cobus Greyling | Jul, 2024


Chain-Of-Thought prompting at a foundational level is so successful that it gave rise to what some refer to as the Chain-Of-X phenomenon. Google Research explored how to use LLMs to generate CoT data for existing datasets and then fine-tune smaller Language Models on that data.

As most everyone knows, Chain-Of-Thought prompting improves the reasoning capabilities of large language models.

Google asserts that reasoning capabilities only emerge in models with at least tens of billions of parameters. This research from Google explores transferring these capabilities to smaller models via knowledge distillation.

They fine-tuned a student model using the Chain-Of-Thought outputs from a larger teacher model.

Researchers from Google found that this method improves task performance in arithmetic, common sense, and symbolic reasoning datasets.

Chain of thought (CoT) prompting teaches Language Models (LMs) to decompose a reasoning task into a series of intermediate steps.

It is demonstrated that this prompting significantly increases the task accuracy of large language models (LLMs) across common sense, symbolic and mathematical reasoning datasets.

However, the reasoning capabilities of smaller LMs do not improve with CoT prompting, mostly producing illogical CoT. Notably, CoT prompting even reduces the accuracy of models with less than 10 billion parameters.

Research attributes this to abilities, such as semantic understanding and symbolic mapping, only emerging at larger scale models.

Google Research proposes a two-step pipeline for CoT (Chain-Of-Thought) knowledge distillation.

Annotation with CoT Reasoning

  1. Use a teacher model, like PaLM 540B or GPT-3 175B, to annotate an existing supervised dataset with CoT reasoning.
  2. Perform few-shot prompting with 8 examples to generate CoTs, adapting prompts to provide the target answer after the question and before the example CoT. This helps correct small mistakes.
  3. Remove incorrect CoTs based on the target answer to ensure quality.

Fine-Tuning the Student Model

  1. Fine-Tune a student model using teacher forcing.
  2. Provide the question as input and the CoT and answer as the target.
  3. This training eliminates the need for prompting during fine-tuning; a minimal sketch of the full pipeline follows below.
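As a rough illustration of these two steps, here is a minimal sketch in Python. The teacher_generate_cot, answers_match and fine_tune helpers are hypothetical placeholders (no specific model API or training library is implied), and the few-shot prompt contents are assumptions; the sketch only shows the annotate-filter-fine-tune flow described above.

def teacher_generate_cot(question, target_answer, few_shot_examples):
    # Placeholder for a call to the teacher model (e.g. PaLM 540B or GPT-3 175B).
    # The few-shot prompt places the target answer right after the question and
    # before the example CoT, which helps the teacher avoid small mistakes.
    return f"Reasoning about '{question}' ... so the answer is {target_answer}."

def answers_match(cot, target_answer):
    # Placeholder quality filter: keep only CoTs that reach the known target answer.
    return str(target_answer) in cot

def fine_tune(student_model, training_examples):
    # Placeholder for standard supervised fine-tuning with teacher forcing.
    return student_model

def distill_cot(dataset, student_model, few_shot_examples):
    # Step 1: annotate the supervised dataset with teacher-generated CoTs and
    # discard any CoT whose reasoning does not reach the target answer.
    training_examples = []
    for question, target_answer in dataset:
        cot = teacher_generate_cot(question, target_answer, few_shot_examples)
        if answers_match(cot, target_answer):
            training_examples.append({
                "input": question,                            # question only as input
                "target": f"{cot}\nAnswer: {target_answer}",  # CoT + answer as target
            })

    # Step 2: fine-tune the student on (question -> CoT + answer) pairs, so no
    # CoT prompting is needed during fine-tuning.
    return fine_tune(student_model, training_examples)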

An overview of the proposed method is shown in a figure in the original post (see the source link below).



Source link
