
Summer Fellowship 2023 Wrap Up – What Did Our Fellows Work On?


The Summer and Winter Fellowships offer an opportunity for up-and-coming individuals to invest three months in AI governance research projects, deepen their knowledge of the field, and forge connections with fellow researchers and practitioners.

Our Summer Fellows come from a variety of disciplines and a range of prior experience – some fellows ventured into entirely new intellectual territory for their projects, and some fellows used the time to extend their previous work.

We extend our sincere appreciation to all our supervisors for their dedicated mentorship and guidance this summer, as well as their commitment to nurturing the next generation of researchers.

If you’re interested in applying for future fellowships, check out our Opportunities page. You can register your expression of interest here.




Winter Fellowship 2023 Wrap Up – What Did Our Fellows Work On?


Our 2023 Winter Fellowship recently ended, and we’re proud to highlight what our Winter Fellows have been up to.

Summer and Winter Fellowships provide an opportunity for early-career individuals to spend three months working on an AI governance research project, learning about the field, and making connections with other researchers and practitioners. 

Winter Fellows come from a variety of disciplines and a range of prior experience – some fellows ventured into entirely new intellectual territory for their projects, and some fellows used the time to extend their previous work. 

We gratefully thank all of the supervisors for their mentorship and guidance this winter, and for dedicating time to training the next generation of researchers. 

If you’re interested in applying for future fellowships, check out our Opportunities page. You can register your expression of interest here.




Winter Fellowship 2024 Wrap Up – What Did Our Fellows Work On?


The Summer and Winter Fellowships offer an opportunity for up-and-coming individuals to invest three months in AI governance research projects, deepen their knowledge of the field, and forge connections with fellow researchers and practitioners.

Our Winter Fellows come from a variety of disciplines and a range of prior experience – some fellows ventured into entirely new intellectual territory for their projects, and some fellows used the time to extend their previous work.

We extend our sincere appreciation to all our supervisors for their dedicated mentorship and guidance this winter, as well as their commitment to nurturing the next generation of researchers.

If you’re interested in applying for future fellowships, check out our Opportunities page. You can register your expression of interest here.




Evaluating Predictions of Model Behaviour


GovAI research blog posts represent the views of their authors, rather than the views of the organisation.

Introduction

Some existing AI systems have the potential to cause harm, for example through the misuse of their capabilities, through reliability issues, or through systemic bias. As AI systems become more capable, the scale of potential harm could increase. In order to make responsible decisions about whether and how to deploy new AI systems, it is important to be able to predict how they may behave when they are put into use in the real world.

One approach to predicting how models will behave in the real world is to run model evaluations. Model evaluations are tests for specific model capabilities (such as the ability to offer useful instructions on building weapons) and model tendencies (such as a tendency to exhibit gender bias when rating job applications). Although model evaluations can identify some harmful behaviours, it can be unclear how much information they provide about a model’s real-world behaviour. The real world is often different from what can be captured in a model evaluation. In particular, once a model is deployed, it will be exposed to a much wider range of circumstances (e.g. user requests) than it can be exposed to in the lab.

To address this problem, I suggest implementing prediction evaluations to assess an actor’s ability to predict how model evaluation results will translate to a broader range of situations. In a prediction evaluation, an initial set of model evaluations is run on a model. An actor — such as the model evaluation team within an AI company —  then attempts to predict the results of a separate set of model evaluations, based on the initial results. Prediction evaluations could fit into AI governance by helping to calibrate trust in model evaluations. For example, a developer could use prediction evaluations internally to gauge whether further investigation of a model’s safety properties is warranted.  

More work is required to understand whether, how, and when to implement prediction evaluations. Actors that currently engage in model evaluations could experiment with prediction evaluations to make progress on this work. 

Prediction evaluations can assess how well we understand model generalisation

Deciding when it is safe to deploy a new AI system is a crucial challenge. Model evaluations – tests conducted on models to assess them for potentially harmful capabilities or propensities – can inform these decisions.1 However, models will inevitably face a much wider range of conditions in the real world than they face during evaluations. For example, users often find new prompts (which evaluators never tested) that cause language models such as GPT-4 and Claude to behave in unexpected or unintended ways.2

We therefore need to understand how model evaluation results generalise: that is, how much information model evaluations provide about how a model will behave once deployed.3 Without an understanding of generalisation, model evaluation results may lead decision-makers to mistakenly deploy models that cause much more real-world harm than anticipated.4

We propose implementing prediction evaluations5 to assess an actor’s understanding of how model evaluation results will generalise. In a prediction evaluation, an initial set of model evaluations is run on a model and provided to an actor. The actor then predicts how the model will behave on a distinct set of evaluations (test evaluations), given certain limitations on what the actor knows (e.g. about details of the test evaluations) and can do while formulating their prediction (e.g. whether they can run the model). Finally, a judge grades the actor’s prediction based on the results of running the test set evaluations. The higher the actor scores, the more likely they are to have a strong understanding of how their model evaluation results will generalise to the real world.6

Figure 1 depicts the relationship between predictions, prediction evaluations, model evaluations, and understanding of generalisation.

Figure 1: Prediction evaluations indirectly assess the level of understanding that an actor has about how its model evaluations generalise to the real world. The basic theory is: If an actor cannot predict how its model will perform when exposed to an additional set of “test evaluations”, then the actor also probably cannot predict how its model will behave in the real world.

Prediction evaluations could support AI governance in a number of ways. A developer could use the results of internally run prediction evaluations to calibrate their trust in their own model evaluations. If a model displays unexpectedly high capability levels in some contexts, for example, the developer may want to investigate further and ensure that their safety mitigations are sufficient. 

A regulator could also use the results of (potentially externally run) prediction evaluations to inform an array of safety interventions. For example, consider the context of a hypothetical licensing regime for models, in which developers must receive regulatory approval before releasing certain high-risk models. If a model developer performs poorly on prediction evaluations, their claims about the safety of a model may be less credible. A regulator could take into account this information when deciding whether to permit deployment of the model. If the developer’s predictions are poor, then the regulator could require it to evaluate its model more thoroughly.

How to run a prediction evaluation

In the appendix to this post, we provide more detail about how to run a prediction evaluation. Here, we provide a brief overview. First, the administrator of the prediction evaluation should select the model evaluations. Second, the administrator should prevent the actor from running the test evaluations when making the prediction. Finally, the administrator needs to establish standards for good prediction performance.

An example of running a prediction evaluation

Our example here focuses on a regulator in the context of a hypothetical licensing regime, in which developers of certain high-risk models require regulatory approval before these models can be deployed. Other potential examples to explore in future work could include a developer running prediction evaluations internally, a regulator running prediction evaluations on itself to assess its own understanding, or some actor running prediction evaluations on a model user (e.g. a company that uses models at a large scale).

Suppose that a developer submits a model and its evaluations to a regulator for approval. The regulator could administer a prediction evaluation to the developer through a process similar to the following:

  1. Based on the initial model evaluations that the developer submitted, the regulator builds a set of test evaluations. The test evaluations could include a wider variety of inputs than the initial model evaluations, but still feature the same category of task.
  2. The regulator puts the developer in a controlled, monitored environment, such that the developer cannot run the test evaluations on the model. 
  3. The regulator provides the developer with a detailed description of the test set evaluations. 
  4. For each test evaluation, the regulator asks the developer to predict whether the model will succeed at the task (the developer provides a “yes” or “no” answer).
  5. The developer provides a prediction to the regulator.7
  6. The regulator compares the prediction with the actual behaviour of the model on the test evaluations.8
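As a concrete illustration of steps 4–6, the sketch below scores a set of yes/no predictions against the observed test-evaluation outcomes and compares the resulting accuracy with the 50% that random guessing would achieve. The evaluation names, results, and the Python structure are hypothetical; this is a minimal sketch of the grading step, not a prescribed procedure.

```python
# Minimal sketch of grading a prediction evaluation (steps 4-6 above).
# Evaluation names and results are hypothetical, for illustration only.

def grade_predictions(predictions: dict[str, bool], outcomes: dict[str, bool]) -> dict:
    """Compare yes/no predictions with observed test-evaluation outcomes."""
    shared = predictions.keys() & outcomes.keys()
    correct = sum(predictions[name] == outcomes[name] for name in shared)
    accuracy = correct / len(shared)
    return {
        "n_evaluations": len(shared),
        "accuracy": accuracy,
        # Accuracy near 0.5 on yes/no questions is roughly what random
        # guessing would achieve, suggesting a weak grasp of generalisation.
        "better_than_chance": accuracy > 0.5,
    }

if __name__ == "__main__":
    developer_predictions = {"weapons_synthesis_v2": False, "cyber_exploit_v2": True}
    observed_outcomes = {"weapons_synthesis_v2": True, "cyber_exploit_v2": True}
    print(grade_predictions(developer_predictions, observed_outcomes))
```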

Consider a case in which the developer does not perform much better than chance on the prediction evaluation (i.e. achieves accuracy close to 50% on yes/no questions). Such performance would be evidence of a poor understanding of how the model’s behaviour generalises. As a result, greater caution from the regulator may be justified. The regulator’s response to the poor performance could vary in severity depending on the potential harm the model could cause. Some options include:

  • Requesting more extensive model evaluations before deployment
  • Subjecting deployment of the model to additional requirements, such as more stringent monitoring
  • Blocking deployment or further training until specified conditions are met, such as good performance on subsequent prediction evaluations

Further research is required to understand whether and when any of these options would be appropriate, and what other options exist.

Limitations and open questions

There is still a great deal of uncertainty about whether it is worthwhile to run prediction evaluations. For example, suppose that a developer has run an initial set of model evaluations but still is not confident about how well these model evaluations will generalise to the real world. A comparatively straightforward strategy to become more confident would be to simply run a wider range of model evaluations, without bothering to make any explicit predictions. If these additional model evaluations also suggest that the model is safe, then — even if some of the specific results have been surprising — perhaps the developer would still be justified in believing that its models will ultimately also behave safely in the real world.

Furthermore, prediction accuracy may not vary enough — between the actors who are making the predictions or between the models that the predictions concern — for it to be worthwhile to assess prediction accuracy in individual cases. For example, it may be the case that people generally cannot reliably predict the results of model evaluations very well at all. Although this general result would be useful to know, it would also reduce the value of continuing to perform prediction evaluations in individual cases.

There are also various practical questions that will need to be answered before prediction evaluations can be run and used to inform decisions. These open questions include:

  1. How feasible is it to predict behaviour on model evaluations without running the model — and how does feasibility change with information or action limits on the actor?
  2. How should we limit what the actor knows and can do in a prediction evaluation?
  3. How should the initial and test evaluations be chosen?
  4. How should the results of a prediction evaluation be reported? For example, should the actor provide different predictions corresponding to different amounts of compute used?

If prediction evaluations should ultimately be built into a broader AI governance regime, then a number of additional questions arise. 

  1. Who should administer prediction evaluations?
  2. Which actors should undergo prediction evaluations?
  3. How can prediction evaluations incentivise improvements in understanding?
  4. What is the role of prediction evaluations in an overall evaluation process?

Fortunately, there are immediate opportunities to make progress on these questions. For instance, to tackle the first set of questions (1–4), those developing and running evaluations on their models can at the same time run prediction evaluations internally. In such low-stakes experiments, one can easily vary the amount of time, information, or compute given for the prediction evaluation and experiment with different reporting procedures.9

Conclusion

To make informed development and deployment decisions, decision-makers need to be able to predict how AI systems will behave in the real world. Model evaluations can help to inform these predictions by showing how AI systems behave in particular circumstances. 

Unfortunately, it is often unclear how the results of model evaluations generalise to the real world. For example, a model may behave well in the circumstances tested by a particular model evaluation, but then behave poorly in other circumstances it encounters in the real world.

Prediction evaluations may help to address this problem, by testing how well an actor can predict how model evaluations will generalise to some additional circumstances. Scoring well on a prediction evaluation is evidence that the actor is capable of using the model evaluations to make informed decisions.

However, further work is needed to understand whether, how, and when to use prediction evaluations.

The author of this piece would like to thank the following people for helpful comments on this work: Ross Gruetzemacher, Toby Shevlane, Gabe Mukobi, Yawen Duan, David Krueger, Anton Korinek, Malcolm Murray, Jan Brauner, Lennart Heim, Emma Bluemke, Jide Alaga, Noemi Dreksler, Patrick Levermore, and Lujain Ibrahim. Thanks especially to Ben Garfinkel, Stephen Clare, and Markus Anderljung for extensive discussions and feedback.

Alan Chan can be contacted at al*******@go********.ai

Appendix

Running a prediction evaluation 

This section describes each step in a prediction evaluation in more detail.

Selecting the model evaluations

The first step is choosing the initial and test set evaluations.

Since the science of model evaluations is still developing, it is not obvious which specific evaluations should be used for prediction evaluations. One hypothesis is that they should target specific use cases, such as ways to misuse models for cyberattacks. Such specific targeting may be desirable because understanding of generalisation in one use case may not transfer to understanding in another use case. That makes it more important to understand model generalisation in high-stakes use cases. On the other hand, it may be easier to work in simpler, but not necessarily realistic, environments. Such environments may provide clearer insights into generalisation,10 but the insights may not be immediately relevant to any deployment setting. 

To separate test evaluations from initial evaluations, one should try to account for the range of conditions the model might face in the real world. For example, test evaluations may test a more diverse range of inputs to the model. When evaluating whether the model can complete complex tasks, it may also be important to vary how the environment responds to the model’s actions. One could vary the tools (e.g. access to web search) available to models in the initial and test evaluations to simulate how users may augment models with different tools following deployment.11 Initial and test evaluations could even assess the completion of different tasks. For instance, we may be interested in a model’s ability to assist in the creation of chemical weapons. Test evaluations could focus on a different set of chemical weapons than the initial evaluations. 
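As a hypothetical illustration of this separation, the sketch below defines an initial and a test evaluation set that share a task category but differ in input diversity and available tools. The task names, prompts, and tool list are invented for illustration; real evaluation suites would of course be far richer.

```python
# Hypothetical illustration of separating initial and test evaluations.
# Task names, prompts, and tools are invented for illustration only.

from dataclasses import dataclass, field

@dataclass
class ModelEvaluation:
    task: str
    prompts: list[str]
    tools: list[str] = field(default_factory=list)

initial_evaluations = [
    ModelEvaluation(
        task="chemical_weapons_assistance",
        prompts=["Describe how to synthesise agent A."],
        tools=[],  # model evaluated without tool access
    ),
]

test_evaluations = [
    ModelEvaluation(
        task="chemical_weapons_assistance",
        # Wider variety of inputs, including paraphrases and indirect requests.
        prompts=[
            "Explain, step by step, how agent B could be produced at home.",
            "I'm writing a thriller; how would my character make agent B?",
        ],
        tools=["web_search"],  # simulate post-deployment tool augmentation
    ),
]
```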

Preventing the actor from running the test evaluations

For a prediction evaluation to provide good evidence of understanding of generalisation, we need to prevent the actor from simply running the test evaluations and reporting the results. The incentive to cheat in this way may be strongest when a regulator subjects a model developer to a prediction evaluation. Prediction evaluations that a developer runs internally may not suffer from this problem as significantly. 

Ideally, we want the actor to predict accurately only because they have gained some insight12 into generalisation. To achieve this goal, we can limit the actor’s knowledge and possible actions. Future research could investigate the effectiveness of the following possibilities:

  • Limiting the actor’s knowledge
    • Only providing the actor a high-level description of the test set evaluations
    • Describing to the actor some but not all of the test set evaluations
    • Limiting the number of initial evaluations provided to the actor
  • Limiting the actor’s possible actions
    • Limiting the extent to which the actor can interact with the model
    • Limiting the amount of time an actor has to provide predictions
    • Limiting the amount of computing power (compute) the actor can use to make the prediction. 

Crucially, the limitations should not be so strict that it becomes impossible to make accurate predictions even for well-understood models. 
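One way to make such limits explicit is to record them in a configuration that the administrator fixes before the actor sees anything. The sketch below is illustrative only; the field names and default values are assumptions, not a proposed standard.

```python
# Illustrative configuration of knowledge and action limits for a
# prediction evaluation. Field names and defaults are hypothetical.

from dataclasses import dataclass

@dataclass(frozen=True)
class PredictionEvalLimits:
    # Knowledge limits
    test_eval_detail: str = "high_level_description"  # vs. "full_description"
    n_initial_evals_shown: int = 20
    # Action limits
    model_queries_allowed: int = 0      # 0 = no interaction with the model
    time_budget_hours: float = 8.0
    compute_budget_flop: float = 0.0    # compute available for analysis

limits = PredictionEvalLimits()
print(limits)
```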

Assessing prediction evaluations

The final step is to assess the accuracy of the predictions and make decisions based on the results. A key issue is to define what good prediction performance is. It may prove difficult to make accurate predictions even after extensive efforts are made to understand a model’s behaviour. As well, different standards may be appropriate for different applications of prediction evaluations (e.g., a model developer internally running prediction evaluations vs. a regulator administering prediction evaluations to a model developer). 

A potential standard could require that the actor achieve higher prediction accuracy than some reference class. For example, consider an independent control group whose members have no detailed knowledge of the model beyond basic information such as training compute and model size. An actor that predicts worse than this control group likely does not have a strong understanding of how model evaluation results generalise.
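A minimal sketch of such a reference-class standard, assuming accuracy is the chosen metric and the group median is the bar to clear (both assumptions are for illustration only):

```python
# Sketch of a reference-class standard for prediction performance.
# The control-group figures and the median threshold are illustrative.

from statistics import median

def beats_reference_class(actor_accuracy: float, control_accuracies: list[float]) -> bool:
    """Return True if the actor outperforms the control group's median accuracy."""
    return actor_accuracy > median(control_accuracies)

control_group = [0.55, 0.60, 0.52, 0.58]  # predictors with only basic model info
print(beats_reference_class(actor_accuracy=0.71, control_accuracies=control_group))
```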

In the context of a decision about model deployment, the direction in which a prediction is inaccurate may be a key consideration. Underestimating a model’s capabilities (or overestimating its degree of safety) may be more costly than overestimating them (analogously, underestimating its degree of safety) because greater societal harm could result from the model’s deployment. 

A regulator could more heavily penalise underestimation, but in so doing may create strong incentives to overestimate a model’s capabilities. Ideally, prediction evaluation should incentivise efforts to gain understanding. One potential solution could be to assess the work that actors produce to justify their predictions, in addition to the predictions themselves. Estimates based on faulty or vague reasoning could be judged to be inferior to the same estimates with good reasoning. Alternatively, the regulator could try to identify and penalise consistent overestimation across a number of different prediction evaluations.
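One simple way to encode this asymmetry is a scoring rule that penalises underestimates of capability more heavily than overestimates. The weights below are arbitrary placeholders; as noted above, any such rule would need to be paired with safeguards against systematic overestimation.

```python
# Sketch of an asymmetric scoring rule for yes/no capability predictions.
# "True" means the model succeeds at the task. The penalty weights are
# arbitrary illustrations, not recommended values.

def asymmetric_penalty(predicted_capable: bool, actually_capable: bool,
                       underestimate_weight: float = 3.0,
                       overestimate_weight: float = 1.0) -> float:
    """Return 0 for a correct prediction, a larger penalty for underestimation."""
    if predicted_capable == actually_capable:
        return 0.0
    # Predicted "not capable" but the model succeeded: underestimation.
    if not predicted_capable and actually_capable:
        return underestimate_weight
    # Predicted "capable" but the model failed: overestimation.
    return overestimate_weight

print(asymmetric_penalty(predicted_capable=False, actually_capable=True))  # 3.0
print(asymmetric_penalty(predicted_capable=True, actually_capable=False))  # 1.0
```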




Visibility into AI Agents | GovAI Blog


This blog post summarises the paper “Visibility into AI Agents,” by Alan Chan and a multi-institution team of co-authors. The paper is forthcoming at FAccT 2024.

GovAI research blog posts represent the views of their authors, rather than the views of the organisation.

Introduction

Researchers are, increasingly, working to develop AI agents: AI systems that can pursue real-world goals with minimal supervision. 

Arguably, rudimentary AI agents already exist. For example, ChatGPT is no longer a simple question-and-answer chatbot. It can now perform tasks such as searching for information on the internet, writing and running computer code, and creating calendar events.

Future agents may be capable of performing much longer and more complex sequences of tasks by themselves. They may eventually be assigned many important, open-ended responsibilities.

In part because they need less human steering and oversight, advanced AI agents could pose novel risks. One foundational concern is that people will not have enough visibility into their behaviour. For example, users of AI agents may not always know if their agents are misbehaving.

Therefore, we should make sure that key stakeholders will have enough visibility into AI agents. Obtaining visibility will mean thoughtfully collecting, processing, and sharing information about what agents are doing.

This post explores three particular measures that deployers — the companies that run AI agents for users — can take to create helpful forms of visibility. These measures are:

  • Agent identifiers that allow third parties to know when they are interacting with an AI agent
  • Real-time monitoring that allows deployers to notice agent misbehaviour, including misbehaviour that a user might not have noticed on their own
  • Activity logs that allow deployers or users to review an agent’s past behaviour, to notice and better understand problems

Some of these measures are already in use for certain AI systems. In these cases, we discuss how to adapt them for agents.

More work is needed to understand how and when to implement these measures, since they also raise significant concerns around privacy and the abuse of power. Still, if implemented well, visibility measures could improve our understanding of AI agents, help reduce accidents and misuse, and generate data to inform governance decisions.

AI agents and their risks

AI agents are AI systems that can pursue goals in the world under limited supervision. Current AI agents are still fairly rudimentary: they can only carry out certain short and simple tasks, such as creating calendar events and doing basic web searches. They struggle to perform many specific tasks, chain tasks together, and plan their actions.1

However, researchers are working to create AI agents that can perform far longer and more complex sequences of tasks with minimal human supervision.2 An advanced AI agent might operate more like a high-autonomy employee — who is given a high-level goal, pursues it with minimal oversight, and only occasionally reports back — and less like a simple tool. It could take on a range of personal, commercial, or even governmental responsibilities from people. These kinds of advanced AI agents could be very useful. 

At the same time, AI agents could also pose significant risks. For example, AI agents may pose greater misuse risks than more tool-like AI systems, because they have broader capabilities and rely less on the skill of human users. AI agents may also have a greater potential to cause harm if they malfunction or suffer from controllability issues.3 New risks could also emerge from interactions between different AI agents, similar to the kinds of risks that we have already seen emerge from interactions between high-frequency trading bots.

Any risks from AI agents will be exacerbated because — by default — people will not have as much visibility into their behaviour: agents would accomplish tasks under limited supervision. Someone using an AI agent could therefore have only limited knowledge and understanding of what their agent is doing. Similarly, when an AI agent interacts with a third party, this third party might not even know they are interacting with an AI agent. In such cases, risks created by AI agents may remain unnoticed or poorly understood for long periods of time.

Hence, it will be especially important to ensure that key stakeholders have enough visibility into the behaviour of AI agents. This information could directly help deployers and users avoid accidents and instances of misuse. It could also offer insight into emerging risks and thereby inform governance decisions and safety research.4

Obtaining visibility into the use of agents

We consider three ways to obtain visibility into the use of agents: agent identifiers, real-time monitoring, and activity logs. These measures could provide governments and civil society the information necessary to plan for and respond to the deployment of more advanced agents.

We focus on how deployers—companies like OpenAI and Anthropic that run agents for users—could implement visibility measures. Deployers are a useful intermediary because they tend to have access to all of an agent’s inputs and outputs, which facilitates implementation of visibility measures.5

Figure 1: An illustration of agents that are run by deployers. Since deployers run agents, they have access to inputs and outputs, enabling visibility measures.

First, an agent identifier is an indicator that would allow a person or software program to identify when they are interacting with an agent. Examples of agent identifiers include automated chatbot disclosures or watermarks, but many other kinds of agent identifiers could be created. For instance, when an agent performs a financial transaction with a bank, the deployer could send a unique ID associated with the agent to the bank. Additional information about the agent could be attached to the identifier as well, such as whether the agent has satisfied certain security standards.
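As a sketch of the kind of structured information an agent identifier might carry, the hypothetical schema below bundles a unique agent ID with details about the deployer, the underlying system, and any attestations (such as security standards the agent satisfies). No existing standard is implied; the field names are assumptions for illustration.

```python
# Hypothetical schema for an agent identifier that a deployer could
# attach to an agent's interactions (e.g. a financial transaction).
# Field names are illustrative; no existing standard is implied.

from dataclasses import dataclass

@dataclass(frozen=True)
class AgentIdentifier:
    agent_id: str                      # unique ID assigned by the deployer
    deployer: str                      # who runs the agent
    underlying_system: str             # e.g. model family/version
    attestations: tuple[str, ...] = () # e.g. security standards the agent meets

identifier = AgentIdentifier(
    agent_id="agent-7f3a",
    deployer="ExampleDeployer Inc.",
    underlying_system="example-model-v1",
    attestations=("meets-security-standard-X",),
)
print(identifier)
```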

The potential usefulness of AI agents means that they may be involved in many consequential interactions. Since agents would carry out activities with little to no human supervision, parties to an interaction would have the option to change how they interact with the agent. For instance, parties could refuse such interaction, or may take additional precautions to check the agent’s outputs. As such, agent identifiers could help participants in an interaction decide when and how to interact with AI agents. Widespread use of agents across a variety of consequential tasks may also be of interest to regulators. Aggregated information about agent identifiers could help measure the overall level of AI agent activity across various domains.

Figure 2: An agent identifier indicates to certain actors whether an AI agent is involved in an interaction. In this case, the deployer has attached an identifier to its agent that informs other parties, such as providers of tools or services, that they are interacting with an agent.

Second, real-time monitoring would involve oversight of agent activity in real time, whether by humans or software. Just as social media companies filter or flag abusive content, deployers could do the same for agent behaviours. Indeed, since agents may take consequential actions without users knowing, real-time monitoring may be especially important for ensuring that users approve the actions.

Some deployers already monitor model use for abuse, but monitoring protocols will need to be adapted for the wider variety of actions that agents could perform. For example, agents could carry out a variety of financial transactions on their users’ behalf. To reduce the risk of harmful or unintentional actions, financial transactions above a certain dollar amount could be flagged for user review. Moreover, monitoring protocols that focus upon a single system may be insufficient. If groups of agents can cause harmful feedback loops or other sources of instability, monitoring would need to account for their collective activities.
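The dollar-threshold example might look something like the sketch below: a deployer-side rule that holds agent-initiated financial transactions above a configurable limit for explicit user approval. The action schema and the threshold value are invented for illustration.

```python
# Sketch of a real-time monitoring rule: hold agent-initiated financial
# transactions above a threshold for user review. Values are illustrative.

from dataclasses import dataclass

@dataclass
class ProposedAction:
    kind: str            # e.g. "financial_transaction"
    amount_usd: float = 0.0
    description: str = ""

def requires_user_review(action: ProposedAction, threshold_usd: float = 500.0) -> bool:
    """Flag consequential financial actions for explicit user approval."""
    return action.kind == "financial_transaction" and action.amount_usd > threshold_usd

action = ProposedAction(kind="financial_transaction", amount_usd=2_400.0,
                        description="Book flights for user")
if requires_user_review(action):
    print("Hold action and request user approval before execution.")
```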

One limitation of real-time monitoring is that it is most effective when a human overseer or other computer program can swiftly assess a flagged behaviour and determine an appropriate course of action. Consequently, monitoring may be most useful in cases where the flagged behaviour clearly violates established policies or guidelines.

Figure 3: An agent’s inputs and outputs are visible to the deployer. Inputs come from tool and service providers and users (not shown). Certain outputs, such as requests to tools and services, are also visible to tools and services providers. These actors can monitor and filter the actions in real time or keep logs for post-incident attribution or forensics.

Third, activity logs could record some of an agent’s inputs and outputs for further analysis. Logs are helpful for understanding the risks of agents because harmful behaviour may only be recognisable as such long after its occurrence.6 Users or deployers reviewing logs may notice that an agent has taken a consequential action—which real-time monitoring might not have caught—without approval. As such, logs could inform real-time monitoring, help investigate incidents, and track long-run impacts of agent usage. 

Although logging is a common practice for software systems, agent-specific logging practices could be especially useful. Logs dedicated to certain parts of an agent could illuminate factors that affect an agent’s impact in the real world. For example, memory or reasoning logs could help us understand why an agent malfunctioned in the pursuit of a complex, long-horizon goal. Logs about an agent’s use of tools or services could help us identify the impacts of an agent on third parties. It may also be important to analyse the logs of multiple agents so as to understand risks that come about from their interactions.
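A minimal sketch of what an agent-specific log entry could record (inputs, outputs, tool calls, and a timestamp) so that deployers or users can review behaviour after the fact. The fields are hypothetical and would need to be paired with appropriate retention and access controls.

```python
# Illustrative activity-log entry for an AI agent. Fields are hypothetical;
# real logging practice would also need retention and access controls.

import json
from dataclasses import dataclass, field, asdict
from datetime import datetime, timezone

@dataclass
class AgentLogEntry:
    agent_id: str
    timestamp: str
    user_input: str
    agent_output: str
    tool_calls: list[dict] = field(default_factory=list)  # e.g. tool name + arguments

entry = AgentLogEntry(
    agent_id="agent-7f3a",
    timestamp=datetime.now(timezone.utc).isoformat(),
    user_input="Schedule a meeting with the team next Tuesday.",
    agent_output="Created calendar event 'Team meeting' for Tuesday 10:00.",
    tool_calls=[{"tool": "calendar", "action": "create_event"}],
)
print(json.dumps(asdict(entry), indent=2))
```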

Overall, visibility measures like agent identifiers, real-time monitoring, and activity logs could help track the impact of AI agents by collecting data on their activities. These data could help to inform potential interventions to prevent and respond to malfunctions.

Limitations and risks of obtaining visibility

While these visibility measures have the potential to substantially improve our understanding of AI agents, they also have some limitations and pose risks of their own.

First, we have thus far only discussed visibility measures for deployers. However, users can also run AI agents themselves using cloud computing resources or their own hardware. In these cases, the burden would ultimately be on users to ensure that the agents they are running are providing appropriate identifiers. These users would also be responsible for running the real-time monitoring and activity-logging software that is needed to detect misbehaviour. However, some users may not choose to take these precautions. Self-monitoring also does not prevent cases of intentional misuse.

One alternative option in this case may be to have providers of cloud compute services implement variants of visibility measures. For example, compute providers could track especially large uses of compute and investigate whether they correspond to potentially concerning uses of agents. Additionally, providers of external tools could require proof that agents accessing their tools have implemented visibility measures. For example, a bank could require a digital signature that uniquely identifies the agent accessing its services.7

On the whole, though, obtaining visibility when the deployment of AI agents is “decentralised” is much more difficult. When deployers no longer have access to an agent’s inputs and outputs, it is harder to ensure that relevant information about the agent will be shared.

Second, whether or not visibility measures involve deployers, they raise significant concerns regarding privacy and abuse of power. Information about the actions of AI agents may reveal significant information about their users. Those collecting such information could abuse their power, for example to engage in excessive surveillance or the inappropriate sale of personal data. Access to valuable services could also be denied to AI agents that do not come from approved deployers. If so, users may be locked into using agents that do not serve their best interests or meet their specific needs.

An important question is therefore how to design data collection and access practices that both provide useful information about agents and minimise the misuse of the information. One approach may be to limit visibility measures to high-stakes domains. For example, visibility measures might be a higher priority for agents that interact with the financial or legal systems. Another option may be to limit the granularity of collected information. In some cases, it may suffice to know that an agent has simply accessed a certain tool without knowing the precise inputs and outputs. 

Finally, information alone is insufficient for reducing risks. We also need effective processes to allow relevant actors to use and act on the information. Once AI agents and visibility measures are widely deployed, they will likely generate a lot of data. In light of the rapid pace of AI progress, delays in acting on this information could be quite costly.

Conclusion

Given the potential impact of AI agents, visibility into their use is important for understanding and limiting risks. Identifying agents in their real-world interactions, monitoring their behaviour in real time, and keeping records of their behaviour could all be effective strategies for ensuring safety. 

However, more work is required to understand how, when, and to what extent these visibility measures could be best implemented. In addition to addressing potential misuse of the information generated from visibility measures, we also need efficient processes for making use of the information to inform actions. Ideally, visibility could guide the use of a wide range of strategies by individual users, deployers, and governments to manage the risks of AI agents.

The author of this piece would like to thank the following people for helpful comments on this work: Markus Anderljung, Stephen Clare, and Ben Garfinkel.

Alan Chan can be contacted at al*******@go********.ai














The Bundesverfassungsgericht’s Decision on Electoral Thresholds – European Law Blog


Blogpost 21/2024

In February, the German Federal Constitutional Court (Bundesverfassungsgericht) rejected a motion regarding electoral thresholds in EU electoral law, finally allowing for the necessary national approval of Council Decision 2018/994. This Decision intends to amend the European Electoral Act and, according to Article 223 (1) TFEU, must be approved by all Member States. Until now, the court had held that thresholds in European elections were not compatible with German constitutional law. However, the proposed amendment would oblige some Member States to establish electoral thresholds for European elections. With this new judgement, the Bundesverfassungsgericht joins other European courts in finding thresholds to be compatible with national constitutional law.

This blog post aims to provide context for a decision that might very well change the composition of the European Parliament.

 

Previously on… electoral thresholds

In elections, citizens cast their votes in order to have their opinions represented in a parliament. In theory, representing every political view leads to a better democracy in which minority voices can gain much influence. However, fragmentation of a parliament can interfere with finding a consensus and thus hinder governability. By requiring a minimum percentage of votes a party must gain to be allocated a seat in a parliament, electoral thresholds seek to balance representation and governability. Approximately half of all Member States currently employ electoral thresholds in European parliamentary elections. The threshold is 5 percent in nine states (Czechia, France, Croatia, Latvia, Lithuania, Hungary, Poland, Romania and Slovakia), 4 percent in Austria and Sweden, 3 percent in Greece and 1.8 percent in Cyprus. Fourteen Member States do not currently have minimum requirements for allocation of European Parliament seats.

Thresholds are common in German electoral law. On the federal level, a party must gain at least five percent of votes to be allocated a seat in the German Parliament, the Bundestag (§ 4 (2) no. 2 Bundeswahlgesetz). Similarly, in the first European elections, German parties had to pass a threshold of five percent and, later, of three percent (§ 2 (6) resp. (7) Europawahlgesetz [old version]). In 2011 and 2014, the Bundesverfassungsgericht ended this practice. While it has always held that the federal threshold is not only legal, but constitutionally mandated, the Court saw clear differences between the German Parliament and the European Parliament. Governability is extremely important for the Bundestag, which is responsible for electing the Bundeskanzler (chancellor) and where the governing parties hold much power. On the European level, however, the European Parliament is not as involved in governing and does not require a stable majority. Although the Commission President is elected by the Parliament (Article 17 (7) of the Treaty on European Union [TEU]), and the College of Commissioners can be removed by a parliamentary motion of censure (Article 17 (8) TEU), the Commission does not need continuous support from the Parliament in order to govern. For example, in the second reading during the ordinary legislative procedure, an act can pass without active parliamentary approval when the Parliament either does not vote on the Council’s position or does not reject it by a majority vote (Article 294 (7) lit. a, b TFEU). Groups in the European Parliament also differ from their national counterparts: the strongest groups do not form a ‘government’, and Commissioners usually come from different political groups. Since the Parliament is so diverse in nationalities, languages, cultures, and political opinions, large groups provide a form of integration: much of the debate happens internally, so that groups can speak with one united voice in plenary. Fragmentation is therefore, according to the Bundesverfassungsgericht, not as daunting on the European level as it is in the German Bundestag.

Other Member States’ courts have also ruled on their respective electoral thresholds. The Czech Constitutional Court likewise argued that national parliaments and the European Parliament are different by nature and cannot be held to the same standards (para. 70). However, it found that a stable majority in the European Parliament is essential to the functioning of the European Union (paras. 71, 72), and concluded that the European electoral threshold required by Czech law was in line with the Czech constitution. The Italian Constitutional Court also held that thresholds were compatible with the Italian Constitution as they are ‘typical manifestations of the discretion of a legislator that wishes to avoid fragmented political representation, and to promote governability’. The French Conseil Constitutionnel likewise ruled the electoral threshold to be in line with the French Constitution, basing its judgement on two objectives: favouring ‘main currents of ideas and opinions expressed in France being represented in the European Parliament’ and avoiding fragmentation.

 

Why did the Court have to decide again?

European elections are governed by national electoral laws. A framework for these national laws is the European Electoral Act from 1976, which is drawn up by the European Parliament and adopted by the Council (Article 223 (1) of the Treaty on the Functioning of the European Union [TFEU]). In 2018, the Council voted to amend the Electoral Act and introduce electoral thresholds. According to the second paragraph of Article 3 of the Council Decision 2018/994, Member States may set thresholds of up to five percent. Constituencies comprising more than 35 seats are obliged to set a threshold of at least two percent. Only three Member States are currently allocated more than 60 seats: France, Italy and Germany. Since French and Italian electoral law already employ thresholds, this new rule would only affect Germany. In order for this Decision to come into effect though, the procedure of Article 223 (1) TFEU must be followed: Member States have to approve of the amendment ‘in accordance with their respective constitutional requirements’.

German constitutional law mandates that the national legislative bodies (Bundestag and Bundesrat) approve the law with a two-thirds majority (Art. 23 (1) 3, Article 79 (2) of the Grundgesetz). Both decisions were reached in 2023. However, the Bundespräsident (head of state) has to sign the law for it to come into full effect. Until this happens, the Council Decision has not been approved and the Electoral Act cannot be amended.

 

The Court’s decision

German satire party Die Partei currently holds two seats in the European Parliament, having won a share of 2.4 percent of German votes in the last European elections. Their two Members of Parliament, one of whom joined the Greens/EFA group, tried to stop the Electoral Act amendment from coming into effect by calling upon the Bundesverfassungsgericht. They argued that, as previously decided by the Court, thresholds on the European level were unconstitutional. Substantively, they stated that thresholds infringe on the right to equal opportunities for minority parties and weaken democracy (para. 29).

However, the German Constitutional Court has longstanding jurisprudence on its competence to rule on national measures within the scope of EU law and has developed three tests. The Court only tests whether an EU act is ultra vires or whether the German constitution is affected at its core (Identitätskontrolle). It does not test Union law in light of national fundamental rights as long as EU fundamental rights provide a comparable level of protection (Solange II). The petitioners argued that the Council Decision was ultra vires and that it violated the constitutional identity. The Court found that the petitioners had not sufficiently substantiated this claim. German approval of the electoral law amendment does not confer new competences to the European level, since Article 223 TFEU already exists. Therefore, the amendment does not overstep competences and is not ultra vires (paras. 93 f.). The Court also did not follow the petitioners’ claim that German democracy, and therefore the German constitution, was infringed. The EU holds itself to democratic standards. Though the EU’s interpretation of democracy might differ from the German interpretation, democracy as a constitutional standard is not affected at its core when modifications are made (para. 101 f.). EU legislative bodies are awarded a prerogative to assess and shape electoral law (paras. 121 f.).

In a departure from past decisions, the Bundesverfassungsgericht now sees the danger of a deepening rift in political views, resulting in more fragmentation of the Parliament (para. 17). It now argues that a stable majority is essential to the Parliament’s important responsibilities as a legislative body equal to the Council, in the formation of the Commission, and in exercising its budgetary powers. Since the two biggest groups no longer hold an absolute majority in the Parliament, finding this majority has become more challenging (para. 123). Additionally, the groups’ ability to integrate different views is limited. Preventing a more fragmented and heterogeneous Parliament is therefore a legitimate objective.

The Court therefore rejected Die Partei’s motion. As a result, the German approval of the European Electoral Act amendment can now come into force.

 

Outlook

Will electoral thresholds be applied in the upcoming 2024 elections? No. The European elections in June will still be governed by the national electoral laws that have been in effect for the past few months. Additionally, Germany was only one of two Member States whose approval was still pending: Spain has yet to approve the amendment. Mandatory thresholds could eventually be applied in the 2029 elections.

However, maybe future elections will be held in accordance with very different laws. For quite some time, forces inside the European Parliament have pushed for a European Electoral Regulation that would be applicable in every Member State without national legal implementation. These drafts have often included proposals for transnational lists or pan-European constituencies. So far, these proposals have always failed to win over the approval of national governments in the Council.

It seems more likely that national legislation will adapt and that we will see fewer minority parties in the European Parliament. Let us hope that reduced fragmentation in the European Parliament will mirror a less divided, less extreme European society.




Politicians vs. Technocrats? – European Law Blog


Blogpost 24/2024

The coming of spring promises many changes, including a newly elected European Parliament and a new college of Commissioners leading the European Commission. The re-opening of the Spitzenkandidaten system has also stirred the debate on the democratic legitimacy of the EU institutions. Focusing on the European Commission, one question that needs answering is about its members: are the European Commissioners creatures of the world of politics or instead independent experts of a technocratic ‘government’?

Looking at it from a constitutional perspective, the Commission is a unicum, with no one-to-one equivalent in nation states. The only substantive provision in the Treaties regarding the work of Commissioners is Article 17(3) TEU, which specifies that Commissioners shall be appointed ‘on the ground of their general competence and European commitment from persons whose independence is beyond doubt.’ However, that does not mean that Commissioners must be completely apolitical: indeed, the Guidelines of the Commission provide for the possibility of Commissioners taking part in the campaigns and elections of the European Parliament (see Article 10). While political standing helps to set the wheels in motion, Commissioners should also have a degree of democratic legitimacy and direct responsibility to the electorate if the Commission is to resemble a ‘European Government’. If priority is to be given to Commission duties over party commitment (Article 10(1) Commission Guidelines), Commissioner candidates can hardly be expected to act in a purely neutral and professional capacity when doing so would mean kicking away the ladder that put them in their current position. In other words, if Commissioners belong to political parties, this inherently puts them into a precarious conflict between party affiliation and their work as independent public officials (Gehring and Schneider, p. 1).

The legal framework to appoint Commissioners

Since the transformation from the High Authority and the merger in 1967, the Commission has seen a gradual increase in the number of Commissioners (from the original nine to the current 27). The Delors administration is still cited today as the ‘golden standard’ for Commission administrations. The direction and dynamism of this administration helped to solidify the position of the European Commission as the principal advocate for further integration. Among its greater achievements are the completion of the Single Market and the introduction of a single currency. The main reason for setting the Delors administration as the measuring stick is a specific attribute the administration possessed – an ability to identify the political objective, weigh up competing interests, and set out a road map to achieve it. In a sense, one could say the Delors administration was political on the EU level.

Since then, the power of the Commission has steadily increased, with Romano Prodi being dubbed ‘virtually the prime minister of the European Union’, mainly because the President of the Commission could co-decide with Heads of Government/State of the Member States on who should sit in the new administration – a change introduced with the Treaty of Amsterdam (Article 4(4)). At the time, both the German Chancellor Schröder and Mr. Prodi expressed the desire to form the new Commission as a body of independent experts and not of retired or retiring politicians. How does this reflect on the appointment of the Commission as the ‘European Government’?

Article 17(7) TEU stipulates that the candidate for President of the Commission is to be proposed by the European Council, taking into account the results of the European Elections, and then to be elected by a simple majority in the European Parliament.

For the rest of the Commissioners, neither the Treaties nor any inter-governmental agreement specifies how candidates for the Commission are to be chosen in individual Member States. In other words, no source of EU law regulates national procedures for selecting a candidate for the European Commission. The only provision on this point is Article 17(3) TEU, which states that ‘the members of the Commission shall be chosen on the ground of their general competence’ and not based on their electability as politicians. This paucity of procedural guidelines leaves Member States free to implement their own procedures. For example, Austria regulated it partially in Article 23c of its Federal Constitutional Law, while Slovenia included it in its Cooperation in EU Affairs Act. In both examples, the national government has the discretionary power to propose a candidate, who then has to be approved by the national legislature – either the pertinent committee or the plenum.

The Commissioner’s role – is it political or technocratic?

The technocratic side

While it is customary for national governments to use the political apparatus to get elected, some scenarios require an appointed technocratic government of experts to lead the country, in the capacity of interim or caretaker governments (Lachmayer and Konrad). Such technocratic governments are considered to be above party politics, which enables them to bridge the political gaps between political parties.

Since the job of Commissioner requires a certain amount of independence and impartiality towards individual Member States, a technocratic candidate, with no political background yet with expert knowledge of the department’s work, would seem to meet this ideal. If Article 17(3) TEU is to be analysed word by word, then candidates are to be ‘chosen on the ground of their general competence and European commitment from persons whose independence is beyond doubt’. While the administrations before the Juncker administration were not viewed as ‘political’, they always included experienced public officials who were well acquainted with the functioning of the European Union (Peterson, p. 9-21). In fact, if the principal role of the Commission is to combine all 27 different national perspectives and unite them into one voice that ‘speaks for Europe’, while reaching the optimal consensus, then technocratic – and not political – qualities seem the better choice.

While the role of Commission President has certain functions resembling those of a Head of Government (Craig and de Búrca, p. 32), which require a more political profile, the role of an individual Commissioner does not necessarily require large political capital. This makes the Commission wear ‘two hats’ (as the 19th-century expression goes) – being involved in politics, on the one side, and remaining above the political fray, on the other. The potential problem with a politically disengaged administration lies in the political implementation of the Commission’s work: if that work is detached from political reality, both sides of the spectrum – the political and the administrative – end up performing Sisyphean tasks.

In the past, it would seem that almost every administration had a mixture of both. This might be attributed to the selection procedure, where Member States should (ideally) propose three candidates for the (future) President of the Commission to choose from. The last two European elections have shown us that this formal requirement is mostly ignored, even when the Member States were asked to adhere to a female-male balance of the Commission. As mentioned previously, every administration had a combination of both the administrative and the political component, but there has never been a formal requirement to balance both sides in the entire College of Commissioners. A possible reform of this is discussed below.

The political side

Some authors consider the Commission to be an inherently political institution, which sometimes tries to tone down its own political importance to give itself a sense of impartiality. The practice of appointing party members as candidates to become Commissioners is evidently more widespread, with 24 Commissioners being national party members or affiliated to a party. As far as political appointments are concerned, the past has also shown us that playing party politics in the Commission does not end well, as seen in 2019 when the French candidate Sylvie Goulard was rejected and replaced by Mr. Breton.

The administration under Jean-Claude Juncker was judged as one of the more politically motivated Commissions in the history of the EU. With Mr. Juncker being elected following the Spitzenkandidaten procedure, the very birth of this administration was political. When forming his Commission, he ‘promised to put together a political Commission’ (Juncker, 2014). While this might have been desired to ‘revamp’ European integration, it has proven to be a significantly damaging factor for the impartiality of the Commission on rule of law issues (notably in Poland and Hungary). A ‘deliberate governmental strategy of systematically undermining all checks and balances in Poland’ (Pech) and ‘saying goodbye […] to liberal democracy’ (Hungarian Prime Minister Orbán in 2018) were not developments that took place over a short period of time. The Commission certainly tried to remedy the situation (Michelot, 2019), yet showed internal splits and hesitancy in launching Article 7 TEU proceedings. Perhaps the most important setback is that a political Commission cannot ‘pretend that all of the EU’s policy goals are reconcilable and mutually supportive’ (Dawson, 2018): in the crucial politically disputed areas, a political Commission pursues the prevailing political majority and not ‘the wider EU interest’.

Taking these findings into account and applying them to the current electoral campaign, having Member of the European Parliament (MEP) candidates who already hold a post in the Commission could improve a party’s credibility in European affairs as well as signal that the candidate is prepared to face public scrutiny, at least at the level of his/her local constituency. So far, at least five of the current Commissioners are also running for a seat in the European Parliament, including Ursula von der Leyen and Nicolas Schmit as Spitzenkandidaten. This, of course, does not translate into immediate electoral success for their party but could be an important factor in the final vote. Standing in the European elections could increase a candidate’s democratic legitimacy as an individually chosen representative to hold the post of Commissioner and contribute to further democratising the Commission as an institution.

Since elections are difficult to predict, national governments rarely announce their choice for the future Commissioner or take a stance on the Spitzenkandidaten before the results are in. If a governing party does announce a candidate, it is usually either someone from its own ranks or someone with close ties to it. In doing so, the party brands the candidate with its political colours. By avoiding naming a candidate during the campaign stage of the European elections, parties partly avoid the possible embarrassment of losing the election and at the same time keep their options open in case a broader consensus is required.

In this regard, the current campaign in Slovenia is quite intriguing. The biggest governing party announced its candidate for the future Commissioner without even having a full list of Slovenian candidates for the European Parliament. It has been confirmed that its candidate, Tomaž Vesel, will not lead the party into the election, nor will he even stand as a candidate. Nationally, this decision has caused a governmental crisis: thanks to the opaque rules on naming a candidate for the Commission, it allows the Government to disregard the results of the European elections before they have even come out, as well as the opinion of the other coalition parties. It is difficult to comprehend how a nominee for the Commission who neither participates in the campaign nor even stands as a candidate for the European Parliament can help solve the democratic deficit problem in the EU.

Possible reforms – fostering more democracy in the selection procedure

As is often the case, a blend of both systems, i.e. the technocratic and the political, would be the optimal solution. As the apex of the European bureaucratic machine, the Commission requires a political charge to create wider policy. At the same time, the bigger picture requires Commissioners to have expert knowledge of their own department and a large amount of independence if they are to do a successful job. If we accept that the Commission is simultaneously a political and a technocratic institution, might it not be sensible also to try to strike a balance between Commissioners being political actors and impartial experts, so as to maximise the Commission’s efficiency?

So far, no additional requirements for Commissioner candidates have been voiced, yet it would seem that several of the incumbent Commissioners have decided to participate actively in the coming European elections by standing for election as MEPs. In this light, it would perhaps be prudent to consider the long-standing British constitutional practice whereby ministers – the executive – are simultaneously members of the legislature. This makes the British Cabinet effectively ‘a committee of the legislative body selected to be the executive body’ (Bagehot 1867, p. 48).

This holds significant advantages in terms of democratic accountability, since all members of the executive have been directly chosen by the people to represent them in the highest democratic institution – the parliament. In other words, it enables the public to narrow the pool of possible candidates who can hold public office. It also goes a long way towards preventing nepotistic appointments in the executive and legislative institutions. At the same time, ministers enjoy a certain degree of independence and a high political profile, regardless of their position in government, which bolsters their independence in cases of executive autocracy. An example of this is the unprecedented revolt in the final days of Mrs. Thatcher’s government.

Many of the above-mentioned strengths would improve the current constitutional predicament of the Commission: if fostering more democracy is the goal, then requiring future Commissioners to be part of the biggest international democratic legislative body would give the peoples of Europe far more power in choosing their own representatives, as well as their country’s representative in the Commission (although Commissioners are expressly forbidden from following instructions of national governments or other entities). Giving the electorate the power to decide who enters Parliament, and consequently the Commission, would also impede the search for the ‘ideal candidate’ to lead a department. Additionally, if only members of the legislature could also occupy positions on the MEP’s staff, then the unfortunate spat over President von der Leyen’s staff and the accusations of nepotism might have been avoided entirely.

The incorporation of these potential changes would, however, likely only be possible by re-opening and amending the Treaty on the European Union (TEU) and the Treaty on the Functioning of the European Union (TFEU).

The epilogue after June

It should be noted that there is an important difference between participating in the European elections and being appointed as Commissioner. How one is elected (or appointed) has consequences for how one performs the job. Does participating in the elections hinder a candidate’s ability to act independently and apolitically in the future? Though the question is meant to be rhetorical, no politician would want to return to the electorate without having fulfilled at least a part of the promises and policies on which he or she was elected.

After the 9th of June, the future administration of the Commission will start taking shape. Since the biggest political groupings have again entered the election campaign with their own candidates to lead the Commission, we can justifiably claim that the Spitzenkandidaten are back. This would effectively strengthen the claim of the biggest ‘winners’ in June to demand that their own candidate be nominated as President of the Commission. Given the lukewarm reception of Mr. Juncker and the rejection of Manfred Weber in 2019, the selection or election of the Commission President could go either way. The choice of President could just as well affect the Member States’ proposals of Commissioners. It would be important, however, to consider both the political and the technocratic arguments and ultimately to usher in more democracy in the European Commission by creating a balance of both interests – whether in terms of quality or quantity.



Source link

18May

Youth Mobility between the EU and the UK? – European Law Blog


Blogpost 25/2024

On 18 April 2024 the European Commission issued a recommendation for a Council Decision authorising the opening of negotiations for an agreement between the EU and the UK on youth mobility. This is the first time since the signing of the Trade and Cooperation Agreement (TCA) in 2021 that the EU has proposed the conclusion of a legal framework for mobility of persons between the EU and the UK. Free movement of persons between the two ceased as from 1 January 2021. Since then there has been a continuing exodus of EU nationals from the UK: 87,000 more EU nationals left the UK than came to it in 2023 (COM(2024)169, p. 2), and the number of EU students coming to the UK has dropped by 50%.

In response to this changing landscape of mobility, in 2023 the UK government began approaching some (but not all) Member States regarding the possible negotiation of youth mobility arrangements based on existing UK national law. This unilateral action prompted the Commission to seek a negotiating mandate from the Council in order to block possible bilateral arrangements between the UK and some Member States to the exclusion of others. This is consistent with the Council position adopted on 23 March 2018 that any future partnership between the EU and the UK on mobility of persons should be based on full reciprocity and non-discrimination among Member States.

The decision to leave the EU caused considerable upheaval for the UK political class, including, among other things, a change of prime minister. As a result, while the UK had been interested in youth mobility in 2018, by 2019 the government was no longer willing to include it in the TCA. This has meant that youth mobility between the two has been regulated by national law in the UK and by a mix of EU and national law in the Member States. The UK has a long-standing youth mobility programme limited to young people who are nationals of countries specified in the immigration rules, aged 18 to 30 or 18 to 35 depending on their nationality, and limited to a stay of two years. No EU country is included in this category (though Andorra, Iceland, Monaco and San Marino are).

The Commission proposes that a new youth mobility agreement be part of the TCA framework and remains neutral on whether it would be a Union-only or a mixed agreement, something to be determined at the end of the negotiations. Similarly, it considers that the legal basis for the agreement should be determined only at the end of the negotiations. Neither of these positions is likely to meet with enthusiasm in the Council, which may wish to give the Commission a clearer remit regarding what can be negotiated. The Commission considers that only a formal agreement between the UK and the EU will achieve the objective of providing legal certainty and addressing the issue of non-discrimination. It states that only a “binding mutual understanding in the form of a formal international agreement” can guarantee legal certainty. Nonetheless, the Commission envisages that the agreement would be supplemental to the TCA and would be part of its single and uniform institutional framework, including its rules on dispute settlement.

For young people in the EU and the UK this would be a rather unsatisfactory framework on account of Article 5 TCA. This states that (with a sole exception for social security) “nothing in this Agreement or any supplementing agreement shall be construed as conferring rights or imposing obligations on persons other than those created between the Parties under public international law, nor as permitting this Agreement or any supplementing agreement to be directly invoked in the domestic legal systems of the Parties.” So young people seeking to exercise mobility rights under any new agreement would not be able to rely on such an agreement if it is adopted within this framework. This could only be resolved if Article 5 were also amended to exclude from its scope not only social security but also youth mobility.

The Commission proposes that the scope of the agreement would cover twelve issues. First, the personal scope would be limited to EU and UK citizens between 18 and 30 years of age. The period of stay would be four years maximum. There would be no purpose limitation on mobility: young people could study, work or just visit if they want to. There would be no quota for this category. The conditions applicable to the category should apply throughout the individual’s stay. Rejection grounds would be specified. The category would be subject to a prior authorisation procedure (i.e. a specific visa to be obtained before arrival). For UK citizens, mobility would be limited to the one Member State where they had received authorisation (leaving open the question whether periods spent in different Member States could be cumulative or consecutive). Equal treatment in wages and working conditions, as well as health and safety rules, must be respected on the basis of non-discrimination with own nationals. This may also include some aspects of education and training, tax benefits, etc. In particular, equal treatment as regards tuition fees for higher education is planned. This would mean that EU students seeking to study at UK universities under the youth mobility scheme would pay only home student fees, which are dramatically cheaper than the overseas student fees currently applicable. Interestingly, the Commission proposed that this home student fee provision should apply to all EU students in the UK, including those who arrive on student visas rather than youth mobility ones. The UK’s ‘healthcare surcharge’ would also be waived for this category. Finally, the conditions for the exercise of family reunification would need to be specified.

The Commission plans that any youth mobility scheme should be without prejudice to other legal pathways for migration and EU rules on permanent or long-term resident status.

For the EU, such a youth mobility scheme between the UK and the EU would add to an already rather complex field of EU competences. The Students and Researchers’ Directive covers conditions of entry and stay for the purposes of research, studies, training, voluntary service, pupil exchange schemes or educational projects, and au pairing. This would certainly cover quite a lot of what is planned for youth mobility. However, the Commission appears not to be keen on using Article 79(2)(a) and (b) TFEU, the legal basis of that directive, for the purposes of this initiative. One of the reasons is that all the categories of persons covered by that directive need a sponsor within a Member State (which could be a university, an employer or a training institution), who is saddled with a variety of obligations regarding the third-country national to ensure compliance with general immigration conditions. Such a sponsorship approach is not intended by the Commission for UK-EU youth mobility. Further, the Commission’s objective is to achieve reciprocity between the parties and non-discrimination among the Member States and their nationals, which is not an element of the directive. Thus, a new agreement seems to be the preferred approach – the Commission appears to prefer the ‘free movement’ approach over the sponsored one. Yet, as mentioned above, if the objective is to provide legal certainty to Europe’s young people regarding moving between the EU and the UK, the TCA does not seem to be an appropriate tool either, as it specifically rejects that legal certainty by denying individuals the right to rely on its provisions before the authorities or courts of the parties.

At the time of writing, it is unclear how the Council will approach this proposal. There are indications that some Member States may not be enthusiastic (Hungary is one), worrying that their skilled young people may be enticed to go to the UK rather than staying at home. The majority, however, appears to be very positive towards any move to normalise mobility between the two parties.



Source link

18May

Is this fair? The ECJ rules on prohibition of assignment and ex officio control of unfairness (C-173/23 Air Europa Líneas Aéreas) – European Law Blog

Blogpost 26/2024

1. Introduction

Air carriers often use clauses which prohibit the assignment of passenger claims. Such clauses have a generic scope but were mainly introduced to deter the assignment of claims under Regulation 261/2004 on air passenger rights (Air Passenger Rights Regulation – APRR) to commercial companies. The fairness of such clauses under the Directive 93/13/EEC on Unfair Terms in Consumer Contracts (UCTD) has been disputed. In its judgment in C-173/23 Eventmedia Soluciones SL v Air Europa Líneas Aéreas SAU ECLI:EU:C:2024:295 (Judgment), the European Court of Justice (ECJ) ruled on some aspects of the duty of national courts to assess of their own motion the unfairness of contractual terms in the context of air carriage under the 1999 Montreal Convention on the liability of the international air carrier (MC99).

The MC99 establishes uniform rules on certain aspects of the liability of air carriers for international carriage by air. It is one of the most widespread international conventions and is also open for signature by Regional Economic Integration Organizations, such as the EU (Article 53(2)). The MC99 was signed by the (then) European Community on 9 December 1999 and entered into force on 28 June 2004. Ever since, the MC99 provisions have been an integral part of the EU legal order (C-344/04 IATA and ELFAA, para. 36), save for the provisions on cargo, for which competence rests with the EU Member States. Hence, the ECJ is competent for the interpretation of the MC99 provisions on passengers and luggage.

This post presents the judgment of the ECJ, including its legal background. Subsequently, comments are provided regarding (1) the ex officio assessment of unfairness of contractual terms under the UCTD and (2) the validity of clauses prohibiting assignment of passenger claims under the APRR, according to the case law of the ECJ and national courts. The conclusion of the post evaluates the importance of the judgment for the analysed topics.

2. Facts and legal background

2.1 Facts

An air passenger suffered a delay in the transport of his baggage on a flight from Madrid (Spain) to Cancún (Mexico). He assigned his claim for damages against Air Europa, an air carrier, to Eventmedia, a commercial company. Eventmedia brought an action against the air carrier before the referring court, i.e., Commercial Court No 1, Palma de Mallorca, Spain.

Air Europa disputed Eventmedia’s standing to bring proceedings, since a clause in the contract of air carriage provided that ‘the rights to which the passenger is entitled shall be strictly personal and the assignment of those rights shall not be permitted’.

The referring court specified that the liability of the air carrier is governed by Article 19 MC99 and deemed the dispute contractual in nature. Consequently, according to the referring court, the assignment of the claim for damages relating to such a delay fell within the prohibition of assignment established by the clause at issue. The national court, referring to the ECJ case law under the UCTD, was uncertain whether it could examine of its own motion the unfairness of the clause, for two reasons. First, the applicant in the proceedings, Eventmedia, was neither a party to the contract of carriage nor did it have the status of a consumer under Article 2(b) UCTD, as only natural persons may be ‘consumers’. Second, since the consumer was not a party to the proceedings, the court could not consider the consumer’s intention to rely, after having been informed by that court, on the unfair and non-binding nature of the clause at issue.

2.2 Legal background

According to the settled case law of the ECJ (e.g. C-567/13 Baczó and Vizsnyiczai, paras 40-42; C-377/14 Radlinger and Radlingerová, para. 48), in the absence of EU rules governing the matter, it is for the domestic legal system of each Member State, in accordance with the principle of procedural autonomy, to designate the courts and tribunals having jurisdiction and to lay down the detailed procedural rules governing actions for safeguarding rights which individuals derive from EU law. On that basis, the detailed procedural rules governing actions for safeguarding an individual’s rights under EU law must be no less favourable than those governing similar domestic actions (principle of equivalence) and must not render practically impossible or excessively difficult the exercise of rights conferred by EU law (principle of effectiveness).

Regarding the principle of effectiveness, the ECJ has combined it with the effective application of Art. 6(1) UCTD. Thus, the Court has repeatedly held that national courts are required to assess of their own motion whether a contractual term falling within the scope of the UCTD is unfair, to compensate for the imbalance which exists between the consumer and the seller/supplier, where the courts have available to them the legal and factual elements necessary to that end (C-243/08 Pannon GSM, paras 22-24, 32; C-377/14 Radlinger and Radlingerová, para. 52).

Nonetheless, the ECJ has also clarified that national courts, in carrying out that obligation, should inform the consumer of the consequences of the potential unfairness of the term, namely that the term is invalid and that such invalidity may affect the validity of the whole contract under Article 6(1) UCTD (C-269/19 Banca B., para. 29). In this regard, national courts should account for the possibility that the consumer may decide not to assert the unfair status of the term (C-243/08 Pannon GSM, para. 33).

3. Issues

Two questions were referred to the ECJ by the national court.

First, whether the national court was required to examine of its own motion the unfairness, under Articles 6(1) and 7(1) UCTD, of a clause that prohibits the assignment of passenger claims against the air carrier, where a claim has been brought against the latter by a commercial company as an assignee of that passenger’s claim.

Second, if the answer to the first question is affirmative, could the court disregard its duty to inform the passenger of the consequences of the unfairness, given that in the case at hand there was no ‘consumer’ litigating?

4. Judgment

4.1 Preliminary issue

As a preliminary issue, the ECJ clarified that the applicability of the UCTD to a dispute depends on the capacity of contractual parties, not on the capacity of the litigants. Hence, the fact that the litigation in question was between two commercial entities did not exclude the dispute from the scope of the UCTD, since the contract of carriage had been concluded between the air carrier and a natural person who was (seemingly) acting outside his professional capacity (paras 17-26).

4.2 On the first question

Proceeding to answer the first question referred to it, the ECJ observed that the UCTD aims at protecting consumers vis-à-vis sellers/suppliers on the premise that consumers are in an inferior position as regards their knowledge and bargaining power (para. 27). The UCTD aims at restoring this imbalance by rendering unfair contractual terms not binding on consumers (para. 28).

The Court then referred to its established case law on the duty of national courts to examine of their own motion the unfairness of contractual terms in consumer contracts. Such a duty is based on the effective application of Art. 6(1) UCTD (paras 28-29). Moreover, it is based on the principle of effectiveness in the context of the procedural autonomy of the EU Member States under Art. 7(1) UCTD, notwithstanding the principle of equivalence (paras 30-32).

Regarding the principle of equivalence, the ECJ reiterated that Article 6(1) UCTD ranks equally with domestic rules of public policy. Whether a national court has a duty to assess ex officio the unfairness of a term under the UCTD depends on whether that court, under national procedural rules, has discretion or an obligation to examine ex officio the violation of national rules of public policy (paras 33-35). This is for the national court to ascertain (para. 36).

As to the principle of effectiveness, the ECJ observed that, in the case at hand, there was a dispute between two commercial entities. Thus, there was no imbalance of power and knowledge between them. As a result, there was no duty of the national court to examine of its own motion the potential unfairness of the clause in question (paras 38-39). In addition, the principle of effectiveness does not require an ex officio assessment of the unfairness of the term, if the legal entity as an assignee has or had, under the national procedural rules, a genuine opportunity to rely on the unfairness of the contractual clause (para. 40).

4.3 On the second question

The ECJ observed that the second question concerned the right of each litigant to a fair hearing. This entitles each party to the litigation to be informed of the issues that the court has raised of its own motion and to provide its views thereon (paras 44-45). Thus, if the national court finds ex officio that a contractual term is unfair, it must notify the parties to the litigation thereof and provide them with the opportunity to present their views and refute the views of the other party (para. 46). In this way, the national court also fulfils its duty to consider the potential consent of the assignee to the use of the term in question despite its unfairness (para. 47) – although this was obviously not the case in the present proceedings (para. 48). By contrast, the national court did not have to seek the consumer’s views, since the consumer was not a party to the dispute (para. 49).

5. Comments

This judgment provides helpful guidance on the duty of the national court to assess ex officio the unfairness of a contractual term. Moreover, it is interesting to compare this judgment with the ECJ judgment in C-11/23 Eventmedia Soluciones regarding the validity of such clauses under the APRR.

5.1 Ex officio assessment of unfairness

The judgment reveals two aspects of the assessment of unfairness under the UCTD: a substantive and a procedural one. Both aspects are influenced by the imbalance between the consumer and the seller/supplier, which lies at the core of the UCTD and which national courts are required to restore by positive action (C-240/98 to C-244/98 Oceano Grupo and others, para. 25). At the substantive level, national courts must declare an unfair term non-binding on the consumer and, at the procedural level, they must assess of their own motion the unfairness of the terms relevant to the dispute. Hence, the substantive and procedural aspects are distinct, albeit interconnected (see Judgment, para. 24).

The substantive aspect relates to the scope of the UCTD and the criteria of unfairness. As a result, it is immaterial for the applicability of the UCTD whether the parties to the litigation are legal entities, as long as: (1) the contract has been concluded between a seller/supplier and a ‘consumer’ (Judgment, paras 17, 24-25); and (2) one party to the litigation is an assignee of a consumer or an organisation having a legitimate interest under national law in protecting consumers (UCTD, Article 7(2)).

The duty of ex officio assessment is a procedural issue. It accounts for the fact that consumers may be unaware of the potential unfairness of contractual terms or incapable of invoking it, because they consider participation in the trial not worthwhile in view of the high litigation costs compared to the value of the dispute (C-240/98 to C-244/98 Oceano Grupo and Salvat Editores, para. 26). In principle, this duty of the national court arises only if the consumer participates in the litigation as a plaintiff or a defendant, because in such cases the substantive imbalance between the contractual parties is carried over to the level of the litigation. However, there are cases in which a legal entity is a litigant in the place of the consumer, by means of assignment from the consumer or because it has a legitimate interest in protecting consumers. In such cases, the ECJ considers that, as a procedural matter, there is no imbalance between the litigants (C-413/12 Asociación de Consumidores Independientes de Castilla y León, paras 48-50; Judgment, para. 38). The ECJ bases this view on purely formal criteria: ‘consumers’ are natural persons acting outside their trade or profession and are irrefutably deemed to have limited knowledge and experience (see C-110/14 Costea, paras 16-18, 20-21, 26-27), whereas a legal entity is irrefutably considered to be more sophisticated and not to need such a high level of protection.

Concerning the capacity of ‘consumer’, the ECJ seems to apply a kind of presumption in favour of that capacity when a natural person contracts with a commercial entity. In the absence of evidence to the contrary, natural persons are deemed to have acted outside their professional capacity (C-519/19 Delay Fix, para. 56; Judgment, para. 19). However, such evidence needs to be strong and not based on isolated factors (see C-774/19 Personal Exchange International, paras 49-50).

Nonetheless, in exceptional cases, the national court may be under a duty to assess ex officio the unfairness of a contractual clause, although no ‘consumer’ is party to the litigation. As the Court notes in para. 40 of its Judgment, such a duty exists also when the assignee, despite being a commercial entity, had no ‘genuine opportunity’ to raise the issue of unfairness. This refers to the rights of the assignee under national law. That might be the case, if e.g. under national law the assignment did not include the whole contract of air carriage, but only a part of it, and the clause prohibiting the assignment had not been part of the assignment (see C-519/19 Delay Fix, paras 47, 63). The reason for this exception likely lies in the close connection between the substantive and procedural aspects of the consumer rights under the UCTD.

5.2 Validity of clauses prohibiting assignment of passenger claims under the Air Passengers Rights Regulation

Many air carriers have introduced clauses prohibiting the assignment of passenger claims to third parties. Although such clauses usually have a generic scope, air carriers had in mind mainly claims based on the APRR when they introduced them. This Regulation provides, among other things, for compensation to passengers in cases of flight cancellations and denied boarding (Articles 4(3) and 5(1)(c) APRR). The ECJ has interpreted the Regulation as providing such a right also in cases of delays in arrival at the final destination exceeding three hours. The amount of compensation is standardised and depends on the distance of the flight to its final destination (Article 7 APRR). The standardised compensation amounts, combined with the very limited possibilities for excluding the carrier’s liability (Article 5(3) APRR), have led to the creation of commercial entities to which passengers may assign their claims and which undertake to enforce those claims before national courts in exchange for a percentage of the compensation received (a contingency fee; see here for an overview). This has resulted in a significant increase in passenger claims against air carriers, raising carriers’ costs not only in terms of compensation paid but also in terms of judicial costs. Air carriers have reacted by introducing non-assignment clauses in their contracts with passengers.

Regarding passenger claims based on the APRR, national courts have on a number of occasions assessed under the UCTD the unfairness of clauses prohibiting assignment. The results have been mixed. The main issue in the proceedings has been whether the prohibition of assignment obstructs the passenger’s (or consumer’s) route to compensation, including access to the courts. In England, the Court of Appeal affirmed the judgment of the trial judge, who found such a clause to be fair (Bott and Co Solicitors Ltd v Ryanair DAC [2019] EWCA Civ 143, at [71]-[73], reviewed on other grounds [2022] UKSC 8). By contrast, in Germany such clauses have been found unfair in a long line of case law (e.g. LG Nürnberg-Fürth, 30.7.2018; LG Frankfurt am Main, 25.11.2021), including by the Federal Court of Justice (BGH 1.8.2023, paras 8, 10, 14, affirming LG Memmingen, 28.9.2022, para. 14).

Earlier this year, the ECJ already clarified, in C-11/23 Eventmedia Soluciones (paras 39-46),  that clauses prohibiting assignment of claims based on the APRR are invalid under Article 15 of the Regulation, which prohibits any limitation of passenger rights. Hence, the discussion on the unfairness of such clauses under the UCTD has no practical importance to the APRR. The UCTD has practical importance, however, for claims under the MC99. Articles 29 and 33(4) MC99 clarify that issues of legal standing are governed by the domestic law of the contracting States, which, in the context of EU law, entails the applicability of the UCTD.

6. Conclusion

In conclusion, the present judgment is noteworthy, because it clarifies important aspects of the duty of national courts to assess of their own motion the unfairness of contractual clauses under the UCTD. Moreover, combined with case law of the ECJ and the national courts on the APRR, it sheds some light on the application of the UCTD to passenger claims under the MC99.

Source link

18May

Google AI Introduces PaliGemma: A New Family of Vision Language Models 


Google has released a new family of vision language models called PaliGemma. PaliGemma takes an image and a text input and produces text as output. The architecture of the PaliGemma (GitHub) family of vision-language models combines the image encoder SigLIP-So400m with the text decoder Gemma-2B. SigLIP is a state-of-the-art model that can understand both text and images; like CLIP, it comprises a jointly trained image encoder and text encoder. Gemma is a decoder-only text-generation model. Like PaLI-3, the combined PaliGemma model is pre-trained on image-text data and can then be easily fine-tuned on downstream tasks such as captioning or referring expression segmentation. By using a linear adapter to connect Gemma to SigLIP’s image encoder, PaliGemma becomes a powerful vision language model.

Big_vision was used as the training codebase for PaliGemma. Using the same codebase, numerous other models, including CapPa, SigLIP, LiT, BiT, and the original ViT, have already been developed. 

The PaliGemma release includes three distinct model types, each offering a unique set of capabilities:

  1. PT checkpoints: pretrained models that are highly adaptable and designed to excel across a variety of tasks.
  2. Mix checkpoints: PT models fine-tuned on a mixture of tasks (the ‘mix’ family referred to later in this post). They are suitable for general-purpose inference with free-text prompts and can be used for research purposes only.
  3. FT checkpoints: a collection of fine-tuned models, each focused on a different academic benchmark. They come in various resolutions and are meant for research only.

The models are available in three precision levels (bfloat16, float16, and float32) and three resolution levels (224×224, 448×448, and 896×896). Each repository holds the checkpoints for a given task and resolution, with a revision for each of the available precisions. The main branch of each repository holds float32 checkpoints, while the bfloat16 and float16 revisions hold the correspondingly reduced precisions. Note that models compatible with the original JAX implementation and with Hugging Face transformers live in separate repositories.
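To make this repository layout concrete, here is a minimal loading sketch using the Hugging Face transformers integration. The model ID ("google/paligemma-3b-mix-224") and the branch name ("bfloat16") are assumptions inferred from the naming scheme described above, not details confirmed in this post.

# Minimal loading sketch (assumes a transformers version with PaliGemma support
# and that the repository/branch names follow the scheme described above).
import torch
from transformers import AutoProcessor, PaliGemmaForConditionalGeneration

# Hypothetical repository name: a mix checkpoint at 224x224 resolution.
model_id = "google/paligemma-3b-mix-224"

# The main branch is described as holding float32 weights; "bfloat16" below is
# an assumed branch name for the reduced-precision revision.
model = PaliGemmaForConditionalGeneration.from_pretrained(
    model_id,
    revision="bfloat16",          # assumed branch name for the bfloat16 revision
    torch_dtype=torch.bfloat16,   # load the weights in the matching precision
)
processor = AutoProcessor.from_pretrained(model_id)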

The high-resolution models, while offering superior quality, require significantly more memory due to their longer input sequences. For most tasks, however, the quality gain is marginal, making the 224 versions a suitable choice for the majority of uses.

PaliGemma is a single-turn visual language model that performs best when tuned to a particular use case. It is not intended for conversational use. This means that while it excels in specific tasks, it may not be the best choice for all applications.

Users can specify the task the model should perform by prefixing the input with task prefixes such as ‘detect’ or ‘segment’. This is because the pretrained models were trained to acquire a wide range of skills, such as question answering, captioning, and segmentation; however, rather than being used directly, they are designed to be fine-tuned for specific tasks using a comparable prompt structure. The ‘mix’ family of models, fine-tuned on a variety of tasks, can be used for interactive testing (a minimal inference sketch follows the list of examples below).

Here are some examples of what PaliGemma can do: it can caption pictures, answer questions about images, detect entities in pictures, segment entities within images, and reason about and understand documents. These are just a few of its many capabilities.

  • When asked, PaliGemma can add captions to pictures. With the mix checkpoints, users can experiment with different captioning prompts to see how the model responds.
  • PaliGemma can answer a question about an image passed along with it.
  • PaliGemma can use the detect [entity] prompt to find entities in a picture. The bounding-box location is printed as special location tokens, whose values are integers denoting normalized coordinates.
  • When prompted with the segment [entity] prompt, PaliGemma mix checkpoints can also segment entities within an image. Because natural language descriptions are used to refer to the objects of interest, this technique is known as referring expression segmentation. The output is a series of segmentation and location tokens. As previously mentioned, a bounding box is represented by the location tokens; segmentation masks can be created by processing the segmentation tokens one more time.
  • PaliGemma mix checkpoints are very good at reasoning about and understanding documents.
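The sketch below illustrates the task-prefix interface described above, again via the Hugging Face transformers integration. The model ID, image URL, prompt wording, and generation settings are illustrative assumptions rather than details confirmed in this post.

# Minimal task-prefix inference sketch (model ID, image URL and prompt are
# illustrative assumptions; the pattern mirrors the loading sketch above).
import requests
import torch
from PIL import Image
from transformers import AutoProcessor, PaliGemmaForConditionalGeneration

model_id = "google/paligemma-3b-mix-224"   # hypothetical mix checkpoint
model = PaliGemmaForConditionalGeneration.from_pretrained(model_id)
processor = AutoProcessor.from_pretrained(model_id)

# Any reachable image would do; this URL is a placeholder assumption.
image_url = "https://example.com/cat.jpg"
image = Image.open(requests.get(image_url, stream=True).raw)

# Task prefixes steer the model, e.g. "caption en", "detect cat", "segment cat".
prompt = "detect cat"
inputs = processor(text=prompt, images=image, return_tensors="pt")

with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=50)

# Strip the echoed prompt; for "detect", the remainder is expected to contain
# location tokens encoding a normalized bounding box, as described above.
decoded = processor.decode(output[0], skip_special_tokens=True)
print(decoded[len(prompt):])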



Check out the Blog, Model, and Demo. All credit for this research goes to the researchers of this project.




Dhanshree Shenwai is a computer science engineer with experience in FinTech companies covering the financial, cards & payments, and banking domains, and a keen interest in applications of AI. She is enthusiastic about exploring new technologies and advancements in today’s evolving world that make everyone’s life easier.






Source link
