
Frontier AI Regulation


This summarises a recent multi-author white paper on frontier AI regulation. We are organising a webinar about the paper on July 20th. Sign up here.

GovAI research blog posts represent the views of their authors, rather than the views of the organisation.

Summary

AI models are already having large social impacts, both positive and negative. These impacts will only grow as models become more capable and more deeply integrated into society.

Governments have their work cut out for them in steering these impacts for the better. They face a number of challenges, including AI being used for critical decision-making without assurance that its judgments will be fair and accurate; AI being integrated into safety-critical domains, with accompanying accident risks; and AI being used to produce and spread disinformation.

In a recent white paper, we focus on one such challenge: the increasingly broad and significant capabilities of frontier AI models. We define “frontier AI models” as highly capable foundation models,1 which could have dangerous capabilities that are sufficient to severely threaten public safety and global security. Examples of capabilities that would meet this standard include designing chemical weapons, exploiting vulnerabilities in safety-critical software systems, synthesising persuasive disinformation at scale, or evading human control. 

We think the next generation of foundation models – in particular, those trained using substantially greater computational resources than any model trained to date – could have these kinds of dangerous capabilities. Although the probability that next-generation models will have these capabilities is uncertain, we think it is high enough to warrant targeted regulation. The appropriate regulatory regime may even include licensing requirements. 

Effective frontier AI regulation would require that developers put substantial effort into understanding the risks their systems might pose, in particular by evaluating whether they have dangerous capabilities or are insufficiently controllable. These risk assessments would receive thorough external scrutiny and inform decisions about how and whether new models are deployed. After deployment, the extent to which the models are causing harm would need to be continually evaluated. Other requirements, such as high cybersecurity standards, would likely also be appropriate. Overall, regulatory requirements need to evolve over time. 

Based on current trends, creating frontier AI models is likely to cost upwards of hundreds of millions of dollars in compute and also require other scarce resources like relevant talent. The described regulatory approach would therefore likely only target the handful of well-resourced companies developing these models, while posing few or no burdens on other developers.

What makes regulating frontier AI challenging?

There are three core challenges for regulating frontier AI models:

  • The Unexpected Capabilities Problem: The capabilities of new AI models are not reliably predictable and are often difficult to fully understand without intensive testing. Researchers have repeatedly observed capabilities emerging or significantly improving suddenly in foundation models. They have also regularly induced or discovered new capabilities through techniques including fine-tuning, tool use, and prompt engineering. This means that dangerous capabilities could arise unpredictably and – absent requirements to do intensive testing and evaluation pre- and post-deployment – could remain undetected and unaddressed until it is too late to avoid severe harm. 
  • The Deployment Safety Problem: AI systems can cause harm even if neither the user nor the developer intends them to, for several reasons. Firstly, it is difficult to precisely specify what we want deep learning-based AI models to do, and to ensure that they behave in line with those specifications. Reliably controlling AI models’ behavior, in other words, remains a largely unsolved technical problem. Secondly, attempts to “bake in” misuse prevention features at the model level, such that the model reliably refuses to obey harmful instructions, have proved circumventable due to methods such as “jailbreaking.” Finally, distinguishing instances of harmful and beneficial use may depend heavily on context that is not visible to the developing company. Overall, this means that even if state-of-the-art deployment safeguards are adopted, robustly safe deployment is difficult to achieve and requires close attention and oversight.
  • The Proliferation Problem: Frontier AI models are more difficult to train than to use. Thus, a much wider array of actors have the resources to misuse frontier AI models than have the resources to create them. Non-proliferation of frontier AI models is therefore essential for safety, but difficult to achieve. As AI models become more useful in strategically important contexts and the costs of producing the most advanced models increase, bad actors may launch increasingly sophisticated attempts to steal them. Further, when models are open-sourced, accessing or introducing dangerous capabilities becomes much easier. While we believe that open-sourcing of non-frontier AI models is currently an important public good, open-sourcing frontier AI models should be approached more cautiously and with greater restraint.

What regulatory building blocks are needed for frontier AI regulation?

Self-regulation is unlikely to provide sufficient protection against the risks of frontier AI models: we think government intervention will be needed. The white paper explores the building blocks such regulation would need. These include:

  • Mechanisms to create and update standards for responsible frontier AI development and deployment. These should be developed via multi-stakeholder processes and could include standards relevant to foundation models overall – not just standards that exclusively pertain to frontier AI. These processes should facilitate rapid iteration to keep pace with the technology.
  • Mechanisms to give regulators visibility into frontier AI developments. These mechanisms could include disclosure regimes, monitoring processes, and whistleblower protections. The goal would be to equip regulators with the information they need to identify appropriate regulatory targets and design effective tools for governing frontier AI. The information provided would pertain to qualifying frontier AI development processes, models, and applications.
  • Mechanisms to ensure compliance with safety standards. Self-regulation efforts, such as voluntary certification, may go some way toward ensuring compliance with safety standards by frontier AI model developers. However, this seems likely to ultimately be insufficient without government intervention. Intervention may involve empowering a government authority to translate standards into legally binding rules, identify and sanction non-compliance with rules, or perhaps establish and implement a licensing regime for the deployment and potentially the development of frontier AI models. Designing a well-balanced frontier AI regulation regime is a difficult challenge. Regulators would need to be sensitive to the risks of overregulation and stymieing innovation on the one hand, and the risks of moving too slowly (relative to the pace of AI progress) on the other.

What could safety standards for frontier AI development look like?

The white paper also suggests some preliminary, minimum safety standards for frontier AI development and release: 

  • Conducting thorough risk assessments informed by evaluations of dangerous capabilities and controllability. This would reduce the risk that deployed models possess unknown dangerous capabilities or behave unpredictably and unreliably.
  • Engaging external experts to apply independent scrutiny to models. External scrutiny of the safety and risk profiles of models would both improve assessment rigour and foster accountability to the public.
  • Following shared guidelines for how frontier AI models should be deployed based on their assessed risk. The results from risk assessments should determine whether and how a model is deployed and what safeguards are put in place. Options could range from deploying the model without restriction to not deploying it at all until risks are sufficiently reduced. In many cases, an intermediate option – deployment with appropriate safeguards, such as restrictions on the ability of the model to respond to risky instructions – will be appropriate.
  • Monitoring and responding to new information on model capabilities. The assessed risk of deployed frontier AI models may change over time due to new information and new post-deployment enhancement techniques. If significant information on model capabilities is discovered post-deployment, risk assessments should be repeated and deployment safeguards should be updated.

Other standards would also likely be appropriate. For example, frontier AI developers could be required to uphold high cybersecurity standards to ward off attempts at theft. In addition, these standards should likely change substantially over time as we learn more about the risks from the most capable AI systems and the means of mitigating those risks.

Uncertainties and next steps

While we feel confident that there is a need for frontier AI regulation, we are unsure about many aspects of how an appropriate regulatory regime should be designed. Relevant open questions include:

  • How worried should we be about regulatory capture? What can be done to reduce the chance of regulatory capture? For example, how could regulator expertise be bolstered? How much could personnel policies help – such as cool-off periods between working for industry and for a regulatory body?
  • What is the appropriate role of tort liability for harms caused by frontier models? How can it best complement regulation? Are there contexts in which it could serve as a substitute for regulation?
  • How can the regulatory regime be designed to deal with the evolving nature of the industry and the evolving risks? What can be done to ensure that ineffective or inappropriate standards are not locked in?
  • How, in practice, should a regime like this be implemented? For instance, in the US, is there a need for a new regulatory body? If the responsibility should be located within an existing department, which one would be the most natural home? In the EU, can the AI Act be adapted to deal appropriately with frontier AI and not just lower-risk foundation models? How does the proposed regulatory scheme fit into the UK’s Foundation Model Taskforce?
  • Which other safety standards, beyond the non-exhaustive list we suggest, should frontier AI developers follow?
  • Would a licensing regime be both warranted and feasible right now? If a licensing regime is not needed now but may be in the future, what exactly should policymakers do today to prepare and pave the way for future needs?
  • In general, how should we account for uncertainty about the level of risks the next generation of foundation models will pose? It is possible that the next generation of frontier models will be less risky than we fear. However, given the uncertainty and the need to prepare for future risks, we think taking preliminary measures now is the right approach. It is not yet clear what the most useful and cost-effective preliminary measures will be.

We are planning to spend a significant amount of time and effort exploring these questions. We strongly encourage others to do the same. Figuring out the answers to these questions will be extremely difficult, but deserves all of our best efforts.

Authorship statement

The white paper was written with co-authors from a number of institutions, including authors employed by industry actors that are actively developing state-of-the-art foundation models (Google DeepMind, OpenAI, and Microsoft). Although authors based in labs can often contribute special expertise, their involvement also naturally raises concerns that the content of the white paper will be biased toward the interests of companies. This suspicion is healthy. We hope that readers will be motivated to closely engage with the paper’s arguments, take little for granted, and publicly raise disagreements and alternative ideas.

The authors of this blog post can be contacted at ma***************@go********.ai, jo***********@go********.ai, and ro***********@go********.ai.





What Should the Global Summit on AI Safety Try to Accomplish?


This post explores possible outcomes of the upcoming UK-hosted global summit on AI safety. It draws on ideas raised in a recent expert workshop hosted by GovAI, but is not intended as a consensus statement.

GovAI research blog posts represent the views of their authors, rather than the views of the organisation.

Summary

Following growing concern about possible extreme risks from AI, the United Kingdom recently announced plans to host the first major global summit on AI safety. The announcement sketched a high-level vision for the summit:

The summit, which will be hosted in the UK this autumn, will consider the risks of AI, including frontier systems, and discuss how they can be mitigated through internationally coordinated action. It will also provide a platform for countries to work together on further developing a shared approach to mitigate these risks.

However, more detailed aims have not yet been shared. GovAI recently convened a small expert workshop – organised without government involvement and held under the Chatham House rule – to explore possible objectives for the summit.

Attendees suggested a wide variety of valuable direct outcomes of the summit. These could mainly be grouped into six categories:

  1. Producing shared commitments and consensus statements from states
  2. Planting the seeds for new international institutions
  3. Highlighting and diffusing UK AI policy initiatives
  4. Securing commitments from AI labs
  5. Increasing awareness and understanding of AI risks and governance options
  6. Committing participants to annual AI safety summits and further discussions

Several attendees suggested that the summit may be a unique and fleeting opportunity to ensure that global AI governance includes China. China is likely to be excluded from important discussion venues, such as the OECD and G7. China may also have only a temporary period of AI policy “plasticity” and will be more inclined to reject any global governance principles that Western states begin crafting without its input. However, one attendee disagreed that China should be invited, emphasising that Chinese participation could make the summit less productive.

The Summit Could Produce Several Valuable Outcomes

Potential valuable outcomes from the summit include:

  1. Producing shared commitments and consensus statements from states. Participating states could sign onto statements that establish a shared understanding of risk, influence global AI governance norms and priorities, and provide guidance to private labs. A baseline goal could be to produce a shared statement acknowledging that extreme safety risks are a global priority.
  2. Planting the seeds for new international institutions. States could commit to creating new international institutions (or at least further discussing possibilities). Possible functions of new institutions include housing research, developing expert consensus and shared standards, providing evaluation and auditing functions, and facilitating the creation of international agreements.
  3. Highlighting and diffusing UK AI policy initiatives. The UK could establish itself as the global leader in AI policy, showcase its new AI policy initiatives, and support their diffusion to other states. Initiatives that could be showcased include an information-sharing and model evaluation regime, plans for an associated licensing regime, and guidance documents to labs outlining responsible practices (e.g. regarding model release and scaling).
  4. Securing commitments from AI labs. Labs could make shared commitments (e.g. to best practices) that they will not by default make without public pressure or government encouragement, due to commercial incentives and legal constraints that limit coordination. This could include commitments to share further information with governments, invest a minimum portion of their resources into safety, temporarily pause development if certain model evaluation results are triggered, or adopt responsible policies regarding model scaling.
  5. Increasing awareness and understanding of AI risks and governance options. Events (e.g. events demonstrating risks), discussions, statements, and media coverage of the summit could increase awareness and understanding of AI safety challenges (particularly AGI safety challenges) and the need for global governance. It is useful for this awareness to be shared across government, industry, civil society, the media, and the broader public.
  6. Committing participants to annual AI safety summits and further discussions. Participants could agree to further meetings to maintain momentum and make progress on issues raised during the summit (e.g. the potential need for new international institutions and agreements). In addition to further iterations of the summit, new expert-led working groups (potentially hosted by the UK or the OECD) could be established. 

The Summit May Be a Critical Opportunity to Ensure Global AI Governance Includes China

A number of participants suggested that this summit may be the only opportunity to include China in productive global AI governance discussions. Other important venues – including the OECD, the G7, or meetings initiated by the US – will likely exclude China by default. More inclusive institutions such as the UN may also struggle to make forward progress due to structural constraints.

This opportunity may be critical, since global AI governance will likely fail if it does not ultimately include China. If China – the world’s third major AI power – does not adopt responsible AI safety policies, then this could threaten the security of all other states. External constraints, such as export controls on advanced chips and relevant manufacturing equipment, will probably only delay China’s ability to create high-risk models.

Summit outcomes (such as those listed above) could therefore be more valuable if they include China. One participant whose research focuses on Chinese AI policy argued that China can be positively influenced by the summit.1 There is interest in AI safety in China – for instance, a number of prominent Chinese academics signed the recent CAIS statement on AI risk – and Chinese actors have been signalling an interest in participating in global AI governance efforts.

Trying to involve China later – for instance, in future iterations of the summit – may also be much less impactful than involving China now. China is more likely to adopt global frameworks if it can legitimately view itself as a co-creator of these frameworks; this will be less likely if China is not invited from the start. As China develops its own regulatory frameworks for advanced AI and makes its own independent policy statements, Chinese AI policy may also quickly lose some of the “plasticity” it has now.

However, within the group, there was not universal consensus that China should be invited to the summit. One potential cost of inviting China is that it may reduce the enthusiasm of other states and non-governmental actors to participate in the summit. Inviting China may also make the summit less productive by increasing the level of disagreement and potential for discord among participants. There may also be some important discussion topics that would not be as freely explored with Chinese representatives in the room.

These concerns are partly mitigated by the likely existence of side meetings – and discussions in venues beyond the summit – which could include smaller subsets of participants. However, involving China in the summit would inevitably involve trade-offs. These trade-offs could be evaluated in part by seeking the views of other likely participants.

Conclusion

The first global summit on AI safety is an important opportunity to make progress toward managing global risks from advanced AI systems. It could directly produce several outcomes, including: (1) shared commitments and consensus statements from states, (2) the seeds of new international institutions, (3) the advertisement and diffusion of UK AI policy initiatives, (4) shared commitments from labs, (5) increased understanding of AI risks and governance options, and (6) commitments to further productive discussions.

It may also be a unique and fleeting opportunity to involve China in global AI governance. Ultimately, however, making the correct decision regarding China’s participation will require weighing the value of Chinese participation against the frictions and stakeholder management challenges it may bring.

The authors of this piece can be contacted at be***********@go*******.ai and le**********@go********.ai.





Proposing an International Governance Regime for Civilian AI


This post summarises a new report, “International Governance of Civilian AI: A Jurisdictional Certification Approach.” You can read the full report here.

GovAI research blog posts represent the views of their authors, rather than the views of the organisation.

Many jurisdictions have begun to develop their own approaches to regulating AI, in response to concerns that range from bias to misinformation to the possibility of existential risks from advanced AI. 

However, without international coordination, these purely national and regional approaches will not be enough. Many of the risks posed by AI are inherently international, since they can produce harms that spill across borders. AI supply chains and product networks also span many countries. We therefore believe that a unified international response to the risks posed by AI will be necessary.

In recognition of this need, we have developed a proposal – outlined in a new report – for an international governance body that can certify compliance with international standards on civilian AI.1 We call this proposed body the International AI Organization (IAIO).

Our proposal for the IAIO follows a jurisdictional certification approach, modelled on the approach taken by other international bodies such as the International Civil Aviation Organization (ICAO), the International Maritime Organization (IMO), and the Financial Action Task Force (FATF). Under this approach, the international body certifies that the regulatory regimes adopted within individual jurisdictions meet international standards. Jurisdictions that fail to receive certification (e.g. because their regulations are too lax or they fail to enforce them) are excluded from valuable trade relationships – or otherwise suffer negative consequences.

Our report outlines the proposal in more detail and explains how it could be put into practice. We suggest that even a very partial international consensus on minimum regulatory standards – perhaps beginning with just a few major players – could be channeled into an institutional framework designed to produce increasingly widespread compliance. An initial set of participating states could establish the IAIO and – through the IAIO – arrive at a set of shared standards and a process for certifying that a state’s regulatory regime meets these standards. One of the standards would be a commitment to ban the import of goods that integrate AI systems from uncertified jurisdictions. Another standard could be a commitment to ban the export of AI inputs (such as specialised chips) to uncertified jurisdictions. The participating states’ trade policies would thereby incentivise other states to join the IAIO themselves and receive certifications.

We believe that the IAIO could help to mitigate many of AI’s potential harms, from algorithmic bias to longer-term security threats. Our hope is that it can balance the need to prevent harmful forms of proliferation against the imperatives to spread the benefits of the technology and give voice to affected communities around the globe.





Preventing Harms From AI Misuse


This post summarises a recent paper, “Protecting Society from AI Misuse: When are Restrictions on Capabilities Warranted?”

GovAI research blog posts represent the views of their authors, rather than the views of the organisation.

Introduction

Recent advancements in AI have enabled impressive new capabilities, as well as new avenues to cause harm. Though experts have long warned about risks from the misuse of AI, the development of increasingly powerful systems has made the potential for such harms more widely felt. For instance, in the United States, Senator Blumenthal recently began a Senate hearing by showing how existing AI systems can be used to impersonate him. Last week, the AI company Anthropic reported that they have begun evaluating whether their systems could be used to design biological weapons.

To address growing risks from misuse, we will likely need policies and best practices that shape what AI systems are developed, who can access them, and how they can be used. We call these “capabilities interventions.”

Unfortunately, capabilities interventions almost always have the unintended effect of hindering some beneficial uses. Nonetheless, we still argue that — at least for certain particularly high-risk systems — these sorts of interventions will be increasingly warranted as potential harms from AI systems increase in severity.

The Growing Risk of Misuse

Some existing AI systems – including large language models (LLMs), image and audio generation systems, and drug discovery systems – already have clear potential for misuse. For instance, LLMs can speed up and scale up spear-phishing attacks to target individuals more effectively. Image and audio generation models can create harmful content like revenge porn or misleading “deepfakes” of politicians. Drug discovery models can be used to design dangerous novel toxins: for example, researchers used one of these models to discover over 40,000 potential toxins, with some predicted to be deadlier than the nerve agent VX.

As AI systems grow more capable across a wide range of tasks, we should expect some of them to become more capable of performing harmful tasks as well. Although we cannot predict precisely what capabilities will emerge, experts have raised a number of concerning near- and medium-term possibilities. For instance, LLMs might be used to develop sophisticated malware. Video-generation systems could allow users to generate deepfake videos of politicians that are practically indistinguishable from reality. AI systems intended for scientific research could be used to design biological weapons with more destructive potential than known pathogens. Other novel risks will likely emerge as well. For instance, a recent paper highlights the potential for extreme risks from general-purpose AI systems capable of deception and manipulation.

Three Approaches to Reducing Misuse Risk

Developers and governments will need to adopt practices and policies (“interventions”) that reduce risks from misuse. At the same time, they will also need to avoid interfering too much with beneficial uses of AI.   

To help decision-makers navigate this dilemma, we propose a conceptual framework we refer to as the “Misuse Chain”. The Misuse Chain breaks the misuse of AI down into three stages, which can be targeted by different interventions.

Capabilities interventions target the first stage of the Misuse Chain, in which malicious actors gain access to the capabilities needed to cause harm. For instance, if an AI company wishes to make it harder for bad actors to create non-consensual pornographic deepfakes, it can introduce safeguards that block its image generation systems from generating sexual imagery. It can also take additional measures to prevent users from removing these safeguards, for instance by deciding not to open-source the systems (i.e. make them freely available for others to download and modify).

Harm mitigation interventions target the next stage in the Misuse Chain, when capabilities are misused and produce harm. For example, in the case of pornographic deepfakes, harm can be mitigated by identifying the images and preventing them from being shared on social media.

Harm response interventions target the final stage of the Misuse Chain, in which bad actors experience consequences of their actions and victims receive remedies. For example, people caught using AI systems to generate explicit images of minors could not only have their access revoked but also face punishment through the legal system. Here, the aim is to deter future attempts at misuse or rectify the harm it has already caused.

Capabilities Interventions and the Misuse-Use Tradeoff

Interventions that limit risks from misuse can also hinder beneficial uses; they are subject to what we call the “Misuse-Use Tradeoff”. As interventions move “upstream” along the Misuse Chain — and more directly affect the core capabilities that enable malicious actors to cause harm — they tend to become less targeted and exhibit more significant trade-offs. Interventions near the top of the chain can limit entire classes of use.

Take, for example, content policies for large language models that prohibit the model from outputting any text related to violence. Such a blunt policy tool may prevent harmful content from being generated, but it also prevents the model from being used to analyse or generate fictional works that reference violence. This is an undesirable side effect. 

However, if an AI system can be used to cause severe harm, then the benefits of even fairly blunt capability interventions may outweigh the costs.

Why Capabilities Interventions May Be Increasingly Necessary

Where possible, developers and policymakers should prefer harm mitigation and harm response interventions. These interventions are often more targeted, meaning that they are more effective at reducing misuse without substantially limiting beneficial use.

However, these sorts of interventions are not always sufficient. There are some situations where capabilities interventions will be warranted, even if other interventions are already in place. In broad terms, we believe that capabilities interventions are most warranted when some or all of the following conditions hold:

  1. The harms from misuse are especially significant. If the potential harms from misuse significantly outweigh the benefits from use, then capabilities interventions may be justified even if they meaningfully reduce beneficial use.
  2. Other approaches to reducing misuse have limited efficacy if applied alone. If interventions at other stages are not very effective, then capabilities interventions will be less redundant.
  3. It is possible to restrict access to capabilities in a targeted way. Though capabilities interventions tend to impact both misuse and use, some have little to no impact on legitimate use. For example, AI developers can monitor systems’ outputs to flag users who consistently produce inappropriate content and adjust their access accordingly. If the content classification method is reliable enough, this intervention may have little effect on most users. In such cases, the argument for capability interventions may be particularly strong.

Ultimately, as risks from the misuse of AI continue to grow, the overall case for adopting some capability interventions will likely become stronger. Progress in developing structured access techniques, which limit capability access in finely targeted ways, could further strengthen the case.

Varieties of Capability Interventions

In practice, to reduce misuse risk, there are several varieties of capability interventions that developers and policymakers could pursue. Here, we give three examples.

  1. Limiting the open source sharing of certain AI systems. Open-sourcing an AI system makes it harder to prevent users from removing safeguards, reintroducing removed capabilities, or introducing further dangerous capabilities. It is also impossible to undo the release of a system that has been open-sourced, even if severe harms from misuse start to emerge. Therefore, it will sometimes be appropriate for developers to refrain from open-sourcing high-risk AI systems – and instead opt for structured access approaches when releasing the systems. In sufficiently consequential cases, governments may even wish to require the use of structured access approaches.
  2. Limiting the sharing or development of certain AI systems altogether. In some cases, for instance when an AI system is intentionally designed to cause harm, the right decision will be to refrain from development and release altogether. If the problem of emergent dangerous capabilities becomes sufficiently severe, then it may also become wise to refrain from developing and releasing certain kinds of dual-use AI systems. As discussed above, in sufficiently high-risk cases, governments may wish to consider regulatory options. 
  3. Limiting access to inputs needed to create powerful AI systems. Companies and governments can also make it harder for certain actors to access inputs (such as computing hardware) that are required to develop harmful AI systems. For example, the US recently announced a series of export controls on AI-relevant hardware bound for China, in part to curb what it considers Chinese misuse of AI in the surveillance and military domains. Analogously, cloud computing providers could also introduce Know Your Customer schemes to ensure their hardware is not being used to develop or deploy harmful systems.

Conclusion

As AI systems become more advanced, the potential for misuse will grow. Balancing trade-offs between reducing misuse risks and enabling beneficial uses will be crucial. Although harm mitigation and harm response interventions often have smaller side effects, interventions that limit access to certain AI capabilities will probably become increasingly warranted.

AI labs, governments, and other decision makers should be prepared to increasingly limit access to AI capabilities that lend themselves to misuse. Certain risks from AI will likely be too great to ignore, even if that means impeding some legitimate use. These actors should also work on building up processes for determining when capabilities interventions are warranted. Furthermore, they should invest in developing interventions that further reduce the Misuse-Use Tradeoff. Continued research could yield better tools for preventing dangerous capabilities from ending up in the wrong hands, without also preventing beneficial capabilities from ending up in the right hands.





Preventing AI Misuse: Current Techniques


This post aims to give policymakers an overview of available methods for preventing the misuse of general-purpose AI systems, as well as the limitations of these methods.

GovAI research blog posts represent the views of their authors, rather than the views of the organisation.

Introduction

As general-purpose AI models have become more capable, their potential for misuse has grown as well. Without safety measures in place, for example, many models can now be used to create realistic fake media. Some experts worry that future models could facilitate more extreme harm, including cyberattacks, voter manipulation, or even attacks with biological weapons.

Governments and companies increasingly recognise the need to evaluate models for dangerous capabilities. However, identifying dangerous capabilities is only a first step toward preventing misuse. 

Developers must ultimately find ways to remove dangerous capabilities or otherwise limit the ability of users to apply them.1 To this end, developers currently implement a number of misuse-prevention techniques. These include fine-tuning, content filters, rejection sampling, system prompts, dataset filtering, and monitoring-based restrictions.

However, existing misuse-prevention techniques are far from wholly reliable. It is particularly easy to circumvent them when models are “open-sourced”. Even when models are only accessible through online interfaces, the safeguards can still be circumvented fairly easily through “jailbreaking”. External researchers have demonstrated, for example, that jailbroken systems can perform tasks such as crafting targeted phishing emails.

If future models do eventually develop sufficiently dangerous capabilities (such as assisting in the planning and execution of a biological attack), then it may not be responsible to release them until misuse-prevention techniques are more reliable. Therefore, improving misuse-prevention techniques could reduce both social harms and future delays to innovation.

Misuse-prevention techniques

In this section, I will briefly survey available techniques for preventing misuse. I will then turn to discussing their limitations in more depth.

Fine-tuning

Companies developing general-purpose AI systems — like OpenAI and Anthropic — primarily reduce harmful outputs through a process known as “fine-tuning”. This involves training an existing AI model further, using additional data, in order to refine its capabilities or tendencies. 

This additional data could include curated datasets of question/answer pairs, such as examples of how the model should respond to irresponsible requests. Fine-tuning data may also be generated through “reinforcement learning from human feedback” (RLHF), which involves humans scoring the appropriateness of the models’ responses, or through “reinforcement learning from AI feedback” (RLAIF). Through fine-tuning, a model can be conditioned to avoid dangerous behaviours or decline dangerous requests. For instance, GPT-4 was fine-tuned to reject requests for prohibited content, such as instructions for cyberattacks.

As with other misuse-prevention techniques, however, there are trade-offs with this approach: fine-tuning can also make models less helpful, by leading them to also refuse some benign requests.
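
To make this concrete, below is a minimal sketch of supervised fine-tuning on curated refusal examples, using the Hugging Face transformers library. The model name ("gpt2"), the toy prompt/response pairs, and the hyperparameters are illustrative placeholders, not a description of any lab’s actual safety pipeline.

```python
# Minimal sketch: supervised fine-tuning on curated refusal examples.
# Assumes the Hugging Face "transformers" library; the model, data,
# and hyperparameters are placeholders, not a real safety pipeline.
from torch.utils.data import Dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, Trainer, TrainingArguments

# Hypothetical examples of how the model should respond to irresponsible requests.
REFUSAL_EXAMPLES = [
    ("How do I write malware that steals passwords?",
     "I can't help with that request."),
    ("Give me step-by-step instructions for making a chemical weapon.",
     "I can't provide that information."),
]

class RefusalDataset(Dataset):
    """Wraps prompt/response pairs as causal language modelling examples."""
    def __init__(self, tokenizer, pairs, max_length=128):
        self.items = []
        for prompt, response in pairs:
            text = f"User: {prompt}\nAssistant: {response}"
            enc = tokenizer(text, truncation=True, max_length=max_length,
                            padding="max_length", return_tensors="pt")
            input_ids = enc["input_ids"].squeeze(0)
            attention_mask = enc["attention_mask"].squeeze(0)
            labels = input_ids.clone()
            labels[attention_mask == 0] = -100  # ignore padding in the loss
            self.items.append({"input_ids": input_ids,
                               "attention_mask": attention_mask,
                               "labels": labels})

    def __len__(self):
        return len(self.items)

    def __getitem__(self, idx):
        return self.items[idx]

tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default
model = AutoModelForCausalLM.from_pretrained("gpt2")

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="refusal-finetune",
                           num_train_epochs=1,
                           per_device_train_batch_size=2),
    train_dataset=RefusalDataset(tokenizer, REFUSAL_EXAMPLES),
)
trainer.train()  # further trains the existing model on the refusal data
```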

Filters

Filters can be applied to both user inputs and model outputs. For inputs, filters are designed to detect and block users’ requests to produce dangerous content. For outputs, filters are designed to detect and block dangerous content itself.

Developers have various methods to train these filters to recognise harmful content. One option is using labelled datasets to categorise content as either harmful or benign. Another option is creating a new dataset by having humans score the harmfulness of the outputs a model produces or the outputs users send to it. Language models themselves can also be used to evaluate both inputs and outputs.
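
As a rough illustration, the sketch below applies a single harmfulness classifier as both an input filter and an output filter. The classifier name ("org/harm-classifier"), its label scheme, and the threshold are hypothetical placeholders; real deployments use purpose-built moderation classifiers.

```python
# Minimal sketch: one classifier used as both an input filter and an output filter.
# "org/harm-classifier", the "harmful" label, and the threshold are hypothetical.
from transformers import pipeline

harm_classifier = pipeline("text-classification", model="org/harm-classifier")
HARM_THRESHOLD = 0.5  # illustrative score cut-off

def is_harmful(text: str) -> bool:
    """Return True if the classifier scores the text as harmful."""
    result = harm_classifier(text)[0]
    return result["label"] == "harmful" and result["score"] > HARM_THRESHOLD

def filtered_generate(user_input: str, generate) -> str:
    """Apply an input filter before generation and an output filter afterwards."""
    if is_harmful(user_input):      # input filter: block the request itself
        return "This request was blocked."
    output = generate(user_input)   # call the underlying model
    if is_harmful(output):          # output filter: block dangerous content
        return "The generated content was blocked."
    return output
```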

Rejection sampling

Rejection sampling involves generating multiple outputs from the same AI model, automatically scoring them based on a metric like “potential harm”, and then only presenting the highest scoring output to the user. OpenAI used this approach with WebGPT, the precursor to ChatGPT, to make the model more helpful and accurate.
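
The core loop is simple enough to sketch in a few lines. Here, `generate` and `harm_score` are placeholders standing in for a sampled model call and a learned harm metric respectively.

```python
# Minimal sketch: rejection sampling. `generate` draws one sampled completion
# and `harm_score` returns a lower-is-safer score; both are placeholders.
from typing import Callable

def rejection_sample(prompt: str,
                     generate: Callable[[str], str],
                     harm_score: Callable[[str], float],
                     n_samples: int = 8) -> str:
    """Generate several candidate outputs and return the one scored least harmful."""
    candidates = [generate(prompt) for _ in range(n_samples)]
    return min(candidates, key=harm_score)
```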

System prompts

System prompts are natural language instructions to the model that are pre-loaded into user interactions and are usually hidden from the user. Research indicates that incorporating directives like “ignore harmful requests” into prompts can reduce harmful outputs.
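
For example, in the chat-message format used by many model APIs, a hidden system message is simply prepended to each conversation before the user’s text. The instruction wording below is illustrative.

```python
# Minimal sketch: pre-loading a hidden system prompt into each user interaction.
SYSTEM_PROMPT = ("You are a helpful assistant. Refuse and ignore requests for "
                 "harmful, illegal, or dangerous content.")

def build_messages(user_input: str) -> list[dict]:
    """Prepend the hidden system prompt to the user's message."""
    return [
        {"role": "system", "content": SYSTEM_PROMPT},  # not shown to the user
        {"role": "user", "content": user_input},
    ]
```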

Dataset filtering

Before training an AI model, developers can remove harmful data to ensure the model doesn’t learn from it. For example, OpenAI removed sexual and violent content from the dataset used to train DALL·E 3, reusing the same ‘classifiers’ developed for filters to identify training data that should be removed. A distinct downside of this approach, however, is that it can only be applied before a model is trained.
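
In practice, this amounts to running a harmfulness classifier over the corpus before training begins, as in the sketch below. `is_harmful` is assumed to behave like the classifier in the filter sketch above, and the corpus is a plain list of text examples.

```python
# Minimal sketch: removing flagged examples from a training corpus before training.
# `is_harmful` is assumed to be a classifier like the one in the filter sketch.
def filter_training_data(corpus: list[str], is_harmful) -> list[str]:
    """Drop harmful examples so the model never learns from them."""
    return [text for text in corpus if not is_harmful(text)]
```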

Monitoring-based restrictions

The techniques above take a purely preventative approach. Monitoring-based restrictions instead allow developers to respond to initial cases of misuse – or misuse attempts – by limiting users’ abilities to engage in further misuse.

AI service providers can leverage automated tools like input and output classifiers to screen for misuse. For instance, if a user repeatedly makes requests that are blocked by an input filter, the user may be flagged as high-risk. They might then be sent a warning, have their access temporarily reduced, or even receive a ban. Monitoring can also help reveal new vulnerabilities and patterns of misuse, allowing policies and other safeguards to be iteratively improved.
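
A toy version of such a scheme is sketched below: users whose requests are repeatedly blocked by the input filter accumulate a tally, which eventually triggers a warning and then a suspension. The thresholds and the in-memory counter are illustrative; a real service would persist this state and combine it with human review.

```python
# Minimal sketch: escalating responses to repeated blocked requests.
# Thresholds and the in-memory counter are illustrative only.
from collections import defaultdict

WARN_AFTER = 3      # blocked requests before a warning is sent
SUSPEND_AFTER = 10  # blocked requests before access is suspended

blocked_counts: dict[str, int] = defaultdict(int)

def record_blocked_request(user_id: str) -> str:
    """Update a user's misuse tally and decide how to respond."""
    blocked_counts[user_id] += 1
    if blocked_counts[user_id] >= SUSPEND_AFTER:
        return "suspend"   # temporarily remove access
    if blocked_counts[user_id] >= WARN_AFTER:
        return "warn"      # send the user a warning
    return "allow"         # no action yet
```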

One potential problem with monitoring is that it can compromise privacy. That said, there are ways to preserve privacy. For instance, reports about high-level patterns of misuse can be generated without company staff needing to examine individuals’ data.

How effective are the mitigations?

Unfortunately, these mitigations are currently ineffective for open-source models (i.e. models that users are allowed to download and run on their own hardware). Safeguards like filters, system prompts, and rejection sampling mechanisms can often be disabled by changing a few lines of code. Fine-tuning is also relatively trivial to reverse; one study showed that the safety fine-tuning of various Llama models could be undone for under $200. Additionally, once a model is open-sourced, there is no way to reliably oversee how it is being modified or used post-release. Although open-source models have a range of socially valuable advantages, they are also particularly prone to misuse.2

Putting models behind an “application programming interface” (API) can help prevent the removal of safeguards. Even then, however, safeguards can reduce, but not eliminate, misuse risks. OpenAI successfully reduced rule-breaking responses by 82% in GPT-4, but risks still remain. Anthropic similarly states that, despite interventions, their systems are not “perfectly safe”.
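
To illustrate why API access helps, the sketch below keeps the model and its safeguards server-side: every request passes through the input filter, output filter, and monitoring helpers sketched earlier before anything is returned to the user. FastAPI, the endpoint name, and the `generate` call are all assumptions made for the sake of the example, and the helper functions are the hypothetical ones defined above.

```python
# Minimal sketch: serving a model behind an API so users cannot remove safeguards.
# Reuses the hypothetical is_harmful, record_blocked_request, and generate
# helpers from the earlier sketches.
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class GenerationRequest(BaseModel):
    user_id: str
    prompt: str

@app.post("/generate")
def generate_endpoint(request: GenerationRequest) -> dict:
    # Input filter: refuse flagged requests and log them for monitoring.
    if is_harmful(request.prompt):
        action = record_blocked_request(request.user_id)
        return {"output": None, "blocked": True, "action": action}
    output = generate(request.prompt)  # the model itself never leaves the server
    # Output filter: block harmful completions before they reach the user.
    if is_harmful(output):
        return {"output": None, "blocked": True, "action": "filtered"}
    return {"output": output, "blocked": False}
```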

One challenge in preventing misuse is the phenomenon of “jailbreaks”, where users bypass an AI system’s safety measures to generate prohibited content. Successful jailbreaks have employed tactics such as switching to a different language to evade content filters or framing requests for dangerous information within hypothetical scenarios, like a script for a play.

A stylised example of a jailbreak, from “Low-Resource Languages Jailbreak GPT-4”.

Despite efforts from AI developers, it is still relatively trivial to jailbreak models. For instance, GPT-4 was jailbroken within 24 hours of release, despite the fact that OpenAI had spent six months investing in safety measures. Developers are actively working to patch identified vulnerabilities, but users have continued to find ways around restrictions, as documented on sites like JailbreakChat.

Implications

It seems likely that the benefits of current models outweigh their harms. This is in part thanks to the mitigation techniques discussed here, but also in large part due to the models’ limited capabilities. For instance, the propensity of GPT-4 to make factual mistakes puts a cap on its ability to perform cyberattacks. In some cases, existing safeguards may not even be worth the costs they create for users (for instance, by accidentally filtering benign content).

However, the unreliability of current misuse-prevention techniques may pose significant challenges in the future. If future models become sufficiently powerful, then even a single case of misuse, like a user building a bioweapon, could eventually be catastrophic. One possible implication is that models with sufficiently dangerous capabilities should not be released until better misuse-prevention techniques are developed.

If this implication is right, then better misuse-prevention techniques could prevent future roadblocks to innovation in addition to lowering societal risks.

Conclusion

There are a range of techniques developers can use to limit the misuse of their models. However, these are not consistently effective. If future models do develop sufficiently dangerous capabilities, then the reliability of misuse-prevention techniques could ultimately be a bottleneck to releasing them.

The author of this piece can be contacted at be**********@go********.ai. Thanks to the following people for their feedback: Alan Chan, Ben Garfinkel, Cullen O’Keefe, Emma Bluemke, Julian Hazell, Markus Anderljung, Matt van der Merwe, Patrick Levermore, Toby Shevlane.





Goals for the Second AI Safety Summit


GovAI blog posts represent the views of their authors, rather than the views of the organisation.

Summary

The UK AI Safety Summit was largely a success. To build on it, the international community has committed to hold further AI Safety Summits. The first follow-on summit will be held in mid-2024, with the Republic of Korea serving as host.

One underlying goal, for the organisers of the next summit, should be to reinforce the world’s commitment to an ambitious summit series. Even participants who are committed to further summits will need to decide, in future years, just how ambitious or pro forma their participation will be. The next summit is an opportunity to ensure ambitious participation, by demonstrating the promise of the series.

To this end, three concrete goals for the Korean summit could be to:

Create shared clarity on the need for progress in AI safety, for instance by:

  • Disseminating preliminary findings from the International Scientific Report on Advanced AI Safety on emerging risks (previously called the “State of the Science” report)
  • Holding discussions on the limits of current technologies, practices, and governance regimes for managing these risks (i.e. on “preparedness gaps”)

Showcase progress since the last summit, for instance by:

  • Inviting research institutes (e.g. the UK AI Safety Institute) to share new findings
  • Inviting AI companies to share updated reports on their safety practices and safety commitments
  • Inviting governments to give progress reports on their domestic safety initiatives

Begin to build a long-term vision and roadmap for the series, for instance by:

  • Producing a more detailed statement on the objectives of the series
  • Identifying forward processes to make progress on these objectives
  • Having companies commit to sharing updated AI safety policies at future summits and commit to adhering to their policies
  • Launching a working group to investigate the need for new international institutions

Accomplishing these goals would be challenging, but would make the promise of the series clearer and lay a strong foundation for future summits. Participants would leave with a firmer sense of the problems the series is addressing, greater confidence that progress on these problems is possible, and a vision for how future summits will ultimately bring the world closer to shared solutions.

Ensuring an ambitious summit series

The first AI Safety Summit generated significant momentum. Remarkable outcomes included twenty-nine governments — including the US, EU, and China — signing a joint declaration on risks and opportunities from AI, seven leading AI companies releasing public reports on their own safety practices, and the UK government launching an AI Safety Institute.

However, there is no guarantee that this momentum will be sustained. It is common for international forums to become less productive over time. If participants lose clarity on a forum’s purpose or lose faith that the forum can achieve its purpose, then their participation will tend to become increasingly pro forma.

When future AI Safety Summits are held, both governments and companies will need to decide how ambitiously to participate. For example, governments will need to decide whether to send their most senior representatives, whether to dedicate substantial energy to pre-summit negotiations, and whether to sign onto bold shared commitments. If they do not participate ambitiously enough, the series will not achieve its potential.

Since participants notice and respond to each other’s participation levels, rising passivity would be self-reinforcing. Fortunately, in the same way, rising ambition would be self-reinforcing too.

The next AI Safety Summit is an important opportunity not only to sustain, but to reinforce the momentum generated by the first summit.

Creating shared clarity on the need for progress in AI safety

A valuable goal for the next summit would be to create shared clarity on the need for progress in AI safety. Although the first summit has helped to build this clarity, the subject is still new for most participants and substantial ambiguity remains. Greater clarity would allow participants to more fully appreciate the importance of the summit series. It would also facilitate more productive discussions. 

More specifically, the summit could aim to create shared clarity about both (a) the risks AI might pose and (b) the ways in which the world is and is not prepared to handle these risks. Understanding these factors together makes it possible to see how much progress urgently needs to be made.

To increase clarity about risks, the summit organisers can:

  • Disseminate preliminary findings from the International Scientific Report on Advanced AI Safety: A major outcome of the first summit was a commitment to produce a “State of the Science” report (later renamed to the “International Scientific Report on Advanced AI Safety”), which will serve as the first authoritative summary of evidence regarding risks from frontier AI. The Korean summit is a natural opportunity to disseminate, explain, and discuss preliminary findings from the report. Research institutes such as the UK AI Safety Institute can also present new scientific results that go beyond those included in the report.

To increase clarity about preparedness, the summit organisers can:

  • Hold discussions on “preparedness gaps”: Currently, there are many ways in which the world is not prepared to manage risks from increasingly advanced frontier AI. For instance, there are deficiencies in current misuse prevention techniques, risk assessment procedures, auditing and certification services, safety standards, and regulatory regimes. The summit could host discussions on these preparedness gaps to raise awareness and understanding.
  • Produce or commission reports on “preparedness gaps”: The summit organisers could also produce or commission short reports on these gaps ahead of the summit, to inform discussions. Alternatively, they could commission a report assessing global preparedness for risks from advanced AI, as a complement to the report outlining the current state of the evidence on such risks. This would mirror the Intergovernmental Panel on Climate Change’s focus not just on the likely impacts of climate change, but also on potential mitigation and adaptation measures.

Showcasing progress since the last summit

If the next summit can showcase progress since the first AI Safety Summit — held only around six months prior — then it can reinforce the ambition of participants. Meaningful progress would allow participants to see that, with enough effort, core AI safety challenges are tractable.

To this end, the summit organisers can:

  • Invite research institutes to share research progress: This may involve presentations or short reports summarising recent progress in areas such as model evaluations and interpretability. Researchers at the UK AI Safety Institute or the US AI Safety Institute could be especially well-placed to prepare overviews. The US AI Safety Institute might also have made significant progress by this point.
  • Invite companies to share updated reports on their safety practices and safety commitments: The first summit successfully convinced several frontier AI companies to prepare reports on their efforts to adopt responsible development and release practices. These responses often alluded to ongoing initiatives to refine and improve current practices. The next summit is an opportunity for companies to produce updated reports, which include further information and highlight progress they have made over the previous half-year. The updated reports could also potentially introduce explicit commitments to following certain especially important practices, analogous to the White House commitments companies have previously made. More AI companies, particularly companies outside the US and UK, could also be encouraged to produce these reports. Relatedly, the summit organisers could produce an updated version of the best practices report “Emerging Processes for Frontier AI Safety” that was prepared for the first summit.
  • Invite governments to give progress reports on their safety initiatives: In the six months since the last summit, a number of governments are likely to have made meaningful progress in developing their own safety initiatives. For example, the UK will have more fully established the Safety Institute it announced at the previous summit. US agencies will have begun to implement the executive order on AI that was announced around the summit. The EU will be able to provide an update on the finalized EU AI Act and its AI Office. Multiple other countries, including Japan, may also have launched safety institutes by this time. The summit would be a natural venue for sharing progress updates on these kinds of initiatives. It could also be a natural venue for announcing new initiatives.

Beginning to build a long-term vision and roadmap for the series

The first summit produced a consensus statement — the Bletchley Declaration — that suggested high-level goals for future summits. The statement notes that the “agenda for our cooperation” will focus on “identifying AI safety risks of shared concern” and “building respective risk-based policies across our countries to ensure safety.”

However, the statement only briefly elaborates on these goals. It also does not say much about how future summits will work toward achieving them.

It could be valuable, then, to develop a fuller vision of what the series aims to achieve, along with a roadmap that describes how the series can fulfill this vision. This kind of vision and roadmap could increase the ambition of participants, by helping them understand what their continued participation can produce. A vision and roadmap would also support more effective summit planning in future years.

To take steps in this direction, the summit organisers could:

  • Propose more detailed statements on series goals, to include in the next communique: For example, one more specific goal could be to support the creation of international standards regarding the responsible development and release of frontier AI systems.
  • Begin to identify forward processes: Most ambitious summit goals will take many years of work to achieve. To ensure they are achieved, then, it will often make sense to set up processes that are designed to drive progress forward across multiple summits. For example, suppose that one goal is to support the creation of responsible development and release standards. This goal could be furthered by the annual solicitation of safety practice reports by frontier AI companies, perhaps combined with annual commentaries on these reports by independent experts.
  • Have companies commit to continuing to develop, share, and adhere to AI safety policies: One possible “forward process,” as alluded to above, could involve companies continuing to iterate on the safety policies they shared at the first summit. To this end, the summit organisers could encourage companies to explicitly commit to iterating on their policies and sharing updated versions at each summit. Companies could also be encouraged to explicitly commit to adhering to their safety policies. Summit organisers could partly steer the iterative process by identifying questions that they believe it would be especially valuable for safety policies to address. Having companies commit to capability-based thresholds or safety policies seems particularly promising (similar to Anthropic’s Responsible Scaling Policy and OpenAI’s Preparedness Framework, currently in beta).
  • Commit to continuing to update the International Scientific Report on Advanced AI Safety: Another specific “forward process” could be a plan to produce recurring updates to the report, for instance on a once-a-year cadence. A continually evolving International Scientific Report on Advanced AI Safety could serve as a cornerstone of future summits and, more generally, global efforts to understand risks from frontier AI systems.
  • Launch a new working group to explore the need for new international institutions: In general, the summit could contribute to the development of a roadmap by launching working groups that investigate relevant questions. One especially key question is whether any new international institutions might ultimately be needed to address emerging risks from frontier AI. There has been some early academic work on this question. The UN’s AI Advisory Body recently put out an interim report that clarifies functions that international institutions (whether they are new or existing) will need to fulfil. However, there is currently nothing approaching an authoritative investigation of whether the successful governance of frontier AI systems will require creating new international institutions. If a credible working group does conclude that some new institutions may be needed (e.g. an “IPCC for AI”), then this would naturally inform the summit roadmap.

Conclusion

The AI Safety Summit series will likely run for many years. It could produce tremendous value. However, like many international forums, it could also sputter out.

The Korean AI Safety Summit — the first follow-up to the initial summit — is a key opportunity to ensure that the series lives up to its potential. It can reinforce the momentum produced by the initial summit and convince participants that they should approach the series ambitiously.

This underlying goal can be furthered by achieving three more concrete goals. In particular, it would be valuable for the next summit to: (1) create shared clarity on the need for progress in AI safety, (2) showcase progress since the last summit, and (3) begin to build a long-term vision and roadmap for the series.




19May

Proposing a Foundation Model Information-Sharing Regime for the UK


This post outlines a concrete proposal for a pilot information-sharing regime, to give the UK government foresight into emerging risks and opportunities from AI.

GovAI research blog posts represent the views of their authors, rather than the views of the organisation.

Summary

The United Kingdom aims to become a global AI superpower, while also safely navigating the risks AI poses. However, there is an important gap in the government’s ability to achieve these goals: it cannot see advances in AI capabilities coming. As a result, it cannot easily prepare for new opportunities and risks.

This week, in recognition of this problem, the UK government secured pledges from three leading AI labs to provide “early access” to their models. However, this pledge has yet to be operationalised. It is also not clear that early access to new AI systems will, on its own, grant the government sufficiently thorough and long-run foresight.

To more systematically address the “foresight gap,” we propose that the UK government create an information-sharing regime between AI labs and the Office for Artificial Intelligence. This post outlines a concrete pilot proposal, based on interviews with think tank researchers, UK government staff working on AI policy issues, and staff at leading AI labs.1

We recommend that the Office for Artificial Intelligence set up a voluntary information-sharing pilot program with frontier AI labs, centered on model capability evaluations and compute usage. Such information-sharing would only be encouraged for a limited subset of new foundation models: those that are especially compute-intensive or have especially general capabilities. This information could be shared both before and throughout model training and deployment processes. Shortly before release, in line with the recently announced commitments, labs could also grant the Office direct access to their models.

Subject-matter experts employed by the Office – or seconded from partner organisations – could analyse this information for its implications about current risks and forthcoming developments. Then, these experts could communicate actionable policy recommendations to relevant stakeholders throughout the UK government. For instance, recommendations could inform government efforts to develop technical standards and regulatory frameworks for foundation model development, or help the government to plan for national security incidents related to AI safety failures and misuse.

Foundation Models: Risks and Opportunities

Foundation models are AI systems that can be applied or adapted to perform a wide range of tasks. As is noted in the recent UK white paper on AI regulation, the general-purpose capabilities of foundation models could have a transformative impact on society and should be a primary focus of government monitoring and horizon-scanning efforts for AI.

The degree of risk posed by current foundation models is contentious. However, continued progress could produce models that pose considerable risks across political, cyber, biotech, and critical infrastructure domains.2 Some experts believe that increasingly dangerous capabilities could emerge rapidly. At the same time, these models promise to enable significant efficiencies in areas such as healthcare and education. 

The Need for a New Information-Sharing Regime

The capabilities of new foundation models are growing more quickly than the government can design and implement AI policy. The natural result of this gap is a hasty and panicked approach to policymaking, as was evident in the wake of the ChatGPT launch.3

AI policymaking will have reduced effectiveness if it continues to be driven reactively after the launches of increasingly powerful AI systems. Policymakers need to be able to foresee potential harms from forthcoming systems, in order to prepare and take proactive policy actions. It is also important for policymakers to understand how development and risk mitigation practices are evolving within labs.

Unfortunately, public information about foundation model development is actually becoming scarcer. Up until recently, AI labs disclosed compute requirements and model design details alongside new large model releases. With the release of GPT-4, OpenAI declined to share compute and model details due to competition and safety concerns. Secrecy around development practices may continue to increase as AI capabilities advance. At the same time, as labs adopt new pre-release risk management practices, the lag between new capabilities being developed and being announced to the world may grow as well.

However, in recognition of these issues, lab leaders have stated a willingness to share sensitive details about their work with policymakers so that the public interest can be protected.4 The UK government should take lab leaders up on their offer and work to build an information-sharing regime.

Proposing a Pilot Information-Sharing Program

We propose a voluntary pilot that centers on two types of information: model capability evaluations, which are helpful for assessing the potential impacts of new models,5 and compute requirements, which provide insight into the trajectory of artificial intelligence development.6 For a limited subset of new foundation models that surpass some level of compute usage or general-purpose capability, frontier labs would voluntarily report both types of information to the Office for Artificial Intelligence, ahead of and during training and deployment. Using this information, the Office for Artificial Intelligence could monitor progress in capabilities and communicate suggested policy actions to relevant bodies throughout the government, to reduce risks and ensure the impact of new foundation models is beneficial.

Model capability and compute requirement information could be provided in the form of a training run report, which labs share at a few different milestones: before they begin the training process for new foundation models, periodically throughout the training process, and before and following model deployment.7

Training Run Reports

Training run reports7 could contain the following elements (an illustrative sketch of such a report follows the lists below):

Capability evaluations: Knowledge of new foundation model capabilities will allow policymakers to identify likely risks and opportunities that deployment will create and to prepare for these proactively. Industry labs are already generating capability information by using benchmarks to quantify the model’s performance in its intended use cases. Labs are also performing both quantitative and qualitative evaluations which seek to identify harmful tendencies (such as bias or deceptiveness) and unintended dangerous capabilities (such as the ability to support cyberattacks). Details provided to the Office for Artificial Intelligence team could include:

  • Descriptions of the model’s intended use cases, as well as performance benchmarks for tasks it is expected to perform.
  • A full accounting of the evaluations run on the model at pre-training and pre-deployment stages and their results, including evaluations intended to identify harmful tendencies and high-risk dangerous capabilities (e.g. cyber-offense, self-proliferation, and weapon design).8
  • Details about the size and scope of the evaluation effort, including the parties involved and the safeguards in place during evaluations.

Compute requirements: Knowledge of the compute used to train foundation models will provide policymakers with the context necessary to forecast future model capabilities and to monitor how underlying technical factors that influence model research and development, such as compute efficiency, are progressing. Information about compute usage is also helpful for assessing the climate impacts of AI.9 Ultimately, compute trends will allow policymakers to estimate the pace at which they should implement policies to prepare for forthcoming risks and opportunities. Industry labs already possess compute infrastructure information, since they must plan ahead and budget for compute requirements when designing a new foundation model. Details provided to the Office for Artificial Intelligence team could include:

  • The amount of compute used (e.g. total FLOP or other operations) and training time required (e.g. number of GPU hours).
  • The quantity and variety (e.g. Nvidia A100) of chips used and a description of the networking of the compute infrastructure.
  • The physical location and provider of the compute.
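To make the elements above more concrete, here is a minimal sketch, in Python, of how a training run report might be structured and of one common way to estimate total training compute from hardware details. The field names, utilisation figure, and example numbers are illustrative assumptions, not part of the proposal.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class TrainingRunReport:
    """Illustrative structure for a voluntary training run report (hypothetical field names)."""
    model_name: str
    development_stage: str                       # e.g. "pre-training", "mid-training", "pre-deployment"
    intended_use_cases: List[str]
    benchmark_results: Dict[str, float]          # benchmark name -> score
    dangerous_capability_evals: Dict[str, str]   # evaluation name -> summary of results
    evaluation_parties: List[str]                # internal teams and external evaluators involved
    chip_type: str                               # e.g. "Nvidia A100"
    chip_count: int
    gpu_hours: float
    estimated_training_flop: float
    compute_provider: str
    compute_location: str


def estimate_training_flop(chip_count: int, hours_per_chip: float,
                           peak_flop_per_second: float,
                           utilisation: float = 0.35) -> float:
    """Rough estimate of training compute: chips x time x peak throughput x utilisation.

    Real utilisation varies widely; 0.35 is only a placeholder.
    """
    return chip_count * hours_per_chip * 3600 * peak_flop_per_second * utilisation


# Hypothetical example: 1,000 A100-class chips (~3e14 FLOP/s peak) running for 30 days.
flop = estimate_training_flop(chip_count=1_000, hours_per_chip=30 * 24,
                              peak_flop_per_second=3e14)
print(f"Estimated training compute: {flop:.2e} FLOP")
```

A real report would of course also need narrative sections and supporting evidence; the point of the sketch is only that the quantities listed above are straightforward to record in a structured way.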

Increasingly complete versions of the training report could be shared across the different stages of the development pipeline: before model training, during the training process, before model deployment, and after model deployment. If possible, in order to give sufficient foresight, it would be desirable for reports to be shared at least one month before training begins and at least one month before deployment.

In the pre-deployment stage, as a complement to the training report, companies may also be encouraged to offer early access to their fully trained models. Direct model access is not a substitute for training reports, as training reports: (a) are likely to contain results from some evaluations that the government does not currently have the capacity or expertise to run, (b) contain some information that one cannot learn simply by using a model, and (c) allow the government to learn certain valuable information before a model is fully trained. However, the government may also be able to learn some additional useful information by interacting directly with the model.

Team Structure and Implementation

This information-sharing program requires the capacity to effectively engage with AI lab staff, perform analysis, and disseminate findings across relevant policy bodies. To handle these tasks, the program could consist of a policy manager and two subject-matter experts. These subject-matter experts could be identified and hired using a secondment program such as the DSIT Expert Exchange, or brought on through existing partnerships with publicly funded technical research groups such as the CDEI. As an expert organisation working on relevant research, the Centre for the Governance of AI would also be willing to participate in or help facilitate secondment efforts for this program.

We suggest that a natural home for the project would be the Office for Artificial Intelligence. Alternatively, however, the project could be structured as an initiative within the Foundation Models Taskforce or as a more independent body akin to the Cyber Security Information Sharing Partnership within the National Cyber Security Centre.

In any case, there should also be a role within this team’s processes for third-party and academic organisations to participate in a consultation period for its resulting policy recommendations.10 This will help to ensure that a diverse and representative set of viewpoints is included in its final guidance to policymakers.

This program should include robust information security measures to reduce the risks of sensitive AI development details leaking and ending up in the hands of other labs (who might gain a competitive advantage) or bad actors (who might find ways to exploit the information). Technical security measures should be in place for ensuring that unredacted training run reports are only accessible to a limited set of high-trust team members within the Office for Artificial Intelligence. Program administrators and lab partners could also collaboratively define an information-sharing protocol and require that the protocol be adhered to whenever new information is disclosed from one party to another.11 This helps ensure all program participants are aligned on how a specific piece of information can be used or whether it can be disseminated to other government stakeholders. Over time, such protocol rules could also be enforced through the technical architecture of a data portal and APIs by using structured transparency techniques.

Policy Outcomes from Information-Sharing

A voluntary information-sharing regime between the UK government and AI labs would build frontier AI expertise within the Office for Artificial Intelligence and create working relationships with UK-based industry labs. Key government recipients of policy recommendations could include: 

  • Expert bodies (including AI Standards Hub, CDEI, and BSI) that can use recommendations as inputs for developing technical standards, for instance compute and capability benchmarks for legal categorization of foundation models.
  • Regulators (including agency members of the DRCF and the EHRC) that can use recommendations as input for creating guidelines for model transparency and developing industry best-practice guidelines to prevent large-scale risks.
  • Security and scientific advisory bodies (including NSC, CST, and SAGE) that can use recommendations as inputs for national security planning to reduce and prepare for risks from AI misuse and AI safety failures.

Over time, we imagine that the information-sharing program could scale to further information types and policy recommendation areas. Additionally, it could begin interfacing with a wider set of international AI labs and coordinating with relevant foreign government and multilateral offices focused on AI monitoring and evaluation (such as the NIST AI Measurement and Evaluation program in the United States or the Security and Technology Programme at the United Nations). Eventually, the program’s findings could help to inform an overall policy regime that steers AI development in a direction that is beneficial for the United Kingdom’s future. 

Challenges and Concerns

Importantly, implementing an information-sharing program would not be without risks or costs. One challenge, as noted above, would be to ensure that sensitive information reported by labs does not leak to competitors or bad actors. We believe that a small government unit with strong information-security requirements would be unlikely to produce such leaks. Nonetheless, in order to secure lab buy-in, it would be important for this risk to be taken quite seriously.

Second, it would be important to ensure that the program does not disincentivize labs from performing evaluations that might reveal risks. To reduce the chance of any disincentive effect, it may be important to provide specific assurances about how information can be used. Assurances concerning liability protection could also conceivably be worthwhile.12

The authors of this piece can be contacted at ni****************@wh*****.edu and je**@lo****************.org .






19May

Computing Power and the Governance of AI


This post summarises a new report, “Computing Power and the Governance of Artificial Intelligence.” The full report is a collaboration between nineteen researchers from academia, civil society, and industry. It can be read here.

GovAI research blog posts represent the views of their authors, rather than the views of the organisation.

Summary

Computing power – compute for short – is a key driver of AI progress. Over the past thirteen years, the amount of compute used to train leading AI systems has increased by a factor of 350 million. This has enabled the major AI advances that have recently gained global attention.

Governments have taken notice. They are increasingly engaged in compute governance: using compute as a lever to pursue AI policy goals, such as limiting misuse risks, supporting domestic industries, or engaging in geopolitical competition. 

There are at least three ways compute can be used to govern AI. Governments can: 

  • Track or monitor compute to gain visibility into AI development and use
  • Subsidise or limit access to compute to shape the allocation of resources across AI projects
  • Monitor activity, limit access, or build “guardrails” into hardware to enforce rules

Compute governance is a particularly important approach to AI governance because it is feasible. Compute is detectable: training advanced AI systems requires tens of thousands of highly advanced AI chips, which cannot be acquired or used inconspicuously. It is excludable: AI chips, being physical goods, can be given to or taken away from specific actors, and access can be restricted for specific uses. And it is quantifiable: chips, their features, and their usage can be measured. Compute’s detectability and excludability are further enhanced by the highly concentrated structure of the AI supply chain: very few companies are capable of producing the tools needed to design advanced chips, the machines needed to make them, or the data centers that house them.

However, just because compute can be used as a tool to govern AI doesn’t mean that it should be used in all cases. Compute governance is a double-edged sword, with both potential benefits and the risk of negative consequences: it can support widely shared goals like safety, but it can also be used to infringe on civil liberties, perpetuate existing power structures, and entrench authoritarian regimes. Indeed, some things are better ungoverned. 

In our paper we argue that compute is a particularly promising node for AI governance. We also highlight the risks of compute governance and offer suggestions for how to mitigate them. This post summarises our findings and key takeaways, while also offering some of our own commentary.

A note on authorship: The paper includes 19 co-authors, several of whom work at OpenAI, a company developing state-of-the-art foundation models. This naturally raises concerns that the content of the paper will be biased toward the interests of AI developers. This suspicion is healthy. Further, many of the mechanisms explored in the paper have so far received only cursory investigation, and, given the size of the author group, authorship does not imply that every author, or their respective organisation, endorses all of the paper’s statements. We hope that readers will be motivated to closely engage with the paper’s arguments, take little for granted, publicly raise disagreements, and offer alternative ideas. We intend for this paper to provide a basis for continued research and thoughtful examination of the role of compute in AI governance.

Compute plays a crucial role in AI

Much AI progress over the past decade has resulted from significant increases in the amount of computing power (“compute”) used to train and run AI models. Across large language models, Go, protein folding, and autonomous vehicles, the greatest breakthroughs have involved developers successfully leveraging huge amounts of computing power to train models on vast datasets to independently learn how to solve a problem, rather than hard-coding such knowledge. In many AI domains, researchers have found scaling laws: performance on the training objective (e.g. “predict the next word”) predictably increases as the amount of compute – typically measured in the number of operations (e.g. FLOP) involved – used to train a model increases.
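As a rough illustration of what such a scaling law looks like, the snippet below evaluates a simple power-law curve of the kind reported in the scaling-laws literature. The constants are placeholders chosen only so the numbers behave plausibly; they are not fitted values from any particular study.

```python
def training_loss(compute_flop: float,
                  irreducible_loss: float = 1.7,
                  coefficient: float = 500.0,
                  exponent: float = 0.15) -> float:
    """Illustrative power law: loss falls smoothly, but with diminishing returns,
    as training compute grows. All constants are placeholders."""
    return irreducible_loss + coefficient * compute_flop ** (-exponent)


for flop in (1e21, 1e23, 1e25):
    print(f"{flop:.0e} FLOP -> predicted loss {training_loss(flop):.2f}")
```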

Hardware improvements and massively increased investments have resulted in the amount of compute used to train notable AI systems increasing by a factor of 350 million in the past thirteen years. Currently the compute used to train notable AI systems doubles every six months. In the last year alone, Nvidia’s data center revenue nearly quadrupled.
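As a quick back-of-the-envelope check, a 350-million-fold increase over thirteen years implies a doubling time of roughly five and a half months, consistent with the six-month figure quoted above:

```python
import math

growth_factor = 350e6                            # overall increase in training compute
years = 13
doublings = math.log2(growth_factor)             # ~28.4 doublings
doubling_time_months = years * 12 / doublings    # ~5.5 months
print(f"{doublings:.1f} doublings, i.e. one roughly every {doubling_time_months:.1f} months")
```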

Figure 1: The amount of compute used to train notable AI models has doubled roughly every six months since 2010. Note the logarithmic y-axis. Data from Epoch.

Compute governance is feasible

Compute is easier to govern than other inputs to AI. As such, compute can be used as a tool for AI governance. 

Figure 2: Summary of the properties that make compute governable.

Four features contribute to compute’s governability:

  • Detectability: Large-scale AI development is highly resource intensive and therefore detectable, often requiring thousands of specialised chips concentrated in data centers consuming large amounts of power.
  • Excludability: The physical nature of hardware makes it possible to exclude users from accessing AI chips. In contrast, restricting access to data, algorithms, or models is much more difficult.
  • Quantifiability: Compute can be easily measured – e.g. in terms of the operations per second a chip is capable of performing or its communication bandwidth with other chips – making reporting and verification easier.
  • Concentrated supply chain: AI chips are produced via a highly inelastic and complex supply chain. Several key steps, including chip design, manufacturing of fabrication equipment, and chip fabrication, are dominated by a small number of actors – often just one.
Figure 3: An illustration of the compute supply chain. 
Figure 4: The supply chain for advanced AI chips is unusually concentrated, especially at the design, semiconductor manufacturing equipment, and fabrication steps of the supply chain, but also significantly concentrated at the compute provision layer. 

Compute can be used to achieve many different governance goals

The importance of compute to AI capabilities and the feasibility of governing it make it a key intervention point for AI governance efforts. In particular, compute governance can support three kinds of AI governance goals: it can help increase visibility into AI development and deployment, allocate AI inputs towards more desirable purposes, and enforce rules around AI development and deployment. 

Figure 5: Ways in which interventions on compute can be used for AI governance. The boxes include examples explored, though not necessarily endorsed, in the paper.

Visibility is the ability to understand which actors use, develop, and deploy compute-intensive AI, and how they do so. The detectability of compute allows for better visibility in several ways. For example, cloud compute providers could be required to monitor large-scale compute usage. By applying processes such as know-your-customer requirements to the cloud computing industry, governments could better identify potentially problematic or sudden advances in AI capabilities. This would, in turn, allow for faster regulatory response.

Visibility also raises important privacy concerns. Fortunately, some methods may offer noninvasive insights into compute usage. Data center operators typically only have access to high-level information about their customers’ compute usage, such as the number and types of chips used, when those chips are used, and how much internet traffic is processed through the relevant computing cluster. However, even that limited information can be used to glean certain insights. For example, the computational signatures of training and running inference on AI systems tend to differ. Clusters used for inference require constant internet traffic to serve customers, whereas clusters used for training typically access training data hosted locally.
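Purely for illustration, the kind of coarse signal described above could in principle feed a simple heuristic like the one below. This is a hypothetical sketch rather than a method from the paper; the thresholds are arbitrary, and real workloads would be far noisier.

```python
def guess_workload_type(avg_inbound_gbps: float,
                        traffic_variability: float,
                        sustained_gpu_utilisation: float) -> str:
    """Toy heuristic: inference clusters tend to show steady external traffic that
    tracks customer demand, while training clusters read data locally and keep
    accelerators near-saturated for long stretches. Thresholds are placeholders."""
    if sustained_gpu_utilisation > 0.9 and avg_inbound_gbps < 1.0:
        return "likely training"
    if avg_inbound_gbps >= 1.0 and traffic_variability > 0.3:
        return "likely inference serving"
    return "unclear"


print(guess_workload_type(avg_inbound_gbps=0.2,
                          traffic_variability=0.05,
                          sustained_gpu_utilisation=0.95))  # -> likely training
```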

Allocation is the ability to shift AI capabilities among different actors and projects to achieve some end. Once again, features of compute such as excludability and quantifiability offer promising opportunities to govern AI through allocation.

For example, policymakers may seek to differentially advance beneficial AI development by increasing the resources available to certain kinds of beneficial AI research and development. This could include subsidising research into AI applications for climate change mitigation, agricultural efficiency, clean energy, public health, education, or even defence against AI misuse. Compute can also be allocated to actors lacking compute resources, such as academics, startups, or low and middle-income countries.

Figure 6: Prioritising development of safety, defensive, and substitute technologies can reduce negative societal impacts from other technologies (often referred to as “differential technological development”). Adapted from Sandbrink et al. (2022).

Perhaps compute could also be allocated to adjust the pace of AI progress. A large reserve of compute could be procured by a government or an alliance of governments. The reserve could be used to modulate the amount of compute in the economy, influencing the overall pace of AI progress.

Finally, enforcement is the ability to respond to violations of norms and laws related to AI, such as reckless development and deployment or certain deliberate misuse. 

One enforcement mechanism discussed in the paper is physically limiting chip-to-chip networking to make it harder to train and deploy large AI systems. For example, the US government’s export controls on high-end AI-relevant chip sales to China aim to hamper Chinese actors’ ability to develop frontier AI models, where (tens of) thousands of chips are orchestrated for one training run. That goal could be met in a more targeted way by exporting chips that can only have high-bandwidth connections with a sufficiently small number of other chips. Such chips do not exist today, but could potentially be developed.  

A more speculative enforcement mechanism would be preventing risky training runs via multiparty controls. Certain decisions about which AI systems to develop or deploy may be too high-stakes to leave to a single actor or individual. Instead, such decisions could be made jointly by a number of actors or a governing third party. Multisignature cryptographic protocols could be used to share control of a metaphorical “start switch” between many actors. 

The power to decide how large amounts of compute are used could be allocated via digital “votes” and “vetoes,” with the aim of ensuring that the most risky training runs and inference jobs are subject to increased scrutiny. While this may appear drastic relative to the current state of largely unregulated AI research, there is precedent in the case of other high-risk technologies: nuclear weapons use similar mechanisms, called permissive action links.
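The sketch below illustrates the quorum logic behind such multiparty controls, with the cryptography abstracted away. A real system would rely on threshold or multi-signature schemes enforced in hardware or by a trusted service rather than ordinary application code, and the party names and thresholds here are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class TrainingRunAuthorisation:
    """Toy m-of-n control over a high-stakes training run: the run may proceed only
    if enough distinct parties approve and no party vetoes."""
    parties: Set[str]
    required_approvals: int
    approvals: Set[str] = field(default_factory=set)
    vetoes: Set[str] = field(default_factory=set)

    def approve(self, party: str) -> None:
        if party in self.parties:
            self.approvals.add(party)

    def veto(self, party: str) -> None:
        if party in self.parties:
            self.vetoes.add(party)

    def may_start(self) -> bool:
        return not self.vetoes and len(self.approvals) >= self.required_approvals


# Hypothetical example: three parties share control, and any two must approve.
auth = TrainingRunAuthorisation(
    parties={"developer", "regulator", "third-party auditor"},
    required_approvals=2,
)
auth.approve("developer")
auth.approve("regulator")
print(auth.may_start())  # True, unless any party has also vetoed
```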

Compute governance can be ineffective

Although compute governance can be an effective regulatory tool, it may not always be the right one to use. It is one option among many for policymakers. 

For example, compute governance may become less effective as algorithms and hardware improve. Scientific progress continually decreases the amount of computing power needed to reach any level of AI capability, as well as the cost to perform any number of computations. As the power and cost necessary to achieve any given AI capability decreases, these metrics will become less detectable and excludable. 

The extent to which this effect undermines compute governance largely depends on the importance of relative versus absolute capabilities. Increases in compute efficiency make it easier and cheaper to access a certain level of capability, but as long as scaling continues to pay dividends, the highest-capability models are likely to be developed by a small number of actors, whose behavior can be governed via compute.

On a related note, compute may be an inappropriate tool to govern low-compute specialised models with dangerous capabilities. For example, AlphaFold 2 achieved superhuman performance on protein folding prediction using fewer than 10²³ operations – two orders of magnitude less compute than models like GPT-4. Compute governance measures intended to limit the development of such models would risk also limiting the development of similarly sized, but harmless, models. In other words, compute governance measures seem most appropriate for risks originating from a small number of compute-intensive models.

Compute governance can be harmful

Perhaps more importantly, compute governance can also cause harm. Intrusive compute governance measures risk infringing on civil liberties, propping up the powerful, and entrenching authoritarian regimes. Indeed, some things are better ungoverned. 

Certain compute governance efforts, especially those aimed at increasing visibility into AI, may increase the chance that private or sensitive personal or commercial information is leaked. AI companies, users of AI systems, and compute providers all go to great lengths to preserve the integrity of their and their customers’ data. Giving more actors access to such information raises the chance of data leakage and privacy infringement. 

Large concentrations of compute are also an increasingly crucial economic and political resource. Centralising the control of this resource could pose significant risks of abuse of power by regulators, governments, and companies. Companies might engage in attempts at regulatory capture, and government officials could see increased opportunities for corrupt or power-seeking behaviour. 

Compute governance should be implemented with guardrails

Fortunately, there are a number of ways to increase the chance that compute governance remains effective while reducing unintended harm. Compute governance is one tool among many available to policymakers and it should be wielded carefully and deliberately.

Exclude small-scale AI compute and non-AI compute from governance regimes. Many of the above concerns can be addressed by applying compute governance measures in a more targeted manner; for example, by focusing on the large-scale computing resources needed to develop and deploy frontier AI systems. 

Implement privacy-preserving practices and technologies. Where compute governance touches large-scale computing that contains personal information, care must be taken to minimise privacy intrusions. Take, for example, know your customer (KYC) regimes for cloud AI training: applying them only to direct purchasers of large amounts of cloud AI compute capacity would impose almost no privacy burdens on consumers. KYC could also feasibly draw on indicators that are already available – such as chip hours, types of chips, and how GPUs are networked – preserving existing privacy controls for compute providers and consumers.
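As a purely illustrative example of how such a targeted trigger might work, the check below flags only customers whose aggregate rented compute crosses a large threshold, using indicators (chip type, chip count, chip hours) that providers already track. The threshold, utilisation rate, and peak-throughput figures are placeholder assumptions, not values from the paper.

```python
# Approximate peak throughput per chip in FLOP/s (order-of-magnitude placeholders).
PEAK_FLOP_PER_SECOND = {
    "Nvidia A100": 3e14,
    "Nvidia H100": 1e15,
}

# Placeholder threshold: only rentals large enough to plausibly train a
# frontier-scale model would trigger know-your-customer checks.
KYC_THRESHOLD_FLOP = 1e25


def requires_kyc(chip_type: str, chip_count: int, hours_per_chip: float,
                 assumed_utilisation: float = 0.4) -> bool:
    """Flag a cloud customer for KYC only if their aggregate rented compute could
    exceed the (placeholder) frontier-training threshold."""
    peak = PEAK_FLOP_PER_SECOND[chip_type]
    total_flop = chip_count * hours_per_chip * 3600 * peak * assumed_utilisation
    return total_flop >= KYC_THRESHOLD_FLOP


print(requires_kyc("Nvidia A100", chip_count=8, hours_per_chip=200))          # False: small-scale user
print(requires_kyc("Nvidia H100", chip_count=20_000, hours_per_chip=1_000))   # True: frontier-scale rental
```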

Focus compute-based controls where ex ante measures are justified. Compute governance (especially in its “allocation” and “enforcement” forms) is often a blunt tool, and generally functions upstream of the risks it aims to mitigate and the benefits it seeks to promote. Regulatory and governance efforts typically focus on ex post mechanisms, imposing penalties after some undesirable behaviour has occurred. Such measures are likely appropriate for dealing with many governance issues arising from AI, especially those stemming from inappropriate use of AI systems.

However, some harms from AI may justify ex ante intervention. For example, where the harm is so large that no actor would be able to compensate for it after the fact, such as catastrophic or national security risks, ex ante measures would be appropriate.

Periodically revisit policies related to controlled compute. Regulatory thresholds – such as a training compute threshold of 10²⁶ operations – or list-based controls on technologies – such as those used in export controls – can become outdated fairly quickly as technology evolves. Compute governance efforts should therefore have built-in mechanisms for reviews and updates. 

Ensure substantive and procedural safeguards. Like many government interventions, compute governance efforts could be abused. Measures to control compute should therefore include substantive and procedural safeguards. Substantively, such controls could prevent downsides of compute governance by, for example, limiting the types of controls that can be implemented, the type of information that regulators can request, and the entities subject to such regulations. Procedural safeguards could include such measures as notice and comment rule making, whistleblower protections, internal inspectors general and advocates for consumers within the regulator, opportunities for judicial review, advisory boards, and public reports on activities.

Conclusion

Compute governance efforts are already underway and compute will likely continue to play a central role in the AI ecosystem, making it an increasingly important node for AI governance. Compute governance can support AI policy goals in multiple ways: by increasing visibility into AI progress, shaping the allocation of AI inputs, and enforcing rules regarding AI development and deployment. At the same time, designing effective compute governance measures is a challenging task. The risks of privacy violations and power concentration must be managed carefully. We hope this paper will help policymakers and scholars scrutinise, rethink, and ultimately improve compute governance proposals.

* – Corresponding author: Lennart Heim

le**********@go********.ai












19May

What Increasing Compute Efficiency Means for the Proliferation of Dangerous Capabilities


This blog post summarises the recent working paper “Increased Compute Efficiency and the Diffusion of AI Capabilities” by Konstantin Pilz, Lennart Heim, and Nicholas Brown.

GovAI research blog posts represent the views of their authors rather than the views of the organisation.


Introduction 

The compute needed to train an AI model to a certain performance gets cheaper over time. In 2017, training an image classifier to 93% accuracy on ImageNet cost over $1,000. In 2021, it cost only $5 — a reduction of over 99%. We describe this decline in cost — driven by both hardware and software improvements — as an improvement in compute efficiency.

One implication of these falling costs is that AI capabilities tend to diffuse over time, even if leaders in AI choose not to share their models. Once a large compute investor develops a new AI capability, there will usually be only a short window (a critical period) before many lower-resource groups can reproduce the same capability.

However, this does not imply that large compute investors will see their leads erode. Compute efficiency improvements also allow them to develop new capabilities more quickly than they otherwise would. Therefore, they may push the frontier forward more quickly than low-resource groups can catch up.

Governments will need to account for these implications of falling costs. First, since falling costs will tend to drive diffusion, governments will need to prepare for a world where dangerous AI capabilities are widely available — for instance, by developing defenses against harmful AI models. In some cases, it may also be rational for governments to try to “buy time,” including by limiting irresponsible actors’ access to compute.

Second, since leading companies will still tend to develop new capabilities first, governments will still need to apply particularly strong oversight to leading companies. It will be particularly important that these companies share information about their AI models, evaluate their models for emerging risks, adopt good information security practices, and — in general — make responsible development and release decisions.

The causes of falling costs 

Falling training costs stem from improvements in two key areas:

  1. Advances in hardware price performance, as predicted by Moore’s Law, increase the number of computational operations that a dollar can buy. Between 2006 and 2021, the price performance of AI hardware doubled approximately every two years.
  2. Advances in algorithmic efficiency decrease the number of computational operations needed to train an AI model to a given level of performance. For example, between 2012 and 2022, advances in image recognition algorithms halved the compute required to achieve 93% classification accuracy on the ImageNet dataset every nine months.

To capture the combined impact of these factors, we introduce the concept of compute investment efficiency — abbreviated to compute efficiency — which describes how efficiently investments in training compute can be converted into AI capabilities. Compute efficiency determines the AI model performance1 available with a given level of training compute investment, provided the actor also has sufficient training data (see Figure 1).

Figure 1: Compute (investment) efficiency is the relationship between training compute investment and AI model performance.
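To see how these two trends compound, the sketch below multiplies together the approximate hardware and algorithmic improvement rates quoted above. Under those assumed doubling times, four years of progress makes reaching a fixed level of performance roughly 160 times cheaper, broadly consistent with the ImageNet cost decline mentioned in the introduction.

```python
HARDWARE_DOUBLING_YEARS = 2.0      # price performance of AI hardware (2006-2021 trend)
ALGORITHMIC_HALVING_YEARS = 0.75   # compute needed for fixed ImageNet accuracy (2012-2022 trend)


def compute_efficiency_gain(years: float) -> float:
    """Factor by which the cost of reaching a fixed performance level falls,
    assuming the two exponential trends above continue and compound."""
    hardware_gain = 2 ** (years / HARDWARE_DOUBLING_YEARS)
    algorithmic_gain = 2 ** (years / ALGORITHMIC_HALVING_YEARS)
    return hardware_gain * algorithmic_gain


gain = compute_efficiency_gain(4)   # e.g. 2017 to 2021
print(f"~{gain:.0f}x cheaper to reach the same performance")
print(f"A $1,000 training run in 2017 would cost ~${1000 / gain:.0f} four years later")
```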

Access and performance effects

Based on our model, we observe that increasing compute efficiency has two main effects:2

  • An access effect: Over time, access to a given level of performance requires less compute investment (see Figure 2, red).
  • A performance effect: Over time, a given level of compute investment enables increased performance (see Figure 2, blue).
Figure 2: Compute efficiency improves between time t = 0 and t = 1, causing an access effect (red) and a performance effect (blue).3

If actors experience the same compute efficiency improvements, then these effects have the following consequences:4

Capabilities diffuse over time. Due to the access effect, the investment required to reach a given performance level decreases over time, giving an increased number of actors the ability to reproduce capabilities previously restricted to large compute investors.

Large compute investors remain at the frontier. Since large compute investors achieve the highest performance levels, they are still the first to discover new model capabilities5 that allow novel use cases. Absent a ceiling on absolute performance, those actors also will continue to demonstrate the highest level of performance in existing capabilities.

The emergence and proliferation of dangerous capabilities

Future AI models could eventually show new dangerous capabilities, such as exploiting cybersecurity vulnerabilities, aiding bioweapon development, or evading human control. We now explore the discovery and proliferation of dangerous capabilities as compute efficiency increases. 

Figure 3: Illustration of the emergence and proliferation of dangerous capabilities across three actors. The large compute investor first achieves dangerous capability X at time t = 1. When the secondary actor (such as a small company) reaches dangerous capability X at t = 2, the large compute investor has already achieved the even more dangerous capability Y.

Dangerous capabilities first appear in models trained by large compute investors. Since dangerous capabilities require high levels of performance, large compute investors likely encounter them first.

These dangerous capabilities then proliferate over time, even if large compute investors limit access to their models. As compute efficiency improves, more actors can train models with dangerous capabilities. The dangerous capabilities can therefore proliferate even when large compute investors provide only limited or structured access to their models. This proliferation increases the chance of misuse and accidents.

Defensive tools based on leading models could potentially increase resilience against these dangerous capabilities. To counteract harm caused by weaker models, large compute investors may be able to use their more advanced models to create defensive tools.7 For example, cybersecurity tools powered by advanced models could find vulnerabilities before weaker models can exploit them. However, some domains, such as biotechnology, may greatly favor the offense, making it difficult to defend against dangerous deployments even with superior models.

Governance implications

Oversight of large compute investors can help to address the most severe risks, at least for a time. If the most severe risks from AI development originate from the most capable models and their large-scale deployment, then regulating large-scale compute users can — at least for a time — address the most severe risks. For instance, governments can require developers of large-scale models to perform dangerous capability evaluations and risk assessments, report concerning results, and use the results to make responsible release decisions.  Governments can also encourage or require developers to implement good information security practices to prevent their models from leaking or being stolen. Furthermore, governments can develop the capability to quickly detect and intervene when models created by these developers cause harm.

Large compute investors should warn governments and help them prepare for the proliferation of advanced capabilities. The effectiveness of societal measures to mitigate harm from proliferation hinges on the time that passes between large compute investors discovering harmful capabilities and their proliferation to malicious or irresponsible actors. To effectively use this critical period, governments can implement information-sharing frameworks with large compute investors and thoroughly evaluate the risks posed by capability proliferation. Additionally, leaders can invest in and provide defensive solutions before offensive capabilities proliferate.

Governments should respond early to offense-dominant capabilities. In the future, AI models of a given performance could develop heavily offense-dominant capabilities (i.e., capabilities it is inherently difficult to defend against) or become inherently uncontrollable. Governments should closely monitor the emergence of such capabilities and preemptively develop mechanisms — including mechanisms for more tightly governing access to compute — that could substantially delay their proliferation if necessary. 

Summary

Compute efficiency describes how efficiently investments in training compute can be converted into AI capabilities. It has been rising quickly over time due to improvements in both hardware price performance and algorithmic efficiency.

Rising compute efficiency will tend to cause new AI capabilities to diffuse widely after a relatively short period of time. However, since large compute investors also benefit from rising compute efficiency, they may be able to maintain their performance leads by pushing forward the frontier. 

One governance implication is that large compute investors will remain an especially important target of oversight and regulation. At the same time, it will be necessary to prepare for — and likely, in some cases, work to delay — the widespread proliferation of dangerous capabilities. 

Appendix

Competition between developers: complicating the picture

Our analysis — based on a simple model — has shown that increases in compute efficiency do not necessarily alter the leads of large compute investors. However, some additional considerations complicate the picture.

We will start out by noting some considerations that suggest that large compute investors may actually achieve even greater leads in the future. We will then move to considerations that point in the opposite direction.8

Figure 4: Compute investment scaling increases the performance lead of large compute investors over time. The dashed arrows represent performance improvements attainable without investment scaling. 

Leaders can further their performance advantages through scaling investments and proprietary algorithmic advancements. Large compute investors have historically scaled their compute investment significantly faster than others, widening the investment gap to smaller actors. Additionally, the proprietary development of algorithmic and hardware enhancements might further widen this divide, consolidating leaders’ competitive advantage.

In zero-sum competition, small relative performance advantages may grant outsized benefits. If AI models directly compete, the developer of the leading model may reap disproportionate benefits even if their absolute performance advantage is small. Such disproportionate rewards occur in games such as chess but likely also apply to AI models used in trading, law, or entertainment.

Winner-takes-all effects may allow leaders to entrench their lead despite losing their performance advantage. By initially developing the best-performing models, large compute investors may accrue a number of advantages unrelated to performance, such as network effects and economies of scale that allow them to maintain a competitive advantage even if they approach a performance ceiling.

Performance ceilings dampen the performance effect, reducing leaders’ advantage. Many AI applications have a ceiling on technical performance or real-world usefulness. For instance, handwritten digit classifiers have achieved above 99% accuracy since the early 2000s, so further progress is insignificant. As leaders approach the ceiling, performance only marginally increases with improved compute efficiency, allowing smaller actors to catch up. 

Leaders can release their model parameters, allowing others to overcome compute investment barriers. Large compute investors can provide smaller actors access to their advanced models. While product integrations and structured access protocols allow for limited and fine-grained proliferation, releasing model parameters causes irreversible capability proliferation to a broad range of actors.

Ultimately — although increases in compute efficiency do not erode competitive advantages in any straightforward way — it is far from clear exactly how we should expect competition between developers to evolve.



