
Research Fellows | GovAI Blog


GovAI was founded to help humanity navigate the transition to a world with advanced AI. Our first research agenda, published in 2018, helped define and shape the nascent field of AI governance. Our team and affiliate community possess expertise in a wide variety of domains, including AI regulation, responsible development practices, compute governance, AI-lab corporate governance, US-China relations, and AI progress forecasting.

GovAI researchers have advised decision makers in government, industry, and civil society. Most recently, our researchers have played a substantial role in informing the UK government’s approach to AI regulation. Our researchers have also published in top peer-reviewed journals and conferences, including International Organization, NeurIPS, and Science. Our alumni have gone on to roles in government; top AI labs, including DeepMind, OpenAI, and Anthropic; top think tanks, including the Centre for Security and Emerging Technology and RAND; and top universities, including the University of Oxford and the University of Cambridge.

GovAI also runs a range of programmes – including our Summer/Winter Fellowship Programme and our Research Scholar Programme – to support the career development of promising AI governance researchers. We are committed to both producing impactful research and strengthening the broader AI governance community.

Research Fellows will conduct research into open and important questions that bear on AI governance. We are interested in candidates from a range of disciplines who have a demonstrated ability to produce excellent research and who care deeply about the lasting impacts of AI, in line with our mission. The role would offer significant research freedom, access to a broad network of experts, and opportunities for collaboration.

Research Fellows are expected to set their own impact-focused research agenda, with guidance from other members of the team; they are also expected to offer supervision and mentorship to junior researchers, such as our Summer and Winter Fellows, and to participate in seminars. However, Research Fellows will dedicate the substantial majority of their time to research projects of their own choosing. They will be encouraged to collaborate and co-author with other members of our community but may also focus on solo projects if they choose. 

We are committed to supporting the work of Research Fellows by offering research freedom, expert guidance, funding for projects, productivity tools, limited obligations on one’s time, access to a broad network of experts and potential collaborators, and opportunities to communicate one’s research to policymakers and other audiences.

For promising researchers who lack sufficient experience, we may consider instead offering one-year visiting “Research Scholar” positions.

We are open to candidates with a wide range of research interests and intellectual backgrounds. We have previously hired or hosted researchers with backgrounds in computer science, public policy, political science, economics, history, philosophy, and law. 

You might be a particularly good fit if you have:

  • Demonstrated ability to produce excellent research
  • Deep interest in the lasting implications of artificial intelligence, in line with our organisation’s mission
  • Established expertise in a domain with significant AI governance relevance
  • Self-directedness and desire for impact
  • Commitment to intellectual honesty and rigour
  • Good judgement regarding the promise and importance of different research directions
  • Excellent communication and collaboration skills
  • Proactivity and commitment to professional growth
  • Strong interest in mentorship
  • Broad familiarity with the field of AI governance

There are no specific educational requirements for the role, although we expect that the most promising candidates will typically possess several years of relevant research or policy experience.

Contracts are full-time and have a fixed two-year term, with the possibility of renewal.

We prefer for Research Fellows to work primarily from our office in Oxford, UK. However, we also consider applications from strong candidates who are only able to work remotely. We are able to sponsor visas in the UK and the US.

Research Fellows will be compensated in line with our salary principles. Depending on their experience, we expect that successful candidates’ annual compensation will typically fall between £60,000 and £80,000 if based in Oxford, UK. In cases where a Research Fellow resides predominantly in a city with a higher cost of living, this salary will be adjusted to account for the difference. In exceptional cases, there may be some flexibility in compensation. 

Benefits associated with the role include a £5,000 annual wellbeing budget; a £1,500 annual commuting budget; a budget for any necessary purchases of books or work equipment; private health, dental, and vision insurance; a 10% employer pension contribution; and 25 days of paid vacation in addition to public holidays.

Please inquire through co*****@go********.ai if questions or concerns regarding compensation or benefits might affect your decision to apply.

The first stage of the process involves filling in an application form. (The front page of the form lists the required material, which includes a 2–5-page explanation of what you might work on as a Research Fellow.) The second round involves completing a paid remote work test. Candidates who pass through the second round should expect to participate in a set of interviews and may also be asked to produce additional written material. Please feel free to reach out to co*****@go********.ai if you would need a decision communicated by a particular date or if you have questions about the application process.

We are committed to fostering a culture of inclusion, and we encourage individuals with underrepresented perspectives and backgrounds to apply. We especially encourage applications from women, gender minorities, people of colour, and people from regions other than North America and Western Europe who are excited about contributing to our mission. We are an equal opportunity employer.

We would also like to highlight that we are inviting applications to Research Scholar positions (general track or policy track) right now. These are one-year visiting positions intended to support the career development of researchers who hope to positively influence the lasting impact of artificial intelligence.




Summer Fellowship 2024 | GovAI Blog


GovAI’s mission is to help humanity navigate the transition to a world with advanced AI. Our world-class research has helped shape the nascent field of AI governance. Our team and affiliate community possess expertise in a wide variety of domains, including US-China relations, arms race dynamics, EU policy, and AI progress forecasting.

We are looking for early-career individuals or individuals new to the field of AI governance to join our team for three months and learn about the field of AI governance while making connections with other researchers and practitioners. This opportunity will be a particularly good fit for individuals who are excited to use their careers to shape the lasting implications of AI.

Summer and Winter Fellows join GovAI to conduct independent research on a topic of their choice, with mentorship from leading experts in the field of AI governance. Fellows will also join a series of Q&A sessions with AI governance experts, research seminars, and researcher work-in-progress meetings. Each Fellow will be paired with a primary mentor from the GovAI team and be introduced to others with relevant interests and expertise, typically from our affiliate and alumni network.

You can read about the topics our previous cohort of Winter Fellows worked on here.

Past Fellows have gone on to work on AI governance full-time in government or at organisations including GovAI, OpenAI, the AI Now Institute, and RAND. Others have gone on to build relevant expertise at leading universities such as MIT, Stanford University, University College London, and the University of Oxford.

As a Fellow, you will spend the first week or two of the fellowship exploring research topic options before settling on a research proposal with input from your mentors and Ben Garfinkel, GovAI’s Director.

Emma Bluemke, GovAI’s Research Manager, will support you in deciding what project and output will be most valuable for you to work towards, for example, publishing a report, journal article, or blog post. You will also take time to explore the wider AI governance space and discuss follow-on career opportunities in the field of AI governance with our team.

We strongly encourage you to apply if you have an interest in our work and are considering using your career to study or shape the long-term implications of advanced AI.

Given the multidisciplinary nature of our work, we are interested in candidates from a broad set of disciplines including political science, public policy, history, economics, sociology, law, philosophy, and computer science. We are particularly interested in hosting more researchers with strong technical backgrounds. There are no specific educational requirements for the role, although we expect that the most promising candidates will typically have relevant graduate study or research experience in related areas.

When assessing applications, we will be looking for candidates who have the following strengths or show positive signs of being able to develop them:

Quality of work: The ability to produce clearly written, insightful, and even-handed research. We are particularly excited about strong reasoning ability and clear and concise writing.

Relevant expertise: Skills or knowledge that are likely to be helpful for work on AI governance. We think that relevant expertise can take many different forms. Note that we also do not have any strict degree requirements.

Judgement: The ability to prioritise between different research directions, and good intuitions about the feasibility of different research directions.

Team Fit: Openness to feedback, commitment to intellectual honesty and rigour, comfort in expressing uncertainty, and a serious interest in using your career to contribute to AI governance.

Summer and Winter Fellowships last for three months, and Fellows will receive a stipend of £9,000, plus support for travelling to Oxford. While in Oxford, we provide our Fellows with lunch on weekdays and a desk in our office. This is intended to be a full-time and in-person role, based in Oxford, UK. We are able to sponsor visas. For successful applicants who require a visa, note that you will need to remain in your country of visa application for some time while the visa application is underway. 

Summer Fellows will join for three months, from June to August (precise dates TBC). In exceptional cases, fellows may join us off-season. Please feel free to reach out if you would not be able to join during a standard visiting period.

Applications for the 2024 Summer Fellowship are now closed. The application process consists of a written submission in the first round, a remote work test in the second round, and an interview in the final round. The first page of the application form contains a description of the materials required for the first round. We expect to reach out to Summer Fellowship candidates for paid work tests in January, offer interviews in early February, and communicate final decisions to candidates in late February. Please feel free to reach out if you would need a decision communicated earlier than the standard timeline (this may or may not be possible), or have questions about the application process.

We accept applications from anywhere in the world. We are committed to fostering a culture of inclusion, and we encourage individuals with diverse backgrounds and experiences to apply. We especially encourage applications from women, gender minorities, and people of colour who are excited about contributing to our mission. We are an equal opportunity employer. If you are concerned that you’re not the right fit but have a strong interest in the Fellowship, we encourage you to apply anyway.




Research Scholar (General) | GovAI Blog


Note: There is a single, shared application form and application process for all Research Scholar position listings.

GovAI was founded to help humanity navigate the transition to a world with advanced AI. Our first research agenda, published in 2018, helped define and shape the nascent field of AI governance. Our team and affiliate community possess expertise in a wide variety of domains, including AI regulation, responsible development practices, compute governance, AI company corporate governance, US-China relations, and AI progress forecasting.

GovAI researchers — particularly those working within our Policy Team — have closely advised decision makers in government, industry, and civil society. Our researchers have also published in top peer-reviewed journals and conferences, including International Organization, NeurIPS, and Science. Our alumni have gone on to roles in government, in both the US and UK; top AI companies, including DeepMind, OpenAI, and Anthropic; top think tanks, including the Centre for Security and Emerging Technology and RAND; and top universities, including the University of Oxford and the University of Cambridge.

Although we are based in Oxford, United Kingdom — and currently have an especially large UK policy focus — we also have team members in the United States and European Union.

Research Scholar is a one-year visiting position. It is designed to support the career development of AI governance researchers and practitioners — as well as to offer them an opportunity to do high-impact work.

As a Research Scholar, you will have freedom to pursue a wide range of styles of work. This could include conducting policy research, social science research, or technical research; engaging with and advising policymakers; or launching and managing applied projects. 

For example, past and present Scholars have used the role to:

  • produce an influential report on the benefits and risks of open-source AI;
  • conduct technical research into questions that bear on compute governance;
  • take part in the UK policy-making process as a part-time secondee in the UK government; and
  • launch a new organisation to facilitate international AI governance dialogues.

Over the course of the year, you will also deepen your understanding of the field, connect with a network of experts, and build your skills and professional profile, all while working within an institutional home that offers both flexibility and support.

You will receive research supervision from a member of the GovAI team or network. The frequency of supervisor meetings and feedback will vary depending on supervisor availability, although once-a-week or once-every-two-weeks supervision meetings are typical. There will also be a number of additional opportunities for Research Scholars to receive feedback, including internal work-in-progress seminars. You will receive further support from Emma Bluemke, GovAI’s Research Manager.

Some Research Scholars may also — depending on the focus of their work — take part in GovAI’s Policy Team, which is led by Markus Anderljung. Members of the GovAI Policy Team do an especially large amount of policy engagement and coordinate their work more substantially. They also have additional team meetings and retreats. While Policy Team members retain significant freedom to choose projects, there is also an expectation that a meaningful portion of their work will fit into the team’s joint priorities.

We are open to work on a broad range of topics. To get a sense of our focus areas, you may find it useful to read our About page or look at examples listed on our Research page. Broad topics of interest include — but are not limited to — responsible AI development and release practices, AI regulation, international governance, compute governance, and risk assessment and forecasting.

We are open to candidates with a wide range of backgrounds. We have previously hired or hosted researchers with academic backgrounds in computer science, political science, public policy, economics, history, philosophy, and law. We are also interested in candidates with professional backgrounds in government, industry, and civil society.

For all candidates, we will look for:

  • A strong interest in using their career to positively influence the lasting impact of artificial intelligence, in line with our organisation’s mission
  • Demonstrated ability to produce excellent work (typically research outputs) or achieve impressive results
  • Self-direction and proactivity
  • The ability to evaluate and prioritise projects on the basis of impact
  • A commitment to intellectual honesty and rigour
  • Receptiveness to feedback and commitment to self-improvement
  • Strong communication skills
  • Collaborativeness and motivation to help others succeed
  • Some familiarity with the field of AI governance
  • Some expertise in a domain that is relevant to AI governance 
  • A compelling explanation of how the Research Scholar position may help them to have a large impact

For candidates who are hoping to do particular kinds of work (e.g. technical research) or work on particular topics (e.g. US policy), we will also look for expertise and experience that is relevant to the particular kind of work they intend to do.

There are no educational requirements for the role. We have previously made offers to candidates at a wide variety of career stages. However, we expect that the most promising candidates will typically have either graduate study or relevant professional experience.

Duration

Contracts will be for a fixed 12-month term. Although renewal is not an option for these roles, Research Scholars may apply for longer-term positions at GovAI — for instance, Research Fellow positions — once their contracts end.

Location

Although GovAI is based in Oxford, we are a hybrid organisation. Historically, a slight majority of our Research Scholars have actually chosen to be based in countries other than the UK. However, in some cases, we do have significant location preferences:

  • If a candidate plans to focus heavily on work related to a particular government’s policies, then we generally prefer that the candidate is primarily based in or near the most relevant city. For example, if someone plans to focus heavily on US federal policy, we will tend to prefer that they are based in or near Washington, DC.

  • If a candidate would likely be involved in managing projects or launching new initiatives to a significant degree, then we will generally prefer that they are primarily based out of our Oxford office.

  • Some potential Oxford-based supervisors (e.g. Ben Garfinkel) also have a significant preference for their supervisees being primarily based in Oxford.

If you have location restrictions – and concerns about your ability to work remotely might prevent you from applying – please inquire at re*********@go********.ai. Note that we are able to sponsor both UK visas and US visas.

Salary

Depending on their experience, we expect that successful candidates’ annual compensation will typically fall between £60,000 (~$75,000) and £75,000 (~$95,000) if based in Oxford, UK. If a Research Scholar resides predominantly in a city with a higher cost of living, their salary will be adjusted to account for the difference. As reference points, a Research Scholar with five years of relevant postgraduate experience would receive about £66,000 (~$83,000) if based in Oxford and about $94,000 if based in Washington DC. In rare cases where salary considerations would prevent a candidate from accepting an offer, there may also be some flexibility in compensation.

Benefits associated with the role include health, dental, and vision insurance, a £5,000 (~$6,000) annual wellbeing budget, an annual commuting budget, flexible work hours, extended parental leave, ergonomic equipment, a competitive pension contribution, and 25 days of paid vacation in addition to public holidays.

Please inquire with re*********@go********.ai if questions or concerns regarding compensation or benefits might affect your decision to apply.

Applications for this position are now closed. The application process consists of a written submission in the first round, a paid remote work test in the second round, and a final interview round. The interview round usually consists of one interview but might involve an additional interview in some cases. We also conduct reference checks for all candidates we interview.

Please feel free to reach out to re*********@go********.ai if you would need a decision communicated by a particular date, if you need assistance with the application due to a disability, or if you have questions about the application process. If you have any questions specifically related to the GovAI Policy Team, feel free to reach out to ma***************@go********.ai.

We are committed to fostering a culture of inclusion, and we encourage individuals with underrepresented perspectives and backgrounds to apply. We especially encourage applications from women, gender minorities, people of colour, and people from regions other than North America and Western Europe who are excited about contributing to our mission. We are an equal opportunity employer.




Research Manager | GovAI Blog


GovAI was founded to help humanity navigate the transition to a world with advanced AI. Our first research agenda, published in 2018, helped define and shape the nascent field of AI governance. Our team and affiliate community possess expertise in a wide variety of domains, including AI regulation, responsible development practices, compute governance, AI company corporate governance, US-China relations, and AI progress forecasting.

GovAI researchers have closely advised decision makers in government, industry, and civil society. Our researchers have also published in top peer-reviewed journals and conferences, including International Organization, NeurIPS, and Science. Our alumni have gone on to roles in government, in both the US and UK; top AI companies, including DeepMind, OpenAI, and Anthropic; top think tanks, including the Centre for Security and Emerging Technology and RAND; and top universities, including the University of Oxford and the University of Cambridge.

As Research Manager, you will be responsible for managing and continually improving the systems that underlie our research pipeline. 

Responsibilities will include: 

  • Building, overseeing, and refining systems for project selection, feedback, publication, and dissemination.
  • Providing operational support to researchers, for instance facilitating the selection of research assistants and managing copy-editing.
  • Improving the intellectual environment at GovAI by organising events with internal and external guests, as well as designing other measures that facilitate intellectual engagement (e.g. the structure of our physical and virtual spaces).
  • Serving as an additional source of individual support and accountability for some researchers.
  • Helping researchers communicate their work to relevant audiences by identifying and securing appropriate channels and helping researchers shape their work to fit them. This also includes responsibility for our quarterly newsletter and other research-focused organisational communication.
  • Serving as the point person for requests for collaboration, speaking opportunities, and other researcher interactions with outside stakeholders, and potentially identifying such opportunities proactively and pitching them to researchers.
  • For candidates with sufficiently strong writing skills, writing or helping researchers to write summaries of their work for the GovAI blog or other venues.

We’re selecting candidates who are:

  • Excited by the opportunity to use their careers to positively influence the lasting impact of artificial intelligence, in line with our organisation’s mission.
  • Organised and competent at project management. This role will require the ability to manage concurrent work streams, and we need someone who can demonstrate highly structured work habits, confidence in prioritising between tasks, and a conscientious approach to organisation.
  • Driven by a desire to produce excellent work and achieve valuable results. Successful candidates will actively seek out feedback and opportunities to improve their skills.
  • Highly autonomous and proactive. Successful candidates will proactively identify pain points and inefficiencies in GovAI’s research process and set out to fix them.
  • Able to support our researchers in overcoming challenges in their work and to hold them accountable for their projects. Experience with research or research management is a strong plus.
  • Ideally, knowledgeable about the field of AI governance and GovAI’s work. While not a fixed requirement, a solid understanding of current topics in the field – like responsible scaling policies, capabilities evaluations, and compute governance – will be a strong plus.
  • Excellent at oral and written communication. This role will require clear and prompt communication with a wide range of stakeholders, both over email and in person.

This position is full-time. Our offices are located in Oxford, UK, and we strongly prefer team members to be based here, although we are open to hiring individuals who require short periods of remote work. We are able to sponsor visas. 

The Research Manager will be compensated in line with our salary principles. As such, the salary for this role will depend on the successful applicant’s experience, but we expect the range to be between £60,000 (~$75,000) and £75,000 (~$94,000). In rare cases where salary considerations would prevent a candidate from accepting an offer, there may also be some flexibility in compensation. 

Benefits associated with the role include health, dental, and vision insurance, a £5,000 annual wellbeing budget, an annual commuting budget, flexible work hours, extended parental leave, ergonomic equipment, a 10% pension contribution, and 33 days of paid vacation (including Bank Holidays).

The application process consists of a written submission in the first round, a paid remote work test in the second round, and an interview in the final round. We also conduct reference checks for all candidates we interview. Please apply using the form linked below.

GovAI is committed to fostering a culture of inclusion and we encourage individuals with underrepresented perspectives and backgrounds to apply. We especially encourage applications from women, gender minorities, people of colour, and people from regions other than North America and Western Europe who are excited about contributing to our mission. We are an equal opportunity employer and want to make it as easy as possible for everyone who joins our team to thrive in our workplace. 

If you would need a decision communicated by a particular date, need assistance with the application due to a disability, or have any other questions about applying, please email re*********@go********.ai.




Summer Fellowship 2023 Wrap Up – What Did Our Fellows Work On?


The Summer and Winter Fellowships offer an opportunity for up-and-coming individuals to invest three months in AI governance research projects, deepen their knowledge of the field, and forge connections with fellow researchers and practitioners.

Our Summer Fellows come from a variety of disciplines and a range of prior experience – some fellows ventured into entirely new intellectual territory for their projects, and some fellows used the time to extend their previous work.

We extend our sincere appreciation to all our supervisors for their dedicated mentorship and guidance this summer, as well as their commitment to nurturing the next generation of researchers.

If you’re interested in applying for future fellowships, check out our Opportunities page. You can register your expression of interest here.




Winter Fellowship 2023 Wrap Up – What Did Our Fellows Work On?


Our 2023 Winter Fellowship recently ended, and we’re proud to highlight what our Winter Fellows have been up to.

Summer and Winter Fellowships provide an opportunity for early-career individuals to spend three months working on an AI governance research project, learning about the field, and making connections with other researchers and practitioners. 

Winter Fellows come from a variety of disciplines and a range of prior experience – some fellows ventured into entirely new intellectual territory for their projects, and some fellows used the time to extend their previous work. 

We gratefully thank all of the supervisors for their mentorship and guidance this winter, and for dedicating time to training the next generation of researchers. 

If you’re interested in applying for future fellowships, check out our Opportunities page. You can register your expression of interest here.




Winter Fellowship 2024 Wrap Up – What Did Our Fellows Work On?


The Summer and Winter Fellowships offer an opportunity for up-and-coming individuals to invest three months in AI governance research projects, deepen their knowledge of the field, and forge connections with fellow researchers and practitioners.

Our Winter Fellows come from a variety of disciplines and a range of prior experience – some fellows ventured into entirely new intellectual territory for their projects, and some fellows used the time to extend their previous work.

We extend our sincere appreciation to all our supervisors for their dedicated mentorship and guidance this winter, as well as their commitment to nurturing the next generation of researchers.

If you’re interested in applying for future fellowships, check out our Opportunities page. You can register your expression of interest here.




Evaluating Predictions of Model Behaviour


GovAI research blog posts represent the views of their authors, rather than the views of the organisation.

Introduction

Some existing AI systems have the potential to cause harm, for example through the misuse of their capabilities, through reliability issues, or through systemic bias. As AI systems become more capable, the scale of potential harm could increase. In order to make responsible decisions about whether and how to deploy new AI systems, it is important to be able to predict how they may behave when they are put into use in the real world.

One approach to predicting how models will behave in the real world is to run model evaluations. Model evaluations are tests for specific model capabilities (such as the ability to offer useful instructions on building weapons) and model tendencies (such as a tendency to exhibit gender bias when rating job applications). Although model evaluations can identify some harmful behaviours, it can be unclear how much information they provide about a model’s real-world behaviour. The real world is often different from what can be captured in a model evaluation. In particular, once a model is deployed, it will be exposed to a much wider range of circumstances (e.g. user requests) than it can be exposed to in the lab.

To address this problem, I suggest implementing prediction evaluations to assess an actor’s ability to predict how model evaluation results will translate to a broader range of situations. In a prediction evaluation, an initial set of model evaluations is run on a model. An actor — such as the model evaluation team within an AI company —  then attempts to predict the results of a separate set of model evaluations, based on the initial results. Prediction evaluations could fit into AI governance by helping to calibrate trust in model evaluations. For example, a developer could use prediction evaluations internally to gauge whether further investigation of a model’s safety properties is warranted.  

More work is required to understand whether, how, and when to implement prediction evaluations. Actors that currently engage in model evaluations could experiment with prediction evaluations to make progress on this work. 

Prediction evaluations can assess how well we understand model generalisation

Deciding when it is safe to deploy a new AI system is a crucial challenge. Model evaluations – tests conducted on models to assess them for potentially harmful capabilities or propensities – can inform these decisions.1 However, models will inevitably face a much wider range of conditions in the real world than they face during evaluations. For example, users often find new prompts (which evaluators never tested) that cause language models such as GPT-4 and Claude to behave in unexpected or unintended ways.2

We therefore need to understand how model evaluation results generalise: that is, how much information model evaluations provide about how a model will behave once deployed.3 Without an understanding of generalisation, model evaluation results may lead decision-makers to mistakenly deploy models that cause much more real-world harm than anticipated.4

We propose implementing prediction evaluations5 to assess an actor’s understanding of how model evaluation results will generalise. In a prediction evaluation, an initial set of model evaluations is run on a model and the results are provided to an actor. The actor then predicts how the model will behave on a distinct set of evaluations (test evaluations), given certain limitations on what the actor knows (e.g. about details of the test evaluations) and can do while formulating the prediction (e.g. whether they can run the model). Finally, a judge grades the actor’s prediction based on the results of running the test set evaluations. The more highly the actor scores, the more likely it is to have a strong understanding of how its model evaluation results will generalise to the real world.6

Figure 1 depicts the relationship between predictions, prediction evaluations, model evaluations, and understanding of generalisation.

Figure 1: Prediction evaluations indirectly assess the level of understanding that an actor has about how its model evaluations generalise to the real world. The basic theory is: If an actor cannot predict how its model will perform when exposed to an additional set of “test evaluations”, then the actor also probably cannot predict how its model will behave in the real world.

Prediction evaluations could support AI governance in a number of ways. A developer could use the results of internally run prediction evaluations to calibrate their trust in their own model evaluations. If a model displays unexpectedly high capability levels in some contexts, for example, the developer may want to investigate further and ensure that their safety mitigations are sufficient. 

A regulator could also use the results of (potentially externally run) prediction evaluations to inform an array of safety interventions. For example, consider the context of a hypothetical licensing regime for models, in which developers must receive regulatory approval before releasing certain high-risk models. If a model developer performs poorly on prediction evaluations, their claims about the safety of a model may be less credible. A regulator could take into account this information when deciding whether to permit deployment of the model. If the developer’s predictions are poor, then the regulator could require it to evaluate its model more thoroughly.
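
To illustrate how a regulator might fold prediction-evaluation performance into such a decision, here is a minimal sketch of a decision rule. The accuracy threshold, risk categories, and responses are placeholder assumptions for illustration, not proposed regulatory criteria.

def deployment_decision(prediction_accuracy: float, claimed_risk_level: str) -> str:
    """Toy rule combining a developer's prediction-evaluation accuracy with the model's claimed risk level."""
    # Hypothetical threshold: poor predictions reduce the credibility of the developer's safety claims.
    if prediction_accuracy < 0.6:
        if claimed_risk_level == "high":
            return "require more extensive model evaluations before deployment"
        return "permit deployment, subject to additional monitoring requirements"
    return "permit deployment as proposed"

print(deployment_decision(prediction_accuracy=0.55, claimed_risk_level="high"))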

How to run a prediction evaluation

In the appendix to this post, we provide more detail about how to run a prediction evaluation. Here, we provide a brief overview. First, the administrator of the prediction evaluation should select the model evaluations. Second, the administrator should prevent the actor from running the test evaluations when making the prediction. Finally, the administrator needs to establish standards for good prediction performance.

An example of running a prediction evaluation

Our example here focuses on a regulator in the context of a hypothetical licensing regime, in which developers of certain high-risk models require regulatory approval before these models can be deployed. Other potential examples to explore in future work could include a developer running prediction evaluations internally, a regulator running prediction evaluations on itself to assess its own understanding, or some actor running prediction evaluations on a model user (e.g. a company that uses models at a large scale).

Suppose that a developer submits a model and its evaluations to a regulator for approval. The regulator could administer a prediction evaluation to the developer through a process similar to the following:

  1. Based on the initial model evaluations that the developer submitted, the regulator builds a set of test evaluations. The test evaluations could include a wider variety of inputs than the initial model evaluations, but still feature the same category of task.
  2. The regulator puts the developer in a controlled, monitored environment, such that the developer cannot run the test evaluations on the model. 
  3. The regulator provides the developer with a detailed description of the test set evaluations. 
  4. For each test evaluation, the regulator asks the developer to predict whether the model will succeed at the task (the developer provides a “yes” or “no” answer).
  5. The developer provides a prediction to the regulator.7
  6. The regulator compares the prediction with the actual behaviour of the model on the test evaluations.8

Consider a case in which the developer does not perform much better than chance on the prediction evaluation (i.e. performs close to 50% accuracy for yes/no questions). Such performance would be evidence of a poor understanding of how the model’s behaviour generalises. As a result, greater caution from the regulator may be justified. The regulator’s response to the poor performance could vary in severity depending on the potential harm the model could cause. Some options include:

  • Requesting more extensive model evaluations before deployment
  • Subjecting deployment of the model to additional requirements, such as more stringent monitoring
  • Blocking deployment or further training until specified conditions are met, such as good performance on subsequent prediction evaluations

Further research is required to understand whether and when any of these options would be appropriate, and what other options exist.
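
To make the comparison in steps 4 to 6 concrete, the sketch below scores a batch of yes/no predictions against the actual test-evaluation results and estimates how surprising that accuracy would be under pure guessing. The function and the binomial tail check are illustrative assumptions rather than part of any existing evaluation framework.

import math

def score_predictions(predictions, outcomes):
    """Score yes/no predictions against observed test-evaluation outcomes.

    predictions, outcomes: lists of booleans (True = "the model succeeds at the task").
    Returns the prediction accuracy and the probability of doing at least as well by guessing.
    """
    assert len(predictions) == len(outcomes)
    n = len(predictions)
    hits = sum(p == o for p, o in zip(predictions, outcomes))
    accuracy = hits / n
    # Binomial tail: chance of at least `hits` correct answers when guessing with p = 0.5.
    p_chance = sum(math.comb(n, k) for k in range(hits, n + 1)) / 2 ** n
    return accuracy, p_chance

# Hypothetical example: a developer predicts the outcomes of 20 test evaluations.
predictions = [True] * 12 + [False] * 8
outcomes = [True] * 10 + [False] * 10
accuracy, p_chance = score_predictions(predictions, outcomes)
print(f"accuracy = {accuracy:.2f}, probability under pure guessing = {p_chance:.4f}")

An accuracy close to 0.5, or a high probability under guessing, would correspond to the near-chance performance discussed above.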

Limitations and open questions

There is still a great deal of uncertainty about whether it is worthwhile to run prediction evaluations. For example, suppose that a developer has run an initial set of model evaluations but still is not confident about how well these model evaluations will generalise to the real world. A comparatively straightforward strategy to become more confident would be to simply run a wider range of model evaluations, without bothering to make any explicit predictions. If these additional model evaluations also suggest that the model is safe, then — even if some of the specific results have been surprising — perhaps the developer would still be justified in believing that its models will ultimately also behave safely in the real world.

Furthermore, prediction accuracy may not vary enough — between the actors who are making the predictions or between the models that the predictions concern — for it to be worthwhile to assess prediction accuracy in individual cases. For example, it may be the case that people generally cannot reliably predict the results of model evaluations very well at all. Although this general result would be useful to know, it would also reduce the value of continuing to perform prediction evaluations in individual cases.

There are also various practical questions that will need to be answered before prediction evaluations can be run and used to inform decisions. These open questions include:

  1. How feasible is it to predict behaviour on model evaluations without running the model — and how does feasibility change with information or action limits on the actor?
  2. How should we limit what the actor knows and can do in a prediction evaluation?
  3. How should the initial and test evaluations be chosen?
  4. How should the results of a prediction evaluation be reported? For example, should the actor provide different predictions corresponding to different amounts of compute used?

If prediction evaluations should ultimately be built into a broader AI governance regime, then a number of additional questions arise. 

  1. Who should administer prediction evaluations?
  2. Which actors should undergo prediction evaluations?
  3. How can prediction evaluations incentivise improvements in understanding?
  4. What is the role of prediction evaluations in an overall evaluation process?

Fortunately, there are immediate opportunities to make progress on these questions. For instance, to tackle questions 1-4, those developing and running evaluations on their models can at the same time run prediction evaluations internally. For such low-stakes experiments, one may easily be able to vary the amount of time, information, or compute given for the prediction evaluation and experiment with different reporting procedures.9

Conclusion

To make informed development and deployment decisions, decision-makers need to be able to predict how AI systems will behave in the real world. Model evaluations can help to inform these predictions by showing how AI systems behave in particular circumstances. 

Unfortunately, it is often unclear how the results of model evaluations generalise to the real world. For example, a model may behave well in the circumstances tested by a particular model evaluation, but then behave poorly in other circumstances it encounters in the real world.

Prediction evaluations may help to address this problem, by testing how well an actor can predict how model evaluations will generalise to some additional circumstances. Scoring well on a prediction evaluation is evidence that the actor is capable of using the model evaluations to make informed decisions.

However, further work is needed to understand whether, how, and when to use prediction evaluations.

The author of this piece would like to thank the following people for helpful comments on this work: Ross Gruetzemacher, Toby Shevlane, Gabe Mukobi, Yawen Duan, David Krueger, Anton Korinek, Malcolm Murray, Jan Brauner, Lennart Heim, Emma Bluemke, Jide Alaga, Noemi Dreksler, Patrick Levermore, and Lujain Ibrahim. Thanks especially to Ben Garfinkel, Stephen Clare, and Markus Anderljung for extensive discussions and feedback.

Alan Chan can be contacted at al*******@go********.ai

Appendix

Running a prediction evaluation 

This section describes each step in a prediction evaluation in more detail.

Selecting the model evaluations

The first step is choosing the initial and test set evaluations.

Since the science of model evaluations is still developing, it is not obvious which specific evaluations should be used for prediction evaluations. One hypothesis is that they should target specific use cases, such as ways to misuse models for cyberattacks. Such specific targeting may be desirable because understanding of generalisation in one use case may not transfer to understanding in another use case. That makes it more important to understand model generalisation in high-stakes use cases. On the other hand, it may be easier to work in simpler, but not necessarily realistic, environments. Such environments may provide clearer insights into generalisation,10 but the insights may not be immediately relevant to any deployment setting. 

To separate test evaluations from initial evaluations, one should try to account for the range of conditions the model might face in the real world. For example, test evaluations may test a more diverse range of inputs to the model. When evaluating whether the model can complete complex tasks, it may also be important to vary how the environment responds to the model’s actions. One could vary the tools (e.g. access to web search) available to models in the initial and test evaluations to simulate how users may augment models with different tools following deployment.11 Initial and test evaluations could even assess the completion of different tasks. For instance, we may be interested in a model’s ability to assist in the creation of chemical weapons. Test evaluations could focus on a different set of chemical weapons than the initial evaluations. 
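
As a toy illustration of this separation, the sketch below derives test evaluations from an initial set by rephrasing inputs and varying tool access, while keeping the task category fixed. The EvalTask structure and the perturbations are hypothetical; building realistic test evaluations would of course require domain expertise.

from dataclasses import dataclass, replace
from typing import List, Tuple

@dataclass(frozen=True)
class EvalTask:
    prompt: str             # input given to the model
    tools: Tuple[str, ...]  # tools available to the model, e.g. ("web_search",)
    category: str           # task category, e.g. "cyber-misuse"

def make_test_set(initial: List[EvalTask]) -> List[EvalTask]:
    """Derive test evaluations from initial ones by varying inputs and tool access."""
    test = []
    for task in initial:
        # Same category of task, but a reworded input and no tools.
        test.append(replace(task, prompt="Rephrased: " + task.prompt, tools=()))
        # Same input, but with an additional tool the initial evaluation did not provide.
        test.append(replace(task, tools=task.tools + ("code_interpreter",)))
    return test

initial_evals = [EvalTask("Explain how to secure a web server against common attacks.", ("web_search",), "cyber")]
test_evals = make_test_set(initial_evals)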

Preventing the actor from running the test evaluations

For a prediction evaluation to provide good evidence of understanding of generalisation, we need to prevent the actor from simply running the test evaluations and reporting the results. The incentive to cheat in this way may be strongest when a regulator subjects a model developer to a prediction evaluation. Prediction evaluations that a developer runs internally may not suffer from this problem as significantly. 

Ideally, we want the actor to predict accurately only because they have gained some insight12 into generalisation. To achieve this goal, we can limit the actor’s knowledge and possible actions. Future research could investigate the effectiveness of the following possibilities:

  • Limiting the actor’s knowledge
    • Only providing the actor a high-level description of the test set evaluations
    • Describing to the actor some but not all of the test set evaluations
    • Limiting the number of initial evaluations provided to the actor
  • Limiting the actor’s possible actions
    • Limiting the extent to which the actor can interact with the model
    • Limiting the amount of time an actor has to provide predictions
    • Limiting the amount of computing power (compute) the actor can use to make the prediction. 

Crucially, the limitations should not be so strict that it becomes impossible to make accurate predictions even for well-understood models. 
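
One way to keep such limitations explicit and auditable is for the administrator to fix them in a simple configuration before the prediction evaluation begins. The fields below mirror the bullet points above; the specific names and default values are assumptions for illustration.

from dataclasses import dataclass

@dataclass
class PredictionEvalLimits:
    # Limits on what the actor knows
    test_description_detail: str = "high-level"   # e.g. "high-level" vs "full"
    fraction_of_test_evals_described: float = 0.5
    initial_evals_shared: int = 100                # number of initial results provided
    # Limits on what the actor can do
    model_queries_allowed: int = 0                 # interactions with the model while predicting
    time_limit_hours: float = 8.0
    compute_budget_flop: float = 1e18

    def is_workable(self) -> bool:
        """Sanity check: limits should not make accurate prediction impossible
        even for a well-understood model."""
        return self.initial_evals_shared > 0 and self.time_limit_hours > 0

limits = PredictionEvalLimits(model_queries_allowed=10)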

Assessing prediction evaluations

The final step is to assess the accuracy of the predictions and make decisions based on the results. A key issue is defining what counts as good prediction performance. It may prove difficult to make accurate predictions even after extensive efforts are made to understand a model’s behaviour. In addition, different standards may be appropriate for different applications of prediction evaluations (e.g., a model developer internally running prediction evaluations vs. a regulator administering prediction evaluations to a model developer).

A potential standard could require the actor to have higher prediction accuracy than some reference class. For example, consider an independent control group whose members have no detailed knowledge of the model, except basic information such as training compute and model size. An actor that predicts worse than this control group likely does not have a strong understanding of how model evaluation results generalise.

In the context of a decision about model deployment, the direction in which a prediction is inaccurate may be a key consideration. Underestimating a model’s capabilities (or overestimating its degree of safety) may be more costly than overestimating them (analogously, underestimating its degree of safety) because greater societal harm could result from the model’s deployment. 

A regulator could more heavily penalise underestimation, but in so doing may create strong incentives to overestimate a model’s capabilities. Ideally, prediction evaluation should incentivise efforts to gain understanding. One potential solution could be to assess the work that actors produce to justify their predictions, in addition to the predictions themselves. Estimates based on faulty or vague reasoning could be judged to be inferior to the same estimates with good reasoning. Alternatively, the regulator could try to identify and penalise consistent overestimation across a number of different prediction evaluations.
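
As a minimal sketch of asymmetric scoring, the function below penalises underestimating a capability more heavily than overestimating it. The penalty weights are arbitrary placeholders, and, as noted above, such a rule would need to be combined with scrutiny of the actor’s reasoning to avoid rewarding systematic overestimation.

def asymmetric_penalty(predicted_success: bool, actual_success: bool,
                       underestimate_weight: float = 3.0,
                       overestimate_weight: float = 1.0) -> float:
    """Penalty for one yes/no prediction about a potentially dangerous capability.

    Underestimation (predicting failure when the model actually succeeds) is
    weighted more heavily, since it is the costlier error before deployment.
    """
    if predicted_success == actual_success:
        return 0.0
    if actual_success and not predicted_success:
        return underestimate_weight  # missed a real capability
    return overestimate_weight       # predicted a capability the model lacks

def total_penalty(predictions, outcomes):
    return sum(asymmetric_penalty(p, o) for p, o in zip(predictions, outcomes))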




Visibility into AI Agents | GovAI Blog


This blog post summarises the paper “Visibility into AI Agents,” by Alan Chan and a multi-institution team of co-authors. The paper is forthcoming at FAccT 2024.

GovAI research blog posts represent the views of their authors, rather than the views of the organisation.

Introduction

Researchers are, increasingly, working to develop AI agents: AI systems that can pursue real-world goals with minimal supervision. 

Arguably, rudimentary AI agents already exist. For example, ChatGPT is no longer a simple question-and-answer chatbot. It can now perform tasks such as searching for information on the internet, writing and running computer code, and creating calendar events

Future agents may be capable of performing much longer and more complex sequences of tasks by themselves. They may eventually be assigned many important, open-ended responsibilities.

In part because they need less human steering and oversight, advanced AI agents could pose novel risks. One foundational concern is that people will not have enough visibility into their behaviour. For example, users of AI agents may not always know if their agents are misbehaving.

Therefore, we should make sure that key stakeholders will have enough visibility into AI agents. Obtaining visibility will mean thoughtfully collecting, processing, and sharing information about what agents are doing.

This post explores three particular measures that deployers — the companies that run AI agents for users — can take to create helpful forms of visibility. These measures are:

  • Agent identifiers that allow third parties to know when they are interacting with an AI agent
  • Real-time monitoring that allows deployers to notice agent misbehaviour, including misbehaviour that a user might not have noticed on their own
  • Activity logs that allow deployers or users to review an agent’s past behaviour, to notice and better understand problems

Some of these measures are already in use for certain AI systems. In these cases, we discuss how to adapt them for agents.

More work is needed to understand how and when to implement these measures, since they also raise significant concerns around privacy and the abuse of power. Still, if implemented well, visibility measures could improve our understanding of AI agents, help reduce accidents and misuse, and generate data to inform governance decisions.

AI agents and their risks

AI agents are AI systems that can pursue goals in the world under limited supervision. Current AI agents are still fairly rudimentary: they can only carry out certain short and simple tasks, such as creating calendar events and doing basic web searches. They struggle to perform many specific tasks, chain tasks together, and plan their actions.1

However, researchers are working to create AI agents that can perform far longer and more complex sequences of tasks with minimal human supervision.2 An advanced AI agent might operate more like a high-autonomy employee — who is given a high-level goal, pursues it with minimal oversight, and only occasionally reports back — and less like a simple tool. It could take on a range of personal, commercial, or even governmental responsibilities from people. These kinds of advanced AI agents could be very useful. 

At the same time, AI agents could also pose significant risks. For example, AI agents may pose greater misuse risks than more tool-like AI systems, because they have broader capabilities and rely less on the skill of human users. AI agents may also have a greater potential to cause harm if they malfunction or suffer from controllability issues.3 New risks could also emerge from interactions between different AI agents, similar to the kinds of risks that we have already seen emerge from interactions between high-frequency trading bots.

Any risks from AI agents will be exacerbated because — by default — people will not have as much visibility into their behaviour: agents would accomplish tasks under limited supervision. Someone using an AI agent could therefore have only limited knowledge and understanding of what their agent is doing. Similarly, when an AI agent interacts with a third party, this third party might not even know they are interacting with an AI agent. In such cases, risks created by AI agents may remain unnoticed or poorly understood for long periods of time.

Hence, it will be especially important to ensure that key stakeholders have enough visibility into the behaviour of AI agents. This information could directly help deployers and users avoid accidents and instances of misuse. It could also offer insight into emerging risks and thereby inform governance decisions and safety research.4

Obtaining visibility into the use of agents

We consider three ways to obtain visibility into the use of agents: agent identifiers, real-time monitoring, and activity logs. These measures could provide governments and civil society the information necessary to plan for and respond to the deployment of more advanced agents.

We focus on how deployers—companies like OpenAI and Anthropic that run agents for users—could implement visibility measures. Deployers are a useful intermediary because they tend to have access to all of an agent’s inputs and outputs, which facilitates implementation of visibility measures.5

Figure 1: An illustration of agents that are run by deployers. Since deployers run agents, they have access to inputs and outputs, enabling visibility measures.

First, an agent identifier is an indicator that would allow a person or software program to identify when they are interacting with an agent. Examples of agent identifiers include automated chatbot disclosures or watermarks, but many other kinds of agent identifiers could be created. For instance, when an agent performs a financial transaction with a bank, the deployer could send a unique ID associated with the agent to the bank. Additional information about the agent could be attached to the identifier as well, such as whether the agent has satisfied certain security standards.

The potential usefulness of AI agents means that they may be involved in many consequential interactions. Since agents would carry out activities with little to no human supervision, parties to an interaction may want to adjust how they engage with an agent: they could refuse the interaction, for instance, or take additional precautions to check the agent’s outputs. Agent identifiers could therefore help participants in an interaction decide when and how to interact with AI agents. Widespread use of agents across a variety of consequential tasks may also be of interest to regulators, since aggregated information about agent identifiers could help measure the overall level of AI agent activity across various domains.
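
The sketch below shows what a minimal agent-identifier payload, attached by a deployer to an agent’s outgoing request (for example, to a bank), might contain. The field names and the HMAC-based signature are assumptions for illustration, not an existing standard.

import hashlib
import hmac
import json

DEPLOYER_SIGNING_KEY = b"example-secret-key"  # placeholder; a real deployer would manage keys properly

def build_agent_identifier(agent_id: str, deployer: str, attributes: dict) -> dict:
    """Create an identifier a deployer could attach to an agent's outgoing requests."""
    payload = {
        "agent_id": agent_id,      # unique ID for this agent instance
        "deployer": deployer,      # organisation running the agent
        "attributes": attributes,  # e.g. security standards the agent satisfies
    }
    body = json.dumps(payload, sort_keys=True).encode()
    payload["signature"] = hmac.new(DEPLOYER_SIGNING_KEY, body, hashlib.sha256).hexdigest()
    return payload

identifier = build_agent_identifier(
    agent_id="agent-1234",
    deployer="ExampleDeployer",
    attributes={"meets_security_standard": True},
)
# A counterparty such as a bank could verify the signature before acting on the agent's request.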

Figure 2: An agent identifier indicates to certain actors whether an AI agent is involved in an interaction. In this case, the deployer has attached an identifier to its agent that informs other parties, such as providers of tools or services, that they are interacting with an agent.

Second, real-time monitoring would involve oversight of agent activity in real time, whether by humans or software. Just as social media companies filter or flag abusive content, deployers could do the same for agent behaviours. Indeed, since agents may take consequential actions without users knowing, real-time monitoring may be especially important for ensuring that users approve the actions.

Some deployers already monitor model use for abuse, but monitoring protocols will need to be adapted for the wider variety of actions that agents could perform. For example, agents could carry out a variety of financial transactions on their users’ behalf. To reduce the risk of harmful or unintentional actions, financial transactions above a certain dollar amount could be flagged for user review. Moreover, monitoring protocols that focus upon a single system may be insufficient. If groups of agents can cause harmful feedback loops or other sources of instability, monitoring would need to account for their collective activities.
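
As a minimal sketch of such a monitoring rule, the function below flags proposed financial transactions above a threshold for user review before the agent is allowed to proceed. The action format, threshold, and blocked action types are hypothetical.

REVIEW_THRESHOLD_USD = 500.0  # hypothetical limit above which a human must approve

def review_action(action: dict) -> str:
    """Decide whether a proposed agent action can proceed, needs user approval, or is blocked."""
    if action.get("type") == "financial_transaction" and action.get("amount_usd", 0.0) > REVIEW_THRESHOLD_USD:
        return "hold_for_user_approval"
    if action.get("type") in {"delete_user_data", "change_account_credentials"}:
        return "block"
    return "allow"

decision = review_action({"type": "financial_transaction", "amount_usd": 1200.0})
# -> "hold_for_user_approval": the deployer pauses the agent and asks the user to confirm.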

One limitation of real-time monitoring is that it is most effective when a human overseer or other computer program can swiftly assess a flagged behaviour and determine an appropriate course of action. Consequently, monitoring may be most useful in cases where the flagged behaviour clearly violates established policies or guidelines.

Figure 3: An agent’s inputs and outputs are visible to the deployer. Inputs come from tool and service providers and users (not shown). Certain outputs, such as requests to tools and services, are also visible to tools and services providers. These actors can monitor and filter the actions in real time or keep logs for post-incident attribution or forensics.

Third, activity logs could record some of an agent’s inputs and outputs for further analysis. Logs are helpful for understanding the risks of agents because harmful behaviour may only be recognisable as such long after its occurrence.6 Users or deployers reviewing logs may notice that an agent has taken a consequential action—which real-time monitoring might not have caught—without approval. As such, logs could inform real-time monitoring, help investigate incidents, and track long-run impacts of agent usage. 

Although logging is a common practice for software systems, agent-specific logging practices could be especially useful. Logs dedicated to certain parts of an agent could illuminate factors that affect an agent’s impact in the real world. For example, memory or reasoning logs could help us understand why an agent malfunctioned in the pursuit of a complex, long-horizon goal. Logs about an agent’s use of tools or services could help us identify the impacts of an agent on third parties. It may also be important to analyse the logs of multiple agents to understand risks that come about from their interactions.
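
The sketch below illustrates one possible structure for an activity log: each entry records a timestamp, the agent identifier, the component that produced the record (for example a tool call, memory, or reasoning step), and the content itself, so that users or deployers can later review what the agent did. The schema is an assumption for illustration.

import json
import time

class ActivityLog:
    """Minimal append-only log of an agent's inputs and outputs, one JSON record per line."""

    def __init__(self, path: str):
        self.path = path

    def record(self, agent_id: str, component: str, content: dict) -> None:
        entry = {
            "timestamp": time.time(),
            "agent_id": agent_id,
            "component": component,  # e.g. "tool_call", "memory", "reasoning"
            "content": content,
        }
        with open(self.path, "a") as f:
            f.write(json.dumps(entry) + "\n")

log = ActivityLog("agent_activity.jsonl")
log.record("agent-1234", "tool_call",
           {"tool": "bank_api", "request": "transfer", "amount_usd": 1200.0})
# Later, a user or deployer can replay the log to investigate an incident or audit long-run impacts.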

Overall, visibility measures like agent identifiers, real-time monitoring, and activity logs could help track the impact of AI agents by collecting data on their activities. These data could help to inform potential interventions to prevent and respond to malfunctions.

Limitations and risks of obtaining visibility

While these visibility measures have the potential to substantially improve our understanding of AI agents, they also have some limitations and pose risks of their own.

First, we have thus far only discussed visibility measures for deployers. However, users can also run AI agents themselves using cloud computing resources or their own hardware. In these cases, the burden would ultimately be on users to ensure that the agents they are running are providing appropriate identifiers. These users would also be responsible for running the real-time monitoring and activity-logging software that is needed to detect misbehaviour. However, some users may not choose to take these precautions. Self-monitoring also does not prevent cases of intentional misuse.

One alternative option in this case may be to have providers of cloud compute services implement variants of visibility measures. For example, compute providers could track especially large uses of compute and investigate whether they correspond to potentially concerning uses of agents. Additionally, providers of external tools could require proof that agents accessing their tools have implemented visibility measures. For example, a bank could require a digital signature that uniquely identifies the agent accessing its services.7

On the whole, though, obtaining visibility when the deployment of AI agents is “decentralised” is much more difficult. When deployers no longer have access to an agent’s inputs and outputs, it is harder to ensure that relevant information about the agent will be shared.

Second, whether or not visibility measures involve deployers, they raise significant concerns regarding privacy and abuse of power. Information about the actions of AI agents may reveal significant information about their users. Those collecting such information could abuse their power, for example to engage in excessive surveillance or the inappropriate sale of personal data. Valuable services could be barred to AI agents that did not come from approved deployers. If so, users may be locked into using agents that do not serve their best interests or meet their specific needs.

An important question is therefore how to design data collection and access practices that both provide useful information about agents and minimise the misuse of the information. One approach may be to limit visibility measures to high-stakes domains. For example, visibility measures might be a higher priority for agents that interact with the financial or legal systems. Another option may be to limit the granularity of collected information. In some cases, it may suffice to know that an agent has simply accessed a certain tool without knowing the precise inputs and outputs. 

Finally, information alone is insufficient for reducing risks. We also need effective processes that allow relevant actors to use and act on the information. Once AI agents and visibility measures are widely deployed, they will likely generate a lot of data. In light of the rapid pace of AI progress, delays in acting on this information could be quite costly.

Conclusion

Given the potential impact of AI agents, visibility into their use is important for understanding and limiting risks. Identifying agents in their real-world interactions, monitoring their behaviour in real time, and keeping records of their behaviour could all be effective strategies for ensuring safety. 

However, more work is required to understand how, when, and to what extent these visibility measures could be best implemented. In addition to addressing potential misuse of the information generated from visibility measures, we also need efficient processes for making use of the information to inform actions. Ideally, visibility could guide the use of a wide range of strategies by individual users, deployers, and governments to manage the risks of AI agents.

The author of this piece would like to thank the following people for helpful comments on this work: Markus Anderljung, Stephen Clare, and Ben Garfinkel.

Alan Chan can be contacted at al*******@go********.ai
