#8: Bing Chat, AI labs on safety, and pausing Future Matters

Mar 21, 2023

Future Matters is a newsletter about longtermism and existential risk by Matthew van der Merwe and Pablo Stafforini. Each month we curate and summarize relevant research and news from the community. The version crossposted to the Effective Altruism Forum includes a bonus conversation with a prominent researcher. You can also listen on your favorite podcast platform and follow on Twitter. Future Matters is also available in Spanish.

A message to our readers

This issue marks one year since we started Future Matters. We’re taking this opportunity to reflect on the project and decide where to take it from here. We’ll soon share our thoughts about the future of the newsletter in a separate post, and will invite input from readers. In the meantime, we will be pausing new issues of Future Matters. Thank you for your support and readership over the last year!

Featured research

All things Bing

Microsoft recently announced a significant partnership with OpenAI [see FM#7] and launched a beta version of a chatbot integrated with the Bing search engine. Reports of strange behavior quickly emerged. Kevin Roose, a technology columnist for the New York Times, had a disturbing conversation in which Bing Chat declared its love for him and described violent fantasies. Evan Hubinger collects some of the most egregious examples in Bing Chat is blatantly, aggressively misaligned. In one instance, Bing Chat finds a user’s tweets about the chatbot and threatens to exact revenge. In the LessWrong comments, Gwern speculates on why Bing Chat exhibits such different behavior to ChatGPT, despite apparently being based on a closely-related model. (Bing Chat was subsequently revealed to have been based on GPT-4).

Holden Karnofsky asks What does Bing Chat tell us about AI risk? His answer is that it is not the sort of misaligned AI system we should be particularly worried about. When Bing Chat talks about plans to blackmail people or commit acts of violence, this isn’t evidence of it having developed malign, dangerous goals. Instead, it’s best understood as Bing acting out stories and characters it’s read before. This whole affair, however, is evidence of companies racing to deploy ever more powerful models in a bid to capture market share, with very little understanding of how they work and how they might fail. Most paths to AI catastrophe involve two elements: a powerful and dangerously misaligned AI system, and an AI company that builds and deploys it anyway. The Bing Chat affair doesn’t reveal much about the first element, but is a concerning reminder of how plausible the second is.

Robert Long asks What to think when a language model tells you it's sentient [🔉]. When trying to infer what’s going on in other humans’ minds, we generally take their self-reports (e.g. saying “I am in pain”) as good evidence of their internal states. However, we shouldn’t take Bing Chat’s attestations (e.g. “I feel scared”) at face value; we have no good reason to think that they are a reliable guide to Bing’s inner mental life. LLMs are a bit like parrots: if a parrot says “I am sentient” then this isn’t good evidence that it is sentient. But nor is it good evidence that it isn’t — in fact, we have lots of other evidence that parrots are sentient. Whether current or future AI systems are sentient is a valid and important question, and Long is hopeful that we can make real progress on developing reliable techniques for getting evidence on these matters. Long was interviewed on AI consciousness, along with Nick Bostrom and David Chalmers, for Kevin Collier’s article, What is consciousness? ChatGPT and Advanced AI might define our answer1 [🔉].

How the major AI labs are thinking about safety

In the last few weeks, we got more information about how the leading AI labs are thinking about safety and alignment:

Anthropic outline their Core views on AI safety [🔉]. The company was founded in 2021 by a group of former OpenAI employees, with an explicitly safety-focussed mission. They remain fundamentally uncertain about how difficult it will be to align very powerful AI systems — it could turn out to be pretty easy, to require enormous scientific and engineering effort, or to be effectively impossible (in which case, we’d want to notice this and slow down AI development before anything disastrous happens). Anthropic take a portfolio approach to safety research, pursuing multiple lines of attack, with a view to making useful contributions, however difficult things turn out to be.
OpenAI released Planning for AGI and beyond [🔉], by CEO Sam Altman, which is a more high-level statement of the company’s approach to AGI. We enjoyed the critical commentary by Scott Alexander [🔉]. (OpenAI outlined their approach to alignment research specifically back in August 2022).
Viktoria Krakovna shared a presentation on how DeepMind’s Alignment team thinks about AI safety (note that this does not necessarily represent the views of DeepMind as a whole).

Summaries

Ezra Klein writes powerfully on AI risk in the New York Times [🔉]. (The noteworthy thing to us is less the piece’s content and more what its publication, and positive reception, reveals about the mainstreaming of AI risk concerns.)
In Global priorities research: Why, how, and what have we learned? [🔉], Hayden Wilkinson discusses global priorities research, argues that it is a high-impact research area, and summarizes some of its key findings so far.
Andy Greenberg’s A privacy hero’s final wish: an institute to redirect AI’s future [🔉] is a moving profile of the icon Peter Eckersley and the AI Objectives Institute, which he established in the year before his tragic and untimely passing.
In What AI companies can do today to help with the most important century [🔉], Holden Karnofsky suggests prioritizing alignment research, strengthening security; helping establish safety standards and monitoring regimes; avoiding hype and acceleration; and setting up governance mechanisms capable of dealing with difficult trade-offs between commercial and public interests.
Karnofsky also offers advice on How major governments can help with the most important century.
And finally, in Jobs that can help with the most important century, Karnofsky provides some career recommendations for mere individuals.
In LLMs are not going to destroy the human race [🔉], Noah Smith argues that, although AGI might eventually kill humanity, large language models are not AGI, may not be a step toward AGI, and there's no plausible way they could cause human extinction.
Joseph Carlsmith’s doctoral thesis, A stranger priority? Topics at the outer reaches of effective altruism, examines how anthropics, the simulation argument and infinite ethics each have disruptive implications for longtermism. Highly recommended.
In How much should governments pay to prevent catastrophes? [🔉], Carl Shulman and Elliott Thornley argue that the goal of longtermists should be to get governments to adopt global catastrophic risk policies based on standard cost-benefit analysis rather than on arguments that stress the overwhelming importance of the far future.
Eli Tyre’s current summary of the state of AI risk [🔉] (conclusion: “we are extremely unprepared”).
In Preventing the misuse of DNA synthesis [🔉], an Institute for Progress report, Bridget Williams and Rowan Kane make five policy recommendations to mitigate risks of catastrophic pandemics from synthetic biology.
Patrick Levermore scores forecasts from AI Impacts’ 2016 expert survey, finding they performed quite well at predicting AI progress over the last five years.
In Why I think it's important to work on AI forecasting [🔉], Matthew Barnett outlines three threads of research that he is currently pursuing which he believes could shed light on important aspects of how AI will unfold in the future
Allen Hoskin speculates on Why AI experts continue to predict that AGI is several decades away [🔉].
In Should GPT exist? [🔉], Scott Aaronson opposes a ban on LLMs partly on the grounds that, historically, opposition to dangerous technologies often increased the harms they caused.
Matthew Barnett proposes a new method for forecasting transformative AI.
In Against LLM reductionism, Erich Grunewald argues that statements that large language models are mere "stochastic parrots" (and the like) make unwarranted implicit claims about their internal structure and future capabilities.
Experimental evidence on the productivity effects of generative artificial intelligence [🔉], by Shakked Noy and Whitney Zhang, examines the effects of ChatGPT on production and labor markets.
In Framing AI strategy [🔉], Zach Stein-Perlman discusses ten approaches to AI strategy.
David Chapman published an online book, Better Without AI, outlining the case for AI risk and what individuals can do now to prevent it.
In How bad a future do ML researchers expect?, Katja Grace finds that the proportion of respondents to her survey of machine learning researchers who believe extremely bad outcomes from AGI are at least 50% likely has increased from 3% in the 2016 survey to 9% in the 2022 survey.
Noam Kolt’s Algorithmic black swans [🔉] offers a roadmap for ‘algorithmic preparedness’, a framework for developing regulation capable of mitigating ‘black swan’ risks from advanced AI systems.
In a new Global Priorities Institute paper, Tiny probabilities and the value of the far future, Petra Kosonen argues that discounting small probabilities does not undermine the case for longtermism.
Reflection mechanisms as an alignment target — attitudes on “near-term” AI [🔉], by Eric Landgrebe, Beth Barnes and Marius Hobbhahn, discuss a survey of 1000 participants on their views about what values should be put into powerful AIs.
Are there ways to forecast how well a conversation about AI alignment with an AI researcher might go? In Predicting researcher interest in AI alignment [🔉], Vael Gates tries to answer this question by focusing on a quantitative analysis of 97 AI researcher interviews.
In AI risk, again [🔉], Robin Hanson restates his views on the subject.
Fin Moorhouse’s Summary of What We Owe The Future [🔉] is a detailed synopsis of Will MacAskill’s recent book.
In Near-term motivation for AGI alignment [🔉], Victoria Krakovna makes the point that you don't have to be a longtermist to care about AI alignment.
Joel Tan’s Shallow report on nuclear war (arsenal limitation) estimates that lobbying for arsenal limitation to mitigate nuclear war has a marginal expected value of around 33.4 DALYs per dollar, or a cost-effectiveness around 5,000 times higher than that of GiveWell’s top charities.
In The effectiveness of AI existential risk communication to the American and Dutch public, Alexia Georgiadis measures changes in participants’ awareness of AGI risks after consuming various media interventions. There is a summary [🔉] of this paper written by Otto Barten.
Larks’s A Windfall Clause for CEO could worsen AI race dynamics [🔉] argues that the proposal to make AI firms promise to donate a large fraction of profits if they become extremely profitable will primarily benefitting the management of those firms and thereby give managers an incentive to move fast, aggravating race dynamics and in turn increasing existential risk.
In What should be kept off-limits in a virology lab? [🔉], Kelsey Piper discusses the Proposed biosecurity oversight framework for the future of science, a new set of guidelines released by the National Science Advisory Board for Biosecurity (NSABB) that seeks to change how research with the potential to cause a pandemic is evaluated.2
Arielle D'Souza’s How to reuse the Operation Warp Speed model [🔉] claims that Operation Warp Speed's highly successful public-private partnership model could be reused to jumpstart a universal coronavirus or flu vaccine, or the building of a resilient electrical grid.
Elika Somani shares some Advice on communicating in and around the biosecurity policy community [🔉].
Our Common Agenda, a United Nations report published in late 2021, proposed that states should issue a Declaration on Future Generations. In Toward a declaration on future generations [🔉], Thomas Hale, Fin Moorhouse, Toby Ord and Anne-Marie Slaughter consider how such a declaration should be approached and what it should contain.
In Technological developments that could increase risks from nuclear weapons: A shallow review [🔉], Michael Aird and Will Aldred explore some technological developments that might occur and might increase risks from nuclear weapons, especially risks to humanity's long-term future.
Christian Ruhl’s Call me, maybe? Hotlines and global catastrophic risk [🔉], a shallow investigation by Founders Pledge, looks at the effectiveness of direct communications links between states as interventions to mitigate global catastrophic risks.
In The open agency model [🔉], Eric Drexler proposes an "open-agency frame" as the appropriate model for future AI capabilities, in contrast to the "unitary-agent frame" the author claims is often presupposed in AI alignment research.
Riley Harris summarizes two papers by the Global Priorities Institute: Longtermist institutional reform [🔉] by Tyler John & William MacAskill, and Are we living at the hinge of history? [🔉] by MacAskill.
Juan Cambeiro’s What comes after COVID? lays out some well-reasoned forecasts about pandemic risk. Cambeiro assigns a 19% chance to another pandemic killing 20M+ people in the next decade; and conditional on this happening, the most likely causes are a flu virus (50%) or another coronavirus (30%).

News

OpenAI announced the launch of GPT-4, "a large multimodal model, with our best-ever results on capabilities and alignment". (See discussion on LessWrong).
- The model has been made available via the ChatGPT interface (to paid users).
- OpenAI shared an early version with Paul Christiano’s Alignment Research Center to assess the risks of power-seeking behavior, particularly focussed on its ability “to autonomously replicate and gather resources”. (Detailed in the accompanying paper).
Google made a $300m investment in Anthropic [🔉].
Holden Karnofsky is taking a leave of absence from Open Philanthropy to work on AI safety. He plans to work on third-party evaluation and monitoring of AGI labs. Alexander Berger moves from co-CEO to CEO.
Monmouth poll found 55% of Americans worried about AI posing an existential risk; only 9% think AI will do more good than harm.
The Elders, the organization of world leaders founded by Nelson Mandela, announced a new focus on existential risk reduction.
Putin suspended Russia’s participation in the New START arms control treaty [🔉].
The US issued a declaration on the responsible use of military AI [🔉].
The Global Fund is awarding an additional $320 million to support immediate COVID-19 response and broader pandemic preparedness.
The Flares, a French YouTube channel and podcast that produces animated educational videos, released the third part of its series on longtermism.
A “Misalignment Museum”, imagining a post-apocalyptic world where AGI has destroyed most of humanity, recently opened in San Francisco.

Opportunities

Open Philanthropy announced a contest to identify novel considerations with the potential to influence their views on AI timelines and AI risk. A total of $225,000 in prize money will be distributed across the six winning entries.
The Centre for Long-Term Resilience is hiring an AI policy advisor. Applications are due April 2nd. Apply now.
Applications are open for New European Voices on Existential Risk (NEVER), a project that aims to attract talent and ideas from wider Europe on nuclear issues, climate change, biosecurity and malign AI. Apply now.
Sam Bowman is planning to hire at least one postdoctoral research associate or research scientist to start between March and September 2023 on language model alignment. Apply now.
The General Longtermism Team at Rethink Priorities is currently considering creating a "Longtermist Incubator" program and is accepting expression of interest submissions for a project lead/co-lead to run the program if it’s launched.

Audio & video

Gus Docker from the Future of Life Institute Podcast interviewed Tobias Baumann on suffering risks, artificial sentience, and the problem of knowing which actions reduce suffering in the long-term future [🔉].
Jen Iofinova from the Cohere For AI podcast interviewed Victoria Krakovna on paradigms of AI alignment.
Luisa Rodríguez from the 80,000 Hours Podcast interviewed Robert Long on why LLMs like GPT (probably) aren’t conscious [🔉].
Rational Animations published The power of intelligence, based on the Eliezer Yudkowsky’s article.
Daniel Filan interviewed John Halstead on why climate change is not an existential risk [🔉].
The Bankless podcast interviewed Eliezer Yudkowsky on AGI ruin [🔉]. A transcript of the interview is available here.
A new AI podcast hosted by Nathan Labenz and Erik Torenberg launched: The Cognitive Revolution.

Newsletters

AI Safety News February 2023: Unspeakable tokens, Bing/Sydney, Pretraining with human feedback
Import AI #321: Open source GPT3; giving away democracy to AGI companies; GPT-4 is a political artifact
ChinAI #216: Around the Horn (10th edition)
The EU AI Act Newsletter #25
European AI Newsletter #82: Europe's Digital Decade

Conversation with Tom Davidson

To read our conversation with Tom on AI takeoff speeds, please go to the version of this issue crossposted on the Effective Altruism Forum.

We thank Leonardo Picón and Lyl Macalalad for editorial assistance.

See also our conversation with Long in FM#3; and the 80k podcast episode mentioned below.

See also coverage of the debate in Nature [🔉].