#6: FTX collapse, value lock-in, and counterarguments to AI x-risk
[T]he sun with all the planets will in time grow too cold for life, unless indeed some great body dashes into the sun and thus gives it fresh life. Believing as I do that man in the distant future will be a far more perfect creature than he now is, it is an intolerable thought that he and all other sentient beings are doomed to complete annihilation after such long-continued slow progress.
— Charles Darwin
Future Matters is a newsletter about longtermism and existential risk by Matthew van der Merwe and Pablo Stafforini. Each month we curate and summarize relevant research and news from the community. The version crossposted to the Effective Altruism Forum includes a bonus conversation with a prominent researcher. You can also listen on your favorite podcast platform and follow us on Twitter. Future Matters is also available in Spanish.
A message to our readers
Welcome back to Future Matters. We took a break during the autumn, but will now be returning to our previous monthly schedule. Future Matters would like to wish all our readers a happy new year!
The most significant development during our hiatus was the collapse of FTX and the fall of Sam Bankman-Fried, until then one of the largest and most prominent supporters of longtermist causes. We were shocked and saddened by these revelations, and appalled by the allegations and admissions of fraud, deceit, and misappropriation of customer funds. As others have stated, fraud in the service of effective altruism is unacceptable, and we condemn these actions unequivocally and support authorities’ efforts to investigate and prosecute any crimes that may have been committed.
Artificial general intelligence and lock-in [🔉], by Lukas Finnveden, C. Jess Riedel and Carl Shulman, considers AGI as enabling, for the first time in history, the creation of long-lived and highly-stable institutions with the capacity to pursue a variety of well-specified goals. The authors argue that, if a significant fraction of the world’s powers agreed to establish such institutions and empower them to defend against external threats, they could arise and subsist for millions, or even trillions, of years. We found this report to be one of the most important contributions to longtermist macrostrategy published in recent years, and we hope to feature a conversation with one of the authors in a future issue of the newsletter.
A classic argument for existential risk from superintelligent AI goes something like this: (1) superintelligent AIs will be goal-directed; (2) goal-directed superintelligent AIs will likely pursue outcomes that we regard as extremely bad; therefore (3) if we build superintelligent AIs, the future will likely be extremely bad. Katja Grace’s Counterarguments to the basic AI x-risk case [🔉] identifies a number of weak points in each of the premises in the argument. We refer interested readers to our conversation with Katja for more discussion of this post, as well as to Erik Jenner and Johannes Treutlein’s Responses to Katja Grace’s AI x-risk counterarguments [🔉].
The key driver of AI risk is that we are rapidly developing more and more powerful AI systems, while making relatively little progress in ensuring they are safe. Katja Grace’s Let’s think about slowing down AI [🔉] argues that the AI risk community should consider advocating for slowing down AI progress. She rebuts some of the objections commonly levelled against this strategy: e.g. to the charge of infeasibility, she points out that many technologies (human gene editing, nuclear energy) have been halted or drastically curtailed due to ethical and/or safety concerns. In the comments, Carl Shulman argues that there is not currently enough buy-in from governments or the public to take more modest safety and governance interventions, so it doesn’t seem wise to advocate for such a dramatic and costly policy: “It's like climate activists in 1950 responding to difficulties passing funds for renewable energy R&D or a carbon tax by proposing that the sale of automobiles be banned immediately. It took a lot of scientific data, solidification of scientific consensus, and communication/movement-building over time to get current measures on climate change.”
We enjoyed Kelsey Piper's review of What We Owe the Future [🔉], not necessarily because we agree with her criticisms, but because we thought the review managed to identify, and articulate very clearly, what we take to be the main crux between the longtermist EAs who liked the book and those who, like Piper, had major reservations about it: "most longtermists working in AI safety are worried about scenarios where humans fail to impart the goals they want to the systems they create. But MacAskill thinks it's substantially more likely that we'll end up in a situation where we know how to set AI goals, and set them based on parochial 21st century values—which makes it utterly crucial that we improve our values so that the future we build upon them isn't dystopian."
In How bad could a war get? [🔉], Stephen Clare and Rani Martin ask what the track record of conflict deaths tells us about the likelihood of wars severe enough to threaten human extinction. They conclude that history provides no strong reason to rule out such wars, particularly given the recent arrival of technologies with unprecedented destructive potential, e.g. nuclear weapons and bioweapons.
Catastrophes that fall short of human extinction could nonetheless reduce humanity to a pre-industrial or pre-agricultural state. In What is the likelihood that civilizational collapse would cause technological stagnation? [🔉], Luisa Rodriguez asks whether we would ever recover from this sort of setback. The past provides some comfort: in humanity’s short history, agriculture was independently invented numerous times, and the industrial revolution followed just 10,000 years later. Given this, it would be surprising if it were extremely hard to do it all over again. Moreover, a post-collapse humanity would likely have materials and knowledge left over from industrial civilisation, placing it at an advantage relative to our hunter-gatherer ancestors. On the other hand, certain catastrophes could make things more difficult, such as extreme and long-lasting environmental damage. All things considered, Rodriguez thinks that humanity has at least a 97% chance of recovering from collapse.
In The Precipice, Toby Ord proposed a grand strategy of human development involving the attainment of existential security—a stable state of negligible existential risk—followed initially by a "long reflection" and ultimately by the full realization of human potential. Ord's recent contribution to the UN Human Development Report [🔉] focuses on the first of these three stages and considers specifically the institutions needed for existential security. Ord's answer is that what is needed are international institutions with outstanding forecasting expertise, strong coordinating ability, and a great deal of buy-in.
In A lunar backup record of humanity, Carson Ezell, Alexandre Lazarian and Abraham Loeb offer an intriguing proposal for helping humanity recover from catastrophes. As part of the first lunar settlements, we should build a data storage infrastructure on the moon to keep a continuously updated backup of important materials, e.g. books, articles, genetic information, and satellite imagery. They suggest this would improve the chances that lunar settlements could rebuild civilisation in the event of a terrestrial catastrophe, and that it could work with current technologies for data storage and transmission.
Max Tegmark explains Why [he] thinks there's a one-in-six chance of an imminent global nuclear war [🔉], in light of the Russia-Ukraine War. Note that this post was published on 8th October, so the author’s current views might differ. Tegmark provides a simple probabilistic model of how the war might play out. He puts ~30% on Russia launching a nuclear strike on Ukraine, 80% on this resulting in a non-nuclear military response from NATO, and 70% on this being followed by rapid escalation leading to all-out nuclear war, for an overall probability of ~17%. See also Samotsvety’s Nuclear risk update [🔉] (from 3rd October)—they place ~16% on Russia using a nuclear weapon in the next year, and ~10% on nuclear conflict scaling beyond Ukraine in the subsequent year, resulting in a ~1.6% probability of global nuclear conflict. We applaud Tegmark and Samotsvety for making clear, quantitative forecasts on this topic.
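Both estimates are simple products of conditional probabilities. A minimal sketch of the arithmetic (our reconstruction; the variable names are ours, the probabilities are those stated in the respective posts):

```python
# Tegmark's chain of conditional probabilities (8th October figures):
p_strike = 0.30    # P(Russia launches a nuclear strike on Ukraine)
p_nato = 0.80      # P(non-nuclear NATO military response | strike)
p_escalate = 0.70  # P(rapid escalation to all-out nuclear war | NATO response)
p_tegmark = p_strike * p_nato * p_escalate

# Samotsvety's chain (3rd October figures):
p_use = 0.16       # P(Russia uses a nuclear weapon in the next year)
p_scale = 0.10     # P(nuclear conflict scales beyond Ukraine | use)
p_samotsvety = p_use * p_scale

print(f"Tegmark: ~{p_tegmark:.0%}")        # ~17%
print(f"Samotsvety: ~{p_samotsvety:.1%}")  # ~1.6%
```

Note that multiplying the stages this way assumes each step is assessed conditional on the previous one, which is how both posts present their figures.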
Marius Hobbhahn’s The next decades might be wild [🔉] is a speculative account of how the next few decades could play out if we are just 10–20 years away from transformative AI. Stefan Schubert responds, taking issue with Hobbhahn’s expectation of a muted reaction from the public and a wholly ineffective response from governments as AI systems start to run amok.
Previously, Holden Karnofsky argued that, if advanced AI systems aimed at destroying or disempowering humanity, they might succeed (see FM#3). In Why would AI "aim" to defeat humanity? [🔉], Karnofsky explains why such systems will, by default, likely adopt such an aim. From the assumptions that (1) we will soon develop powerful AI (2) in a world that is otherwise similar to today's, (3) with techniques broadly similar to those currently being used (4) that push AI systems to be ever more capable and (5) with no specific countermeasures to prevent such systems from causing an existential catastrophe, Karnofsky argues that we should expect the AI systems that emerge to behave as if they had aims; that, due to the nature of the training process, some of these aims will likely be aims humans did not intend; and that, because of that, AI systems will likely also have the intermediate aim of deceiving and ultimately disempowering humanity.
High-level hopes for AI alignment [🔉] outlines three broad approaches to AI alignment that Karnofsky finds promising. ‘Digital neuroscience’ aims to develop something like lie detection/mind-reading techniques to inspect the motivations of AI systems. We could try to develop ‘Limited AI’: systems that don’t engage in the sort of long-term general planning that seems particularly worrying. Lastly, we could develop systems of ‘AI checks and balances’ where we use AI systems to supervise each other. Karnofsky concludes that the success of any of these approaches depends to a large extent on us having enough time to develop and test them before AI systems become extremely powerful.
AI safety seems hard to measure [🔉], in turn, argues that it is hard to know whether AI safety research is actually making AI systems safer. Karnofsky offers four reasons for this conclusion. First, one cannot easily tell the difference between behaving well and merely appearing to do so. Second, it is difficult to infer how an agent will behave once they have power over you, based on how they have behaved so far, before acquiring such power. Third, current systems are not yet sophisticated enough to display the advanced cognitive abilities—such as the ability to deceive and manipulate—that we want to study. Fourth, systems expected to be vastly more capable than humans will be creatures of a very alien sort, and we just have no idea of how to prepare for our first encounter with them.
Finally, in Racing through a minefield [🔉] Karnofsky outlines a broader problem than the AI alignment problem, which he calls the AI deployment problem. This is the problem confronted by any agent who can potentially develop transformative AI and faces a tradeoff between moving fast and risking developing unsafe AI, and moving slowly and risking unsafe AI being developed by less cautious, faster-moving agents. Karnofsky likens this to a race through a minefield where each agent has an incentive to beat the others but where moving quickly endangers all agents, and offers some possible measures with the potential to make the problem less severe. Continuing with the analogy, these include charting a safe path through the minefield (alignment research), alerting others about the mines (threat assessment), moving more cautiously through the minefield (avoiding races), and preventing others from stepping on mines (global monitoring and policing).
In AI will change the world, but won’t take it over by playing “3-dimensional chess”, Boaz Barak and Ben Edelman question the standard argument for the conclusion that power-seeking AI could cause an existential catastrophe. Briefly, the authors argue that the relevant conflict is not "humans vs. AI", as the argument assumes, but rather "humans aided by AIs with short-term goals vs. AIs with long-term goals". Since AIs will have a much more decisive advantage over humans in short-term than in long-term planning ability, whether humanity will lose control over its future is much less clear than generally believed by the alignment community. Furthermore, to reduce the chance of catastrophe, the authors hold that we should focus less on general AI alignment and more on differential AI capabilities research, specifically on developing AI systems with short rather than long time horizons.
Our World in Data’s new page on artificial intelligence features five separate articles about various aspects of AI. Artificial intelligence is transforming our world [🔉] attempts to answer three questions: Why is it hard to take the prospect of a world transformed by AI seriously? How can we imagine such a world? And what is at stake as this technology becomes more powerful? The brief history of artificial intelligence [🔉] takes a look at how the field of AI has evolved in the past in order to inform our expectations about its future. Artificial intelligence has advanced despite having few resources dedicated to its development focuses on various metrics indicative of the growth of AI as a field over the past decade. AI timelines summarizes various attempts to forecast the arrival of human-level AI, including surveys of machine learning researchers, predictions by Metaculus forecasters, and Ajeya Cotra's biological anchors report. Finally, Technology over the long run tries to give an intuitive sense of how different the future may look from the present by looking back at how rapidly technology has changed our world in the past.
Dan Luu's Futurist prediction methods and accuracy [🔉] examines resolved long-range forecasts by a dozen or so prominent predictors and relies on this examination to identify forecasting techniques predictive of forecasting performance. Luu finds that the best forecasters tend to have a strong technical understanding of the relevant domains and a record of learning lessons from past predictive errors, while the worst forecasters tend to be overly certain about their methods and to make forecasts motivated by what he calls "panacea thinking", or the belief that a single development or intervention—such as powerful computers or population control—can solve all of humanity's problems.
Clarifying AI x-risk [🔉], by Zac Kenton and others from DeepMind’s AGI safety team, explores the different AI threat models—pathways by which misaligned AI could result in an existential catastrophe. They identify and categorize a number of models in the literature, finding broad agreement between researchers, and set out their team’s own threat model: AGI is developed via scaling up foundation models, fine-tuned by RL from human feedback (RLHF); in the course of training, a misaligned and power-seeking agent emerges and conceals its misalignment from developers; key decision-makers will fail to understand the risk and respond appropriately; interpretability will be hard. They note that among existing threat models, theirs seems closest to that of Ajeya Cotra (see our summary in FM#4, and our conversation with Ajeya in FM#5). The authors’ literature review, with summaries of each threat model, was published separately here.
Peter Wyg’s A theologian's response to anthropogenic existential risk offers an argument for the importance of existential risk reduction and concern for future generations grounded in Christian thought. Wyg points out, for example, that “If human history is just the beginning … then God could well bestow countless future graces: saints will be raised up, sinners will be forgiven, theologians will explore new depths, the faithful will experience new heights of spiritual experience.” We are always keen to better understand how different worldviews think about existential risk and the future, so we found this a valuable read.
Hayden Wilkinson's The unexpected value of the future argues that on various plausible modeling approaches the expected value of humanity's future is undefined. However, Wilkinson does not conclude from this result that the case for longtermism is undermined. Instead, he defends an extension of expected value theory capable of handling expectation-defying prospects without abandoning risk-neutrality.
In a previous issue of this newsletter, we noted that Scott Aaronson had joined OpenAI to work on AI safety (see FM#3). Now halfway through this project, Aaronson has given a lecture sharing his thoughts on activities so far. In the lecture, Aaronson covers his views about the current state of AI scaling; identifies eight different approaches to AI safety; and discusses the three specific projects that he has been working on. These projects are (1) statistically watermarking the outputs of large language models (so that the model's involvement in the generation of long enough strings of text can't be concealed); (2) inserting cryptographic backdoors in AI systems (allowing for an "off switch" that the AI can't disable); and (3) developing a theory of learning in dangerous environments.
Longtermism in an infinite world, by Christian Tarsney and Hayden Wilkinson, considers how the possibility of a universe infinite in potential value affects the risk-neutral, totalist case for longtermism. The authors' conclusions may be summarized as follows: (1) risk-neutral totalism can be plausibly extended to deal adequately with infinite contexts when agents can affect only finitely many locations; (2) however, we should have a credence higher than zero in hypotheses about the physical world on which individual agents can affect infinitely many locations; (3) if plausible extensions of risk-neutral totalism can also rank such prospects, the case for longtermism would likely be vindicated; (4) by contrast, the case for longtermism would be undermined if such extensions instead imply widespread incomparability.
Alasdair Phillips-Robins’s Catastrophic risk, uncertainty, and agency analysis proposes some changes to the governance of federal policymaking.
Jan Leike’s Why I’m optimistic about our alignment approach [🔉] offers some arguments in favor of OpenAI’s approach to alignment research and responses to common objections.
Jaime Sevilla, Anson Ho & Lennart Heim collect some AI forecasting research ideas [🔉].
David Roodman’s Comments on Ajeya Cotra’s draft report on AI timelines offers a critical review of Cotra’s biological anchors model.
Anders Sandberg on Cyborgs v ‘holdout humans’ [🔉] speculates on what might happen if the human species survives for a million years.
Eric Martinez and Christoph Winter’s Cross-cultural perceptions of rights for future generations finds robust popular support for increasing legal protections for future generations among respondents across six continents.
The Global Priorities Institute released new summaries of "The paralysis argument" [🔉] by Will MacAskill and Andreas Mogensen, and "Do not go gentle: why the Asymmetry does not support anti-natalism" [🔉] by Mogensen.
Longtermist political philosophy: an agenda for future research [🔉] by Jacob Barrett and Andreas Schmidt is GPI's attempt to set out longtermist political philosophy as an academic research field.
Siméon Campos’s AGI timelines in governance [🔉] lists some likely differences between worlds in which AGI is developed before and after 2030, and discusses how those differences should affect approaches to AGI governance.
Tobias Baumann’s Avoiding the Worst: How to Prevent a Moral Catastrophe [🔉] is a book-length introduction to suffering risks.
In Investing in pandemic prevention is essential to defend against future outbreaks [🔉], Bridget Williams and Will MacAskill argue that investments in pandemic preparedness are surprisingly low given the health and economic costs of the COVID-19 pandemic, and identify four promising areas for government funding: vaccines for prototype pathogens, disease surveillance using metagenomic sequencing, clean indoor-air technology, and better personal protective equipment.
How the Patient Philanthropy and Global Catastrophic Risks Funds work together [🔉], by Christian Ruhl & Tom Barnes, explains the differences and complementarities between those two funds managed by Founders Pledge.
In The socialist case for longtermism [🔉], Garrison Lovely contends that longtermism may be regarded as an extension of the socialist concern for the masses of working people, by extending this circle of compassion to an even larger group of moral patients—those yet to be born.
In AI experts are increasingly afraid of what they're creating [🔉], Kelsey Piper explains how the risks posed by AI are becoming harder to ignore as AI systems become increasingly capable and general.
Steve Byrnes’s What does it take to defend the world against out-of-control AGIs? [🔉] argues that aligned AGI would not protect humanity fully from risks posed by misaligned and power-seeking AIs, as is often assumed.
Nate Soares's Warning shots probably wouldn't change the picture much [🔉] draws that conclusion from observing the failure of people concerned with biorisk to get gain-of-function research banned in the wake of the COVID-19 pandemic.
In Parfit + Singer + aliens = ? [🔉], Maxwell Tabarrok argues that expanding the circle of moral concern to include both nonhuman and future sentients makes the value of existential risk reduction highly sensitive to one's credence in the existence of sentient life elsewhere in the universe.
Richard Fisher's Eucatastrophe [🔉] discusses J. R. R. Tolkien's proposed neologism to describe the concept of a "positive catastrophe"—an idea for which there appears to be no English word.
In AI alignment is distinct from its near-term applications [🔉], Paul Christiano worries that applying alignment techniques to train extremely inoffensive systems could undermine support for AI alignment research.
The Economist’s Should we care about people who need never exist? is perhaps the most detailed and rigorous discussion of population ethics ever to appear in a mainstream publication.
To mark the milestone in human population, Bryan Walsh asks: Are 8 billion people too many—or too few? [🔉]
John Bliss’s Existential advocacy examines the strategies being pursued by legal advocates working on mitigating existential risks and safeguarding humanity.
The Global Challenges Foundation has published their annual report on risks threatening humanity—Global Catastrophic Risks 2022: A year of colliding consequences.
Séb Krier's AI from superintelligence to ChatGPT [🔉] recounts the story of how AI systems became so capable and describes current efforts to make them safer.
In a Twitter thread, Will MacAskill lists two reasons why he rejects the objection that longtermism is just an excuse for neglecting the important problems of today’s world: that the interventions longtermists typically favor also benefit people alive today, and that prioritizing actions that seek to benefit future people has a reassuring historical track record.
Ben Cottier’s Understanding the diffusion of large language models [🔉] uses GPT-3 as a case study on the time and resources required for state-of-the-art AI breakthroughs to be replicated by other groups.
Tristan Cook and Guillaume Corlouer’s The optimal timing of spending on AGI safety work develops a quantitative model for allocating funding to AI safety over time.
Sam Clarke and Di Cooke draw some lessons for AI governance from early electricity regulation [🔉].
Hamish Hobbs, Jonas Sandbrink, and Allan Dafoe’s Differential technology development summarizes a preprint [🔉] on this approach to reducing risks from emerging technologies.
A "sequence" of posts by Jesse Clifton, Samuel Martin and Anthony DiGiovanni considers the conditions that make technical work on AGI conflict reduction effective, the circumstances under which these conditions hold, and some promising directions for research to prevent AGI conflicts.
In The Intercept, Mara Hvistendahl’s Experimenting with disaster reports on a number of shocking and previously undisclosed accidents in US biolabs working with dangerous pathogens.
Janne M. Korhonen’s Sheltering humanity against x-risk summarizes the takeaways from a recent meeting to discuss whether extremely resilient bunkers could offer humanity some protection against some existential risks.
Kevin Esvelt's Delay, detect, defend develops a framework for handling risks from biotechnology involving three distinct strategies: delay via deterrence, information denial, and physical denial; detection via reliable and sensitive untargeted sequencing; and defence via pandemic-proof protective equipment, resilient production and supply chains, diagnostics and personalized early warning, and germicidal far-UVC light.
Toby Ord's Lessons from the development of the atomic bomb considers the Manhattan Project as an instructive case study in the creation of a transformative technology.
The Blue Marble, taken by the Apollo 17 crew fifty years ago (restored by Toby Ord)
Asterisk, a quarterly EA journal, published its first issue. Highlights include an interview with Kevin Esvelt about preventing the next pandemic; an essay about the logic of nuclear escalation by Fred Kaplan; and a review of What We Owe the Future [🔉] by Kelsey Piper (summarized above).
Future Perfect published a series celebrating “the scientists, thinkers, scholars, writers, and activists building a more perfect future”.
For a few months, the Global Priorities Institute has been releasing summaries of some of their papers. If these are still too long for you, you can now read Jack Malde's "mini-summaries" [🔉].
Nonlinear is offering [🔉] $500 prizes for posts that expand on Holden Karnofsky's Most Important Century series.
The Future of Life Institute Podcast has relaunched with Gus Docker as the show's new host. We were fans of Docker's previous podcast, and have been impressed with the interviews published so far, especially the conversations with Robin Hanson on Grabby Aliens, with Ajeya Cotra on forecasting transformative AI, and with Anders Sandberg on ChatGPT and on Grand Futures.
The Nuclear Threat Initiative (NTI) has launched the International Biosecurity and Biosafety Initiative for Science (IBBIS), a program led by Jaime Yassif that seeks to reduce emerging biological risks.
The Centre on Long-Term Risk is raising funds [🔉] to support their work on s-risks, cooperative AI, acausal trade, and general longtermism. Donate here.
Will MacAskill appeared on The Daily Show discussing effective altruism and What We Owe The Future. He was also interviewed [🔉] by Jacob Stern for The Atlantic.
The Forecasting Research Institute, a new organization focused on advancing the science of forecasting for the public good, has just launched. FRI is also hiring for several roles.
Ben Snodin and Marie Buhl have compiled a list of resources relevant for nanotechnology strategy research.
Robert Wiblin interviewed [🔉] Richard Ngo on large language models for the 80,000 Hours Podcast.
Spencer Greenberg released an excellent episode on the FTX catastrophe for the Clearer Thinking podcast.
In a press release, FTX announced a "process for voluntary return of avoidable payments". See this Effective Altruism Forum post by Molly Kovite from Open Philanthropy for context and clarifications.
Giving What We Can announced the results of the Longtermist Fund's first-ever grantmaking round.
80,000 Hours published several exploratory profiles in the ‘Sometimes recommended’ category: S-risks [🔉], Whole brain emulation [🔉], Risks from malevolent actors [🔉], and Risks of stable totalitarianism [🔉].
The Survival and Flourishing Fund has opened its next application round. SFF estimates that they will distribute around $10 million this round. Applications are due on January 30. Apply now.
The Global Priorities Institute welcomes applications for Predoctoral Research Fellows in Economics. Apply now.
The Rational Animations Youtube channel has released videos on how to take over the universe in three easy steps and on whether a single alien message could destroy humanity.
The Centre for Long-Term Resilience published a response to the UK government’s new National Resilience Framework [🔉].
The Space Futures Initiative launched in September. They are seeking expressions of interest from potential organizations and individuals interested in collaborating.
Conversation with Katja Grace
To read our conversation with Katja Grace on counterarguments to the basic AI x-risk case, please go to the version of this issue crossposted on the Effective Altruism Forum.
We thank Leonardo Picón and Lyl Macalalad for editorial assistance.
See also Kelsey Piper's Future Perfect coverage of Ord's report [🔉].
The argument should be familiar to readers exposed to the standard arguments for AI risk. But even these readers may learn from this article, which makes its assumptions and conclusions unusually explicit.
By risk-neutral totalism, the authors mean an axiology defined by the conjunction of additivity (the value of an outcome is a weighted sum of the value at its locations), impartiality (all locations have the same weight in the sum), and risk-neutrality (the value of a risky option is equal to the expected value of its outcome). This axiology supplies, in the authors' opinion, the most straightforward argument for longtermism.
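Under these three conditions, the value of a prospect p can be written compactly (our notation, not the paper's):

```latex
V(p) \;=\; \mathbb{E}_{p}\!\left[\,\sum_{i \in L} v_i\,\right]
```

where L is the set of value locations and v_i the value realized at location i: summing over all locations with equal weight encodes additivity and impartiality, and taking the expectation encodes risk-neutrality.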
This summary closely follows p. 23 of the paper.
By ‘Manhattan Project’, we mean the period of 6.5 years ranging from the discovery of fission to the delivery of a working bomb, rather than the last three years of this period during which the US government became actively involved (see p. 13 of Ord’s report).