#3: digital sentience, AGI ruin, and forecasting track records

Jul 04, 2022

We could thus imagine, as an extreme case, a technologically highly advanced society, containing many complex structures, some of them far more intricate and intelligent than anything that exists on the planet today — a society which nevertheless lacks any type of being that is conscious or whose welfare has moral significance. In a sense, this would be an uninhabited society. It would be a society of economic miracles and technological awesomeness, with nobody there to benefit. A Disneyland without children.
— Nick Bostrom

Future Matters is a newsletter about longtermism brought to you by Matthew van der Merwe and Pablo Stafforini. Each month we collect and summarize longtermism-relevant research and share news from the longtermism community. The version crossposted to the Effective Altruism Forum includes a bonus conversation with a prominent longtermist. You can also listen on your favorite podcast platform and follow on Twitter.

Research

Google engineer Blake Lemoine believes that one of the company’s powerful language models, LaMDA, should be considered a person. He formed this impression from extensive dialogue with the model (see transcript). Lemoine has gone public, having been placed on leave after raising concerns internally at the company (see his interviews in the Washington Post and WIRED). Robert Long’s Lots of links on LaMDA provides an excellent summary of the saga and the ensuing discussion. We concur with Nick Bostrom’s assessment: “With recent advances in AI (and much more to come before too long, presumably) it is astonishing how neglected this issue still is.” (To read our conversation with Robert Long about digital minds, see the version of this issue crossposted on the Effective Altruism Forum.)

Garrison Lovely’s Do we need a better understanding of 'progress'? examines progress studies, a nascent intellectual movement focused on understanding the roots of technological progress in order to speed it up. The piece includes some interesting discussion of the points of disagreement between progress studies and longtermism, which mostly center around attitudes to risk.1

Ollie Base notes that Things usually end slowly when it comes to mass extinction events (~millions of years) and the collapse of empires (decades to centuries). On this basis, he updates slightly towards existential risks happening over long timescales. As Base and several commenters point out, this isn’t a great reference class for risks from new technologies (AI, engineered pandemics, nuclear war), which constitute most of the total existential risk. Nevertheless, this sort of reference class forecasting is an important input for reasoning about unprecedented events like existential catastrophes.

Eliezer Yudkowsky’s AGI ruin: a list of lethalities has caused quite a stir. He recently announced that MIRI had pretty much given up on solving AI alignment (Edit: Rob Bensinger clarifies in the comments that "MIRI has [not] decided to give up on reducing existential risk from AI."). In this (very long) post, Yudkowsky states his reasons for thinking that humanity is therefore doomed. His “list of lethalities” is structured into three sections: a first section on general worries about AGI (such as that humans must solve alignment on the first try, or that they must solve this problem within a time limit); a second section on technical difficulties related to the current deep learning paradigm; and a third section on the state of the field of AI safety. Yudkowsky’s pessimistic conclusion, very succinctly, is that everyone else fundamentally misunderstands the challenge of AI alignment and that none of the existing AI safety approaches have any hope of working.

Paul Christiano responds to Yudkowsky’s piece in Where I agree and disagree with Eliezer. There is agreement over much of the high-level picture of things: catastrophically risky AI systems could exist soon and without any warning; many current safety approaches are not aimed at the important problems; no current approaches would work without significant iteration and improvement; and humanity has routinely failed to solve easier coordination problems than those we might have to solve to avoid AI catastrophe. However, Christiano disagrees with Yudkowsky’s bleak assessment of AI safety, and sees him as misunderstanding how research progress is made. In his words, Yudkowsky “generalizes a lot from pessimism about solving problems easily to pessimism about solving problems at all.” Broadly speaking, Christiano believes Yudkowsky is overly confident in many of his claims and fails to engage productively with opposing views.

Ben Garfinkel's On deference and Yudkowsky's AI risk estimates argues that, in forming their views about AI risk, people defer to Yudkowsky to a degree not warranted by his informal track record in technological forecasting. Garfinkel focuses on a set of dramatic forecasts by Yudkowsky that either turned out wrong or appear overconfident in hindsight. Although we broadly agree with Garfinkel's conclusions, and suspect a more systematic examination of Yudkowsky’s pronouncements would vindicate his overall assessment, we thought that some of the objections raised in response to it were plausible, especially concerning the post’s methodology. See also Garfinkel's post-discussion reflections.

Holden Karnofsky's The track record of futurists seems … fine discusses a report by Gavin Leech and Misha Yagudin examining the forecasting track record of the “Big Three” of science fiction: Isaac Asimov, Arthur C. Clarke, and Robert Heinlein. A common response to arguments highlighting the importance of the long-term future, including Karnofsky's own arguments in the “most important century” blog post series, is that they tacitly assume that we can make reliable long-range forecasts. However, this objection would be weakened if it turned out that previous futurists had in fact performed reasonably well at making resolvable predictions. And this is broadly what Karnofsky concludes from the report: although Heinlein “looks pretty unserious and inaccurate”, Asimov “looks quite impressive”, and Clarke “seems pretty solid overall”. Check out the original report by Leech and Yagudin here; they offer $5 per cell they update in response to feedback.

Scott Aaronson, a renowned theoretical computer scientist, explains why he’s moving into AI safety. Aaronson is taking a one-year sabbatical to join the safety team at OpenAI, where he will work on applying complexity theory to understand the foundations of AI alignment better. He compares the “dramatic reversal” in his views to those of Eliezer Yudkowsky: Aaronson had previously been skeptical that there was valuable work to be done on AI safety, but has now become more optimistic about the prospects. As discussed above, Yudkowsky has moved in the opposite direction; having been amongst the first and loudest advocates for AI safety, he has recently become despondent and fatalistic about humanity’s prospects.

Derek Shiller's The importance of getting digital consciousness right discusses a type of existential catastrophe characterized by the replacement of conscious biological beings with unconscious digital minds. A likely early force driving the emergence of artificial sentience is human demand for companionship, in the form of digital pets, friends, or romantic partners. But humans attribute conscious states based on folk intuition, not sophisticated theories of consciousness. Plausibly, such folk intuitions track phenomenal consciousness imperfectly, and could be manipulated by a sufficiently advanced intelligence. If building digital consciousness turns out to be a major technical challenge, creating systems that appear conscious to humans may be easier than creating systems that are in fact conscious. And as these systems become increasingly integrated into human social networks, the view that they lack consciousness will become increasingly hard to defend, locking humans into a trajectory in which the future intelligent population consists mostly of minds devoid of phenomenal consciousness.

Konstantin Pilz's Germans' opinions on translations of “longtermism” reports the results of a small MTurk survey conducted to inform how 'longtermism' should be translated into German. Pilz found that, although respondents preferred using the original English word to coming up with a German equivalent, they rated Zukunftsschutz (“future protection”) slightly above longtermism (which itself was rated above all the other proposed German translations). Pilz’s survey may serve as a model for effective altruists interested in translating longtermist content into other languages, while there is still time to influence what term is established as the canonical translation.

Holden Karnofsky's AI could defeat all of us combined argues that, if advanced AI systems decided to destroy or permanently disempower humanity, they might succeed.2 Karnofsky recapitulates the standard argument that a process of recursive self-improvement could bring about AI systems with cognitive capabilities vastly exceeding those of any human, and then notes that AIs could defeat humans even in the absence of such superintelligence. AI systems with roughly human-level ability could already pose an existential threat because they would vastly outnumber humans. Since training an AI is so much more costly than running it, by the time the first human-level AI system is trained, private firms will likely have the resources to run hundreds of millions of copies each for a year. And since these systems can do human-level work, they could generate resources to multiply their number even further. As Karnofsky summarizes, "if there's something with human-like skills, seeking to disempower humanity, with a population in the same ballpark as (or larger than) that of all humans, we've got a civilization-level problem."

Nick Beckstead's Future Fund June 2022 update, the first public update on that organization's grantmaking, describes the Future Fund’s activities and the lessons learned from the funding models tested so far. Since launching, the Future Fund has made 262 grants and investments amounting to $132 million, which already exceeds the $100 million lower target announced four months ago. Around half of this funding ($73 million) was allocated via staff-led grantmaking, while the remaining half came, in roughly equal parts, from regranting ($31 million) and open calls ($26 million). Beckstead and the rest of the Future Fund team have so far been most excited about the regranting program, which they believe has resulted in funding for many people and projects which would otherwise have remained unfunded. By contrast, they report being less excited about the open call, primarily because of the high time costs associated with the program. Back in April, we created a Metaculus question on whether the Future Fund will outspend Open Philanthropy in 2022, and these recent developments suggest a positive resolution: the Future Fund’s grantmaking volume over the past four months is over 2.5 times Open Philanthropy’s longtermist grantmaking volume (~$51.5 million) since the year began.3

Oxford's Radcliffe Camera as re-imagined by DALL·E 2, by Owain Evans

News

Rob Wiblin and Luisa Rodríguez interviewed Lewis Dartnell on ways humanity can bounce back faster in a post-apocalyptic world for the 80,000 Hours Podcast. Rob also interviewed Nova DasSarma on why information security may be critical for AI safety.

The Global Priorities Institute published a summary of Andreas Mogensen’s Staking our future: deontic long-termism and the non-identity problem.

NYU announced a new research program on the moral, legal, and political status of nonhumans, with a special focus on digital minds. The Mind, Ethics, and Policy Program launches in Fall 2022, and will be directed by Jeff Sebo (see also: Sebo’s Twitter thread).

This summer, in collaboration with DC-based policy professionals, the Stanford Existential Risks Initiative (SERI) is organizing a second virtual speaker series on US policy careers. Sign up to receive further information and event access here.

The Institute for Progress, Guarding Against Pandemics, and Metaculus jointly launched the Biosecurity Forecasting Tournament, a multi-year competition designed to deliver trustworthy and actionable forecasts on biological risks to public health policymakers.

Fin Moorhouse reads his space governance profile (summarized in FM#0) for the 80k After Hours podcast.

Thomas Woodside and Dan Hendrycks published the fifth post in a series describing their models for Pragmatic AI Safety.

Jaime Sevilla, Tamay Besiroglu and the rest of the team announced the launch of Epoch, a research initiative investigating trends in machine learning and forecasting the development of transformative artificial intelligence.

Fin Moorhouse and Luca Righetti interviewed Ajay Karpur on metagenomic sequencing for Hear This Idea.

Kurzgesagt, a German animation and design studio, published an impressive—and impressively fact-checked—video introducing the core longtermist ideas to a popular audience. As of this writing, the video has received over 4 million views.

Jason Gaverick Matheny, previously Founding Director of the Center for Security and Emerging Technology and Director of the Intelligence Advanced Research Projects Activity, was named president and CEO of RAND Corporation.

Vael Gates published a comprehensive list of AI safety resources for AI researchers, as well as a talk discussing risks from advanced AI.

The Legal Priorities Project is running a writing competition to provide practical guidance to the US federal government on how to incorporate existential and catastrophic risks into agency cost-benefit analysis. They plan to distribute $72,500 in prize money for up to 10 prizes. Submissions are due July 31st. Apply now.

Open Philanthropy opened applications for the second iteration of the Open Philanthropy Undergraduate Scholarship, a program that aims to provide support for promising and altruistically-minded students hoping to start an undergraduate degree at top US or UK universities. Applications are due August 15th. Apply now.

Fønix Logistics is recruiting a team with backgrounds in disaster response, physical security, and physical design to join a project to build biological weapons shelters.

Nick Bostrom’s appearance on the Swedish radio program Sommar in P1 is now available with English subtitles, thanks to Julia Karbing.

80,000 Hours is conducting a census of people interested in doing longtermist work.

Michael Aird published a collection of resources for people interested in EA and longtermist research careers.

Conversation with Robert Long

See the version of this issue crossposted on the Effective Altruism Forum.

We thank Leonardo Picón for editorial assistance.

1. See also Max Daniel’s Progress studies vs. longtermist EA: some differences.

2. Karnofsky doesn't argue for the truth of the antecedent in this post.

3. Because of data lags, however, we may be significantly underestimating Open Philanthropy’s spending so far.