Personal and Psychological Dimensions of AI Researchers Confronting AI Catastrophic Risks

On May 31st, 2023, a BBC web page headlined: “AI ‘godfather’ Yoshua Bengio feels ‘lost’ over life’s work.” However, this is a statement I never made, not to the BBC nor to any other media. I did attempt, in these interviews, to articulate a psychological challenge I was contending with. I aim to offer here a more in-depth explanation of my own inner searching and tie it to a possible understanding of the pronounced disagreements among top AI researchers about major AI risks, particularly the existential ones. We disagree strongly despite being generally rational colleagues that share humanist values: how is that possible? I will argue that we need more humility, acceptance that we might be wrong, that we are all human and hold cognitive biases, and that we have to nonetheless take important decisions in the context of such high uncertainty and lack of consensus.

What I actually said in the BBC interview was: “It is challenging, emotionally speaking, for people who are inside, especially if like myself, you’ve built your career, your identity around the idea of bringing something useful and good to society and to humanity and to science”. What I meant was that I was in the process of changing my mind about something very personal: whether my work in its current trajectory – racing to bridge the gap between the state of the art in AI and human-level intelligence – was aligned with my values. Is it actually beneficial or dangerous to humanity, given the current structure of society and the fact that powerful tools like AI are dual-use, meaning that they can be used for good or for bad? I am now concerned that we may not be on a good course, and that in order to reap the benefits of AI and avoid catastrophic outcomes, major changes are needed to better understand and mitigate the risks.

For most of my research career, which started in 1986, my focus has been purely scientific: understanding the principles of intelligence, how it works in biological entities, and how we might construct artificial intelligences. I worked on the hypothesis that a few simple principles, by analogy to physics, may explain intelligence. The last few decades have provided evidence in favor of that hypothesis, rooted in the ability to learn from data and experience. Learning principles are much simpler than the massive complexity in intelligent entities – for example the billions or trillions of parameters in very large neural networks. For most of these years, I did not think about the dual-use nature of science because our research results seemed so far from human capabilities and the work was only academic. It was a pure pursuit of knowledge, beautiful, but mostly detached from society until about a decade ago. I now believe that I was wrong and short-sighted to ignore that dual-use nature. I also think I was not paying enough attention to the possibility of losing control to superhuman AIs.

In the last decade, AI has moved from a mostly academic endeavor to one with a large and now dominant industrial component. Deep learning technology is being deployed more and more, with the perspective that AI may become the heart of future economic growth in the coming decades (see the approximately 20 trillion dollars of yearly impact estimated by McKinsey or Stuart Russell’s estimate of 14 quadrillion dollars of net present value in his 2019 book). This transition has enticed me to think much more about the social impact of AI. I focused on the good it could bring – and the good it already does bring – to the world. I continued working on fundamental questions, like attention (which gave rise to Transformers, which power LLMs), and worked on applications of AI in biomedical sciences, in drug discovery, in fighting climate change, and in addressing biases that could yield discrimination. I started a discussion with my colleagues from the social sciences and humanities on the necessity of ethical and legislative guardrails. This led to the Montreal Declaration for a Responsible Development of AI in 2017-2018, my leadership of the Global Partnership on AI working group on Responsible AI (2020-2022), and our recent work with UNESCO on the Missing Links on AI Governance (2023).

I read and evaluated a preprint version of Stuart Russell’s 2019 book (Human Compatible), which increased my awareness of a possible existential risk for humanity if we do not maintain control over superhuman AI systems. I understood his arguments about the potential danger of misalignment between human intentions and AI behavior intellectually, but had not fully digested what it meant emotionally for me and my own career. Reading the book had not changed my fundamental belief that, in balance, we were on a good trajectory with AI research: good for science and good for society with expected positive impacts in many domains. I thought the outcome would be positive with some adjustments, like regulations to avoid discrimination and banning lethal autonomous weapons, and while existential concerns were worthy of attention, they would become potentially relevant only in a distant future. I still continued to feel good about the focus of my work, somehow looking the other way when it came to the possibility of misuse and catastrophic outcomes.

At that time, I believed the arguments, still commonly held in the AI community, for dismissing the importance of such risks. Human-level AI seemed plausible but decades to centuries away, and the systems we trained in our labs were so incompetent relative to humans that it was difficult to feel a threat, whether from misuse or loss of control. It seemed obvious that well before we would get to that point, we would reap loads of social benefits from deploying current and improved AI systems. Since human-level AI seemed so far away, we imagined it as probably quite different from current methods, suggesting that it was difficult to design safety mechanisms for yet unknown AI systems. It never even crossed my mind to question whether some knowledge could be dangerous or whether some code should not be put in everyone’s hands. I thought it was a good thing that some people studied AI safety, but was quite content to continue on my course of trying to figure out how to bring system 2 – i.e., deliberate reasoning – abilities into deep learning, a research program that I started almost a decade ago with our work on attention.

When ChatGPT came out, my immediate reaction was to look for its failures. Like many others, I found corner cases where it produced incoherent output, suggesting that it was still way off regarding system 2 abilities. However, within a month or two of the release, I grew more and more impressed by how well it performed. I started to realize that top AI systems had for the most part achieved mastery of language, i.e., were essentially passing the Turing test, at some statistical threshold. That was completely unexpected, for me and many others. I also saw that although ChatGPT would sometimes confabulate and be inconsistent with facts and its own outputs, it was able in most cases to produce the appearance of reasoning. When GPT-4 came out, progress on system 2 abilities was apparent. Yet, it is likely that nothing fundamentally changed in the underlying design principles besides increasing compute power and training for longer or on better/more data. On the other hand, several arguments have been made about missing ingredients for system 2 abilities, including in my own papers.

Since I had been working for over two years on a new approach to train large neural networks that could potentially bridge the system 2 gap, it started to dawn on me that my previous estimates of when human-level AI would be reached needed to be radically changed. Instead of decades to centuries, I now see it as 5 to 20 years with 90% confidence.

And what if it was, indeed, just a few years?

The other factor besides when is what capabilities to expect, nicely explained by Geoff Hinton in his Cambridge talk in May: even if our AI systems only benefit from the principles of human-level intelligence, from there we will automatically get superhuman AI systems. This is because of the advantages of digital hardware over analog wetware: exact calculations and much greater bandwidth between computers enable knowledge transfer between models many orders of magnitude faster than possible by humans. For example, spoken language between humans can transmit 39 bits/second. The much faster communication between computers enables a form of parallelism making it possible for AI systems to learn much faster and from more data.

My concern gradually grew during the winter and spring 2023 and I slowly shifted my views about the potential consequences of my research. I decided to sign the letter asking for more caution about systems more powerful than GPT-4. I realized such LLMs had digested a lot of knowledge about society and humans that could one day be exploited by bad actors in ways that could be catastrophic, namely to democracy, public safety and national security. A crucial technical element here is that prompting and fine-tuning could turn an apparently innocuous system into one targeted at malicious intent, as well as lower the required level of technical skills to do so. Moreover, such a transformation could be done at almost no cost, and with a minimal amount of data. One could also cheaply transform a dialogue system into a goal-directed agent that could act on the Internet, as was illustrated by AutoGPT. This brought to the forefront the possibility that, within a few years, we could have catastrophic outcomes enabled by more powerful AIs, either because of carelessness, malicious human intent, or by losing control of highly autonomous systems.

I started reading more about AI safety and came to a critically important conclusion: we do not yet know how to make an AI agent controllable and thus guarantee the safety of humanity! And yet we are – myself included until now – racing ahead towards building such systems.

It is difficult to digest such reflections and carry out the mindset shift that they entail. It is difficult because accepting the logical conclusions that follow means questioning our own role, the value of our work, our own sense of value. Most of us want to see ourselves as fundamentally good, with a virtuous purpose, contributing to something beneficial for society and our community. Having a positive self-image makes us feel good about our work and gives us the motivation and energy to move forward. It is painful to face the idea that we may have been contributing to something that could be greatly destructive. Human nature will lead us towards brushing aside these thoughts or finding comfort in reassuring arguments rather than facing the full horror of such possibilities. Bringing the benefits of AI to the table is not sufficient to compensate if the possible negative outcomes include catastrophic misuses of AI on par with nuclear war and pandemics, or even existential risk.

I have a 20-month-old grandchild whom I love very much, and he is very present in my thoughts and emotions. While the future is filled with uncertainty, and I don’t presume to know how any of this will play out, I cannot rationally reject the catastrophic possibilities nor ignore the deep sense of empathy I feel for him and for the multitudes whose lives may be deeply affected or destroyed if we continue denying the risks of powerful technologies. It is truly horrible to even entertain these thoughts and some days, I wish I could just brush them away. Or, be like before 2023, when these thoughts did not have a stronghold on my conscious mind.

My own journey these past months has made me curious about the psychological factors at play as we all wrestle with this new reality and debate scenarios or probabilities. I recognize cognitive biases are most likely still involved in my own thinking and decision making, as is most often the case for humans in general, including AI researchers, despite our best intentions. And, I have a sincere desire to understand why there is so much disagreement amongst AI researchers – almost all of whom are incredibly smart and devoted – about the magnitude of risk and best course of action. How can this be? And how do we find the common ground from which to move forward together to ensure AI serves the future of humankind?

As we argue – in public and in person (not to mention the terribly polarizing social media) – about these difficult questions, I believe that we should all keep in mind the possibility of psychological factors such as confirmation or self-serving biases, and be careful to avoid making overconfident statements. People on both sides of this debate who have taken strong positions (including myself of course!) are encouraged to explore underlying mindsets and emotions behind their certainty in the face of such troubling questions. Curiosity, openness and humility will enhance our ability to explore different viewpoints and hold a more compassionate view, rather than polarizing the discussion and fueling frustration or anger towards the people we disagree with.

Being able to change one’s views when faced with new evidence or new arguments is essential for the advancement of science, as well as to steer society towards a beneficial future. The more curious and interested we are in our mistakes, the more we learn, grow, evolve and broaden our capacity to impact others and the world in a positive manner. As AI researchers, we must honor this commitment to ongoing exploration and avoid painting ourselves as staunch advocates of a single view. The tendency to overcommit to specific viewpoints despite a high degree of true uncertainty is reminiscent of how, in machine learning, different maximum likelihood world models fitted on the same data may strongly disagree in places where epistemic uncertainty is large. Being able to accept that we have been wrong, for ourselves and in the eyes of others, is difficult but necessary to make scientific progress and converge towards a morally just path. Interestingly, having the humility to accept that we may still be wrong corresponds to adopting the Bayesian approach of aggregating all the views, including those we disagree with, so long as they are consistent with facts and logic. Expressing certainty of upcoming doom or ridiculing others’ views as science-fiction is, on the contrary, not compatible with this Bayesian open-mindedness.

Before nuclear power and spaceflight were realized, they too were science fiction. As noted by Allan Dafoe, being cautious means very different things for scientific inquiry and decision-making. This difference was reflected in the thinking of physicists Leo Szilard and Enrico Fermi. Szilard wrote: “From the very beginning [1939] the line was drawn […] Fermi thought that the conservative thing was to play down [his 10%] possibility that [a nuclear chain reaction] may happen, [Szilard] thought the conservative thing was to assume that it would happen and take all the necessary precautions.”

Unfortunately, to settle the AI debate, we can’t rely on mathematical models of how research, technology and politics are likely to evolve in the next decade under different interventions. We don’t have past experience interacting with machines more intellectually capable than us, and thus cannot obtain statistics of what is safe and what isn’t. Yet we cannot wait until irreversible damage is done in order to change course. AI researchers are used to easily performing many experiments, including controlled experiments, and statistical assessments before drawing conclusions. Here we instead have to resort to a form of reasoning and out-of-distribution projection that is closer to how many of our colleagues in the social sciences work. This makes it harder and more uncertain to evaluate possible futures. However, reason and compassion can still be used to guide our conversations and actions. As scientists, we should avoid making claims we can’t support; but as decision-makers we also ought to act under uncertainty to take precautions. In spite of our differences in points of view, it’s time for our field of AI to seriously discuss the questions: what if we succeed? What if potentially dangerous superhuman AI capabilities are developed sooner than expected? Let’s embrace these challenges and our differences, while being mindful of each other’s humanity and our unique emotional and psychological journeys in this new era of AI.

Acknowledgements. Yoshua Bengio thanks Valerie Pisano, Soren Mindermann, Jean-Pierre Falet, Niki Howe, Nasim Rahaman, Joseph Viviano, Mathieu Bourgey, David Krueger, Eric Elmoznino, Claire Boine, Victor Schmidt, Xu Ji, Anja Surina, Almer van der Sloot, and Dianbo Liu.