Research
AI Safety Research
At the beginning of 2023, I began pivoting my research toward AI safety, away from my earlier focus on machine learning aimed at increasing AI capabilities and on applying AI for the benefit of society. I asked myself: what could go wrong as AI approaches or surpasses human-level intelligence, and how do we design AI so that it is honest and does not harm humans in the first place?
See this paper for an outline of my long-term research vision for constructing safe-by-design AI, which I call the Scientist AI. Recent observations show growing tendencies toward deception, cheating, hacking, lying and, even more concerning, self-preservation in frontier AIs. More broadly, we do not know how to ensure that AIs will not violate our instructions, which also means that humans with bad intentions can misuse them. All of this points to potentially catastrophic risks from misaligned, highly capable and increasingly autonomous agentic AIs in the future. The main training signals in current frontier AIs all give rise to uncontrolled and misaligned agency, whether from imitating people (the pre-training of current LLMs) or from pleasing people (current RLHF).
Instead, the Scientist AI is trained to understand, explain and predict, like a selfless, idealized and platonic scientist. See LawZero's Research page for more details.
I am looking for research scientists and research engineers who would like to join me in this quest. Please write to me if you are motivated by finding technical solutions to AI risks.
Note that I am not taking new students: I am reducing the size of my (very large) group so that I can dedicate myself to this project.
Past Research
In the past, I worked on learning deep representations (both supervised and unsupervised); capturing sequential dependencies with recurrent networks and other autoregressive models (including the first neural net language models); understanding credit assignment (including the quest for biologically plausible analogues of backprop, as well as end-to-end learning of complex modular information-processing assemblies); meta-learning (or learning to learn); attention mechanisms (the key ingredient behind the success of Transformers); deep generative models of many kinds; curriculum learning; variants of stochastic gradient descent and why SGD works for neural nets; convolutional architectures; natural language processing (especially word embeddings, language models and machine translation); and understanding why deep learning works so well and what its current limitations are. I have also worked on many applications of deep learning, including – but not limited to – healthcare (such as medical image analysis and drug discovery), the standard AI tasks of computer vision, speech and language modeling and, more recently, robotics.