Research
AI Safety Research
At the beginning of 2023, I began pivoting my research toward AI safety, away from my earlier focus on machine learning aimed at increasing AI capabilities and on applying AI for the benefit of society. I asked myself: what could go wrong as AI approaches or surpasses human-level intelligence, and how do we design AI so that it is honest and does not harm humans in the first place?
See this paper for an outline of my long-term research vision for constructing safe-by-design AI, which I call the Scientist AI. Recent observations show growing tendencies toward deception, cheating, hacking, lying and, even more concerning, self-preservation in frontier AIs. More broadly, we do not know how to ensure that AIs will not violate our instructions, which also means that humans with bad intentions can misuse them. All of this points to potentially catastrophic risks from misaligned, highly capable and increasingly autonomous agentic AIs in the future. The main training signals in current frontier AIs all give rise to uncontrolled and misaligned agency, whether from imitating people (the pre-training of current LLMs) or from pleasing people (current RLHF).
Instead, the Scientist AI is trained to understand, explain and predict, like a selfless, idealized and platonic scientist. See LawZero's Research page for more details.
I am looking for research scientists and research engineers who would like to join me in this quest. Please write to me if you are motivated by finding technical solutions to AI risks.
Note that I am not taking new students: I am reducing the size of my (very large) group so that I can dedicate myself to this project.
Past Research
In the past, I worked on learning deep representations (both supervised and unsupervised); capturing sequential dependencies with recurrent networks and other autoregressive models (including the first neural net language models); understanding credit assignment (including the quest for biologically plausible analogues of backprop, as well as end-to-end learning of complex modular information-processing assemblies); meta-learning (or learning to learn); attention mechanisms (the key ingredient behind the success of Transformers); deep generative models of many kinds; curriculum learning; variants of stochastic gradient descent and why SGD works for neural nets; convolutional architectures; natural language processing (especially word embeddings, language models and machine translation); and understanding why deep learning works so well and what its current limitations are. I have also worked on many applications of deep learning, including – but not limited to – healthcare (such as medical image analysis and drug discovery), the standard AI tasks of computer vision, speech and language modeling and, more recently, robotics.