AI safety needs social scientists


We’ve written a paper arguing that long-term AI safety research needs social scientists to ensure AI alignment algorithms succeed when actual humans are involved. Properly aligning advanced AI systems with human values requires resolving many uncertainties related to the psychology of human rationality, emotion, and biases. The aim of this paper is to spark further collaboration between machine learning and social science researchers, and we plan to hire social scientists to work on this full time at OpenAI.

The goal of long-term artificial intelligence (AI) safety is to ensure that advanced AI systems are aligned with human values—that they reliably do things that people want them to do. At OpenAI we hope to achieve this by asking people questions about what they want, training machine learning (ML) models on this data, and optimizing AI systems to do well according to these learned models. Examples of this research include Learning from human preferences, AI safety via debate, and Learning complex goals with iterated amplification.
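
To make this pipeline concrete, the sketch below shows one common way such a learned model can work: humans compare pairs of behaviors, and a reward model is trained so that the preferred behavior scores higher. Everything here (the use of PyTorch, the toy data, the architecture, the hyperparameters) is an illustrative assumption rather than OpenAI’s actual code.

```python
# A minimal sketch of reward modeling from pairwise human preferences.
# The architecture, toy data, and hyperparameters are illustrative
# assumptions, not OpenAI's implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class RewardModel(nn.Module):
    """Maps an observation to a scalar 'how much humans like this' score."""
    def __init__(self, obs_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(), nn.Linear(hidden, 1)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x).squeeze(-1)

# Hypothetical dataset: each pair holds the observation a human preferred
# and the one they rejected.
obs_dim = 8
preferred = torch.randn(256, obs_dim) + 0.5
rejected = torch.randn(256, obs_dim) - 0.5

model = RewardModel(obs_dim)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

for step in range(200):
    # Bradley-Terry style objective: model the probability that the
    # preferred item beats the rejected one as sigmoid(r_p - r_r).
    loss = -F.logsigmoid(model(preferred) - model(rejected)).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()

# An AI system would then be optimized against the learned reward model.
```

Note that the quality of whatever this model learns is bounded by the quality of the human comparisons it is trained on, which is exactly where the difficulties below arise.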

Unfortunately, human answers to questions about their values may be unreliable. Humans have limited knowledge and reasoning ability, and exhibit a variety of cognitive biases and ethical beliefs that turn out to be inconsistent on reflection. We anticipate that different ways of asking questions will interact with human biases in different ways, producing higher or lower quality answers. For example, judgments about how wrong an action is can vary depending on whether the word “morally” appears in the question, and people can make inconsistent choices between gambles if the task they are presented with is complex.

We have several methods that try to target the reasoning behind human values, including amplification and debate, but do not know how they behave with real people in realistic situations. If a problem with an alignment algorithm appears only in natural language discussion of a complex value-laden question, current ML may be too weak to uncover the issue.

To avoid the limitations of ML, we propose experiments that consist entirely of people, replacing ML agents with people playing the role of those agents. For example, the debate approach to AI alignment involves a game with two AI debaters and a human judge; we can instead use two human debaters and a human judge. Humans can debate whatever questions we like, and lessons learned in the human case can be transferred to ML.
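
As an illustration of how little machinery such an experiment needs, here is a sketch of a human-only debate in plain Python; the transcript structure, console interface, and example question are hypothetical stand-ins, not the experimental apparatus from the paper.

```python
# A minimal sketch of the human-only debate experiment described above:
# two human debaters argue a question, and a human judge picks a winner.
# The data structures and console interface are hypothetical illustrations.

from dataclasses import dataclass, field

@dataclass
class DebateTranscript:
    question: str
    turns: list[tuple[str, str]] = field(default_factory=list)

def run_human_debate(question: str, debaters: tuple[str, str],
                     n_rounds: int = 3) -> str:
    """Alternate statements between two human debaters, then ask the judge."""
    transcript = DebateTranscript(question)
    for _ in range(n_rounds):
        for name in debaters:
            transcript.turns.append((name, input(f"{name}, your argument: ")))
    # As in the ML debate game, the judge sees only the arguments and must
    # decide who argued more convincingly, without verifying the answer
    # directly.
    print(f"\nQuestion: {transcript.question}")
    for name, statement in transcript.turns:
        print(f"{name}: {statement}")
    return input(f"Judge, who won ({debaters[0]} or {debaters[1]})? ")

if __name__ == "__main__":
    winner = run_human_debate("Where should we go on vacation?",
                              ("Alice", "Bob"))
    print(f"Judge ruled for {winner}.")
```

The interesting questions are not in the code but in the design around it: how rounds, question types, and judge incentives affect whether honest debaters tend to win.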

These human-only experiments will be motivated by machine learning algorithms but will not involve any ML systems or require an ML background. They will require careful experimental design to build constructively on existing knowledge about how humans think. Most AI safety researchers are focused on machine learning, which we do not believe is sufficient background to carry out these experiments.

To fill the gap, we need social scientists with experience in human cognition, behavior, and ethics, and in the careful design of rigorous experiments. Since the questions we need to answer are interdisciplinary and somewhat unusual relative to existing research, we believe many fields of social science are applicable, including experimental psychology, cognitive science, economics, political science, and social psychology, as well as adjacent fields like neuroscience and law.

We believe close collaborations between social scientists and machine learning researchers will be necessary to improve our understanding of the human side of AI alignment. As a first step, several OpenAI researchers helped organize a workshop at Stanford University’s Center for Advanced Study in the Behavioral Sciences (CASBS) led by Mariano-Florentino Cuéllar, Margaret Levi, and Federica Carugati, and we continue to meet regularly to discuss issues around social science and AI alignment. We thank them for their valuable insights and participation in these conversations.

_Our paper is a call for social scientists in AI safety. We are in the process of starting this research at OpenAI, and are hiring full-time social science researchers to push these experiments forward. If you are interested in working in this area, please apply!_

Geoffrey Irving, Amanda Askell

