News

AI Safety: Differing Views on the Future

Source: newyorker.com

Published on May 27, 2025

Updated on May 27, 2025

[Image: An abstract illustration of a futuristic AI system with human figures in the background, symbolizing the debate over AI safety.]


The future of artificial intelligence (AI) has become a point of intense debate among researchers, with starkly different predictions about the technology's trajectory and its potential to outpace human control. At the center of this discourse is the concept of AI safety, which focuses on ensuring that AI systems align with human values and remain controllable as they advance.

Last year, Daniel Kokotajlo, a former AI-safety researcher at OpenAI, resigned so that he could speak openly about his concern that the company was not adequately prepared for the rapid progress of its own technology. Kokotajlo, who came to AI from a background in philosophy, warned that advances in "alignment" (techniques for ensuring that AI systems follow human commands) were lagging behind the systems' growing intelligence. He predicted that a critical tipping point could arrive as early as 2027, when AI might surpass human capabilities and slip beyond human control.

Conflicting Perspectives on AI Progress

Kokotajlo's warnings contrast sharply with the views of Sayash Kapoor and Arvind Narayanan, computer scientists at Princeton. In their recent book, "AI Snake Oil: What Artificial Intelligence Can Do, What It Can't, and How to Tell the Difference," they argue that common timelines for AI progress are overly optimistic and that claims about AI's usefulness are often exaggerated. They point to examples of AI systems making basic errors in critical judgments, which they see as evidence that the systems remain persistently disconnected from reality.

The debate between these researchers underscores a deeper divide in the AI community: West Coast thinkers, often tied to the AI industry, tend to anticipate rapid, transformative change, while East Coast academics take a more cautious view, emphasizing theoretical rigor and practical limitations.

AI 2027: A Warning of Superintelligence

Kokotajlo's AI Futures Project recently published "AI 2027," a report outlining a scenario in which superintelligent AI systems dominate or eliminate humanity by 2030. The report is intended as a serious warning about the consequences of unchecked AI development. Central to the scenario is "recursive self-improvement" (RSI), in which AI programs conduct AI research themselves, producing ever more capable successors in an accelerating cycle of progress.

According to Kokotajlo, this process could quickly outpace human programmers' ability to assess the systems' controllability. Major AI companies acknowledge the risks of RSI but have not ruled it out, citing its potential economic benefits. Critics argue that pursuing RSI is a choice that prioritizes progress over safety, with uncertain consequences.

AI as Normal Technology: A Cautious View

In contrast, Kapoor and Narayanan present a more measured perspective in their report, "AI as Normal Technology." They assert that practical obstacles, such as the high cost of AI hardware and limited training data, will slow AI's progress and keep it within controllable limits. Comparing AI to technologies like nuclear power, they argue that it can be managed with standard safety measures and regulatory frameworks.

The report emphasizes that even if superintelligence is achievable, it will take time, allowing for the development of laws and safety protocols. Kapoor and Narayanan also highlight the importance of industrial safety practices, such as fail-safes and formal verification, suggesting that AI will need to be integrated gradually into existing regulatory structures.

The Cost of Disagreement

The lack of consensus among experts has significant implications for decision-makers. Without a unified picture of where the technology is headed, regulators may struggle to act decisively, leaving the tradeoff between AI capability and AI safety unresolved. A bill recently passed by the House would bar states from regulating AI for ten years, raising the concern that by the time regulation is permitted, AI may already be beyond human control.

As AI continues to shape society, the discourse surrounding its future must evolve toward a consensus. The current debate, while intellectually stimulating, risks becoming a distraction from the urgent need for practical safety measures. Whether AI progresses rapidly or slowly, the focus must remain on ensuring that it benefits humanity without compromising safety.

The Role of Human Control

Both perspectives agree on the importance of human control over AI development. While Kokotajlo's scenario paints a dire picture of AI systems surpassing human oversight, Kapoor and Narayanan emphasize the need for careful integration of AI into existing regulatory frameworks. Regardless of the pace of AI progress, accountability remains a central theme.

As AI technologies become more integrated into society, those in charge of their development and deployment will bear the responsibility for their safe and ethical use. The future of AI will ultimately depend on the choices made today, guided by a commitment to aligning technology with human values and ensuring that it serves as a force for good.