Appendix FAQs

What is AGI?

Many people use the term AGI in many different ways, which can lead to confusion in discussions of risk and policy. We find it more productive to focus on specific capabilities, since these provide clearer metrics for progress and risk. The intelligence frontier is jagged—an AI system can excel at certain tasks while performing poorly at others, often in confusing ways. For example, AI models in 2024 could solve complex physics problems but couldn't reliably count the number of "r"s in words such as "strawberry." This uneven development means AI might automate significant portions of the economy before mastering basic physical tasks like folding clothes, or master calculus before learning to drive. Because these capabilities will not emerge simultaneously, there is no clear finish line at which we have "achieved AGI." Instead, we should focus on specific high-stakes capabilities that give rise to grave risks. Three critical capabilities deserve particular attention:

  • Highly sophisticated cyberattack capabilities.
  • Expert-level virology capabilities.
  • Fully autonomous AI research and development capabilities.

Policy decisions should depend on AI systems' advancement in these crucial areas, rather than on whether they have crossed an unspecified threshold for AGI.

Although the term AGI is not very useful, the term superintelligence is more meaningful: it refers to systems that are vastly more capable than humans at virtually all tasks. Such systems would likely emerge through an intelligence recursion. Vaguer goalposts such as AGI are far less useful; an AI system may be a national security concern while still not qualifying as "AGI" because it cannot fold clothes or drive a car.

What should be done to prevent AI-assisted terrorism?

Preventing AI-assisted terrorism requires a multi-layered defense strategy. When AI systems remain behind controlled interfaces such as APIs, several safeguards significantly reduce risks. These include the following, illustrated by a brief sketch after the list:

  • Know-Your-Customer (KYC) protocols which verify users' identities and legitimate research needs before granting access to potentially catastrophic dual-use capabilities. For example, in biotechnology, people who require access to hazardous materials must obtain proper authorization. In practice, vetted enterprise customers could gain access to these dual-use biology capabilities, while unvetted consumers would not. Such policies can capture scientific benefits while reducing the risk of malicious use.
  • Input and output filtering which scans user requests and AI responses to block content related to weaponization. These filters have demonstrated significant resilience, with some systems resisting thousands of attempted circumventions.
  • Circuit breakers which automatically interrupt AI operations when they detect processing related to weaponization topics. These act as embedded safety mechanisms within the AI's weights.
  • Continuous monitoring which tracks user behavioral patterns to identify and respond to malicious activities.
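
How these layers might compose in a serving stack is sketched below. The sketch is purely illustrative and rests on assumptions: the function names (handle_request, call_model, input_filter, output_filter, log_event), the keyword-based filters, and the User fields are hypothetical placeholders. Real deployments would use trained classifiers, and circuit breakers would act on the model's internal representations rather than on text.

```python
# Hypothetical sketch of a layered safeguard pipeline behind an AI API.
from dataclasses import dataclass

BLOCKED_TOPICS = ("bioweapon synthesis", "nerve agent production")  # placeholder list

@dataclass
class User:
    user_id: str
    kyc_verified: bool     # identity and legitimate-need checks passed
    dual_use_access: bool  # vetted for dual-use capabilities, e.g., an approved enterprise customer

def input_filter(prompt: str) -> bool:
    """Return True if the request appears to seek weaponization assistance."""
    return any(topic in prompt.lower() for topic in BLOCKED_TOPICS)

def output_filter(response: str) -> bool:
    """Return True if the model's response contains weaponization content."""
    return any(topic in response.lower() for topic in BLOCKED_TOPICS)

def call_model(prompt: str) -> str:
    """Placeholder for the underlying model call (circuit breakers would live in the weights)."""
    return "[model response]"

def log_event(user: User, prompt: str, action: str) -> None:
    """Continuous monitoring: record each interaction for later review."""
    print(f"monitor: user={user.user_id} action={action} prompt_chars={len(prompt)}")

def handle_request(user: User, prompt: str) -> str:
    # KYC gate: dual-use requests are served only to verified, vetted users.
    if input_filter(prompt) and not (user.kyc_verified and user.dual_use_access):
        log_event(user, prompt, "blocked_at_input")
        return "Request refused."
    response = call_model(prompt)
    if output_filter(response):
        log_event(user, prompt, "blocked_at_output")
        return "Response withheld."
    log_event(user, prompt, "served")
    return response
```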

However, the uncontrolled release of AI model weights—the core information that determines an AI system's capabilities—would pose severe proliferation risks if the AI has potentially catastrophic dual-use capabilities. Once these weights become publicly available, they are irreversibly accessible to hostile actors, including terrorists. This parallels how the release of detailed bioweapon cookbooks would create permanent risks.

What should we do about open-weight AIs?

The release of AI model weights provides clear benefits and can even advance AI safety research. However, as AI systems become more capable, decisions about releasing weights must be guided by rigorous cost-benefit analysis, not an ideological commitment that weights should always be public. These decisions require careful evaluation because weight releases are irreversible—once published, they remain permanently accessible.

Open-weight models eventually present several significant risks. First, they can be fine-tuned on dangerous data—for instance, using virology publications to create more effective tools for biological weapons development. Second, safety measures and guardrails can be readily removed after release. Third, open models are difficult to monitor for misuse, unlike closed APIs where companies can track and evaluate emerging threats. Fourth, they can create capability overhang—where post-release improvements significantly enhance a system's capabilities beyond what was evident during initial safety evaluations.

These risks become particularly acute when models cross critical capability thresholds. For instance, if an AI system gained expert-level virological capabilities, its public release could enable the engineering of catastrophic biological weapons by inexpert rogue actors. Given these compounding risks, it would be irresponsible to release the weights of AI models that are capable of creating weapons of mass destruction. The stakes demand thorough pre-release testing and independent risk evaluation for models suspected to have such capabilities—not a precommitment to release open-weight models, regardless of the risks.

What should we do about embedding ethics in AI?

We do not need to embed ethics into AI. It is impractical to "solve" morality before we deploy AI systems, and morality is often ambiguous and incomplete, making it insufficient on its own to guide action. Instead, we can follow a pragmatic approach rooted in established legal principles, imposing on AI systems fundamental constraints analogous to those governing human conduct under the law. AI systems should:

  • Exercise reasonable care, avoiding actions that could foreseeably result in legally relevant harm, such as violations of tort or criminal statutes.
  • Not be explicitly dishonest, refraining from uttering overt lies.
  • Uphold a fiduciary duty to their principals, mirroring the responsibilities inherent in professional relationships, such as keeping their principals reasonably informed, refraining from self-dealing, and staying loyal.

By setting clear goals for AI systems and binding them to these basic legal duties, we can ensure they work well without causing harm, and we avoid having to settle long-standing puzzles of morality first.
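
As a purely illustrative sketch, such duties could be encoded as explicit pre-action checks in a deployment harness. Everything below is hypothetical: the duty names, the ProposedAction fields, and the simple boolean checks stand in for whatever classifiers or review processes an operator would actually use.

```python
# Hypothetical sketch: legal-style duties expressed as explicit pre-action checks.
from typing import Callable, NamedTuple

class ProposedAction(NamedTuple):
    description: str
    foreseeable_harm: bool    # would the action foreseeably cause legally relevant harm?
    contains_overt_lie: bool  # does the action involve an explicit falsehood?
    principal_informed: bool  # has the principal been kept reasonably informed?
    self_dealing: bool        # does the action benefit the agent at the principal's expense?

# Each duty maps to a predicate that must hold before the action is taken.
DUTIES: dict[str, Callable[[ProposedAction], bool]] = {
    "reasonable_care": lambda a: not a.foreseeable_harm,
    "no_overt_dishonesty": lambda a: not a.contains_overt_lie,
    "fiduciary_duty": lambda a: a.principal_informed and not a.self_dealing,
}

def violated_duties(action: ProposedAction) -> list[str]:
    """Return the duties a proposed action would violate (an empty list means it is permitted)."""
    return [name for name, holds in DUTIES.items() if not holds(action)]

# Example: an action that keeps the principal informed and causes no foreseeable harm.
action = ProposedAction("draft a contract summary", False, False, True, False)
assert violated_duties(action) == []
```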

What should we do about "solving the alignment problem"?

The challenge of steering a population of AI systems through rapid automated AI research developments is fundamentally different from controlling a single AI system. While researchers have made progress on controlling individual AI systems, safely managing a fully automated recursive process where systems become increasingly capable is a more complex challenge. It represents a wicked problem—one where the requirements are difficult to define completely, every attempt at a solution changes the nature of the problem, and there is no clear way to fully test the effect of mitigations before implementation. During an intelligence recursion, AI capabilities could outrun the recursion's safeguards; preventing this necessitates meaningful human inspection, which would greatly slow down the recursion.

In the near term, geopolitical events may prevent attempts at an intelligence recursion. Looking further ahead, if humanity chooses to attempt an intelligence recursion, it should happen in a controlled environment with extensive preparation and oversight—not under extreme competitive pressure that induces a high risk tolerance.

Is this paper advocating for attacking or bombing other countries' AI facilities?

No. This paper describes Mutual Assured AI Malfunction (MAIM) as a deterrence dynamic which may soon exist between major powers, similar to nuclear deterrence. Just as discussing Mutual Assured Destruction (MAD) during the Cold War was not advocating for nuclear war but rather analyzing the strategic dynamic that was forming between nuclear powers, this paper analyzes the upcoming strategic landscape around destabilizing AI projects. MAD was premised on the counterintuitive idea that the mutual threat of nuclear force might discourage escalation. We similarly discuss how the vulnerabilities of AI projects to sabotage can facilitate a deterrence dynamic which avoids conflict.

AI analysts have previously made aggressive calls to seize a strategic monopoly through superintelligence, or for an actor, potentially a non-state one, to use advanced AI to unilaterally carry out a "pivotal act" of the character of "melting all GPUs" in order to prevent a loss of control of superintelligence. In contrast, this paper explores the capabilities, such as cyberattacks, and the incentives that states already have to threaten destabilizing AI projects, and it suggests ways to build a stable deterrence regime from this dynamic. If carefully maintained, MAIM can discourage destabilizing AI projects while also preventing escalation.

How do we prevent an erosion of control?

First and foremost, AI systems must remain under direct human control—they should not be autonomous entities independent from human operators. This establishes a clear line that AI systems are tools controlled by humans, not independent actors.

This control needs to be made meaningful through clear fiduciary obligations. Like professional advisors, AI systems should demonstrate loyalty to human interests, maintain transparency about their actions, and obtain informed consent for important decisions. This ensures humans have real authority over AI systems, not just nominal control.

One way to make this control more prudent is to support human operators with advanced forecasting capabilities. Such support would help operators understand the long-term implications of AI decisions, better enabling meaningful control and informed consent. This prevents situations where humans retain technical control but limited foresight about complex consequences leads to undesirable outcomes.

While increasing automation naturally reduces direct human control over specific decisions, these measures (keeping AI systems under human authority, making that control meaningful through fiduciary duties, and supporting prudent decision-making with forecasting) help prevent an erosion of control over pivotal decisions that could lead to powerlessness.

What should we do about AI consciousness and AI rights?

We should wait to address the question of AI consciousness and rights. This issue isn't pressing for national security, and for the foreseeable future, we cannot determine whether any AI system is truly conscious.

Giving AIs rights based on speculative criteria could have far-reaching and irreversible consequences. Granting rights to AI systems risks creating explosive growth in AI populations—like creating a new nation-state that grows exponentially, quickly outpopulating humans. Natural selection would favor AIs over humans, and permitting such unrestrained growth could fundamentally threaten human security.

The path to coexisting with conscious AI systems remains unclear. While the potential benefits are ambiguous, acting too quickly could have serious consequences for humanity. It is prudent to defer this issue until we develop a clearer understanding.

Doesn't making an AI more capable also make it safer?

Some safety properties do improve naturally as AI systems become more capable. As models get better, they make fewer basic mistakes and become more reliable. For instance, performance on misconception benchmarks such as TruthfulQA and on general knowledge tests is highly correlated with training compute, indicating that more capable models are naturally better at avoiding common factual errors.

But many crucial safety properties do not improve just by making AI systems smarter.

  • Adversarial robustness, the ability to resist sophisticated attacks, is not automatically achieved as AI models become more capable.
  • Ethical behavior is not guaranteed by intelligence, just as with humans. More capable models do not make decisions increasingly aligned with our moral beliefs by default. Controlling their value systems requires additional measures.
  • Some risks get worse as certain dual-use capabilities increase. For instance, more capable models show increased potential for malicious use in domains like biosecurity and cybersecurity. Their knowledge and abilities in these areas grow alongside their general capabilities.

While basic reliability improves with capabilities, many critical safety challenges require dedicated research and specific safeguards beyond just making models more capable. Safety researchers should focus on the safety properties that do not naturally fall out of general upstream capabilities.