Avoiding an AI Arms Race with Assurance Technologies

A global race to build powerful AI is not inevitable. Here’s how technical solutions can help foster cooperation.

Guest Commentary

As AI’s transformative potential and national security significance grow, so has the incentive for countries to develop AI capabilities that outcompete their adversaries. Leaders in both the US and Chinese governments have indicated that they see their countries in an arms race to harness the economic and strategic advantages of powerful AI.

Yet as the benefits of AI come thick and fast, so might its risks. In a 2024 Science article, a broad coalition of experts from academia and industry raised the alarm about the serious threats that advanced AI may soon pose — such as AI misuse or loss-of-control events leading to large-scale cyber, nuclear, or biological calamities.

Because these risks wouldn’t be constrained by geography, it is in everyone’s interest to mitigate them, hence the calls from scientists in multiple countries for international efforts to regulate AI. However, an international AI development deal will only succeed if all parties — including rival nations — can ascertain that their counterparts are upholding their commitments.

In a recent interview, Vice President JD Vance was asked if the US would consider pausing frontier AI development should its risks become intolerable: “The honest answer to that is that I don’t know, because part of this arms race component is if we take a pause, does the People’s Republic of China not take a pause?”

Vance’s comments reflect a perennial problem in the history of international diplomacy: the assurance dilemma. Even if we reach a mutually beneficial agreement with an adversary, how do we know they’ll keep their word? Despite eagerness to harness its benefits, both the US and Chinese governments have acknowledged the risks AI could pose. If these risks become more salient as capabilities grow, an assurance dilemma could become the primary factor impeding efforts to de-escalate an AI race. 

This problem is hard, but not unsolvable. We believe innovative technical approaches can help to solve AI’s assurance dilemma. These tools will help us ensure the safety of AI development by verifying that our adversaries hold up their end of future deals. 


Assurance mechanisms for AI

The assurance dilemma has a solution in the form of assurance mechanisms: processes that allow parties to verify compliance with agreements.

Assurance mechanisms usually revolve around one or more ingredients essential to the development of a dangerous technology. Nuclear non-proliferation treaties, for example, rely on monitoring uranium supplies and enrichment facilities through a global network of radiation sensors and regular audits of inventories at nuclear facilities.

In the development of AI, compute is the component most analogous to uranium. Many experts have suggested that several unique properties of compute — most notably that it is traceable, quantifiable, and produced via a highly concentrated supply chain — make it AI’s most governable element. 

So, can we design an assurance regime for AI that monitors the global supply of compute, just as nuclear organizations monitor uranium? Not quite. There are important disanalogies between AI and nuclear technology. For example, compliance with the Treaty on the Non-Proliferation of Nuclear Weapons (NPT) is in part verified through on-site inspections that monitor the levels of enrichment of the uranium stored at nuclear facilities. These inspections can detect whether the nuclear fuel is being used for civilian purposes (energy production) or prohibited military purposes (weapons development) — the level of uranium enrichment required for the latter is much higher. 

In the AI case, however, it would be much harder for inspectors visiting an AI data center to distinguish between chips that have been used for permitted versus nefarious purposes — computations do not leave physical traces.

Another problem for tried-and-tested verification methods, such as regularly scheduled inspections or audits, is the potential for AI technology to improve at an unpredictable pace. 

In the nuclear case, inspection frequency can be set by calculating a state’s ‘breakout time’: how long it would take them to produce enough weapons-grade uranium for a single nuclear weapon. While breakout time can be estimated using information about a state’s existing uranium stockpiles and enrichment facilities, we do not have anything like a mature science of predicting improvements in AI. New AI capabilities can arise relatively unexpectedly, and the relationship between training compute and performance is not well understood. This would make it difficult to schedule audits or inspections at an appropriate cadence. Even absent discontinuities, AI is improving at a blistering pace that will demand unusually agile regulation. 

These challenges should not deter policymakers. AI chips are general-purpose computing devices, which makes it possible to implement sophisticated verification mechanisms directly on the AI hardware itself. 

Hardware-enabled mechanisms

The idea of using hardware-based solutions to ensure security and compliance is not new. For example, Apple includes a subsystem called a Secure Enclave in every iPhone. Secure Enclaves have a number of functions, such as securely processing and storing biometric data or cryptographic keys, which in turn help prevent users from running unauthorized or corrupted apps. 

The term “hardware-enabled mechanism” (HEM) covers a very broad range of possible technologies. Generally speaking, HEMs are security systems integrated into the AI hardware stack that can perform a variety of functions. Some varieties are already being proposed as policy solutions. The recently proposed Chip Security Act includes a provision requiring US-made advanced AI chips to be outfitted with location verification mechanisms in order to prevent chip smuggling. A 2024 RAND Corporation report describes two further types of HEMs — offline licensing, which would require chips to possess a specialized license in order to operate, and fixed set, which would impose network restrictions on GPUs to prevent them from being aggregated into large clusters.
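To make the offline-licensing idea concrete, here is a minimal sketch, in Python, of how a chip might check a signed, time-limited operating license before running workloads. The license format, the 90-day validity period, and the function names are illustrative assumptions rather than details from the Chip Security Act or the RAND report; the sketch relies on the widely used `cryptography` library.

```python
# Illustrative sketch of "offline licensing": a chip refuses to run AI workloads
# unless it holds a valid, unexpired license signed by an issuing authority.
# The license format and function names are hypothetical.

import json
import time

from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import (
    Ed25519PrivateKey,
    Ed25519PublicKey,
)


def issue_license(issuer_key: Ed25519PrivateKey, chip_id: str, valid_days: int) -> dict:
    """Authority signs a license binding a chip ID to an expiry time."""
    payload = {"chip_id": chip_id, "expires_at": time.time() + valid_days * 86400}
    message = json.dumps(payload, sort_keys=True).encode()
    return {"payload": payload, "signature": issuer_key.sign(message)}


def chip_may_operate(license_: dict, issuer_pub: Ed25519PublicKey, chip_id: str) -> bool:
    """On-chip check: signature valid, license matches this chip, not expired."""
    message = json.dumps(license_["payload"], sort_keys=True).encode()
    try:
        issuer_pub.verify(license_["signature"], message)
    except InvalidSignature:
        return False
    payload = license_["payload"]
    return payload["chip_id"] == chip_id and time.time() < payload["expires_at"]


if __name__ == "__main__":
    issuer = Ed25519PrivateKey.generate()
    lic = issue_license(issuer, chip_id="GPU-0001", valid_days=90)
    print(chip_may_operate(lic, issuer.public_key(), "GPU-0001"))  # True
    print(chip_may_operate(lic, issuer.public_key(), "GPU-9999"))  # False: wrong chip
```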

However, these approaches only scratch the surface of what HEM technology could enable.

Sophisticated HEMs could potentially verify not just where chips are, but how they’re being used. They may be able not only to detect but also to prevent violations of multilateral agreements.

In April of this year, Nora Ammann — one of the authors of this article — and other collaborators published a three-part report series proposing a variant of HEM we call Flexible Hardware-Enabled Guarantees (flexHEGs). FlexHEGs would be designed to ensure compliance with flexible rules resulting from a multilateral decision-making process. A secure processor monitors the information flowing to and from the chip while maintaining full confidentiality, and can block non-compliant computations.

Instead of being limited to a narrow set of applications, these mechanisms would be reprogrammable to support a wide range of verification regimes. This adaptability is crucial, ensuring the hardware can seamlessly adjust to evolving policy needs, scientific advancements, and unforeseen future AI use cases, even after it's deployed. 

If we can build flexHEGs at scale, they could be used to ensure compliance with a wide range of verification regimes, from requiring that certain safety tests be run before developing models above a certain compute threshold, to limiting the size of training runs. 

This may sound ambitious — but we believe it is technically feasible. 

Designing an Effective HEM-Enabled Assurance Regime

If designed and implemented correctly, a HEM-enabled assurance regime could deliver enormous benefits for the safety and security of AI. But such a regime would also need to avoid some serious pitfalls, including regulatory overreach and the possibility of unilateral control. FlexHEGs have the potential to help circumvent these challenges.

We believe that an effective HEM-enabled AI assurance regime would have the following features:

Pre-emptive

Since AI has the potential to cause catastrophic harm that could render after-the-fact penalties for non-compliance moot, an effective assurance regime would need to prevent, and not just deter, rule violations.

The flexHEG design includes a secure processor that would monitor instructions going to and from the chip. It would also block the chip from computing disallowed instructions, such as operating as part of a training run that exceeds a specific compute threshold, or continuing training before required evaluations have been successfully passed. 
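As a rough illustration of what such an on-device check might look like in software, the sketch below encodes toy versions of these two rules (a compute threshold and an evaluation requirement) and authorizes a workload only if it complies. The threshold value, field names, and data structure are our own illustrative assumptions, not details of the flexHEG design.

```python
# Minimal sketch of the kind of rule check a flexHEG-style secure processor
# might perform before authorizing a workload. All values and names are
# hypothetical.

from dataclasses import dataclass

TRAINING_FLOP_THRESHOLD = 1e26   # assumed threshold above which extra rules apply


@dataclass
class WorkloadRequest:
    estimated_training_flops: float   # declared compute for the run
    evaluations_passed: bool          # attestation that required safety evals ran


def authorize(request: WorkloadRequest) -> bool:
    """Return True only if the workload complies with the encoded rules."""
    if request.estimated_training_flops <= TRAINING_FLOP_THRESHOLD:
        return True  # below the threshold, no additional conditions apply
    # Above the threshold, the run is allowed only if required evaluations passed.
    return request.evaluations_passed


if __name__ == "__main__":
    small_run = WorkloadRequest(1e24, evaluations_passed=False)
    large_unevaluated = WorkloadRequest(5e26, evaluations_passed=False)
    large_evaluated = WorkloadRequest(5e26, evaluations_passed=True)
    print(authorize(small_run))          # True
    print(authorize(large_unevaluated))  # False: blocked until evals pass
    print(authorize(large_evaluated))    # True
```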

An active security system could provide confidence that physical tampering would not enable stakeholders to circumvent the agreement. This would build on secure enclosures of the kind that have protected cryptographic coprocessors and other chips from physical tampering for over 20 years, combined with a self-disablement mechanism that would render the chip inoperable if a tampering attempt were detected.

Flexible 

As AI develops, so will our understanding of its risks. HEMs should be programmable, allowing us to update the rules encoded into the chips as we gather more information about which rules might be most effective.

This is a key aspect of the flexHEG design. Authorized parties (or an appropriate quorum of such parties) would be able to update encoded rules through regular firmware updates. 
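A minimal sketch of what such a quorum check could look like is shown below: the device accepts a rule update only if it carries valid signatures from at least a threshold number of the authorized parties whose public keys it holds. The k-of-n scheme, key handling, and function names are illustrative assumptions rather than details of the flexHEG design; the sketch again uses the `cryptography` library.

```python
# Sketch of a k-of-n quorum check for rule updates: apply a new rule set only
# if at least `threshold` distinct authorized parties have validly signed it.
# Names and structure are hypothetical.

from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import (
    Ed25519PrivateKey,
    Ed25519PublicKey,
)


def quorum_approves(update: bytes,
                    signatures: list[bytes],
                    authorized_keys: list[Ed25519PublicKey],
                    threshold: int) -> bool:
    """Count how many distinct authorized parties validly signed the update."""
    approving = set()
    for sig in signatures:
        for i, key in enumerate(authorized_keys):
            if i in approving:
                continue
            try:
                key.verify(sig, update)
                approving.add(i)
                break
            except InvalidSignature:
                continue
    return len(approving) >= threshold


if __name__ == "__main__":
    parties = [Ed25519PrivateKey.generate() for _ in range(5)]
    pub_keys = [p.public_key() for p in parties]
    update = b"rule-set v2: raise evaluation requirements"
    sigs = [p.sign(update) for p in parties[:3]]                     # 3 of 5 sign
    print(quorum_approves(update, sigs, pub_keys, threshold=3))      # True
    print(quorum_approves(update, sigs[:2], pub_keys, threshold=3))  # False
```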

FlexHEG’s updatability would allow us to respond to new dangers, integrate newly developed safeguards, or enable previously disallowed operations that we have since determined to be safe. It is also key to ensuring that risk mitigation doesn’t hamper AI progress. 

Privacy-preserving

An assurance regime that requires chips to ‘call home’ with verification information about their activity could prove unacceptable to stakeholders who do not want detailed information about their chips’ operations shared with third-party regulators or adversaries.

We could avoid this with HEMs that check or enforce rules locally, on the AI hardware itself. FlexHEGs allow for this on-site assurance via the processor described above that prevents the chip from being used for non-compliant tasks.

Some properties could even be verified in zero-knowledge, that is: using cryptographic tools, a party could prove that a certain statement about an AI system is true — that it is compliant with agreed-upon safety standards, for example — without revealing anything else about that system. This could help resolve concerns that verification techniques will compromise details about strategically significant capabilities. For now, zero-knowledge proofs remain highly computationally expensive in the context of AI, so absent important breakthroughs, their use will be limited to relatively narrow use cases.
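To illustrate the underlying principle, here is a toy, non-interactive Schnorr-style proof in Python: the prover convinces a verifier that it knows a secret value without ever transmitting that value. This is only a demonstration of the cryptographic idea, not a compliance proof about an AI system, and it is deliberately simplified and insecure; real deployments would use vetted protocols over large prime-order groups.

```python
# Toy Schnorr-style proof of knowledge of a discrete logarithm, made
# non-interactive with the Fiat-Shamir heuristic. The prover shows it knows x
# with pow(G, x, P) == y, without revealing x. Parameters are far too small
# for real security and are chosen only for readability.

import hashlib
import secrets

P = 2**127 - 1   # a Mersenne prime; insecure demo modulus
G = 3            # demo base


def _challenge(commitment: int, public_y: int) -> int:
    """Fiat-Shamir: derive the challenge by hashing the public transcript."""
    digest = hashlib.sha256(f"{commitment}:{public_y}".encode()).digest()
    return int.from_bytes(digest, "big")


def prove(secret_x: int, public_y: int) -> tuple[int, int]:
    """Prover: demonstrate knowledge of x such that pow(G, x, P) == public_y."""
    r = secrets.randbits(512)              # blinding nonce, much larger than x
    commitment = pow(G, r, P)
    challenge = _challenge(commitment, public_y)
    response = r + challenge * secret_x    # x stays hidden behind the nonce r
    return commitment, response


def verify(public_y: int, commitment: int, response: int) -> bool:
    """Verifier: accept iff G^response == commitment * y^challenge (mod P)."""
    challenge = _challenge(commitment, public_y)
    return pow(G, response, P) == (commitment * pow(public_y, challenge, P)) % P


if __name__ == "__main__":
    x = secrets.randbits(64)     # the prover's secret
    y = pow(G, x, P)             # public value derived from the secret
    com, resp = prove(x, y)
    print(verify(y, com, resp))  # True, yet x itself is never transmitted
```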

Multilateral

A HEM-enabled, multilateral AI development deal might include rules requiring sign-off from a quorum of authorized parties (representatives appointed to an IAEA-style body, for example) before any updates can be made. As we discuss in the flexHEG reports, the flexHEG design should be fully open-sourced so that all parties can independently verify that the mechanisms function as claimed, thus building justified trust in them.

Finally, such HEMs could also be used to implement robust and credible benefit-sharing agreements, where states can reap the rewards of state-of-the-art AI through, for example, subsidized access to deployed models, in exchange for not developing frontier models themselves. This would jointly achieve safety, non-proliferation, and shared prosperity. 

Unlocking New Policy Options

International politics is all about making deals. The ability to reliably verify such deals would open the door to a suite of ambitious policy options that could help promote international security. These could include limiting the amount of computing power or data used in a training run, requiring evaluations to be run for computations above a certain threshold, or requiring the encryption of exported model weights to prevent unauthorized usage.

There is still much debate about which policies would most substantially reduce AI-related risks. There is a separate debate about which policies would be most likely to garner support from the world’s AI superpowers. 

But the success of any global agreement will hinge on technologies that assure cosignatories that their rivals are following the same rules they are. Thus, the key feature of any 'assurance technology' lies in its ability to create — through robust processes or verifiable artifacts — evidence of compliance that is impossible to counterfeit. This evidence must be independently auditable and verifiable by multiple parties, eliminating the need for any single stakeholder to blindly trust another's claims.

While we believe that flexHEGs are a sound approach to alleviating the technological challenges of the assurance dilemma, our intention is not to prescribe particular interventions, but rather to advocate for investment in assurance technologies that could grant governments a broader set of options in the future. 

Creating an assurance regime for AI will not be easy. It will require a departure from the tried-and-tested verification methods we have relied on in the past. But it is our strong belief that AI presents more opportunities for robust verification than it does challenges. We can take advantage of AI’s technical complexity to create pre-emptive, agile, and independently auditable verification systems that continuously monitor AI behavior in real time.

Discussions of global AI regulation can often present a false choice between a ‘naive’ vision of the future in which adversaries put aside their differences to steward the world away from danger, and a fervent commitment to realpolitik, which will produce an inevitable, unconstrained race towards ever-more powerful systems. But coordinating around the safe development of AI need not depend on blind trust — if we can successfully implement a secure assurance regime for AI, we can avoid an escalatory dynamic even in periods of high geopolitical tension. 

