Over the course of 10 hours this April, a massive power outage swept across Spain and Portugal, causing extensive disruption. The most severe blackout in both countries’ history, it paralyzed entire transport networks and interrupted essential services throughout the Iberian Peninsula, causing estimated economic damages in the billions of euros — and at least eight fatalities.
Weeks earlier, a fire at an electrical substation had a similarly debilitating effect on Heathrow, Europe’s busiest airport, shuttering it for an entire day. The closure led to over 1,300 flight cancellations and caused tens of millions of pounds in economic damage.
These are precisely the kind of incidents that AI could help mitigate. AI systems deployed in critical electrical infrastructure could analyze complex patterns and predict potential failures before they occur. They could monitor and respond to grid anomalies in milliseconds, catching signs of an overloaded system or impending blackout far more quickly than the current system of safeguards.
But there's a catch: We cannot simply plug AI into these high-stakes systems and hope for the best. What happens when an AI system hallucinates, fails to check a critical condition, or optimizes for the wrong objective when thrust into an unprecedented scenario? For safety-critical domains like energy grids, "probably safe" isn't good enough. To realize the potential of AI in these areas, we need to develop more robust, mathematical guarantees of safety.
Today's power grids operate on a foundation of conservative safety margins designed to prevent incidents like blackouts. Grid operators maintain "spinning reserves" — backup power plants that run continuously at partial capacity, ready to instantly compensate for sudden changes in supply or demand. Operators also follow strict protocols that prioritize stability over efficiency, often keeping coal or gas plants running even when renewable energy is abundant, because these conventional sources are more predictable, thereby helping buffer temporary demand/supply mismatches and ensure the grid is operating within its safe frequency range.
Similarly, current weather forecasting for renewable power generation relies on models that update every few hours, not the minute-by-minute precision needed for optimal grid management. Meanwhile, forecasting the overall demand for electricity still depends heavily on historical patterns and human judgment about events like major sporting broadcasts or unusual weather.
These approaches have served us well for decades. But they're increasingly strained by the complexity of modern grids that must maintain a precise balance between supply and demand while integrating thousands of distributed renewable sources, batteries, and smart devices.
While the Spanish authorities are still investigating the exact causes of their crippling blackout, experts at the country’s national grid think it likely relates to fluctuations in the availability of renewable energy sources and subsequent problems in grid synchronization. AI systems could significantly improve this situation in two ways: making better decisions about when to charge and discharge energy storage systems, and ensuring renewable energy sources deliver electricity in a way that doesn’t destabilize the consistent frequency the grid requires.
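To make the storage piece concrete, here is a minimal sketch (not ARIA’s system, and with invented parameters) of the simplest kind of frequency-responsive battery control: a proportional “droop” rule that discharges when grid frequency sags and charges when it rises. An AI controller would reason far beyond this, but it would still need to respect the same physical limits.

```python
# Minimal illustrative sketch of frequency-responsive battery dispatch.
# All parameters are hypothetical; a real controller would be far more
# sophisticated and would need verified safety bounds before deployment.

NOMINAL_HZ = 50.0        # European grid nominal frequency
DROOP_MW_PER_HZ = 200.0  # how aggressively the battery responds
MAX_POWER_MW = 100.0     # inverter limit

def battery_setpoint(measured_hz: float) -> float:
    """Return battery power in MW: positive = discharge (support the grid),
    negative = charge (absorb excess generation)."""
    deviation = NOMINAL_HZ - measured_hz
    power = DROOP_MW_PER_HZ * deviation
    # Clamp to the physical limits of the battery and inverter.
    return max(-MAX_POWER_MW, min(MAX_POWER_MW, power))

# Frequency sagging to 49.8 Hz -> discharge about 40 MW to restore balance.
print(round(battery_setpoint(49.8), 1))   # 40.0
# Frequency rising to 50.3 Hz -> charge at about 60 MW to absorb the excess.
print(round(battery_setpoint(50.3), 1))   # -60.0
```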
Incidents like the Heathrow fire also show the boon that AI could be to our strained and increasingly complex infrastructure. A specialized AI system could monitor for the common precursors to transformer failures: oil degradation from chemical aging, insulation wear, and overheating from overloading — all factors that experts believe may have contributed to the Heathrow incident.
To better understand how AI could prevent such incidents, consider how electricity grids currently operate: human operators in control rooms monitor thousands of data points and make split-second decisions about power routing and generation. While they are supported by automated systems, these systems follow pre-programmed rules like triggering alarms when temperatures exceed set thresholds or activating backup power when voltage drops. A purpose-built AI control system would go far beyond these simple rule-based responses, continuously analyzing all data streams in real time — from weather patterns to demand fluctuations — to detect subtle warning signs of future trouble that both human operators and current systems might miss.
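To illustrate the gap (with invented signal names and thresholds, not any real control-room logic), compare a fixed-threshold alarm of the kind described above with a toy multi-signal risk score: readings that are individually unremarkable can still combine into a warning.

```python
# Illustrative contrast between a fixed-threshold alarm (today's approach)
# and a simple multi-signal risk score. Thresholds, weights, and signal
# names are hypothetical, not drawn from any real control system.

def rule_based_alarm(transformer_temp_c: float, voltage_kv: float) -> bool:
    """Today's style: each signal checked in isolation against a fixed limit."""
    return transformer_temp_c > 95.0 or voltage_kv < 390.0

def combined_risk_score(transformer_temp_c: float,
                        voltage_kv: float,
                        renewable_forecast_error: float) -> float:
    """A toy stand-in for a learned model: signals that are individually
    unremarkable can still combine into an elevated risk."""
    temp_term = max(0.0, (transformer_temp_c - 70.0) / 30.0)
    voltage_term = max(0.0, (400.0 - voltage_kv) / 15.0)
    forecast_term = min(1.0, abs(renewable_forecast_error) / 0.2)
    return 0.4 * temp_term + 0.3 * voltage_term + 0.3 * forecast_term

# Each signal is below its alarm threshold, so the rule-based check passes...
print(rule_based_alarm(90.0, 394.0))                      # False
# ...but taken together they already indicate elevated risk.
print(round(combined_risk_score(90.0, 394.0, 0.15), 2))   # 0.61
```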
But it’s not just about avoiding catastrophe. Incorporating AI systems that are capable of monitoring the weather, forecasting demand, and making better use of battery resources could make our energy systems more efficient.
Right now, grid operators (rightly) take a highly conservative approach to balancing supply and demand to ensure the grid keeps running (the UK's National Grid, for instance, currently spends over £3 billion annually on balancing costs). But that leads to a significant amount of renewable energy being wasted. The people overseeing these complex systems don't have a detailed sense of how close they might be to blackouts, so they naturally err on the side of caution in order to keep the lights on. This means keeping extra fossil fuel plants ready to generate electricity even when renewable energy is available.
The challenge we face in deploying AI into these systems isn't primarily about the capabilities of AI models — which are rapidly improving — but about safety. Before we entrust our most critical systems to AI, we need stronger mathematical guarantees that these systems will behave as intended under all conditions, including edge cases we haven't explicitly anticipated.
Today's approaches to AI safety — across all domains — rely primarily on empirical methods like red-teaming (which involves actively trying to make the system produce harmful or unwanted outputs), evaluations (testing the AI against a specific set of questions), and monitoring (continuously watching the AI system's outputs during deployment to detect problems as they occur). These techniques are suitable for lower-stakes applications, but they're fundamentally inadequate for critical infrastructure for three key reasons.
First, current AI safety approaches can only test a tiny fraction of possible scenarios. When complex systems are controlled by neural networks, no testing regime, however extensive, can cover the full range of real-world variability. For instance, an AI system controlling power grid operations might be thoroughly tested on equipment failures across a variety of weather conditions, yet fail catastrophically when a transformer overheats during an unexpected heatwave while renewable energy output simultaneously drops due to cloud cover — a specific combination of conditions that wasn't included in the testing scenarios.
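A back-of-the-envelope count shows why coverage falls short. Assuming, purely for illustration, that an operating scenario is described by just 30 binary conditions (heatwave or not, each transformer healthy or degraded, cloud cover high or low, and so on), the number of distinct combinations already exceeds a billion:

```python
# Back-of-the-envelope illustration of scenario explosion.
# 30 is an arbitrary, illustrative number of binary operating conditions.
conditions = 30
combinations = 2 ** conditions
print(combinations)                         # 1073741824 distinct scenarios
tests_per_day = 10_000
print(combinations / tests_per_day / 365)   # roughly 294 years to test them all
```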
Second, even when conventional safety tests identify problems, the fixes are often ad hoc patches rather than fundamental resolutions. Each patch might fix a specific failure case while potentially introducing new ones. For example, if testing reveals that an AI system incorrectly shuts down a power plant during high-demand periods, engineers might add a rule preventing shutdowns when demand exceeds a certain threshold. This patch, however, could then prevent necessary emergency shutdowns during actual equipment failures, thus creating a new safety risk.
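A minimal sketch of how such a patch can quietly undermine an earlier safeguard (the rules, thresholds, and figures below are invented for illustration):

```python
# Illustration of how an ad hoc patch can undermine an earlier safeguard.
# All conditions and thresholds are invented for the sake of the example.

def should_shut_down(plant_fault_detected: bool, demand_mw: float) -> bool:
    # Original safety rule: shut the plant down on any detected fault.
    return plant_fault_detected

def should_shut_down_patched(plant_fault_detected: bool, demand_mw: float) -> bool:
    # Patch added after testing showed spurious shutdowns at peak demand:
    # never shut down when demand exceeds 45,000 MW.
    if demand_mw > 45_000:
        return False        # the patch silently overrides the fault check
    return plant_fault_detected

# A real equipment fault during a demand spike: the patched logic now
# blocks a genuinely necessary emergency shutdown.
print(should_shut_down(True, 47_000))          # True  (original rule)
print(should_shut_down_patched(True, 47_000))  # False (new safety risk)
```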
Third, these approaches cannot provide quantitative guarantees about system behavior. In other words, they can tell you how a system has behaved in specific test conditions, but not prove how it will behave under all conditions. In a power grid context, extensive testing might show that an AI system successfully balanced supply and demand in all tested scenarios. However, this cannot guarantee that the system won't make a catastrophic decision during real-world situations that weren't anticipated by the tests’ designers — such as when multiple substations fail simultaneously during a cyberattack and cause cascading blackouts across an entire region.
For critical infrastructure, where the stakes are high, promising test results aren’t enough. We need systems that we can mathematically verify will remain within specified safety parameters.
To overcome the shortcomings of traditional safety testing, advanced AI systems for critical environments should instead be fine-tuned to produce narrow AI applications that carry mathematical guarantees of safety in their specific context of use, such as a particular power grid.
My colleagues and I at the UK’s Advanced Research and Invention Agency (ARIA) call this approach “Safeguarded AI.” The overall aim of our approach, which is backed by almost £60 million in funding, is to create a secure production facility where we harness frontier AI capabilities to produce narrower, safe-by-design AI applications that are tailor-made for specific settings, like electrical grids.
The Safeguarded AI approach draws inspiration from other high-assurance fields like aviation and nuclear power, where formal methods have been used to provide safety guarantees for critical systems. Nuclear power plants, for example, safely harness the raw energy of uranium through rigorous containment and control systems: their standards require quantitative risk assessments and exhaustive verification and validation of all safety-critical systems and operational procedures, demonstrating that societal risks remain within acceptable thresholds, often less than a 1-in-10,000 chance of dangerous failure per year. Similarly, we aim to harness the power of frontier AI while containing its risks, designing and building AI systems that are ready to be deployed in safety-critical environments.
At the heart of Safeguarded AI is the goal of mathematically representing what “safe” looks like in each specific setting, and then enforcing that requirement upon AI to safeguard it for that setting.
To develop these mathematical safety proofs, we work with domain experts supported by AI to translate real-world safety requirements — like “never let voltage drop below X level” or “always maintain Y minutes of backup power” — into precise mathematical equations. These equations capture all the physical constraints and relationships in a given system, similar to how engineers use mathematical models to prove a bridge can safely carry a certain weight. The AI system is then instructed to demonstrate, through rigorous mathematical proof, that its proposed actions will keep all these safety variables within their specified bounds under all possible operating conditions — providing the same level of certainty we expect from structural engineering calculations.
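As a toy illustration of what translating requirements into mathematics can mean, the two example requirements above could be written as a machine-checkable invariant over the grid’s state. The variable names and limits below are invented, and a real specification would be expressed in a formal language covering far more of the system’s physics.

```python
# Toy illustration of encoding safety requirements as a checkable invariant.
# Variable names and limits are invented; a real specification would be
# written in a formal language and cover far more of the system's physics.

from dataclasses import dataclass

@dataclass
class GridState:
    min_bus_voltage_kv: float    # lowest voltage anywhere on the network
    backup_power_minutes: float  # minutes of reserve available right now

VOLTAGE_FLOOR_KV = 380.0         # "never let voltage drop below X level"
BACKUP_FLOOR_MIN = 30.0          # "always maintain Y minutes of backup power"

def invariant_holds(state: GridState) -> bool:
    return (state.min_bus_voltage_kv >= VOLTAGE_FLOOR_KV
            and state.backup_power_minutes >= BACKUP_FLOOR_MIN)

# A proposed control action is acceptable only if the invariant can be shown
# to hold in every state the action could lead to, not just in tested ones.
print(invariant_holds(GridState(391.5, 45.0)))  # True
print(invariant_holds(GridState(376.0, 45.0)))  # False: violates voltage floor
```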
A generative AI system trained in this way would output a verifiable argument demonstrating that each fine-tuned AI application will stay within those safety bounds with quantifiable certainty, so long as it is deployed in a context consistent with the mathematical model. The Safeguarded AI application would only be deployed after a separate proof-checker verified a certificate of the model’s mathematical proof. After deployment, its operators would use continuous runtime monitoring to ensure that the real-world operating environment remains consistent with the model’s assumptions.
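In spirit, that runtime check might look like the following sketch (with illustrative bounds): the deployed application keeps verifying that the world still matches the assumptions under which its safety certificate was issued, and falls back to a conservative mode when it does not.

```python
# Sketch of runtime monitoring: the safety proof only applies while the
# environment stays inside the modelled envelope, so the deployed system
# keeps checking that assumption. Bounds are illustrative only.

MODEL_ASSUMPTIONS = {
    "frequency_hz": (49.5, 50.5),       # proof assumed frequency in this band
    "total_demand_mw": (15_000, 55_000),
    "sensor_staleness_s": (0.0, 5.0),   # proof assumed reasonably fresh data
}

def assumptions_hold(observations: dict) -> bool:
    for name, (low, high) in MODEL_ASSUMPTIONS.items():
        value = observations.get(name)
        if value is None or not (low <= value <= high):
            return False
    return True

def run_step(observations: dict, proposed_action):
    if assumptions_hold(observations):
        return proposed_action            # proof certificate applies
    return "FALL_BACK_TO_CONSERVATIVE"    # outside the modelled envelope

print(run_step({"frequency_hz": 50.1, "total_demand_mw": 32_000,
                "sensor_staleness_s": 1.2}, "dispatch_plan_A"))
print(run_step({"frequency_hz": 48.9, "total_demand_mw": 32_000,
                "sensor_staleness_s": 1.2}, "dispatch_plan_A"))
```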
Electrical grids aren’t the only use case for Safeguarded AI. As part of our programme, for instance, we are funding academics and researchers at AstraZeneca to develop a mathematical framework for simulating biopharmaceutical manufacturing processes, and at Lindus Health to optimize the design of clinical trials.
While our progress on Safeguarded AI has been promising, significant work remains before it can be deployed in critical infrastructure. For one, our research teams are still developing the mathematical languages needed, both to appropriately describe complex systems and to train AI to generate reliable safety proofs. We are also funding applicants to figure out what the secure facility (and the institute that will run it) will look like.
We believe Safeguarded AI could be a boon to societal resilience across the board — not just the specific cases we’re funding research into. For example, during supply chain disruptions, such as those experienced during COVID-19 where multiple factors created global shortages, Safeguarded AI could optimize logistics and inventory management with guaranteed near-optimal performance even under extreme stress. And telecommunications networks could also benefit, with Safeguarded AI dynamically managing network resources with provable guarantees against complete system failure. This would be especially helpful during emergencies, when communication networks are often loaded beyond capacity.
The Heathrow transformer fire and the Spanish blackout remind us of our vulnerability to catastrophic edge cases. Frontier AI systems present utility operators with an incredible opportunity to enhance the reliability and efficiency of the grid. But they must get the safety processes right. We will take a big step towards that in the fall of 2025, when we award funding to the researchers who will scope out what the Safeguarded AI production facility will look like, as well as the institute that will run it. (We are still accepting applications for phase one of this area, and will be opening applications for the main research area later in the summer.)
This facility, together with the institute that will run it, is the central output of all of our research, and one we expect to outlast the programme, enabling powerful AI to be deployed into critical infrastructure in the years to come.