What Happens When Your EHR Goes Down? Lessons From Healthcare Business Continuity Planning
In Brief
- EHR downtime is widely treated as an IT problem solved by backups and uptime guarantees, when it is an operational-continuity problem that requires rehearsed clinical and administrative protocols.
- The most disruptive recent outages originated outside the clinic's own network — in vendors, clearinghouses, and cloud platforms — which means a strong internal IT posture is no longer sufficient protection.
- The decisive question is not how to prevent downtime, which cannot be fully eliminated, but how to keep treating patients safely when it happens, and that capability is built before the outage, not during it.
The Short Answer
What happens when your EHR goes down, and what should a clinic do about it? Scheduling, clinical access, orders, billing, and communication can stall at once, and patient-safety risk rises measurably as treatment is delayed and medication-error risk climbs. The clinics that keep operating safely are the ones that prepared a degraded-mode protocol in advance, with defined fallbacks for each critical function, including the ability to run on paper. They also recognized that their largest exposure may be a vendor outage they cannot prevent and can only plan around.
Executive Summary
Most clinics think about EHR downtime, if they think about it at all, as a technology question with a technology answer: keep backups, buy an uptime guarantee, and trust the vendor. The events of recent years have made that framing dangerous. When a major claims clearinghouse was taken down by ransomware in 2024, thousands of practices that had done nothing wrong internally suddenly could not verify eligibility, submit claims, or get paid, some for weeks. A vendor they often had not chosen directly had become a single point of failure for a third of the country. Hospitals have separately measured the cost of downtime at roughly $7,500 per minute, and the clinical risk is concrete: treatment delays and a meaningful rise in medication errors while systems are dark.
The lesson is that downtime is an operational and clinical problem with an IT trigger, and the response lives in protocols, not just in servers. A clinic with flawless backups can still be unable to safely see the patient in the room. The organizations that come through outages intact are the ones that decided in advance how each critical function would keep running in a degraded state, rehearsed it until it was muscle memory, and understood that resilience now has to extend beyond their own walls to the vendors they depend on. Downtime is inevitable. Being unprepared for it is a choice.
Why This Matters Now
The threat profile has shifted in two ways at once. Ransomware has made healthcare a primary target, and the consolidation of healthcare technology has concentrated risk into a handful of platforms whose failure cascades across the whole system. A clinic can harden its own network perfectly and still be grounded by an outage three vendors upstream. For leadership, the exposure is no longer a remote contingency for the IT team to manage quietly. It is an operational risk that touches revenue, patient safety, and the clinic's ability to function, and it now originates as often outside the building as inside it.
Defining the Terms
Business continuity is the plan and the capability to keep operating through a disruption. Downtime procedures are the documented, rehearsed steps a clinic follows when systems are unavailable. Degraded mode means operating safely at reduced capability rather than stopping entirely. RTO, the recovery time objective, is how quickly you must be back in operation; RPO, the recovery point objective, is how much data you can afford to lose. Together they are the two targets that should drive any continuity design. Dependency risk is exposure to the failure of a third party you rely on but do not control.
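To make the RTO and RPO definitions concrete, here is a minimal Python sketch of how a clinic might compare its targets with what a restore test actually showed. The class name, function name, and all figures are hypothetical and chosen only for illustration. The sketch also assumes that worst-case data loss equals the interval between backups, which is a simplification.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass
class RecoveryTargets:
    rto: timedelta  # recovery time objective: longest tolerable outage
    rpo: timedelta  # recovery point objective: largest tolerable window of lost data


def check_targets(targets, measured_restore_time, backup_interval):
    """Return a list of gaps; an empty list means both targets are met."""
    gaps = []
    if measured_restore_time > targets.rto:
        gaps.append(
            f"RTO missed: restore took {measured_restore_time}, target is {targets.rto}"
        )
    # Simplifying assumption: worst-case data loss equals the time between backups.
    if backup_interval > targets.rpo:
        gaps.append(
            f"RPO missed: backups every {backup_interval}, target is {targets.rpo}"
        )
    return gaps


# Hypothetical figures for illustration only.
targets = RecoveryTargets(rto=timedelta(hours=4), rpo=timedelta(minutes=15))
gaps = check_targets(
    targets,
    measured_restore_time=timedelta(hours=6),   # from a timed restore test
    backup_interval=timedelta(minutes=10),
)
for gap in gaps:
    print(gap)
```

With these example numbers the backup schedule satisfies the RPO, but the measured restore time exceeds the RTO. That is the kind of gap that stays hidden until a restore is actually timed.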
The Problem Most Organizations Overlook
The overlooked problem is the gap between protecting data and protecting operations. Here is the contrarian position: backups and uptime SLAs create false confidence. They are worth having, but they answer the wrong question during an actual outage. Backups guarantee that your data still exists; they do nothing to help a clinician safely treat the patient in front of them while the system is unreachable. The deeper blind spot is the direction of risk. Practices invest in securing their own network and assume that makes them resilient, when the most damaging recent outages came from outside it entirely. "Our IT is solid" is a statement about one link in a chain that now runs through clearinghouses, cloud providers, and software vendors a clinic never directly selected.
Common Misconceptions
- "Our cloud EHR cannot go down." Cloud platforms and the vendors around them fail, and the 2024 outages showed that a hosted system is not an immune one.
- "Backups mean we are covered." Backups protect data, not operations. A clinic can have perfect backups and still be unable to safely see patients while the system is down.
- "Downtime is an IT problem." It is a clinical and operational problem with an IT trigger. The response lives in rehearsed protocols, not only in servers.
- "It will not happen to us." Outages from cyberattacks, hardware failures, software bugs, and upstream vendors are common and rising. The honest question is when, not whether.
Operational Impacts
Three realities define an outage as it unfolds. First, the failure can come from outside your walls: a clearinghouse or cloud outage can halt your billing or your access even when your own systems are perfectly healthy, which means internal hardening is necessary but not sufficient. Second, manual fallback is a skill rather than a binder: clinics that had never practiced paper workflows froze when the screens went dark, while those that had rehearsed kept moving because the protocol was muscle memory. Third, the clinical risk is immediate: during downtime, treatment delays and medication-error risk rise, which makes the continuity plan a patient-safety control and not merely a financial one.
Leadership Considerations
Three considerations belong to whoever owns the clinic's operations. First, map your dependencies, because you cannot plan around a vendor failure you never identified, and the dependency map is the true starting point of resilience. Second, rehearse rather than merely document, since a downtime plan no one has practiced is a plan that fails under stress, and drills are what convert a binder into a capability. Third, weigh the honest tradeoff: building and rehearsing continuity costs time and pulls staff off the schedule for drills, set against the cost of being unable to operate safely during an outage that will eventually arrive. Preparation reads as overhead right up until the day it is the only thing keeping the doors open.
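The dependency mapping described above can be sketched as a small inventory that flags anything with no alternate path. This is only an illustration: the `Dependency` fields, the example entries, and the choice to list external dependencies first are all assumptions, not a prescribed method.

```python
from dataclasses import dataclass, field


@dataclass
class Dependency:
    name: str
    external: bool    # True if a third party operates it
    supports: list    # clinic functions that rely on it
    alternates: list = field(default_factory=list)  # fallback paths, if any


def single_points_of_failure(dependencies):
    """Return dependencies with no alternate path, external ones first."""
    exposed = [d for d in dependencies if not d.alternates]
    # Sort key: external entries (False sorts before True), then by name.
    return sorted(exposed, key=lambda d: (not d.external, d.name))


# Hypothetical inventory for illustration only.
deps = [
    Dependency("office internet link", False, ["communication"]),
    Dependency("clearinghouse", True, ["billing & claims"]),
    Dependency(
        "cloud EHR host",
        True,
        ["clinical access", "orders & results"],
        alternates=["printed patient summaries"],
    ),
]

spof = single_points_of_failure(deps)
for d in spof:
    origin = "external" if d.external else "internal"
    print(f"{d.name} ({origin}) has no fallback for: {', '.join(d.supports)}")
```

Even a list this small makes the point of the exercise visible: the entries with an empty `alternates` field are the ones a continuity plan has to address first.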
What High-Performing Organizations Do Differently
The clinics that absorb outages without chaos share a discipline. They map both internal and external dependencies so they know where their single points of failure actually live. They define a degraded-mode fallback for each critical function, keep paper procedures current and practiced, and run downtime drills that make the protocol reflexive. And they treat single-vendor and clearinghouse concentration as a risk to plan around, not a fact to accept. The practical core of all of this is knowing which functions absolutely must keep running, and how, when the EHR is unavailable.
The Five Functions That Must Survive
| Function | What halts when the EHR is down | The degraded-mode fallback that must be ready |
|---|---|---|
| Patient identification & scheduling | Check-in, the day's schedule, who is coming and why | A printed daily schedule and a paper registration process |
| Clinical access to critical info | Allergies, medications, problem lists, recent results | Printed or cached summaries for scheduled patients; a known way to reach critical history |
| Orders & results | Lab and imaging orders, e-prescribing | Paper orders, phoned-in prescriptions, manual result tracking |
| Billing & claims | Eligibility checks, claim submission, payment posting | A documented hold-and-batch process and an alternate clearinghouse path |
| Communication | Internal messaging, patient and pharmacy contact | A non-EHR fallback channel and a contact tree for staff and key vendors |
Used as a checklist, the table is unsparing: any function without a ready, rehearsed fallback is a function that stops the day the EHR does.
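One way to apply the table as a checklist is to record, for each function, whether a fallback is documented and when it was last rehearsed. The sketch below is a hypothetical illustration: the `Fallback` structure, the sample entries, and the 180-day drill window are assumptions rather than a standard.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass
class Fallback:
    function: str
    procedure: Optional[str]    # documented fallback, or None if none exists
    last_drill: Optional[date]  # date last rehearsed, or None if never


def unready_functions(fallbacks, today, max_drill_age=timedelta(days=180)):
    """Return (function, reason) pairs for fallbacks that are missing or stale."""
    problems = []
    for fb in fallbacks:
        if not fb.procedure:
            problems.append((fb.function, "no documented fallback"))
        elif fb.last_drill is None:
            problems.append((fb.function, "never rehearsed"))
        elif today - fb.last_drill > max_drill_age:
            problems.append((fb.function, "drill overdue"))
    return problems


# Hypothetical status for illustration only.
fallbacks = [
    Fallback("Patient identification & scheduling",
             "printed daily schedule and paper registration", date(2024, 11, 1)),
    Fallback("Clinical access to critical info",
             "printed summaries for scheduled patients", None),
    Fallback("Orders & results",
             "paper orders and phoned-in prescriptions", date(2024, 3, 1)),
    Fallback("Billing & claims", None, None),
    Fallback("Communication",
             "non-EHR channel and contact tree", date(2024, 12, 10)),
]

problems = unready_functions(fallbacks, today=date(2025, 1, 1))
for function, reason in problems:
    print(f"{function}: {reason}")
```

The drill window is a policy choice for each clinic to set; what matters is that a documented but unrehearsed fallback is reported as a gap instead of being counted as ready.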
Metro Relay Observations
- The clinics that came through recent outages best were not the ones with the most expensive systems. They were the ones that still knew how to run on paper.
- The most damaging downtime we see now rarely starts inside the clinic. It starts at a vendor the clinic never directly chose and cannot directly fix.
- A downtime plan that lives in a binder no one has opened is, during an actual outage, indistinguishable from having no plan at all.
- Backups answer the wrong question in a crisis. The urgent question is not whether the data is safe; it is whether the clinic can safely see the patient in front of it right now.
Metro Relay Perspective
Technology systems are infrastructure, and resilience is a business and clinical issue rather than an IT one. The outcome worth optimizing is the ability to keep operating safely, not a number on an uptime report. Because so much risk now lives in third parties, resilience has to extend beyond a clinic's own walls to the vendors and platforms it depends on. And the capability that matters is built and rehearsed before an outage, because no plan written during a crisis performs as well as one that was already muscle memory when the crisis began.
Strategic Recommendations
Map both internal and external dependencies, including the clearinghouses and cloud platforms upstream of your own systems. Define and document degraded-mode fallbacks for the five critical functions, and keep paper downtime procedures current. Run regular downtime drills so the protocol is reflexive rather than theoretical. Assess single-vendor and clearinghouse concentration and plan around it. And set explicit RTO and RPO targets, then confirm that your actual recovery process can meet them rather than assuming it will.
Future Outlook
Third-party and supply-chain risk will keep rising as healthcare technology consolidates and as ransomware continues to target the sector. Payers, regulators, and cyber insurers are increasingly treating operational resilience and vendor diversity as expectations rather than nice-to-haves. The standard of care is shifting from "we back up our data" toward "we can keep treating patients through an outage," and more clinics are elevating continuity from an IT task to a board-level requirement. The outages will continue; what is changing is whether being unprepared for them remains defensible.
Conclusion
Your EHR will go down at some point, whether by attack, by error, or by a vendor failure you cannot control, and the only variable you actually govern is whether your clinic can keep treating patients when it does. That capability is not something you can buy after the lights go out. It is a rehearsed protocol, built in advance, that defines how each essential function keeps running in a degraded state and how your team operates around a dependency you do not own. The clinics that prepare keep their doors open and their patients safe. The ones that assumed the system would always be there discover, at the worst possible moment, that the plan was the thing they were missing.
Key Takeaways
- EHR downtime is an operational and clinical problem with an IT trigger; the response lives in rehearsed protocols, not just in backups.
- Backups and uptime SLAs protect data, not operations — a clinic with perfect backups can still be unable to safely see patients.
- The most damaging outages now originate outside your network, in vendors and clearinghouses you don't control; map those dependencies.
- Define and rehearse a degraded-mode fallback for the Five Functions That Must Survive, including the ability to run on paper.
- Set RTO and RPO targets, run downtime drills, and treat continuity as a patient-safety control and a board-level requirement.