Cloud Fax Downtime & Resilience Planning: An Enterprise Continuity Guide

Paperless Productivity

Posted on June 9, 2026

Key Takeaways:

Architecture determines outcomes: A single-region deployment with no failover means fax stops during a cloud provider outage. Resilience is a design decision, not a default.
Know your failure modes: Cloud provider, application, telecom, EHR, and network failures each affect fax differently and require different responses. Identify which has occurred before acting.
Monitor proactively: Queue depth, outbound success rates, and server health should be alerting before an outage occurs. Silent outbound failure—a fax that leaves the EHR but never reaches RightFax—requires logging on both sides to detect.
Telecom is a single point of failure: A backup SIP trunk or alternate carrier path directly addresses carrier-level outages. Confirm T.38 support with your provider before a failure, not during one.
Downtime procedures need to be operational: If the referral coordinator doesn’t know the fax downtime procedure, the procedure isn’t operational. Document and test fax downtime separately from EHR downtime.

What happens to your faxes during an outage?

The answer depends almost entirely on your architecture. Inadequate preparation means operations grind to a halt while critical documents are lost. But with proper planning and testing, systems can fail over gracefully while the primary environment recovers.

The infamous AWS outage in October 2025 drove the point home for organizations that were caught off guard. A DNS resolution failure in US-EAST-1 disrupted DynamoDB and cascaded into failures across dozens of downstream services, creating a real-world test of enterprises’ fax DR strategies (or lack thereof). The root cause was Amazon’s error, but the degree of disruption was largely determined by fax architecture decisions made years earlier.

This guide covers how to plan for fax continuity before an outage occurs: architecture options, monitoring requirements, downtime procedures, telecom resilience, and what to do after service is restored.

This is a general overview of common practices, which are a solid starting point but not necessarily right for your org. Your fax solution partner is the best resource for specific or situational guidance. (And if you’re a RightFax customer without a solid solution partner, consider reaching out to see if our team fits.)

Understanding Fax Outage Types

Different failure modes affect fax workflows differently and require different responses. Knowing which type of failure has occurred is the first step in an effective response.

Cloud provider or IaaS failure

A regional outage affecting compute, storage, or networking—as in the October 2025 AWS event—takes down single-region fax deployments entirely. If the RightFax VM is unreachable, neither inbound nor outbound fax functions. Multi-region configurations with defined failover can route around a regional failure; single-region deployments cannot.

RightFax application failure

The cloud infrastructure is up, but the fax application itself has failed. The culprit could range from a server crash to a failed update to a simple configuration error. Inbound faxes fail at the point of delivery; outbound faxes initiated from EHR workflows fail at the point of transmission. Resolution options include restart, rollback to a prior configuration, or failover to a warm standby instance.

Telecom or SIP trunk failure

The carrier or SIP trunk is unavailable. Inbound faxes fail at the carrier level (no delivery attempt reaches the fax server) and senders receive a failure signal. Outbound transmission fails at the point of call setup. A backup SIP trunk or alternate carrier routing path addresses this failure mode.

EHR-side failure

The fax infrastructure is intact, but the EHR is unavailable. The fax software receives and queues inbound faxes but cannot route them into the EHR. Outbound fax from EHR workflows fails at initiation. This requires a documented downtime procedure: an alternate fax client for outbound, a defined queue review process for inbound, and a reconciliation workflow for filing documents into the EHR on recovery.

Network or connectivity failure

The fax server is functioning but users and applications cannot reach it. The fax server itself may resume processing queued faxes once connectivity is restored, but documents initiated during the outage are lost unless a retry mechanism is in place on the sending side. VPN failover and redundant connectivity paths are key mitigation steps.
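The failure modes above can be told apart by checking layers from the infrastructure upward. The following is a minimal triage sketch, not a RightFax feature: the probe names in `ProbeResults` are hypothetical health checks you would have to implement yourself, and the mapping is a first guess to guide investigation rather than a diagnosis.

```python
from dataclasses import dataclass
from enum import Enum


class FailureMode(Enum):
    CLOUD_PROVIDER = "cloud provider or IaaS failure"
    NETWORK = "network or connectivity failure"
    APPLICATION = "RightFax application failure"
    TELECOM = "telecom or SIP trunk failure"
    EHR = "EHR-side failure"
    NONE = "no failure detected"


@dataclass
class ProbeResults:
    # Hypothetical health-check results; True means the check passed.
    vm_reachable_from_outside: bool  # probe run from outside your own network
    vm_reachable_from_inside: bool   # probe run from the user/application network
    fax_service_responding: bool
    sip_trunk_registered: bool
    ehr_reachable: bool


def classify(p: ProbeResults) -> FailureMode:
    """Return the most likely failure mode, checking the lowest layer first."""
    if not p.vm_reachable_from_outside and not p.vm_reachable_from_inside:
        # Nobody can reach the VM: likely a provider, region, or VM problem.
        return FailureMode.CLOUD_PROVIDER
    if not p.vm_reachable_from_inside:
        # The VM is up, but your users and applications cannot reach it.
        return FailureMode.NETWORK
    if not p.fax_service_responding:
        return FailureMode.APPLICATION
    if not p.sip_trunk_registered:
        return FailureMode.TELECOM
    if not p.ehr_reachable:
        return FailureMode.EHR
    return FailureMode.NONE
```

Checking the lowest layer first matters because a failed VM also makes every higher-level probe fail; reporting only the first failing layer keeps responders from chasing symptoms.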

Architecture for Fax Resilience

Fax resilience is a design decision, not a feature. The architecture chosen at deployment determines what an outage looks like. For a deeper treatment of HA and DR strategy, see Ensuring Fax Continuity with High Availability and Disaster Recovery.

The choice of cloud region(s) is one of the biggest determinants of fax resilience:

Single-region IaaS

One VM, one cloud region, no failover. Any regional cloud provider outage or VM failure takes fax offline. This architecture is appropriate only for low-volume, non-critical fax environments where downtime is an acceptable risk.

Multi-region with failover

Primary and secondary RightFax instances in separate cloud regions. On primary failure, inbound routing shifts to the secondary via SIP trunk reconfiguration and the secondary instance handles outbound. RTO depends on whether failover is automated or manual: automated failover can restore service in seconds, whereas manual failover is slower and subject to staff availability.
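To illustrate what an automated failover decision might look like, here is a minimal sketch that switches to the secondary after a run of consecutive failed health checks. The class name, the threshold of three, and the "primary"/"secondary" labels are all assumptions for illustration; the actual rerouting (SIP trunk reconfiguration and so on) depends on your carrier and deployment and is not shown.

```python
class FailoverController:
    """Tracks primary health checks and decides which instance should be active."""

    def __init__(self, threshold: int = 3):
        self.threshold = threshold        # consecutive failures required to fail over
        self.consecutive_failures = 0
        self.active = "primary"

    def record_health_check(self, primary_healthy: bool) -> str:
        """Record one health-check result and return the instance that should be active."""
        if self.active != "primary":
            # In this sketch, failing back to the primary is a deliberate manual step.
            return self.active
        if primary_healthy:
            self.consecutive_failures = 0  # a single success resets the count
        else:
            self.consecutive_failures += 1
            if self.consecutive_failures >= self.threshold:
                self.active = "secondary"
        return self.active
```

Requiring several consecutive failures trades a slightly longer RTO for protection against failing over on a single dropped health check. Keeping failback manual avoids flapping between regions while the primary is still unstable.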

Hybrid architecture

An on-premises RightFax instance serves as a DR target for a cloud primary, or vice versa. This provides independence from cloud provider failures and can be particularly useful for organizations with existing on-premises infrastructure. Telecom routing shifts between cloud and on-premises during an outage.

Private Fax Cloud® multi-region configuration

Private Fax Cloud® can be configured for multi-region deployment with monitoring, alerting, and failover orchestration managed by our team. The architecture is designed from deployment to align fax workflows with DR processes rather than grafting resilience on afterward. SQL Server Always On availability groups can be used for the RightFax database, enabling synchronized replication between nodes and automated failover without manual intervention.

The key design principle is to keep the telecom and fax application layers independently resilient. If a SIP trunk remains active when the fax application fails, it’s possible to notify senders (e.g., with a busy signal or failure notification) rather than silently dropping inbound attempts.

Queue Monitoring & Alerting

It’s categorically better to catch and mitigate an outage yourself than to learn about it from clinical or operational staff.

RightFax’s built-in reporting covers queue depth, transmission success and failure rates, and channel utilization. Cloud-native monitoring tools (CloudWatch for AWS, Azure Monitor, Google Cloud Monitoring) add visibility into CPU, memory, disk I/O, and network health. Both layers should be active and alerting before an outage occurs, not configured in response to one.

Some key metrics to monitor include:

Queue depth trending upward without a corresponding increase in transmission completions
Outbound success rate dropping below baseline
Inbound delivery rate degradation
Channel utilization approaching capacity at peak
Server health indicators
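The first two metrics in the list can be turned into simple alert conditions. The sketch below assumes you already export periodic samples from your fax reporting into some monitoring pipeline; the `Sample` fields, the three-sample minimum, and the 5-point tolerance are illustrative assumptions, not RightFax settings.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Sample:
    queue_depth: int  # faxes waiting at the end of the interval
    completed: int    # outbound transmissions completed during the interval
    attempted: int    # outbound transmissions attempted during the interval


def queue_backing_up(samples: List[Sample]) -> bool:
    """True when queue depth rose every interval while completions never increased."""
    if len(samples) < 3:
        return False  # too little data to call it a trend
    pairs = list(zip(samples, samples[1:]))
    depth_rising = all(b.queue_depth > a.queue_depth for a, b in pairs)
    completions_flat = all(b.completed <= a.completed for a, b in pairs)
    return depth_rising and completions_flat


def success_rate_below_baseline(samples: List[Sample], baseline: float,
                                tolerance: float = 0.05) -> bool:
    """True when the outbound success rate over the window is below baseline minus tolerance."""
    attempted = sum(s.attempted for s in samples)
    if attempted == 0:
        return False  # nothing was sent, so there is no rate to judge
    rate = sum(s.completed for s in samples) / attempted
    return rate < baseline - tolerance
```

Comparing queue depth against completions, rather than alerting on depth alone, helps distinguish a stalled queue from an ordinary busy period where both numbers rise together.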

Silent outbound failures, where a fax leaves the EHR but never reaches RightFax, are especially pernicious because no single log shows the gap. Detecting them requires logging on both sides: EHR-side transmission logs compared against RightFax-side delivery confirmations.
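One way to compare the two sides is a periodic reconciliation job. The sketch below assumes both logs can be exported with a shared job identifier, which may not be true of your EHR integration without extra configuration; the function name, the ID format, and the 10-minute grace period are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Set


def find_silent_failures(ehr_sent: Dict[str, datetime],
                         rightfax_received: Set[str],
                         now: datetime,
                         grace: timedelta = timedelta(minutes=10)) -> List[str]:
    """Return IDs the EHR logged as sent that the fax server never logged.

    ehr_sent maps a job ID to the time the EHR handed the fax off.
    rightfax_received is the set of job IDs seen on the fax-server side.
    Jobs younger than the grace period are skipped because they may still be in flight.
    """
    return sorted(
        job_id
        for job_id, sent_at in ehr_sent.items()
        if job_id not in rightfax_received and now - sent_at > grace
    )
```

The returned list is what a coordinator would need to resend or follow up on; an empty list over a busy period is reasonable evidence that the handoff between the two systems is working.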

Telecom Resilience

Telecom is a frequently overlooked single point of failure in fax DR planning. A cloud fax deployment with multi-region compute and a single SIP trunk is only as resilient as that carrier.

T.38 is the standard protocol for fax over IP. Confirm T.38 support and fallback behavior with your SIP provider before a failure occurs. Fallbacks like G.711 are usually viable but are significantly more sensitive to packet loss and jitter. If your provider’s fallback raises concerns, it’s worth considering redundant, geographically diverse carriers.

Downtime Procedures

Most healthcare downtime procedures are written for EHR outages. Fax-specific downtime is frequently under-documented, leaving staff at a loss when referrals suddenly stop arriving.

Downtime procedures for fax should cover:

Who is notified and in what sequence when fax failure is detected
How outbound faxes are handled during downtime: alternate fax client access, physical fax fallback where available, or documented deferral with follow-up
How inbound faxes are monitored during the outage (particularly whether documents are queueing or failing entirely)
How documents received during downtime are reconciled back into the EHR on recovery
Which fax numbers and workflows are highest priority for restoration

These procedures should be known by clinical operations leaders, not just IT. Otherwise, they simply aren’t operational.

Department-level readiness in healthcare

Discharge: alternate distribution method for discharge summaries when outbound fax is unavailable
Referrals: documented alternate intake process; staff know which numbers are affected and have an escalation path
Lab results: providers aware that results may be delayed; critical value notification has a fax-independent backup channel
Prior authorizations: payer portals identified as a temporary alternative; document which payers accept portal submissions
Pharmacy: verbal or portal-based orders during downtime; reconciliation required on recovery

Post-Incident Review

Recovery is not the end of the process. A structured post-incident review produces the information needed to prevent recurrence and close gaps that the outage exposed.

The review should document:

The timeline of service failure, detection, and restoration
What was lost, such as inbound faxes that failed without queuing or outbound faxes that failed silently
What needs to be reconciled, namely documents that need to be resent or re-requested

Root cause analysis should trace the failure to its origin: cloud provider, application, telecom layer, or configuration. The output of the review should update DR documentation, including any architectural changes needed to close gaps. DR configuration and failover should also be tested on a schedule in anticipation of incidents, not only in response to them.
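The timeline in the review yields two numbers worth recording every time: how long the failure went undetected, and whether total downtime stayed within the documented RTO. A minimal sketch follows, with hypothetical field names and no assumptions about where the timestamps come from.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class IncidentTimeline:
    failed_at: datetime    # when service actually stopped working
    detected_at: datetime  # when monitoring or staff noticed
    restored_at: datetime  # when service was confirmed working again

    @property
    def time_to_detect(self) -> timedelta:
        return self.detected_at - self.failed_at

    @property
    def downtime(self) -> timedelta:
        return self.restored_at - self.failed_at


def met_rto(timeline: IncidentTimeline, rto: timedelta) -> bool:
    """True when total downtime was within the recovery time objective."""
    return timeline.downtime <= rto
```

A long time to detect points at monitoring gaps, while a long gap between detection and restoration points at the failover or downtime procedure itself. Separating the two helps direct the follow-up work.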

Resilience Checklist

Architecture

☐ Is fax running in a single-region deployment with no defined failover?
☐ Is there a documented RTO and RPO for fax services?
☐ Are those targets stated in a contract or SLA rather than a sales conversation?
☐ Are the telecom and application layers independently resilient?
☐ Is there a backup SIP trunk or alternate carrier path configured?
☐ Has failover been tested, not just designed?

Monitoring & alerting

☐ Are queue depth, success rates, and server health actively monitored?
☐ Are alerting thresholds defined for critical fax metrics?
☐ Is silent outbound failure detectable — can you identify when a fax leaves the EHR but doesn’t reach RightFax?
☐ Is infrastructure-level monitoring (CPU, memory, disk, network) configured alongside application monitoring?

Downtime procedures

☐ Is there a documented fax downtime procedure separate from the EHR downtime procedure?
☐ Do clinical operations leaders know the fax downtime procedure — not just IT?
☐ Is there a defined reconciliation process for documents received or sent during downtime?
☐ Have fax downtime procedures been tested in the past 12 months?

Recovery

☐ Is there a post-incident review process that specifically includes fax?
☐ Is DR configuration tested independently of EHR DR testing?
☐ Are RTO and RPO targets validated by test results rather than assumptions?

Building Resilience Before You Need It

Your organization’s fax SLA doesn’t just come from your cloud provider. Conscious architectural decisions made long before an outage will dictate how much resilience you can expect when the worst happens.

Sound, resilient architecture is built into Private Fax Cloud®: multi-region capability, managed monitoring and alerting, and a team of highly specialized engineers. Whether you want a sounding board for your own downtime and mitigation concerns or you’re ready to explore more resilient alternatives, contact us to speak with a solutions engineer.
