What happens to your faxes during an outage?
The answer depends almost entirely on your architecture. Inadequate preparation means operations grind to a halt while critical documents are lost. But with proper planning and testing, systems can fail over gracefully while the primary environment recovers.
The infamous AWS outage in October 2025 drove the point home for organizations that were caught off guard. A DNS resolution failure in US-EAST-1 disrupted DynamoDB and cascaded into failures across dozens of downstream services, creating a real-world test of enterprises’ fax DR strategies (or lack thereof). The root cause was Amazon’s error, but the degree of disruption was largely determined by fax architecture decisions made years earlier.
This guide covers how to plan for fax continuity before an outage occurs: architecture options, monitoring requirements, downtime procedures, telecom resilience, and what to do after service is restored.
This is a general overview of common practices, which are a solid starting point but not necessarily right for your org. Your fax solution partner is the best resource for specific or situational guidance. (And if you’re a RightFax customer without a solid solution partner, consider reaching out to see if our team fits.)
Different failure modes affect fax workflows differently and require different responses. Knowing which type of failure has occurred is the first step in an effective response.
A regional outage affecting compute, storage, or networking—as in the October 2025 AWS event—takes down single-region fax deployments entirely. If the RightFax VM is unreachable, neither inbound nor outbound fax functions. Multi-region configurations with defined failover can route around a regional failure; single-region deployments cannot.
The cloud infrastructure is up, but the fax application itself has failed. The culprit could range from a server crash to a failed update to a simple configuration error. Inbound faxes fail at the point of delivery; outbound faxes initiated from EHR workflows fail at the point of transmission. Resolution options include restart, rollback to a prior configuration, or failover to a warm standby instance.
The carrier or SIP trunk is unavailable. Inbound faxes fail at the carrier level (no delivery attempt reaches the fax server) and senders receive a failure signal. Outbound transmission fails at the point of call setup. A backup SIP trunk or alternate carrier routing path addresses this failure mode.
The fax infrastructure is intact, but the EHR is unavailable. The fax software receives and queues inbound faxes but cannot route them into the EHR. Outbound fax from EHR workflows fails at initiation. This requires a documented downtime procedure: an alternate fax client for outbound, a defined queue review process for inbound, and a reconciliation workflow for filing documents into the EHR on recovery.
The fax server is functioning but users and applications cannot reach it. The fax server itself may recover faxes once connectivity is restored, but documents initiated during the outage are lost unless a retry mechanism is in place. VPN failover and redundant connectivity paths are key mitigation steps.
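As a rough illustration of how the failure modes above might be told apart, the sketch below maps a set of health probe results to a likely failure mode. The probe names, the `ProbeResults` type, and the order of checks are assumptions made for this example; they are not part of RightFax or any cloud provider's tooling.

```python
from dataclasses import dataclass
from enum import Enum


class FailureMode(Enum):
    NONE = "no failure detected"
    CLOUD_REGION = "cloud region or VM unreachable"
    FAX_APPLICATION = "fax application failed"
    TELECOM = "carrier or SIP trunk unavailable"
    EHR = "EHR unavailable"
    CONNECTIVITY = "users and applications cannot reach the fax server"


@dataclass(frozen=True)
class ProbeResults:
    """Outcomes of hypothetical health probes (names are illustrative)."""
    vm_reachable: bool          # checked from outside the cloud region
    fax_service_healthy: bool   # application-level check on the fax server
    sip_trunk_available: bool   # carrier / SIP trunk status
    ehr_reachable: bool         # EHR integration endpoint
    client_path_ok: bool        # checked from the user network or VPN


def classify(p: ProbeResults) -> FailureMode:
    """Return the most fundamental failure first, since it masks the others."""
    if not p.vm_reachable:
        return FailureMode.CLOUD_REGION
    if not p.fax_service_healthy:
        return FailureMode.FAX_APPLICATION
    if not p.sip_trunk_available:
        return FailureMode.TELECOM
    if not p.ehr_reachable:
        return FailureMode.EHR
    if not p.client_path_ok:
        return FailureMode.CONNECTIVITY
    return FailureMode.NONE
```

Checking the lowest layer first is a design choice for this sketch: an unreachable VM makes every other probe fail too, so reporting it first avoids chasing symptoms.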
Fax resilience is a design decision, not a feature. The architecture chosen at deployment determines what an outage looks like. For a deeper treatment of HA and DR strategy, see Ensuring Fax Continuity with High Availability and Disaster Recovery.
The choice of cloud region(s) is one of the biggest determinants of fax resilience:
One VM, one cloud region, no failover. Any regional cloud provider outage or VM failure takes fax offline. This architecture is appropriate only for low-volume, non-critical fax environments where downtime is an acceptable risk.
Primary and secondary RightFax instances in separate cloud regions. On primary failure, inbound routing shifts to the secondary via SIP trunk reconfiguration and the secondary instance handles outbound. RTO depends on whether failover is automated or manual: automated failover can restore service in seconds, whereas manual failover is slower and subject to staff availability.
An on-premises RightFax instance serves as a DR target for a cloud primary, or vice versa. This provides independence from cloud provider failures and can be particularly useful for organizations with existing on-premises infrastructure. Telecom routing shifts between cloud and on-premises during an outage.
Private Fax Cloud® can be configured for multi-region deployment with monitoring, alerting, and failover orchestration managed by our team. The architecture is designed from deployment to align fax workflows with DR processes rather than grafting resilience on afterward. SQL Server Always On availability groups can be used for the RightFax database, enabling synchronized replication between nodes and automated failover without manual intervention.
The key design principle is to keep the telecom and fax application layers independently resilient. If a SIP trunk remains active when the fax application fails, it’s possible to notify senders (e.g., with a busy signal or failure notification) rather than silently dropping inbound attempts.
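For the multi-region option described above, the difference between automated and manual failover comes down to whether something is watching the primary and acting on its own. A minimal sketch of that decision logic follows, assuming a periodic health check and a threshold of consecutive failures. The `FailoverMonitor` class and its threshold are hypothetical, and the actual rerouting (SIP trunk reconfiguration, DNS changes) is deliberately left out.

```python
from dataclasses import dataclass


@dataclass
class FailoverMonitor:
    """Tracks primary health checks and decides which instance is active.

    Illustrative only: a real deployment would also trigger the telecom
    rerouting step and notify on-call staff.
    """
    failure_threshold: int = 3      # assumed value; tune to your RTO
    consecutive_failures: int = 0
    active: str = "primary"

    def record(self, primary_healthy: bool) -> str:
        # Once failed over, stay on the secondary. Failback is treated as
        # a deliberate manual step in this sketch.
        if self.active == "secondary":
            return self.active
        if primary_healthy:
            self.consecutive_failures = 0
        else:
            self.consecutive_failures += 1
            if self.consecutive_failures >= self.failure_threshold:
                self.active = "secondary"
        return self.active
```

Requiring several consecutive failures before switching is one way to avoid failing over on a single dropped health check; the right threshold depends on check frequency and how much downtime is tolerable.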
It’s categorically better to catch and mitigate an outage yourself than to learn about it from clinical or operational staff.
RightFax’s built-in reporting covers queue depth, transmission success and failure rates, and channel utilization. Cloud-native monitoring tools (CloudWatch for AWS, Azure Monitor, Google Cloud Monitoring) add visibility into CPU, memory, disk I/O, and network health. Both layers should be active and alerting before an outage occurs, not configured in response to one.
Some key metrics to monitor include:
Silent outbound failures are especially pernicious: the EHR records a fax as sent, but it never reaches the recipient and no one is alerted. The gap can be closed through a combination of EHR-side transmission logging and RightFax-side delivery confirmation.
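One way to picture that combination is a periodic reconciliation: compare what the EHR believes it sent against what the fax server confirmed as delivered, and flag anything left over. The sketch below assumes both systems can export a shared fax identifier, which is an assumption for illustration; the function name, the data shapes, and the 30-minute grace period are likewise hypothetical.

```python
from datetime import datetime, timedelta


def unconfirmed_faxes(
    ehr_sent: dict[str, datetime],
    confirmed_ids: set[str],
    now: datetime,
    grace: timedelta = timedelta(minutes=30),
) -> list[str]:
    """Return IDs the EHR logged as sent that have no delivery confirmation.

    ehr_sent maps a fax ID to the time the EHR initiated transmission.
    Faxes newer than the grace period are skipped because they may still
    be in the queue or retrying.
    """
    cutoff = now - grace
    return sorted(
        fax_id
        for fax_id, sent_at in ehr_sent.items()
        if fax_id not in confirmed_ids and sent_at <= cutoff
    )
```

Anything this returns is a candidate silent failure that warrants an alert or manual follow-up.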
Telecom is a frequently overlooked single point of failure in fax DR planning. A cloud fax deployment with multi-region compute and a single SIP trunk is only as resilient as that carrier.
T.38 is the standard protocol for fax over IP. Confirm your SIP provider’s T.38 fallback behavior before a failure occurs. Alternatives like G.711 are usually viable but are significantly more sensitive to packet loss and jitter. If your vendor’s fallback raises concerns, it’s worth considering redundant, geographically diverse carriers.
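To make the carrier redundancy idea concrete, here is a small sketch of how a routing layer might choose among several SIP trunks: prefer any healthy trunk that supports T.38, and fall back to a healthy G.711-only trunk only when no T.38 path is available. The `Trunk` type, its fields, and the trunk names in the usage note are invented for this example and do not reflect any specific carrier or RightFax configuration.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Trunk:
    """A hypothetical SIP trunk record used only for this illustration."""
    name: str
    priority: int        # lower number means more preferred
    healthy: bool
    supports_t38: bool


def pick_trunk(trunks: Iterable[Trunk]) -> Optional[Trunk]:
    """Choose a healthy trunk, preferring T.38 support, then priority."""
    candidates = [t for t in trunks if t.healthy]
    if not candidates:
        return None  # no carrier path available; outbound must queue
    # False sorts before True, so T.38-capable trunks win the comparison.
    return min(candidates, key=lambda t: (not t.supports_t38, t.priority))
```

With an unhealthy primary, a healthy G.711-only backup at priority 2, and a healthy T.38 backup at priority 3, this selects the T.38 backup, reflecting the greater sensitivity of G.711 to packet loss and jitter.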
Most healthcare downtime procedures are written for EHR outages. Fax-specific downtime is frequently under-documented, leaving staff at a loss when referrals suddenly stop arriving.
Downtime procedures for fax should cover:
These procedures should be known by clinical operations leaders, not just IT. Otherwise, they simply aren’t operational.
Recovery is not the end of the process. A structured post-incident review produces the information needed to prevent recurrence and close gaps that the outage exposed.
The review should document:
Root cause analysis should trace the failure to its origin: cloud provider, application, telecom layer, or configuration. The output of the review should update DR documentation, including any architectural changes needed to close gaps. Needless to say, DR configuration and failover should be tested on a schedule in anticipation of possible incidents, not only in response.
Your organization’s fax SLA doesn’t just come from your cloud provider. Conscious architectural decisions made long before an outage will dictate how much resilience you can expect when the worst happens.
Sound, resilient architecture is built into Private Fax Cloud®: multi-region capability, managed monitoring and alerting, and a team of highly specialized engineers. Whether you want a sounding board for your own downtime and mitigation concerns, or you’re ready to explore more resilient alternatives, contact us to speak with a solutions engineer.