Post

The Replica Backup Trap: Why Your PostgreSQL Backups Are Mysteriously Failing

The Replica Backup Trap: Why Your PostgreSQL Backups Are Mysteriously Failing

It’s 3:00 AM. Your automated cron job kicks off a pg_dump on your database read replica. You chose to run backups on the replica for a perfectly logical reason: backups are intensive, resource-heavy operations, and you want to save your primary database’s CPU and disk I/O for your actual living, breathing users.

You go to sleep thinking your data is safe. Instead, you wake up to a failed job alert and a message that looks all too familiar:

1
2
3
ERROR: canceling statement due to conflict with recovery
DETAIL: User query might have needed to see row versions that must be removed.

Wait, what? A backup isn’t a user query—it’s just a backup! Why is the replica killing its own safety net?

Let’s dive into why backing up from a replica is a notorious trap, look at how the conflict plays out visually, and map out the cleanest production solutions to fix it.


The Core Problem: The Backup is Just a Giant Query

To understand why this happens, we have to look at how backup utilities like pg_dump function under the hood.

A backup isn’t a magical snapshot; it is essentially one massive, long-running read transaction. When pg_dump starts, it opens a transaction with REPEATABLE READ isolation. It needs a completely frozen, consistent view of the database from that exact microsecond so that table A matches table B, even if the backup takes three hours to finish.

While your backup is happily scanning tables on the replica, your primary database is still blazing away handling live production traffic.

Here is exactly how that creates a collision course:

  1. The Ghost Row: A user updates or deletes a row on the Primary database. The Primary runs a VACUUM to clean up the old version of that row.
  2. The WAL Stream: The Primary writes this cleanup event into the Write-Ahead Log (WAL) and streams it to the replica.
  3. The Standby Delay: The replica receives the log and says, “I need to delete this old row version right now to stay in sync with the primary.” But it stops. It notices that your pg_dump process is still running and still needs to read that exact old row version.
  4. The Ultimatum: The replica waits for a grace period defined by your configuration (usually 30 seconds). If your backup doesn’t finish in 30 seconds (and it won’t), the replica prioritizes replication over your backup. It summarily executes the backup process to apply the update.

Why didn’t hot_standby_feedback save us?

If you read our previous article, you might think: “Can’t I just turn on hot_standby_feedback = on so the replica tells the primary not to vacuum those rows?”

Yes, in theory. But in production, backups often break right through the feedback shield for two reasons:

  • The Replication Gap: If your primary is processing heavy bulk writes while the backup is running, the replica can easily fall slightly behind in replaying logs. If the replica lags behind by even a few seconds, its “feedback” message arrives at the primary too late—after the primary has already vacuumed the rows. When those WAL files finally land on the replica, a conflict is unavoidable, and the backup dies.
  • Log Shipping vs. Streaming: If your replica restores via WAL files (restore_command) rather than a live network stream (primary_conninfo), hot_standby_feedback literally cannot communicate upstream. The primary is entirely blind to the backup happening on the replica.

Production Solutions: How to Back Up Safely

You don’t have to move your backups back to the primary and risk slowing down your users. Here are the three industry-standard ways to fix this.

Solution 1: Crank Up the Standby Delays (The Quick Fix)

If you want a fast configuration fix, you need to tell the replica to give your backup hours of breathing room instead of seconds.

Modify these settings in the postgresql.conf file on your Read Replica:

1
2
3
4
5
6
# If your replica streams live from the primary:
max_standby_streaming_delay = '4h'  # Gives the backup 4 hours to finish

# If your replica restores from archived WAL files:
max_standby_archive_delay = '4h'

What happens now: When a conflict occurs, the replica will willingly pause replication for up to 4 hours to let your backup finish.

  • The Trade-off: Your replica data will become stale (lag behind the primary) while the backup is running. As soon as the backup finishes, the replica will read the accumulated logs and catch up rapidly.

Solution 2: Use Strict Physical Replication Slots

If you cannot afford replication lag during your backup window, you must pair hot_standby_feedback = on with a Physical Replication Slot on the primary.

A replication slot forces the primary database to track the replica’s exact position. When the replica says, “Hey, I’m doing a backup, don’t delete anything yet,” the primary is forced to comply. It will store old WAL data on its own disk rather than throwing it away.

  • The Trade-off: If your backup takes a very long time, your primary database’s disk space will temporarily grow as it holds onto data for the replica. Ensure your primary has plenty of disk headroom.

Solution 3: Ditch pg_dump for Physical Backups (The Enterprise Way)

If your database is hundreds of gigabytes or terabytes in size, logical backups (pg_dump) on a replica are fundamentally the wrong tool. Large scale databases should use physical backup tools like pgBackRest, Barman, or native cloud snapshots (like AWS Aurora/RDS snapshots).

These tools don’t open an MVCC database transaction. Instead, they copy the raw data blocks directly from the disk along with the raw WAL streams. Because they don’t care about row versions or read consistency at the SQL level, they are immune to recovery conflicts and will never get canceled.


Summary Strategy

  • For Small/Medium DBs (< 50GB): Keep your backup on the replica, turn hot_standby_feedback = on, and increase max_standby_streaming_delay to a window long enough for your backup to safely complete (e.g., 2h or 4h).
  • For Large Enterprise DBs (> 100GB): Stop using pg_dump. Move your backup strategy to a dedicated physical tool like pgBackRest, which bypasses the database engine completely and completely avoids the conflict trap.
This post is licensed under CC BY 4.0 by the author.