Lesson 2 · Resilience
You have a backup. You do not yet have a recovery. Today we find out which.
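In Lesson 1 you set up a vzdump job and proved it produces a real archive. That moved your
worst-case RPO (recovery point objective: how much recent data you can afford to lose) from ∞ to 24 hours.
But your RTO (recovery time objective: how long it takes to get the service running again) is still
a question mark, and an RTO you've never measured is the same as no RTO at all.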
A backup is a claim: "this file can become a running guest again." Until you've actually cashed that claim, it's untested — and untested backups fail at the worst possible moment (a half-written archive, a storage that won't accept the restore, a step you didn't know you'd need at 2am). This lesson cashes the claim once, on purpose, while nothing is on fire — and hands you a real number.
The professional rule is blunt: you don't have backups, you have restores. The backup job is the easy half; the recovery is the half that actually saves you, and it's the half nobody practises. A fire drill (or restore test) is a deliberate, scheduled rehearsal of recovery — done calm, timed, and torn down — so that the real thing is muscle memory, not improvisation.
A restore test is only safe if it can't touch production. Three rules make it harmless and reversible:
1. Restore to a new, unused ID — never over the original. Restoring onto
the live guest's ID would overwrite a working service with an older copy. Use a throwaway VMID
(your next free ID is 114) and destroy it afterwards.
2. Keep it off the network until you've fixed its identity — the restored
copy carries the original's static IP. Boot it as-is and two guests fight over
192.168.5.126. Change the IP (or leave it stopped) before you start it.
3. Restore to spare space — send it to nvme-storage, not the
SSD local-lvm thin pool, so the drill never pressures the pool your real guests run on.
pct restore onto an existing ID refuses to run unless you add --force, and with --force it
replaces that guest. The whole point of a fire drill is that it's a drill: pick an ID that
doesn't exist yet, and the worst case is you delete a throwaway.
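Rule 1 can be checked rather than trusted. Here is a minimal sketch, assuming you run it on the Proxmox host. vmid_is_free is a hypothetical helper (not a Proxmox command) that reads existing guest IDs on stdin and succeeds only if your candidate is not among them. pct list and qm list only show guests on the local node, so on a cluster treat this as a first check, not a guarantee.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads existing guest IDs on stdin (one per line) and
# succeeds only if the candidate ID ($1) is NOT in that list.
vmid_is_free() {
    # -x matches whole lines only, so 114 does not match 1140; -F disables regex
    ! grep -qxF -- "$1"
}

# On the Proxmox host, feed it the IDs of every local container and VM:
#   { pct list; qm list; } | awk '$1 ~ /^[0-9]+$/ {print $1}' | vmid_is_free 114 \
#       && echo "114 is free" || echo "114 is taken, pick another ID"
#
# Proxmox can also suggest the next unused ID itself:
#   pvesh get /cluster/nextid
```

Running the check immediately before the restore keeps the window small in which something else could claim the ID.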
Your restored copy of Prowlarr is about to boot. What's the danger?
You already have one archive on the NVMe — the Prowlarr test backup from Lesson 1. Let's
restore it to a throwaway container, confirm it really comes back, time it, and tear it down.
Nothing here touches the live Prowlarr (103); the drill lives and dies as ID 114.
# on the Proxmox host
ssh root@192.168.5.121

# grab the newest prowlarr archive
ARCHIVE=$(ls -t /mnt/nvme/dump/vzdump-lxc-103-*.tar.zst | head -1)
echo "$ARCHIVE"

# THE TIMED PART: restore to a throwaway ID on spare NVMe space
time pct restore 114 "$ARCHIVE" --storage nvme-storage --unprivileged 1

# give the clone a non-conflicting identity, then bring it up
pct set 114 -net0 name=eth0,bridge=vmbr0,ip=192.168.5.151/22,gw=192.168.4.1
pct start 114

# PROVE it's a real, running guest (not just files)
pct exec 114 -- systemctl is-system-running
pct exec 114 -- ls /var/lib/prowlarr   # its config came back

# tear the drill down: fully frees the space, no stale thin blocks
pct stop 114 && pct destroy 114
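One caveat with the proof step: immediately after pct start, systemctl is-system-running can still report starting, and it exits non-zero for any state other than running. The sketch below polls until the guest settles. wait_until_booted is a hypothetical helper, not part of Proxmox, and the one-second interval and the attempt count are arbitrary choices.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a status command repeatedly until it prints
# "running" or "degraded", or give up after $1 attempts (one second apart).
# "degraded" means the guest finished booting but at least one unit failed.
wait_until_booted() {
    local attempts state i
    attempts="$1"; shift
    i=0
    state=unknown
    while [ "$i" -lt "$attempts" ]; do
        # the status command exits non-zero while booting, so ignore its status
        state=$("$@" 2>/dev/null) || true
        case "$state" in
            running|degraded) echo "$state"; return 0 ;;
        esac
        i=$((i + 1))
        sleep 1
    done
    # report the last state seen so a failed drill says why it failed
    echo "${state:-unknown}"
    return 1
}

# In the drill, this would replace the bare is-system-running call:
#   wait_until_booted 60 pct exec 114 -- systemctl is-system-running
```

If it prints degraded, the restore itself worked, but check systemctl --failed inside the guest before calling the drill a pass.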
Write down the "real" figure that time pct restore printed. That, plus the minute or two
to fix the IP and boot, is your measured RTO for a single small LXC.
For the first time, you can say it out loud.
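bash prints the real figure as minutes and seconds (for example 1m23.456s), so it needs converting before you can state the RTO in seconds. Here is a minimal sketch of that arithmetic. rto_seconds is a hypothetical helper, it assumes bash's default time format with a dot as the decimal separator, and the numbers in the example are illustrative, not measurements.

```shell
#!/usr/bin/env bash
# Hypothetical helper: convert bash's "real" figure ($1, e.g. 1m23.456s) to
# seconds, add the hands-on overhead in seconds ($2, default 0: fixing the IP,
# booting), and print the total rounded to a whole second.
rto_seconds() {
    awk -v t="$1" -v extra="${2:-0}" 'BEGIN {
        n = split(t, parts, "m")      # "1m23.456s" -> "1" and "23.456s"
        if (n != 2) exit 1            # not in NmS.SSSs form
        sub(/s$/, "", parts[2])       # drop the trailing "s"
        printf "%.0f\n", parts[1] * 60 + parts[2] + extra
    }'
}

# Illustrative only: a 42.318 s restore plus 90 s of IP fixing and boot
rto_seconds 0m42.318s 90    # prints 132
```

Keeping the restore time and the overhead as separate inputs shows which half dominates your RTO: a small LXC restores quickly, so the manual steps can easily be the larger share.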
Type the restore time you measured (in seconds) to bank the number.
Primary source to read next: the Proxmox VE Backup and Restore wiki (restore section), and the vzdump admin-guide chapter for pct restore options.