
Proxmox ZFS replication to a cold offsite node over Tailscale

Turning a mini-PC at my parents' house into a cold standby for the homelab. ZFS send/receive over Tailscale, orchestrated with a boring shell script rather than a Proxmox replication job.

My homelab has been a single-node Proxmox box for too long. A power surge last summer took it offline for three days, and while nothing was lost, I realized the only copy of every family photo I have digitized is on that ZFS pool. Time to fix that.

The plan: a fanless mini-PC at my parents’ house, on Tailscale, pulling ZFS snapshots nightly. It does not run any VMs — it is just a cold target. If the primary dies, I restore to new hardware from the offsite pool.

The catch is that Proxmox's built-in replication expects both nodes to be in the same cluster. I do not want a two-node cluster stretched across a consumer internet link, so I drive replication myself with a shell script and a zfs send | ssh | zfs receive pipeline.

/usr/local/bin/zfs-replicate-offsite.sh
#!/usr/bin/env bash
set -euo pipefail

DATASETS=("rpool/data/vm-100-disk-0" "rpool/data/vm-101-disk-0" "rpool/photos")
REMOTE_HOST="offsite.tailnet"
REMOTE_POOL="backup"
SNAPSHOT_TAG="offsite-$(date +%Y%m%d-%H%M)"

for ds in "${DATASETS[@]}"; do
    echo "==> Snapshotting $ds@$SNAPSHOT_TAG"
    zfs snapshot "$ds@$SNAPSHOT_TAG"

    # Find the most recent snapshot on the receiver. zfs send -i accepts a
    # bare @suffix and resolves it against the local dataset, so strip
    # everything up to the @ (the receiver-side dataset path does not exist
    # locally and would make the incremental send fail).
    LATEST_REMOTE=$(ssh "$REMOTE_HOST" \
        "zfs list -H -t snapshot -o name -s creation ${REMOTE_POOL}/${ds#*/} 2>/dev/null | tail -n1" \
        | sed -n 's/.*@/@/p' || true)

    if [[ -n "$LATEST_REMOTE" ]]; then
        echo "==> Incremental from $LATEST_REMOTE"
        zfs send -i "$LATEST_REMOTE" "$ds@$SNAPSHOT_TAG" \
            | ssh "$REMOTE_HOST" "zfs receive -F ${REMOTE_POOL}/${ds#*/}"
    else
        echo "==> Full send (first run)"
        zfs send "$ds@$SNAPSHOT_TAG" \
            | ssh "$REMOTE_HOST" "zfs receive -F ${REMOTE_POOL}/${ds#*/}"
    fi
done

# Prune local snapshots older than 14 days (offsite keeps its own retention).
# The 14-day window comfortably covers the nightly cadence, so the incremental
# base snapshot always still exists locally.
CUTOFF=$(date -d '14 days ago' +%s)
zfs list -H -t snapshot -o name | { grep '@offsite-' || true; } | while read -r snap; do
    SNAP_DATE=$(echo "$snap" | sed -n 's/.*@offsite-\([0-9]\{8\}\).*/\1/p')
    if [[ -n "$SNAP_DATE" && $(date -d "$SNAP_DATE" +%s) -lt "$CUTOFF" ]]; then
        zfs destroy "$snap"
    fi
done
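One subtle point in the incremental branch: zfs send -i does not want the receiver-side dataset path at all. Given the snapshot-to-send as its second argument, it resolves a bare @suffix against that local dataset, so the remote listing only needs trimming down to the suffix. In isolation (the snapshot name below is made up):

```shell
# A remote listing returns the receiver-side full name; zfs send -i only
# needs the @suffix, which it resolves against the local dataset.
remote_snap="backup/data/vm-100-disk-0@offsite-20240601-0300"
short="@${remote_snap#*@}"
echo "$short"   # @offsite-20240601-0300
```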

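For the nightly schedule, plain cron is enough. A hypothetical entry (the 02:30 slot and log path are my choices, not anything load-bearing):

```shell
# /etc/cron.d/zfs-replicate-offsite -- run the replication nightly at 02:30
# and keep a log so failed sends are visible after the fact.
30 2 * * * root /usr/local/bin/zfs-replicate-offsite.sh >> /var/log/zfs-replicate-offsite.log 2>&1
```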
Lessons from the first month of running this:

  • Tailscale MSS clamping matters. My consumer cable upload started fragmenting on long ZFS streams until I set tailscale up --accept-dns=false --advertise-exit-node=false with MSS clamping in the tailnet ACL. Throughput tripled.
  • zfs receive -F is fine here but be aware it destroys any receiver-side snapshot newer than the source. On a cold node that only ever receives, this is what you want. On a hot node you are failing over to, it is a foot-gun.
  • The first full send of my photos dataset was 1.8 TB and ran for four days. Budget for that. Subsequent nightly incrementals run in under ten minutes.
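If you want to sanity-check an incremental before it runs, zfs send can estimate the stream size without moving data: -n does a dry run and -v prints the estimate. The snapshot names below are placeholders for whatever your last two nightly tags happen to be:

```shell
# Dry-run estimate of an incremental stream; -n sends nothing, -v reports
# the estimated size. Useful before a night when a VM churned a lot of data.
zfs send -n -v -i @offsite-20240601-0300 rpool/photos@offsite-20240602-0300
```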

Next step: a monthly scrub on the offsite node with paging to my phone if it reports errors. That is this weekend’s project.
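A minimal sketch of what that scrub-and-page job might look like, assuming some HTTP notification endpoint (the pool name, poll interval, and ALERT_URL are all placeholders): kick off the scrub, wait for it to finish, then lean on zpool status -x, which reports only unhealthy pools.

```shell
#!/usr/bin/env bash
# Hypothetical monthly scrub check for the offsite node. ALERT_URL is a
# stand-in for whatever push service you use; nothing here is from the
# original setup beyond the pool name.
set -euo pipefail
POOL="backup"
ALERT_URL="https://example.com/notify"

zpool scrub "$POOL"                      # returns immediately; scrub runs async
while zpool status "$POOL" | grep -q 'scrub in progress'; do
    sleep 600                            # poll every 10 minutes
done

STATUS=$(zpool status -x "$POOL")        # prints "... is healthy" when clean
if [[ "$STATUS" != *"is healthy"* ]]; then
    curl -fsS -d "offsite scrub found errors: $STATUS" "$ALERT_URL"
fi
```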