Mein Haus, meine Straße, mein Blob

Sep 13, 2012

Fixing a 'connection reset by peer' error in bacula

This is probably a very setup-specific bug, but since it took me quite a while to figure out, I thought I'd blog about it anyway. The problem was one specific bacula client, connected to both director and storage daemon via VPN, that would not complete any backup larger than a few megabytes. The logs didn't show anything beyond the not very helpful 'Connection reset by peer' message. Strangely enough, the files were copied just fine, but the director considered the backup failed afterwards anyway.

What (probably) happened

The VPN tunnel, while not unstable as such, seems to drop idle connections after a while. The data connection is active all the time, so the files are copied without problems. The control connection, however, sits idle during the copy and gets killed in the meantime, which leads to the problem described above when bacula tries to update the database after finishing the file copy process.

How to fix it

The fix, it turns out, is trivial once you know why the problem occurs. Bacula has a Heartbeat Interval directive for director, file daemon and storage daemon. Activating a 30-second heartbeat for both the affected file daemon and the storage daemon did the trick.
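
For reference, the change could look roughly like the sketch below. The resource names (client-fd, backup-sd) are placeholders and not taken from my actual setup, and the file names assume the usual bacula-fd.conf / bacula-sd.conf layout. Only the Heartbeat Interval line is new; everything else in each resource stays as it is.

```
# bacula-fd.conf on the affected client (excerpt; name is a placeholder)
FileDaemon {
  Name = client-fd
  # ... existing directives unchanged ...
  Heartbeat Interval = 30 seconds
}

# bacula-sd.conf on the storage daemon (excerpt; name is a placeholder)
Storage {
  Name = backup-sd
  # ... existing directives unchanged ...
  Heartbeat Interval = 30 seconds
}
```

Both daemons need a restart to pick up the change. The interval just has to be shorter than whatever idle timeout the VPN applies; 30 seconds was enough here, but that value may well differ for other setups.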