Recovering from a nasty BTRFS meltdown on an AFS fileserver

A few years ago I hit a bug in Btrfs that left me completely unable to mount two Btrfs volumes containing AFS ‘vice’ partitions (partitions containing AFS file data, named and laid out as AFS ‘vnode’ structures).  It took a while, but I was able to recover the vast majority of the data with only minor corruption.  Here are my notes, in case they help someone in a similar situation.

How do I fix a broken btrfs?

If your filesystem is unmountable and btrfs restore fails, the following post covers every technique I know of for making the filesystem metadata sane again:

http://www.spinics.net/lists/linux-btrfs/msg14888.html

Note: Always use the latest version of the btrfs-progs suite from upstream.

If your filesystem is as badly damaged as mine was, and the above steps still do not let btrfs restore run, you will need to brute-force the search for a usable tree root. In the command below, findroot.clones is a list of possible roots output by btrfs-find-root, /dev/mapper/... is the corrupt filesystem, and /vicepc/tmp/ is a temporary output location for recovered files. The sed and grep stages pull the block numbers out of that list, and the loop then tries each one as the tree root.

# Extract the candidate root block numbers and try btrfs restore with each one in turn.
sed 's/^.*block \([^(]\+\).*$/\1/g' findroot.clones | grep '^[0-9]\+$' | while read -r ROOT; do
    echo "Trying root ${ROOT}" >> clones.log
    ./btrfs restore -v -i -t "${ROOT}" /dev/mapper/tr5ut-vicep--clones_corrupt /vicepc/tmp/ >> clones.log 2>&1
done
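
Before kicking off a long restore loop, it can be worth checking what the sed and grep stages actually pull out of the list. The following is only a sketch: extract_roots is a hypothetical helper name, and the sample lines are made up in the general shape of btrfs-find-root output, which differs between btrfs-progs versions. It assumes GNU sed and grep, as the command above does.

```shell
# Hypothetical helper: same sed/grep pipeline as above, reading from stdin.
# Lines shaped like "... block N(gen: ..." are reduced to N; everything else is dropped.
extract_roots() {
    sed 's/^.*block \([^(]\+\).*$/\1/g' | grep '^[0-9]\+$'
}

# Made-up sample input (an assumption, not real output from my filesystem).
roots=$(extract_roots <<'EOF'
Found tree root at 29491200 gen 10 level 1
Well block 4198400(gen: 8 level: 0) seems good, but generation/level does not match
Well block 4214784(gen: 7 level: 0) seems good, but generation/level does not match
EOF
)

# Show the block numbers that would be handed to btrfs restore -t.
printf '%s\n' "$roots"
```

Running extract_roots < findroot.clones | wc -l against your own list gives a rough idea of how many restore attempts the loop is going to make.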

How do I fix a broken AFS vice partition?

After doing this, I had a partial AFSIDat directory tree, but with no volume headers, so the volume could not be attached normally.

Some ideas and tools for dealing with similar situations can be found in this openafs-devel post:
https://lists.openafs.org/pipermail/openafs-devel/2013-August/019509.html

Some useful tools were voldump, volinfo, and cmu-dumpscan (be sure to use the ‘mtpt’ branch if you have any large files in your dump, since the ‘master’ branch can’t handle them and crashes).  You might have to mock up a volume header temporarily by creating an empty volume with the same volume id.

One problem I ran into was that the root directory vnode was missing. This was causing the salvager to ignore all the files in the tree. I attempted to rebuild the root directory vnode manually, but failed. In the end, I was able to force the salvager to attach the orphaned files with -orphans attach.

When using the demand-attach OpenAFS fileserver, the offline salvager must be used; the online salvager will not attach orphans. Invoke it like this: /usr/lib/openafs/dasalvager /vicepx 536871356 -orphans attach
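
If more than one volume on the partition needs the same treatment, a small dry-run wrapper makes it easy to review the commands before running anything. This is a sketch under assumptions: salvage_plan is a hypothetical helper, and the second volume ID is a placeholder I made up for illustration; only 536871356, /vicepx and the salvager path come from the invocation above.

```shell
# Volume IDs to salvage. Only 536871356 is from the text; 536871359 is a made-up placeholder.
VOLUMES="536871356 536871359"
PARTITION=/vicepx
SALVAGER=/usr/lib/openafs/dasalvager

# Print one salvager command per volume instead of running it,
# so the list can be checked first.
salvage_plan() {
    for VOL in $VOLUMES; do
        echo "$SALVAGER $PARTITION $VOL -orphans attach"
    done
}

plan=$(salvage_plan)
printf '%s\n' "$plan"
```

Once the printed commands look right, salvage_plan | sh runs them one after another.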
