
[ubuntu] 10.04 corrupted my xfs partition?



hrodenburg
May 4th, 2010, 01:27 PM
Hi There,

Being an Ubuntu enthusiast (both professionally and at home), I upgraded my home server from 9.10 to 10.04 yesterday. It runs KVM with only one VM. Host and VM were both Ubuntu 9.10 64-bit, and I upgraded both to 10.04. So far so good. I also rebooted to load the new kernel image.

For data storage I have a large (6.35 TB) XFS partition. This partition is exported to the VM as a block device (vdb). This morning I noticed the VM had crashed. The XFS partition (mounted as /home) had been shut down.
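
For reference, the partition is handed to the guest roughly like this (just a sketch; the domain name and host device path are placeholders, my actual config lives in the libvirt domain XML):

# attach the host block device to the guest as a virtio disk named vdb
# ("myvm" and /dev/sdb1 are example names, not my real ones)
virsh attach-disk myvm /dev/sdb1 vdb
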
Is this just bad luck? Or is it 10.04?


May 4 08:00:18 fileplanet kernel: [68558.681584] ffff8800241c8000: 58 41 47 46 00 00 00 01 00 00 00 02 0f ff ff ff XAGF............
May 4 08:00:18 fileplanet kernel: [68558.754806] Filesystem "vdb": XFS internal error xfs_btree_check_sblock at line 124 of file /build/buildd/linux-2.6.32/fs/xfs/xfs_btree.c. Caller 0xffffffffa009dcc4
May 4 08:00:18 fileplanet kernel: [68558.754826]
May 4 08:00:18 fileplanet kernel: [68558.777709] Pid: 2499, comm: rm Not tainted 2.6.32-21-server #32-Ubuntu
May 4 08:00:18 fileplanet kernel: [68558.777719] Call Trace:
May 4 08:00:18 fileplanet kernel: [68558.777869] [<ffffffffa00b0313>] xfs_error_report+0x43/0x50 [xfs]
May 4 08:00:18 fileplanet kernel: [68558.777898] [<ffffffffa009dcc4>] ? xfs_btree_check_block+0x14/0x30 [xfs]
May 4 08:00:18 fileplanet kernel: [68558.777911] [<ffffffffa00b037a>] xfs_corruption_error+0x5a/0x70 [xfs]
May 4 08:00:18 fileplanet kernel: [68558.777922] [<ffffffffa009dc21>] xfs_btree_check_sblock+0x71/0x100 [xfs]
May 4 08:00:18 fileplanet kernel: [68558.777933] [<ffffffffa009dcc4>] ? xfs_btree_check_block+0x14/0x30 [xfs]
May 4 08:00:18 fileplanet kernel: [68558.777944] [<ffffffffa009dcc4>] xfs_btree_check_block+0x14/0x30 [xfs]
May 4 08:00:18 fileplanet kernel: [68558.777955] [<ffffffffa009de5d>] xfs_btree_read_buf_block+0x9d/0xc0 [xfs]
May 4 08:00:18 fileplanet kernel: [68558.777971] [<ffffffffa009e514>] xfs_btree_lookup_get_block+0x84/0xf0 [xfs]
May 4 08:00:18 fileplanet kernel: [68558.777983] [<ffffffffa009c5c2>] ? xfs_btree_rec_addr+0x12/0x20 [xfs]
May 4 08:00:18 fileplanet kernel: [68558.777994] [<ffffffffa009ebb7>] xfs_btree_lookup+0xd7/0x4a0 [xfs]
May 4 08:00:18 fileplanet kernel: [68558.778015] [<ffffffffa00d4d3a>] ? kmem_zone_zalloc+0x3a/0x50 [xfs]
May 4 08:00:18 fileplanet kernel: [68558.778025] [<ffffffffa008ad3c>] ? xfs_allocbt_init_cursor+0x4c/0xc0 [xfs]
May 4 08:00:18 fileplanet kernel: [68558.778035] [<ffffffffa0087df9>] xfs_alloc_lookup_eq+0x19/0x20 [xfs]
May 4 08:00:18 fileplanet kernel: [68558.778044] [<ffffffffa0088bca>] xfs_free_ag_extent+0x42a/0x670 [xfs]
May 4 08:00:18 fileplanet kernel: [68558.778054] [<ffffffffa008a759>] xfs_free_extent+0xb9/0xe0 [xfs]
May 4 08:00:18 fileplanet kernel: [68558.778065] [<ffffffffa00955ad>] xfs_bmap_finish+0x15d/0x1a0 [xfs]
May 4 08:00:18 fileplanet kernel: [68558.778077] [<ffffffffa00b7d41>] xfs_itruncate_finish+0x171/0x350 [xfs]
May 4 08:00:18 fileplanet kernel: [68558.778090] [<ffffffffa00d24e6>] xfs_inactive+0x346/0x4a0 [xfs]
May 4 08:00:18 fileplanet kernel: [68558.778136] [<ffffffff8155743e>] ? _spin_lock+0xe/0x20
May 4 08:00:18 fileplanet kernel: [68558.778161] [<ffffffff8117959c>] ? fsnotify_clear_marks_by_inode+0x9c/0x100
May 4 08:00:18 fileplanet kernel: [68558.778174] [<ffffffffa00df4b2>] xfs_fs_clear_inode+0x72/0x80 [xfs]
May 4 08:00:18 fileplanet kernel: [68558.778187] [<ffffffff8115b00e>] clear_inode+0x7e/0x100
May 4 08:00:18 fileplanet kernel: [68558.778194] [<ffffffff8115b7c6>] generic_delete_inode+0x196/0x1c0
May 4 08:00:18 fileplanet kernel: [68558.778200] [<ffffffff8115b855>] generic_drop_inode+0x65/0x80
May 4 08:00:18 fileplanet kernel: [68558.778206] [<ffffffff8115a1f2>] iput+0x62/0x70
May 4 08:00:18 fileplanet kernel: [68558.778213] [<ffffffff81150ca2>] do_unlinkat+0x112/0x1d0
May 4 08:00:18 fileplanet kernel: [68558.778219] [<ffffffff81154305>] ? sys_getdents+0xb5/0xf0
May 4 08:00:18 fileplanet kernel: [68558.778225] [<ffffffff81150ed2>] sys_unlinkat+0x22/0x40
May 4 08:00:18 fileplanet kernel: [68558.778252] [<ffffffff810131b2>] system_call_fastpath+0x16/0x1b
May 4 08:00:18 fileplanet kernel: [68558.778355] xfs_force_shutdown(vdb,0x8) called from line 4341 of file /build/buildd/linux-2.6.32/fs/xfs/xfs_bmap.c. Return address = 0xffffffffa00955e6
May 4 08:00:18 fileplanet kernel: [68558.780886] Filesystem "vdb": Corruption of in-memory data detected. Shutting down filesystem: vdb

After rebooting I still cannot mount this partition, neither in the VM nor on the host system. When I try, it says:

mount: Structure needs cleaning

I tried booting with an older (2.6.31) kernel, but that didn't help. Can I safely run xfs_repair without causing any more damage?
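
The sequence I have in mind, pieced together from the xfs_repair man page (the device name below is what the guest sees; on the host it would be the underlying device, and the filesystem has to be unmounted):

# 1) dry run: report problems without changing anything on disk
xfs_repair -n /dev/vdb
# 2) mount once so XFS can replay its journal, then unmount again
mount /dev/vdb /mnt && umount /mnt
# 3) the actual repair, still on the unmounted filesystem
xfs_repair /dev/vdb
# last resort only: zero the log if xfs_repair refuses to run (recent changes may be lost)
# xfs_repair -L /dev/vdb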

All disks in this system are attached to an Areca hardware RAID controller configured in RAID 5, and it has been operating for months now without any problems.

Is anyone else experiencing these problems?

Thanks in advance.

Regards,
Hugo

hrodenburg
May 5th, 2010, 08:22 AM
To follow up on my own post: I ran xfs_repair on the filesystem, which got it back up again. This morning, however, it broke down again. (Yikes!)
It seems that running rsnapshot breaks it. I use rsnapshot to back up a remote server.
rsnapshot does a lot of hard linking of files. Could this be a lead? I found a somewhat similar issue here (http://oss.sgi.com/archives/xfs/2008-08/msg00062.html), but that one is old and has probably been fixed ages ago.
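
In case someone wants to reproduce the pattern, a snapshot rotation boils down to something like this (a rough sketch of what rsnapshot does, not its actual code; the backup directory, snapshot count and remote host are made up):

cd /backup
rm -rf daily.6                                            # drop the oldest snapshot: a mass unlink of hard-linked files
for i in 5 4 3 2 1; do mv daily.$i daily.$((i+1)); done   # shift the older snapshots up one slot
cp -al daily.0 daily.1                                    # the newest snapshot becomes a tree of hard links
rsync -a --delete remote:/data/ daily.0/                  # sync changes from the remote server into daily.0

The trace in my first post shows the shutdown happened during an rm (do_unlinkat), which would match the step where the oldest snapshot gets deleted.
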
By the way, I'm still running the 2.6.31-16 kernel, but that seems to make no difference.
I've disabled rsnapshot for now, and will perhaps give it its own partition before it screws up my entire data partition.
Any thoughts? Is this really a bug? Should I file one?

Thanks for any replies.

Oh, and running xfs_repair fixed it again :)

heli@work
July 9th, 2010, 10:52 AM
Hi Hugo,
I've been struggling with the same problem for some weeks now, but it looks like it is not related to Ubuntu. Here's my setup:

kvm host machine: Debian GNU/Linux 5.0 (lenny)
kvm virtual machine: also Debian Lenny
kernel on both: 2.6.32-bpo.5-amd64
FC-HBA: Qlogic QLE2560 ISP2532-based 8Gb Fibre Channel to PCI Express

The host boots from a SAN (hardware RAID 6, FC 8 Gbps, through a SAN switch). The VM is also located on the SAN.

Additionally, the host provides a 4.5 TB RAID partition to the VM (no PCI passthrough). The VM (a backup server) uses this for backup to disk (Bacula).
The host manages the 4.5 TB device via LVM2 as one big logical volume with an XFS filesystem. It is passed to the VM as /dev/vdc.
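
Roughly, the layout was created like this (a sketch from memory; the volume group, LV and domain names are placeholders):

# one big logical volume on the 4.5 TB RAID device, formatted with XFS
pvcreate /dev/sdb
vgcreate vg_backup /dev/sdb
lvcreate -l 100%FREE -n lv_backup2disk vg_backup
mkfs.xfs /dev/vg_backup/lv_backup2disk

# the logical volume is handed to the backup VM as its third virtio disk (vdc)
virsh attach-disk backupserver /dev/vg_backup/lv_backup2disk vdc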

From time to time (three times within the last month) the backup-to-disk partition gets corrupted and, even worse, this makes the whole SAN crash (on which about 15 other machines depend).

I don't use any snapshots like you do.

Before I provide tons of useless log messages, I would really like to know whether this is an XFS problem, an FC/HBA problem, kernel 2.6.32, KVM, or something else.
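
To at least narrow it down, this is the kind of information I can collect (the mount point and device are examples; xfs_repair -n only with the filesystem unmounted):

uname -a                        # exact kernel version on host and guest
xfs_info /backup2disk           # filesystem geometry, run against the mounted filesystem
dmesg | grep -i xfs             # XFS error and shutdown messages
xfs_repair -n /dev/vdc          # read-only consistency check inside the guest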

xfs_repair fixes the filesystem, at the cost of losing some files that are 100 GB each in size (tape equivalents). This makes the backup rather useless. :^(

Helmut