Hello fellow Ubuntu users
First and foremost, let me assure you that I've searched high and low for a solution to my problem. I don't simply jump in here to ask for help if I can avoid it.
The Machine
Core 2 Duo E4600
2GB DDR2 RAM (1 stick)
Intel ICH10R based motherboard (tried an ICH9R as well)
4-port SATA controller (PCI Sil 3114)
O/S: Ubuntu Desktop x64 10.04 LTS (using 'desktop' because I like having a remote desktop)
The Storage Setup
Disks: An assorted selection of 9 disks: 750GB, 1000GB and 1500GB Seagate and Western Digital drives.
The disks are joined through a standard LVM2 configuration. I don't know the LVM term, but normally you'd call it a JBOD setup.
On that LVM device, I've put a cryptsetup device, made with the LUKS tools (aes-xts-plain 256)
On the cryptsetup device, I've created and mounted an EXT4 partition.
All in all, a completely standard LVM2 and LUKS setup, running EXT4
The Problem
After a reboot, I proceed to unlock my cryptsetup encryption device, and then mount the EXT4 partition. All is well, the mount is accessible and everything looks fine.
I then try to send a file to the mount via Samba. After a few hundred MB have been written, the I/O wait goes berserk. It stays at 50% (dual-core setup, remember, so that is the equivalent of one core waiting full time).
The system becomes unresponsive to network commands (I can't browse Samba shares) for about 5-10 minutes. When it finally responds, the I/O wait is gone and everything is fine from then on. I can write and read hundreds of GB of data without any issues at all. I can benchmark and stress all disks perfectly fine, and no logs show disk errors.
I tried monitoring my disks with 'iostat -d 2' while the I/O wait was happening, and there is some slight Blk_read/s activity on 1 disk at a time. For example, /dev/sda first shows a little Blk_read/s activity, then it jumps to the next disk, and once every disk has shown that slight Blk_read/s activity (500-800 or so), the problem is gone and the I/O wait is no more.
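For anyone who wants to put a number on the I/O wait without eyeballing top, here is a rough Python sketch that computes the iowait percentage between two snapshots of the aggregate "cpu" line in /proc/stat. The function names (parse_cpu_line, iowait_percent, sample_iowait) are made up for illustration, and it assumes the usual Linux column order (user, nice, system, idle, iowait, irq, softirq, steal).

```python
import time


def parse_cpu_line(line):
    """Return (iowait_ticks, total_ticks) from the aggregate 'cpu' line."""
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line of /proc/stat")
    # Use only the first 8 counters; guest time is already included in user.
    ticks = [int(x) for x in fields[1:9]]
    return ticks[4], sum(ticks)  # index 4 is iowait (assumed column order)


def iowait_percent(before, after):
    """Percentage of CPU time spent in iowait between two 'cpu' lines."""
    io0, total0 = parse_cpu_line(before)
    io1, total1 = parse_cpu_line(after)
    delta_total = total1 - total0
    if delta_total <= 0:
        return 0.0
    return 100.0 * (io1 - io0) / delta_total


def sample_iowait(interval=2.0, path="/proc/stat"):
    """Take two snapshots 'interval' seconds apart and return iowait %."""
    with open(path) as f:
        before = f.readline()
    time.sleep(interval)
    with open(path) as f:
        after = f.readline()
    return iowait_percent(before, after)
```

Calling sample_iowait() in a loop while the Samba copy is stalled should show roughly the same figure as top. The value is averaged over all cores, so on a dual-core box 50% means the equivalent of one core waiting the whole time.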
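To log which disk is being read at any moment (instead of watching iostat scroll by), something along these lines might help. It is only a sketch: it diffs two snapshots of /proc/diskstats and reports sectors read per second per device, which should correspond to iostat's Blk_read/s on the assumption that one block is one 512-byte sector. The helper names and the sample figures in the usage note are invented for illustration.

```python
def parse_diskstats(text):
    """Map device name -> cumulative sectors read, from /proc/diskstats text."""
    result = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 14:
            continue  # skip blank or unexpectedly short lines
        # Assumed layout: major, minor, name, reads, reads merged, sectors read, ...
        result[fields[2]] = int(fields[5])
    return result


def read_rates(before, after, interval):
    """Sectors read per second for each device present in both snapshots."""
    old = parse_diskstats(before)
    new = parse_diskstats(after)
    return {name: (new[name] - old[name]) / interval
            for name in new if name in old}


def busiest_reader(rates):
    """Name of the device with the highest read rate, or None if empty."""
    if not rates:
        return None
    return max(rates, key=rates.get)
```

Usage would be to read /proc/diskstats, sleep for 2 seconds, read it again, and pass both strings plus the interval to read_rates(). If sda went from 1000 to 2400 sectors read over 2 seconds, the result for sda would be 700.0, in the same range as the 500-800 seen in iostat. Note that the output also includes partitions and device-mapper (dm-*) entries, not just the physical disks.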
The solution?
I've tried changing motherboards, switching disks around on the controllers, checking individual disks, replacing disks and I've tried different versions of Ubuntu. The problem however persists.
I could see it being a network issue, possibly a driver issue. But the NIC is a standard on-board RTL8111, and since this NIC is literally used everywhere, I would expect the problem to be far more widespread if the NIC or its driver were to blame. I also changed my motherboard, so a faulty NIC twice in a row seems unlikely.
I could also imagine a "self check" feature being run by either cryptsetup/LUKS or LVM, but I can't for the life of me find ANY information about such a feature anywhere.
I really hope someone in here has experienced a similar problem and has been able to solve it. This problem is the sole reason I'm seriously considering Windows for my fileserver (yes, I'm going insane).
Thank you in advance! Any help is appreciated!
Sincerely
Martin Moerch Aka. Atroxes


