22 May 2009

E2fsck hanging at 70% on Large Partition

We had a 1 TB partition that started giving problems.

We ran fsck, but it got stuck at 70% after about 4-5 hours.

The version on our system was e2fsprogs-1.35-7.1.
We hit the issue on Friday, 13-Feb-2009 [Friday the 13th :)]

On researching further, it seemed to be due to a bug in e2fsprogs.
We finally concluded that the bug was a floating-point precision error which can cause e2fsck to loop forever on really big filesystems with a large inode count.

Searching with our friend Google, we found a bug report.
Read more about it at:
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=411838
http://www.linuxquestions.org/questions/linux-software-2/e2fsck-is-running-for-3-days-checking-800-gb-ext3-lvm-volume-411715/

We prepared a Patch to address the issue. Here is the patch:

--- e2fsprogs-1.35/lib/ext2fs/icount.c 2003-12-07 22:41:38.000000000 +0530
+++ e2fsprogs-1.35/lib/ext2fs/icount.c.new 2009-03-16 17:39:11.000000000 +0530
@@ -251,6 +251,10 @@
 				range = ((float) (ino - lowval)) /
 					(highval - lowval);
 			mid = low + ((int) (range * (high-low)));
+			if (mid > high)
+				mid = high;
+			if (mid < low)
+				mid = low;
 		}
 #endif
 		if (ino == icount->list[mid].ino) {

We applied the patch against the source RPM and rebuilt the RPM to get a patched e2fsck.
Then we ran fsck again on the partition and waited for about 4-6 hours. Wow, it completed successfully. Here are the build steps, followed by the fsck output:

rpm -ivh e2fsprogs-1.35-7.1.src.rpm


Copied our patch "e2fsprogs-1.35-icount-floating-point-precision.patch" to /usr/src/redhat/SOURCES


vi /usr/src/redhat/SPECS/e2fsprogs.spec


Added these two lines to the spec file:
Line 15: Patch8: e2fsprogs-1.35-icount-floating-point-precision.patch
Line 54: %patch8 -p1 -b .icount-float


cd /usr/src/redhat/SPECS/
rpmbuild -ba e2fsprogs.spec


...
...
...
Wrote: /usr/src/redhat/SRPMS/e2fsprogs-1.35-7.1.src.rpm
Wrote: /usr/src/redhat/RPMS/x86_64/e2fsprogs-1.35-7.1.x86_64.rpm
Wrote: /usr/src/redhat/RPMS/x86_64/e2fsprogs-devel-1.35-7.1.x86_64.rpm
Wrote: /usr/src/redhat/RPMS/x86_64/e2fsprogs-debuginfo-1.35-7.1.x86_64.rpm


Next, we installed the rebuilt e2fsprogs-1.35-7.1.x86_64.rpm with rpm -Uvh and ran the check again:


[root@server root]# e2fsck -C 0 /dev/cciss/c1d0p1
e2fsck 1.35 (28-Feb-2004)
/dev/cciss/c1d0p1 was not cleanly unmounted, check forced.
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Unattached inode 101949715
Connect to /lost+found? yes

Inode 101949715 ref count is 65535, should be 1. Fix? yes

Unattached zero-length inode 101949716. Clear? yes

Unattached inode 101949717
Connect to /lost+found? yes

Inode 101949717 ref count is 65535, should be 1. Fix? yes

Unattached inode 101949718
Connect to /lost+found? yes

Inode 101949718 ref count is 65535, should be 1. Fix? yes

Unattached inode 101949719
Connect to /lost+found? yes

Inode 101949719 ref count is 65535, should be 1. Fix? yes

Unattached inode 101949720
Connect to /lost+found? yes

Inode 101949720 ref count is 65535, should be 1. Fix? yes

Unattached inode 101949721
Connect to /lost+found? yes

Inode 101949721 ref count is 65535, should be 1. Fix? yes

Pass 5: Checking group summary information
Block bitmap differences: -(204107626--204107632) -(204107680--204107685) -204107757
Fix? yes

Free blocks count wrong for group #6228 (5, counted=19).
Fix? yes

Free blocks count wrong (54192711, counted=54192725).
Fix? yes

Inode bitmap differences: -(101949713--101949714)
Fix? yes

Free inodes count wrong for group #6222 (7057, counted=7059).
Fix? yes

Free inodes count wrong (78687902, counted=78687904).
Fix? yes


/dev/cciss/c1d0p1: ***** FILE SYSTEM WAS MODIFIED *****
/dev/cciss/c1d0p1: 46764384/125452288 files (0.-1% non-contiguous), 196710437/250903162 blocks
[root@server root]#

Maybe this helps someone else too.