On Thu, 8 Nov 2007, BERTRAND Joël wrote:
> BERTRAND Joël wrote:
>> Chuck Ebbert wrote:
>>> On 11/05/2007 03:36 AM, BERTRAND Joël wrote:
>>>> Neil Brown wrote:
>>>>> On Sunday November 4, jpiszcz@lucidpixels.com wrote:
>>>>>> # ps auxww | grep D
>>>>>> USER  PID %CPU %MEM VSZ RSS TTY STAT START  TIME COMMAND
>>>>>> root  273  0.0  0.0   0   0 ?   D    Oct21 14:40 [pdflush]
>>>>>> root  274  0.0  0.0   0   0 ?   D    Oct21 13:00 [pdflush]
>>>>>>
>>>>>> After several days/weeks, this is the second time this has happened,
>>>>>> while doing regular file I/O (decompressing a file), everything on
>>>>>> the device went into D-state.
>>>>> At a guess (I haven't looked closely) I'd say it is the bug that was
>>>>> meant to be fixed by
>>>>>
>>>>> commit 4ae3f847e49e3787eca91bced31f8fd328d50496
>>>>>
>>>>> except that patch applied badly and needed to be fixed with
>>>>> the following patch (not in git yet).
>>>>> These have been sent to stable@ and should be in the queue for 2.6.23.2
>>>> My linux-2.6.23/drivers/md/raid5.c has contained your patch for a long
>>>> time:
>>>>
>>>> ...
>>>> 	spin_lock(&sh->lock);
>>>> 	clear_bit(STRIPE_HANDLE, &sh->state);
>>>> 	clear_bit(STRIPE_DELAYED, &sh->state);
>>>>
>>>> 	s.syncing = test_bit(STRIPE_SYNCING, &sh->state);
>>>> 	s.expanding = test_bit(STRIPE_EXPAND_SOURCE, &sh->state);
>>>> 	s.expanded = test_bit(STRIPE_EXPAND_READY, &sh->state);
>>>> 	/* Now to look around and see what can be done */
>>>>
>>>> 	/* clean-up completed biofill operations */
>>>> 	if (test_bit(STRIPE_OP_BIOFILL, &sh->ops.complete)) {
>>>> 		clear_bit(STRIPE_OP_BIOFILL, &sh->ops.pending);
>>>> 		clear_bit(STRIPE_OP_BIOFILL, &sh->ops.ack);
>>>> 		clear_bit(STRIPE_OP_BIOFILL, &sh->ops.complete);
>>>> 	}
>>>>
>>>> 	rcu_read_lock();
>>>> 	for (i=disks; i--; ) {
>>>> 		mdk_rdev_t *rdev;
>>>> 		struct r5dev *dev = &sh->dev[i];
>>>> ...
>>>>
>>>> but it doesn't fix this bug.
>>>>
>>>
>>> Did that chunk starting with "clean-up completed biofill operations" end
>>> up where it belongs? The patch with the big context moves it to a
>>> different
>>> place from where the original one puts it when applied to 2.6.23...
>>>
>>> Lately I've seen several problems where the context isn't enough to make
>>> a patch apply properly when some offsets have changed. In some cases a
>>> patch won't apply at all because two nearly-identical areas are being
>>> changed and the first chunk gets applied where the second one should,
>>> leaving nowhere for the second chunk to apply.
>>
>> I always apply this kind of patch by hand, not with the patch command.
>> The last patch sent here seems to fix this bug:
>>
>> gershwin:[/usr/scripts] > cat /proc/mdstat
>> Personalities : [raid1] [raid6] [raid5] [raid4]
>> md7 : active raid1 sdi1[2] md_d0p1[0]
>> 1464725632 blocks [2/1] [U_]
>> [=====>...............] recovery = 27.1% (396992504/1464725632)
>> finish=1040.3min speed=17104K/sec
>
> Resync done. The patch fixes this bug.
>
> Regards,
>
> JKB
>
Excellent!
I cannot easily reproduce the bug on my system, so I will wait for the
next stable patch set to include it and let everyone know if it happens
again. Thanks.