Re: [RFC][PATCH] md: avoid fullsync if a faulty member missed a dirty transition

To: Mike Snitzer <snitzer@...>
Cc: <linux-raid@...>, <linux-kernel@...>, <paul.clements@...>
Date: Tuesday, May 6, 2008 - 2:53 am

I can't help thinking that you are misinterpreting something.  I don't
think there is a clean->dirty transition happening here.
You could confirm this by using --examine on both devices after the
messy shutdown and before re-assembling the array.

Even allowing for that possible confusion, I cannot quite see what is
going on.
It is fairly clear from the event counts that the NBD device is marked
clean, but if this is happening at array-shutdown time, I cannot see
why md would try to write to the NBD device and thereby detect an
error...

Do you have an internal bitmap or a bitmap in an external file?

In general, I would not like to make decisions based on the
oddness/evenness of the event counter.  I consider that to be an
internal implementation detail.  I am happy to make decisions based on
a difference-of-1.  I need to understand the big picture first though.
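
To make the distinction concrete, here is a rough sketch (Python, not
kernel code) of the two kinds of test. The function names are
hypothetical, and the "even means clean, odd means dirty" reading is
only the assumption the patch under discussion relies on.

```python
def differs_by_one(member_events: int, reference_events: int) -> bool:
    """Difference-of-1 test: the member is exactly one event behind."""
    return reference_events - member_events == 1


def parity_says_clean(events: int) -> bool:
    """Parity test. Assumption: even = clean, odd = dirty.

    This leans on how the counter happens to be maintained, which is
    the implementation detail being objected to above.
    """
    return events % 2 == 0
```

The first test depends only on the relationship between two counters;
the second depends on the internal meaning of a single counter's value.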

NeilBrown
--
To: Neil Brown <neilb@...>
Cc: <linux-raid@...>, <linux-kernel@...>, <paul.clements@...>
Date: Tuesday, May 6, 2008 - 7:58 am

Hi Neil,

I definitely could be misinterpreting something.  However, I did
determine that if the write-mostly NBD member of the raid1 becomes
degraded while writing to the raid1, it frequently has an 'events' count
that is one less than the 'events_cleared' (of the local raid1 member that
the array gets reassembled with first).  The events indicate the NBD
member is clean and the local member is dirty.

I'm using internal bitmaps.  I've focused on the even->odd
(clean->dirty) transition to rationalize the safety of allowing the
NBD member to be off by one _and_ clean.  That could easily be
superficial but it seems significant.

It looks like bitmap_update_sb()'s incrementing of events_cleared (on
behalf of the local member) could be racing with the fact that the NBD
member becomes faulty (thereby making the array degraded).  This
allows the events_cleared to reflect that a clean->dirty transition last
occurred before the array became degraded.  My reasoning is: if it was
a clean->dirty transition, the bitmap still has the associated dirty
bit set in the local member's bitmap, so using the bitmap to resync is
valid.
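
As a minimal sketch of the check I am arguing for (Python rather than
kernel C; the function name and the allow_missed_dirty_transition flag
are hypothetical, and the baseline rule that a member whose 'events' is
below 'events_cleared' gets a full sync is my assumption about the
current behaviour, not a quote of md's code):

```python
def can_use_bitmap_resync(member_events: int,
                          events_cleared: int,
                          allow_missed_dirty_transition: bool = False) -> bool:
    """Return True if a bitmap-based resync looks safe for this member."""
    # Assumed baseline: the member has seen every event up to the point
    # where bits were last cleared, so the bitmap covers what it missed.
    if member_events >= events_cleared:
        return True

    # Proposed relaxation: the member is exactly one event behind AND its
    # own count is even (assumed to mean "clean"), i.e. it only missed
    # the clean->dirty transition, whose dirty bit is still set.
    if allow_missed_dirty_transition:
        off_by_one = events_cleared - member_events == 1
        member_clean = member_events % 2 == 0
        return off_by_one and member_clean

    # Otherwise fall back to a full sync.
    return False
```

With the flag off, a member at events 42 against events_cleared 43 falls
back to a full sync; with it on, the same member qualifies for a bitmap
resync, while a member two or more events behind still does not.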

thanks,
Mike
--