> On Mon, Apr 14, 2008 at 07:37:20PM +0200, Jens Axboe wrote:
> > On Mon, Apr 14 2008, scameron@beardog.cca.cpqcorp.net wrote:
> > >
> > >
> > > > On Mon, Apr 14 2008, scameron@beardog.cca.cpqcorp.net wrote:
> > > > >
> > > > >
> > > > > Fix race condition between cciss_init_one(), cciss_update_drive_info(),
> > > > > and cciss_check_queues(). cciss_softirq_done would try to start
> > > > > queues which were not quite ready to be started, as its checks for
> > > > > readiness were not sufficiently synchronized with the queue initializing
> > > > > > code in cciss_init_one and cciss_update_drive_info. A slow CPU and
> > > > > > a large number of logical drives seem to make the race more likely
> > > > > > to cause a problem.
> > > >
> > > > Hmm, this seems backwards to me. cciss_softirq_done() isn't going to
> > > > start the queues, until an irq has triggered for instance. Why isn't the
> > > > init properly ordered instead of band-aiding around this with a
> > > > 'queue_ready' variable?
> > > >
> > >
> > > Each call to add_disk() will trigger some interrupts,
> > > and earlier added disks may cause the queues of later,
> > > not-yet-completely added disks to be started.
> > >
> > > I suppose the init routine might be reorganized to initialize all
> > > the queues, then have a second loop call add_disk() for all
> > > of them. Is that what you had in mind by "properly ordered"?
> >
> > Yep precisely, don't call add_disk() until everything is set up.
> >
> > > Disks may be added at run time though, and I think this tears
> > > down all but the first disk, and re-adds them all, if I remember
> > > right, so there is some complication there to think about.
> >
> > Well, other drivers manage quite fine without resorting to work-arounds
> > :-)
>
> Ok. Thanks for the constructive criticism. I'll rethink it.
>
> Fortunately (or unfortunately), the race is apparently pretty hard
> to trigger. It's been in there for ages, and we've only just seen it
> manifest as a problem recently, and only in one particular configuration.
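
For reference, the ordering discussed above (set up every queue first,
then call add_disk() in a second loop) can be sketched as a toy
user-space model. This is not cciss code: every name here (toy_ctlr,
toy_drive, toy_publish, toy_completion, init_interleaved,
init_two_phase) is hypothetical, the "completion" is a plain function
call rather than a real softirq, and the model assumes, as described
in the thread, that a completion for one disk tries to start the
queues of all disks on the controller.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define TOY_MAX_DRIVES 16

/* Hypothetical stand-in for per-logical-drive state. */
struct toy_drive {
	bool queue_initialized;	/* queue is fully set up */
	bool published;		/* toy analogue of add_disk() having run */
};

/* Hypothetical stand-in for per-controller state. */
struct toy_ctlr {
	struct toy_drive drv[TOY_MAX_DRIVES];
	size_t ndrives;
	size_t bad_starts;	/* attempts to start a not-yet-ready queue */
};

/* Toy completion path: tries to start every queue on the controller. */
void toy_completion(struct toy_ctlr *h)
{
	for (size_t i = 0; i < h->ndrives; i++)
		if (!h->drv[i].queue_initialized)
			h->bad_starts++;
}

/* Toy analogue of add_disk(): publishing a disk generates I/O, and
 * therefore a completion, before the caller moves on. */
void toy_publish(struct toy_ctlr *h, size_t i)
{
	h->drv[i].published = true;
	toy_completion(h);
}

void toy_reset(struct toy_ctlr *h, size_t n)
{
	assert(n <= TOY_MAX_DRIVES);
	memset(h, 0, sizeof(*h));
	h->ndrives = n;
}

/* Interleaved shape: init one queue, publish it, move to the next.
 * Returns the number of bad start attempts observed. */
size_t init_interleaved(struct toy_ctlr *h, size_t n)
{
	toy_reset(h, n);
	for (size_t i = 0; i < n; i++) {
		h->drv[i].queue_initialized = true;
		toy_publish(h, i);
	}
	return h->bad_starts;
}

/* Two-phase shape: init every queue, then publish in a second loop. */
size_t init_two_phase(struct toy_ctlr *h, size_t n)
{
	toy_reset(h, n);
	for (size_t i = 0; i < n; i++)
		h->drv[i].queue_initialized = true;
	for (size_t i = 0; i < n; i++)
		toy_publish(h, i);
	return h->bad_starts;
}
```

With four drives, the interleaved version records 6 bad start attempts
(3 + 2 + 1 + 0) and the two-phase version records none. The model says
nothing about the run-time add/remove path mentioned above, which
would need the same treatment.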