> On Tue, Dec 11, 2007 at 04:16:32PM -0800, Ben Woodard wrote:
> > We may need to go back and do some additional work on this. It doesn't
> > seem to be quite as cut and dried as we initially thought.
> >
> > This quirk doesn't appear to work on virtually the same motherboard with
> > Barcelona processors in it. It may also be sensitive to the firmware
> > version. More extensive testing on a larger number of pre-production
> > systems shows it to be less effective than it initially appeared on the
> > testbed.
> >
> > I'm doing some retesting to figure out what exact situations and
> > collection of patches were able to make it work before.
> >
> Ben, please, let's be clear about this. You say this patch doesn't help on a
> new system. Even though it's almost exactly the same system, it's not the same
> system. Does this patch work consistently on the system you originally
> reported the problem on? I've put enough work into this at this point that I'm
> invested in not abandoning this fix. If it solves the problem on dual-core
> systems but not quad-core, I'd much rather move forward with this fix and
> address your quad-core problem as a separate issue.
>
> Neil
>
>
> > -ben
> >
> >
> >
> > Neil Horman wrote:
> > > Recently a kdump bug was discovered in which a system would hang inside
> > > calibrate_delay while booting the kdump kernel. This was caused by the
> > > jiffies counter not being incremented during timer calibration. The root
> > > cause of this problem was found to be a BIOS misconfiguration of the
> > > HyperTransport bus. On systems affected by this hang, the BIOS had assigned
> > > APIC IDs that use the extended APIC ID bits (more than the nominal 4-bit
> > > IDs), but failed to set bit 17 of the HyperTransport transaction config
> > > register, which controls the mask for the destination field of interrupt
> > > packets across the HT bus (see section 3.3.9 of
> > > http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/26094.PDF).
> > > If a crash occurs on a CPU whose APIC ID extends beyond 4 bits, that CPU
> > > will not receive interrupts during the kdump kernel boot, and this hang will
> > > be the result. The fix is this patch, which adds an early PCI quirk check to
> > > forcibly enable this bit in the httcfg register. This enables all CPUs in
> > > the system to receive interrupts and allows the kdump kernel boot to proceed
> > > normally.
> > >
> > > Regards
> > > Neil
> > >
> > >
> > > Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
> > >
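For reference, the register fix-up Neil describes boils down to a read-modify-write of one bit. The sketch below models only that decision as a pure function on a simulated 32-bit register value; it does not touch real PCI config space. The helper names (`htcfg_fixup`, `apic_id_is_extended`) and the `uses_ext_apic_ids` input are hypothetical stand-ins for however the actual quirk detects the extended-ID case, and the bit position (17) is the only detail taken from the patch description.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit 17 of the HT transaction config (httcfg) register, per the patch text. */
#define HTCFG_BIT17 (UINT32_C(1) << 17)

/*
 * Hypothetical helper: given the current httcfg value, return the value the
 * quirk would write back. uses_ext_apic_ids is an assumed input standing in
 * for the "BIOS assigned APIC IDs wider than 4 bits" condition.
 */
static uint32_t htcfg_fixup(uint32_t htcfg, bool uses_ext_apic_ids)
{
    /* Only act when extended IDs are in use and the BIOS left the bit clear. */
    if (uses_ext_apic_ids && !(htcfg & HTCFG_BIT17))
        htcfg |= HTCFG_BIT17; /* force the bit the BIOS failed to set */
    return htcfg;
}

/* Hypothetical: does this APIC ID need more than the nominal 4 bits? */
static bool apic_id_is_extended(unsigned int apic_id)
{
    return apic_id > 0xF;
}
```

In a real early quirk, the value would come from and go back to config space through the kernel's early PCI accessors; keeping the decision in a pure function like this just makes the "set bit 17 only when needed" logic easy to check in isolation.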