Re: [PATCH 1/3] acpi: Rip out EC_FLAGS_QUERY_PENDING (prevent race condition)

!MAILaRCHIVE_VOTE_RePLACE
Previous message: [thread] [date] [author]
Next message: [thread] [date] [author]
To: Alexey Starikovskiy <astarikovskiy@...>
Cc: Henrique de Moraes Holschuh <hmh@...>, <linux-acpi@...>, <linux-kernel@...>
Date: Saturday, July 19, 2008 - 4:41 pm

Alexey Starikovskiy wrote:

Thanks for your robust response.  Sorry for not discussing this more in
the first place.  I wanted to write the code that fixes my system first,
and post it as soon as possible.  I think I missed a flag to say "this
post is not an immediate request for submission".


I confess I gave up too quickly on the synchronisation issues and
resorted to brute force.  Patch #1 is preventing a narrow race:

- acpi_ec_write_cmd() (QUERY)
- EC writes 0 (no event) to input buffer
- new event arrives, triggering a GPE interrupt which is ignored because
QUERY_PENDING is set.
- QUERY_PENDING is cleared (but too late).

Now we ignored an event, which will be delayed until the next GPE interrupt.

The original reason for patch #1 was to stop patch #2 driving me
insane.  If you apply the second patch on it's own, you break
EC_FLAGS_QUERY_PENDING.  It gets cleared as soon as the _first_ query
command is submitted.  It works, but it means the flag no longer has a
clearly defined meaning.  I tried to fix that by only clearing it once
the loop has finished - and then realised I was widening a pre-existing
race condition.

As you say, I took the easy way out and pushed it all into the
workqueue.  So the workqueue ends up as a buffer of GPE occurrences.  I
did look at reducing the memory usage while avoiding race conditions,
but I couldn't find a reasonable solution.  But I looked at it again now
and I have a better solution:

...
    /* moved here from acpi_ec_transaction_unlocked() */
    clear_bit(EC_FLAGS_QUERY_PENDING, &ec->flags);

    while (acpi_ec_query(ec, &value) == 0)
        acpi_ec_gpe_run_handler(ec, value);
}

The PENDING flag then reflects whether or not there's an item on the
workqueue.  Does that make sense?

It's not quite redundant with the first patch.  We still have GPE
polling mode - patch #1 doesn't affect that.  In polling mode, it's
essential to query all the pending events - otherwise, if they arrive
more frequently than the polling interval then you will inevitably drop
events.

Patch #2 is also required to fix my buggy hardware.  My laptop's EC
buffers multiple events, but clears SCI_EVT after every query.  This
caused problems in polling mode; with multiple events between polling
intervals only one gets queried - and after a while the buffer overflows
and it breaks completely.
I assume you're referring to the "would benefit from wider testing"
patch #3. Thanks for identifying the bugzilla entry - I had difficulty
separating the different entries on GPEs.  I'm optimistic that we can
fix all these crazy buffering EC's without having to disable GPE interrupts.

The reason I like my proposed fix is that it makes the code simple
enough that I can understand it, without making any assumptions about
how many interrupts arrive per GPE.  The EC can be broken in lots of
ways, so long as:

1. We receive interrupts when one or more GPE's are pending.
2. We don't get a constant interrupt storm.

I don't think I missed anything.  Is there anything else I should check
before I try to get testing?

Thanks
Alan
--
Previous message: [thread] [date] [author]
Next message: [thread] [date] [author]

Messages in current thread:
Re: [PATCH 1/3] acpi: Rip out EC_FLAGS_QUERY_PENDING (preven..., Alexey Starikovskiy, (Sat Jul 19, 12:59 pm)
Re: [PATCH 1/3] acpi: Rip out EC_FLAGS_QUERY_PENDING (preven..., Alan Jenkins, (Sat Jul 19, 4:41 pm)
Re: [PATCH 1/3] acpi: Rip out EC_FLAGS_QUERY_PENDING (preven..., Alexey Starikovskiy, (Sat Jul 19, 5:12 pm)