Alexey Starikovskiy wrote:Thanks for your robust response. Sorry for not discussing this more in the first place. I wanted to write the code that fixes my system first, and post it as soon as possible. I think I missed a flag to say "this post is not an immediate request for submission". I confess I gave up too quickly on the synchronisation issues and resorted to brute force. Patch #1 is preventing a narrow race: - acpi_ec_write_cmd() (QUERY) - EC writes 0 (no event) to input buffer - new event arrives, triggering a GPE interrupt which is ignored because QUERY_PENDING is set. - QUERY_PENDING is cleared (but too late). Now we ignored an event, which will be delayed until the next GPE interrupt. The original reason for patch #1 was to stop patch #2 driving me insane. If you apply the second patch on it's own, you break EC_FLAGS_QUERY_PENDING. It gets cleared as soon as the _first_ query command is submitted. It works, but it means the flag no longer has a clearly defined meaning. I tried to fix that by only clearing it once the loop has finished - and then realised I was widening a pre-existing race condition. As you say, I took the easy way out and pushed it all into the workqueue. So the workqueue ends up as a buffer of GPE occurrences. I did look at reducing the memory usage while avoiding race conditions, but I couldn't find a reasonable solution. But I looked at it again now and I have a better solution: ... /* moved here from acpi_ec_transaction_unlocked() */ clear_bit(EC_FLAGS_QUERY_PENDING, &ec->flags); while (acpi_ec_query(ec, &value) == 0) acpi_ec_gpe_run_handler(ec, value); } The PENDING flag then reflects whether or not there's an item on the workqueue. Does that make sense? It's not quite redundant with the first patch. We still have GPE polling mode - patch #1 doesn't affect that. In polling mode, it's essential to query all the pending events - otherwise, if they arrive more frequently than the polling interval then you will inevitably drop events. Patch #2 is also required to fix my buggy hardware. My laptop's EC buffers multiple events, but clears SCI_EVT after every query. This caused problems in polling mode; with multiple events between polling intervals only one gets queried - and after a while the buffer overflows and it breaks completely. I assume you're referring to the "would benefit from wider testing" patch #3. Thanks for identifying the bugzilla entry - I had difficulty separating the different entries on GPEs. I'm optimistic that we can fix all these crazy buffering EC's without having to disable GPE interrupts. The reason I like my proposed fix is that it makes the code simple enough that I can understand it, without making any assumptions about how many interrupts arrive per GPE. The EC can be broken in lots of ways, so long as: 1. We receive interrupts when one or more GPE's are pending. 2. We don't get a constant interrupt storm. I don't think I missed anything. Is there anything else I should check before I try to get testing? Thanks Alan --
| James Bottomley | Re: Integration of SCST in the mainstream Linux kernel |
| Greg Kroah-Hartman | [PATCH 005/196] Chinese: add translation of SubmittingDrivers |
| majkls | sys_chroot+sys_fchdir Fix |
| Paul Mackerras | Re: [linux-pm] [PATCH] Remove process freezer from suspend to RAM pathway |
git: | |
| Jarek Poplawski | [PATCH] pkt_sched: Destroy gen estimators under rtnl_lock(). |
| David Miller | [GIT]: Networking |
| KOSAKI Motohiro | [bug?] tg3: Failed to load firmware "tigon/tg3_tso.bin" |
