I (and many others) have a problem. I am using Ubuntu 8.04 "Hardy Heron" (alpha).

xorg 1:7.3+10ubuntu6
Linux ubuntu 2.6.24-12-generic #1 SMP Mon Mar 10 15:32:00 UTC 2008 i686 GNU/Linux
MSI P35 Neo motherboard with an HP USB keyboard.

Sometimes a key gets "stuck" and keeps repeating indefinitely. It often happens when I play games, where one of the arrow keys starts repeating. Lots of people have experienced this problem, and I've seen it with other kernels too.

https://bugs.launchpad.net/ubuntu/+source/xorg/+bug/194214

It is a very annoying bug that happens to many users. Restarting Xorg solves the problem. Someone said this was a problem with dynticks, but I don't know.

Does anyone know what is causing this problem, and how to fix it? As I don't subscribe to this list, I would like to get a reply sent to my email.
--
Hi Fred, I had the same problem back in the 2.6.16 to 2.6.20 days (no dynticks). The only solution I found was to get rid of gnome. Since then I've been

Yep, try to get rid of gnome; if you don't see the problem anymore, then complain to the gnome developers. Sebastien. --
Very probably this is due to the broken way X itself implements auto-repeat, instead of using the kernel-provided auto-repeat functionality. I guess you are not able to reproduce this problem in the console, but only

AFAIK it's a problem with X being confused by scheduler behavior (which is not necessarily wrong). -- Jiri Kosina SUSE Labs --
It should be said that X implements auto-repeat out of necessity. While the kernel can report key down and up events, its further interpretation of those events is not appropriate. Many combinations of events are possible, such as keyboard plus mouse, and this precludes the kernel from providing a full interpretation. It would be wrong for it to even try. X is the proper place to implement auto-repeat for X. --
The problem is that under heavy load the auto-repeat problem is real; I've seen it as well, and it means that I've started to try to avoid "make -j4" since that's a great way to trigger it. - Ted --
On Wed, 12 Mar 2008 10:47:01 -0400 I reported a bug on this http://bugzilla.kernel.org/show_bug.cgi?id=10163 It seemed to be related to group scheduler config option. --
I'm sure it does suck under heavy load, although I suppose you could increase the start time. But I wonder if that is the problem this time? Modern machines are so damned fast they actually take real effort to load. Actually, "make -j4" doesn't sound particularly heavy. There's huge disk i/o in a make. Plenty of scheduling opportunities. Obviously I only know what everybody else here knows; but with so many recent posts suggesting a scheduling fault has been introduced, I'm expecting it to be that. --
The problem became much more apparent during the early -rc phase of 2.6.25 for those people who have CONFIG_GROUP_SCHED turned on. This clearly shows that X is somehow unhappy with how the kernel schedules it, but I have no idea how autorepeat is implemented inside X or what the problem could be right now. -- Jiri Kosina SUSE Labs --
Just for the record, this problem started with openSUSE 10.2 for me, that's a 2.6.18 thingy. I'm a heavy xterm user, where the autorepeat gets a life of its own _occasionally_. I'm able to stop it by triggering an autorepeat manually (typical antidote reaction). Pete --
I've seen these key repeats for years. All I ever had to do was to make X run heavily enough in the presence of another (hefty) load that it hits the expired array and thereby takes a serious latency hit. I always considered key repeats under load to be X's quaint way of saying "HEEEEELP MEEEE" ;-) -Mike --
I experience random repeats during heavy loads such as yum upgrade, which triggers a huge swapout, in Fedora Core 7 with Fedora's 2.6.23.14-64 on amd64, Xorg 1.3... Using USB keyboard. I'm not sure if it's the same issue or not, they don't repeat "forever" for me, it just makes my speeellllliingggg llllooookkk teerrriibbblle. Like that. Before this happens, letters usually stop appearing on screen as I'm typing. I usually stop typing at that point, since I know it will just become a mess. I haven't encountered "forever"-stuck keys since about Fedora Core 5, I don't remember kernel and X versions there, but I was using a PS/2 keyboard, and almost always it was something like ctrl-right that stuck, which meant that the computer ended up switching virtual desktops as fast as it could. Recovery involved bashing on keyboard, trying ctrl-alt-backspace, ctrl-alt-f1 and so on until something worked. It took many minutes to clear up, with everything trying to redraw their windows a few hojillion times. I hope this doesn't come back, I can cope with the bad spelling ;-) --
Hm, dunno what all is in that kernel. Huge swapout with yum upgrade shouldn't be happening I don't think, upgrades here certainly don't (I'm using SUSE's upgrade doohickey though...). I can only recommend trying

Yes, that's the symptom I was referring to. If you see that under reasonable CPU load, and _without_ major swapping going on, then I'd be suspicious of scheduler trouble. Swap can definitely keep X off the cpu for extended periods, and that seems to be what triggers the repeated keys behavior. (I've never troubleshot it, so must say _seems_) -Mike --
I have a hard time calling this kernel scheduler trouble. My understanding is that X is sometimes unhappy with how the kernel schedules it under load, and that triggers a bug in the X autorepeat code. -- Jiri Kosina SUSE Labs --
Only if X is not getting CPU for long periods without swapping would I become suspicious of the scheduler. We recently had exactly this trouble with the fair groups load balancing code (now reverted) and CPU bound loads. The bug is a symptom of latency woes, and the scheduler is just one potential source worth watching. In the heavy swapping case, it's unlikely to be scheduler trouble, and yes, the bug lies elsewhere. -Mike --
So I would like to ask if swap letting X (and everything else in my experience) out of the cpu for extended periods is considered normal behaviour, in the sense that nobody is trying to "fix" it (due to it being considered impossible to fix)...? Sorry for being off-topic, but I run a minimal Window Maker desktop in a P4 3.0 GHz with 512 MB of RAM (around 140 MB being used as per 'free'), and trying to load a 380 MB text file in xjed editor makes my whole desktop quite unfair... it takes tens of seconds to switch desktop, type things in the terminal etc. When xjed finishes loading the text file, everything comes back to "fair" again. Is there some law in the nature of computers which says that when swapping everything else waits for swap to finish its business? I hope not :-) --
I propose you start a new thread about this with proper Subject, and CC the scheduler maintainers. Thanks, -- Jiri Kosina SUSE Labs --
Right. I am sorry! But the thing I've learned from: http://lkml.org/lkml/2008/2/28/249 makes me _not_ think the scheduler is guilty by itself. Although it may appear so when first-looked upon. Now I think that the problem I was facing that day was also caused by swapping, which makes everything else wait for it to finish. So I am sorry for going off-topic here, but I couldn't resist asking Galbraith about it. Or I should just start a new thread like this? :-) Petition for Ingo writing CFSS: Completely Fair Swap Scheduler --
No, it's not normal. I'd say the VM makes bad decisions if _moderate_ swapping behaves badly. Heavy swapping is another story. (I think Rik is addressing some VM issues as we speak, so hopefully it will improve

Me too. In the past, I tested swap heavily (and beat it into submission when it misbehaved for me), but haven't tested swap performance since becoming fairly ram-wealthy. -Mike --
Yes, this is perfectly normal. A heavily swapping machine will swap out parts of X. Now, if X has a need for low latency for keyboard handling, then the X developers can use mlock to lock the X keyboard service in memory, and make it a real-time (or at least high priority) process too. This should avoid the problem even with extreme swapping and/or

Seems you use too much memory then. If xjed wastes memory (by bringing the entire file into memory

No such law, but there is badly implemented software around. If xjed is capable of delaying all X events while loading the file, for example . . .

Helge Hafting --
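The mlock-plus-priority idea above could look roughly like the following. This is only a sketch of the general technique, not anything taken from the X server: the function name lock_memory_and_go_realtime is made up for illustration, and it uses mlockall() (lock everything) where a real server would probably lock only the input-handling parts. It normally needs root or the CAP_IPC_LOCK and CAP_SYS_NICE capabilities.

```c
#include <errno.h>
#include <sched.h>
#include <sys/mman.h>

/*
 * Hypothetical helper: pin the calling process in RAM and move it to
 * the SCHED_FIFO real-time class. Returns 0 on success or a negative
 * errno value on failure.
 */
int lock_memory_and_go_realtime(int priority)
{
    struct sched_param sp = { .sched_priority = priority };

    /* Reject priorities outside the range SCHED_FIFO accepts. */
    if (priority < sched_get_priority_min(SCHED_FIFO) ||
        priority > sched_get_priority_max(SCHED_FIFO))
        return -EINVAL;

    /* Lock current and future pages so they cannot be swapped out. */
    if (mlockall(MCL_CURRENT | MCL_FUTURE) != 0)
        return -errno;

    /* A pid of 0 means "the calling process". */
    if (sched_setscheduler(0, SCHED_FIFO, &sp) != 0)
        return -errno;

    return 0;
}
```

A caller would try something like lock_memory_and_go_realtime(10) early at startup and fall back to normal scheduling if it returns an error. Note that a runaway SCHED_FIFO process can starve everything else on its CPU, which is one reason this is not done casually.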
Right, but making the mistake of not being very precise

That would be a great thing for me. But why wouldn't one want this behaviour to be default for a desktop? I mean it should be like that already for a desktop experience. I don't care if xjed takes longer to load the 380MB file while swapping something as long as I don't feel my

Well, Window Maker by itself uses around 5-10 MB of RAM. The 140 MB figure was with firefox and thunderbird opened. I tried with emacs and it simply said something like this: "Are you sure you want to open this big file?" I said 'yes' and emacs refused with "Buffer memory exceeded" or something like that. At least xjed opened the file :-) Well, I must say that 'vi' could open the file almost

Yeah, xjed uses too much memory for this, but it would be "harmless" if there were some mechanism to prevent swap (not too heavy) from starving the whole system. Why can't there be a swap scheduler for this situation? (I am sorry for being ignorant about it, there probably exists one). If more than one process is using swap, they should use it fairly. Put xjed's swap to rest for a moment, load the swap due to X, and go forward. My machine has 2 GB of swap area, both X and xjed swaps could exist simultaneously without "having to wait too long" for the other process's business with swap to finish.

Please forgive me if I am being unfair about something, I don't understand the internals of all this stuff. That's why I first asked if having swap _not_ interfering too much in other processes was impossible by some computer principle (like disks are not fast enough). But it appears that it is something related to the scheduling of what to read/write from/to swap and when. Of course that's just what I think, and I would like to learn more from knowledgeable lkml people. Maybe in trying to explain things to me, some hacker may find that something could be made better, or point me to some /sys tunable which makes my ...
Normally, memory that is used all the time does not get swapped out. If you use X while the machine is swapping, you will normally see lots of little delays, not one long freeze. So this may have been something else. This scenario seems to require only moderate swap.

There may be another explanation, one that requires more X knowledge than I have: Some windowing systems give apps the opportunity of blocking the entire windowing system while doing stuff. This is usually only used for "system modal dialogs" and for very quick operations that need to stop all other user interaction. I don't know to what extent this is possible in X, but if it is, then the door is open for badly written apps to do stupid things like load a 380M file while the user interface is blocked.

A well-designed app should do such lengthy jobs without blocking everything else, so the user can do other stuff while the machine works. Note that any io-operation _might_ be lengthy - even a 50-byte file could reside on a network file system on a server located in a different continent.

You may want to write to the xjed developers, perhaps they can do something about this. Like loading the file in the background, perhaps.

The interesting question is whether this is a swapping problem that can be solved by kernel fixing, or if it merely is a problem with the design of the X server. In the latter case you need to contact x.org developers instead. xjed developers can probably work around the problem too, although that would be unnecessary if both the kernel and the x server were ideal.

A test you can do:
* Start up X as usual and log in
* Switch to the console (ctrl+alt+F2)
* Log in on the console, with the same user as you have in the X session.
* Give the command "export DISPLAY=:0" (without the quotes)
* Start xjed with the big file, from the console. You won't see it there, it will show itself in the X session instead. Make sure you start it in the background, i.e.: "xjed bigfile & ...
As I noted in another thread (unfortunately ignored so far),
sometimes xjed opens the file and my desktop is ok, as
responsive as I expect it to be.
So this isn't an xjed bug, even though it could be better
and load parts of the file on demand (like vi appears
So it is a kernel problem!
I did this test a few times today, using today's git kernel and
it takes 3 seconds for 'ls' to appear in the screen after typing,
and 10 seconds (I checked with the clock) to be executed.
Today I think I've found a reliable way to reproduce this
bad behaviour (I've got it 5 times from 5 attempts so far).
It is independent of elevator={anticipatory, cfq, deadline},
I could get the bad results with all of them.
I just boot the machine, mount the swap partition manually (it
looks like a Mandriva Cooker bug) and start 'xjed bigfile.txt &'
in the terminal (it also happens outside of X).
It takes more than 15 minutes to load the 380MB file when I hit
the problem. It takes like 10 minutes to finish Ingo's cfs-debug
script (I can see it stop _after_ the file is loaded).
I could verify that swap usage goes to 400 MB out of 2 GB, and when
I start the test this is what I have:
             total       used       free     shared    buffers     cached
Mem:        515496     148628     366868          0       9916      80612
-/+ buffers/cache:      58100     457396
Swap:      2144636          0    2144636
In one of the tests I started 'vmstat 1 >> vmstat_log3.txt' a few seconds
before starting xjed. And this is what I got (notice that it has 400 lines,
showing that it took 400 seconds to load the file, but in fact it was
even more. Time seems to be distorted during this test)
www.ift.unesp.br/users/crmafra/vmstat_log3.txt
I could also get some info from latencytop during the test, but
it was very difficult because the terminal screen took ages to
refresh. This is the worst log I could get manually:
Cause Maximum ...
--
When it happens to me, the pointer usually updates only about 1 - 2 times per second, which seems to be enough to cause keyboard repeat problems as well. If the swapfile usage is 0 when a big swapout takes place, I usually see about 6M/sec pushed to disc and rarely pointer or keyboard anomalies. After running a few more days though, swapping rates drop to 1.3M per sec or worse. The delays in X get bigger too, sometimes several seconds long.

I guess if the kernel swaps out a piece of X that X needs, then it takes much much longer to get it back at that low swap rate, and it ends up manifesting itself as pointer jitter and keyboard repeat problems. The pointer stutter can be considered acceptable, I guess, at least there the pointer (eventually) jumps to where I expect it to go based on how much I move the mouse... I have to wonder what X is doing to break the keyboard... --
If X needs low-latency for keyboard, X is misdesigned. Kernel already provides timestamped input events... Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html --
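To make the timestamp argument concrete: if the consumer trusts the timestamps on the key-down and key-up events instead of its own clock, the number of repeats depends only on how long the key was really held, not on how late the events were processed. The sketch below illustrates that idea only. It is not how X implements autorepeat, and the function name and the delay/period parameters are assumptions made up for the example.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: how many auto-repeats a key hold should produce,
 * judged only by the event timestamps (in microseconds) and not by the
 * moment the consumer got around to reading the events.
 *
 * delay_us  - hold time before the first repeat
 * period_us - interval between subsequent repeats
 */
unsigned repeats_for_hold(int64_t down_us, int64_t up_us,
                          int64_t delay_us, int64_t period_us)
{
    assert(period_us > 0 && delay_us >= 0);

    int64_t held = up_us - down_us;

    /* Released before the initial delay elapsed: a plain key press. */
    if (held < delay_us)
        return 0;

    /* One repeat at the delay mark, then one per full period after it. */
    return (unsigned)(1 + (held - delay_us) / period_us);
}
```

With a 500 ms delay and a 50 ms period, a key held for 100 ms gives 0 repeats and a key held for 600 ms gives 3, whether the events were read immediately or after a multi-second stall.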
No. Hardware is the proper place to implement autorepeat, and along with some buffering, it has a chance to work. The kernel is not real-time, and X is definitely not real-time, while autorepeat is a real-time operation. It actually mostly works in the PS/2 case. A buffer in hardware means that pretty big interrupt delays can be tolerated without problems. -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html --
With so many people suffering from this bug (for a long time), and so many bright people here, why doesn't it get fixed? ...other operating systems don't suffer from this bug... --
X is broken here... feel free to fix it. You could also test 2.6.25-rc4, perhaps it is fixed there? Disable GROUP_SCHED. -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html pomozte zachranit klanovicky les: http://www.ujezdskystrom.info/ --
That's true. Unfortunately USB keyboards don't behave this way and there is nothing we can do about that. -- Jiri Kosina SUSE Labs --
Maybe. (Could we get the host controller to effectively timestamp USB packets for us?) ...but that is not the problem here, because X is broken even on PS/2 keyboards. USB keyboards may be misdesigned, but they are not responsible for the problems we see. Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html --
So do the keyboard events generate something like this then:

KEY_x_DOWN KEY_x_REPEAT KEY_x_UP

If so, then X certainly could get all the keyboard information I imagine it needs from the kernel, but otherwise I am not sure how it could. A repeated series of key down events is not enough, since for some keys you don't want repeats, you just want to know when the key is held down and when it isn't. I just hope someone figures it out, since I would love to stop getting duplicate characters whenever the system is under a bit of load in X. -- Len Sorensen --
Yes, kernel<->user interface is something like that. Try evtest to see
PS/2 keyboard sends both ups and downs. "down down down up" means
autorepeat. It is not actually ambiguous.
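For reference, on the Linux evdev interface (the one evtest reads) a key event is a struct input_event with type EV_KEY, and its value field is 0 for release, 1 for press and 2 for autorepeat. A minimal sketch of telling them apart follows; the enum and function names are invented for the example and are not part of any kernel or X API.

```c
#include <linux/input.h>   /* struct input_event, EV_KEY */
#include <stddef.h>

/* Hypothetical labels for what a single evdev event reports. */
enum key_act { ACT_NONE, ACT_UP, ACT_DOWN, ACT_REPEAT };

/* Classify one event as read from /dev/input/eventN. */
enum key_act classify_key_event(const struct input_event *ev)
{
    if (ev == NULL || ev->type != EV_KEY)
        return ACT_NONE;            /* not a key event at all */

    switch (ev->value) {
    case 0:  return ACT_UP;         /* key released */
    case 1:  return ACT_DOWN;       /* key pressed */
    case 2:  return ACT_REPEAT;     /* kernel-generated autorepeat */
    default: return ACT_NONE;       /* anything else: ignore */
    }
}
```

A reader loop would read() struct input_event records from the device node and feed each one to this function, so a consumer can ignore the repeat events and generate its own, or use them directly.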
BTW this is what I use to generate huge latencies and cause X
problems:
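#include <stdio.h>
#include <unistd.h>
#include <sys/io.h>	/* iopl(); x86 only */

int
main(void)
{
	int i;

	/* Raise the I/O privilege level so cli/sti are allowed; needs root. */
	if (iopl(3)) {
		perror("iopl");
		return 1;
	}
	while (1) {
		asm volatile("cli");	/* disable interrupts on this CPU */
		// for (i=0; i<20000000; i++)
		for (i=0; i<1000000000; i++)
			asm volatile("");	/* empty asm keeps the busy loop from being optimized away */
		asm volatile("sti");	/* re-enable interrupts */
		sleep(1);
	}
}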
...run it once per core.
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
--
My Dell X1 notebook had this problem HUGELY, back in about 2.6.18 or so. Under *no* load to speak of, X would lose track of a "key up" event and forever be stuck on "key down". I could still use the mouse to start another X session simultaneously, and in that alternate X things worked fine. So it was definitely an X server process issue, not a system wide kernel thing. And not a GNOME thing -- I use KDE exclusively. Problem seems to have gone away since I put 2.6.23 onto that machine. Newer kernels have broken suspend/resume there, so 2.6.23 is as high as that one gets for now. -ml --
