After the recent conversation with Roland and after more testing, I have another patch for review (although _not_ for submission, as again it's against 2.6.18.5). This patch breaks the shared utime/stime/sched_time fields out into their own structure, which is allocated as needed via alloc_percpu(). This avoids cache thrashing when running lots of threads on lots of CPUs.

Please take a look and let me know what you think. In the meantime I'll be working on a similar patch against 2.6-head that has optimizations for uniprocessor and two-CPU operation, to avoid the overhead of the percpu functions when they are unneeded.

This patch:

Replaces the utime, stime and sched_time fields in signal_struct with the shared_times structure, which is cacheline aligned and allocated when needed using the alloc_percpu() mechanism. While it is in use, there is one copy of this structure per running CPU.

Each place that loops through all threads in a thread group to sum task->utime and/or task->stime now uses the shared_*_sum() inline functions defined in sched.h to sum the per-CPU structures. This includes compat_sys_times(), do_task_stat(), do_getitimer(), sys_times() and k_getrusage().

Certain routines that used task->signal->[us]time now use the shared_*_sum() functions instead, which may (but hopefully will not) change their semantics slightly. These include fill_prstatus() (in fs/binfmt_elf.c), do_task_stat() (in fs/proc/array.c), wait_task_zombie() and do_notify_parent().

At each tick, update_cpu_clock(), account_user_time() and account_system_time() update the relevant field of the shared_times structure through a pointer obtained with per_cpu_ptr(), so these functions do not compete with one another for the cacheline. Each of these functions updates the task-private field first, then the shared_times copy if one is present.

Finally, kernel/posix-cpu-timers.c has changed quite dramatically.
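For readers less familiar with the percpu API, here is a rough user-space model of the scheme described above. It is only a sketch under stated assumptions: a fixed array of MODEL_NR_CPUS cacheline-aligned copies stands in for alloc_percpu(), indexing that array stands in for per_cpu_ptr(), and every model_* name and the simplified model_cputime_t type are hypothetical, not the patch's actual code. It shows lazy allocation, tick-time updates that touch the task-private field and then only the current CPU's shared copy, and readers summing all copies in place of the thread-group loops. It also includes a process-wide expiry test of the kind run_posix_cpu_timers() makes against the sums, with zero meaning "not armed".

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical user-space model of the per-CPU shared_times scheme.
 * MODEL_NR_CPUS stands in for the online CPUs, an aligned array stands
 * in for alloc_percpu(), and &array[cpu] stands in for per_cpu_ptr().
 */
#define MODEL_NR_CPUS 4
#define MODEL_CACHELINE 64

typedef unsigned long model_cputime_t; /* simplified; the kernel's cputime_t is arch-specific */

/* Each per-CPU copy is padded to its own cacheline (GCC/Clang attribute). */
struct shared_times {
	model_cputime_t utime;
	model_cputime_t stime;
	unsigned long long sched_time;
} __attribute__((aligned(MODEL_CACHELINE)));

struct model_signal {
	struct shared_times *shared_times; /* NULL until first needed */
};

struct model_task {
	model_cputime_t utime, stime;      /* task-private counters */
	unsigned long long sched_time;
	struct model_signal *signal;
};

/* Stand-in for alloc_percpu(): one zeroed copy per CPU, allocated lazily. */
static int model_alloc_shared_times(struct model_signal *sig)
{
	size_t size = MODEL_NR_CPUS * sizeof(struct shared_times);

	if (sig->shared_times)
		return 0;
	sig->shared_times = aligned_alloc(MODEL_CACHELINE, size);
	if (!sig->shared_times)
		return -1;
	memset(sig->shared_times, 0, size);
	return 0;
}

/* Tick accounting: task-private field first, then only this CPU's shared copy. */
static void model_account_user_time(struct model_task *p, int cpu, model_cputime_t delta)
{
	assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
	p->utime += delta;
	if (p->signal->shared_times)
		p->signal->shared_times[cpu].utime += delta;
}

static void model_account_system_time(struct model_task *p, int cpu, model_cputime_t delta)
{
	assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
	p->stime += delta;
	if (p->signal->shared_times)
		p->signal->shared_times[cpu].stime += delta;
}

/* Readers sum every CPU's copy instead of walking the thread group. */
static model_cputime_t model_shared_utime_sum(const struct model_signal *sig)
{
	model_cputime_t sum = 0;

	if (sig->shared_times)
		for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
			sum += sig->shared_times[cpu].utime;
	return sum;
}

static model_cputime_t model_shared_stime_sum(const struct model_signal *sig)
{
	model_cputime_t sum = 0;

	if (sig->shared_times)
		for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
			sum += sig->shared_times[cpu].stime;
	return sum;
}

/* Process-wide ITIMER_PROF-style test against the sums; 0 means not armed. */
static int model_prof_expired(const struct model_signal *sig, model_cputime_t expires)
{
	return expires != 0 &&
	       model_shared_utime_sum(sig) + model_shared_stime_sum(sig) >= expires;
}
```

The trade-off is the usual percpu one: writers (every tick) never share a cacheline, while readers such as times(), getrusage(), /proc stat and the timer checks pay for a loop over all CPUs. In this model, sched_time (updated by update_cpu_clock() in the patch) would follow the same pattern as utime and stime.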
First, run_posix_cpu_timers() decides whether a timer has expired by consulting the it_*_expires fields in the task struct of the running thread and the shared_*_sum() functions that cover the entire process. The check_process_timers() routine bases its computations on the shared structure, removing two loops through the threads. "Rebalancing" is no longer required: the process_timer_rebalance() routine has disappeared entirely, and the arm_timer() routine merely fills p->signal->it_*_expires from timer->it.cpu.expires.*. cpu_clock_sample_group_locked() loses its summing loops, using the shared structure instead. Finally, set_process_cpu_timer() sets tsk->signal->it_*_expires directly rather than calling the deleted rebalance routine.

The only remaining open question is whether these changes break the semantics of the status-returning routines fill_prstatus(), do_task_stat(), wait_task_zombie() and do_notify_parent().

--
Frank Mayhar <fmayhar@google.com>
Google, Inc.
