On Wed, Aug 18, 2010 at 02:30:24PM -0400, Robin Holt wrote:
quoted text > When I investigated, I noticed that all the zone_data[] structures are
> allocated precisely at the beginning of the individual node's physical
> memory. By simply staggering based upon nodeid, I reduced the average
> down to 0.0006% of every second.
Interesting. Have you measured cache misses for both cases?
quoted text > With this patch, the max value did not change. I believe that is a
> combination of cacheline contention updating the zone's vmstat information
> combined with round_jiffies_common spattering unrelated cpus onto the same
> jiffie for their next update. I will investigate those issues seperately.
The max value probably the case where none of the data the code accesses
is actually in the cache. Cache aliasing does not help then so I would
not expect a change in the maximum amount here.
quoted text > I had no idea whether to ask
stable@kernel.org to pull this back to the
> stable releases. My reading of the stable_kernel_rules.txt criteria is
> only fuzzy as to whether this meets the "oh, that's not good" standard.
> I personally think this meets that criteria, but I am unwilling to defend
> that position too stridently. In the end, I punted and added them to
> the Cc list. We will be asking both SuSE and RedHat to add this to
> their upcoming update releases as we expect it to affect their customers.
I don't think this is stable material. It improves performance and does
not fix a bug. But I am not the one to decide this :-)
quoted text >
> arch/x86/mm/numa_64.c | 12 ++++++++++--
> 1 file changed, 10 insertions(+), 2 deletions(-)
>
> Index: round_jiffies/arch/x86/mm/numa_64.c
> ===================================================================
> --- round_jiffies.orig/arch/x86/mm/numa_64.c 2010-08-18 11:39:20.495141178 -0500
> +++ round_jiffies/arch/x86/mm/numa_64.c 2010-08-18 13:24:42.679080103 -0500
> @@ -198,6 +198,7 @@ setup_node_bootmem(int nodeid, unsigned
> unsigned long start_pfn, last_pfn, nodedata_phys;
> const int pgdat_size = roundup(sizeof(pg_data_t), PAGE_SIZE);
> int nid;
> + unsigned long cache_alias_offset;
> #ifndef CONFIG_NO_BOOTMEM
> unsigned long bootmap_start, bootmap_pages, bootmap_size;
> void *bootmap;
> @@ -221,9 +222,16 @@ setup_node_bootmem(int nodeid, unsigned
> start_pfn = start >> PAGE_SHIFT;
> last_pfn = end >> PAGE_SHIFT;
>
> - node_data[nodeid] = early_node_mem(nodeid, start, end, pgdat_size,
> + /*
> + * Allocate an extra cacheline per node to reduce cacheline
> + * aliasing when scanning all node's node_data.
> + */
> + cache_alias_offset = nodeid * SMP_CACHE_BYTES;
> + node_data[nodeid] = cache_alias_offset +
> + early_node_mem(nodeid, start, end,
> + pgdat_size + cache_alias_offset,
> SMP_CACHE_BYTES);
> - if (node_data[nodeid] == NULL)
> + if (node_data[nodeid] == (void *)cache_alias_offset)
It is cleaner to keep the NULL check here and add the cache_alias_offset
to the pointer after that check.
quoted text > return;
> nodedata_phys = __pa(node_data[nodeid]);
> reserve_early(nodedata_phys, nodedata_phys + pgdat_size, "NODE_DATA");
>
--
AMD Operating System Research Center
Advanced Micro Devices GmbH Einsteinring 24 85609 Dornach
General Managers: Alberto Bozzo, Andrew Bowd
Registration: Dornach, Landkr. Muenchen; Registerger. Muenchen, HRB Nr. 43632
--
unsubscribe notice To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to
majordomo@vger.kernel.org
More majordomo info at
http://vger.kernel.org/majordomo-info.html
Please read the FAQ at
http://www.tux.org/lkml/
Messages in current thread:
Re: [Patch] numa:x86_64: Cacheline aliasing makes for_each ... , Roedel, Joerg , (Thu Aug 19, 10:30 am)