On Thu, Apr 10, 2008 at 04:59:15PM -0700, Nish Aravamudan wrote:
> Hi Nick,
>
> On 4/10/08, npiggin@suse.de <npiggin@suse.de> wrote:
> > Hi,
> >
> > I'm taking care of Andi's hugetlb patchset now. I've taken a while to appear
> > to do anything with it because I have had other things to do and also needed
> > some time to get up to speed on it.
> >
> > Anyway, from my reviewing of the patchset, I didn't find a great deal
> > wrong with it in the technical aspects. Taking hstate out of the hugetlbfs
> > inode and vma is really the main thing I did.
>
> Have you tested with the libhugetlbfs test suite? We're gearing up for
> libhugetlbfs 1.3, so most of the tests are up to date and expected to run
> cleanly, even with giant hugetlb page support (Jon has been working
> diligently to test with his 16G page support for power). I'm planning
> on pushing the last bits out today for Adam to pick up before we start
> stabilizing for 1.3, so I'm hoping if you grab tomorrow's development
> snapshot from libhugetlbfs.ozlabs.org, things should run ok. Probably
> only with just 1G hugepages, though, we haven't yet taught
> libhugetlbfs about multiple hugepage size availability at run-time,
> but that shouldn't be hard.

Yeah, it should be easy to disable the 2MB default and just make it
look exactly the same but with 1G pages.

Thanks a lot for your suggestion, I'll pull the snapshot over the
weekend and try to make it pass on x86 and work with Jon to ensure it
is working with powerpc...

> > However on the less technical side, I think a few things could be improved,
> > eg. to do with the configuring and reporting, as well as the "administrative"
> > type of code. I tried to make improvements to things in the last patch of
> > the series. I will end up folding this properly into the rest of the patchset
> > where possible.
>
> I've got a few ideas here. Are we sure that
> /proc/sys/vm/nr_{,overcommit}_hugepages is the pool allocation
> interface we want going forward? I'm fairly sure we don't. I think
> we're best off moving to a sysfs-based allocator scheme, while keeping
> /proc/sys/vm/nr_{,overcommit}_hugepages around for the default
> hugepage size (which may be the only one for many folks for now).
>
> I'm thinking something like:
>
> /sys/devices/system/[DIRNAME]/nr_hugepages ->
> nr_hugepages_{default_hugepagesize}
> /sys/devices/system/[DIRNAME]/nr_hugepages_default_hugepagesize
> /sys/devices/system/[DIRNAME]/nr_hugepages_other_hugepagesize1
> /sys/devices/system/[DIRNAME]/nr_hugepages_other_hugepagesize2
> /sys/devices/system/[DIRNAME]/nr_overcommit_hugepages ->
> nr_overcommit_hugepages_{default_hugepagesize}
> /sys/devices/system/[DIRNAME]/nr_overcommit_hugepages_default_hugepagesize
> /sys/devices/system/[DIRNAME]/nr_overcommit_hugepages_other_hugepagesize1
> /sys/devices/system/[DIRNAME]/nr_overcommit_hugepages_other_hugepagesize2
>
> That is, nr_hugepages in the directory (should it be called vm?
> memory? hugepages specifically? I'm looking for ideas!) will just be a
> symlink to the underlying default hugepagesize allocator. The files
> themselves would probably be named along the lines of:
>
> nr_hugepages_2M
> nr_hugepages_1G
> nr_hugepages_64K
>
> etc?

Yes, I don't like the proc interface, nor the way it has been extended
(although that's not Andi's fault; it is just a limitation of the old
API).

I think we should actually have individual directories for each hstate
size, and we can put all the other stuff (reservations, per-node stuff
etc.) under those directories. Leave the proc stuff just for the default
page size.

I think it should go in /sys/kernel/, because /sys/devices is more the
hardware side of the system. So /sys/devices makes sense for reporting,
e.g., the actual supported TLB sizes, but for configuring your page
reserves I think /sys/kernel/ makes more sense. But we'll ask the sysfs
folk for guidance there.
> We'd want to have a similar layout on a per-node basis, I think (see
> my patchsets to add a per-node interface).
>
> > The other thing I did was try to shuffle the patches around a bit. There
> > were one or two (pretty trivial) points where it wasn't bisectable, and also
> > merge a couple of patches.
> >
> > I will try to get this patchset merged in -mm soon if feedback is positive.
> > I would also like to take patches for other architectures or any other
> > patches or suggestions for improvements.
>
> There are definitely going to be conflicts between my per-node stack
> and your set, but if you agree the interface should be cleaned up for
> multiple hugepage size support, then I'd like to get my sysfs bits
> into -mm and work on putting the global allocator into sysfs properly
> for you to base off. I think there's enough room for discussion that
> -mm may be a bit premature, but that's just my opinion.
>
> Thanks for keeping the patchset up to date, I hope to do a more careful
> review next week of the individual patches.

Sure, I haven't seen your work, but it shouldn't be terribly hard to merge
either way. It should be easy if we work together ;)

Thanks,
Nick
Messages in current thread:
Re: [patch 00/17] multi size, and giant hugetlb page suppo ... , Nick Piggin , (Fri Apr 11, 1:28 am)