> On Wed, Apr 07, 2010 at 12:06:07PM +0800, Minchan Kim wrote:
>> On Wed, Apr 7, 2010 at 11:54 AM, Taras Glek <tglek@mozilla.com> wrote:
>> > On 04/06/2010 07:24 PM, Wu Fengguang wrote:
>> >>
>> >> Hi Taras,
>> >>
>> >> On Tue, Apr 06, 2010 at 05:51:35PM +0800, Johannes Weiner wrote:
>> >>
>> >>>
>> >>> On Mon, Apr 05, 2010 at 03:43:02PM -0700, Taras Glek wrote:
>> >>>
>> >>>>
>> >>>> Hello,
>> >>>> I am working on improving Mozilla startup times. It turns out that page
>> >>>> faults (caused by lack of cooperation between user and kernel space) are
>> >>>> the main cause of slow startup. I need some insights from someone who
>> >>>> understands Linux VM behavior.
>> >>>>
>> >>
>> >> How about improving Fedora (and other distros) to preload Mozilla (and
>> >> other apps the user ran during the previous boot) with fadvise() at boot
>> >> time? This sounds like the most reasonable option.
>> >>
>> >
>> > That's a slightly different use case. I'd rather have all large apps start
>> > up as efficiently as possible without any hacks. Though until we get there,
>> > we'll be using all of the hacks we can.
>> >>
>> >> As for the kernel readahead, I have a patchset to increase default
>> >> mmap read-around size from 128kb to 512kb (except for small memory
>> >> systems). This should help your case as well.
>> >>
>> >
>> > Yes. Is the current readahead really doing read-around (i.e. does it read
>> > pages before the one being faulted)? From what I've seen, having the dynamic
>> > linker read binary sections backwards causes faults.
>> >
>> > http://sourceware.org/bugzilla/show_bug.cgi?id=11447
>> >>
>> >>
>> >>>>
>> >>>> Current Situation:
>> >>>> The dynamic linker mmap()s executable and data sections of our
>> >>>> executable but it doesn't call madvise().
>> >>>> By default page faults trigger 131072-byte reads. To make matters worse,
>> >>>> the compile-time linker + gcc lay out code in a manner that does not
>> >>>> correspond to how the resulting executable will be executed (i.e. the
>> >>>> layout is basically random). This means that during startup 15-40 MB
>> >>>> binaries are read in basically random fashion. Even if one orders the
>> >>>> binary optimally, throughput is still suboptimal due to the puny
>> >>>> readahead.
>> >>>>
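
For scale, here is a back-of-the-envelope count of read requests. It is only a sketch: it assumes every byte of the binary is read exactly once, that each fault triggers one full-sized read, and that MB means 2^20 bytes. `reads_needed` is a hypothetical helper, not something from the kernel or glibc.

```c
#include <stddef.h>

/* Hypothetical helper: number of fixed-size reads needed to cover
 * `size` bytes, i.e. ceiling division. `chunk` must be nonzero. */
size_t reads_needed(size_t size, size_t chunk)
{
    return (size + chunk - 1) / chunk;
}
```

Under those assumptions a 15-40 MB binary needs 120-320 reads at 131072 bytes each, versus 8-20 reads at 2 MB each.
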
>> >>>> IO Hints:
>> >>>> Fortunately when one specifies madvise(WILLNEED), page faults trigger 2 MB
>> >>>> reads, and a binary that tends to take 110 page faults (i.e. the program
>> >>>> stops execution and waits for disk) can be reduced down to 6. This has the
>> >>>> potential to double the startup speed of large apps without any clear
>> >>>> downsides.
>> >>>>
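
A minimal sketch of the hint described above, assuming a Linux/glibc system: map a file read-only, then call madvise() with MADV_WILLNEED on the whole mapping. `map_with_willneed` is a hypothetical helper name; this is not what the dynamic linker does today, only an illustration of the call.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: map `path` read-only and ask the kernel to start
 * reading the whole mapping in. Returns the mapping, or NULL on failure.
 * On success the mapping length is stored in *len_out. */
void *map_with_willneed(const char *path, size_t *len_out)
{
    assert(path != NULL && len_out != NULL);

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return NULL;

    struct stat st;
    if (fstat(fd, &st) < 0 || st.st_size <= 0) {
        close(fd);
        return NULL;
    }

    size_t len = (size_t)st.st_size;
    void *p = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd); /* the mapping remains valid after the descriptor is closed */
    if (p == MAP_FAILED)
        return NULL;

    /* The hint is advisory, so a failure here is deliberately ignored. */
    (void)madvise(p, len, MADV_WILLNEED);

    *len_out = len;
    return p;
}
```

The caller is expected to munmap() the region when done. Hinting the whole file is the simplest policy; a real loader would more likely hint individual segments.
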
>> >>>> Suse ships their glibc with a dynamic linker patch to fadvise()
>> >>>> dynamic libraries (not sure why they switched from doing madvise
>> >>>> before).
>> >>>>
>> >>
>> >> This is interesting. I wonder how SuSE implements the policy.
>> >> Do you have the patch or some strace output that demonstrates the
>> >> fadvise() call?
>> >>
>> >
>> > glibc-2.3.90-ld.so-madvise.diff in
>> > http://www.rpmseek.com/rpm/glibc-2.4-31.12.3.src.html?hl=com&cba=0:G:0:3732595:0:15:0:
>> >
>> > As I recall they just fadvise the file descriptor before accessing it.
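
For comparison, a sketch of the file-descriptor variant using posix_fadvise(). This is not the SuSE patch, whose contents are not shown in this thread; `open_with_willneed` is a hypothetical helper and hinting the entire file is an assumption.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: open `path` read-only and hint that the whole file
 * will be needed soon. Returns the descriptor, or -1 if open() fails. */
int open_with_willneed(const char *path)
{
    assert(path != NULL);

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    /* offset 0 with len 0 means "from the start to the end of the file".
     * posix_fadvise() returns an error number rather than setting errno;
     * the hint is advisory, so the result is deliberately ignored. */
    (void)posix_fadvise(fd, 0, 0, POSIX_FADV_WILLNEED);

    return fd;
}
```

The caller can then read() or mmap() the descriptor as usual and close() it afterwards.
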
>> >>
>> >>
>> >>>>
>> >>>> I filed a glibc bug about this at
>> >>>> http://sourceware.org/bugzilla/show_bug.cgi?id=11431 . Uli commented
>> >>>> with his concern about wasting memory resources. What is the impact of
>> >>>> madvise(WILLNEED) or the fadvise equivalent on systems under memory
>> >>>> pressure? Does the kernel simply start ignoring these hints?
>> >>>>
>> >>>
>> >>> It will throttle based on memory pressure. In idle situations it will
>> >>> eat your file cache, however, to satisfy the request.
>> >>>
>> >>> Now, the file cache should be much bigger than the amount of unneeded
>> >>> pages you prefault with the hint over the whole library, so I guess the
>> >>> benefit of prefaulting the right pages outweighs the downside of evicting
>> >>> some cache for unused library pages.
>> >>>
>> >>> Still, it's a workaround for deficits in the demand-paging/readahead
>> >>> heuristics and thus a bit ugly, I feel. Maybe Wu can help.
>> >>>
>> >>
>> >> Program page faults are inherently random, so the straightforward
>> >> solution would be to increase the mmap read-around size (for desktops
>> >> with reasonably large memory), rather than to improve program layout
>> >> or readahead heuristics :)
>> >>
>> >
>> > Program page faults may exhibit random behavior once they've started.
>> >
>> > During startup, the page-in pattern of over-engineered OO applications is very
>> > predictable. Programs are laid out based on compilation units, which have no
>> > relation to how they are executed. Another problem is that any large old
>> > application will have lots of code that is either rarely executed or
>> > completely dead. Random sprinkling of live code among mostly unneeded code
>> > is a problem.
>> > I'm able to reduce startup pagefaults by 2.5x and mem usage by a few MB with
>> > proper binary layout. Even if one lays out a program wrongly, the worst-case
>> > pagein pattern will be pretty similar to what it is by default.
>> >
>> > But yes, I completely agree that it would be awesome to increase the
>> > readahead size proportionally to available memory. It's a little silly to be
>> > reading tens of megabytes in 128kb increments :) You rock for trying to
>> > modernize this.
>>
>> Hi, Wu and Taras.
>>
>> I have been watching this thread because I have some experience with
>> reducing application startup latency on embedded systems.
>>
>> I think increasing the readahead size is not always good on embedded
>> systems. Many of them use NAND for storage together with a compressed
>> file system.
>> With NAND, as you know, the random-read penalty is much smaller than on
>> an HDD.
>> With a compressed file system, the stronger the compression, the more a
>> large readahead delays startup (big block reads plus decompression).
>> We had to disable readahead for code pages by hacking the kernel.
>> That makes the application slower as time goes by, but at the time we
>> decided latency was more important than performance for our application.
>>
>> Of course, this depends on which file system and compression ratio
>> are used.
>> So I think increasing the readahead size might not always be good.
>>
>> Please consider embedded systems too when you plan to tweak
>> readahead. :)
>
> Minchan, glad to know that you have experience with embedded Linux.
>
> While increasing the general readahead size from 128KB to 512KB, I
> also added a limit for mmap read-around: if the system memory size is less
> than X MB, then limit the read-around size to X KB. For example, do only
> 128KB read-around for a 128MB embedded box, and 32KB for a 32MB box.
>
> Do you think it is a reasonable safeguard? Patch attached.
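
To make the rule concrete, here is a minimal sketch of the clamp as described in the paragraph above. It is not the code from the attached patch: `readaround_kb` is a hypothetical name, and treating 512KB as the cap for larger systems is an assumption drawn from the proposed default.

```c
/* Illustrative only, not the attached patch.
 * mem_mb: total system memory in MB.
 * Returns the mmap read-around size in KB: X KB for a box with X MB of
 * memory, capped at the proposed 512KB default. */
unsigned long readaround_kb(unsigned long mem_mb)
{
    const unsigned long default_kb = 512; /* assumed cap, from the proposed default */

    return mem_mb < default_kb ? mem_mb : default_kb;
}
```

This gives 128KB for a 128MB box and 32KB for a 32MB box, matching the examples above, and 512KB for anything with 512MB or more.
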