Re: Increasing MAXPHYS

Previous thread: [head tinderbox] failure on i386/i386 by FreeBSD Tinderbox on Sunday, March 21, 2010 - 6:54 pm. (1 message)

Next thread: Re: build failures after stdlib update by Alexander Best on Monday, March 22, 2010 - 2:40 am. (1 message)
From: Poul-Henning Kamp
Date: Monday, March 22, 2010 - 1:23 am

The easiest way to obtain more parallelism is to divide the mesh into
multiple independent meshes.

This will do you no good if you have five disks in a RAID-5 config, but
if you have two disks each mounted on its own filesystem, you can run
a g_up & g_down for each of them.

-- 
Poul-Henning Kamp       | UNIX since Zilog Zeus 3.20
phk@FreeBSD.ORG         | TCP/IP since RFC 956
FreeBSD committer       | BSD since 4.3-tahoe    
Never attribute to malice what can adequately be explained by incompetence.
_______________________________________________
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscribe@freebsd.org"
From: Pawel Jakub Dawidek
Date: Monday, March 22, 2010 - 4:36 pm

A class is supposed to interact with other classes only via GEOM, so I
think it should be safe to choose g_up/g_down threads for each class
individually, for example:

	/dev/ad0s1a (DEV)
	       |
	g_up_0 + g_down_0
	       |
	     ad0s1a (BSD)
	       |
	g_up_1 + g_down_1
	       |
	     ad0s1 (MBR)
	       |
	g_up_2 + g_down_2
	       |
	     ad0 (DISK)

We could easily calculate the g_down thread based on bio_to->geom->class
and the g_up thread based on bio_from->geom->class, so we know I/O requests
for our class are always coming from the same threads.

If we could make the same assumption for geoms it would allow for even
better distribution.
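
[Illustration, not part of the original message.] A minimal sketch of the
per-class selection described above, assuming a fixed pool of
g_up/g_down thread pairs and a small integer id given to each class when
it registers. The sk_* types, the id field, and G_NPAIRS are hypothetical
stand-ins; only the bio_to->geom->class and bio_from->geom->class lookups
come from the message itself.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical: number of g_up/g_down thread pairs in the pool. */
#define G_NPAIRS 3

/* Simplified stand-ins for the kernel's GEOM structures. */
struct sk_class {
	const char *name;
	unsigned id;		/* hypothetical index assigned at registration */
};
struct sk_geom { struct sk_class *class; };
struct sk_end  { struct sk_geom *geom; };	/* a provider or a consumer */
struct sk_bio  { struct sk_end *bio_from; struct sk_end *bio_to; };

/* Map a class to one thread pair; the same class always gets the same pair. */
static unsigned
sk_class_pair(const struct sk_class *cp)
{
	assert(cp != NULL);
	return (cp->id % G_NPAIRS);
}

/* Requests going down are keyed on the class owning the target provider. */
static unsigned
sk_pick_down(const struct sk_bio *bp)
{
	return (sk_class_pair(bp->bio_to->geom->class));
}

/* Completions going up are keyed on the class owning the issuing consumer. */
static unsigned
sk_pick_up(const struct sk_bio *bp)
{
	return (sk_class_pair(bp->bio_from->geom->class));
}
```

Because the result depends only on the class, every request delivered to
a given class arrives from the same g_down thread, and every completion
from the same g_up thread. Keying on the geom instead of the class would
be the finer-grained variant mentioned above.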

-- 
Pawel Jakub Dawidek                       http://www.wheelsystems.com
pjd@FreeBSD.org                           http://www.FreeBSD.org
FreeBSD committer                         Am I Evil? Yes, I Am!
From: Scott Long
Date: Monday, March 22, 2010 - 5:05 pm

The whole point of the discussion, sans PHK's interlude, is to reduce the context switches and indirection, not to increase them.  But if you can show that increasing them lowers latency or raises IOPS, more power to you.  I would think that the results of DFly's experiment with parallelism-via-more-queues would serve as a good warning, though.

Scott

From: Matthew Dillon
Date: Tuesday, March 23, 2010 - 1:25 am

Well, I'm not sure what experiment you are referring to, but I'll assume
    it's the network threading, which works quite well actually.  The protocol
    threads can be matched against the Toeplitz hash function, and in that case
    the entire packet stream operates lockless.  Even without the matching
    we still get good benefits from batching (e.g. via ether_input_chain())
    which drops the IPI and per-packet switch overhead basically to zero.
    We have other issues but the protocol threads aren't one of them.

    In any case, the lesson to learn with batching to a thread is that you
    don't want the thread to immediately preempt the sender (if it happens
    to be on the same cpu), or to generate an instant IPI (if going between
    cpus).  This creates a degenerate case where you wind up with a
    thread switch on each message or an excessive messaging interrupt
    rate... THAT is what seriously screws up performance.  The key is to
    be able to batch multiple messages per thread switch when under load
    and to be able to maintain a pipeline.

    A single user-process test case will always have a bit more latency
    and can wind up being inefficient for a variety of other reasons
    (e.g. whether the target thread is on the same cpu or not),
    but that becomes less relevant when the machine is under load, so
    it's a self-correcting problem for the most part.

    Once the machine is under load batching becomes highly efficient.
    That is, latency != cpu cycle cost under load.  When the threads
    have enough work to do they can pick up the next message without the
    cost of entering a sleep state or needing a wakeup (or needing to
    generate an actual IPI interrupt, etc).  Plus you can run lockless
    and you get excellent cache locality.  So as long as you ensure these
    optimal operations become the norm under load you win.

    Getting the threads to pipeline properly and avoid unnecessary
    tsleeps and wakeups is the hard part.
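
[Illustration, not part of the original message.] A minimal userland
sketch of the batching idea, assuming POSIX threads in place of kernel
message ports. The sender issues a wakeup only when the consumer has
actually gone to sleep, and the consumer takes the entire backlog per
wakeup, so under load many messages are handled per thread switch. All
names here (msgq, msgq_send, msgq_recv_batch, the wakeups counter) are
hypothetical and are not DragonFly APIs.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

struct msg {
	struct msg *next;
	int payload;
};

struct msgq {
	pthread_mutex_t lock;
	pthread_cond_t nonempty;
	struct msg *head, *tail;
	int consumer_sleeping;	/* set by the consumer just before it blocks */
	unsigned wakeups;	/* statistic: wakeups actually issued */
};

static void
msgq_init(struct msgq *q)
{
	pthread_mutex_init(&q->lock, NULL);
	pthread_cond_init(&q->nonempty, NULL);
	q->head = q->tail = NULL;
	q->consumer_sleeping = 0;
	q->wakeups = 0;
}

/* Enqueue a message; signal only if the consumer is asleep. */
static void
msgq_send(struct msgq *q, struct msg *m)
{
	assert(m != NULL);
	m->next = NULL;
	pthread_mutex_lock(&q->lock);
	if (q->tail != NULL)
		q->tail->next = m;
	else
		q->head = m;
	q->tail = m;
	if (q->consumer_sleeping) {
		/* One wakeup covers this and every later queued message. */
		q->consumer_sleeping = 0;
		q->wakeups++;
		pthread_cond_signal(&q->nonempty);
	}
	pthread_mutex_unlock(&q->lock);
}

/* Sleep only when the queue is empty, then take the whole backlog. */
static struct msg *
msgq_recv_batch(struct msgq *q)
{
	struct msg *batch;

	pthread_mutex_lock(&q->lock);
	while (q->head == NULL) {
		q->consumer_sleeping = 1;
		pthread_cond_wait(&q->nonempty, &q->lock);
	}
	batch = q->head;
	q->head = q->tail = NULL;
	pthread_mutex_unlock(&q->lock);
	return (batch);
}
```

A consumer thread would loop on msgq_recv_batch() and walk the returned
list. While it is busy with one batch, senders queue further messages
without signalling, so the next call returns immediately with no sleep
and no wakeup.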

    ...
From: Julian Elischer
Date: Monday, March 22, 2010 - 5:33 pm

doesn't really help my problem however.. I just want to access the 
