Re: [PATCH] block: fix q->max_segment_size checking in blk_recalc_rq_segments about VMERGE

Previous thread: need help to enable ARM AMBA PL031 RTC by Hebbar on Tuesday, July 15, 2008 - 3:43 am. (2 messages)

Next thread: [PATCH] x86: traps and do_traps under one hood by Jaswinder Singh on Tuesday, July 15, 2008 - 3:52 am. (4 messages)
From: FUJITA Tomonori
Date: Tuesday, July 15, 2008 - 3:44 am

blk_recalc_rq_segments assumes that any segments can be merged in the
case of BIOVEC_VIRT_MERGEABLE && !BIOVEC_VIRT_OVERSIZE. However, an
IOMMU can't merge segments if the total length of the segments is
larger than max_segment_size (the LLD restriction).

Due to this bug, a LLD may get the larger number of segments than
nr_hw_segments because the block layer puts more segments in a request
than it should do.

This bug could happen on alpha, parisc, and sparc, which use VMERGE.

Like blk_hw_contig_segment() does, this patch uses hw_seg_size for
simplicity, which is a bit larger than an exact value (we don't need
BIOVEC_VIRT_START_SIZE here).

Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: Jens Axboe <jens.axboe@oracle.com>
Cc: Mikulas Patocka <mpatocka@redhat.com>
Cc: David Miller <davem@davemloft.net>
---
 block/blk-merge.c |    3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/block/blk-merge.c b/block/blk-merge.c
index 5efc9e7..39a22f8 100644
--- a/block/blk-merge.c
+++ b/block/blk-merge.c
@@ -83,7 +83,8 @@ void blk_recalc_rq_segments(struct request *rq)
 			continue;
 		}
 new_segment:
-		if (BIOVEC_VIRT_MERGEABLE(bvprv, bv) &&
+		if (hw_seg_size + bv->bv_len <= q->max_segment_size &&
+		    BIOVEC_VIRT_MERGEABLE(bvprv, bv) &&
 		    !BIOVEC_VIRT_OVERSIZE(hw_seg_size + bv->bv_len))
 			hw_seg_size += bv->bv_len;
 		else {
-- 
1.5.5.GIT

--

From: Mikulas Patocka
Date: Tuesday, July 15, 2008 - 6:37 am

Parisc doesn't use virtual merge accounting (there is variable for it but 
it's always 0). On sparc64 it is broken anyway with or without your patch. 
And alpha alone doesn't justify substantial code bloat in generic block 
layer. So I propose this patch to drop it at all.

A further patch could be made to drop bi_hw_segments, nr_hw_segments and 
similar --- they have no use for anything except alpha and because there 
are few alpha computers and alpha is discontinued platform, the logic for 
attempting to predict number of hardware segments can't be well tested and 

Drop virtual merge accounting in bio layer (not the virtual merging 
itself). It is used only on Sparc64 (where it is broken and needs to be 
disabled) and on Alpha. This logic is very hard to maintain (the generic 
code in block/blk-merge.c tries to attempt to predict how 
architecture-specific IOMMU will merge the segments) and Alpha 
architecture alone doesn't justify the maintenance and code-bloat cost.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
---
  arch/parisc/kernel/setup.c |    5 ---
  arch/x86/kernel/pci-dma.c  |    6 ---
  block/blk-merge.c          |   72 +++------------------------------------------
  fs/bio.c                   |    6 +--
  include/asm-alpha/io.h     |    3 -
  include/asm-ia64/io.h      |   26 +---------------
  include/asm-parisc/io.h    |    6 ---
  include/asm-powerpc/io.h   |    7 ----
  include/asm-sparc64/io.h   |    1
  include/asm-x86/io_64.h    |    3 -
  include/linux/bio.h        |   15 ---------
  11 files changed, 10 insertions(+), 140 deletions(-)

Index: linux-2.6.26-fast/arch/parisc/kernel/setup.c
===================================================================
--- linux-2.6.26-fast.orig/arch/parisc/kernel/setup.c	2008-07-15 14:19:01.000000000 +0200
+++ linux-2.6.26-fast/arch/parisc/kernel/setup.c	2008-07-15 14:19:18.000000000 +0200
@@ -57,11 +57,6 @@ int parisc_bus_is_phys __read_mostly = 1
  EXPORT_SYMBOL(parisc_bus_is_phys);
  #endif

-/* ...
From: FUJITA Tomonori
Date: Tuesday, July 15, 2008 - 7:20 am

On Tue, 15 Jul 2008 09:37:05 -0400 (EDT)

Hmm, really? Looks like PARISC IOMMUs (ccio-dma.c and sba_iomm.c) set

Yeah, we need to modify SPARC64 IOMMU code (I'm not sure that it's

Jens, what do you think about removing VMERGE code?
--

From: Mikulas Patocka
Date: Tuesday, July 15, 2008 - 7:37 am

Even if we fix it now, the question is: how long it will stay fixed? Until 
someone makes another change to struct device that restricts boundaries on 
some wacky hardware.

--

From: FUJITA Tomonori
Date: Tuesday, July 15, 2008 - 8:30 am

On Tue, 15 Jul 2008 10:37:58 -0400 (EDT)

I'm not sure how the boundary restriction of a device can break
the VMERGE accounting.
--

From: Mikulas Patocka
Date: Tuesday, July 15, 2008 - 8:46 am

Because block layer code doesn't know anything about the device, pci 
access restrictions and so on.

Someone already broken DaveM's Sparc64 merging by adding boundaries (it 
was broken even before, but these boundary checks made it worse).

Mikulas
--

From: FUJITA Tomonori
Date: Tuesday, July 15, 2008 - 5:34 pm

On Tue, 15 Jul 2008 11:46:46 -0400 (EDT)

Not true, the block layer knows about the device restrictions like DMA
boundary.

But it's not the point here because the boundary restriction doesn't
matter for the VMERGE accounting. An IOMMU just returns an error if it
can't allocate an I/O space fit for the device restrictions.


Please give me an example how the boundary restriction of a device can
break the VMERGE accounting and an IOMMU if you aren't still sure.
--

From: Mikulas Patocka
Date: Wednesday, July 16, 2008 - 11:02 am

You have dma_get_seg_boundary and dma_get_max_seg_size. On sparc64, adding 
one of these broken VMERGE accounting (the VMERGE didn't happen past 64-kb 
boundary and bio layer thought that VMERGE would be possible).

And if you fix this case, someone will break it again, sooner or later, by 
adding new restriction.

Mikulas
--

From: FUJITA Tomonori
Date: Wednesday, July 16, 2008 - 9:14 pm

On Wed, 16 Jul 2008 14:02:27 -0400 (EDT)

If the device has 64KB boundary restriction, the device also has
max_seg_size restriction of 64KB or under. So the vmerge acounting

What is your new restriction?

All restrictions that IOMMUs need to know are dma_get_seg_boundary and
dma_get_max_seg_size.
--

From: Mikulas Patocka
Date: Thursday, July 17, 2008 - 4:50 am

We don't know what happens in the future. And that is the problem that we 
don't know --- but we have two pieces of code (blk-merge and iommu) that 
try to calculate the same number (number of hw segments) and if they get 
different result, it will crash. If the calculations were done at one 
place, there would be no problem with that.

Mikulas
--

From: FUJITA Tomonori
Date: Thursday, July 17, 2008 - 6:18 am

On Thu, 17 Jul 2008 07:50:24 -0400 (EDT)


I don't think that your argument, 'the problem that we don't know', is
true.

With the vmerge accounting, we calculate at two places. So if we add
a new restriction, we need to handle it at two places. It's a logical
result.

Of course, it's easier to calculate at one place rather than two
places. But 'we don't know what restriction we will need' isn't a
problem.


BTW, as I've already said, I'm not against removing the vmerge
accounting from the block layer.
--

From: Boaz Harrosh
Date: Thursday, July 17, 2008 - 6:27 am

I have a question. Does the block layer know of the IOMMU in use
for the device? can it call into the IOMMU to calculate the
restriction?

Thanks Boaz

--

From: James Bottomley
Date: Thursday, July 17, 2008 - 6:56 am

Yes and no.  The parameter PCI_DMA_BUS_IS_PHYS is set if the platform
doesn't have one.  Nowadays, that's not enough; with VT and bypass what
the system really needs to know is if the device will be using the
iommu.

The idea of calling into the platform iommu code was considered when all
this was done, but it was rejected.  Function pointer calls are
incredibly expensive on most platforms that at that time had iommus.
The best way was to construct a theoretical parametrisation of an iommu
and get the block layer to follow that model.

James


--

From: David Miller
Date: Saturday, July 19, 2008 - 12:28 am

From: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>

I am also, as stated, not against this.

Fujita-san, please proposage a patch so that we can put this
issue behind us :-)
--

From: Mikulas Patocka
Date: Saturday, July 19, 2008 - 6:45 pm

Few days ago I created this.

Another task would be to remove nr_hw_segments from request, bio and queue 
parameters (if this patch is accepted).

Mikulas

---
 arch/parisc/kernel/setup.c |    5 ---
 arch/x86/kernel/pci-dma.c  |    6 ---
 block/blk-merge.c          |   72 +++------------------------------------------
 drivers/parisc/ccio-dma.c  |    2 -
 drivers/parisc/sba_iommu.c |    2 -
 fs/bio.c                   |    6 +--
 include/asm-alpha/io.h     |    3 -
 include/asm-ia64/io.h      |   26 +---------------
 include/asm-parisc/io.h    |    6 ---
 include/asm-powerpc/io.h   |    7 ----
 include/asm-sparc64/io.h   |    1 
 include/asm-x86/io_64.h    |    3 -
 include/linux/bio.h        |   15 ---------
 13 files changed, 10 insertions(+), 144 deletions(-)

Index: linux-2.6.26-fast/arch/parisc/kernel/setup.c
===================================================================
--- linux-2.6.26-fast.orig/arch/parisc/kernel/setup.c	2008-07-15 16:19:53.000000000 +0200
+++ linux-2.6.26-fast/arch/parisc/kernel/setup.c	2008-07-15 16:20:15.000000000 +0200
@@ -57,11 +57,6 @@ int parisc_bus_is_phys __read_mostly = 1
 EXPORT_SYMBOL(parisc_bus_is_phys);
 #endif
 
-/* This sets the vmerge boundary and size, it's here because it has to
- * be available on all platforms (zero means no-virtual merging) */
-unsigned long parisc_vmerge_boundary = 0;
-unsigned long parisc_vmerge_max_size = 0;
-
 void __init setup_cmdline(char **cmdline_p)
 {
 	extern unsigned int boot_args[];
Index: linux-2.6.26-fast/arch/x86/kernel/pci-dma.c
===================================================================
--- linux-2.6.26-fast.orig/arch/x86/kernel/pci-dma.c	2008-07-15 16:19:53.000000000 +0200
+++ linux-2.6.26-fast/arch/x86/kernel/pci-dma.c	2008-07-15 16:20:15.000000000 +0200
@@ -30,11 +30,6 @@ int no_iommu __read_mostly;
 /* Set this to 1 if there is a HW IOMMU in the system */
 int iommu_detected __read_mostly = 0;
 
-/* This tells the BIO block layer to assume merging. Default to off
- ...
From: James Bottomley
Date: Saturday, July 19, 2008 - 7:17 pm

I think we've already established that the code in question is correct
and functional, 

As far as I can tell, virtual merging has always been broken on ppc, so
it shouldn't enable it.  It looks like at some point in history sparc
went from a working vmerge to a non working one (by copying the broken
ppc code), so the correct fix for both of these arches is simply to turn
off virtual merging.

ppc claims to turn off virtual merging, but in fact the define is
broken.  Sparc should now follow ppc.

Try this patch

James

---

diff --git a/include/asm-powerpc/io.h b/include/asm-powerpc/io.h
index 8b62782..0f3212c 100644
--- a/include/asm-powerpc/io.h
+++ b/include/asm-powerpc/io.h
@@ -715,7 +715,7 @@ static inline void * phys_to_virt(unsigned long address)
  * to coalesce sglists that happen to have been mapped in a contiguous
  * way by the iommu
  */
-#define BIO_VMERGE_BOUNDARY	0
+#undef BIO_VMERGE_BOUNDARY
 
 /*
  * 32 bits still uses virt_to_bus() for it's implementation of DMA
diff --git a/include/asm-sparc64/io.h b/include/asm-sparc64/io.h
index 3158960..1da5642 100644
--- a/include/asm-sparc64/io.h
+++ b/include/asm-sparc64/io.h
@@ -16,7 +16,9 @@
 /* BIO layer definitions. */
 extern unsigned long kern_base, kern_size;
 #define page_to_phys(page)	(page_to_pfn(page) << PAGE_SHIFT)
-#define BIO_VMERGE_BOUNDARY	8192
+
+/* virtual merging doesn't work on sparc now */
+#undef BIO_VMERGE_BOUNDARY
 
 static inline u8 _inb(unsigned long addr)
 {


--

From: David Miller
Date: Saturday, July 19, 2008 - 9:07 pm

From: James Bottomley <James.Bottomley@HansenPartnership.com>

I'd rather remove the vmerge code, it doesn't buy us
anything, and for something so complex and so hard to
keep working correctly it's existence is far from
justified these days.
--

From: James Bottomley
Date: Sunday, July 20, 2008 - 7:52 am

You can ... as soon as BIO_VMERGE_BOUNDARY is undefined or set to zero,
it gets compiled out of the block code. 

Since we're using it successfully in parisc, I don't want the block code
removed, but I don't see a reason to force other architectures to use
it.

However, it has two use cases.  One is the legacy one of making rather
dumb I/O cards perform better (which is the primary on on parisc), but
there is a current one making huge transfers go through SCSI using using
the sg_table code.  That latter is pretty vital to me since I have to
keep the code working, but I don't really have any SCSI cards that can
take advantage of it without virtual merging.  As a slight irony, IBM is
trying to persuade me that a ppc would be better than a parisc for big
endian I/O testing ... so I might just be seeing if I can make virtual
merging work on power too.

James


--

From: David Miller
Date: Sunday, July 20, 2008 - 10:23 am

From: James Bottomley <James.Bottomley@HansenPartnership.com>

All of this is gibberish, we've been over this a few times already
in this thread.

For a dumb I/O card, you advertise SG_ALL capabilities, the IOMMU
is going to merge things as it would have anyways, and you have
code in the driver to advance SG entries after each "dumb I/O".

There is zero value to the vmerge code, the real gains are being
realized already.
--

From: James Bottomley
Date: Sunday, July 20, 2008 - 10:33 am

Really? I must have missed the proposed replacement for the

Not that dumb ... they just have a limited number of SG slots.  We
wouldn't want to run them as spoon fed PIO because that really would

There is value to me in my testbed, which I can't achieve any other way
(except by buying different SCSI cards).

As I said, you can compile it out on sparc just fine.  I wish to keep it
running for parisc, so I'll maintain it.  If it ever bit rots out of
parisc like it has done for the other architectures, then feel free to
remove it.

James


--

From: Mikulas Patocka
Date: Thursday, July 24, 2008 - 8:07 am

So try to #define BIO_VMERGE_BOUNDARY 0 for Pa-Risc and tell us what 
performance degradation do you see (and what driver do you use and what is 
the I/O pattern).

If you show something specific, we can consider that --- but you haven't 
yet told us anything, except generic talk.

Mikulas
--

From: James Bottomley
Date: Thursday, July 24, 2008 - 8:28 am

You keep ignoring inconvenient facts.  For about the third time:

I run a test bed for sg_tables (large chaining of requests).  This runs
on parisc using virtual merging (has to because the final physical table
size can't go over the sg list of the SCSI card).  If I turn off virtual
merging I can no longer test sg_tables in vanilla kernels.

James


--

From: Mikulas Patocka
Date: Thursday, July 24, 2008 - 9:34 am

What sg_tables test do you mean? What does the test do? Why couldn't you 
run the test if BIO_VMERGE_BOUNDARY is 0? Normal I/O obviously can work 
with BIO_VMERGE_BOUNDARY 0, the kernel will just send more smaller 
requests.

Mikulas
--

From: James Bottomley
Date: Thursday, July 24, 2008 - 9:52 am

Look, if you don't really understand what I'm doing, it's not really my
job to educate you.  The sg_table discussions are on marc.info, mainly
on the SCSI lists; just look for 'sg chaining' in the header (need to
use google site ... marc's search is bad).

You can complain if the code is impacting you ... but I believe I've
optimised it so it isn't.  Your basic problem amounts to you not liking
me doing something that has no impact on you ... I'm afraid that's what
freedom leads to (shocking, I know).

James


--

From: Mikulas Patocka
Date: Thursday, July 24, 2008 - 2:49 pm

Chaining of sg_tables is used for drivers with big sg tables --- and 
vmerge counting is used for drivers with small sg tables. So what do they 
have in common?

Summary, what I mean:

* in blk-merge.c, you have 85 lines, that is 16% of the size of the file, 
devoted to counting of hw_segments

* it is only used on two architectures, one already outdated (alpha), the 
other being discontinued (pa-risc). On all the other architectures, 
hw_segments == phys_segments

* it is prone to bugs and hard to maintain, because the same value must be 
calculated in blk-merge.c and in architectural iommu functions --- if the 
value differs, you create too long request, corrupt kernel memory and 
crash (happened on sparc64). Anyone changing blk-merge in the future will 
risk breaking something on the architectures that use BIO_VMERGE_BOUNDARY 
--- and because these architectures are so rare, the bug will go unnoticed 
for long time --- like in the case of sparc64.

* you are just talking how this code is important for performance without 
showing any single proof that it really is (temporarily disable 
hw_segments accounting by defining BIO_VMERGE_BOUNDARY 0 and get the 
numbers).

Mikulas
--

From: David Miller
Date: Thursday, July 24, 2008 - 2:53 pm

From: Mikulas Patocka <mpatocka@redhat.com>

I completely agree with this point.

This VMERGE stuff is now a non-trivial maintainence burdon because
anyone who wants to hack on the block layer has to be mindful of
VMERGE but is very unlikely to have access to a system that it can
even be tested on.

And the answer isn't "James Bottomly will test your changes for you",
because that simply doesn't scale.

I still say we should definitely remove the VMERGE code.  It's not
worth the maintainence hassle just for some SG chaining test rig
on some obscure platform.

I really only hear one person who really wants this code around any
more.  Is that the Linux way? :-) Can't he patch it into his tree when
he needs it or write an alternative way to stress the SG chaining
code?  He has the source, right? :-)))
--

From: James Bottomley
Date: Thursday, July 24, 2008 - 8:47 pm

So you think the parametrisation in the block layer is the wrong way to
approach the problem?  On this argument your next patch should be
removing physical merging as well because it also relies on a

I'm sorry sparc broke, really I am ... but you changed your iommu code
from one with a working vmerge to one without and failed to turn off
vmerging.  Partly it wasn't noticed because at 2*PAGE_SIZE you have a
strange vmerge boundary, so it's harder to notice.   However, I can't
extrapolate that just because this happened on sparc it will inevitably

Actually, parisc will test your code we have a PAGE_SIZE vmerge

OK ... well you've had your say and so have I.  The code in question is

You're advocating an out of tree patch as a solution?  I didn't know
you'd been appointed RHEL maintainer ;-)

James


--

From: David Miller
Date: Thursday, July 24, 2008 - 10:21 pm

From: James Bottomley <James.Bottomley@HansenPartnership.com>

The vmerge boundary on sparc64 was 8K which is equal to sparc64's base
PAGE_SIZE.  I don't know where you get that 2*PAGE_SIZE from.

--

From: FUJITA Tomonori
Date: Thursday, July 24, 2008 - 7:26 pm

On Thu, 24 Jul 2008 17:49:14 -0400 (EDT)

VMERGE enables you to handle a large request even with drivers with

BTW, alpha IOMMU can't handle VMERGE. But the IOMMU has the code to
handle VMERGE so one-line patch can fix the IOMMU.


As I said before, can we leave this to Jens, keeping or removing
VMERGE? Seems that I see the same arguments again and again.
--

From: John David Anglin
Date: Thursday, July 24, 2008 - 7:40 pm

This "business based" argument should be ignored.  Followed to the limit,
there would only be one architecture left...

Dave
-- 
J. David Anglin                                  dave.anglin@nrc-cnrc.gc.ca
National Research Council of Canada              (613) 990-0752 (FAX: 952-6602)
--

From: David Miller
Date: Saturday, July 19, 2008 - 10:54 pm

From: Mikulas Patocka <mpatocka@redhat.com>

I'm fine with this:

Acked-by: David S. Miller <davem@davemloft.net>
--

From: James Bottomley
Date: Tuesday, July 15, 2008 - 7:50 am

That's correct.  The size and boundary depend on the type of IOMMU (ccio
or sba) so the vmerge boundary parameters are set up in the iommu driver

Actually, it's code I did.

There are plusses and minusses to all of this.  The original vmerge code
was done for sparc ... mainly because the benefits of virtual merging
can offset the cost of having to use the iommu.  However, most
architectures didn't use it.  When I fixed it up to work for parisc (and
introduced the parameters) we were trying to demonstrate that using it
was feasible.

The idea behind vmerging is that assembling and programming sg lists is
expensive, so you want to do it once.  Either in the iommu or in the
driver sg list, but not in both.  There is evidence that it saves around
7% or so on drivers.  However, for architectures that can do it, better
savings are made simply by lifting the iommu out of the I/O path (so
called bypass mode).

I suspect with IOMMUs coming back (and being unable to be bypassed) with
virtualisation, virtual merging might once more become a significant
value.

James


--

From: Mikulas Patocka
Date: Tuesday, July 15, 2008 - 8:24 am

The problem is with vmerge accounting in block layer (that is what I'm 
proposing to remove), not with vmerge itself.

Vmerge accounting has advantages only if you have device with small amount 
of sg slots --- it allows the block layer to create request that has 
higher number of segments then the device.

If you have device with for example 1024 slots, the virtual merge 
accounting has no effect, because the any request will fit into that size. 
Even without virtual merge accounting, the virtual merging will happen, so 
there will be no performance penalty for the controller --- the controller 
will be programmed with exactly the same number of segments as if virtual 
merge accounting was present. (there could be even slight positive 
performance effect if you remove accounting, because you burn less CPU 
cycles per request)

If you have device will small number of sg slots (16 or so), vmerge 
accounting can improve performance by creating requests with more than 16 
segments --- the question is: is there any such device? And is the device 
performance-sensitive? (i.e. isn't it such an old hardware where no one 

I suppose that no one would manufacture new SCSI card with 16 or 32 sg 
slots these days, so the accounting of hardware segments has no effect on 
modern hardware.

--

From: James Bottomley
Date: Tuesday, July 15, 2008 - 8:41 am

I don't think that's true ... otherwise parisc would be falling over

This isn't really true either.  A lot of devices with a high sg slot
count are still less efficient than an iommu for programming.

Even if they're not, on parisc we have to program the iommu, we can't
bypass, so it still makes sense to only have one large sg list (in the
iommu) and one small one (in the device).  Having two large ones reduces


Yes there is.  Both the iommu and the device have to traverse large SG
lists.  This is where the inefficiency lies.  On PA, we use exactly the
same number of iotlb slots whether virtual merging is in effect or not,
but the device has an internal loop to go over the list.  It's that loop
that virtual merging reduces.

Since the virtual merge computation is in line when the request is built
(by design) it doesn't really detract from the throughput and the cost

It's not about accounting, it's about performance.  There's a cost in
every device to traversing large count sg lists.  If you have to bear it
in the iommu (which is usually more efficient because the iotlb tends to
follow mmtlb optimisations) you can reduce the cost by eliminating it
from the device.

James


--

From: Mikulas Patocka
Date: Tuesday, July 15, 2008 - 8:58 am

You are mixing two ideas here:

(1) virtual merging --- IOMMU maps discontinuous segments into continuous 
area that it presents to the device.

(2) virtual merge accounting --- block layer tries to guess how many 
segments will be created by (1) and merges small requests into big ones. 
The resulting requests are as big that they can't be processed by the 

--- for these devices virtual merging (1) improves performance, but 

Virtual merge accounting (2) is about fitting a request. It is block layer 

The purpose of (1) virtual merging is to save device's sg slots. The 
purpose of (2) virtual merge accounting is to allow block layer to build 
larger requests. If you remove virtual merge accounting, it will cause no 

That's why I'm proposing to remove virtual merge accounting (2), but leave 
virtual merging (1) itself. The accounting doesn't reduce number of sg 
slots.

--

From: James Bottomley
Date: Tuesday, July 15, 2008 - 9:07 am

No ... I'm not ... the virtual merge implementation requires the block
layer to get this accounting right, otherwise the iommu code can end up
doing the wrong thing.

You're proposing to eliminate the difference between max_phys_segments

Yes, but it's gains very little ... architectures that don't want it can
already turn it off, and it's useful for those, like parisc, who do.

James


--

From: Mikulas Patocka
Date: Tuesday, July 15, 2008 - 9:20 am

The virtual merge (1) can work even without accounting (2). IOMMU can 

Yes. Only for alpha and pa-risc, there is difference between these values. 

It increases maintainability of the code, reduces bloat and bugs.

Mikulas
--

From: James Bottomley
Date: Tuesday, July 15, 2008 - 9:36 am

It can, but it's not optimal ... and depends on max_phys_segments ==

That's not really a good reason.  You can eliminate code because it's
unused and unikely to be used or you redo it to better or fix it to be
less buggy.  You don't simply eliminate useful functionality that
currently has in-tree users, however marginal you might opine those
users to be.

James


--

From: Mikulas Patocka
Date: Tuesday, July 15, 2008 - 2:50 pm

So show a specific device where the virtual merge accounting is useful.

(1) The device that is often used in alpha or pa-risc environments --- 
because the accounting is not used on other archs.

(2) The device that is performance-sensitive --- not something outdated or 
unusual.

(3) And the device that has limited sg-list size, so that generic I/O 
requests made by the kernel hit this limit. (if the sg-list is so big that 
nr_phys_segments of most requests fits into it, you don't need to count 
nr_hw_segments --- because nr_hw_segments < nr_phys_segments and 
nr_phys_segments already fits).

[ the device that traverses its sg-list slowly doesn't fall into category 
(3), beacuse virtual merging would happen with or without nr_hw_segments 
accounting ]

Mikulas
--

Previous thread: need help to enable ARM AMBA PL031 RTC by Hebbar on Tuesday, July 15, 2008 - 3:43 am. (2 messages)

Next thread: [PATCH] x86: traps and do_traps under one hood by Jaswinder Singh on Tuesday, July 15, 2008 - 3:52 am. (4 messages)