"I think the SG stuff looks ok now, but I think we have a lot of 'fix up the rough edges' to go!" Linus Torvalds noted regarding some of the fallout from the recent merge of Jens Axboe's SG chaining patchset. During one of the many discussions, Jens explained:
"It's all about the end goal - having maintainable and resilient code. And I think the sg code will be better once we get past the next day or so, and it'll be more robust. That is what matters to me, not the simplicity of the patch itself."
Boaz Harrosh commented, "thanks Jens for doing all this, The performance gain is substantial and we will all enjoy it." Jens replied, "my pleasure, I just wish it could have been a little less painful. But in a day or two, it should all be behind us and we can move forward with making good use of it."
From: Linus Torvalds <torvalds@...> Subject: Re: [PATCH 09/10] Change table chaining layout Date: Oct 22, 3:49 pm 2007 On Mon, 22 Oct 2007, Geert Uytterhoeven wrote: > > Better safe than sorry... > > Is it possible that a chain entry pointer has bit 1 set on architectures > (e.g. m68k) where the natural alignment of 32-bit quantities is _2_ bytes, > not 4? Better make sure that such alignment never happens... But no, I don't think it will, since these things would generally always have to be allocated with an allocator, and the *allocator* won't return 2-byte aligned data structures. Linus -
From: Alan Cox <alan@...> Subject: Re: [PATCH 09/10] Change table chaining layout Date: Oct 22, 4:16 pm 2007 On Mon, 22 Oct 2007 12:49:40 -0700 (PDT) Linus Torvalds <torvalds@linux-foundation.org> wrote: > > > On Mon, 22 Oct 2007, Geert Uytterhoeven wrote: > > > > Better safe than sorry... > > > > Is it possible that a chain entry pointer has bit 1 set on architectures > > (e.g. m68k) where the natural alignment of 32-bit quantities is _2_ bytes, > > not 4? > > Better make sure that such alignment never happens... But no, I don't > think it will, since these things would generally always have to be > allocated with an allocator, and the *allocator* won't return 2-byte > aligned data structures. No - but a structure which has other objects in it before the object being written out may well be 2 byte aligned on M68K and some of the other externally 16bit platforms - ditto local dynamic objects. Why can't we just make the list one item longer than the entry count and stick a NULL on the end of it like normal people ? Then you need one bit which ought to be safe for everyone (and if the bit is a macro any CPU warped enough to have byte alignment is surely going to have top bits spare...) Alan -
From: Linus Torvalds <torvalds@...> Subject: Re: [PATCH 09/10] Change table chaining layout Date: Oct 22, 4:44 pm 2007 On Mon, 22 Oct 2007, Alan Cox wrote: > > Why can't we just make the list one item longer than the entry count and > stick a NULL on the end of it like normal people ? Then you need one bit > which ought to be safe for everyone (and if the bit is a macro any CPU > warped enough to have byte alignment is surely going to have top bits > spare...) Well, quite frankly, equally easy is to just add a __attribute__((aligned(4))) or whatever the gcc syntax for that is today.. That guarantees that gcc lays things out properly. Linus -
From: Alan Cox <alan@...> Subject: Re: [PATCH 09/10] Change table chaining layout Date: Oct 22, 5:43 pm 2007 On Mon, 22 Oct 2007 13:44:43 -0700 (PDT) Linus Torvalds <torvalds@linux-foundation.org> wrote: > > > On Mon, 22 Oct 2007, Alan Cox wrote: > > > > Why can't we just make the list one item longer than the entry count and > > stick a NULL on the end of it like normal people ? Then you need one bit > > which ought to be safe for everyone (and if the bit is a macro any CPU > > warped enough to have byte alignment is surely going to have top bits > > spare...) > > Well, quite frankly, equally easy is to just add a > > __attribute__((aligned(4))) > > or whatever the gcc syntax for that is today.. That guarantees that gcc > lays things out properly. For structures, not array elements or stack objects. Does gcc now get aligned correct as an attribute on a stack object ? Still doesn't answer the rather more important question - why not just stick a NULL on the end instead of all the nutty hacks ? -
From: Linus Torvalds <torvalds@...> Subject: Re: [PATCH 09/10] Change table chaining layout Date: Oct 22, 5:47 pm 2007 On Mon, 22 Oct 2007, Alan Cox wrote: > For structures, not array elements or stack objects. Does gcc now get > aligned correct as an attribute on a stack object ? I think m68k stack layout still guarantees 4-byte-alignment, no? > Still doesn't answer the rather more important question - why not just > stick a NULL on the end instead of all the nutty hacks ? You still do need one bit for the discontiguous case, so it's not like you can avoid the hacks anyway (unless you just blow up the structure entirely) and make it a separate member). So once you have that bit+pointer, using a separate NULL entry isn't exactly prettier. Especially as we actally want to see the difference between "end-of-allocation" and "not yet filled in", so you shouldn't use NULL anyway, you should probably use something like "all-ones". Linus -
From: Boaz Harrosh <bharrosh@...> Subject: Re: [PATCH 09/10] Change table chaining layout Date: Oct 23, 5:29 am 2007 On Mon, Oct 22 2007 at 23:47 +0200, Linus Torvalds <torvalds@linux-foundation.org> wrote: > > On Mon, 22 Oct 2007, Alan Cox wrote: > >> For structures, not array elements or stack objects. Does gcc now get >> aligned correct as an attribute on a stack object ? > > I think m68k stack layout still guarantees 4-byte-alignment, no? > >> Still doesn't answer the rather more important question - why not just >> stick a NULL on the end instead of all the nutty hacks ? > > You still do need one bit for the discontiguous case, so it's not like you > can avoid the hacks anyway (unless you just blow up the structure > entirely) and make it a separate member). So once you have that > bit+pointer, using a separate NULL entry isn't exactly prettier. > > Especially as we actally want to see the difference between > "end-of-allocation" and "not yet filled in", so you shouldn't use NULL > anyway, you should probably use something like "all-ones". > > Linus > - Every one is so hysterical about this sg-chaining problem. And massive patches produced, that when a simple none intrusive solution is proposed it is totally ignored because every one thinks, "I can not be that stupid". Well Einstein said: "Simplicity is the ultimate sophistication". So no one need to feel bad. I'm talking about Benney's Proposition of a while back. (I'm including it below, cause I can't bother with the stupid Archives broken search) What Benny was proposing is that the scatterlist pointer might not have the complete information about sizes and allocations, but the surrounding code always has, So why not just pass this information to the decision maker - sg_next() - so it can make the right choice, before a suicide. So sg_next() becomes: static inline struct scatterlist *sg_next(struct scatterlist *sg, int next, int nr) { return next < nr ? sg_next_unsafe(sg) : NULL; } Where sg_next_unsafe(sg) is what the original sg_next() used to be. and a user code like for_each_sg() becomes: /* * Loop over each sg element, following the pointer to a new list if necessary */ #define for_each_sg(sglist, sg, nr, __i) \ for (__i = 0, sg = (sglist); sg; sg = sg_next(sg, ++__i, nr)) In his patch he shows examples of other uses of sg_next they all fit. The sg_next usage is new and in few places. Not like the sg->page all over the kernel. I know he has a patch for the complete kernel, (I know I helped a bit) and it is a fraction of the size of all the patches that where submitted after that. And it does not have all the problems that we still have now with slob allocators, stack, and such. OK Just my [mid=350733,350751,350770,350810,350813,351243,351249,351254,351260,351291,351488,351297].2 Boaz Harrosh ----- diff --git a/include/linux/scatterlist.h b/include/linux/scatterlist.h index 2dc7464..3a27e03 100644 --- a/include/linux/scatterlist.h +++ b/include/linux/scatterlist.h @@ -30,7 +30,7 @@ static inline void sg_init_one(struct scatterlist *sg, const void *buf, ((struct scatterlist *) ((unsigned long) (sg)->page & ~0x01)) /** - * sg_next - return the next scatterlist entry in a list + * sg_next_unsafe - return the next scatterlist entry in a list * @sg: The current sg entry * * Usually the next entry will be @sg@ + 1, but if this sg element is part @@ -41,7 +41,7 @@ static inline void sg_init_one(struct scatterlist *sg, const void *buf, * the current entry, this function will NOT return NULL for an end-of-list. * */ -static inline struct scatterlist *sg_next(struct scatterlist *sg) +static inline struct scatterlist *sg_next_unsafe(struct scatterlist *sg) { sg++; @@ -51,11 +51,27 @@ static inline struct scatterlist *sg_next(struct scatterlist *sg) return sg; } +/** + * sg_next - return the next scatterlist entry in a list + * @sg: The current sg entry + * @next: Index of next sg entry + * @nr: Number of sg entries in the list + * + * Note that the caller must ensure that there are further entries after + * the current entry, this function will NOT return NULL for an end-of-list. + * + */ +static inline struct scatterlist *sg_next(struct scatterlist *sg, + int next, int nr) +{ + return next < nr ? sg_next_unsafe(sg) : NULL; +} + /* * Loop over each sg element, following the pointer to a new list if necessary */ #define for_each_sg(sglist, sg, nr, __i) \ - for (__i = 0, sg = (sglist); __i < (nr); __i++, sg = sg_next(sg)) + for (__i = 0, sg = (sglist); sg; sg = sg_next(sg, ++__i, nr)) /** * sg_last - return the last scatterlist entry in a list diff --git a/drivers/scsi/sg.c b/drivers/scsi/sg.c index 7238b2d..57cc1dd 100644 --- a/drivers/scsi/sg.c +++ b/drivers/scsi/sg.c @@ -1165,7 +1165,7 @@ sg_vma_nopage(struct vm_area_struct *vma, unsigned long addr, int *type) sg = rsv_schp->buffer; sa = vma->vm_start; for (k = 0; (k < rsv_schp->k_use_sg) && (sa < vma->vm_end); - ++k, sg = sg_next(sg)) { + sg = sg_next(sg, ++k, rsv_schp->k_use_sg)) { len = vma->vm_end - sa; len = (len < sg->length) ? len : sg->length; if (offset < len) { @@ -1209,7 +1209,7 @@ sg_mmap(struct file *filp, struct vm_area_struct *vma) sa = vma->vm_start; sg = rsv_schp->buffer; for (k = 0; (k < rsv_schp->k_use_sg) && (sa < vma->vm_end); - ++k, sg = sg_next(sg)) { + sg = sg_next(sg, ++k, rsv_schp->k_use_sg)) { len = vma->vm_end - sa; len = (len < sg->length) ? len : sg->length; sa += len; @@ -1840,7 +1840,7 @@ sg_build_indirect(Sg_scatter_hold * schp, Sg_fd * sfp, int buff_size) } for (k = 0, sg = schp->buffer, rem_sz = blk_size; (rem_sz > 0) && (k < mx_sc_elems); - ++k, rem_sz -= ret_sz, sg = sg_next(sg)) { + rem_sz -= ret_sz, sg = sg_next(sg, ++k, mx_sc_elems)) { num = (rem_sz > scatter_elem_sz_prev) ? scatter_elem_sz_prev : rem_sz; @@ -1913,7 +1913,7 @@ sg_write_xfer(Sg_request * srp) if (res) return res; - for (; p; sg = sg_next(sg), ksglen = sg->length, + for (; p; sg = sg_next_unsafe(sg), ksglen = sg->length, p = page_address(sg->page)) { if (usglen <= 0) break; @@ -1991,8 +1991,8 @@ sg_remove_scat(Sg_scatter_hold * schp) } else { int k; - for (k = 0; (k < schp->k_use_sg) && sg->page; - ++k, sg = sg_next(sg)) { + for (k = 0; sg && sg->page; + sg = sg_next(sg, ++k, schp->k_use_sg)) { SCSI_LOG_TIMEOUT(5, printk( "sg_remove_scat: k=%d, pg=0x%p, len=%d\n", k, sg->page, sg->length)); @@ -2045,7 +2045,7 @@ sg_read_xfer(Sg_request * srp) if (res) return res; - for (; p; sg = sg_next(sg), ksglen = sg->length, + for (; p; sg = sg_next_unsafe(sg), ksglen = sg->length, p = page_address(sg->page)) { if (usglen <= 0) break; @@ -2092,7 +2092,7 @@ sg_read_oxfer(Sg_request * srp, char __user *outp, int num_read_xfer) if ((!outp) || (num_read_xfer <= 0)) return 0; - for (k = 0; (k < schp->k_use_sg) && sg->page; ++k, sg = sg_next(sg)) { + for (k = 0; sg && sg->page; sg = sg_next(sg, ++k, schp->k_use_sg)) { num = sg->length; if (num > num_read_xfer) { if (__copy_to_user(outp, page_address(sg->page), @@ -2142,7 +2142,7 @@ sg_link_reserve(Sg_fd * sfp, Sg_request * srp, int size) SCSI_LOG_TIMEOUT(4, printk("sg_link_reserve: size=%d\n", size)); rem = size; - for (k = 0; k < rsv_schp->k_use_sg; ++k, sg = sg_next(sg)) { + for (k = 0; sg; sg = sg_next(sg, ++k, rsv_schp->k_use_sg)) { num = sg->length; if (rem <= num) { sfp->save_scat_len = num; - -
From: Jens Axboe <jens.axboe@...> Subject: Re: [PATCH 09/10] Change table chaining layout Date: Oct 23, 5:41 am 2007 On Tue, Oct 23 2007, Boaz Harrosh wrote: > On Mon, Oct 22 2007 at 23:47 +0200, Linus Torvalds <torvalds@linux-foundation.org> wrote: > > > > On Mon, 22 Oct 2007, Alan Cox wrote: > > > >> For structures, not array elements or stack objects. Does gcc now get > >> aligned correct as an attribute on a stack object ? > > > > I think m68k stack layout still guarantees 4-byte-alignment, no? > > > >> Still doesn't answer the rather more important question - why not just > >> stick a NULL on the end instead of all the nutty hacks ? > > > > You still do need one bit for the discontiguous case, so it's not like you > > can avoid the hacks anyway (unless you just blow up the structure > > entirely) and make it a separate member). So once you have that > > bit+pointer, using a separate NULL entry isn't exactly prettier. > > > > Especially as we actally want to see the difference between > > "end-of-allocation" and "not yet filled in", so you shouldn't use NULL > > anyway, you should probably use something like "all-ones". > > > > Linus > > - > > Every one is so hysterical about this sg-chaining problem. And massive > patches produced, that when a simple none intrusive solution is proposed > it is totally ignored because every one thinks, "I can not be that stupid". > Well Einstein said: "Simplicity is the ultimate sophistication". So no one > need to feel bad. It's all about the end goal - having maintainable and resilient code. And I think the sg code will be better once we get past the next day or so, and it'll be more robust. That is what matters to me, not the simplicity of the patch itself. -- Jens Axboe -
From: Boaz Harrosh <bharrosh@...> Subject: Re: [PATCH 09/10] Change table chaining layout Date: Oct 23, 5:50 am 2007 On Tue, Oct 23 2007 at 11:41 +0200, Jens Axboe <jens.axboe@oracle.com> wrote: > On Tue, Oct 23 2007, Boaz Harrosh wrote: >> On Mon, Oct 22 2007 at 23:47 +0200, Linus Torvalds <torvalds@linux-foundation.org> wrote: >>> On Mon, 22 Oct 2007, Alan Cox wrote: >>> >>>> For structures, not array elements or stack objects. Does gcc now get >>>> aligned correct as an attribute on a stack object ? >>> I think m68k stack layout still guarantees 4-byte-alignment, no? >>> >>>> Still doesn't answer the rather more important question - why not just >>>> stick a NULL on the end instead of all the nutty hacks ? >>> You still do need one bit for the discontiguous case, so it's not like you >>> can avoid the hacks anyway (unless you just blow up the structure >>> entirely) and make it a separate member). So once you have that >>> bit+pointer, using a separate NULL entry isn't exactly prettier. >>> >>> Especially as we actally want to see the difference between >>> "end-of-allocation" and "not yet filled in", so you shouldn't use NULL >>> anyway, you should probably use something like "all-ones". >>> >>> Linus >>> - >> Every one is so hysterical about this sg-chaining problem. And massive >> patches produced, that when a simple none intrusive solution is proposed >> it is totally ignored because every one thinks, "I can not be that stupid". >> Well Einstein said: "Simplicity is the ultimate sophistication". So no one >> need to feel bad. > > It's all about the end goal - having maintainable and resilient code. > And I think the sg code will be better once we get past the next day or > so, and it'll be more robust. That is what matters to me, not the > simplicity of the patch itself. > But that is exactly what his patch is. Much more robust. Because you do not relay on sglist content but on outside information, that you already have. Have you had an hard look at his solution? It just simply falls into place. Please try it out for yourself. I did, and it works. Boaz -
From: Jens Axboe <jens.axboe@...> Subject: Re: [PATCH 09/10] Change table chaining layout Date: Oct 23, 5:55 am 2007 On Tue, Oct 23 2007, Boaz Harrosh wrote: > On Tue, Oct 23 2007 at 11:41 +0200, Jens Axboe <jens.axboe@oracle.com> wrote: > > On Tue, Oct 23 2007, Boaz Harrosh wrote: > >> On Mon, Oct 22 2007 at 23:47 +0200, Linus Torvalds <torvalds@linux-foundation.org> wrote: > >>> On Mon, 22 Oct 2007, Alan Cox wrote: > >>> > >>>> For structures, not array elements or stack objects. Does gcc now get > >>>> aligned correct as an attribute on a stack object ? > >>> I think m68k stack layout still guarantees 4-byte-alignment, no? > >>> > >>>> Still doesn't answer the rather more important question - why not just > >>>> stick a NULL on the end instead of all the nutty hacks ? > >>> You still do need one bit for the discontiguous case, so it's not like you > >>> can avoid the hacks anyway (unless you just blow up the structure > >>> entirely) and make it a separate member). So once you have that > >>> bit+pointer, using a separate NULL entry isn't exactly prettier. > >>> > >>> Especially as we actally want to see the difference between > >>> "end-of-allocation" and "not yet filled in", so you shouldn't use NULL > >>> anyway, you should probably use something like "all-ones". > >>> > >>> Linus > >>> - > >> Every one is so hysterical about this sg-chaining problem. And massive > >> patches produced, that when a simple none intrusive solution is proposed > >> it is totally ignored because every one thinks, "I can not be that stupid". > >> Well Einstein said: "Simplicity is the ultimate sophistication". So no one > >> need to feel bad. > > > > It's all about the end goal - having maintainable and resilient code. > > And I think the sg code will be better once we get past the next day or > > so, and it'll be more robust. That is what matters to me, not the > > simplicity of the patch itself. > > > > But that is exactly what his patch is. Much more robust. Because you do not > relay on sglist content but on outside information, that you already have. > Have you had an hard look at his solution? It just simply falls into place. > Please try it out for yourself. I did, and it works. Sure, I looked at it, it's not exactly rocket science, I do understand what it achieves. I don't think the patch is bad as such, I'm merely trying to state that I think the end code AND interface will be much nicer with the current direction that the sg helpers are moving. It does rely on outside context, because you need to pass in the sglist number. In my opinion, this patch would be a bandaid for the original chain code until we got around to fixing the PAGEALLOC crash. Which we did, it's now merged. The patch doesn't make the code cleaner, it makes it uglier. It'll work, but that still doesn't mean I have to agree it's a nice design. -- Jens Axboe -
From: Boaz Harrosh <bharrosh@...> Subject: Re: [PATCH 09/10] Change table chaining layout Date: Oct 23, 6:23 am 2007 On Tue, Oct 23 2007 at 11:55 +0200, Jens Axboe <jens.axboe@oracle.com> wrote: > On Tue, Oct 23 2007, Boaz Harrosh wrote: >> On Tue, Oct 23 2007 at 11:41 +0200, Jens Axboe <jens.axboe@oracle.com> wrote: >>> On Tue, Oct 23 2007, Boaz Harrosh wrote: >>>> On Mon, Oct 22 2007 at 23:47 +0200, Linus Torvalds <torvalds@linux-foundation.org> wrote: >>>>> On Mon, 22 Oct 2007, Alan Cox wrote: >>>>> >>>>>> For structures, not array elements or stack objects. Does gcc now get >>>>>> aligned correct as an attribute on a stack object ? >>>>> I think m68k stack layout still guarantees 4-byte-alignment, no? >>>>> >>>>>> Still doesn't answer the rather more important question - why not just >>>>>> stick a NULL on the end instead of all the nutty hacks ? >>>>> You still do need one bit for the discontiguous case, so it's not like you >>>>> can avoid the hacks anyway (unless you just blow up the structure >>>>> entirely) and make it a separate member). So once you have that >>>>> bit+pointer, using a separate NULL entry isn't exactly prettier. >>>>> >>>>> Especially as we actally want to see the difference between >>>>> "end-of-allocation" and "not yet filled in", so you shouldn't use NULL >>>>> anyway, you should probably use something like "all-ones". >>>>> >>>>> Linus >>>>> - >>>> Every one is so hysterical about this sg-chaining problem. And massive >>>> patches produced, that when a simple none intrusive solution is proposed >>>> it is totally ignored because every one thinks, "I can not be that stupid". >>>> Well Einstein said: "Simplicity is the ultimate sophistication". So no one >>>> need to feel bad. >>> It's all about the end goal - having maintainable and resilient code. >>> And I think the sg code will be better once we get past the next day or >>> so, and it'll be more robust. That is what matters to me, not the >>> simplicity of the patch itself. >>> >> But that is exactly what his patch is. Much more robust. Because you do not >> relay on sglist content but on outside information, that you already have. >> Have you had an hard look at his solution? It just simply falls into place. >> Please try it out for yourself. I did, and it works. > > Sure, I looked at it, it's not exactly rocket science, I do understand > what it achieves. I don't think the patch is bad as such, I'm merely > trying to state that I think the end code AND interface will be much > nicer with the current direction that the sg helpers are moving. > > It does rely on outside context, because you need to pass in the sglist > number. In my opinion, this patch would be a bandaid for the original > chain code until we got around to fixing the PAGEALLOC crash. Which we > did, it's now merged. The patch doesn't make the code cleaner, it makes > it uglier. It'll work, but that still doesn't mean I have to agree it's > a nice design. > A nice design is to have an struct like BIO. That holds a pointer to the array of scatterlists, size, ..., and a next and prev pointers to the next chunks. Than have all kernel code that now accepts scatterlist* and size accept a pointer to such structure. And all is clear and defined. But since we do not do that, and every single API in the kernel that receives a scatterlist pointer also receives an sg_count parameter, than I do not see what is so hacky about giving that sg_count parameter to the one that needs it the most. sg_next(); OK I guess this is all a matter of taste so there is no point arguing about it any more. I can see your view, and the work has been done so I guess there is no point going back. If it all works than it's for the best. Thanks Jens for doing all this, The performance gain is substantial and we will all enjoy it. Boaz -
From: Linus Torvalds <torvalds@...> Subject: Re: [PATCH 09/10] Change table chaining layout Date: Oct 23, 11:22 am 2007 On Tue, 23 Oct 2007, Boaz Harrosh wrote: > > A nice design is to have an struct like BIO. That holds a pointer to the > array of scatterlists, size, ..., and a next and prev pointers to the next > chunks. Than have all kernel code that now accepts scatterlist* and size > accept a pointer to such structure. And all is clear and defined. Yes, that would be one clean situation. > But since we do not do that, and every single API in the kernel that > receives a scatterlist pointer also receives an sg_count parameter, > than I do not see what is so hacky about giving that sg_count parameter > to the one that needs it the most. sg_next(); Well, I'd personally actually prefer to *not* have the count be passed down explicitly, because it's just too error prone. So I'd much rather see the count implicit in the list: whether it's in an explicit header structure (that is the *only* thing passed down) or whether it's embedded in the list itself is not important. Since the list itself has to have the "next pointer" for chaining, and thus already has "embedded information" in it, it actually does make sense in my opinion to just embed the end-of-list information too. And the end result right now is pretty simple, with "sg_next()" being really simple to use, and there being no way to screw things up by getting the count and the sg pointer out of sync. My biggest complaint right now is that a lot of users of the sg *filling* functions were mindlessly converted, so we have code like cryptoloop.c: sg_set_page(&sg_in, in_page); cryptoloop.c: sg_in.offset = in_offs; cryptoloop.c: sg_in.length = sz; which is just really stupid, and we should have a function for that. But worse is code like this: ub.c: sg_set_page(sg, virt_to_page(sc->top_sense)); ub.c: sg->offset = (unsigned long)sc->top_sense & (PAGE_SIZE-1); ub.c: sg->length = UB_SENSE_SIZE; which again was converted "line by line" and we actually *do* have a function to do the above three lines as sg_set_buf(sg, sc->top_sense, UB_SENSE_SIZE); where that *single* line is just tons shorter but more importantly, more readable, than the mess that is a brute-force conversion. So I think the SG stuff looks ok now, but I think we have a lot of "fix up the rough edges" to go! (The above is not the only case. Just grep for "sg_set_page", and you'll see several examples of this kind of hard-to-read code. Basically, I don't think it's ever a good idea to initialize the SG entries one by one, and even when we have a hard page/offset/size thing, we should not set them one by one, and we should probably extend sg_set_page() to always take offset and length too, since setting one without the other two is never really sensible!) Linus -
From: Jens Axboe <jens.axboe@...> Subject: Re: [PATCH 09/10] Change table chaining layout Date: Oct 23, 6:29 am 2007 On Tue, Oct 23 2007, Boaz Harrosh wrote: > On Tue, Oct 23 2007 at 11:55 +0200, Jens Axboe <jens.axboe@oracle.com> wrote: > > On Tue, Oct 23 2007, Boaz Harrosh wrote: > >> On Tue, Oct 23 2007 at 11:41 +0200, Jens Axboe <jens.axboe@oracle.com> wrote: > >>> On Tue, Oct 23 2007, Boaz Harrosh wrote: > >>>> On Mon, Oct 22 2007 at 23:47 +0200, Linus Torvalds <torvalds@linux-foundation.org> wrote: > >>>>> On Mon, 22 Oct 2007, Alan Cox wrote: > >>>>> > >>>>>> For structures, not array elements or stack objects. Does gcc now get > >>>>>> aligned correct as an attribute on a stack object ? > >>>>> I think m68k stack layout still guarantees 4-byte-alignment, no? > >>>>> > >>>>>> Still doesn't answer the rather more important question - why not just > >>>>>> stick a NULL on the end instead of all the nutty hacks ? > >>>>> You still do need one bit for the discontiguous case, so it's not like you > >>>>> can avoid the hacks anyway (unless you just blow up the structure > >>>>> entirely) and make it a separate member). So once you have that > >>>>> bit+pointer, using a separate NULL entry isn't exactly prettier. > >>>>> > >>>>> Especially as we actally want to see the difference between > >>>>> "end-of-allocation" and "not yet filled in", so you shouldn't use NULL > >>>>> anyway, you should probably use something like "all-ones". > >>>>> > >>>>> Linus > >>>>> - > >>>> Every one is so hysterical about this sg-chaining problem. And massive > >>>> patches produced, that when a simple none intrusive solution is proposed > >>>> it is totally ignored because every one thinks, "I can not be that stupid". > >>>> Well Einstein said: "Simplicity is the ultimate sophistication". So no one > >>>> need to feel bad. > >>> It's all about the end goal - having maintainable and resilient code. > >>> And I think the sg code will be better once we get past the next day or > >>> so, and it'll be more robust. That is what matters to me, not the > >>> simplicity of the patch itself. > >>> > >> But that is exactly what his patch is. Much more robust. Because you do not > >> relay on sglist content but on outside information, that you already have. > >> Have you had an hard look at his solution? It just simply falls into place. > >> Please try it out for yourself. I did, and it works. > > > > Sure, I looked at it, it's not exactly rocket science, I do understand > > what it achieves. I don't think the patch is bad as such, I'm merely > > trying to state that I think the end code AND interface will be much > > nicer with the current direction that the sg helpers are moving. > > > > It does rely on outside context, because you need to pass in the sglist > > number. In my opinion, this patch would be a bandaid for the original > > chain code until we got around to fixing the PAGEALLOC crash. Which we > > did, it's now merged. The patch doesn't make the code cleaner, it makes > > it uglier. It'll work, but that still doesn't mean I have to agree it's > > a nice design. > > > A nice design is to have an struct like BIO. That holds a pointer to the > array of scatterlists, size, ..., and a next and prev pointers to the next > chunks. Than have all kernel code that now accepts scatterlist* and size > accept a pointer to such structure. And all is clear and defined. > > But since we do not do that, and every single API in the kernel that > receives a scatterlist pointer also receives an sg_count parameter, > than I do not see what is so hacky about giving that sg_count parameter > to the one that needs it the most. sg_next(); Not all paths need to know the exact number though, and with the changes you could legitimately pass in just the header and iteration would work fine. > OK I guess this is all a matter of taste so there is no point arguing > about it any more. I can see your view, and the work has been done so > I guess there is no point going back. If it all works than it's for the > best. Yes agreed, debating taste is usually not very interesting as we wont get far ;-) > Thanks Jens for doing all this, The performance gain is substantial > and we will all enjoy it. My pleasure, I just wish it could have been a little less painful. But in a day or two, it should all be behind us and we can move forward with making good use of it. -- Jens Axboe -

uh
not a single link in the summary that explains what is SG chaining...? halp!
http://kerneltrap.org/node/81
http://kerneltrap.org/node/8176
http://lwn.net/Articles/234617/
Stargate
Stargate?
Smug Alert
Reading this, for a moment, I thought about that South Park episode titled "Smug Alert",with all the kernel developers saying "Good for you!" and answering "Thaanks!" to themselves.