> On Wed, 10 May 2006, Linus Torvalds wrote:
>
> >
> >
> > Btw, Nico, that rabin hash code is _extremely_ confusing.
>
> Possible.
>
> > The hash entry pointers point to "data + RABIN_WINDOW", and then to make
> > things even _more_ confusing, the hash calculation code is actually offset
> > by one, so it will have computed the hash with
> >
> > val = ((val << 8) | data[i]) ^ T[val >> RABIN_SHIFT];
> >
> > where "i" goes from _1_ to RABIN_WINDOW instead of 0..WINDOW-1.
> >
> > So, if I read that correctly, the "entry->ptr" actually points not to the
> > beginning of the data that was hashed, or even the end, but literally to
> > the last byte of the data that was hashed in that window.
> >
> > Isn't that just _really_ confusing?
> >
> > Or is there some sense to this?
>
> Yes, ptr points to the last byte of the window for given hash value.
>
> This is so because in the matching loop the window is scrolled byte by
> byte and the new hash value is always made of ROBIN_WINDOW-1 bytes which
> already have been processed (most probably added as literal bytes in the
> delta buffer) plus one new byte. So it makes sense to start forward
> byte matching only from that last byte to find the longest source area
> to match, especially since all the other bytes in the window are likely
> to be identical already and comparing them repeatedly for entries with
> the same hash would be wasteful in most cases.
>
> Further down, once the best offset with the longest match in the source
> buffer has been found, backward matching is performed to remove as much
> literal bytes that were added to the delta output as possible, which is
> most likely to catch the whole Robin window that hasn't been compared
> previously, but in that case the window content is compared only once.
>
> And the reason why the reference hash is computed with an offset of 1 to
> RABIN_WINDOW inclusive in create_delta_index() is to allow for quick
> initialization of the Rabin window _outside_ of the main loop in
> create_delta(). There is a comment to that effect near the top of
> create_delta_index but probably a small reminder should be added in the
> index loop as well.
>
>
> Nicolas
> -
> To unsubscribe from this list: send the line "unsubscribe git" in
> the body of a message to
majordomo@vger.kernel.org
> More majordomo info at
http://vger.kernel.org/majordomo-info.html
>