Well, even the output may actually be affected, in the case of _real_ hash
collisions (as opposed to just the hash _list_ collision that XDL_HASHLONG
caused).
So I actually think it would be better to have "uint32_t" as the hash
value - because that would mean that all diffs (or, in the case of the
block-algorithm, the deltas) are guaranteed to give the same results
regardless of architecture.
Right now, we actually generate a 64-bit hash value (BUT: for short lines,
it's likely only _interesting_ in the low bits, so the high bits tend to
have a very high likelihood of being zero). So hash collisions are
different: on a 32-bit architecture, two lines may have the same hash,
while on a 64-bit one, they are different.
And together with some of the limiters we have (eg XDL_MAX_EQLIMIT) hash
collisions can sometimes affect the output.
Admittedly, in _practice_ this is really unlikely to affect anything
(you'd get a valid diff in either case, they'd just possibly be subtly
different, and the input data must be _really_ strange to even see that
case), but I do think that the hash algorithm can matter.
NOTE! I'm not talking about XDL_HASHLONG(), I'm talking about the
xdl_hash_record() hash, which returns differently-sized hash results on
32-bit and 64-bit. And there are cases where we _only_ compare the hashes,
and don't actually double-check the contents.
So I think that in _practice_ you can't see differences between a 32-bit
version and a 64-bit one, but the possibility is there. Using "uint32_t"
instead of "unsigned long" to keep track of hashes would avoid that
theoretical problem (and might actually make for better performance on
64-bit archtiectures, if only because of denser data structures and thus
better cache behaviour).
Linus
-
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html