Ok, here is some suggestion:
Here is the inner loop for this stuff:
for (i = 0; i < ssz; i++) {
struct spanhash *s = &(src_count->data[i]);
struct spanhash *d;
unsigned dst_cnt, src_cnt;
if (!s->cnt)
continue;
src_cnt = s->cnt;
d = spanhash_find(dst_count, s->hashval);
dst_cnt = d ? d->cnt : 0;
if (src_cnt < dst_cnt) {
la += dst_cnt - src_cnt;
sc += src_cnt;
}
else
sc += dst_cnt;
}
Now here is how one could optimize the data structures: The hash
structures are with linear probing, and we try to find any hash
matches from source to destination. If we sort all hashes indexed to
a given first hash bucket by their full hash value, then one could
basically use passes similar to list merges for figuring the 1:1
relations. That cuts down the O(l n) cost (where n is the number of
elements and l their average run length) to O(n).
Of course, making l close to 1 by keeping the hash utilization
reasonably low is much simpler.
--
David Kastrup, Kriemhildstr. 15, 44793 Bochum
-
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html