Heh. We can probably tweak the heuristics (one of the _great_ things about
content detection is that you can fix it after the fact, unlike the
alternative).
That said, I've personally found the content-based similarity analysis to
often be quite informative, even when (and perhaps _especially_ when) it
ended up showing something that the author of the change didn't intend.
So yeah, I've seen a few strange cases myself, but they've actually been
interesting. Like seeing how much of a file was just a copyright license,
so that a new file got considered a "copy" simply because it didn't
introduce any real new code.
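A minimal sketch of that last case, for illustration only. It uses Python's difflib as a stand-in similarity measure, which is an assumption and not git's actual algorithm. The 20-line license header, the function names, and the 0.5 threshold are all hypothetical.

```python
from difflib import SequenceMatcher

# Hypothetical 20-line license header shared by both files.
LICENSE = [f"# license boilerplate line {i}\n" for i in range(20)]


def similarity(old_lines, new_lines):
    """Return a line-based similarity score between 0.0 and 1.0.

    This is difflib's ratio (2 * matches / total lines), used here only
    as a stand-in for a content-based similarity heuristic.
    """
    matcher = SequenceMatcher(None, old_lines, new_lines, autojunk=False)
    return matcher.ratio()


def looks_like_copy(old_lines, new_lines, threshold=0.5):
    """Treat the new file as a copy if it scores at or above the threshold.

    The 0.5 default is an illustrative assumption.
    """
    return similarity(old_lines, new_lines) >= threshold


# Two files with the same license but completely different code.
old_file = LICENSE + ["def parse(x):\n", "    return int(x)\n"]
new_file = LICENSE + ["def render(y):\n", "    return str(y)\n"]

with_license = similarity(old_file, new_file)
code_only = similarity(old_file[len(LICENSE):], new_file[len(LICENSE):])

print(f"with license: {with_license:.2f}")
print(f"code only:    {code_only:.2f}")
print("flagged as copy:", looks_like_copy(old_file, new_file))
```

The two files share nothing but the license, yet the boilerplate dominates the score, so the new file is flagged as a copy. Because the score is computed from content after the fact, changing the heuristic (say, raising the threshold) changes the answer without touching any recorded history.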
Linus