Re: On Tabs and Spaces

To: Christer Weinigel <christer@...>
Cc: Johannes Schindelin <Johannes.Schindelin@...>, Tom Tobin <korpios@...>, <git@...>
Date: Wednesday, October 17, 2007 - 7:53 pm

On Thu, 18 Oct 2007, Christer Weinigel wrote:

I suspect it works quite well in practice.

But we've had to tweak the xdiff code before, and the hash calculations 
for bucket size limits. If somebody actually points out a problem case, we 
can probably tweak it again.


In general, *any* situation where you have tons of character sequences 
that are the same (and here it's not the characters *themselves* that have 
to be the same - it's the *sequence* that has to be the same, so it's not 
about repeating the same character over and over per se: it's about 
repeating a certain block of characters many many times in the source 
code) will be problematic for pretty much any similarity analysis.

Why? Because you just have a lot of the same sequence, and to get a good 
delta you want to find common "sequences of these sequences" (call them 
supersequences) in order to find the biggest common chunk.

So the badly performing cases for any delta algorithm (and I do want to 
point out that this has nothing whatsoever to do with the particular one 
that git uses) tend to be exactly the ones where you have lots and lots 
of smaller chunks that match in two files, and that then makes it costlier 
to find the *bigger* chunks that are built up of those smaller chunks.
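
A rough way to see the effect is to count how many candidate match 
positions a naive block index produces. This is only an illustrative 
sketch: the function name, the 4-byte block size and the inputs are 
assumptions made up for the example, and it is not how git's delta code 
is written.

```python
from collections import defaultdict

def candidate_pairs(a: bytes, b: bytes, block: int = 4) -> int:
    """Count (offset in a, offset in b) pairs whose block-sized windows match.

    Hypothetical helper for illustration; every such pair is a place where
    a matcher would have to try extending a small match into a bigger one.
    """
    index = defaultdict(list)
    for i in range(len(a) - block + 1):
        index[a[i:i + block]].append(i)
    return sum(len(index.get(b[j:j + block], ()))
               for j in range(len(b) - block + 1))

varied = bytes(range(64))      # every 4-byte window is distinct
repetitive = b" " * 64         # every 4-byte window is identical

print(candidate_pairs(varied, varied))          # one candidate per window
print(candidate_pairs(repetitive, repetitive))  # every window matches every window
```

With distinct windows the number of candidates grows linearly with the 
input; with one repeated window it grows with the square of the input.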

And generally you tend to have two situations: you either (a) take *much* 
longer to find the common areas (these are often quadratic or worse 
algorithms) or (b) you decide to ignore chunks that are so common that 
they don't really add any real information when it comes to finding truly 
common chunks. That second choice generally means that you can miss 
some cases where you *could* have found a good match for deltification.

In fact, usually you have a combination of the above two effects: certain 
deltas may be more expensive to find but there is also a limit that kicks 
in and means that you never spend *too* much time on finding them if the 
pattern space is not amenable to it.
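
The limit in choice (b) can be sketched as a cap on how many positions 
any one block is allowed to have in the index. This is a simplified 
illustration, not git's code: `build_index`, `max_bucket` and the block 
size are hypothetical names and values, and dropping an over-full bucket 
outright is just one possible policy.

```python
from collections import defaultdict

def build_index(src: bytes, block: int = 4, max_bucket: int = 8) -> dict:
    """Map each block-sized window of src to the offsets where it occurs.

    Windows that occur more than max_bucket times are dropped entirely:
    they are too common to say much about where the real matches are.
    (Assumed policy for this sketch; a real implementation may instead
    keep only some of the entries.)
    """
    buckets = defaultdict(list)
    for i in range(len(src) - block + 1):
        buckets[src[i:i + block]].append(i)
    return {key: offsets for key, offsets in buckets.items()
            if len(offsets) <= max_bucket}

src = b" " * 40 + b"unique-token"
index = build_index(src)
print(b"    " in index)   # the run of spaces exceeded the cap and was dropped
print(index[b"uniq"])     # rarer windows keep their offsets
```

The cost is bounded, but a delta that could only have been found through 
the dropped windows is now missed.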

Would lots of spaces be such a pattern? I personally doubt it would really 
matter. In general, source code is easy to delta: the bulk of any common 
sequences in most files will be found by the trivial "look for common 
sequences in the beginning and the end". The really *bad* cases tend to be 
rather odd, and often generated files.
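
The "beginning and the end" trick can be sketched as follows. The 
function name and the line-list representation are assumptions for the 
example; it only shows the general idea of stripping the shared prefix 
and suffix before any more expensive matching.

```python
def trim_common_ends(old, new):
    """Return (prefix_len, suffix_len, old_middle, new_middle).

    old and new are lists of lines. Only the middles still need a real
    diff; the shared prefix and suffix are found in linear time.
    """
    limit = min(len(old), len(new))
    prefix = 0
    while prefix < limit and old[prefix] == new[prefix]:
        prefix += 1
    suffix = 0
    # Do not let the suffix overlap the prefix already consumed.
    while suffix < limit - prefix and old[-1 - suffix] == new[-1 - suffix]:
        suffix += 1
    return (prefix, suffix,
            old[prefix:len(old) - suffix],
            new[prefix:len(new) - suffix])

old = ["a", "b", "c", "d"]
new = ["a", "b", "x", "d"]
print(trim_common_ends(old, new))
```

For a typical small source change the two middles are a few lines long, 
so whatever algorithm runs on them has very little to do.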

So no, I don't think deltification is a huge deal for spaces. But it does 
boil down to the same kind of issues: if you blow up the source base by 
20%, you often slow down things by 20% or more, simply because there is 
more data to process at all stages. It just slows everything down, 
totally unnecessarily.

			Linus