Re: what's a useful definition of full text index on a repository?

!MAILaRCHIVE_VOTE_RePLACE
Previous message: [thread] [date] [author]
Next message: [thread] [date] [author]
To: David Tweed <david.tweed@...>
Cc: Git Mailing List <git@...>
Date: Tuesday, October 2, 2007 - 11:48 am

On 10/2/07, David Tweed <david.tweed@gmail.com> wrote:

Dictionary compression can be used without full-text indexes. It is
just really easy to build the full-text index if the data is already
dictionary compressed. Dictionary compression works for everything
except binary or random data.

Git is already using a small scale dictionary compressor via zip. I
suspect doing a full scale dictionary for a pack file and then using
arithmetic encoding of the tokens would provide substantially more
compression. The big win is having a single dictionary instead of a
new dictionary each time zip is used.

When we were working on Mozilla, Mozilla changed licenses three times.
The license text ended up taking about 30MB in the current scheme.
With full dictionary compression this would reduce down to a few kb.

More compression is good for git. It means we can keep more data in
RAM and reduce download times. With current hardware it is almost
always better to trade off CPU to reduce IO.

-- 
Jon Smirl
jonsmirl@gmail.com
-
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Previous message: [thread] [date] [author]
Next message: [thread] [date] [author]

Messages in current thread:
Re: what's a useful definition of full text index on a repos..., Jon Smirl, (Tue Oct 2, 11:48 am)