Re: what's a useful definition of full text index on a repository?

To: Jon Smirl <jonsmirl@...>
Cc: Git Mailing List <git@...>
Date: Tuesday, October 2, 2007 - 5:34 am

On 10/1/07, Jon Smirl <jonsmirl@gmail.com> wrote:

I'd thought that keeping a full-text index of all my program files was
my dirty little secret that shows I'm not a "pro" programmer ;-)

[details snipped]

This sounds interesting in principle but is beyond what I'm thinking
in practice (particularly since I'm not in the "C is the only language
worth ever using" camp).


Well, as I say, I'm not convinced it makes sense to integrate this with
existing pack stuff, precisely because I don't think it's universally
useful. So you seem to end up with all the usual tricks (eg, Golomb-coded
inverted indexes) _if_ you treat each blob as completely independent. I
was wondering if there was anything else you could do, given the special
structure of a repository, that might be both more useful and more
compact?
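For reference, the "treat each blob as independent" baseline looks roughly like this: a minimal sketch, assuming each blob is numbered 1..N and tokenized on whitespace, that stores each term's posting list as d-gaps and Golomb-codes them. The names (`build_postings`, `golomb_encode`, etc.) are hypothetical, and the parameter choice b ≈ 0.69 × N / f_t is the commonly cited heuristic for a Bernoulli model, not anything specific to git.

```python
from collections import defaultdict


def build_postings(docs):
    """Map each term to an ascending list of 1-based document numbers."""
    postings = defaultdict(list)
    for doc_no, text in enumerate(docs, start=1):
        for term in sorted(set(text.lower().split())):
            postings[term].append(doc_no)
    return postings


def to_gaps(doc_nos):
    """Convert ascending document numbers into d-gaps (all >= 1)."""
    gaps, prev = [], 0
    for d in doc_nos:
        gaps.append(d - prev)
        prev = d
    return gaps


def from_gaps(gaps):
    """Undo to_gaps by taking running sums."""
    out, total = [], 0
    for g in gaps:
        total += g
        out.append(total)
    return out


def _bits(value, width):
    """Fixed-width binary string; empty when width is 0."""
    return format(value, f"0{width}b") if width > 0 else ""


def golomb_encode(gaps, b):
    """Golomb code: unary quotient, then truncated-binary remainder."""
    k = (b - 1).bit_length()          # ceil(log2(b)) for b >= 1
    u = (1 << k) - b                  # remainders below u use k-1 bits
    out = []
    for x in gaps:
        q, r = divmod(x - 1, b)
        out.append("1" * q + "0")
        out.append(_bits(r, k - 1) if r < u else _bits(r + u, k))
    return "".join(out)


def golomb_decode(bitstring, b, count):
    """Decode `count` gaps produced by golomb_encode with the same b."""
    k = (b - 1).bit_length()
    u = (1 << k) - b
    pos, gaps = 0, []
    for _ in range(count):
        q = 0
        while bitstring[pos] == "1":
            q += 1
            pos += 1
        pos += 1                      # skip the terminating "0"
        r = 0
        if k > 0:
            v = int(bitstring[pos:pos + k - 1] or "0", 2)
            pos += k - 1
            if v >= u:                # long (k-bit) codeword: read one more bit
                v = (v << 1) | int(bitstring[pos])
                pos += 1
                v -= u
            r = v
        gaps.append(q * b + r + 1)
    return gaps


if __name__ == "__main__":
    docs = ["fix the parser", "parser rewrite", "update docs", "parser bug fix"]
    postings = build_postings(docs)
    n = len(docs)
    for term, doc_nos in postings.items():
        b = max(1, round(0.69 * n / len(doc_nos)))
        code = golomb_encode(to_gaps(doc_nos), b)
        assert from_gaps(golomb_decode(code, b, len(doc_nos))) == doc_nos
        print(term, doc_nos, "b =", b, "bits =", code)
```

Nothing here exploits the fact that consecutive revisions of a file share most of their text, which is exactly the gap the question above is asking about.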


Well, the kind of question I was thinking of was "clearly you can use the
existing sort of full-text indexing (eg, the stuff covered in Witten,
Moffat & Bell's Managing Gigabytes), but is that the most useful way of
doing things in the context of an evolving database?" If you treat every
blob as essentially a different document, there are indexing tools out
there already you can use. What I was wondering was whether it's really
that useful to a human user to report every revision of a document
containing those keywords, even when the differences between revisions
are in parts of the text far removed from the keywords. I don't know the
answer.
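To make the question concrete, here is one possible reporting policy, offered only as a hypothetical sketch: given a file's revisions oldest first (assumed to be available as plain strings, e.g. from walking its history), report a revision only when the set of lines containing the keyword changes. The function names and the line-level granularity are assumptions for illustration, not something proposed in the thread.

```python
def keyword_hits(text, keyword):
    """Return the lines of text containing keyword (case-insensitive)."""
    kw = keyword.lower()
    return tuple(line for line in text.splitlines() if kw in line.lower())


def relevant_revisions(revisions, keyword):
    """Return (index, hits) for revisions whose keyword-bearing lines changed.

    Revisions whose only edits are elsewhere in the file are skipped, so
    a user sees one hit per distinct "version" of the matching text.
    """
    result, previous = [], ()
    for i, text in enumerate(revisions):
        hits = keyword_hits(text, keyword)
        if hits and hits != previous:
            result.append((i, hits))
        previous = hits
    return result


if __name__ == "__main__":
    revs = [
        "a\nfoo bar\nc",
        "a2\nfoo bar\nc",   # change far from the keyword: not reported
        "a2\nfoo baz\nc",   # keyword line changed: reported
        "x\nfoo baz\ny",    # changes elsewhere only: not reported
    ]
    print(relevant_revisions(revs, "foo"))
```

Whether "nearby" should mean the same line, the same hunk, or the same function is exactly the open question; the line-level choice here is just the simplest to show.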


The other point is that direct searching is easier because you know
exactly what the query is at the point you have access to the full
text, whereas when building an index you want to extract no more and no
less information than you need to answer all allowed queries. But I
still like the idea of getting a UMPC-type thing if they become
affordable.
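A small illustration of that trade-off, as a sketch under assumed inputs (a handful of in-memory documents, whitespace tokenization): a word-level index answers only the queries it was built for, while a direct scan can answer any pattern at the cost of reading all the text. The function names are hypothetical.

```python
import re


def build_token_index(docs):
    """Word-level inverted index: lowercase token -> set of doc numbers."""
    index = {}
    for doc_no, text in enumerate(docs):
        for token in set(text.lower().split()):
            index.setdefault(token, set()).add(doc_no)
    return index


def index_lookup(index, word):
    """Answer a whole-word query from the index alone."""
    return sorted(index.get(word.lower(), set()))


def direct_search(docs, pattern):
    """Scan the full text; any regex works, but every document is read."""
    rx = re.compile(pattern, re.IGNORECASE)
    return [i for i, text in enumerate(docs) if rx.search(text)]


if __name__ == "__main__":
    docs = ["fix the parser", "parsers are hard", "update docs"]
    index = build_token_index(docs)
    print(index_lookup(index, "parser"))   # whole-word query the index supports
    print(index_lookup(index, "pars"))     # substring query it was not built for
    print(direct_search(docs, r"pars"))    # direct scan handles it
```

Supporting the substring query from the index would mean storing more (e.g. character n-grams), which is the "no more and no less information" design decision in miniature.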

-- 
cheers, dave tweed__________________________
david.tweed@gmail.com
Rm 124, School of Systems Engineering, University of Reading.
"we had no idea that when we added templates we were adding a Turing-
complete compile-time language." -- C++ standardisation committee
-
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html