On 10/1/07, Jon Smirl <jonsmirl@gmail.com> wrote:

I'd thought that keeping a full-text index of all my program files was my dirty little secret that shows I'm not a "pro" programmer ;-)

[details snipped]

This sounds interesting in principle but is beyond what I'm thinking of in practice (particularly since I'm not in the "C is the only language worth ever using" camp).

Well, as I say, I'm not convinced it makes sense to integrate this with the existing pack stuff, precisely because I don't think it's universally useful.

So you seem to end up with all the usual tricks, e.g., Golomb coding inverted indexes, etc., _if_ you treat each blob as completely independent. I was wondering if there was anything else you can do, given the special structure, that might be both more useful and more compact?

Well, the kind of question I was thinking of was "clearly you can use the existing sort of full-text indexing (e.g., the stuff covered in Witten, Moffat & Bell's Managing Gigabytes), but is that the most useful way of doing things in the context of an evolving database?" If you treat every blob as essentially a different document, there are indexing tools out there already that you can use. What I was wondering was whether it's really that useful to a human user to report every revision of a document containing those keywords, even if the differences are in other parts of the text, far removed from the text containing the keywords. I don't know the answer.

The other point is that direct searching is easier because you know exactly what the query is at the point you have access to the full text, whereas when building an index you want to extract no more and no less information than you need to answer all allowed queries.

But I still like the idea of getting a UMPC-type thing if they become affordable.

--
cheers, dave tweed__________________________
david.tweed@gmail.com
Rm 124, School of Systems Engineering, University of Reading.
"we had no idea that when we added templates we were adding a Turing-complete compile-time language." -- C++ standardisation committee
