Re: Git import of the recent full enwiki dump

From: Shawn O. Pearce
Date: Friday, April 16, 2010 - 5:53 pm

Sebastian Bober <sbober@servercare.de> wrote:

Well, to be fair to fast-import, its tree handling code is based on
a linear scan, because that's how every other part of Git handles trees.

If you just toss all 19M wiki pages into a single top-level tree,
it's going to take a very long time to locate the wiki page
talking about Zoos.
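
As a rough illustration of why the layout matters for a linear scan,
here is a minimal Python sketch. It is not part of fast-import:
shard_path() and entries_scanned() are hypothetical helpers, and the
two-hex-digit fan-out per level is an assumption picked for the
example. The estimate also assumes pages spread evenly across the
buckets.

```python
import hashlib


def shard_path(title: str, levels: int = 2) -> str:
    """Hypothetical helper: map a page title to a nested path, 'xx/yy/Title'.

    Each directory level is named after two hex digits of the SHA-1 of
    the title, so every level has at most 256 entries.
    """
    digest = hashlib.sha1(title.encode("utf-8")).hexdigest()
    parts = [digest[2 * i: 2 * i + 2] for i in range(levels)]
    return "/".join(parts + [title])


def entries_scanned(total_pages: int, levels: int, fanout: int = 256) -> int:
    """Estimate the entries a linear scan visits to reach one page.

    Assumes an even spread: up to `fanout` entries at each directory
    level, plus the entries in the final leaf directory.
    """
    leaf_entries = -(-total_pages // fanout ** levels)  # ceiling division
    return levels * fanout + leaf_entries


if __name__ == "__main__":
    print(shard_path("Zoo"))
    print(entries_scanned(19_000_000, levels=0))  # flat layout: every entry
    print(entries_scanned(19_000_000, levels=2))  # two sharding levels
```

Under those assumptions, two levels of sharding bring the scan for a
single page down from about 19M entries to under a thousand. Titles
containing "/" would add further path components and need their own
handling.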


Really, fast-import should be able to handle this well, assuming you
aren't just tossing all 19M files into a single massive directory
and hoping for the best.  Because *any* program working on that
sort of layout will need to spit out the 19M-entry tree object on
each and every commit, just so it can compute the SHA-1 checksum
to get the tree name for the commit.
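
To make the cost concrete, here is a small Python sketch of how a
tree name is derived. It follows the Git object format (a
"tree <size>\0" header followed by one "<mode> <name>\0<20-byte id>"
record per entry), but it is only an illustration, not fast-import's
code; serialize_tree(), tree_id() and blob_id() are names made up for
this example.

```python
import hashlib


def blob_id(data: bytes) -> str:
    """SHA-1 name of a blob: hash of 'blob <size>\\0' plus the content."""
    return hashlib.sha1(b"blob %d\0" % len(data) + data).hexdigest()


def serialize_tree(entries) -> bytes:
    """Build the raw body of a tree object.

    entries: iterable of (mode, name, sha1_hex), e.g.
    ("100644", "Zoo", "<40 hex digits>"). Directories use mode "40000"
    and sort as if their name ended in "/".
    """
    def sort_key(entry):
        mode, name, _ = entry
        return (name + "/" if mode == "40000" else name).encode("utf-8")

    records = []
    for mode, name, sha_hex in sorted(entries, key=sort_key):
        records.append(
            mode.encode("ascii") + b" " + name.encode("utf-8") + b"\0"
            + bytes.fromhex(sha_hex)
        )
    return b"".join(records)


def tree_id(entries) -> str:
    """SHA-1 name of a tree: every entry is serialized and hashed."""
    body = serialize_tree(entries)
    return hashlib.sha1(b"tree %d\0" % len(body) + body).hexdigest()


if __name__ == "__main__":
    page = ("100644", "Zoo", blob_id(b""))
    print(tree_id([page]))
    print(len(serialize_tree([page])))  # bytes hashed for a one-entry tree
```

Each record costs the mode, the name and 22 more bytes, so if page
names average around 20 bytes (an assumption), a flat 19M-entry tree
means on the order of 900 MB to serialize and hash for every commit,
even when only one page changed. With the pages split across
subdirectories, only the trees on the path to the changed page need
to be rebuilt.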

-- 
Shawn.