GIT max file size.

Previous thread: automatically removing missing files beneath a directory by Geoffrey Irving on Thursday, May 8, 2008 - 12:39 pm. (5 messages)

Next thread: [PATCH] doc: clarify definition of "update" for git-add -u by Jeff King on Thursday, May 8, 2008 - 1:25 pm. (1 message)
To: <git@...>
Date: Thursday, May 8, 2008 - 12:33 pm

Hello.

I received a "fatal: Out of memory, malloc failed" error when I tried to
check in a file of ~2.5 GB.

It can be argued that a binary file of that size (or binary files
altogether) has no place in version control anyway, but I still pursued
it a bit further.

In the #git channel on freenode I received some hints, and in the end I
started running
"git-hash-object -w images/filesystem_ext2.img.bz2"

It seems that sha1_file.c:write_sha1_file() declares its size variable
as an int.

This wraps around in "size = 8 + deflateBound(&stream, len+hdrlen);" and
passes a huge number to mmap():
mmap(NULL, 18446744071976239104, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = -1 ENOMEM (Cannot allocate memory)
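The mmap() length above can be reproduced with a few lines of C. This is a sketch, not git's code: store_in_int() and widen_to_size_t() are hypothetical helpers, and it assumes a 32-bit two's-complement int and a 64-bit size_t (e.g. x86-64 with GCC or Clang). deflateBound() returns a uLong, so the sum is computed without overflow; the damage happens when it is stored in the int, a conversion C leaves implementation-defined.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper modelling "int size = 8 + deflateBound(...)":
 * storing a value above INT_MAX in an int is implementation-defined.
 * On two's-complement targets such as x86-64 with GCC or Clang the low
 * 32 bits are kept, so the result comes out negative. */
static int store_in_int(uint64_t bytes)
{
    return (int)(uint32_t)bytes;
}

/* Hypothetical helper modelling the later call into malloc()/mmap():
 * a negative int converted to a 64-bit size_t becomes 2^64 minus its
 * magnitude, i.e. an enormous request. */
static size_t widen_to_size_t(int size)
{
    return (size_t)size;
}
```

A buffer size of 2561654784 bytes (about 2.4 GiB, plausible for a ~2.5 GB file) is stored as -1733312512, which widens to exactly 18446744071976239104, the value in the strace output.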

size is used in "stream.avail_out = size;", and z_stream.avail_out seems
to be an unsigned int:
"uInt     avail_out; /* remaining free space at next_out */"
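Because zlib's avail_in and avail_out are uInt, a single call can describe at most UINT_MAX bytes (just under 4 GiB with a 32-bit unsigned int), no matter how wide the caller's own size variable is. One common workaround is to hand zlib the buffer in pieces. The sketch below only computes the piece sizes; zlib_piece() and zlib_pieces() are hypothetical names, not zlib or git functions, and the test values assume a 64-bit size_t.

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical helper: how much of the remaining buffer fits in one
 * uInt-sized avail_in/avail_out assignment. */
static unsigned int zlib_piece(size_t remaining)
{
    return remaining > UINT_MAX ? UINT_MAX : (unsigned int)remaining;
}

/* Hypothetical helper: how many such pieces a buffer of len bytes needs. */
static size_t zlib_pieces(size_t len)
{
    size_t n = 0;

    while (len > 0) {
        len -= zlib_piece(len);
        n++;
    }
    return n;
}
```

With chunking, the ~4 GB ceiling belongs to each zlib call rather than to the file as a whole.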

If size were changed to unsigned, would the maximum file size increase
to ~4G, or would it blow up elsewhere?

Is git going to support >2G files, or is having "everything that is
needed to complete the build process from beginning to bitter end" in
version control something that git is not meant for?

If the latter is true, then this would be a pretty pointless change. If
the former, then changing 'size' to unsigned won't be enough anyway...


Best Regards
Janne Pänkälä

--
To: epankala@cc.hut.fi <epankala@...>
Cc: <git@...>
Date: Thursday, May 8, 2008 - 4:46 pm

There are two issues there.

One is that a lot of what git does is simply "whole file at a time". The
diff machinery, and a lot of the core stuff, expects to be able to mmap()
or read the whole file in one go. So on a 32-bit architecture, you'll
generally be limited by the size of the address space, not by anything
else.
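A minimal sketch of the "whole file at a time" pattern described above, using only POSIX calls: map_whole_file() is a hypothetical helper, not a git function. The entire file must occupy one contiguous range of address space, so on a 32-bit build the file size must at least fit in a size_t, and in practice the free address space is smaller still.

```c
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical "whole file at a time" loader: map the entire file with a
 * single mmap() call. Returns NULL on error, for empty files, or when the
 * file is larger than a size_t can describe. */
static const char *map_whole_file(const char *path, size_t *len_out)
{
    struct stat st;
    void *p;
    int fd = open(path, O_RDONLY);

    if (fd < 0)
        return NULL;
    if (fstat(fd, &st) < 0 || st.st_size <= 0 ||
        (uintmax_t)st.st_size > (uintmax_t)SIZE_MAX) {
        close(fd);
        return NULL;
    }
    /* One contiguous mapping covering the whole file. */
    p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd); /* the mapping stays valid after the descriptor is closed */
    if (p == MAP_FAILED)
        return NULL;
    *len_out = (size_t)st.st_size;
    return p;
}
```

The caller releases the mapping with munmap() once it is done with the contents.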

The other thing is that (partly because of the above) there are probably
places where we haven't been as careful about size types as we could
have been. A lot of the code uses size_t, but I bet not everything
does.
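Where a size_t really must be narrowed to an int, one defensive option is to check the range explicitly instead of letting the value wrap silently. This is a sketch under that assumption; size_to_int_checked() is a hypothetical helper and not taken from git's source.

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical guard: narrow a size_t to int only when it fits.
 * Returns 0 and stores the value on success, -1 if it would overflow. */
static int size_to_int_checked(size_t n, int *out)
{
    if (n > (size_t)INT_MAX)
        return -1;
    *out = (int)n;
    return 0;
}
```

A caller that gets -1 can report "file too large" cleanly rather than passing a wrapped value on to malloc() or mmap().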

		Linus
--
To: epankala@cc.hut.fi <epankala@...>
Cc: <git@...>
Date: Thursday, May 8, 2008 - 2:46 pm

That was the plan :)
--