Re: [PATCH v2 3/3] count-objects: add human-readable size option

Previous thread: none

Next thread: [BUG?] git gui handles type change from file to symlink poorly by Gustaf Hendeby on Thursday, August 14, 2008 - 6:23 pm. (1 message)
To: Git Mailing List <git@...>
Cc: Marcus Griep <marcus@...>, Junio C Hamano <gitster@...>
Date: Thursday, August 14, 2008 - 6:18 pm

This patch series does three things. First, it allows count-objects
to report pack size along side loose object size in verbose output.

Second, it adds a function to strbuf to assist in formatting values
using metric orders of magnitude in both SI and binary units, thus
providing a more human-readable display format.

Third, it allows count-objects to utilize that faculty through an
additional command line argument which shows sizes in verbose
output as human readable values.

Marcus Griep (3):
count-objects: Add total pack size to verbose output
strbuf: Add method to convert byte-size to human readable form
count-objects: add human-readable size option

.gitignore | 1 +
Documentation/git-count-objects.txt | 13 ++++--
Makefile | 2 +-
builtin-count-objects.c | 31 +++++++++++-
strbuf.c | 88 +++++++++++++++++++++++++++++++++++
strbuf.h | 9 ++++
t/t0031-human-readable.sh | 9 ++++
test-human-read.c | 64 +++++++++++++++++++++++++
8 files changed, 209 insertions(+), 8 deletions(-)
create mode 100755 t/t0031-human-readable.sh
create mode 100644 test-human-read.c

--

To: Git Mailing List <git@...>
Cc: Marcus Griep <marcus@...>, Junio C Hamano <gitster@...>
Date: Friday, August 15, 2008 - 12:20 am

Adds the total pack size (including indexes) the verbose count-objects
output, floored to the nearest kilobyte.

Updates documentation to match this addition.

Signed-off-by: Marcus Griep <marcus@griep.us>
---
Documentation/git-count-objects.txt | 5 +++--
builtin-count-objects.c | 3 +++
2 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/Documentation/git-count-objects.txt b/Documentation/git-count-objects.txt
index 75a8da1..6bc1c21 100644
--- a/Documentation/git-count-objects.txt
+++ b/Documentation/git-count-objects.txt
@@ -21,8 +21,9 @@ OPTIONS
--verbose::
In addition to the number of loose objects and disk
space consumed, it reports the number of in-pack
- objects, number of packs, and number of objects that can be
- removed by running `git prune-packed`.
+ objects, number of packs, disk space consumed by those packs,
+ and number of objects that can be removed by running
+ `git prune-packed`.

Author
diff --git a/builtin-count-objects.c b/builtin-count-objects.c
index 91b5487..249040b 100644
--- a/builtin-count-objects.c
+++ b/builtin-count-objects.c
@@ -104,6 +104,7 @@ int cmd_count_objects(int argc, const char **argv, const char *prefix)
if (verbose) {
struct packed_git *p;
unsigned long num_pack = 0;
+ unsigned long size_pack = 0;
if (!packed_git)
prepare_packed_git();
for (p = packed_git; p; p = p->next) {
@@ -112,12 +113,14 @@ int cmd_count_objects(int argc, const char **argv, const char *prefix)
if (open_pack_index(p))
continue;
packed += p->num_objects;
+ size_pack += p->pack_size + p->index_size;
num_pack++;
}
printf("count: %lu\n", loose);
printf("size: %lu\n", loose_size / 2);
printf("in-pack: %lu\n", packed);
printf("packs: %lu\n", num_pack);
+ printf("size-pack: %lu\n", size_pack / 1024);
printf("prune-packable: %lu\n", packed_loose);
printf("garbage: %lu\n", garbage);
}
--
1.6.0.rc2.6.g8eda3

--

To: Git Mailing List <git@...>
Cc: Marcus Griep <marcus@...>, Junio C Hamano <gitster@...>
Date: Friday, August 15, 2008 - 11:47 am

Adds the total pack size (including indexes) the verbose count-objects
output, floored to the nearest kilobyte.

As well, t5500 is sensitive to changes in the output of git-count-objects
and is updated for the recent addition of size-pack to that command.

Updates documentation to match this addition.

Signed-off-by: Marcus Griep <marcus@griep.us>
---

I realized that there was a breaking test in there that depended upon the
output of git-count-objects. I updated that test case to understand the
additional output.

Documentation/git-count-objects.txt | 5 +++--
builtin-count-objects.c | 3 +++
t/t5500-fetch-pack.sh | 3 ++-
3 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/Documentation/git-count-objects.txt b/Documentation/git-count-objects.txt
index 75a8da1..6bc1c21 100644
--- a/Documentation/git-count-objects.txt
+++ b/Documentation/git-count-objects.txt
@@ -21,8 +21,9 @@ OPTIONS
--verbose::
In addition to the number of loose objects and disk
space consumed, it reports the number of in-pack
- objects, number of packs, and number of objects that can be
- removed by running `git prune-packed`.
+ objects, number of packs, disk space consumed by those packs,
+ and number of objects that can be removed by running
+ `git prune-packed`.

Author
diff --git a/builtin-count-objects.c b/builtin-count-objects.c
index 91b5487..249040b 100644
--- a/builtin-count-objects.c
+++ b/builtin-count-objects.c
@@ -104,6 +104,7 @@ int cmd_count_objects(int argc, const char **argv, const char *prefix)
if (verbose) {
struct packed_git *p;
unsigned long num_pack = 0;
+ unsigned long size_pack = 0;
if (!packed_git)
prepare_packed_git();
for (p = packed_git; p; p = p->next) {
@@ -112,12 +113,14 @@ int cmd_count_objects(int argc, const char **argv, const char *prefix)
if (open_pack_index(p))
continue;
packed += p->num_objects;
+ size_pack += p->pack_size + p->index_size;
...

To: Git Mailing List <git@...>
Cc: Marcus Griep <marcus@...>, Junio C Hamano <gitster@...>
Date: Friday, August 15, 2008 - 12:20 am

Takes a strbuf as its first argument and appends the human-readable
form of 'value', the second argument, to that buffer.

e.g. strbuf_append_human_readable(sb, 1012, 0, 0, HR_SPACE)
produces "0.9 Ki".

Documented in strbuf.h; units can be directly appended to the strbuf
to produce "0.9 KiB/s", "0.9 KiB", or other unit.

Supports SI magnitude prefixes as well:
e.g. strbuf_append_human_readable(sb, 2024, 0, 0, HR_USE_SI)
produces "2.0k", to which a unit can be appended, such as " objects"
to produce "2.0k objects".

Also, add in test cases to ensure it produces the expected output
and to demonstrate what different arguments do.

Signed-off-by: Marcus Griep <marcus@griep.us>
---
.gitignore | 1 +
Makefile | 2 +-
strbuf.c | 92 +++++++++++++++++++++++++++++++++++++++++++++
strbuf.h | 30 +++++++++++++++
t/t0031-human-readable.sh | 9 ++++
test-human-read.c | 68 +++++++++++++++++++++++++++++++++
6 files changed, 201 insertions(+), 1 deletions(-)
create mode 100755 t/t0031-human-readable.sh
create mode 100644 test-human-read.c

diff --git a/.gitignore b/.gitignore
index a213e8e..d65fa75 100644
--- a/.gitignore
+++ b/.gitignore
@@ -146,6 +146,7 @@ test-date
test-delta
test-dump-cache-tree
test-genrandom
+test-human-read
test-match-trees
test-parse-options
test-path-utils
diff --git a/Makefile b/Makefile
index 90c5a13..f17ab76 100644
--- a/Makefile
+++ b/Makefile
@@ -1297,7 +1297,7 @@ endif

### Testing rules

-TEST_PROGRAMS = test-chmtime$X test-genrandom$X test-date$X test-delta$X test-sha1$X test-match-trees$X test-parse-options$X test-path-utils$X
+TEST_PROGRAMS = test-chmtime$X test-genrandom$X test-date$X test-delta$X test-sha1$X test-match-trees$X test-parse-options$X test-path-utils$X test-human-read$X

all:: $(TEST_PROGRAMS)

diff --git a/strbuf.c b/strbuf.c
index 720737d..d9888fb 100644
--- a/strbuf.c
+++ b/strbuf.c
@@ -308,3 +308,95 @@ int st...

To: Git Mailing List <git@...>
Cc: Marcus Griep <marcus@...>, Junio C Hamano <gitster@...>
Date: Friday, August 15, 2008 - 12:20 am

Adds a human readable size option to the verbose output
of count-objects for loose and pack object size totals.

Updates documentation to match.

Signed-off-by: Marcus Griep <marcus@griep.us>
---
Documentation/git-count-objects.txt | 6 +++++-
builtin-count-objects.c | 29 +++++++++++++++++++++++++----
2 files changed, 30 insertions(+), 5 deletions(-)

diff --git a/Documentation/git-count-objects.txt b/Documentation/git-count-objects.txt
index 6bc1c21..dc7b5b1 100644
--- a/Documentation/git-count-objects.txt
+++ b/Documentation/git-count-objects.txt
@@ -7,7 +7,7 @@ git-count-objects - Count unpacked number of objects and their disk consumption

SYNOPSIS
--------
-'git count-objects' [-v]
+'git count-objects' [-v [-H]]

DESCRIPTION
-----------
@@ -25,6 +25,10 @@ OPTIONS
and number of objects that can be removed by running
`git prune-packed`.

+-H::
+--human-sizes::
+ Displays sizes reported by `--verbose` in a more
+ human-readable format. (e.g. 22M or 1.5G)

Author
------
diff --git a/builtin-count-objects.c b/builtin-count-objects.c
index 249040b..6b1cd37 100644
--- a/builtin-count-objects.c
+++ b/builtin-count-objects.c
@@ -7,6 +7,7 @@
#include "cache.h"
#include "builtin.h"
#include "parse-options.h"
+#include "strbuf.h"

static void count_objects(DIR *d, char *path, int len, int verbose,
unsigned long *loose,
@@ -67,13 +68,13 @@ static void count_objects(DIR *d, char *path, int len, int verbose,
}

static char const * const count_objects_usage[] = {
- "git count-objects [-v]",
+ "git count-objects [-v [-H]]",
NULL
};

int cmd_count_objects(int argc, const char **argv, const char *prefix)
{
- int i, verbose = 0;
+ int i, verbose = 0, human_readable = 0;
const char *objdir = get_object_directory();
int len = strlen(objdir);
char *path = xmalloc(len + 50);
@@ -81,6 +82,8 @@ int cmd_count_objects(int argc, const char **argv, const char *prefix)
unsigned long loose_size = 0;
struct o...

To: Git Mailing List <git@...>
Cc: Marcus Griep <marcus@...>, Junio C Hamano <gitster@...>
Date: Thursday, August 14, 2008 - 6:18 pm

Adds the total pack size (including indexes) the verbose count-objects
output, floored to the nearest kilobyte.

Signed-off-by: Marcus Griep <marcus@griep.us>
---
builtin-count-objects.c | 3 +++
1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/builtin-count-objects.c b/builtin-count-objects.c
index 91b5487..249040b 100644
--- a/builtin-count-objects.c
+++ b/builtin-count-objects.c
@@ -104,6 +104,7 @@ int cmd_count_objects(int argc, const char **argv, const char *prefix)
if (verbose) {
struct packed_git *p;
unsigned long num_pack = 0;
+ unsigned long size_pack = 0;
if (!packed_git)
prepare_packed_git();
for (p = packed_git; p; p = p->next) {
@@ -112,12 +113,14 @@ int cmd_count_objects(int argc, const char **argv, const char *prefix)
if (open_pack_index(p))
continue;
packed += p->num_objects;
+ size_pack += p->pack_size + p->index_size;
num_pack++;
}
printf("count: %lu\n", loose);
printf("size: %lu\n", loose_size / 2);
printf("in-pack: %lu\n", packed);
printf("packs: %lu\n", num_pack);
+ printf("size-pack: %lu\n", size_pack / 1024);
printf("prune-packable: %lu\n", packed_loose);
printf("garbage: %lu\n", garbage);
}
--
1.6.0.rc2.6.g8eda3

--

To: Git Mailing List <git@...>
Cc: Marcus Griep <marcus@...>, Junio C Hamano <gitster@...>
Date: Thursday, August 14, 2008 - 6:18 pm

Takes a strbuf as its first argument and appends the human-readable
form of 'value', the second argument, to that buffer.

Argument 3, 'maxlen', specifies the longest string that should
be returned. That will make it easier for any pretty-ish formatting
like `ls` and `du` use. A value of 0 is unlimited length.

'scale' specifies a boundary, above which 'value' should be
reduced, and below which it should be reported. Commonly this is
1000. If 0, then it will find a scale that best fits into 'maxlen'.
If both 'maxlen' and 'scale' are 0, then scale will default to 1000.

'suffix' is appended onto every formatted string. This is often
"", "B", "bps", "objects" etc.

'flags' provides the ability to switch between a binary (1024)
and an si (1000) period (HR_USE_SI). Also, adding a space between
number and unit (HR_SPACE).

On success, returns 0. If maxlen is specified and there is not
enough space given the scale or an inordinately large value, returns
-n, where n is the amount of additional length necessary.

e.g. strbuf_append_human_readable(sb, 1012, 0, 0, "bps", HR_SPACE)
produces "0.9 Kbps".

Also, add in test cases to ensure it produces the expected output
and to demonstrate what different arguments do.

Signed-off-by: Marcus Griep <marcus@griep.us>
---
.gitignore | 1 +
Makefile | 2 +-
strbuf.c | 88 +++++++++++++++++++++++++++++++++++++++++++++
strbuf.h | 9 +++++
t/t0031-human-readable.sh | 9 +++++
test-human-read.c | 64 ++++++++++++++++++++++++++++++++
6 files changed, 172 insertions(+), 1 deletions(-)
create mode 100755 t/t0031-human-readable.sh
create mode 100644 test-human-read.c

diff --git a/.gitignore b/.gitignore
index a213e8e..d65fa75 100644
--- a/.gitignore
+++ b/.gitignore
@@ -146,6 +146,7 @@ test-date
test-delta
test-dump-cache-tree
test-genrandom
+test-human-read
test-match-trees
test-parse-options
test-path-utils
diff --git a/Makefil...

To: Marcus Griep <marcus@...>
Cc: Junio C Hamano <gitster@...>, Git Mailing List <git@...>
Date: Thursday, August 14, 2008 - 6:34 pm

Frankly, I doubt this has too much value, and it complicates the code _a
lot_. If you can't fit your stuff into pretty column, it's better to
just print whatever you have to and disrupt the columns instead of

My point still stands - in case of binary units, we should always
consistently use the i suffix. So having an example in the commit
message that advertises "bps" is simply wrong when it should read "iB/s"
(like it does with the current progress.c code).

I may sound boring, but it seems to me that you're still ignoring my
point quitly without proper counter-argumentation and I think it's an
important want, and since it's so hard to keep things consistent across

Hmmm. We could have

+ char *hr_prefixes[] = {
+ "", "Ki", "Mi", "Gi", "Ti", "Pi", "Ei", "Zi", "Yi", NULL
+ };
+ char *hr_si_prefixes[] = {
+ "", "k", "M", "G", "T", "P", "E", "Z", "Y", NULL
+ };

;-)

--
Petr "Pasky" Baudis
The next generation of interesting software will be done
on the Macintosh, not the IBM PC. -- Bill Gates
--

To: Petr Baudis <pasky@...>
Cc: Junio C Hamano <gitster@...>, Git Mailing List <git@...>
Date: Thursday, August 14, 2008 - 8:46 pm

Generally, the only reason for such a failure would be requesting
conflicting scale and maxlen values or requesting a maxlen which would
be too small to reasonably display any value, hence an empty string and
a number reporting how much more is necessary to get appropriate output.

In other words, a failure reports that you probably requested an
irrational number for maxlen. It would probably be easier to understand
if it were in terms of the numeric output rather than the entire string.
If I change it that way, then there shouldn't be an "irrational" positive

From a consistency standpoint, I can certainly agree. It's not hard to
implement. I wanted to avoid pigeon-holeing, but to keep our reporting

No damage. It was indicated to me that 80-ish was the preferred width, so
I was trying to follow that. If that's not true in the C sources, I'll

Per previous, sounds good to me.

Overall, I was looking to create a generic function that could be used
across Git without making assumptions of the consumer. Hence the maxlen,
scale, SI, and space configurability.

Thanks for the input, and I'll work up another draft.

--
Marcus Griep
GPG Key ID: 0x5E968152
——
http://www.boohaunt.net
את.ψο´
--

To: Marcus Griep <marcus@...>
Cc: Junio C Hamano <gitster@...>, Petr Baudis <pasky@...>, Git Mailing List <git@...>
Date: Thursday, August 14, 2008 - 8:52 pm

The sources are roughly:

- 80 columns wide, try not to make it wider;

- ident leading part of line with _TABs_ not spaces;

- tab width in your editor should be 8, so you see when you
are too wide;

- one tab per block in C;

- don't align variable names in declarations;

- use {} only around complex statements;

... there's more. But if you follow the above along with "dammit,
make it match the code above and below where you are inserting"
you get it pretty quick.

--
Shawn.
--

To: Petr Baudis <pasky@...>
Cc: Marcus Griep <marcus@...>, Git Mailing List <git@...>
Date: Thursday, August 14, 2008 - 7:04 pm

I pretty much agree with everything you said in this thread. In addition,
I wonder if we would want to be able to say:

960 bps
0.9 KiB/s
2.3 MiB/s

IOW, I do not think it is a good idea to have the list of "prefixes" in
this function and force callers to _append_ unit. You might be better off
by making the interface to the function to pass something like this:

struct human_unit {
char *unitname;
unsigned long valuescale;
} bps_to_human[] = {
{ "bps", 1 },
{ "KiB/s", 1024 },
{ "MiB/s", 1024 * 1024 },
{ NULL, 0 },
};

and perhaps give canned set of unit list for sizes and throughputs as
convenience.

By doing so, you could even do this:

struct human_unit bits_to_human[] = {
{ "bits", 1 },
{ "bytes", 8 },
{ "Kbytes", 8 * 1024 },
{ "Mbytes", 8 * 1024 * 1024 },
{ NULL, 0 },
};

I also am not particularly happy about using "double" in this API. Most
of the callers that gather stats in the rest of the codebase count in
(long) integers as far as I can tell, and it may be conceptually cleaner
to keep the use of double as an internal implementation issue of this
particular function.
--

To: Junio C Hamano <gitster@...>
Cc: Petr Baudis <pasky@...>, Git Mailing List <git@...>
Date: Thursday, August 14, 2008 - 8:53 pm

A pedantic note, but bps is bits per second, whereas Bps or B/s is bytes
per second. Requiring the same suffix for each would prevent some easily

The function requires the use of a double for fractional parts, so letting
the compiler perform an upcast were necessary seems innocuous to me.

Also, Yibi-/Zibi-, unlikely to be used as they are, won't fit in a long.

--
Marcus Griep
GPG Key ID: 0x5E968152
——
http://www.boohaunt.net
את.ψο´
--

To: Junio C Hamano <gitster@...>
Cc: Marcus Griep <marcus@...>, Git Mailing List <git@...>
Date: Thursday, August 14, 2008 - 7:24 pm

I dont hink it would be a big deal to say bytes/s or B/s (which is as

Frankly, my gut feeling here is that we are overengineering the whole
thing quite a bit, which is the same reason I dislike maxlen.

If it turns out we really do need to have custom prefixes somewhere, we
would have to go with something like this, but on the other hand this
goes against the consistency of output, and I have a bit of trouble
imagining a convincing use-case. So far we have two fairly different
users of such a code and just tailoring it to these two seems to make it
general enough for now.

--
Petr "Pasky" Baudis
The next generation of interesting software will be done
on the Macintosh, not the IBM PC. -- Bill Gates
--

To: Petr Baudis <pasky@...>
Cc: Marcus Griep <marcus@...>, Git Mailing List <git@...>
Date: Thursday, August 14, 2008 - 7:40 pm

Fair enough.
--

To: Git Mailing List <git@...>
Cc: Marcus Griep <marcus@...>, Junio C Hamano <gitster@...>
Date: Thursday, August 14, 2008 - 6:18 pm

Adds a human readable size option to the verbose output
of count-objects for loose and pack object size totals.

Updates documentation to match.

Signed-off-by: Marcus Griep <marcus@griep.us>
---
Documentation/git-count-objects.txt | 13 +++++++++----
builtin-count-objects.c | 30 ++++++++++++++++++++++++++----
2 files changed, 35 insertions(+), 8 deletions(-)

diff --git a/Documentation/git-count-objects.txt b/Documentation/git-count-objects.txt
index 75a8da1..291bc5e 100644
--- a/Documentation/git-count-objects.txt
+++ b/Documentation/git-count-objects.txt
@@ -7,7 +7,7 @@ git-count-objects - Count unpacked number of objects and their disk consumption

SYNOPSIS
--------
-'git count-objects' [-v]
+'git count-objects' [-v [-H]]

DESCRIPTION
-----------
@@ -21,9 +21,14 @@ OPTIONS
--verbose::
In addition to the number of loose objects and disk
space consumed, it reports the number of in-pack
- objects, number of packs, and number of objects that can be
- removed by running `git prune-packed`.
-
+ objects, number of packs, disk space consumed by those packs
+ and number of objects that can be removed by running
+ `git prune-packed`.
+
+-H::
+--human-sizes::
+ Displays sizes reported by `--verbose` in a more
+ human-readable format. (e.g. 22M or 1.5G)

Author
------
diff --git a/builtin-count-objects.c b/builtin-count-objects.c
index 249040b..4be9d7e 100644
--- a/builtin-count-objects.c
+++ b/builtin-count-objects.c
@@ -7,6 +7,7 @@
#include "cache.h"
#include "builtin.h"
#include "parse-options.h"
+#include "strbuf.h"

static void count_objects(DIR *d, char *path, int len, int verbose,
unsigned long *loose,
@@ -67,13 +68,13 @@ static void count_objects(DIR *d, char *path, int len, int verbose,
}

static char const * const count_objects_usage[] = {
- "git count-objects [-v]",
+ "git count-objects [-v [-H]]",
NULL
};

int cmd_count_objects(int argc, const char **argv, const char *prefix)
{
- int i, v...

To: Marcus Griep <marcus@...>
Cc: Junio C Hamano <gitster@...>, Git Mailing List <git@...>
Date: Thursday, August 14, 2008 - 6:37 pm

If it's non-human-readable anyway, why are you dividing this by 1024? At
any rate, it is not obvious at all that the size-pack is not actually
size-pack but size-pack/1024. You should either add the (fixed) unit
string behind or name it size-pack-kb - or just not divide it at all?

This also applies to PATCH1/3 in case it would get applied but the other
two wouldn't.

--
Petr "Pasky" Baudis
The next generation of interesting software will be done
on the Macintosh, not the IBM PC. -- Bill Gates
--

To: Petr Baudis <pasky@...>
Cc: Junio C Hamano <gitster@...>, Git Mailing List <git@...>
Date: Thursday, August 14, 2008 - 7:52 pm

I divide by 1024 here because the loose object size is reported in KiB.
The total that ends up in size_pack is in B, hence to be consistent, I
report the pack size in KiB as well.

--
Marcus Griep
GPG Key ID: 0x5E968152
——
http://www.boohaunt.net
את.ψο´
--

To: Marcus Griep <marcus@...>
Cc: Petr Baudis <pasky@...>, Git Mailing List <git@...>
Date: Thursday, August 14, 2008 - 8:10 pm

Wasn't the problem about documenting the addition of pack size, which is

I thought this part was ok without room for dispute.
--

Previous thread: none

Next thread: [BUG?] git gui handles type change from file to symlink poorly by Gustaf Hendeby on Thursday, August 14, 2008 - 6:23 pm. (1 message)