Commit 02a7f6f

pks-t authored and gitster committed
packfile: fix approximation of object counts
When approximating the number of objects in a repository we only take
into account two data sources, the multi-pack index and the packfile
indices, as both of these data structures allow us to easily figure out
how many objects they contain.

But the way we currently approximate the number of objects is broken in
the presence of a multi-pack index. This is due to two separate reasons:

  - We have recently introduced initial infrastructure for incremental
    multi-pack indices. Starting with that series, `num_objects` only
    counts the number of objects of a specific layer of the MIDX chain,
    so we do not take into account objects from parent layers.

    This issue is fixed by adding `num_objects_in_base`, which contains
    the sum of all objects in previous layers.

  - When using the multi-pack index we may count objects contained in
    packfiles twice: once via the multi-pack index, but then we again
    count them via the packfile itself.

    This issue is fixed by skipping any packfiles that have an MIDX.

Overall, given that we _always_ count the packs, we can only end up
overestimating the number of objects, and the overestimation is limited
to a factor of two at most.

The consequences of those issues are very limited though, as we only
approximate object counts in a small number of cases:

  - When writing a commit-graph we use the approximate object count to
    display the upper limit of a progress display.

  - In `repo_find_unique_abbrev_r()` we use it to specify a lower limit
    of how many hex digits we want to abbreviate to. Given that we use
    a power of two here to derive the lower limit we may end up with an
    abbreviated hash that is one digit longer than required.

  - In `estimate_repack_memory()` we may end up overestimating how much
    memory a repack needs to pack objects. Consequently, we may end up
    dropping some packfiles from a repack.

None of these are really game-changing. But it's nice to fix those
issues regardless.

While at it, convert the code to use `repo_for_each_pack()`.
Furthermore, use `odb_prepare_alternates()` instead of explicitly
preparing the packfile store. We really only want to prepare the object
database sources, and `get_multi_pack_index()` already knows to prepare
the packfile store for us.

Helped-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
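
The "one digit longer" consequence for abbreviated hashes can be illustrated with a small model. This is only a sketch, not the code in `repo_find_unique_abbrev_r()`: it assumes the lower limit is derived from the position of the highest set bit of the approximate count, halved and rounded up to hex digits, with an assumed minimum of 7. The names `model_msb`, `model_abbrev_len` and `MODEL_MIN_ABBREV` are hypothetical.

```c
/*
 * Hypothetical model of deriving a minimum abbreviation length from an
 * approximate object count. The formula and the minimum of 7 are
 * assumptions made for illustration, not taken from this commit.
 */
#define MODEL_MIN_ABBREV 7

/* 0-based index of the highest set bit; returns 0 for a value of 0 or 1. */
static unsigned int model_msb(unsigned long value)
{
	unsigned int bit = 0;

	while (value >>= 1)
		bit++;
	return bit;
}

unsigned int model_abbrev_len(unsigned long approx_count)
{
	/* Number of bits needed to represent the count. */
	unsigned int bits = model_msb(approx_count) + 1;
	/* Halve the bit count, rounding up, to get a hex digit count. */
	unsigned int len = (bits + 1) / 2;

	return len < MODEL_MIN_ABBREV ? MODEL_MIN_ABBREV : len;
}
```

Under this model, doubling the count adds one bit, so the result grows by at most one digit: `model_abbrev_len(32768)` yields 8, while the doubled estimate `model_abbrev_len(65536)` yields 9.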
1 parent 89219bc commit 02a7f6f

1 file changed: +4 -4 lines changed

packfile.c

Lines changed: 4 additions & 4 deletions
@@ -1143,16 +1143,16 @@ unsigned long repo_approximate_object_count(struct repository *r)
 		unsigned long count = 0;
 		struct packed_git *p;
 
-		packfile_store_prepare(r->objects->packfiles);
+		odb_prepare_alternates(r->objects);
 
 		for (source = r->objects->sources; source; source = source->next) {
 			struct multi_pack_index *m = get_multi_pack_index(source);
 			if (m)
-				count += m->num_objects;
+				count += m->num_objects + m->num_objects_in_base;
 		}
 
-		for (p = r->objects->packfiles->packs; p; p = p->next) {
-			if (open_pack_index(p))
+		repo_for_each_pack(r, p) {
+			if (p->multi_pack_index || open_pack_index(p))
 				continue;
 			count += p->num_objects;
 		}
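
To see the effect of the two changes in isolation, the counting logic can be modelled outside of Git. The following is a minimal sketch under simplifying assumptions: `struct model_midx`, `struct model_pack`, `model_count_old` and `model_count_new` are hypothetical stand-ins, a single object source is assumed, and the `covered_by_midx` flag plays the role of the `p->multi_pack_index` check in the diff.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-in for the tip of a MIDX chain. */
struct model_midx {
	unsigned long num_objects;         /* objects in this layer only */
	unsigned long num_objects_in_base; /* objects in all parent layers */
};

/* Hypothetical, simplified stand-in for a packfile. */
struct model_pack {
	unsigned long num_objects;
	int covered_by_midx; /* non-zero if a MIDX already indexes this pack */
};

/* Old behaviour: only the tip layer is counted, and every pack is added. */
unsigned long model_count_old(const struct model_midx *m,
			      const struct model_pack *packs, size_t nr)
{
	unsigned long count = 0;
	size_t i;

	if (m)
		count += m->num_objects;
	for (i = 0; i < nr; i++)
		count += packs[i].num_objects;
	return count;
}

/* New behaviour: count all MIDX layers, skip packs the MIDX covers. */
unsigned long model_count_new(const struct model_midx *m,
			      const struct model_pack *packs, size_t nr)
{
	unsigned long count = 0;
	size_t i;

	if (m)
		count += m->num_objects + m->num_objects_in_base;
	for (i = 0; i < nr; i++) {
		if (packs[i].covered_by_midx)
			continue;
		count += packs[i].num_objects;
	}
	return count;
}
```

For example, take a two-layer MIDX with 100 objects in the tip and 200 in the base, covering packs of 200 and 100 objects, plus one loose pack of 50 objects. The old model returns 450 (tip layer plus all packs), while the new model returns 350, which matches the number of packed objects if no object is stored twice.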
