Commit d9bccf2

pks-t authored and gitster committed
builtin/maintenance: introduce "geometric" strategy
We have two different repacking strategies in Git:

  - The "gc" strategy uses git-gc(1).

  - The "incremental" strategy uses multi-pack indices and `git
    multi-pack-index repack` to merge together smaller packfiles as
    determined by a specific batch size.

The former strategy is our old and trusted default, whereas the latter
has historically been used for our scheduled maintenance. But both
strategies have their shortcomings:

  - The "gc" strategy performs regular all-into-one repacks. Furthermore,
    it is rather inflexible, as it is not easily possible for a user to
    enable or disable specific subtasks.

  - The "incremental" strategy is not a full replacement for the "gc"
    strategy, as it doesn't know to prune stale data.

So today, we don't have a strategy that is well-suited for large repos
while also being a full replacement for the "gc" strategy.

Introduce a new "geometric" strategy that aims to fill this gap. This
strategy invokes all the usual cleanup tasks that git-gc(1) does, like
pruning reflogs and rerere caches as well as stale worktrees. But where
it differs from both the "gc" and "incremental" strategies is that it
uses our geometric repacking infrastructure exposed by git-repack(1) to
repack packfiles. The advantage of geometric repacking is that we only
need to perform an all-into-one repack when the object count in a repo
has grown significantly.

One downside of this strategy is that pruning of unreferenced objects is
not going to happen regularly anymore. Every geometric repack knows to
soak up all loose objects regardless of their reachability, and merging
two or more packs doesn't consider reachability, either. Consequently,
the number of unreachable objects will grow over time.

This is remedied by doing an all-into-one repack instead of a geometric
repack whenever we determine that the geometric repack would end up
merging all packfiles anyway. This all-into-one repack then performs our
usual reachability checks and writes unreachable objects into a cruft
pack. As cruft packs won't ever be merged during geometric repacks, we
can thus phase out these objects over time.

Of course, this still means that we retain unreachable objects for far
longer than with the "gc" strategy. But the maintenance strategy is
intended especially for large repositories, where the basic assumption
is that the set of unreachable objects will be significantly dwarfed by
the number of reachable objects.

If this assumption is ever proven to be too disadvantageous, we could
for example introduce a time-based strategy: if the largest packfile has
not been touched for longer than $T, we perform an all-into-one repack.
But for now, such a mechanism is deferred to the future, as it is not
clear yet whether it is needed in the first place.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Acked-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
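The decision between a geometric and an all-into-one repack can be sketched in C. This is a simplified illustration, not git's actual code: `geometric_split()` and `should_repack_all_into_one()` are hypothetical names, pack weights are assumed to be per-pack object counts sorted in ascending order with cruft packs already excluded, and multiplication overflow is not checked. The split logic is loosely modelled on how `git repack --geometric=<factor>` decides which packs to roll up.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper loosely modelled on `git repack --geometric=<factor>`.
 * `weights` holds per-pack object counts in ascending order, with cruft
 * packs already excluded. Returns the split index: packs [0, split) would
 * be rolled up into one new pack, while packs [split, n) already form a
 * geometric progression and are kept. Overflow is not checked here.
 */
static size_t geometric_split(const unsigned long long *weights, size_t n,
			      unsigned long long factor)
{
	unsigned long long rolled_up = 0;
	size_t i, split;

	for (i = 1; i < n; i++)
		assert(weights[i - 1] <= weights[i]); /* input must be sorted */

	/*
	 * Walk down from the largest pack and stop at the first pair where
	 * the larger pack is less than `factor` times its smaller neighbour.
	 */
	for (i = n ? n - 1 : 0; i > 0; i--)
		if (weights[i] < factor * weights[i - 1])
			break;
	split = i;
	/* The upper pack of the offending pair cannot be kept either. */
	if (split)
		split++;

	/*
	 * A pack above the split is only kept if it is at least `factor`
	 * times as large as everything rolled up below it.
	 */
	for (i = 0; i < split; i++)
		rolled_up += weights[i];
	for (i = split; i < n; i++) {
		if (weights[i] >= factor * rolled_up)
			break;
		rolled_up += weights[i];
		split++;
	}
	return split;
}

/*
 * Hypothetical decision helper: when the geometric repack would roll up
 * every pack anyway, perform an all-into-one repack instead, which checks
 * reachability and writes unreachable objects into a cruft pack.
 */
static bool should_repack_all_into_one(const unsigned long long *weights,
				       size_t n, unsigned long long factor)
{
	return n > 0 && geometric_split(weights, n, factor) == n;
}
```

With a factor of 2, as used by the strategy's `git repack --geometric=2` invocation, weights {1, 2, 4, 8} already form a progression and nothing is merged; {1, 1, 1, 100} rolls up the three small packs and keeps the large one; {3, 4, 10} would roll up every pack, which in this sketch is the case that falls back to an all-into-one repack.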
1 parent 40a7415 commit d9bccf2

3 files changed, 59 insertions(+), 1 deletion(-)

Documentation/config/maintenance.adoc

Lines changed: 9 additions & 0 deletions
@@ -32,6 +32,15 @@ The possible strategies are:
 strategy for scheduled maintenance.
 * `gc`: This strategy runs the `gc` task. This is the default strategy for
 manual maintenance.
+* `geometric`: This strategy performs geometric repacking of packfiles and
+keeps auxiliary data structures up-to-date. The strategy expires data in the
+reflog and removes worktrees that cannot be located anymore. When the
+geometric repacking strategy would decide to do an all-into-one repack, then
+the strategy generates a cruft pack for all unreachable objects. Objects that
+are already part of a cruft pack will be expired.
++
+This repacking strategy is a full replacement for the `gc` strategy and is
+recommended for large repositories.
 * `incremental`: This setting optimizes for performing small maintenance
 activities that do not delete any data. This does not schedule the `gc`
 task, but runs the `prefetch` and `commit-graph` tasks hourly, the

builtin/gc.c

Lines changed: 31 additions & 0 deletions
@@ -1891,12 +1891,43 @@ static const struct maintenance_strategy incremental_strategy = {
 	},
 };
 
+static const struct maintenance_strategy geometric_strategy = {
+	.tasks = {
+		[TASK_COMMIT_GRAPH] = {
+			.type = MAINTENANCE_TYPE_SCHEDULED | MAINTENANCE_TYPE_MANUAL,
+			.schedule = SCHEDULE_HOURLY,
+		},
+		[TASK_GEOMETRIC_REPACK] = {
+			.type = MAINTENANCE_TYPE_SCHEDULED | MAINTENANCE_TYPE_MANUAL,
+			.schedule = SCHEDULE_DAILY,
+		},
+		[TASK_PACK_REFS] = {
+			.type = MAINTENANCE_TYPE_SCHEDULED | MAINTENANCE_TYPE_MANUAL,
+			.schedule = SCHEDULE_DAILY,
+		},
+		[TASK_RERERE_GC] = {
+			.type = MAINTENANCE_TYPE_SCHEDULED | MAINTENANCE_TYPE_MANUAL,
+			.schedule = SCHEDULE_WEEKLY,
+		},
+		[TASK_REFLOG_EXPIRE] = {
+			.type = MAINTENANCE_TYPE_SCHEDULED | MAINTENANCE_TYPE_MANUAL,
+			.schedule = SCHEDULE_WEEKLY,
+		},
+		[TASK_WORKTREE_PRUNE] = {
+			.type = MAINTENANCE_TYPE_SCHEDULED | MAINTENANCE_TYPE_MANUAL,
+			.schedule = SCHEDULE_WEEKLY,
+		},
+	},
+};
+
 static struct maintenance_strategy parse_maintenance_strategy(const char *name)
 {
 	if (!strcasecmp(name, "incremental"))
 		return incremental_strategy;
 	if (!strcasecmp(name, "gc"))
 		return gc_strategy;
+	if (!strcasecmp(name, "geometric"))
+		return geometric_strategy;
 	die(_("unknown maintenance strategy: '%s'"), name);
 }

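Each entry of `geometric_strategy` pairs a task with a type mask and a schedule. The following C sketch shows one way such a table could be filtered. It is a hypothetical illustration: the enums and `struct task_config` are redefined here rather than taken from git, the task name strings are illustrative labels, and it assumes that a run without `--schedule` executes every task carrying the manual bit, while a scheduled run also picks up tasks that are scheduled more often than the requested frequency. Under these assumptions, both a plain run and a `--schedule=weekly` run select all six tasks, which is consistent with the two geometric expectations in t/t7900-maintenance.sh.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Illustrative re-declarations only; git's real enums and structs in
 * builtin/gc.c may be ordered and laid out differently.
 */
enum schedule {
	SCHEDULE_NONE, /* a run without --schedule, i.e. manual */
	SCHEDULE_HOURLY,
	SCHEDULE_DAILY,
	SCHEDULE_WEEKLY,
};

enum {
	MAINTENANCE_TYPE_SCHEDULED = 1 << 0,
	MAINTENANCE_TYPE_MANUAL = 1 << 1,
};

struct task_config {
	const char *name; /* hypothetical label for the task */
	unsigned type;
	enum schedule schedule;
};

/* Hypothetical shorthand for "scheduled and manual". */
#define SCHED_AND_MANUAL (MAINTENANCE_TYPE_SCHEDULED | MAINTENANCE_TYPE_MANUAL)

/* The geometric strategy's table from the diff above, flattened. */
static const struct task_config geometric_tasks[] = {
	{ "commit-graph", SCHED_AND_MANUAL, SCHEDULE_HOURLY },
	{ "geometric-repack", SCHED_AND_MANUAL, SCHEDULE_DAILY },
	{ "pack-refs", SCHED_AND_MANUAL, SCHEDULE_DAILY },
	{ "rerere-gc", SCHED_AND_MANUAL, SCHEDULE_WEEKLY },
	{ "reflog-expire", SCHED_AND_MANUAL, SCHEDULE_WEEKLY },
	{ "worktree-prune", SCHED_AND_MANUAL, SCHEDULE_WEEKLY },
};

/*
 * Assumed selection rule: a manual run executes every task with the
 * manual bit; a scheduled run executes every scheduled task that runs
 * at least as often as the requested frequency.
 */
static bool task_selected(const struct task_config *task, enum schedule run)
{
	if (run == SCHEDULE_NONE)
		return (task->type & MAINTENANCE_TYPE_MANUAL) != 0;
	if (!(task->type & MAINTENANCE_TYPE_SCHEDULED))
		return false;
	return task->schedule <= run;
}

/* Count how many tasks of the geometric strategy a given run selects. */
static size_t count_selected(enum schedule run)
{
	size_t i, count = 0;

	assert(run <= SCHEDULE_WEEKLY); /* reject out-of-range values */
	for (i = 0; i < sizeof(geometric_tasks) / sizeof(geometric_tasks[0]); i++)
		if (task_selected(&geometric_tasks[i], run))
			count++;
	return count;
}
```

Under the same assumptions, a daily run would select the commit-graph, geometric repacking and pack-refs tasks, while an hourly run would only refresh the commit-graph.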
t/t7900-maintenance.sh

Lines changed: 19 additions & 1 deletion
@@ -930,11 +930,29 @@ test_expect_success 'maintenance.strategy is respected' '
 		git gc --quiet --no-detach --skip-foreground-tasks
 		EOF
 
-		test_strategy gc --schedule=weekly <<-\EOF
+		test_strategy gc --schedule=weekly <<-\EOF &&
 		git pack-refs --all --prune
 		git reflog expire --all
 		git gc --quiet --no-detach --skip-foreground-tasks
 		EOF
+
+		test_strategy geometric <<-\EOF &&
+		git pack-refs --all --prune
+		git reflog expire --all
+		git repack -d -l --geometric=2 --quiet --write-midx
+		git commit-graph write --split --reachable --no-progress
+		git worktree prune --expire 3.months.ago
+		git rerere gc
+		EOF
+
+		test_strategy geometric --schedule=weekly <<-\EOF
+		git pack-refs --all --prune
+		git reflog expire --all
+		git repack -d -l --geometric=2 --quiet --write-midx
+		git commit-graph write --split --reachable --no-progress
+		git worktree prune --expire 3.months.ago
+		git rerere gc
+		EOF
 	)
 '
