[WORK-IN-PROGRESS] Introduce the path walk API into Git for Windows #5146

Closed
wants to merge 45 commits into from
Changes from 1 commit
6354d7a
path-walk: introduce an object walk by path
derrickstolee Aug 29, 2024
c8e08c3
backfill: add builtin boilerplate
derrickstolee Jun 7, 2024
b05a276
backfill: basic functionality and tests
derrickstolee Sep 1, 2024
e02f7b3
backfill: add --batch-size=<n> option
derrickstolee Sep 1, 2024
4236e4f
backfill: add --sparse option
derrickstolee Sep 1, 2024
31c9b45
path-walk: allow consumer to specify object types
derrickstolee Sep 1, 2024
356abc9
backfill: assume --sparse when sparse-checkout is enabled
derrickstolee Sep 1, 2024
3a421ff
path-walk: allow visiting tags
derrickstolee Sep 9, 2024
b9471b6
survey: stub in new experimental `git-survey` command
jeffhostetler Apr 29, 2024
c4b3490
survey: add command line opts to select references
jeffhostetler Apr 29, 2024
ca37a49
survey: collect the set of requested refs
jeffhostetler Apr 29, 2024
0a20a17
survey: start pretty printing data in table form
derrickstolee Sep 1, 2024
91c4d57
survey: add object count summary
derrickstolee Sep 2, 2024
53632be
revision: create mark_trees_uninteresting_dense()
derrickstolee Sep 6, 2024
c63928e
survey: summarize total sizes by object type
derrickstolee Sep 2, 2024
3e9b671
path-walk: add prune_all_uninteresting option
derrickstolee Sep 4, 2024
af7d53f
survey: show progress during object walk
derrickstolee Sep 2, 2024
d192ae7
pack-objects: add --path-walk option
derrickstolee Sep 5, 2024
5f7e131
survey: add ability to track prioritized lists
derrickstolee Sep 2, 2024
ab0bc08
pack-objects: extract should_attempt_deltas()
derrickstolee Sep 6, 2024
bd8b5b5
survey: add report of "largest" paths
derrickstolee Sep 2, 2024
c6d4832
pack-objects: introduce GIT_TEST_PACK_PATH_WALK
derrickstolee Sep 6, 2024
c2092f0
p5313: add size comparison test
derrickstolee Aug 28, 2024
bbc57f7
repack: add --path-walk option
derrickstolee Sep 5, 2024
32fca07
pack-objects: enable --path-walk via config
derrickstolee Sep 5, 2024
c145b9e
pack-objects: add --full-name-hash option
derrickstolee Sep 7, 2024
72191a0
test-name-hash: add helper to compute name-hash functions
derrickstolee Sep 8, 2024
5039f03
p5314: add a size test for name-hash collisions
derrickstolee Sep 9, 2024
e43582c
scalar: enable path-walk during push via config
derrickstolee Sep 5, 2024
88fee5b
pack-objects: output debug info about deltas
derrickstolee Aug 28, 2024
d17e503
Merge branch 'backfill'
dscho Sep 15, 2024
d7e7283
Merge branch 'survey'
dscho Sep 15, 2024
98a5786
Merge branch 'pack-path-walk'
dscho Sep 15, 2024
9d0690a
Merge branch 'path-walk'
dscho Sep 15, 2024
556335a
fixup! survey: collect the set of requested refs
dscho Sep 15, 2024
69aa8d8
fixup! pack-objects: output debug info about deltas
dscho Sep 15, 2024
5001883
fixup! survey: summarize total sizes by object type
dscho Sep 15, 2024
3ab1bda
fixup! survey: add report of "largest" paths
dscho Sep 15, 2024
84c8a06
fixup! survey: summarize total sizes by object type
dscho Sep 15, 2024
16cd9a3
fixup! pack-objects: output debug info about deltas
dscho Sep 15, 2024
c8f1239
fixup! survey: start pretty printing data in table form
dscho Sep 15, 2024
b5c2265
fixup! survey: add object count summary
dscho Sep 15, 2024
fee8f88
fixup! survey: summarize total sizes by object type
dscho Sep 15, 2024
489ce0c
test-tool: add the `path-walk` subcommand
dscho Sep 17, 2024
9b78d40
fixup! test-tool: add the `path-walk` subcommand
dscho Sep 17, 2024
6 changes: 5 additions & 1 deletion Documentation/git-backfill.txt
@@ -9,7 +9,7 @@ git-backfill - Download missing objects in a partial clone
SYNOPSIS
--------
[verse]
-'git backfill' [--batch-size=<n>]
+'git backfill' [--batch-size=<n>] [--[no-]sparse]

DESCRIPTION
-----------
@@ -46,6 +46,10 @@ OPTIONS
from the server. This size may be exceeded by the last set of
blobs seen at a given path. Default batch size is 16,000.

--[no-]sparse::
Only download objects if they appear at a path that matches the
current sparse-checkout.

SEE ALSO
--------
linkgit:git-clone[1].
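
The `--batch-size` wording in the documentation hunk above ("This size may be exceeded by the last set of blobs seen at a given path") can be illustrated with a small standalone model. This is only a sketch under assumptions, not the code in `builtin/backfill.c`: `struct batch_model`, `add_path_blobs()`, `run_model()`, and the final flush of a leftover undersized batch are illustrative names and inferences drawn from that sentence.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the batch bookkeeping; not Git's actual struct. */
struct batch_model {
	size_t batch_size; /* minimum number of objects per request */
	size_t pending;    /* objects queued but not yet requested */
	size_t requests;   /* number of download requests issued */
	size_t largest;    /* size of the largest request so far */
};

/* Issue one request for everything that is pending. */
static void flush_batch(struct batch_model *b)
{
	if (!b->pending)
		return;
	if (b->pending > b->largest)
		b->largest = b->pending;
	b->requests++;
	b->pending = 0;
}

/*
 * Queue every missing blob found at one path, then check the threshold.
 * Because the check happens only after the whole path was added, a
 * request can exceed batch_size by up to (blobs at that path - 1).
 */
static void add_path_blobs(struct batch_model *b, size_t missing_at_path)
{
	assert(b->batch_size > 0);
	b->pending += missing_at_path;
	if (b->pending >= b->batch_size)
		flush_batch(b);
}

/* Feed per-path counts through the model; return the requests issued. */
static size_t run_model(struct batch_model *b, const size_t *counts, size_t nr)
{
	size_t i;

	for (i = 0; i < nr; i++)
		add_path_blobs(b, counts[i]);
	flush_batch(b); /* assumed: a leftover, undersized batch is still sent */
	return b->requests;
}
```

With a batch size of 10 and per-path counts of 4, 4, 7 and 3, the model sends one request of 15 objects (the third path pushes the batch past the threshold) and one final request of 3.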
13 changes: 12 additions & 1 deletion builtin/backfill.c
@@ -4,6 +4,7 @@
#include "parse-options.h"
#include "repository.h"
#include "commit.h"
#include "dir.h"
#include "hex.h"
#include "tree.h"
#include "tree-walk.h"
@@ -21,14 +22,15 @@
#include "path-walk.h"

static const char * const builtin_backfill_usage[] = {
-N_("git backfill [--batch-size=<n>]"),
+N_("git backfill [--batch-size=<n>] [--[no-]sparse]"),
NULL
};

struct backfill_context {
struct repository *repo;
struct oid_array current_batch;
size_t batch_size;
int sparse;
};

static void clear_backfill_context(struct backfill_context *ctx)
@@ -84,6 +86,12 @@ static int do_backfill(struct backfill_context *ctx)
struct path_walk_info info = PATH_WALK_INFO_INIT;
int ret;

if (ctx->sparse) {
CALLOC_ARRAY(info.pl, 1);
[Review comment, Member/Author: Do we have to release the allocated memory somewhere?]

if (get_sparse_checkout_patterns(info.pl))
return error(_("problem loading sparse-checkout"));
}

repo_init_revisions(ctx->repo, &revs, "");
handle_revision_arg("HEAD", &revs, 0, 0);
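
On the review question above about releasing `info.pl`: one common shape is a single cleanup path that runs on both success and failure, so the early error return cannot leak the list. The sketch below is a standalone model with stand-in names (`struct pattern_list_stub`, `load_patterns_stub()`, `do_backfill_model()`, and the `tracked_*` helpers are all hypothetical); it shows the ownership pattern only and is not the change made in this PR.

```c
#include <assert.h>
#include <stdlib.h>

static int live_allocs; /* outstanding allocations, for demonstration only */

static void *tracked_calloc(size_t n, size_t size)
{
	void *p = calloc(n, size);
	if (p)
		live_allocs++;
	return p;
}

static void tracked_free(void *p)
{
	if (p)
		live_allocs--;
	free(p);
}

/* Stand-in for Git's pattern list; the real struct lives in dir.h. */
struct pattern_list_stub {
	char **patterns;
};

/* Stand-in loader: returns 0 on success, non-zero on failure. */
static int load_patterns_stub(struct pattern_list_stub *pl, int should_fail)
{
	assert(pl != NULL);
	if (should_fail)
		return -1;
	pl->patterns = tracked_calloc(1, sizeof(*pl->patterns));
	return pl->patterns ? 0 : -1;
}

/* Release what the list owns, then the list itself. NULL is allowed. */
static void free_pattern_list_stub(struct pattern_list_stub *pl)
{
	if (!pl)
		return;
	tracked_free(pl->patterns);
	tracked_free(pl);
}

/* Every exit goes through the cleanup label, including the error path. */
static int do_backfill_model(int sparse, int should_fail)
{
	struct pattern_list_stub *pl = NULL;
	int ret = 0;

	if (sparse) {
		pl = tracked_calloc(1, sizeof(*pl));
		if (!pl || load_patterns_stub(pl, should_fail)) {
			ret = -1;
			goto cleanup;
		}
	}

	/* ... the revision setup and object walk would happen here ... */

cleanup:
	free_pattern_list_stub(pl);
	return ret;
}
```

In Git's tree the analogous calls would presumably be `clear_pattern_list()` followed by `free()`, but whether and where this series does that is not visible in this diff.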

@@ -107,10 +115,13 @@ int cmd_backfill(int argc, const char **argv, const char *prefix)
.repo = the_repository,
.current_batch = OID_ARRAY_INIT,
.batch_size = 16000,
.sparse = 0,
};
struct option options[] = {
OPT_INTEGER(0, "batch-size", &ctx.batch_size,
N_("Minimum number of objects to request at a time")),
OPT_BOOL(0, "sparse", &ctx.sparse,
N_("Restrict the missing objects to the current sparse-checkout")),
OPT_END(),
};

18 changes: 18 additions & 0 deletions path-walk.c
@@ -10,6 +10,7 @@
#include "hex.h"
#include "object.h"
#include "oid-array.h"
#include "repository.h"
#include "revision.h"
#include "string-list.h"
#include "strmap.h"
@@ -111,6 +112,23 @@ static int add_children(struct path_walk_context *ctx,
if (type == OBJ_TREE)
strbuf_addch(&path, '/');

if (ctx->info->pl) {
int dtype;
enum pattern_match_result match;
match = path_matches_pattern_list(path.buf, path.len,
dscho marked this conversation as resolved.
path.buf + base_len, &dtype,
ctx->info->pl,
ctx->repo->index);

if (ctx->info->pl->use_cone_patterns &&
match == NOT_MATCHED)
continue;
else if (!ctx->info->pl->use_cone_patterns &&
type == OBJ_BLOB &&
match != MATCHED)
continue;
}

if (!(list = strmap_get(&ctx->paths_to_lists, path.buf))) {
CALLOC_ARRAY(list, 1);
list->type = type;
11 changes: 11 additions & 0 deletions path-walk.h
@@ -6,6 +6,7 @@

struct rev_info;
struct oid_array;
struct pattern_list;

/**
* The type of a function pointer for the method that is called on a list of
@@ -30,6 +31,16 @@ struct path_walk_info {
*/
path_fn path_fn;
void *path_fn_data;

/**
* Specify a sparse-checkout definition to match our paths to. Do not
* walk outside of this sparse definition. If the patterns are in
* cone mode, then the search may prune directories that are outside
* of the cone. If not in cone mode, then all tree paths will be
* explored but the path_fn will only be called when the path matches
* the sparse-checkout patterns.
*/
struct pattern_list *pl;
};

#define PATH_WALK_INFO_INIT { 0 }
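
The comment added to `struct path_walk_info` and the matching `add_children()` hunk describe two different pruning rules. The standalone sketch below restates that decision as a pure function. `should_skip_path()`, `enum match_result` and `enum entry_type` are names made up here for illustration; Git's own `enum pattern_match_result` and object types are replaced by local mirrors so that the snippet compiles by itself.

```c
#include <assert.h>
#include <stdbool.h>

/* Local mirror of the match results used in the hunk; values are illustrative. */
enum match_result {
	RESULT_UNDECIDED = -1,
	RESULT_NOT_MATCHED = 0,
	RESULT_MATCHED = 1,
	RESULT_MATCHED_RECURSIVE = 2
};

/* Only the two entry kinds that matter for the pruning decision. */
enum entry_type { ENTRY_TREE, ENTRY_BLOB };

/*
 * Return true when the walk should skip this path.
 *
 * Cone mode: anything that is definitely outside the cone is skipped,
 * trees included, so whole directories are pruned.
 *
 * Non-cone mode: trees are always explored, because a pattern may match
 * a file at any depth; only blobs that do not match are skipped.
 */
static bool should_skip_path(bool cone_mode, enum entry_type type,
			     enum match_result match)
{
	assert(type == ENTRY_TREE || type == ENTRY_BLOB);

	if (cone_mode)
		return match == RESULT_NOT_MATCHED;
	return type == ENTRY_BLOB && match != RESULT_MATCHED;
}
```

This mirrors why the non-cone test in `t/t5620-backfill.sh` notes that the walk "needed to visit all directories": no tree is ever pruned in that mode.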
55 changes: 55 additions & 0 deletions t/t5620-backfill.sh
@@ -80,6 +80,61 @@ test_expect_success 'do partial clone 2, backfill batch size' '
test_line_count = 0 revs2
'

test_expect_success 'backfill --sparse' '
git clone --sparse --filter=blob:none \
--single-branch --branch=main \
"file://$(pwd)/srv.bare" backfill3 &&

# Initial checkout includes four files at root.
git -C backfill3 rev-list --quiet --objects --missing=print HEAD >missing &&
test_line_count = 44 missing &&

# Initial sparse-checkout is just the files at root, so we get the
# older versions of the four files at tip.
GIT_TRACE2_EVENT="$(pwd)/sparse-trace1" git \
-C backfill3 backfill --sparse &&
test_trace2_data promisor fetch_count 4 <sparse-trace1 &&
test_trace2_data path-walk paths 5 <sparse-trace1 &&
git -C backfill3 rev-list --quiet --objects --missing=print HEAD >missing &&
test_line_count = 40 missing &&

# Expand the sparse-checkout to include 'd' recursively. This
# engages the algorithm to skip the trees for 'a'. Note that
# the "sparse-checkout set" command downloads the objects at tip
# to satisfy the current checkout.
git -C backfill3 sparse-checkout set d &&
GIT_TRACE2_EVENT="$(pwd)/sparse-trace2" git \
-C backfill3 backfill --sparse &&
test_trace2_data promisor fetch_count 8 <sparse-trace2 &&
test_trace2_data path-walk paths 15 <sparse-trace2 &&
git -C backfill3 rev-list --quiet --objects --missing=print HEAD >missing &&
test_line_count = 24 missing
'

test_expect_success 'backfill --sparse without cone mode' '
git clone --no-checkout --filter=blob:none \
--single-branch --branch=main \
"file://$(pwd)/srv.bare" backfill4 &&

# No blobs yet
git -C backfill4 rev-list --quiet --objects --missing=print HEAD >missing &&
test_line_count = 48 missing &&

# Define sparse-checkout by filename regardless of parent directory.
# This downloads 6 blobs to satisfy the checkout.
git -C backfill4 sparse-checkout set --no-cone "**/file.1.txt" &&
git -C backfill4 checkout main &&

GIT_TRACE2_EVENT="$(pwd)/no-cone-trace1" git \
-C backfill4 backfill --sparse &&
test_trace2_data promisor fetch_count 6 <no-cone-trace1 &&

# This walk needed to visit all directories to search for these paths.
test_trace2_data path-walk paths 12 <no-cone-trace1 &&
git -C backfill4 rev-list --quiet --objects --missing=print HEAD >missing &&
test_line_count = 36 missing
'

. "$TEST_DIRECTORY"/lib-httpd.sh
start_httpd
