From f3b43cc7866c2a811368f9a98ad3a4579263e511 Mon Sep 17 00:00:00 2001
From: Christian Schilling
Date: Tue, 18 Nov 2025 18:25:27 +0100
Subject: [PATCH 1/2] Extend commit message filter

This adds support for reading tree entries to the commit message filter.
---
 deck_prague.md                       | 509 +++++++++++++++++++++++++++
 docs/src/reference/filters.md        |  24 +-
 josh-core/src/filter/mod.rs          |  29 +-
 josh-core/src/filter/text.rs         |  56 ++-
 josh-core/src/history.rs             |   2 +-
 tests/experimental/link-submodules.t |  24 +-
 tests/experimental/link.t            |   2 +-
 tests/filter/message.t               |  78 +++-
 tests/filter/message_regex.t         |  44 ++-
 9 files changed, 723 insertions(+), 45 deletions(-)
 create mode 100644 deck_prague.md

diff --git a/deck_prague.md b/deck_prague.md
new file mode 100644
index 000000000..3fa663c93
--- /dev/null
+++ b/deck_prague.md
@@ -0,0 +1,509 @@
+Beyond Subtrees: An Algebra for Git
+===================================
+
+### 2025/11/28 Prague Rust Meetup
+*Christian Schilling*
+
+
+
+
+# **The two camps**
+
+
+
+
+*Polyrepo*
+
+**vs**
+
+*Monorepo*
+
+
+
+# **No good options**
+
+### **Polyrepos are bad:**
+- **Fixed** partitioning of codebase
+- Repo sprawl
+- Delayed updates
+- Impossible to validate cross-repo changes
+- Dependency management
+- Hard to enforce "One version rule" 🦩
+- Who uses this?
+- Lifetime of references
+- Rename a repo?
+- leftpad
+
+
+- Requires heavy tooling investment
+
+
+
+### **Monorepos are bad:**
+- Huge clones
+- Slow builds / slow CI
+- Risk of tight coupling
+- Team boundaries unclear
+- No visibility control (ACL)
+- Breaks distributed model ‼️
+
+
+- Requires heavy tooling investment
+
+
+
+
+# **There is a *gap* between both models**
+
+We want:
+- Monorepo **consistency** & **sustainability**
+- Polyrepo **isolation** & **distribution**
+
+
+> We need a bridge!
+
+
+Attempts:
+- git submodule
+- git subtree
+- Copybara
+- Sparse checkout
+- Snapshot vendoring
+- Package managers
+- ...
+
+
+> **Thesis: All of those solve only _parts_ of the problem**
+
+
+
+# **Wishlist vs Reality**
+
+> 📚 **History preservation**
+> 🫵 **git blame should work**
+> 📤 **Upstream workflow**
+> 🔒 **SHA stability / round-trip**
+> 🌐 **Partial sharing / distributed repos**
+> ⚡️ **Performance**
+> 🧩 **Transforms** (excludes, overlays, ...)
+> 🎯 **Focused CI scoping**
+> 🧬 **Evolution over time**
+
+
+> ❌ -> no | 😫 -> painful | ✅ -> yes
+
+
+
+| | 📚 | 🫵 | 📤 | 🔒 | 🌐 | ⚡️ | 🧩 | 🎯 | 🧬
+|----------------------|----|----|----|----|----|----|----|----|----
+| Sparse checkout | ✅ | ✅ | ✅ | ✅ | ❌ | ✅ | ❌ | ❌ | ✅
+| git submodule | ✅ | 😫 | 😫 | ✅ | 😫 | ✅ | ❌ | 😫 | 💩
+| git subtree | ✅ | 💔 | 😫 | ✅ | 😫 | 💀 | ❌ | 😫 | 😫
+| Snapshot vendoring | ❌ | ❌ | ✅ | ❌ | 😫 | ✅ | 😫 | 😫 | 😫
+| Copybara | ❌ | ❌ | 😫 | ❌ | 😫 | 🐌 | ✅ | 😫 | 😫
+| Package managers | ❌ | ❌ | 😫 | ❌ | 😫 | ✅ | 😫 | 😫 | 😫
+
+
+
+
+
+| Josh (Projections) | ✅ | ✅ | ✅ | ✅ | ✨ | ✅ | ✅ | ✅ | ✨
+
+
+
+
+
+
+**Why?**
+
+
+**How?**
+
+
+
+# **Why These Tools Fail**
+
+They operate in terms of:
+- Files
+- Directories
+- Scripts
+- Imperative **procedures**
+
+
+But Josh operates in terms of:
+- Graphs
+- Functional **relationships**
+
+
+
+# **Example usage**
+
+```
+git clone https://github.com/josh-project/josh.git josh
+```
+
+
+Same via proxy:
+```
+git clone https://josh-project.dev/josh.git josh
+```
+
+
+Subtree via proxy:
+```
+git clone https://josh-project.dev/josh.git:/docs.git josh-docs
+```
+
+
+Subtree via CLI, without proxy:
+```
+josh clone https://github.com/josh-project/josh.git :/docs josh-docs
+```
+
+
+
+
+
+```
+:/docs
+```
+What is that? 🤔
+
+
+
+# **An algebra for Git**
+
+Trees and commits are immutable objects.
+
+Filters:
+```
+Tree → Tree
+Commit → Commit
+```
+
+
+
+Properties:
+- deterministic
+- composable
+- partially invertible
+
+
+
+# **The Primitive Operations**
+
+1. Subtree
+2. Chain
+3. Inversion
+4. Prefix
+5. Compose
+6. Exclude
+
+Symbols:
+  * p = Path
+  * t = Tree
+  * A, B, ... = Filter
+
+
+
+# **Subtree**
+
+```
+:/p
+```
+
+Keeps `p/**`, drops everything else, strips prefix.
+
+
+
+# **Chain**
+
+```
+:A:B:C
+```
+
+Applies each filter to the result of the previous one
+
+
+
+
+Associative:
+```
+:[:A:B]:C == :A:[:B:C]
+```
+
+
+```
+:/a/b/c == :/a:/b:/c
+```
+
+
+
+# **Inversion**
+
+```
+:F:invert[:F]:F == :F
+```
+
+
+
+Inverse of a chain:
+```
+:invert[:A:B] == :invert[:B]:invert[:A]
+```
+
+
+
+
+Not **all** filters *have* inverses!
+...that's ok.
+
+
+
+# **Prefix**
+
+```
+:prefix=p
+```
+
+Adds one level of hierarchy to the tree
+
+
+
+inverse of `:prefix=p` -> `:/p` (subdir)
+```
+:prefix=p:/p == :/
+:prefix=p:/p:prefix=p == :prefix=p
+:/:prefix=p == :prefix=p
+```
+
+
+
+inverse of `:/p` -> `:prefix=p`
+```
+:/p:prefix=p:/p == :/p
+:/p:/ == :/p
+```
+
+
+
+Shorthand notation: `::p/` == `:/p:prefix=p` (selection)
+
+inverse of `::p/` -> `::p/`
+
+
+
+# **Compose**
+
+```
+:[:A, :B, :C, ...]
+```
+
+Like a "union" of trees
+
+
+
+Associative
+```
+:[:A, :B, :C] == :[:[:A, :B], :C] == :[:A, :[:B, :C]]
+```
+
+
+
+Distributive
+```
+:[:X:A, :X:B] == :X:[:A, :B]
+:[:A:X, :B:X] == :[:A, :B]:X
+```
+
+
+
+Not commutative
+```
+:[:A, :B] != :[:B, :A]
+```
+
+
+(except when `:invert[:[:A, :B]] == :[:A, :B]`)
+
+
+
+Inverse
+```
+:invert[:[:A, :B]] == :[:invert[:A], :invert[:B]]
+```
+
+
+
+
+# **Exclude**
+
+```
+:exclude[:F]
+```
+
+Keeps all files except those selected by `:F`
+
+
+
+Inverse:
+```
+:invert[:exclude[:F]] == :exclude[:F]
+```
+
+
+
+# **Push via filter**
+
+Given HEAD in the full (main) repo with tree t⁰, the projected HEAD has the tree:
+```
+t₀ = :F(t⁰)
+```
+
+
+After modifications we get the updated projection HEAD:
+```
+t₀ -> t₁
+```
+
+
+
+How do we get t¹?
+
+```
+t¹ == :invert[:F](t₁)
+```
+
+... no, missing files
+
+```
+t¹ == :[
+    :invert[:F](t₁)
+    :/(t⁰)
+]
+```
+
+... no, does not handle deletions
+
+
+```
+t¹ == :[
+    :invert[:F](t₁)
+    :exclude[:F:invert[:F]](t⁰)
+]
+```
+
+
+(it does get a bit more complicated with merges, but this is the basic idea)
+
+
+
+
+
+Filters are configuration *about* the __relationships__ between repos...
+
+
+Where to put them?
+
+
+
+# **Stored Filters - Versioned Architecture**
+
+Stored Filters:
+- Live inside the repo as `*.josh` files
+- Versioned
+- Evolve with code
+- Define per-commit projections
+- Remove external configuration
+
+
+
+
+
+Common use case:
+- Each target gets its own repo, including its dependencies
+- Perfect sandbox: Only declared parts of the repo are available
+- SHA changes only if real dependencies change
+- Exclude docs from the SHA
+- Include *only* docs in the SHA
+- ...
+
+Save filter as `path/to/filter.josh` and then:
+
+```
+:+path/to/filter
+```
+
+
+Yes, stored filters can reference other stored filters
+
+
+
+# **GraphQL API - Query Projections Without Cloning**
+
+Query:
+- Projected trees
+- Commit graphs
+- History under projections
+- Hashes ✨
+
+
+
+-> CI can make accurate decisions on what to build *before* cloning any repo ⚡️
+
+
+
+# **Future Directions**
+
+
+
+- Projection-aware merge bot
+- Projection-aware (stacked!) code review
+- Scriptable filters
+- Path-level ACLs
+- Support (much!) larger repos
+- Use git+josh as a database for issues, wikis, ...
+
+
+
+
+
+# **Summary**
+1. Monorepo & Polyrepo are both painful
+2. The gap can be bridged
+3. Relationships over procedures
+4. Once projections are "given", a lot of possibilities open up
+
+
+
+# **Q&A**
+
+
+Questions?
+
+
+
+```bash +exec
+FILTER="::josh-cli/"
+git archive $(josh-filter $FILTER) | tar -t | tree --fromfile
+```
+
+
+```bash +exec
+FILTER=":/josh-cli"
+git archive $(josh-filter $FILTER) | tar -t | tree --fromfile
+```
+
+
+```bash +exec
+FILTER="::*.toml"
+git archive $(josh-filter $FILTER) | tar -t | tree --fromfile
+```
+
+
+```bash +exec
+FILTER=":[::josh-core/,::josh-cli/]:exclude[::**/*.rs]"
+git archive $(josh-filter $FILTER) | tar -t | tree --fromfile
+```
+
+
diff --git a/docs/src/reference/filters.md b/docs/src/reference/filters.md
index 3b0a3a786..c7ad6ff85 100644
--- a/docs/src/reference/filters.md
+++ b/docs/src/reference/filters.md
@@ -183,7 +183,8 @@ parents.
 ### Commit message rewriting
 **`:"template"`** or **`:"template";"regex"`**
 Rewrite commit messages using a template string. The template can use regex capture groups
-to extract and reformat parts of the original commit message.
+to extract and reformat parts of the original commit message, as well as special template variables
+for commit metadata.
 
 **Simple message replacement:**
 ```
@@ -200,6 +201,27 @@ which are then used in the template.
 The regex `(?s)^(?P<type>fix|feat|docs): (?P<message>.+)$` matches
 commit messages starting with "fix:", "feat:", or "docs:" followed by a message, and the template
 reformats them as `[type] message`.
 
+**Using template variables:**
+The template supports special variables that provide access to commit metadata:
+- `{#}` - The tree object ID (SHA-1 hash) of the commit
+- `{@}` - The commit object ID (SHA-1 hash)
+- `{/path}` - The content of the file at the specified path in the commit tree (empty if the path does not exist)
+- `{#path}` - The object ID (SHA-1 hash) of the tree entry at the specified path (all zeros if the path does not exist)
+
+Regex capture groups take priority over template variables. If a regex capture group has the same name as a template variable, the capture group value will be used.
+
+Example:
+```
+:"Message: {#} {@}"
+```
+This replaces commit messages with "Message: " followed by the tree ID and commit ID.
+
+**Combining regex capture groups and template variables:**
+```
+:"[{type}] {message} (commit: {@})";"(?s)^(?P<type>Original) (?P<message>.+)$"
+```
+This combines regex capture groups (`{type}` and `{message}`) with template variables (`{@}` for the commit ID).
+
+
 **Removing text from messages:**
 ```
 :"";"TODO"
diff --git a/josh-core/src/filter/mod.rs b/josh-core/src/filter/mod.rs
index 4d0f98c19..1482d32fb 100644
--- a/josh-core/src/filter/mod.rs
+++ b/josh-core/src/filter/mod.rs
@@ -1054,7 +1054,7 @@ fn apply_to_commit2(
             for (root, _link_file) in v {
                 let embeding = some_or!(
                     apply_to_commit2(
-                        &Op::Chain(message("{commit}"), file(root.join(".josh-link.toml"))),
+                        &Op::Chain(message("{@}"), file(root.join(".josh-link.toml"))),
                         &commit,
                         transaction
                     )?,
@@ -1377,9 +1377,6 @@ fn apply2<'a>(
             let tree_id = x.tree().id().to_string();
             let commit = x.commit;
             let commit_id = commit.to_string();
-            let mut hm = std::collections::HashMap::<String, String>::new();
-            hm.insert("tree".to_string(), tree_id);
-            hm.insert("commit".to_string(), commit_id);
 
             let message = if let Some(ref m) = x.message {
                 m.to_string()
@@ -1391,7 +1388,29 @@ fn apply2<'a>(
                 }
             };
 
-            Ok(x.with_message(text::transform_with_template(&r, &m, &message, &hm)?))
+            let tree = x.tree().clone();
+            Ok(x.with_message(text::transform_with_template(
+                &r,
+                &m,
+                &message,
+                |key: &str| -> Option<String> {
+                    match key {
+                        "#" => Some(tree_id.clone()),
+                        "@" => Some(commit_id.clone()),
+                        key if key.starts_with("/") => {
+                            Some(tree::get_blob(repo, &tree, std::path::Path::new(&key[1..])))
+                        }
+
+                        key if key.starts_with("#") => Some(
+                            tree.get_path(std::path::Path::new(&key[1..]))
+                                .map(|e| e.id())
+                                .unwrap_or(git2::Oid::zero())
+                                .to_string(),
+                        ),
+                        _ => None,
+                    }
+                },
+            )?))
         }
         Op::HistoryConcat(..) => Ok(x),
         Op::Linear => Ok(x),
diff --git a/josh-core/src/filter/text.rs b/josh-core/src/filter/text.rs
index 14d8b20e4..3703cca3f 100644
--- a/josh-core/src/filter/text.rs
+++ b/josh-core/src/filter/text.rs
@@ -2,43 +2,61 @@ use crate::JoshResult;
 use regex::Regex;
 use std::cell::RefCell;
 use std::collections::HashMap;
+use std::fmt::Write;
 
-pub fn transform_with_template(
+pub fn transform_with_template<F>(
     re: &Regex,
     template: &str,
     input: &str,
-    globals: &HashMap<String, String>,
-) -> JoshResult<String> {
+    globals: F,
+) -> JoshResult<String>
+where
+    F: Fn(&str) -> Option<String>,
+{
     let first_error: RefCell> = RefCell::new(None);
 
     let result = re
         .replace_all(input, |caps: &regex::Captures| {
-            // Build a HashMap with all named captures and globals
-            // We need to store the string values to keep them alive for the HashMap references
-            let mut string_storage: HashMap<String, String> = HashMap::new();
-
             // Collect all named capture values
+            let mut string_storage: HashMap<String, String> = HashMap::new();
             for name in re.capture_names().flatten() {
                 if let Some(m) = caps.name(name) {
                     string_storage.insert(name.to_string(), m.as_str().to_string());
                 }
             }
 
-            // Build the HashMap for strfmt with references to the stored strings
-            let mut vars: HashMap<String, &dyn strfmt::DisplayStr> = HashMap::new();
+            // Use strfmt_map which calls our function for each key it needs
+            match strfmt::strfmt_map(
+                template,
+                |mut fmt: strfmt::Formatter| -> Result<(), strfmt::FmtError> {
+                    let key = fmt.key;
 
-            // Add all globals first (lower priority)
-            for (key, value) in globals {
-                vars.insert(key.clone(), value as &dyn strfmt::DisplayStr);
-            }
+                    // First check named captures (higher priority)
+                    if let Some(value) = string_storage.get(key) {
+                        write!(fmt, "{}", value).map_err(|_| {
+                            strfmt::FmtError::Invalid(format!(
+                                "failed to write value for key: {}",
+                                key
+                            ))
+                        })?;
+                        return Ok(());
+                    }
 
-            // Add all named captures (higher priority - will overwrite globals if there's a conflict)
-            for (key, value) in &string_storage {
-                vars.insert(key.clone(), value as &dyn strfmt::DisplayStr);
-            }
+                    // Then call globals function (lower priority)
+                    if let Some(global_value) = globals(key) {
+                        write!(fmt, "{}", global_value).map_err(|_| {
+                            strfmt::FmtError::Invalid(format!(
+                                "failed to write global value for key: {}",
+                                key
+                            ))
+                        })?;
+                        return Ok(());
+                    }
 
-            // Format the template, propagating errors
-            match strfmt::strfmt(template, &vars) {
+                    // Key not found - skip it (strfmt will leave the placeholder)
+                    fmt.skip()
+                },
+            ) {
                 Ok(s) => s,
                 Err(e) => {
                     let mut error = first_error.borrow_mut();
diff --git a/josh-core/src/history.rs b/josh-core/src/history.rs
index db40b2232..f5c6ca906 100644
--- a/josh-core/src/history.rs
+++ b/josh-core/src/history.rs
@@ -706,7 +706,7 @@ pub fn unapply_filter(
                 &regex::Regex::new("(?m)^Change: [^ ]+")?,
                 &"",
                 module_commit.message_raw().unwrap(),
-                &std::collections::HashMap::new(),
+                |_key: &str| -> Option<String> { None },
             )?;
             apply = apply.with_message(new_message);
         }
diff --git a/tests/experimental/link-submodules.t b/tests/experimental/link-submodules.t
index 266f16d51..ade2bcdc1 100644
--- a/tests/experimental/link-submodules.t
+++ b/tests/experimental/link-submodules.t
@@ -64,7 +64,7 @@ Test Adapt filter - should expand submodule into actual tree content
   [1] :embed=libs
   [2] ::libs/.josh-link.toml
   [2] :unapply(06d10a853b133ffc533e8ec3f2ed4ec43b64670c:/libs)
-  [3] :"{commit}"
+  [3] :"{@}"
   [3] :adapt=submodules
   [3] :link=embedded
   [10] sequence_number
@@ -145,7 +145,7 @@ Test Adapt with multiple submodules
   [1] :embed=libs
   [2] ::libs/.josh-link.toml
   [2] :unapply(06d10a853b133ffc533e8ec3f2ed4ec43b64670c:/libs)
-  [3] :"{commit}"
+  [3] :"{@}"
   [3] :link=embedded
   [4] :adapt=submodules
   [11] sequence_number
@@ -165,7 +165,7 @@ Test Adapt with multiple submodules
   [2] ::libs/.josh-link.toml
   [2] ::modules/another/.josh-link.toml
   [2] :unapply(06d10a853b133ffc533e8ec3f2ed4ec43b64670c:/libs)
-  [4] :"{commit}"
+  [4] :"{@}"
   [4] :adapt=submodules
   [4] :link=embedded
   [15] sequence_number
@@ -247,7 +247,7 @@ Test Adapt with submodule changes - add commits to submodule and update
   [2] :unapply(06d10a853b133ffc533e8ec3f2ed4ec43b64670c:/libs)
   [3] ::libs/.josh-link.toml
   [4] :unapply(f4bfdb82ca5e0f06f941f68be2a0fd19573bc415:/libs)
-  [5] :"{commit}"
+  [5] :"{@}"
   [5] :adapt=submodules
   [5] :link=embedded
   [21] sequence_number
@@ -309,7 +309,7 @@ Test Adapt with submodule changes - add commits to submodule and update
   [2] :unapply(06d10a853b133ffc533e8ec3f2ed4ec43b64670c:/libs)
   [3] ::libs/.josh-link.toml
   [4] :unapply(f4bfdb82ca5e0f06f941f68be2a0fd19573bc415:/libs)
-  [5] :"{commit}"
+  [5] :"{@}"
   [5] :adapt=submodules
   [5] :link=embedded
   [9] :/libs
@@ -335,7 +335,7 @@ Test Adapt with submodule changes - add commits to submodule and update
   [2] :unapply(06d10a853b133ffc533e8ec3f2ed4ec43b64670c:/libs)
   [3] ::libs/.josh-link.toml
   [4] :unapply(f4bfdb82ca5e0f06f941f68be2a0fd19573bc415:/libs)
-  [5] :"{commit}"
+  [5] :"{@}"
   [5] :adapt=submodules
   [5] :link=embedded
   [7] :prune=trivial-merge
@@ -360,7 +360,7 @@ Test Adapt with submodule changes - add commits to submodule and update
   [2] :unapply(06d10a853b133ffc533e8ec3f2ed4ec43b64670c:/libs)
   [3] ::libs/.josh-link.toml
   [4] :unapply(f4bfdb82ca5e0f06f941f68be2a0fd19573bc415:/libs)
-  [5] :"{commit}"
+  [5] :"{@}"
   [5] :adapt=submodules
   [5] :link=embedded
   [7] :export
@@ -380,7 +380,7 @@ Test Adapt with submodule changes - add commits to submodule and update
   [2] :unapply(06d10a853b133ffc533e8ec3f2ed4ec43b64670c:/libs)
   [3] ::libs/.josh-link.toml
   [4] :unapply(f4bfdb82ca5e0f06f941f68be2a0fd19573bc415:/libs)
-  [5] :"{commit}"
+  [5] :"{@}"
   [5] :adapt=submodules
   [5] :link=embedded
   [7] :export
@@ -401,7 +401,7 @@ Test Adapt with submodule changes - add commits to submodule and update
   [3] ::libs/.josh-link.toml
   [4] :/another
   [4] :unapply(f4bfdb82ca5e0f06f941f68be2a0fd19573bc415:/libs)
-  [5] :"{commit}"
+  [5] :"{@}"
   [5] :adapt=submodules
   [5] :link=embedded
   [7] :/modules
@@ -425,7 +425,7 @@ Test Adapt with submodule changes - add commits to submodule and update
   [3] ::libs/.josh-link.toml
   [4] :/another
   [4] :unapply(f4bfdb82ca5e0f06f941f68be2a0fd19573bc415:/libs)
-  [5] :"{commit}"
+  [5] :"{@}"
   [5] :adapt=submodules
   [5] :link=embedded
   [7] :/modules
@@ -465,7 +465,7 @@ Test Adapt with submodule changes - add commits to submodule and update
   [3] ::libs/.josh-link.toml
   [4] :/another
   [4] :unapply(f4bfdb82ca5e0f06f941f68be2a0fd19573bc415:/libs)
-  [5] :"{commit}"
+  [5] :"{@}"
   [5] :adapt=submodules
   [5] :link=embedded
   [7] :/modules
@@ -503,7 +503,7 @@ Test Adapt with submodule changes - add commits to submodule and update
   [4] :/another
   [4] :prefix=libs
   [4] :unapply(f4bfdb82ca5e0f06f941f68be2a0fd19573bc415:/libs)
-  [5] :"{commit}"
+  [5] :"{@}"
   [5] :adapt=submodules
   [5] :link=embedded
   [7] :/modules
diff --git a/tests/experimental/link.t b/tests/experimental/link.t
index a991a0d63..76bb5c032 100644
--- a/tests/experimental/link.t
+++ b/tests/experimental/link.t
@@ -33,7 +33,7 @@ Test Link filter (identical to Adapt)
   $ josh-filter -s :adapt=submodules:link master --update refs/josh/filter/master
   [1] :embed=libs
   [1] :unapply(a1520c70819abcbe295fe431e4b88cf56f5a0c95:/libs)
-  [2] :"{commit}"
+  [2] :"{@}"
   [2] ::libs/.josh-link.toml
   [2] :adapt=submodules
   [2] :link=embedded
diff --git a/tests/filter/message.t b/tests/filter/message.t
index 8eb451fc5..f2e9e571d 100644
--- a/tests/filter/message.t
+++ b/tests/filter/message.t
@@ -34,14 +34,82 @@ Test that message rewriting works
   $ echo contents1 > file1
   $ git add file1
-  $ git commit -m "commit with {tree} and {commit}" 1> /dev/null
+  $ git commit -m "commit with {#} and {@}" 1> /dev/null
 
 Test that message rewriting with template variables works
-  $ josh-filter ':"Message: {tree} {commit}"' --update refs/josh/filter/master master
-  025b01893026c240e56c95e6e8f1659aa417581e
+  $ josh-filter ':"Message: {#} {@}"' --update refs/josh/filter/master master
+  1d858b36701f0d673e34f0f601a048b9c9c8d114
   $ git log --pretty=%s josh/filter/master
-  Message: 3d77ff51363c9825cc2a221fc0ba5a883a1a2c72 8e125b48e2286c74bf9be1bbb8d3034a7370eebc
+  Message: 3d77ff51363c9825cc2a221fc0ba5a883a1a2c72 2c0be119f4925350c097c9e206dfa6353158bba3
   $ git cat-file commit josh/filter/master | grep -A 1 "^$"
   
-  Message: 3d77ff51363c9825cc2a221fc0ba5a883a1a2c72 8e125b48e2286c74bf9be1bbb8d3034a7370eebc
+  Message: 3d77ff51363c9825cc2a221fc0ba5a883a1a2c72 2c0be119f4925350c097c9e206dfa6353158bba3
+
+  $ cd ${TESTTMP}
+  $ git init -q testrepo3 1> /dev/null
+  $ cd testrepo3
+
+  $ echo "file content" > file1
+  $ mkdir -p subdir
+  $ echo "nested content" > subdir/file2
+  $ git add file1 subdir/file2
+  $ git commit -m "initial commit" 1> /dev/null
+
+Test that message rewriting with file content template variable works
+  $ josh-filter ':"File content: {/file1}"' --update refs/josh/filter/master master
+  cd7b44dc763fe78dc0b759398e689e54aa131eb5
+  $ git log --pretty=%s josh/filter/master
+  File content: file content
+  $ git cat-file commit josh/filter/master | grep -A 1 "^$"
+  
+  File content: file content
+
+Test that message rewriting with nested file path works
+  $ josh-filter ':"Nested: {/subdir/file2}"' --update refs/josh/filter/master master
+  23f3df907d06d6269adfc749e57b0c2974d66181
+  $ git log --pretty=%s josh/filter/master
+  Nested: nested content
+  $ git cat-file commit josh/filter/master | grep -A 1 "^$"
+  
+  Nested: nested content
+
+Test that message rewriting with tree entry OID works
+  $ josh-filter ':"File OID: {#file1}"' --update refs/josh/filter/master master
+  f90332f7fe886418042703808cca42bf1e33af7c
+  $ git log --pretty=%s josh/filter/master | head -1
+  File OID: * (glob)
+  $ git cat-file commit josh/filter/master | grep -A 1 "^$" | head -1
+  
+
+Test that message rewriting with nested tree entry OID works
+  $ josh-filter ':"Nested OID: {#subdir/file2}"' --update refs/josh/filter/master master
+  7c6a0f3f4866f824e3d88a7d3277f85d2c1c62f5
+  $ git log --pretty=%s josh/filter/master | head -1
+  Nested OID: * (glob)
+  $ git cat-file commit josh/filter/master | grep -A 1 "^$" | head -1
+  
+
+Test that non-existent file path returns empty content
+  $ josh-filter ':"Missing: [{/nonexistent}]"' --update refs/josh/filter/master master
+  8bf5b583555dd6c4765f3c34515de7e6c79813ac
+  $ git log --pretty=%s josh/filter/master | head -1
+  Missing: []
+  $ git cat-file commit josh/filter/master | grep -A 1 "^$" | head -1
+  
+
+Test that non-existent tree entry returns zero OID
+  $ josh-filter ':"Missing OID: {#nonexistent}"' --update refs/josh/filter/master master
+  f63a6621696edc2b9ccec9a2ccd042af6276b081
+  $ git log --pretty=%s josh/filter/master | head -1
+  Missing OID: 0000000000000000000000000000000000000000
+  $ git cat-file commit josh/filter/master | grep -A 1 "^$" | head -1
+  
+
+Test combining multiple template variables
+  $ josh-filter ':"Tree: {#}, Commit: {@}, File: {/file1}, OID: {#file1}"' --update refs/josh/filter/master master
+  5be71b6c02eb9a6aa6c1d4cd1fb2b682d732a940
+  $ git log --pretty=%s josh/filter/master | head -1
+  Tree: * (glob)
+  $ git cat-file commit josh/filter/master | grep -A 1 "^$" | head -1
+  
diff --git a/tests/filter/message_regex.t b/tests/filter/message_regex.t
index 268362ce3..d8241585f 100644
--- a/tests/filter/message_regex.t
+++ b/tests/filter/message_regex.t
@@ -37,7 +37,7 @@ Test that message rewriting with regex works
   $ git commit -m "Original commit message" 1> /dev/null
 
 Test that message rewriting with regex and template variables works
-  $ josh-filter ':"[{type}] {message} (commit: {commit})";"(?s)^(?P<type>Original) (?P<message>.+)$"' --update refs/josh/filter/master master
+  $ josh-filter ':"[{type}] {message} (commit: {@})";"(?s)^(?P<type>Original) (?P<message>.+)$"' --update refs/josh/filter/master master
   7f14701ff3a86f0e511cfd76d41715cac7dc7999
   $ git log --pretty=%s josh/filter/master
   [Original] commit message (commit: 16421eebc58313502a347bc92349cc2f52d58fbd)
@@ -73,3 +73,45 @@ Test that message rewriting can remove multiple occurrences from a message with
   Body line 3 with TODO
   
 
+  $ cd ${TESTTMP}
+  $ git init -q testrepo4 1> /dev/null
+  $ cd testrepo4
+
+  $ echo "test content" > file1
+  $ git add file1
+  $ git commit -m "test message with commit abc123" 1> /dev/null
+
+Test that regex capture groups work alongside template variables
+  $ josh-filter ':"Capture: {commit}, Template: {@}";"(?s)^test message with commit (?P<commit>abc123)$"' --update refs/josh/filter/master master
+  a64fd2e1f9bcd62a3fbbe90769828fe9e10f32b7
+  $ git log --pretty=%s josh/filter/master
+  test message with commit abc123
+  $ git cat-file commit josh/filter/master | grep -A 1 "^$"
+  
+  test message with commit abc123
+
+  $ cd ${TESTTMP}
+  $ git init -q testrepo5 1> /dev/null
+  $ cd testrepo5
+
+  $ echo "file data" > data.txt
+  $ git add data.txt
+  $ git commit -m "Data: important" 1> /dev/null
+
+Test combining regex capture groups with file content template variable
+  $ josh-filter ':"Type: {type}, File: {/data.txt}";"(?s)^(?P<type>Data): (?P<message>.+)$"' --update refs/josh/filter/master master
+  af9a05b27d9377ada889a8a51c39e80c272d217c
+  $ git log --pretty=%s josh/filter/master
+  Type: Data, File: file data
+  $ git cat-file commit josh/filter/master | grep -A 1 "^$"
+  
+  Type: Data, File: file data
+
+Test combining regex capture groups with tree entry OID template variable
+  $ josh-filter ':"Type: {type}, OID: {#data.txt}";"(?s)^(?P<type>Data): (?P<message>.+)$"' --update refs/josh/filter/master master
+  a3c8a56fefd02521226116906719a44d826abea7
+  $ git log --pretty=%s josh/filter/master | head -1
+  Type: Data, OID: * (glob)
+  $ git cat-file commit josh/filter/master | grep -A 1 "^$" | head -1
+  
+

From 591687ec6e6d40991f7a92cc4a71281c72d99ad7 Mon Sep 17 00:00:00 2001
From: Christian Schilling
Date: Sun, 28 Sep 2025 15:57:52 +0200
Subject: [PATCH 2/2] Implement :lookup filter

---
 josh-core/src/filter/mod.rs     |  75 ++++++++++++++++++++
 josh-core/src/filter/op.rs      |   4 ++
 josh-core/src/filter/parse.rs   |   2 +
 josh-core/src/filter/persist.rs |  36 ++++++++++
 tests/experimental/lookup.t     | 117 ++++++++++++++++++++++++++++++++
 5 files changed, 234 insertions(+)
 create mode 100644 tests/experimental/lookup.t

diff --git a/josh-core/src/filter/mod.rs b/josh-core/src/filter/mod.rs
index 1482d32fb..dc4e2467c 100644
--- a/josh-core/src/filter/mod.rs
+++ b/josh-core/src/filter/mod.rs
@@ -496,6 +496,14 @@ fn spec2(op: &Op) -> String {
         Op::Workspace(path) => {
             format!(":workspace={}", parse::quote_if(&path.to_string_lossy()))
         }
+        #[cfg(feature = "incubating")]
+        Op::Lookup(path) => {
+            format!(":lookup={}", parse::quote_if(&path.to_string_lossy()))
+        }
+        #[cfg(feature = "incubating")]
+        Op::Lookup2(oid) => {
+            format!(":lookup2={}", oid.to_string())
+        }
         Op::Stored(path) => {
             format!(":+{}", parse::quote_if(&path.to_string_lossy()))
         }
@@ -823,6 +831,71 @@ fn apply_to_commit2(
             apply(transaction, nf, Rewrite::from_commit(commit)?)?
         }
 
+        #[cfg(feature = "incubating")]
+        Op::Lookup(lookup_path) => {
+            let lookup_commit = if let Some(lookup_commit) =
+                apply_to_commit2(&Op::Subdir(lookup_path.clone()), &commit, transaction)?
+            {
+                lookup_commit
+            } else {
+                return Ok(None);
+            };
+
+            let op = Op::Lookup2(lookup_commit);
+
+            if let Some(start) = transaction.get(to_filter(op), commit.id()) {
+                transaction.insert(filter, commit.id(), start, true);
+                return Ok(Some(start));
+            } else {
+                return Ok(None);
+            }
+        }
+
+        #[cfg(feature = "incubating")]
+        Op::Lookup2(lookup_commit_id) => {
+            let lookup_commit = repo.find_commit(*lookup_commit_id)?;
+            for parent in lookup_commit.parents() {
+                let lookup_tree = lookup_commit.tree_id();
+                let cw = get_filter(
+                    transaction,
+                    &repo.find_tree(lookup_tree)?,
+                    &std::path::PathBuf::new().join(commit.id().to_string()),
+                );
+                if cw != filter::empty() {
+                    if let Some(start) =
+                        apply_to_commit2(&Op::Lookup2(parent.id()), &commit, transaction)?
+                    {
+                        transaction.insert(filter, commit.id(), start, true);
+                        return Ok(Some(start));
+                    } else {
+                        return Ok(None);
+                    }
+                }
+                break;
+            }
+            let lookup_tree = lookup_commit.tree_id();
+            let cw = get_filter(
+                transaction,
+                &repo.find_tree(lookup_tree)?,
+                &std::path::PathBuf::new().join(commit.id().to_string()),
+            );
+
+            if cw == filter::empty() {
+                // FIXME empty filter or no entry in table?
+                for parent in commit.parents() {
+                    if let Some(start) = apply_to_commit2(&op, &parent, transaction)? {
+                        transaction.insert(filter, commit.id(), start, true);
+                        return Ok(Some(start));
+                    } else {
+                        return Ok(None);
+                    }
+                }
+                return Ok(None);
+            }
+
+            Apply::from_commit(commit)?
+                .with_tree(apply(transaction, cw, Apply::from_commit(commit)?)?.into_tree())
+        }
         Op::Squash(Some(ids)) => {
             if let Some(sq) = ids.get(&LazyRef::Resolved(commit.id())) {
                 let oid = if let Some(oid) =
@@ -1637,6 +1710,8 @@ fn apply2<'a>(
             }
         }
         Op::Pin(_) => Ok(x),
+        #[cfg(feature = "incubating")]
+        Op::Lookup(_) | Op::Lookup2(_) => Err(josh_error("not applicable to tree")),
     }
 }
 
diff --git a/josh-core/src/filter/op.rs b/josh-core/src/filter/op.rs
index e995dc967..7d0ebf389 100644
--- a/josh-core/src/filter/op.rs
+++ b/josh-core/src/filter/op.rs
@@ -69,6 +69,10 @@ pub enum Op {
     Prefix(std::path::PathBuf),
     Subdir(std::path::PathBuf),
     Workspace(std::path::PathBuf),
+    #[cfg(feature = "incubating")]
+    Lookup(std::path::PathBuf),
+    #[cfg(feature = "incubating")]
+    Lookup2(git2::Oid),
     Stored(std::path::PathBuf),
 
     Pattern(String),
diff --git a/josh-core/src/filter/parse.rs b/josh-core/src/filter/parse.rs
index 5908649c2..07c8959b4 100644
--- a/josh-core/src/filter/parse.rs
+++ b/josh-core/src/filter/parse.rs
@@ -10,6 +10,8 @@ fn make_op(args: &[&str]) -> JoshResult<Op> {
         ["author", author, email] => Ok(Op::Author(author.to_string(), email.to_string())),
         ["committer", author, email] => Ok(Op::Committer(author.to_string(), email.to_string())),
         ["workspace", arg] => Ok(Op::Workspace(Path::new(arg).to_owned())),
+        #[cfg(feature = "incubating")]
+        ["lookup", arg] => Ok(Op::Lookup(Path::new(arg).to_owned())),
         ["prefix"] => Err(josh_error(indoc!(
             r#"
             Filter ":prefix" requires an argument.
diff --git a/josh-core/src/filter/persist.rs b/josh-core/src/filter/persist.rs
index 749fee2c4..545d3fe45 100644
--- a/josh-core/src/filter/persist.rs
+++ b/josh-core/src/filter/persist.rs
@@ -336,6 +336,16 @@ impl InMemoryBuilder {
                 let params_tree = self.build_str_params(&[hook.as_ref()]);
                 push_tree_entries(&mut entries, [("hook", params_tree)]);
             }
+            #[cfg(feature = "incubating")]
+            Op::Lookup(path) => {
+                let params_tree = self.build_str_params(&[path.to_string_lossy().as_ref()]);
+                push_tree_entries(&mut entries, [("lookup", params_tree)]);
+            }
+            #[cfg(feature = "incubating")]
+            Op::Lookup2(oid) => {
+                let params_tree = self.build_str_params(&[oid.to_string().as_ref()]);
+                push_tree_entries(&mut entries, [("lookup2", params_tree)]);
+            }
         }
 
         let tree = gix_object::Tree { entries };
@@ -640,6 +650,32 @@ fn from_tree2(repo: &git2::Repository, tree_oid: git2::Oid) -> JoshResult<Op> {
                 let path = std::str::from_utf8(path_blob.content())?;
                 Ok(Op::Stored(std::path::PathBuf::from(path)))
             }
+            #[cfg(feature = "incubating")]
+            "lookup" => {
+                let inner = repo.find_tree(entry.id())?;
+                let path_blob = repo.find_blob(
+                    inner
+                        .get_name("0")
+                        .ok_or_else(|| josh_error("lookup: missing path"))?
+                        .id(),
+                )?;
+                let path = std::str::from_utf8(path_blob.content())?;
+                Ok(Op::Lookup(std::path::PathBuf::from(path)))
+            }
+            #[cfg(feature = "incubating")]
+            "lookup2" => {
+                let inner = repo.find_tree(entry.id())?;
+                let oid_blob = repo.find_blob(
+                    inner
+                        .get_name("0")
+                        .ok_or_else(|| josh_error("lookup2: missing oid"))?
+                        .id(),
+                )?;
+                let oid_str = std::str::from_utf8(oid_blob.content())?;
+                let oid = git2::Oid::from_str(oid_str)
+                    .map_err(|e| josh_error(&format!("lookup2: invalid oid: {}", e)))?;
+                Ok(Op::Lookup2(oid))
+            }
             "compose" => {
                 let compose_tree = repo.find_tree(entry.id())?;
                 let mut filters = Vec::new();
diff --git a/tests/experimental/lookup.t b/tests/experimental/lookup.t
new file mode 100644
index 000000000..efb64f4b6
--- /dev/null
+++ b/tests/experimental/lookup.t
@@ -0,0 +1,117 @@
+  $ export TERM=dumb
+  $ export RUST_LOG_STYLE=never
+
+  $ git init -q real_repo 1> /dev/null
+  $ cd real_repo
+
+  $ mkdir sub1
+  $ echo contents1 > sub1/file1
+  $ git add sub1
+  $ git commit -m "add file1" 1> /dev/null
+
+  $ mkdir sub1
+  mkdir: cannot create directory 'sub1': File exists
+  [1]
+  $ echo contents2 > sub1/file2
+  $ git add sub1
+  $ git commit -m "add file2" 1> /dev/null
+
+  $ git log --graph --pretty=%H
+  * 81b10fb4984d20142cd275b89c91c346e536876a
+  * bb282e9cdc1b972fffd08fd21eead43bc0c83cb8
+
+  $ mkdir table
+  $ echo ":prefix=x" > table/81b10fb4984d20142cd275b89c91c346e536876a
+  $ echo ":prefix=y" > table/bb282e9cdc1b972fffd08fd21eead43bc0c83cb8
+  $ git add table
+  $ git commit -m "add lookup table" 1> /dev/null
+
+
+  $ echo contents3 > sub1/file3
+  $ git add sub1
+  $ git commit -m "add file3" 1> /dev/null
+
+  $ git log --graph --pretty=%H
+  * 26e4c43675b985689e280bc42264a9226af76943
+  * 14c74c5eca73952b36d736034b388832748c49d6
+  * 81b10fb4984d20142cd275b89c91c346e536876a
+  * bb282e9cdc1b972fffd08fd21eead43bc0c83cb8
+
+  $ josh-filter -s ":lookup=table" --update refs/heads/filtered
+  [1] :lookup=table
+  [2] :/table
+  [4] :lookup2=4880528e9d57aa5efc925e120a8077bfa37d778d
+
+  $ git log refs/heads/filtered --graph --pretty=%s
+  * add file2
+  * add file1
+  $ git diff ${EMPTY_TREE}..refs/heads/filtered
+  diff --git a/x/sub1/file1 b/x/sub1/file1
+  new file mode 100644
+  index 0000000..a024003
+  --- /dev/null
+  +++ b/x/sub1/file1
+  @@ -0,0 +1 @@
+  +contents1
+  diff --git a/x/sub1/file2 b/x/sub1/file2
+  new file mode 100644
+  index 0000000..6b46faa
+  --- /dev/null
+  +++ b/x/sub1/file2
+  @@ -0,0 +1 @@
+  +contents2
+  $ git diff ${EMPTY_TREE}..refs/heads/filtered~1
+  diff --git a/y/sub1/file1 b/y/sub1/file1
+  new file mode 100644
+  index 0000000..a024003
+  --- /dev/null
+  +++ b/y/sub1/file1
+  @@ -0,0 +1 @@
+  +contents1
+
+  $ echo ":prefix=z" > table/14c74c5eca73952b36d736034b388832748c49d6
+  $ echo ":prefix=z" > table/26e4c43675b985689e280bc42264a9226af76943
+  $ git add table
+  $ git commit -m "mod lookup table" 1> /dev/null
+  $ tree table
+  table
+  |-- 14c74c5eca73952b36d736034b388832748c49d6
+  |-- 26e4c43675b985689e280bc42264a9226af76943
+  |-- 81b10fb4984d20142cd275b89c91c346e536876a
+  `-- bb282e9cdc1b972fffd08fd21eead43bc0c83cb8
+  
+  1 directory, 4 files
+
+  $ josh-filter -s ":lookup=table" --update refs/heads/filtered
+  Warning: reference refs/heads/filtered wasn't updated
+  [2] :lookup=table
+  [3] :/table
+  [4] :lookup2=4880528e9d57aa5efc925e120a8077bfa37d778d
+  [5] :lookup2=ed934c124e28c83270d9cfbb011f3ceb46c0f69e
+  $ git log refs/heads/filtered --graph --pretty=%s
+  * add file2
+  * add file1
+
+  $ git diff ${EMPTY_TREE}..refs/heads/filtered
+  diff --git a/x/sub1/file1 b/x/sub1/file1
+  new file mode 100644
+  index 0000000..a024003
+  --- /dev/null
+  +++ b/x/sub1/file1
+  @@ -0,0 +1 @@
+  +contents1
+  diff --git a/x/sub1/file2 b/x/sub1/file2
+  new file mode 100644
+  index 0000000..6b46faa
+  --- /dev/null
+  +++ b/x/sub1/file2
+  @@ -0,0 +1 @@
+  +contents2
+  $ git diff ${EMPTY_TREE}..refs/heads/filtered~1
+  diff --git a/y/sub1/file1 b/y/sub1/file1
+  new file mode 100644
+  index 0000000..a024003
+  --- /dev/null
+  +++ b/y/sub1/file1
+  @@ -0,0 +1 @@
+  +contents1
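
The key behavioral change in PATCH 1/2 is that `transform_with_template` no longer reads a prebuilt `HashMap` of globals; it asks a callback for each key, after first checking the named regex captures. The std-only sketch below reproduces that lookup order so it can be run without Josh, `git2`, `regex` or `strfmt`. It is an illustration under stated assumptions, not Josh code: `render`, `commit_vars`, the in-memory `files` map and the placeholder IDs are all hypothetical. The real implementation delegates placeholder parsing to `strfmt::strfmt_map`, so format specs and `{{` escapes are not handled here, and leaving unknown keys untouched mirrors the `fmt.skip()` comment in the patch rather than verified `strfmt` behavior.

```rust
use std::collections::HashMap;

/// Expand `{key}` placeholders in `template`.
/// Named regex captures are consulted first, then the `globals` callback;
/// keys that neither knows are copied through unchanged (assumption, see lead-in).
fn render<F>(template: &str, captures: &HashMap<String, String>, globals: F) -> String
where
    F: Fn(&str) -> Option<String>,
{
    let mut out = String::new();
    let mut rest = template;
    while let Some(open) = rest.find('{') {
        out.push_str(&rest[..open]);
        let after = &rest[open + 1..];
        let Some(close) = after.find('}') else {
            // Unbalanced `{`: copy the remainder verbatim and stop.
            out.push_str(&rest[open..]);
            return out;
        };
        let key = &after[..close];
        match captures.get(key).cloned().or_else(|| globals(key)) {
            Some(value) => out.push_str(&value),
            None => {
                out.push('{');
                out.push_str(key);
                out.push('}');
            }
        }
        rest = &after[close + 1..];
    }
    out.push_str(rest);
    out
}

/// Hypothetical stand-in for the closure built in `apply2`: `files` maps a
/// path to `(object_id, content)` instead of reading a real git tree.
fn commit_vars(
    key: &str,
    tree_id: &str,
    commit_id: &str,
    files: &HashMap<&str, (&str, &str)>,
) -> Option<String> {
    match key {
        "#" => Some(tree_id.to_string()),
        "@" => Some(commit_id.to_string()),
        // `{/path}`: file content, empty if the path is missing
        k if k.starts_with('/') => Some(
            files.get(&k[1..]).map(|&(_, content)| content.to_string()).unwrap_or_default(),
        ),
        // `{#path}`: entry object ID, all zeros if the path is missing
        k if k.starts_with('#') => Some(
            files.get(&k[1..]).map(|&(id, _)| id.to_string()).unwrap_or_else(|| "0".repeat(40)),
        ),
        _ => None,
    }
}

fn main() {
    // Hypothetical IDs and contents, for illustration only.
    let files: HashMap<&str, (&str, &str)> = HashMap::from([(
        "data.txt",
        ("1111111111111111111111111111111111111111", "file data"),
    )]);
    let globals = |key: &str| commit_vars(key, "tree-id", "commit-id", &files);

    // As if a regex with a `type` group had matched "Data: important".
    let captures = HashMap::from([("type".to_string(), "Data".to_string())]);

    println!("{}", render("Type: {type}, File: {/data.txt}, OID: {#data.txt}", &captures, &globals));
    println!("{}", render("[{/missing}] {#missing} {unknown} {#} {@}", &captures, &globals));
}
```

With these inputs the first line prints `Type: Data, File: file data, OID: 1111111111111111111111111111111111111111`, and the second prints `[] 0000000000000000000000000000000000000000 {unknown} tree-id commit-id`, matching the fallbacks the new tests expect for a missing `{/path}` (empty) and a missing `{#path}` (zero OID).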