-
Notifications
You must be signed in to change notification settings - Fork 2.1k
Refactor GCS force delete to separate object and anywhere cache deletions #16017
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Refactor GCS force delete to separate object and anywhere cache deletions #16017
Conversation
|
Hi there, I'm the Modular magician. I've detected the following information about your changes: Diff reportYour PR generated some diffs in downstreams - here they are.
|
Tests analyticsTotal tests: 128 Click here to see the affected service packages
🟢 All tests passed! View the build log |
c2thorn
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's messy to follow, but I think the logic looks good.
I think what you have now is a nice medium between failing early/late as possible, and don't think it's necessary to go more either way.
Maybe later we could extract out the anywhere cache deletions and object deletions from the delete function to have the logic be simplified a bit, but I think it's not needed here for this bug fix.
I know testing this scenario via acceptance or unit tests is probably overly painful, but were you able to manually verify any scenarios here?
|
Yep! Just walled on Anywhere Cache creation times a bit when I sent this, here's a gist: https://gist.github.com/rileykarson/de7d37fd6f3ba5e2db1c1e28b99a80fe In all cases, bucket has an object and my runner SA isn't permissioned to interact with anywhere caches:
|
|
Note, waiting on service review from @kautikdk to merge. It's a weird schedule due to the holidays so I'll wait a bit longer than the standard time and do on (or after) noon PST Dec 29 with no ack. |
|
Hi there, I'm the Modular magician. I've detected the following information about your changes: Diff reportYour PR generated some diffs in downstreams - here they are.
|
|
^ thought I'd amended that bit in, #16017 (comment) assumed it was present |
Tests analyticsTotal tests: 128 Click here to see the affected service packages
🟢 All tests passed! View the build log |
|
Hi @rileykarson, I am aligned with your proposed solution, which implements a best-effort cleanup—neither too harsh nor too lenient. I recall For now, I believe we can proceed with this solution as it restores the ability to delete objects even if Anywhere Cache permissions are missing, keeping the cleanups independent. FYI: @googlyrahman |
Fixes hashicorp/terraform-provider-google#25343
Sorry- this ended up a fairly substantial refactoring! The logic in this method has ended up pretty complex and this attempts to prune things a bit. Please read the code a few times over, it's chunky. I'm debating a bit if I should change more to make things clearer, but cut things off around the 60-70% mark to minimise (but not remove entirely) externalities like specific error or short circuit conditions.
The core issue was that Anywhere Caches require additional permissions not needed previously to interact with the GCS bucket or objects. When we first added the Anywhere Cache check we'd replicated a behaviour with objects where we'd listed the objects in the bucket all the time & failed if any were present but
force_destroywas false. This made all deletes requirestorage.anywhereCaches.list. After #13730 this was fixed so that non-force deletes would succeed, but a principal missing that permission would short-circuit out of the force-destroy code altogether.This left an issue when the principal did not have
storage.anywhereCaches.list, there were no caches on the bucket, but there were objects present, as we'd short-circuit before destroying any objects and fail the final delete call due to objects being present. After this change we'll process both kinds (objects, caches) separately and should succeed in that case.Old logic:
New logic:
This does mean we won't walk additional pages of anywhere caches- I'm not sure how needed that is, and it wouldn't happen unless there were multiple pages of objects before.
There are a few paths to take further refactoring- like failing as early as possible (check all objects, all retention policies, etc. and only proceed at all if we think we will succeed) or as late as possible (just iterate serially until we fail, no single page preflight checks for things). Declining to do them for now, but open to thoughts/suggestions.
Release Note Template for Downstream PRs (will be copied)
See Write release notes for guidance.