Skip to content

Conversation

@rhatdan
Copy link
Member

@rhatdan rhatdan commented Nov 21, 2025

This fixes a race condition where concurrent 'podman artifact add'
commands for different artifacts would result in only one artifact
being created, without any error messages.

The root cause was in the artifact store's Add() method. When the lock
was released during blob copying (for performance), concurrent additions
could overwrite each other's index entries:

  • Process A: Lock → Read index → Unlock → Copy blobs → Lock → Commit
  • Process B: Lock → Read index (missing A) → Unlock → Copy blobs → Lock → Commit
  • Result: Process B's commit overwrites Process A's entry

The fix adds conflict detection after reacquiring the lock. Before
committing, we reload the artifact index and verify the artifact name
hasn't been added by another process. If a conflict is detected, we
return an error rather than silently overwriting the concurrent change.

This preserves the performance benefits of concurrent blob copying
while preventing the race condition that caused artifacts to be lost.

Fixes: containers/podman#27569

Generated-with: Cursor AI

@github-actions github-actions bot added the common Related to "common" package label Nov 21, 2025
podmanbot pushed a commit to podmanbot/buildah that referenced this pull request Nov 21, 2025
@podmanbot
Copy link

✅ A new PR has been created in buildah to vendor these changes: containers/buildah#6526

@packit-as-a-service
Copy link

Packit jobs failed. @containers/packit-build please check.

podmanbot pushed a commit to podmanbot/buildah that referenced this pull request Nov 21, 2025
podmanbot pushed a commit to podmanbot/buildah that referenced this pull request Nov 21, 2025
podmanbot pushed a commit to podmanbot/buildah that referenced this pull request Nov 21, 2025
podmanbot pushed a commit to podmanbot/buildah that referenced this pull request Nov 21, 2025
podmanbot pushed a commit to podmanbot/buildah that referenced this pull request Nov 21, 2025
@github-actions github-actions bot added storage Related to "storage" package image Related to "image" package labels Nov 21, 2025
podmanbot pushed a commit to podmanbot/buildah that referenced this pull request Nov 21, 2025
Copy link
Contributor

@mtrmac mtrmac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

The diagnosis + fix is correct. Are we fine with paying the price? Earlier containers/podman#26524 , Cc: @mheon @baude @Luap99 .

Alternatively, it should be possible for oci/layout/newImageDestination to defer reading the index.json until the start of the “write manifests” phase; that would avoid the need to hold the lock for the duration of blob copies — with a fairly major downside that this would (again) make pkg/libartifact have a very remote dependency on internal implementation choices of oci/layout`.


as.lock.Lock()
locked = true
// Lock is still held from the beginning of the function
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment does not make much sense stand-alone for future readers of the code, without seeing it in this diff.

@Luap99
Copy link
Member

Luap99 commented Nov 22, 2025

yes, ref containers/podman#27574

It would be much better to fix this right so that we reload index.json after the relock. It means redoing the conflict check again.

@rhatdan
Copy link
Member Author

rhatdan commented Dec 1, 2025

I asked Cursor to re-do the PR to match the comments above. @Luap99 @mheon @mtrmac PTAL

podmanbot pushed a commit to podmanbot/buildah that referenced this pull request Dec 1, 2025
locked = false
// Release the lock during blob copying to allow concurrent operations.
// We'll reacquire it later and check for conflicts before committing.
as.lock.Unlock()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you're keeping this you absolutely need to keep the locked variable, this risks a double-unlock if the defer fires before the lock reacquires

This fixes a race condition where concurrent 'podman artifact add'
commands for different artifacts would result in only one artifact
being created, without any error messages.

The root cause was in the artifact store's Add() method. When the lock
was released during blob copying (for performance), concurrent additions
could overwrite each other's index entries:

- Process A: Lock → Read index → Unlock → Copy blobs → Lock → Commit
- Process B: Lock → Read index (missing A) → Unlock → Copy blobs → Lock → Commit
- Result: Process B's commit overwrites Process A's entry

The fix adds conflict detection after reacquiring the lock. Before
committing, we reload the artifact index and verify the artifact name
hasn't been added by another process. If a conflict is detected, we
return an error rather than silently overwriting the concurrent change.

A 'locked' boolean variable tracks lock state to prevent double-unlock
if an error occurs during blob copying (when the lock is released).
The deferred unlock only fires if 'locked' is true.

This preserves the performance benefits of concurrent blob copying
while preventing the race condition that caused artifacts to be lost.

Fixes: containers/podman#27569

Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
podmanbot pushed a commit to podmanbot/buildah that referenced this pull request Dec 1, 2025
return nil, fmt.Errorf("%s: %w", dest, libartTypes.ErrArtifactAlreadyExists)
}
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The original imageDest, as identified in the original PR description, continues to exist, will be used below, and contains the cached data. So this no longer fixes the race at all.

(*grumble* I think ferrying data between an AI and review comments with little human review is effectively equivalent to asking the reviewers to do the work themselves with the help of an AI. In that sense, it takes advantage of the PR as a signal “I care about this so much to do the work, so you should care about this more than about just a filed issue”, and risks breaking that signal. In the worst case, PRs will stay in the same queue as issues, with no reason to look at them with any extra priority.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

common Related to "common" package image Related to "image" package storage Related to "storage" package

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Running two podman artifact adds at the same time, fails to create both artifacts with no errors.

5 participants