Add SHM_LOCK_DIR environment variable for containerized deployments (memory leak fix) #7312
+28
−5
Problem
OSRM crashes with "lock file does not exist" when using shared memory (`osrm-routed -s`) in containerized environments (Kubernetes) after a container restart. The crash triggers a restart/crash loop, and each restart accumulates orphaned shm segments, leaking memory proportional to the graph size until the node is exhausted.
Related issues: #5134, #5703
Reproduction
See gist: https://gist.github.com/wes4m/719cb69b72e26c09c7ff57ed71cf33d9
Why This Happens in Containers
OSRM's shared memory implementation assumes `/tmp` and System V shared memory have the same lifecycle. This is true on traditional systems, where both are cleared on reboot, but not in containers, where the `/tmp` filesystem is reset on restart while shared memory segments can outlive the container.
When a container restarts:
- `/tmp` is reset and lock files get deleted
- `SharedRegionRegister` (in shm) still references old segments
Solution
This PR adds the `OSRM_LOCK_DIR` environment variable to specify a custom directory for lock files. This allows containerized deployments to place lock files in a volume that persists across container restarts (e.g. a Kubernetes `emptyDir`).
Other Approaches
Mounting `/tmp` to a persistent volume fixes the issue but persists all temporary files from the container, not just lock files. Setting `TMPDIR` does the same, and cleaning orphaned shm segments on startup is a workaround that doesn't address the root cause. `OSRM_LOCK_DIR` only affects the lock file location and is backward compatible: when it is unset, lock files fall back to the temp directory and behavior is unchanged.
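For reviewers, the fallback described above can be sketched roughly as follows. This is an illustration only, not the code in this PR: `resolveLockDir` and `lockFilePath` are hypothetical helper names, and the sketch assumes that an empty `OSRM_LOCK_DIR` is treated the same as an unset one.

```cpp
#include <cstdlib>
#include <filesystem>
#include <string>

namespace fs = std::filesystem;

// Hypothetical helper (not OSRM's actual code): pick the directory for lock files.
// Uses OSRM_LOCK_DIR when it is set and non-empty; otherwise falls back to the
// system temp directory, which keeps the previous behavior.
fs::path resolveLockDir()
{
    const char *configured = std::getenv("OSRM_LOCK_DIR");
    if (configured != nullptr && *configured != '\0')
    {
        return fs::path(configured);
    }
    return fs::temp_directory_path();
}

// Hypothetical helper: build the full path of a lock file inside the chosen directory.
fs::path lockFilePath(const std::string &name)
{
    return resolveLockDir() / name;
}
```

With `OSRM_LOCK_DIR=/var/lock/osrm` (an assumed mount point for the persistent volume), `lockFilePath("example.lock")` resolves to `/var/lock/osrm/example.lock`. With the variable unset, the same call resolves under the temp directory, so existing deployments see no change.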