Conversation


@srikalyan srikalyan commented Dec 24, 2025

Summary

  • Adds a FreePages *uint64 field to the HugePagesInfo struct, populated from /sys/devices/system/node/node<N>/hugepages/hugepages-<size>kB/free_hugepages (see the sketch after this list)
  • Uses pointer type with omitempty to distinguish between "0 free pages" and "data unavailable"
  • Enables consumers like the Kubernetes Memory Manager to verify actual hugepage availability during pod admission
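
A minimal sketch of the struct change, assuming cadvisor's existing info/v1 HugePagesInfo layout; the free_pages JSON tag name is an illustrative assumption, not necessarily what the PR uses:

```go
// Sketch of the proposed field on cadvisor's HugePagesInfo (info/v1).
type HugePagesInfo struct {
	// Huge page size in kB.
	PageSize uint64 `json:"page_size"`
	// Total number of huge pages of this size on the node.
	NumPages uint64 `json:"num_pages"`
	// Free huge pages of this size, read from free_hugepages in sysfs.
	// nil means the value could not be read.
	FreePages *uint64 `json:"free_pages,omitempty"`
}
```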

Motivation

The Kubernetes Static Memory Manager currently only tracks hugepage allocations for Guaranteed QoS pods. However, Burstable and BestEffort pods can consume hugepages (via hugetlbfs mounts or mmap with MAP_HUGETLB) without being tracked. This causes Guaranteed pods to be admitted based on stale allocation data, only to fail at runtime when hugepages are exhausted.

By exposing free_hugepages from sysfs, consumers can verify actual OS-reported availability before making admission decisions.

Design

The field uses *uint64 with omitempty (following v2 convention) to distinguish:

  • nil: free_hugepages data unavailable (file missing or unreadable)
  • 0: zero free hugepages available
  • N: N free hugepages available

This allows consumers to detect when the data isn't available and fall back appropriately.
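
A minimal sketch of a consumer-side check built on those three cases; the function name and the "needed" parameter are illustrative, not cadvisor or kubelet API:

```go
// hasEnoughFreePages reports whether admission can proceed based on the
// FreePages field. The second return value signals "data unavailable" so
// callers can fall back to allocation tracking instead of rejecting.
func hasEnoughFreePages(hp HugePagesInfo, needed uint64) (ok bool, unknown bool) {
	if hp.FreePages == nil {
		// free_hugepages was missing or unreadable: let the caller decide.
		return false, true
	}
	return *hp.FreePages >= needed, false
}
```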

Note: Since GetMachineInfo() is cached at startup, the FreePages value represents point-in-time data. Consumers requiring real-time availability may need to read sysfs directly or use a dedicated fresh-read method (pending KEP outcome).
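
For consumers that need fresh data rather than the cached MachineInfo, a minimal sketch of reading free_hugepages straight from sysfs; the helper name is illustrative and not the proposed cadvisor API:

```go
import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// readFreeHugePages reads free_hugepages for one NUMA node and one huge
// page size (in kB) directly from sysfs, bypassing any cached MachineInfo.
func readFreeHugePages(node int, pageSizeKB uint64) (uint64, error) {
	path := fmt.Sprintf(
		"/sys/devices/system/node/node%d/hugepages/hugepages-%dkB/free_hugepages",
		node, pageSizeKB)
	data, err := os.ReadFile(path)
	if err != nil {
		return 0, err
	}
	return strconv.ParseUint(strings.TrimSpace(string(data)), 10, 64)
}
```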

Test Plan

  • Added unit tests for GetHugePagesFree() in sysfs
  • Updated TestGetHugePagesInfo to verify FreePages is correctly populated
  • Verified JSON serialization with omitempty behavior (see the sketch below)
  • All existing tests pass
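
A small sketch of the omitempty behavior the serialization test covers, reusing the struct sketch above (hypothetical free_pages tag) and assuming standard encoding/json and fmt:

```go
zero := uint64(0)
withZero := HugePagesInfo{PageSize: 2048, NumPages: 64, FreePages: &zero}
noData := HugePagesInfo{PageSize: 2048, NumPages: 64} // FreePages left nil

// A non-nil pointer to 0 still serializes the value; only nil is omitted.
b1, _ := json.Marshal(withZero)
b2, _ := json.Marshal(noData)
fmt.Println(string(b1)) // {"page_size":2048,"num_pages":64,"free_pages":0}
fmt.Println(string(b2)) // {"page_size":2048,"num_pages":64}
```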

Related

This change adds a FreePages field to HugePagesInfo, populated from
/sys/devices/system/node/node<N>/hugepages/hugepages-<size>kB/free_hugepages

This enables consumers like the Kubernetes Memory Manager to verify
actual hugepage availability during pod admission, rather than only
tracking allocations which can miss consumption by untracked workloads.

The field uses *uint64 with omitempty to distinguish between:
- nil: free_hugepages data unavailable (file missing or unreadable)
- 0: zero free hugepages available
- N: N free hugepages available

Related: kubernetes/kubernetes#134395
srikalyan added a commit to srikalyan/enhancements that referenced this pull request Dec 24, 2025
This KEP proposes enhancing the Memory Manager's Static policy to
verify OS-reported free hugepages availability during pod admission.

Problem:
The Memory Manager only tracks hugepage allocations for Guaranteed QoS
pods. Burstable/BestEffort pods can consume hugepages without being
tracked, causing subsequent Guaranteed pods to be admitted but fail
at runtime when hugepages are exhausted.

Solution:
- Add FreePages field to cadvisor's HugePagesInfo (PR google/cadvisor#3804)
- Verify OS-reported free hugepages during Allocate() in Static policy
- Reject pods when insufficient free hugepages are available

Related: kubernetes/kubernetes#134395
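
A minimal sketch of the admission check described in the Solution above; every name here is hypothetical and this is not actual Memory Manager code (assumes fmt from the standard library):

```go
// checkFreeHugePages rejects admission when the OS reports fewer free
// pages of the requested size than the pod asks for. A nil FreePages is
// treated as "unknown" and does not block admission in this sketch.
func checkFreeHugePages(hugePages []HugePagesInfo, pageSizeKB, requestedPages uint64) error {
	for _, hp := range hugePages {
		if hp.PageSize != pageSizeKB || hp.FreePages == nil {
			continue
		}
		if *hp.FreePages < requestedPages {
			return fmt.Errorf("insufficient free hugepages-%dkB: need %d, OS reports %d free",
				pageSizeKB, requestedPages, *hp.FreePages)
		}
	}
	return nil
}
```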
@srikalyan (Author) commented:

Based on KEP review feedback, I'm considering changing FreePages from *uint64 to uint64.

Rationale: On Linux systems with hugepages configured, the sysfs interface (/sys/devices/system/node/node<N>/hugepages/hugepages-<size>kB/free_hugepages) is always available, so there is no need to distinguish between "0 free hugepages" and "data unavailable".

Current implementation: Uses *uint64 with omitempty to distinguish nil (unavailable) from 0 (zero free).

Proposed change: Use plain uint64. A value of 0 simply means zero free hugepages.
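
For comparison, a sketch of the two field shapes under discussion (JSON tag names still illustrative):

```go
// Current PR: pointer with omitempty, so "data unavailable" (nil) is
// distinguishable from "zero free pages" (0), and nil drops the key.
FreePages *uint64 `json:"free_pages,omitempty"`

// Proposed simplification: plain value; 0 just means zero free pages,
// and the key is always serialized.
FreePages uint64 `json:"free_pages"`
```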

What are your thoughts on this? I'm happy to update the PR either way based on cadvisor's conventions and your preference.

cc @iwankgb

@srikalyan srikalyan marked this pull request as draft December 27, 2025 17:35
