Update feature/numa9 by merging from master #6814
Merged
Conversation
To avoid a CI format check error.
Two commands are used to set max_cstate: `xenpm` to set it at runtime, and `xen-cmdline` to set it in the grub config file so that it takes effect after reboot. Signed-off-by: Changlei Li <changlei.li@cloud.com>
A string is used to represent max_cstate and max_sub_cstate: "" -> unlimited; "N" -> max C-state CN; "N,M" -> max C-state CN with max sub-state M. This follows the xen-cmdline max_cstate format, see https://xenbits.xen.org/docs/unstable/misc/xen-command-line.html#max_cstate-x86 Signed-off-by: Changlei Li <changlei.li@cloud.com>
Signed-off-by: Changlei Li <changlei.li@cloud.com>
C-states are power management states for CPUs where higher numbered
states represent deeper sleep modes with lower power consumption but
higher wake-up latency. The max_cstate parameter controls the deepest
C-state that CPUs are allowed to enter.
Common C-state values:
- C0: CPU is active (not a sleep state)
- C1: CPU is halted but can wake up almost instantly
- C2: CPU clock is stopped while caches stay coherent, slightly longer wake-up time
- C3+: Deeper sleep states with progressively longer wake-up times
To set max_cstate on the dom0 host, two commands are used: `xenpm` to set it at
runtime, and `xen-cmdline` to set it in the grub config file so that it takes
effect after reboot.
xenpm examples:
```
# xenpm set-max-cstate 0 0
max C-state set to C0
max C-substate set to 0 succeeded
# xenpm set-max-cstate 0
max C-state set to C0
max C-substate set to unlimited succeeded
# xenpm set-max-cstate unlimited
max C-state set to unlimited
# xenpm set-max-cstate -1
Missing, excess, or invalid argument(s)
```
`xen-cmdline` examples:
```
/opt/xensource/libexec/xen-cmdline --get-xen max_cstate
"" -> unlimited
"max_cstate=N" -> max cstate N
"max_cstate=N,M" -> max cstate N, max c-sub-state M
/opt/xensource/libexec/xen-cmdline --set-xen max_cstate=1
/opt/xensource/libexec/xen-cmdline --set-xen max_cstate=1,0
/opt/xensource/libexec/xen-cmdline --delete-xen max_cstate
```
See [xen-command-line.max_cstate](https://xenbits.xen.org/docs/unstable/misc/xen-command-line.html#max_cstate-x86).
This PR adds a new field `host.max_cstate` to manage the host's max_cstate.
`host.set_max_cstate` uses the two commands mentioned above to configure it.
During dbsync at xapi start, the field is synced from `xen-cmdline --get-xen max_cstate`.
Signed-off-by: Changlei Li <changlei.li@cloud.com>
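As a minimal sketch of the two steps `host.set_max_cstate` is described as performing, assuming a target of C1 with sub-state 0 (the commands and argument formats are taken from the examples above):
```
# Runtime change, effective immediately
xenpm set-max-cstate 1 0
# Persist on the Xen command line so it survives a reboot
/opt/xensource/libexec/xen-cmdline --set-xen max_cstate=1,0
```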
- write ntp servers to chrony.conf
- interaction with dhclient
- handle /run/chrony-dhcp/$interface.sources
- handle chrony.sh
- restart/enable/disable chronyd

Signed-off-by: Changlei Li <changlei.li@cloud.com>
Signed-off-by: Changlei Li <changlei.li@cloud.com>
Signed-off-by: Changlei Li <changlei.li@cloud.com>
At XAPI start, check the actual NTP config to determine the NTP mode, whether NTP is enabled, and the NTP custom servers, and store them in the xapi DB. Signed-off-by: Changlei Li <changlei.li@cloud.com>
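For orientation, a hedged shell sketch of the kind of host state such a check has to look at; these are standard chrony/systemd probes, not necessarily the exact ones dbsync uses:
```
# Is chronyd enabled/running?
systemctl is-enabled chronyd
systemctl is-active chronyd
# Which servers are configured statically?
grep -E '^(server|pool)' /etc/chrony.conf
# Are there DHCP-provided sources?
ls /run/chrony-dhcp/ 2>/dev/null
```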
Signed-off-by: Changlei Li <changlei.li@cloud.com>
Signed-off-by: Changlei Li <changlei.li@cloud.com>
New fields: `host.ntp_mode`, `host.ntp_custom_servers`. New APIs: `host.set_ntp_mode`, `host.set_ntp_custom_servers`, `host.get_ntp_mode`, `host.get_ntp_custom_servers`, `host.get_ntp_servers_status`.

**ntp_mode_dhcp**: In this mode, NTP uses the DHCP-assigned NTP servers as sources. In Dom0, dhclient triggers `chrony.sh` to update the NTP servers when a network event happens: it writes the NTP servers to `/run/chrony-dhcp/$interface.sources`, and the directory `/run/chrony-dhcp` is included in `chrony.conf`. dhclient also stores the DHCP lease in `/var/lib/xcp/dhclient-$interface.leases`, see https://github.com/xapi-project/xen-api/blob/v25.31.0/ocaml/networkd/lib/network_utils.ml#L925. When switching the NTP mode to dhcp, XAPI checks the lease file, finds the NTP servers, and fills in the chrony-dhcp file; the exec permission of `chrony.sh` is added. When switching the NTP mode from dhcp to another mode, XAPI removes the chrony-dhcp files and the exec permission of `chrony.sh`. The operation is the same as in xsconsole, https://github.com/xapi-project/xsconsole/blob/v11.1.1/XSConsoleData.py#L593. As part of this feature, xsconsole will later switch to using XenAPI to manage NTP, to avoid conflicts.

**ntp_mode_custom**: In this mode, NTP uses `host.ntp_custom_servers` as sources. This is implemented by changing `chrony.conf` and restarting chronyd. `host.ntp_custom_servers` is set by the user.

**ntp_mode_default**: In this mode, NTP uses the default-ntp-servers from the XAPI config file.
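As an illustration only (not the exact file xapi writes), a chrony.conf fragment for these modes might look like the following; `server ... iburst` and `sourcedir` are standard chrony directives, the hostnames are hypothetical, and whether `sourcedir` is available depends on the chrony version shipped in Dom0:
```
# ntp_mode_custom: sources taken from host.ntp_custom_servers
server ntp1.example.com iburst
server ntp2.example.com iburst

# ntp_mode_dhcp: pick up /run/chrony-dhcp/<interface>.sources written by chrony.sh
sourcedir /run/chrony-dhcp
```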
For example, the legacy default NTP servers are `[0-3].centos.pool.ntp.org`, and the current default NTP servers are `[0-3].xenserver.pool.ntp.org`. After an update or upgrade, the legacy default NTP servers are recognized and changed to the current default NTP servers; the mode remains `ntp_mode_default`. A new config option named legacy-default-ntp-servers is added; it is defined in xapi.conf.d/xenserver.conf (the same place as default-ntp-servers).

Signed-off-by: Changlei Li <changlei.li@cloud.com>
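A hedged sketch of what those two options could look like in xapi.conf.d/xenserver.conf; the server names come from the description above, but the exact key syntax and list separator are assumptions:
```
# /etc/xapi.conf.d/xenserver.conf (illustrative)
default-ntp-servers=0.xenserver.pool.ntp.org,1.xenserver.pool.ntp.org,2.xenserver.pool.ntp.org,3.xenserver.pool.ntp.org
legacy-default-ntp-servers=0.centos.pool.ntp.org,1.centos.pool.ntp.org,2.centos.pool.ntp.org,3.centos.pool.ntp.org
```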
New field: host.timezone. APIs: host.set_timezone, host.get_timezone, host.list_timezones. Signed-off-by: Changlei Li <changlei.li@cloud.com>
Signed-off-by: Changlei Li <changlei.li@cloud.com>
- New field: host.timezone
- APIs: host.set_timezone, host.get_timezone, host.list_timezones
- host.timezone dbsync
Signed-off-by: Ming Lu <ming.lu@cloud.com>
Signed-off-by: Ming Lu <ming.lu@cloud.com>
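For reference, the standard host-side commands for these operations look like the following; whether xapi implements the timezone APIs by shelling out to `timedatectl` is an assumption, but the commands themselves are stock systemd:
```
# Enumerate available timezones (cf. host.list_timezones)
timedatectl list-timezones
# Set the host timezone (cf. host.set_timezone); Europe/London is just an example
timedatectl set-timezone Europe/London
# Read the current timezone (cf. host.get_timezone)
timedatectl status | grep 'Time zone'
```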
**host.get_ntp_synchronized**: Simply returns true or false by parsing "System clock synchronized" or "NTP synchronized" from the output of `timedatectl status`, like the following, regardless of whether NTP is enabled:
```
# timedatectl status
               Local time: Wed 2025-11-05 06:10:42 UTC
           Universal time: Wed 2025-11-05 06:10:42 UTC
                 RTC time: Wed 2025-11-05 05:00:11
                Time zone: UTC (UTC, +0000)
System clock synchronized: yes
              NTP service: active
          RTC in local TZ: no
```
**host.set_servertime**: Uses `timedatectl set-time` to set the local time on the host, in its local timezone, only when NTP is disabled.
Signed-off-by: Changlei Li <changlei.li@cloud.com>
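A hedged one-liner showing the kind of parse described above; the field is named "System clock synchronized" on newer systemd and "NTP synchronized" on older versions, hence the two patterns:
```
# Prints "yes" or "no"
timedatectl status | grep -E 'System clock synchronized|NTP synchronized' | awk '{print $NF}'
```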
The `timedatectl` command is used to set the server time. However, if chronyd is disabled via systemctl before set_servertime, timedatectl may not sync the status in time, and set-time fails with `Failed to set time: Automatic time synchronization is enabled`. This is because systemd checks the synchronization status periodically and updates its own NTP flag (which timedatectl relies on). To make timedatectl update the NTP flag instantly, use `timedatectl set-ntp true/false` to enable/disable NTP. Signed-off-by: Changlei Li <changlei.li@cloud.com>
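A minimal sketch of that sequence, assuming NTP is being turned off before the clock is set (the timestamp is purely illustrative):
```
# Clear systemd's NTP flag immediately, instead of only stopping chronyd
timedatectl set-ntp false
# Now setting the time is allowed
timedatectl set-time "2025-11-05 06:10:42"
```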
Every newly added field must have an entry in Host.create_params, otherwise these settings could be lost on pool join. Signed-off-by: Edwin Török <edwin.torok@citrix.com>
Only set affinity when the VM successfully claims pages on a single node:
```
xensource.log:2025-12-15T17:04:34.833013+00:00 host-34 xenopsd-xc: [debug||38 |Async.VM.start R:b195b91f07ac|xenops] Domain.numa_placement.(fun).set_vcpu_affinity: setting vcpu affinity for domain 43: [40; 41; 42; 43; 44; 45; 46; 47; 48; 49; 50; 51; 52; 53; 54; 55; 56; \x0A 57; 58; 59; 60; 61; 62; 63; 64; 65; 66; 67; 68; 69; 70; 71; 72; 73; \x0A 74; 75; 76; 77; 78; 79]
xensource.log:2025-12-15T17:04:34.907898+00:00 host-34 xenopsd-xc: [debug||74 |Async.VM.start R:d4600f3c365b|xenops] Domain.numa_placement.(fun).set_vcpu_affinity: setting vcpu affinity for domain 44: [0; 1; 2; 3; 4; 5; 6; 7; 8; 9; 10; 11; 12; 13; 14; 15; 16; 17; 18; 19; \x0A 20; 21; 22; 23; 24; 25; 26; 27; 28; 29; 30; 31; 32; 33; 34; 35; 36; \x0A 37; 38; 39]
```
The corresponding nodes where the memory was allocated; only the VMs that successfully claimed a single node have the vCPU affinity assigned:
```
# xl debug-keys u && xl dmesg | tail -n 30
(XEN) [21080.047304] d43 (total: 5767675):
(XEN) [21080.047305]    Node 0: 1
(XEN) [21080.047307]    Node 1: 5767678
(XEN) [21080.047309] d44 (total: 5767675):
(XEN) [21080.047311]    Node 0: 5767679
(XEN) [21080.047312]    Node 1: 0
(XEN) [21080.047315] d45 (total: 5014779):
(XEN) [21080.047317]    Node 0: 2637953
(XEN) [21080.047318]    Node 1: 2376830
```
And the `xl vcpu-list` output:
```
# xl vcpu-list
Name               ID  VCPU   CPU State   Time(s) Affinity (Hard / Soft)
ws19-x64-clone1    43     0    42   -b-      14.8  all / 40-79
ws19-x64-clone1    43     1    53   -b-      14.3  all / 40-79
ws19-x64-clone1    43     2    57   -b-      19.1  all / 40-79
ws19-x64-clone1    43     3    59   -b-      11.5  all / 40-79
ws19-x64           44     0     6   -b-      18.0  all / 0-39
ws19-x64           44     1    12   -b-      17.5  all / 0-39
ws19-x64           44     2    14   -b-      11.9  all / 0-39
ws19-x64           44     3     3   -b-      11.4  all / 0-39
spread-ws19-x64    45     0    21   -b-      17.9  all / all
spread-ws19-x64    45     1     1   -b-      12.9  all / all
spread-ws19-x64    45     2     5   -b-      15.7  all / all
spread-ws19-x64    45     3    18   -b-      15.3  all / all
```
The customer environment has the following two issues:
- The forest is very large
- The user SIDs are very large

For the large forest, the joined domain trusts around 50 domains. For each
domain, with `winbind scan trusted domains = yes`:
- xapi scans each trusted domain and routinely enumerates all domain DCs
  to decide the closest DC for LDAP queries
- winbind creates a subprocess for each trusted domain, which also
  enumerates all DCs to decide the best DC and sync domain information

This consumes a huge amount of resources and keeps the winbind main process too
busy to handle user requests. However, customers usually only use 2-3 domains
to manage XS, which means it is not necessary to scan all the trusted domains.
- `winbind scan trusted domains = no` is set to forbid the domain scan.
  The side effect is that xapi no longer knows the trusted domains.
  Thus, xapi performs an LDAP query against the DC of the trusted domain for
  the necessary information.
- The closest-KDC tracking used for LDAP queries is removed, because:
  * `wbinfo --getdcname` is called to get a KDC, and winbind already performs
    some basic checks on the response time
  * The closest-KDC LDAP query is performed during add-user, which is NOT a
    frequent operation, so performance is not that critical
  * The update-subject backend task can refresh subject information later

For the large SID problem: winbind sets up a 1-1 map between SID and UID,
and large SID numbers exceed the configured UID limit. To fix it:
- Extend the configured limit
- Switch the map backend from rid to autorid, which is better and
  deterministic (see the smb.conf sketch below)
- Clean the winbind cache during xapi start to support updating from rid,
  and we do want to refresh on xapi restart
Signed-off-by: Lin Liu <lin.liu01@citrix.com>
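A hedged smb.conf sketch of the settings described above; the option names are standard Samba/winbind parameters, but the range values are only examples, not taken from the change:
```
# Stop winbind from scanning and enumerating every trusted domain in the forest
winbind scan trusted domains = no

# Deterministic autorid backend for the SID<->UID map (instead of rid),
# with an extended, illustrative range
idmap config * : backend = autorid
idmap config * : range = 100000-29999999
idmap config * : rangesize = 1000000
```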
For the newly added unit test which checks the host create params, the new fields need to be added to create_params. This check is mainly for the pool-join case: without the params, the values would be set to default values when the supporter host object is created, and then the values may be lost after pool-join. However, for the fields in this feature, the values are set by dbsync, so they can be set correctly after the restart during pool-join, and there would be no defect. On the other hand, there is also no harm in adding them to create_params: it passes the newly added unit test and sets the values correctly when the host object is created during pool-join. So add them in this commit. Signed-off-by: Changlei Li <changlei.li@cloud.com>
…i-project#6799) This only looks at newly added fields (those with an empty `lifecycle`), and requires them to be present in `Host.create_params`. This ensures that we get a compile error and are forced to propagate the new fields during pool join. Otherwise, newly added fields seem to keep reintroducing this bug with every newly added feature (e.g. the pending NTP feature branch has this bug on most of its fields). So far I have only found this bug on the update guidance fields. There are more bugs on other pre-existing fields in older releases, but those are skipped by the unit test (there are too many: `logging`, `iscsi_iqn`, etc.). We do want to eventually fix those, but it will require a lot more testing, so it will be done separately (also, some of them are actually overwritten in dbsync_slave). There are a lot more properties we could check in the unit test (e.g. that all newly added parameters have defaults for backwards compatibility, that the doc and type match the field, etc.). Eventually I'd probably want to entirely auto-generate `create_params`, but we'll need to see how to do that while also taking into account what dbsync_slave already does.
…6802) Solve the conflict for the newly added host fields. Update the version in create_params to 25.39.0-next.
- Protect domain_netbios_name_map with Atomic
- Some other minor updates

Signed-off-by: Lin Liu <lin.liu01@citrix.com>
The output is unstable across OCaml versions: in OCaml 5.4 some of the parameters are moved before an end of line. Work around this by only printing the first line of the introduction for each command. Signed-off-by: Pau Ruiz Safont <pau.safont@vates.tech>
ocamlformat will start formatting comments; change them to be compatible with both the old and new versions. ocamlformat also has issues with comments right after a `then`; move them to before the `if`, since there is no loss of comprehensibility when they are placed there. This is similar to the update I did in August, where ocamlformat 0.27.1 also had problems with comments after `then` in other locations. Signed-off-by: Pau Ruiz Safont <pau.safont@vates.tech>
Changes tests to depend less on the output format of cmdliner under OCaml 5.4, and introduces formatting changes compatible with ocamlformat 0.28.1, all about comment positioning.
This infrastructure has been unused for many years (I think it only ever worked for some months during 2019), and it's making OCaml 5.4 difficult to adopt, so drop it. I'm leaving the instrumentation in the dune metadata because it's disabled by default, and the document about coverage, because it's a historical document. Signed-off-by: Pau Ruiz Safont <pau.safont@vates.tech>
xe diagnostic-timing-stats reports timings for functions in a key/value format. This patch sorts the output by key. Signed-off-by: Christian Lindig <christian.lindig@citrix.com> Co-authored-by: Christian Pardillo Laursen <christiansfd@protonmail.com>
Fixes: cba2f1d. While fixing the localhost name issue, a problem was found: hosts in a pool cannot resolve each other when using static IPs. Thus, an enhancement was applied to push host names and IPs to DNS, which pushed all IPs of the host into the DNS server, including the storage interface. This commit just reverts the DNS change. Regarding the resolve issue with static IPs, it is better handled somewhere else, like a network event hook or a system daemon, if we do care about it and want a fix. Signed-off-by: Lin Liu <lin.liu01@citrix.com>
Signed-off-by: Ming Lu <ming.lu@cloud.com>
…api-project#6792)
1. Split peer and root CA for user-installed trusted certificates.
2. Add purpose for user-installed certificates.
There is a race condition on the VM cache between pool_migrate_complete and VM events. In the cross-pool migration case, the VM is designed to be created with power_state Halted in the XAPI db. In pool_migrate_complete, add_caches creates an empty xenops cache for the VM, then refresh_vm compares the cached power state None with the real state Running to update the right power state in the XAPI db. In the failing case, it was found that:
-> VM event 1 update_vm
-> pool_migrate_complete add_caches (cache power_state None)
-> pool_migrate_complete refresh_vm
-> VM event 1 update cache (cache power_state Running)
-> VM event 2 update_vm (Running <-> Running, XAPI DB not updated)
When pool_migrate_complete runs add_caches, the cache update of the previous VM event 1 breaks the design intention. This commit adds a wait in pool_migrate_complete to ensure all in-flight events complete before add_caches, so that there is no race condition. Signed-off-by: Changlei Li <changlei.li@cloud.com>
Signed-off-by: Christian Lindig <christian.lindig@citrix.com>
edwintorok approved these changes (Jan 5, 2026)
changlei-li approved these changes (Jan 6, 2026)
The API schema hash changed on master and so had to be updated.