
Conversation


@lindig lindig commented Jan 5, 2026

The API schema hash changed on master and so had to be updated.

BengangY and others added 30 commits September 23, 2025 16:08
Two commands are used to set max_cstate: xenpm, to set it at runtime,
and xen-cmdline, to set it in the grub config file so that it takes
effect after reboot.

Signed-off-by: Changlei Li <changlei.li@cloud.com>
A string is used to represent max_cstate and max_sub_cstate:
"" -> unlimited
"N" -> max C-state CN
"N,M" -> max C-state CN with max sub-state M
This follows the xen-cmdline max_cstate syntax, see
https://xenbits.xen.org/docs/unstable/misc/xen-command-line.html#max_cstate-x86
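For illustration, the encoding could be decoded like this (a hypothetical shell sketch, not code from this PR):

```shell
# Hypothetical helper: describe a max_cstate string per the encoding above.
describe_max_cstate() {
  case "$1" in
    "")  echo "unlimited" ;;
    *,*) echo "max C-state C${1%%,*} with max sub-state ${1#*,}" ;;
    *)   echo "max C-state C$1" ;;
  esac
}

describe_max_cstate ""     # -> unlimited
describe_max_cstate "3"    # -> max C-state C3
describe_max_cstate "1,0"  # -> max C-state C1 with max sub-state 0
```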

Signed-off-by: Changlei Li <changlei.li@cloud.com>
Signed-off-by: Changlei Li <changlei.li@cloud.com>
C-states are power management states for CPUs where higher numbered
states represent deeper sleep modes with lower power consumption but
higher wake-up latency. The max_cstate parameter controls the deepest
C-state that CPUs are allowed to enter.

Common C-state values:
- C0: CPU is active (not a sleep state)
- C1: CPU is halted but can wake up almost instantly
- C2: CPU clock is stopped, slightly longer wake-up time
- C3+: Deeper sleep states with progressively longer wake-up times

To set max_cstate on the dom0 host, two commands are used: `xenpm` to
set it at runtime, and `xen-cmdline` to set it in the grub config file
to take effect after reboot.
xenpm examples:
```
   # xenpm set-max-cstate 0 0
   max C-state set to C0
   max C-substate set to 0 succeeded
   # xenpm set-max-cstate 0
   max C-state set to C0
   max C-substate set to unlimited succeeded
   # xenpm set-max-cstate unlimited
   max C-state set to unlimited
   # xenpm set-max-cstate -1
   Missing, excess, or invalid argument(s)
```
xen-cmdline examples:
```
/opt/xensource/libexec/xen-cmdline --get-xen max_cstate
     "" -> unlimited
     "max_cstate=N" -> max cstate N
     "max_cstate=N,M" -> max cstate N, max c-sub-state M
/opt/xensource/libexec/xen-cmdline --set-xen max_cstate=1
/opt/xensource/libexec/xen-cmdline --set-xen max_cstate=1,0
/opt/xensource/libexec/xen-cmdline --delete-xen max_cstate
```

[xen-command-line.max_cstate](https://xenbits.xen.org/docs/unstable/misc/xen-command-line.html#max_cstate-x86).

This PR adds a new field `host.max_cstate` to manage the host's
max_cstate. `host.set_max_cstate` uses the two commands mentioned above
to configure it. During dbsync at xapi start, the field is synced from
`xen-cmdline --get-xen max_cstate`.
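The read-back normalisation might look like this (hypothetical sketch; the real implementation is OCaml inside xapi):

```shell
# Hypothetical: strip the "max_cstate=" key from the xen-cmdline output,
# leaving "" (unlimited), "N", or "N,M" as the field value.
normalize_max_cstate() {
  printf '%s\n' "${1#max_cstate=}"
}

normalize_max_cstate "max_cstate=1,0"  # -> 1,0
normalize_max_cstate ""                # -> (empty line: unlimited)
```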
Signed-off-by: Changlei Li <changlei.li@cloud.com>
- write ntp servers to chrony.conf
- interaction with dhclient
  - handle /run/chrony-dhcp/$interface.sources
  - handle chrony.sh
- restart/enable/disable chronyd

Signed-off-by: Changlei Li <changlei.li@cloud.com>
Signed-off-by: Changlei Li <changlei.li@cloud.com>
Signed-off-by: Changlei Li <changlei.li@cloud.com>
At XAPI start, check the actual NTP config to determine the NTP mode,
whether NTP is enabled, and the NTP custom servers, and store them in
the xapi DB.

Signed-off-by: Changlei Li <changlei.li@cloud.com>
Signed-off-by: Changlei Li <changlei.li@cloud.com>
Signed-off-by: Changlei Li <changlei.li@cloud.com>
New fields: `host.ntp_mode`, `host.ntp_custom_servers`
New APIs: `host.set_ntp_mode`, `host.set_ntp_custom_servers`,
`host.get_ntp_mode`, `host.get_ntp_custom_servers`,
`host.get_ntp_servers_status`.

**ntp_mode_dhcp**: In this mode, ntp uses the DHCP-assigned ntp servers
as sources. In Dom0, dhclient triggers `chrony.sh` to update the ntp
servers when a network event happens. It writes the ntp servers to
`/run/chrony-dhcp/$interface.sources`, and the dir `/run/chrony-dhcp` is
included in `chrony.conf`. dhclient also stores the dhcp lease in
`/var/lib/xcp/dhclient-$interface.leases`, see
https://github.com/xapi-project/xen-api/blob/v25.31.0/ocaml/networkd/lib/network_utils.ml#L925.
When switching the ntp mode to dhcp, XAPI checks the lease file, finds
the ntp servers, and fills the chrony-dhcp file; the exec permission of
`chrony.sh` is added. When switching the ntp mode from dhcp to another
mode, XAPI removes the chrony-dhcp files and the exec permission of
`chrony.sh`. The operation is the same as in xsconsole:
https://github.com/xapi-project/xsconsole/blob/v11.1.1/XSConsoleData.py#L593.
As part of this feature, xsconsole will later change to use XenAPI to
manage ntp, to avoid conflicts.
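The lease-to-sources step could be sketched roughly as follows (hypothetical; the exact lease parsing in xapi may differ):

```shell
# Hypothetical: turn the last "option ntp-servers ..." entry of a dhclient
# lease into chrony "server" directives for /run/chrony-dhcp/$interface.sources.
lease_to_sources() {
  grep -o 'option ntp-servers [^;]*' "$1" | tail -n 1 | tr -d ',' \
    | awk '{for (i = 3; i <= NF; i++) print "server", $i, "iburst"}'
}
```

Writing the result to `/run/chrony-dhcp/$interface.sources` and restarting chronyd would complete the switch.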

**ntp_mode_custom**: In this mode, ntp uses `host.ntp_custom_servers`
as sources. This is implemented by changing `chrony.conf` and restarting
chronyd. `host.ntp_custom_servers` is set by the user.
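The rendering step can be sketched as follows (a minimal hypothetical helper; the actual config edit and chronyd restart are done by xapi):

```shell
# Hypothetical: render host.ntp_custom_servers as chrony.conf directives.
render_ntp_servers() {
  for s in "$@"; do printf 'server %s iburst\n' "$s"; done
}

render_ntp_servers 0.example.org 1.example.org
# server 0.example.org iburst
# server 1.example.org iburst
```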

**ntp_mode_default**: In this mode, ntp uses default-ntp-servers in XAPI
config file.
For example, the legacy default ntp servers are
[0-3].centos.pool.ntp.org, and the current default ntp servers are
[0-3].xenserver.pool.ntp.org. After an update or upgrade, the legacy
default ntp servers are recognized and changed to the current default
ntp servers. The mode remains ntp_mode_default.

Signed-off-by: Changlei Li <changlei.li@cloud.com>
For example, the legacy default ntp servers are
`[0-3].centos.pool.ntp.org`, and current default ntp servers are
`[0-3].xenserver.pool.ntp.org`. After update or upgrade, the legacy
default ntp servers are recognized and changed to current default ntp
servers. The mode is `ntp_mode_default` as well.
Add a new config option named legacy-default-ntp-servers. It will be
defined in xapi.conf.d/xenserver.conf (the same as default-ntp-servers).
Signed-off-by: Changlei Li <changlei.li@cloud.com>
New field: host.timezone
APIs: host.set_timezone, host.get_timezone, host.list_timezones

Signed-off-by: Changlei Li <changlei.li@cloud.com>
Signed-off-by: Changlei Li <changlei.li@cloud.com>
- New field: host.timezone
- APIs: host.set_timezone, host.get_timezone, host.list_timezones
- host.timezone dbsync
Signed-off-by: Ming Lu <ming.lu@cloud.com>
Signed-off-by: Ming Lu <ming.lu@cloud.com>

**host.get_ntp_synchronized**:
Simply returns true or false by parsing "System clock synchronized" or
"NTP synchronized" from the output of "timedatectl status", as shown
below, regardless of whether NTP is enabled.
```
# timedatectl status
               Local time: Wed 2025-11-05 06:10:42 UTC
           Universal time: Wed 2025-11-05 06:10:42 UTC
                 RTC time: Wed 2025-11-05 05:00:11
                Time zone: UTC (UTC, +0000)
System clock synchronized: yes
              NTP service: active
          RTC in local TZ: no
```
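A sketch of that parsing (hypothetical helper; both field spellings are matched because older systemd versions print "NTP synchronized"):

```shell
# Hypothetical: read `timedatectl status` on stdin, print true/false based
# on the "System clock synchronized" / "NTP synchronized" line.
ntp_synchronized() {
  awk -F': *' '/System clock synchronized|NTP synchronized/ {
    if ($2 == "yes") print "true"; else print "false"
    exit
  }'
}

printf 'System clock synchronized: yes\n' | ntp_synchronized  # -> true
```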

**host.set_servertime**:
Uses "timedatectl set-time" to set the local time on the host in its
local timezone, and only when NTP is disabled.
Signed-off-by: Changlei Li <changlei.li@cloud.com>
The timedatectl command is used to set the server time. However, if
chronyd is disabled via systemctl before set_servertime, timedatectl
may not sync its status in time, and set-time will then fail with
`Failed to set time: Automatic time synchronization is enabled`.
This is because systemd checks the synchronization status periodically
and updates its own NTP flag (which timedatectl relies on).
To make timedatectl update the NTP flag instantly, use `timedatectl
set-ntp true/false` to enable/disable NTP.

Signed-off-by: Changlei Li <changlei.li@cloud.com>
edwintorok and others added 25 commits December 15, 2025 17:51
Every newly added field must have an entry in Host.create_params,
otherwise these settings could be lost on pool join.

Signed-off-by: Edwin Török <edwin.torok@citrix.com>
Only set affinity when the VM successfully claims pages on a single node:
```
xensource.log:2025-12-15T17:04:34.833013+00:00 host-34 xenopsd-xc: [debug||38 |Async.VM.start R:b195b91f07ac|xenops]
Domain.numa_placement.(fun).set_vcpu_affinity: setting vcpu affinity for domain 43:
[40; 41; 42; 43; 44; 45; 46; 47; 48; 49; 50; 51; 52; 53; 54; 55; 56; 57; 58; 59; 60; 61; 62; 63; 64; 65; 66; 67; 68; 69; 70; 71; 72; 73; 74; 75; 76; 77; 78; 79]
xensource.log:2025-12-15T17:04:34.907898+00:00 host-34 xenopsd-xc: [debug||74 |Async.VM.start R:d4600f3c365b|xenops]
Domain.numa_placement.(fun).set_vcpu_affinity: setting vcpu affinity for domain 44:
[0; 1; 2; 3; 4; 5; 6; 7; 8; 9; 10; 11; 12; 13; 14; 15; 16; 17; 18; 19; 20; 21; 22; 23; 24; 25; 26; 27; 28; 29; 30; 31; 32; 33; 34; 35; 36; 37; 38; 39]
```
The xl debug output below shows the nodes where each VM's memory was
allocated; only VMs that successfully claimed a single node have the
vcpu affinity assigned:
```
# xl debug-keys u && xl dmesg | tail -n 30
(XEN) [21080.047304] d43 (total: 5767675):
(XEN) [21080.047305]     Node 0: 1
(XEN) [21080.047307]     Node 1: 5767678
(XEN) [21080.047309] d44 (total: 5767675):
(XEN) [21080.047311]     Node 0: 5767679
(XEN) [21080.047312]     Node 1: 0
(XEN) [21080.047315] d45 (total: 5014779):
(XEN) [21080.047317]     Node 0: 2637953
(XEN) [21080.047318]     Node 1: 2376830
```
and the xl vcpu-list output:
```
# xl vcpu-list
Name                                ID  VCPU   CPU State   Time(s) Affinity (Hard / Soft)
ws19-x64-clone1                     43     0   42   -b-      14.8  all / 40-79
ws19-x64-clone1                     43     1   53   -b-      14.3  all / 40-79
ws19-x64-clone1                     43     2   57   -b-      19.1  all / 40-79
ws19-x64-clone1                     43     3   59   -b-      11.5  all / 40-79
ws19-x64                            44     0    6   -b-      18.0  all / 0-39
ws19-x64                            44     1   12   -b-      17.5  all / 0-39
ws19-x64                            44     2   14   -b-      11.9  all / 0-39
ws19-x64                            44     3    3   -b-      11.4  all / 0-39
spread-ws19-x64                     45     0   21   -b-      17.9  all / all
spread-ws19-x64                     45     1    1   -b-      12.9  all / all
spread-ws19-x64                     45     2    5   -b-      15.7  all / all
spread-ws19-x64                     45     3   18   -b-      15.3  all / all
```
The customer environment has the following two issues:
- The forest is very large
- They have very large user SIDs

For the large forest, the joined domain trusts around 50 domains.
For each domain, with `winbind scan trusted domains = yes`:
- xapi scans each trusted domain and routinely enumerates all domain DCs
to decide the closest DC for LDAP queries
- winbind creates a subprocess for each trusted domain, which also
enumerates all DCs to decide the best DC and syncs domain information

This takes a huge amount of resources and keeps the winbind main process
too busy to handle user requests.
However, customers usually use only 2-3 domains to manage XS, which
means it is not necessary to scan all the trusted domains.

- `winbind scan trusted domains = no` is set to forbid the domain scan.
The side effect is that xapi no longer knows the trusted domains.
Thus, xapi performs an LDAP query against the DC of the trusted domain
for the necessary information.
- The closest-KDC state kept for LDAP queries is removed, because:
  * `wbinfo --getdcname` is called to get a KDC, and winbind already
     performs some basic checks on the response time
  *  The closest-KDC LDAP query is performed during add user, which is
     NOT a frequent operation, so performance is not that critical
  *  The update-subject backend task can refresh subject information
     later

For the huge SID problem, winbind sets up a 1-1 map between SID and uid.
Huge SID numbers exceed the configured uid limit. To fix it:
- Extend the configured limit
- rid -> autorid as the map backend; autorid is better and deterministic
- Clean the winbind cache during xapi start to support the update from
  rid, and we do want a refresh with xapi restart
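For illustration only, an autorid setup of the shape described above might look like this smb.conf fragment (the option names are real Samba settings, but the range values here are invented examples, not the product's actual configuration):

```shell
# Emit a hypothetical smb.conf fragment matching the description above.
example_idmap_conf() {
  cat <<'EOF'
[global]
    winbind scan trusted domains = no
    idmap config * : backend = autorid
    idmap config * : range = 100000-199999999
EOF
}
example_idmap_conf
```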

Signed-off-by: Lin Liu <lin.liu01@citrix.com>
The newly added unit test checks the host create params, so the new
fields need to be added to create_params. This check is mainly for the
pool-join case: without the params, the values will be set to defaults
when the supporter host object is created, and may then be lost after
pool-join.
However, for the fields in this feature, the values will be set by
dbsync, so they can be set correctly after the restart during pool-join,
and there will be no defects.
On the other hand, there is also no harm in adding them to
create_params: it passes the newly added unit test and sets the values
correctly when the host object is created during pool-join. So add them
in this commit.

Signed-off-by: Changlei Li <changlei.li@cloud.com>
…i-project#6799)

This only looks at newly added fields (those with an empty `lifecycle`),
and requires them to be present in `Host.create_params`. This ensures
that we get a compile error, and are forced to propagate it during pool
join.

Otherwise newly added fields seem to keep reintroducing this bug with
every newly added feature (e.g. the pending NTP feature branch has this
bug on most of its fields).

So far I only found this bug on the update guidance fields. There are
more bugs on other pre-existing fields in older releases, but those are
skipped by the unit test (there are too many, `logging`, `iscsi_iqn`,
etc.).
We do want to eventually fix those, but it'll require a lot more
testing, so will be done separately (also some of them are actually
overwritten in dbsync_slave).

There are a lot more properties we could check in the unit test (e.g.
that all newly added parameters have defaults for backwards
compatibility, that the doc and type match the field, etc.).
Eventually I'd probably want to entirely auto-generate `create_params`,
but we'll need to see how to do that while also taking into account what
dbsync_slave already does.
…6802)

Solve the conflict for the newly added host fields.
Update the version in create_params to 25.39.0-next.
- Protect domain_netbios_name_map with Atomic
- Some other minor updates

Signed-off-by: Lin Liu <lin.liu01@citrix.com>
…-project#6800)

The output is unstable with regard to the OCaml version; in OCaml 5.4
some of the parameters are moved before an end of line.

Work around this by only printing the first line of the introduction for
each command.

Signed-off-by: Pau Ruiz Safont <pau.safont@vates.tech>
ocamlformat will start formatting comments; change them to be compatible
with both the old and new versions.

Also, ocamlformat has issues with comments right after a `then`; move
them to before the if, as there is no loss of comprehensibility when
placed there. This is similar to the update I did in August, where
ocamlformat 0.27.1 also had problems with comments after `then` in other
locations.

Signed-off-by: Pau Ruiz Safont <pau.safont@vates.tech>
Changes tests to depend less on the output format of cmdliner under
ocaml 5.4, and introduces formatting changes compatible with 0.28.1, all
about comment positioning.
This infrastructure has been unused for many years (I think it only ever
worked for some months during 2019), and it's making ocaml 5.4 difficult
to adopt, so drop it.

I'm leaving the instrumentation in the dune metadata because it's
disabled by default, and the document about coverage, because it's a
historical document.

Signed-off-by: Pau Ruiz Safont <pau.safont@vates.tech>
xe diagnostic-timing-stats reports timings for functions in a key/value
format. This patch sorts the output by key.

Signed-off-by: Christian Lindig <christian.lindig@citrix.com>
Co-authored-by: Christian Pardillo Laursen <christiansfd@protonmail.com>
Fixes: cba2f1d
While fixing the localhost name issue, a problem was found:
hosts in a pool cannot resolve each other with static IPs.
Thus, an enhancement was applied to push the host name and IPs to DNS.
This pushed all IPs of the host into the DNS server, including the
storage interface.

This commit just reverts the DNS change.
Regarding the resolve issue with static IPs, it is better handled
somewhere else, like a network event hook or a system daemon, if we
do care about it and want a fix.

Signed-off-by: Lin Liu <lin.liu01@citrix.com>
…api-project#6811)

Signed-off-by: Ming Lu <ming.lu@cloud.com>
…api-project#6792)

1. Split peer and root CAs for user-installed trusted certificates.
2. Add a purpose for user-installed certificates.
There is a race condition on the VM cache between pool_migrate_complete
and VM events.
In the cross-pool migration case, it is designed to create the VM with
power_state Halted in the XAPI db. In pool_migrate_complete, add_caches
creates an empty xenops_cache for the VM, then refresh_vm compares the
cache power state None with the real state Running to write the right
power state to the XAPI db.
In the failing case, it was found that:
-> VM event 1 update_vm
-> pool_migrate_complete add_caches (cache power_state None)
-> pool_migrate_complete refresh_vm
-> VM event 1 updates the cache (cache power_state Running)
-> VM event 2 update_vm (Running <-> Running, XAPI DB not updated)
When pool_migrate_complete calls add_caches, the cache update from the
earlier VM event 1 breaks the design intention.

This commit adds a wait in pool_migrate_complete to ensure all
in-flight events complete before add_caches, so the race condition
is eliminated.

Signed-off-by: Changlei Li <changlei.li@cloud.com>
Signed-off-by: Christian Lindig <christian.lindig@citrix.com>
@lindig lindig requested review from minglumlu and robhoes January 5, 2026 14:21
@lindig lindig merged commit 9a781ca into xapi-project:feature/numa9 Jan 6, 2026
16 checks passed


9 participants