@pooknull pooknull commented Dec 18, 2025

https://perconadev.atlassian.net/browse/K8SPSMDB-1541

DESCRIPTION

Problem:
It is not possible to deploy a PSA (Primary-Secondary-Arbiter) cluster. The operator returns an error:

set votes: write mongo config: replSetReconfig: (NewReplicaSetConfigurationIncompatible) Rejecting reconfig where the new config has a PSA topology and the secondary is electable, but the old config contains only one writable node.

Cause:
When the cluster is deployed, the operator adds one new member per reconcile. When the secondary is added, the operator enforces its rule of keeping an odd number of voting members, so it sets the secondary to votes: 0. That leaves the replica set with only one voting member. However, a PSA deployment needs two voting members in the replica set before the arbiter can be added.

Solution:
Skip the rule that keeps the number of voting members odd when deploying a PSA cluster.
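
The cause and the fix can be illustrated with a minimal sketch. It is not the operator's code: `Member`, `votingCount`, and `normalizeVotes` are hypothetical names, and the way the vote is removed (from the last voting data-bearing member) is an assumption made only for illustration. It shows that with two data-bearing members the odd-count rule leaves a single voter, while skipping the rule for PSA keeps both.

```go
package main

import "fmt"

// Member is a hypothetical, simplified stand-in for a replica set config member.
type Member struct {
	Name     string
	Votes    int
	Priority int
	Arbiter  bool
}

// votingCount returns how many members currently hold a vote.
func votingCount(members []Member) int {
	n := 0
	for _, m := range members {
		if m.Votes > 0 {
			n++
		}
	}
	return n
}

// normalizeVotes keeps the number of voting members odd by taking the vote
// away from the last voting data-bearing member when the count is even.
// When unsafePSA is true the rule is skipped and the members are returned
// unchanged. The input slice is never modified.
func normalizeVotes(members []Member, unsafePSA bool) []Member {
	out := append([]Member(nil), members...)
	if unsafePSA || votingCount(out)%2 == 1 {
		return out
	}
	for i := len(out) - 1; i >= 0; i-- {
		if out[i].Votes > 0 && !out[i].Arbiter {
			out[i].Votes = 0
			out[i].Priority = 0 // a non-voting member must have priority 0
			break
		}
	}
	return out
}

func main() {
	// State after the secondary has been added and before the arbiter exists.
	members := []Member{
		{Name: "rs0-0", Votes: 1, Priority: 2},
		{Name: "rs0-1", Votes: 1, Priority: 2},
	}
	fmt.Println(votingCount(normalizeVotes(members, false))) // rule applied: 1 voter
	fmt.Println(votingCount(normalizeVotes(members, true)))  // rule skipped: 2 voters
}
```

With the rule applied, the secondary ends up non-voting, which is the state the description above says blocks adding the arbiter.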

CHECKLIST

Jira

  • Is the Jira ticket created and referenced properly?
  • Does the Jira ticket have the proper statuses for documentation (Needs Doc) and QA (Needs QA)?
  • Does the Jira ticket link to the proper milestone (Fix Version field)?

Tests

  • Is an E2E test/test case added for the new feature/change?
  • Are unit tests added where appropriate?
  • Are OpenShift compare files changed for E2E tests (compare/*-oc.yml)?

Config/Logging/Testability

  • Are all needed new/changed options added to default YAML files?
  • Are all needed new/changed options added to the Helm Chart?
  • Did we add proper logging messages for operator actions?
  • Did we ensure compatibility with the previous version or cluster upgrade process?
  • Does the change support oldest and newest supported MongoDB version?
  • Does the change support oldest and newest supported Kubernetes version?

Copilot AI review requested due to automatic review settings December 18, 2025 14:58
@pull-request-size pull-request-size bot added the size/S 10-29 lines label Dec 18, 2025

Copilot AI left a comment

Pull request overview

This PR fixes the PSA (Primary-Secondary-Arbiter) configuration handling by simplifying version-specific logic and ensuring the unsafePSA flag is properly respected when adjusting replica set voting configuration.

  • Removes redundant version comparison logic for determining unsafe PSA configuration (versions <= 1.15.0 vs >= 1.16.0)
  • Fixes the vote adjustment logic to skip vote normalization when unsafePSA is enabled
  • Removes extraneous whitespace

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| pkg/psmdb/mongo/mongo.go | Adds unsafePSA check to prevent vote normalization in unsafe PSA configurations and removes trailing whitespace |
| pkg/controller/perconaservermongodb/mgo.go | Simplifies unsafePSA determination by removing version-based conditional logic and using only cr.Spec.Unsafe.ReplsetSize |

@pull-request-size pull-request-size bot added size/M 30-99 lines and removed size/S 10-29 lines labels Dec 18, 2025
@pooknull pooknull marked this pull request as ready for review December 18, 2025 15:24
@pooknull pooknull requested a review from hors as a code owner December 18, 2025 15:24
Copilot AI review requested due to automatic review settings December 18, 2025 15:24

Copilot AI left a comment

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.


- } else {
- unsafePSA = cr.Spec.Unsafe.ReplsetSize && rs.Arbiter.Enabled && rs.Arbiter.Size == 1 && !rs.NonVoting.Enabled && rs.Size == 2
- }
+ unsafePSA := cr.Spec.Unsafe.ReplsetSize && rs.Arbiter.Enabled && rs.Arbiter.Size == 1 && !rs.NonVoting.Enabled && rs.Size == 2
Copilot AI Dec 18, 2025

This change removes backward compatibility for operator versions 1.15.0 and earlier. The previous code checked the version and used cr.Spec.UnsafeConf for versions <= 1.15.0. While UnsafeConf is automatically migrated to Unsafe.ReplsetSize in the defaults code (see psmdb_defaults.go lines 115-124), this migration only happens when processing the CR. For existing clusters that were deployed with version 1.15.0 or earlier and are being upgraded, there could be a brief period where the old field hasn't been migrated yet.

Consider whether dropping support for these older versions is intentional. If it is, this should be documented in the PR description or migration guide. If backward compatibility should be maintained, the version check should be preserved to handle both UnsafeConf and Unsafe.ReplsetSize fields.

Suggested change
- unsafePSA := cr.Spec.Unsafe.ReplsetSize && rs.Arbiter.Enabled && rs.Arbiter.Size == 1 && !rs.NonVoting.Enabled && rs.Size == 2
+ unsafeReplsetSize := cr.Spec.Unsafe.ReplsetSize || cr.Spec.UnsafeConf
+ unsafePSA := unsafeReplsetSize && rs.Arbiter.Enabled && rs.Arbiter.Size == 1 && !rs.NonVoting.Enabled && rs.Size == 2
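
For readers who want to try the suggested predicate in isolation, here is a minimal, self-contained sketch. The struct types (`Spec`, `UnsafeFlags`, `Toggle`, `ReplsetSpec`) and the function name `isUnsafePSA` are hypothetical stand-ins that only mirror the field names used in this thread; they are not the operator's real API types.

```go
package main

import "fmt"

// UnsafeFlags is a hypothetical mirror of the newer unsafe settings.
type UnsafeFlags struct {
	ReplsetSize bool
}

// Spec is a hypothetical, trimmed-down CR spec carrying both the legacy
// UnsafeConf field and the newer Unsafe.ReplsetSize field.
type Spec struct {
	UnsafeConf bool
	Unsafe     UnsafeFlags
}

// Toggle is a hypothetical stand-in for the arbiter and non-voting sections.
type Toggle struct {
	Enabled bool
	Size    int32
}

// ReplsetSpec is a hypothetical, trimmed-down replica set spec.
type ReplsetSpec struct {
	Size      int32
	Arbiter   Toggle
	NonVoting Toggle
}

// isUnsafePSA reports whether the replica set is a PSA topology (two
// data-bearing members plus one arbiter, no non-voting members) that the
// user explicitly allowed through either the legacy or the new flag.
func isUnsafePSA(spec Spec, rs ReplsetSpec) bool {
	unsafeReplsetSize := spec.Unsafe.ReplsetSize || spec.UnsafeConf
	return unsafeReplsetSize &&
		rs.Arbiter.Enabled && rs.Arbiter.Size == 1 &&
		!rs.NonVoting.Enabled && rs.Size == 2
}

func main() {
	psa := ReplsetSpec{Size: 2, Arbiter: Toggle{Enabled: true, Size: 1}}

	fmt.Println(isUnsafePSA(Spec{Unsafe: UnsafeFlags{ReplsetSize: true}}, psa)) // true
	fmt.Println(isUnsafePSA(Spec{UnsafeConf: true}, psa))                       // true (legacy flag)
	fmt.Println(isUnsafePSA(Spec{}, psa))                                       // false
}
```

Accepting either flag keeps the predicate independent of whether the legacy field has already been migrated, which is the concern raised in the comment above.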

@JNKPercona
Test Name Result Time
arbiter passed 00:11:35
balancer passed 00:18:02
cross-site-sharded passed 00:18:14
custom-replset-name passed 00:10:13
custom-tls passed 00:13:48
custom-users-roles passed 00:10:47
custom-users-roles-sharded passed 00:11:33
data-at-rest-encryption passed 00:12:43
data-sharded passed 00:23:01
demand-backup passed 00:15:58
demand-backup-eks-credentials-irsa passed 00:00:07
demand-backup-fs passed 00:21:30
demand-backup-if-unhealthy passed 00:08:10
demand-backup-incremental passed 00:45:47
demand-backup-incremental-sharded passed 00:59:59
demand-backup-physical-parallel passed 00:08:08
demand-backup-physical-aws passed 00:12:08
demand-backup-physical-azure passed 00:12:15
demand-backup-physical-gcp-s3 passed 00:11:37
demand-backup-physical-gcp-native passed 00:11:25
demand-backup-physical-minio passed 00:20:18
demand-backup-physical-minio-native passed 00:20:12
demand-backup-physical-sharded-parallel passed 00:10:43
demand-backup-physical-sharded-aws passed 00:17:59
demand-backup-physical-sharded-azure passed 00:17:42
demand-backup-physical-sharded-gcp-native passed 00:17:25
demand-backup-physical-sharded-minio passed 00:17:09
demand-backup-physical-sharded-minio-native passed 00:17:34
demand-backup-sharded passed 00:24:07
expose-sharded passed 00:33:04
finalizer passed 00:10:11
ignore-labels-annotations passed 00:07:25
init-deploy passed 00:12:46
ldap passed 00:08:55
ldap-tls passed 00:12:45
limits passed 00:06:16
liveness passed 00:08:06
mongod-major-upgrade passed 00:13:23
mongod-major-upgrade-sharded passed 00:20:13
monitoring-2-0 passed 00:24:56
monitoring-pmm3 passed 00:28:18
multi-cluster-service passed 00:12:07
multi-storage passed 00:18:39
non-voting-and-hidden passed 00:16:41
one-pod passed 00:07:45
operator-self-healing-chaos passed 00:12:39
pitr passed 00:31:58
pitr-physical passed 01:03:25
pitr-sharded passed 00:22:38
pitr-to-new-cluster passed 00:24:50
pitr-physical-backup-source passed 00:55:48
preinit-updates passed 00:04:53
pvc-resize passed 00:13:07
recover-no-primary passed 00:27:29
replset-overrides passed 00:16:19
replset-remapping passed 00:08:44
replset-remapping-sharded passed 00:16:39
rs-shard-migration passed 00:13:33
scaling passed 00:10:49
scheduled-backup passed 00:17:09
security-context passed 00:06:52
self-healing-chaos passed 00:15:24
service-per-pod passed 00:18:43
serviceless-external-nodes passed 00:07:25
smart-update passed 00:08:14
split-horizon passed 00:07:54
stable-resource-version passed 00:04:51
storage passed 00:07:33
tls-issue-cert-manager passed 00:29:18
upgrade passed 00:09:47
upgrade-consistency passed 00:06:14
upgrade-consistency-sharded-tls passed 00:53:18
upgrade-sharded passed 00:19:01
upgrade-partial-backup passed 00:15:34
users passed 00:17:18
version-service passed 00:26:03
Summary Value
Tests Run 76/76
Job Duration 03:09:12
Total Test Time 22:15:47

commit: 75daeb1
image: perconalab/percona-server-mongodb-operator:PR-2153-75daeb1a

Labels

size/M 30-99 lines
