Conversation

@pooknull (Contributor) commented Dec 16, 2025

K8SPSMDB-1429

https://perconadev.atlassian.net/browse/K8SPSMDB-1429

DESCRIPTION

Problem:
A PSMDB cluster can reach the ready state before all replica sets have been added to the sharded cluster as shards. This can cause issues when a restore is created at the same time as the cluster.

Cause:
Previously, the operator only checked whether all mongos pods were ready; even when they weren't, it still proceeded to set the cluster state based on other conditions.

Solution:
The operator should check separately whether sharding is enabled. If it is, the operator should keep the cluster in the initializing state until all replicasets are added to the shard.

CHECKLIST

Jira

  • Is the Jira ticket created and referenced properly?
  • Does the Jira ticket have the proper statuses for documentation (Needs Doc) and QA (Needs QA)?
  • Does the Jira ticket link to the proper milestone (Fix Version field)?

Tests

  • Is an E2E test/test case added for the new feature/change?
  • Are unit tests added where appropriate?
  • Are OpenShift compare files changed for E2E tests (compare/*-oc.yml)?

Config/Logging/Testability

  • Are all needed new/changed options added to default YAML files?
  • Are all needed new/changed options added to the Helm Chart?
  • Did we add proper logging messages for operator actions?
  • Did we ensure compatibility with the previous version or cluster upgrade process?
  • Does the change support oldest and newest supported MongoDB version?
  • Does the change support oldest and newest supported Kubernetes version?

Copilot AI review requested due to automatic review settings December 16, 2025 10:26
@pull-request-size pull-request-size bot added the size/M 30-99 lines label Dec 16, 2025

Copilot AI left a comment


Pull request overview

This PR improves the cluster status handling logic for sharded MongoDB clusters by restructuring conditional checks and refactoring the code that determines when a replica set should be added to a shard. The changes aim to make the condition evaluation more explicit and the code flow clearer.

Key Changes

  • Restructured the conditional logic for shard addition to check prerequisites (mongos size, restore status, cluster role) separately from readiness conditions
  • Extracted rsStatus variable to reduce redundant map lookups
  • Replaced manual boolean pointer creation with ptr.To() helper for cleaner code
Comments suppressed due to low confidence (1)

pkg/controller/perconaservermongodb/mgo.go:235

  • The first argument to errors.Wrap is always nil at this call site:

        return api.AppStateInit, nil, errors.Wrap(err, "failed to check running restore")


@pooknull pooknull marked this pull request as ready for review December 16, 2025 11:43
@pooknull pooknull requested a review from hors as a code owner December 16, 2025 11:43
Copilot AI review requested due to automatic review settings December 16, 2025 11:43

Copilot AI left a comment


Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated no new comments.




@egegunes egegunes left a comment


is it possible to add some unit tests for this?

@egegunes

> is it possible to add some unit tests for this?

@pooknull Since we don't have any unit tests covering the status code, I also need to add some tests in my PR. You don't need to add tests right now; we can add them in another PR after merging this.

@JNKPercona (Collaborator)

Test Name Result Time
arbiter passed 00:10:38
balancer passed 00:00:00
cross-site-sharded passed 00:00:00
custom-replset-name passed 00:00:00
custom-tls passed 00:00:00
custom-users-roles passed 00:00:00
custom-users-roles-sharded passed 00:00:00
data-at-rest-encryption passed 00:00:00
data-sharded passed 00:00:00
demand-backup passed 00:00:00
demand-backup-eks-credentials-irsa passed 00:00:00
demand-backup-fs passed 00:00:00
demand-backup-if-unhealthy passed 00:00:00
demand-backup-incremental passed 00:00:00
demand-backup-incremental-sharded passed 00:00:00
demand-backup-physical-parallel passed 00:00:00
demand-backup-physical-aws passed 00:00:00
demand-backup-physical-azure passed 00:00:00
demand-backup-physical-gcp-s3 passed 00:00:00
demand-backup-physical-gcp-native passed 00:00:00
demand-backup-physical-minio passed 00:00:00
demand-backup-physical-minio-native passed 00:00:00
demand-backup-physical-sharded-parallel passed 00:00:00
demand-backup-physical-sharded-aws passed 00:00:00
demand-backup-physical-sharded-azure passed 00:00:00
demand-backup-physical-sharded-gcp-native passed 00:00:00
demand-backup-physical-sharded-minio passed 00:00:00
demand-backup-physical-sharded-minio-native passed 00:00:00
demand-backup-sharded passed 00:00:00
expose-sharded passed 00:00:00
finalizer passed 00:00:00
ignore-labels-annotations passed 00:00:00
init-deploy passed 00:00:00
ldap passed 00:00:00
ldap-tls passed 00:00:00
limits passed 00:00:00
liveness passed 00:00:00
mongod-major-upgrade passed 00:00:00
mongod-major-upgrade-sharded passed 00:00:00
monitoring-2-0 passed 00:00:00
monitoring-pmm3 passed 00:00:00
multi-cluster-service passed 00:00:00
multi-storage passed 00:00:00
non-voting-and-hidden passed 00:00:00
one-pod passed 00:00:00
operator-self-healing-chaos passed 00:00:00
pitr passed 00:00:00
pitr-physical passed 00:00:00
pitr-sharded passed 00:00:00
pitr-to-new-cluster passed 00:00:00
pitr-physical-backup-source passed 00:00:00
preinit-updates passed 00:00:00
pvc-resize passed 00:00:00
recover-no-primary passed 00:00:00
replset-overrides passed 00:00:00
rs-shard-migration passed 00:00:00
scaling passed 00:00:00
scheduled-backup passed 00:00:00
security-context passed 00:00:00
self-healing-chaos passed 00:00:00
service-per-pod passed 00:00:00
serviceless-external-nodes passed 00:00:00
smart-update passed 00:00:00
split-horizon passed 00:00:00
stable-resource-version passed 00:00:00
storage passed 00:00:00
tls-issue-cert-manager passed 00:00:00
upgrade passed 00:00:00
upgrade-consistency passed 00:00:00
upgrade-consistency-sharded-tls passed 00:00:00
upgrade-sharded passed 00:00:00
upgrade-partial-backup passed 00:00:00
users passed 00:00:00
version-service passed 00:00:00
Summary Value
Tests Run 74/74
Job Duration 00:36:57
Total Test Time 00:10:38

commit: fce4e0a
image: perconalab/percona-server-mongodb-operator:PR-2148-fce4e0a0

Copilot AI review requested due to automatic review settings December 18, 2025 20:55

Copilot AI left a comment


Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

@hors hors merged commit 3c4502d into main Dec 18, 2025
13 of 15 checks passed
@hors hors deleted the K8SPSMDB-1429 branch December 18, 2025 20:55