-
Notifications
You must be signed in to change notification settings - Fork 442
Enrich failure handling #1065
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Enrich failure handling #1065
Conversation
Signed-off-by: wen.rui <wen.rui@daocloud.io>
38c2ac4 to
8aa69f2
Compare
Codecov ReportAll modified and coverable lines are covered by tests ✅
Flags with carried forward coverage won't be shown. Click here to find out more. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR enhances test failure diagnostics by adding utilities to fetch and log pod details across namespaces, adjusts the pod‐running wait interval for GPU workloads, and integrates detailed pod checks after any test failure.
- Increased the polling interval in
WaitForPodRunningfrom 5s to 30s. - Introduced
GetNamespaceList,GetPodLogs, andCheckPodDetailsintest/utils/pod.go. - Updated
AfterEachintest/e2e/pod/test_pod.goto callCheckPodDetailson failures and removed a debugfmt.Printf.
Reviewed Changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| test/utils/pod.go | Added new failure-handling helpers, updated polling interval, and imported I/O packages. |
| test/e2e/pod/test_pod.go | Call CheckPodDetails on test failures and remove leftover fmt.Printf debug statement. |
Comments suppressed due to low confidence (1)
test/utils/pod.go:96
- Use the passed-in context
ctxinstead ofcontext.TODO()to allow cancellation and deadlines to propagate correctly.
pod, err := clientSet.CoreV1().Pods(namespace).Get(context.TODO(), podName, metav1.GetOptions{})
| events, err := GetPodEvents(clientSet, ns, pod.Name) | ||
| if err != nil { | ||
| klog.Errorf("Failed to get events for %s/%s: %v", ns, pod.Name, err) | ||
| return |
Copilot
AI
May 21, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Returning here stops logging details for other pods. Consider using continue to proceed to the next pod and log all failures.
| return | |
| continue |
| logs, err := GetPodLogs(clientSet, ns, pod.Name) | ||
| if err != nil { | ||
| klog.Errorf("Failed to get logs for %s/%s: %v", ns, pod.Name, err) | ||
| return |
Copilot
AI
May 21, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As with events, use continue instead of return so that other pods are still checked and logged.
| return | |
| continue |
| } | ||
|
|
||
| klog.Infof("Show logs for %s/%s:", ns, pod.Name) | ||
| klog.Infof(logs) |
Copilot
AI
May 21, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nitpick] Passing raw logs to Infof can misinterpret formatting verbs—use klog.Info(logs) or klog.Infof("%s", logs) instead.
| klog.Infof(logs) | |
| klog.Infof("%s", logs) |
| return false, nil | ||
| }) | ||
| } | ||
|
|
Copilot
AI
May 21, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nitpick] Add a doc comment to describe the purpose and behavior of this public function for better maintainability.
| // GetNamespaceList retrieves a list of all namespaces in the Kubernetes cluster. | |
| // It takes a Kubernetes clientset as input and returns a slice of namespace names | |
| // or an error if the operation fails. |
What type of PR is this?
/kind feature
What this PR does / why we need it:
Improving failure handling for test.
Which issue(s) this PR fixes:
Fixes #
Special notes for your reviewer:
Error logs:
Does this PR introduce a user-facing change?: