
@maheshsal

Two commits

Commit 1: hmi: Add test case to trigger TOD topology switch.

This test triggers a TOD topology failover on all chips to exercise the OPAL
TI and panic paths, making sure the OS does not get stuck while going down.

This test needs the following skiboot and kernel commits to pass:

skiboot:
  497734984 opal/hmi: set a flag to inform OS that TOD/TB has failed.
  ca349b836 opal/hmi: Don't retry TOD recovery if it is already in failed state.
  017da88b2 opal/hmi: Fix double unlock of hmi lock in failure path.

kernel:
  http://patchwork.ozlabs.org/patch/1051379/

Commit 2: Opal TI: Add test for OPAL TI.

Trigger a manual OPAL TI by directly writing to the SCOM address provided in
the device-tree node ibm,sw-xstop-fir. This tests the basic functionality of
OPAL TI under normal circumstances.
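The manual trigger boils down to turning the device-tree property into a SCOM address and a register value. Below is a minimal Python sketch of that step. It assumes the property holds two big-endian 32-bit cells (SCOM address, FIR bit number) and that the bit uses IBM MSB-0 numbering, the way skiboot's PPC_BIT() macro does. The property layout, the helper names (parse_sw_xstop_fir, ppc_bit, xstop_trigger) and the sample address are all hypothetical. The SCOM write itself is not shown.

```python
import struct
from typing import Tuple


# Assumption: the property is two big-endian 32-bit cells <scom_address fir_bit>.
# This layout is illustrative only and is not confirmed by the PR.
def parse_sw_xstop_fir(prop: bytes) -> Tuple[int, int]:
    """Split the raw device-tree property into (scom_address, fir_bit)."""
    if len(prop) != 8:
        raise ValueError(f"expected 8 bytes (2 cells), got {len(prop)}")
    scom_addr, fir_bit = struct.unpack(">II", prop)
    return scom_addr, fir_bit


def ppc_bit(bit: int) -> int:
    """IBM MSB-0 numbering: bit 0 is the most significant bit of a 64-bit word."""
    if not 0 <= bit < 64:
        raise ValueError(f"bit {bit} out of range for a 64-bit register")
    return 1 << (63 - bit)


def xstop_trigger(prop: bytes) -> Tuple[int, int]:
    """Return the (scom_address, value) pair that would be written to set the FIR bit."""
    scom_addr, fir_bit = parse_sw_xstop_fir(prop)
    return scom_addr, ppc_bit(fir_bit)


if __name__ == "__main__":
    # Hypothetical property contents. On a live system the raw bytes would come
    # from the device tree (exact path not confirmed here).
    raw = struct.pack(">II", 0x12345678, 31)
    addr, value = xstop_trigger(raw)
    print(f"write 0x{value:016x} to SCOM 0x{addr:x}")
```

Keeping the parsing separate from the write lets the bit arithmetic be checked offline before anything touches real hardware. A wrong bit here would raise a different FIR bit than intended.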

Observations:

  • On Zaius, I see that panic + reboot after HMI failure works fine. But on one of the Witherspoon systems I have seen hangs in ipmi_msg_sync while dumping the dmesg buffer to nvram (pnv_platform_error_reboot->panic_flush_kmsg_end->kmsg_dump->pstore_dump->OPAL..calls..->ipmi_queue_msg_sync). I am investigating further to understand why we don't hit an IPMI timeout, which would get the system out of the hang.

  • On manual OPAL TI, I see the following messages:
    3.24326|secure|SecureROM valid - enabling functionality
    4.57365|IPMI: shutdown requested

    I need to try this on a few other systems with the latest PNOR.

NOTE: The above tests verify that the system reboots successfully after a panic or OPAL TI; otherwise the test fails with an appropriate error message.

Each test can be run independently with the options below:
--run testcases.OpTestHMIHandling.OpalTI
--run testcases.OpTestHMIHandling.TodTopologyFailoverOpalTI
--run testcases.OpTestHMIHandling.TodTopologyFailoverPanic

Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
@maheshsal

maheshsal commented Mar 22, 2019

The hang mentioned above on Witherspoon is now fixed by the skiboot patch at http://patchwork.ozlabs.org/patch/1061289/

@hegdevasant

Can you please rebase this PR?

-Vasant

@PraveenPenguin force-pushed the master branch 2 times, most recently from 4d0cb14 to b976629 (October 6, 2023 07:38)