[Bug] Race condition in GatewayConnectionManager #405

@Puff-Machine

Description

Pre-submission checklist

  • I checked that no other Bug Report describing my problem exists.
  • I am running the latest stable or prerelease firmware.
  • My board is on the list of maintained boards.
  • I accept that this issue may be closed if any of the above are found to be untrue.

Board

Wemos Lolin S3

Firmware version

develop @ 96c373f

Flashing method

pio run -e Wemos-Lolin-S3 -t upload

Describe what happened as precisely as possible.

evh_wiFiDisconnected can set s_wsClient = nullptr while GatewayConnectionManager::Update is still running, presumably on the other core. With the "correct" timing, this causes the following panic:

[1482155][D][GatewayConnectionManager.cpp:291] Update(): [GatewayConnectionManager] Successfully verified auth token
[1482166][D][GatewayClient.cpp:23] GatewayClient(): [GatewayClient] Creating GatewayClient
[1484084][D][HTTPRequestManager.cpp:272] _readStreamDataChunked(): [HTTPRequestManager] Chunk parsed: 4 81
[1484093][D][HTTPRequestManager.cpp:272] _readStreamDataChunked(): [HTTPRequestManager] Chunk parsed: 3 0
[1484105][I][GatewayConnectionManager.cpp:265] StartConnectingToLCG(): [GatewayConnectionManager] Connecting to LCG endpoint { host: 'de2-gateway.openshock.app', port: 443, path: '/2/ws/hub' } in country DE
[1484123][W][GatewayClient.cpp:63] connect(): [GatewayClient] WEBSOCKET CONNECTION BY RFC DEFINITION IS INSECURE, remote endpoint can not be verified due to lack of CA verification support, theoretically this is a security risk and allows for MITM attacks, but the realistic risk is low
[1509096][D][OtaUpdateManager.cpp:266] otaum_upda[1509097][W][WiFiGeneric.cpp:1062] _eventCallback(): Reason: 200 - BEACON_TIMEOUT
[1509108][I][WiFiManager.cppquestManager.cpp:272] _readStreaSerial read error
:253] _evWiFiDisconnected(): [WiFiManager] Disconnected from network REDACTED
[1509120][D][GatewayClient.cpp:37] ~GatewayClient(): [GatewayClient] Destroying GatewayClient
[1509129][D][GatewayConnectionManager.cpp:56] evh_wiFiDisconnected(): [GatewayConnectionManager] Lost IP address
tetask(): [OtaUpdateManager] WiFi disconnected
[1509141][E][ssl_client.cpp:37] _handle_error(): [start_ssl_client():273]: (-28928) SSL - Bad input parameters to function
[1509154][E][WiFiClientSecure.cpp:144] connect(): start_ssl_client: -28928
Guru Meditation Error: Core  1 panic'ed (InstrFetchProhibited). Exception was unhandled.

Core  1 register dump:
PC      : 0x00000000  PS      : 0x00060e30  A0      : 0x820e9cac  A1      : 0x3fcb2a50
A2      : 0x3fcea9d4  A3      : 0x00000000  A4      : 0x000001bb  A5      : 0x3fceaae8
A6      : 0x00000000  A7      : 0x00170722  A8      : 0x820e9c40  A9      : 0x3fcb29d0
A10     : 0x3fcea9d4  A11     : 0x00000004  A12     : 0x3c13d208  A13     : 0x00000090
A14     : 0x3c13d258  A15     : 0xffff8f00  SAR     : 0x00000019  EXCCAUSE: 0x00000014
EXCVADDR: 0x00000000  LBEG    : 0x400556d5  LEND    : 0x400556e5  LCOUNT  : 0xfffffffd


Backtrace: 0xfffffffd:0x3fcb2a50 0x420e9ca9:0x3fcb2a90 0x420e9d76:0x3fcb2ad0 0x42106ca1:0x3fcb2af0 0x4205932a:0x3fcb2b10 0x42004a76:0x3fcb2b30 0x42005d52:0x3fcb2b50 0x42022343:0x3fcb2bc0




ELF file SHA256: 2398a115a71d0577
$ ~/.platformio/packages/toolchain-xtensa-esp32/bin/xtensa-esp32-elf-addr2line -pfiaC -e .pio/build/Wemos-Lolin-S3/app.elf 0x420e9ca9 0x420e9d76 0x42106ca1 0x4205932a 0x42004a76 0x42005d52 0x42022343
0x420e9ca9: WiFiClientSecure::connect(char const*, unsigned short, char const*, char const*, char const*) at /home/user/.platformio/packages/framework-arduinoespressif32/libraries/WiFiClientSecure/src/WiFiClientSecure.cpp:136
0x420e9d76: WiFiClientSecure::connect(char const*, unsigned short) at /home/user/.platformio/packages/framework-arduinoespressif32/libraries/WiFiClientSecure/src/WiFiClientSecure.cpp:117
0x42106ca1: WiFiClientSecure::connect(char const*, unsigned short, int) at /home/user/.platformio/packages/framework-arduinoespressif32/libraries/WiFiClientSecure/src/WiFiClientSecure.cpp:122
0x4205932a: WebSocketsClient::loop() at /home/user/Documents/Firmware/.pio/libdeps/Wemos-Lolin-S3/WebSockets/src/WebSocketsClient.cpp:317
0x42004a76: OpenShock::GatewayClient::loop() at /home/user/Documents/Firmware/src/GatewayClient.cpp:99
0x42005d52: OpenShock::GatewayConnectionManager::Update() at /home/user/Documents/Firmware/src/GatewayConnectionManager.cpp:296
0x42022343: main_app(void*) at /home/user/Documents/Firmware/src/main.cpp:117 (discriminator 1)

Describe what you expected to happen instead.

The firmware should not panic.
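
One possible way to avoid this kind of race is to guard the shared client pointer with a mutex and have Update take its own reference before calling into the client. The following is a minimal, host-compilable sketch of that idea, not the firmware's actual code: FakeGatewayClient, ConnectionManagerSketch, Connect and OnWiFiDisconnected are hypothetical names, and it assumes the client could be held in a std::shared_ptr (how s_wsClient is really declared is not shown in this report). On the device, a FreeRTOS semaphore might be preferred over std::mutex.

```cpp
#include <memory>
#include <mutex>

// Hypothetical stand-in for the real GatewayClient.
struct FakeGatewayClient {
  int loops = 0;
  void loop() { ++loops; }  // the real loop() may block while (re)connecting
};

// Hypothetical sketch of a manager whose client pointer is shared between
// the main loop (Update) and a WiFi event handler (OnWiFiDisconnected).
class ConnectionManagerSketch {
public:
  // Creates a new client, replacing any existing one.
  void Connect() {
    std::lock_guard<std::mutex> lock(m_mutex);
    m_client = std::make_shared<FakeGatewayClient>();
  }

  // Would be called from the WiFi event handler, possibly on another core.
  void OnWiFiDisconnected() {
    std::lock_guard<std::mutex> lock(m_mutex);
    m_client.reset();  // drops the manager's reference only
  }

  // Would be called from the main loop. Returns true if a client was serviced.
  bool Update() {
    std::shared_ptr<FakeGatewayClient> client;
    {
      // Hold the lock only long enough to copy the pointer.
      std::lock_guard<std::mutex> lock(m_mutex);
      client = m_client;
    }
    if (!client) {
      return false;
    }
    // The local reference keeps the client alive even if the event
    // handler resets m_client while loop() is still running.
    client->loop();
    return true;
  }

private:
  std::mutex m_mutex;
  std::shared_ptr<FakeGatewayClient> m_client;
};
```

Copying the pointer under the lock, instead of holding the lock across loop(), keeps the event handler from blocking for the duration of a slow connect attempt. The trade-off is that the client may then be destroyed on the main-loop task rather than in the event handler.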

In as much detail as possible, describe the exact steps you took to make the problem appear.

Have an unstable connection and get (un?)lucky with the timing.

Other remarks

No response

Labels

status: triage (Awaiting investigation by maintainers)
type: bug (Something isn't working)
