Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

flaky unittest: bridge_reconnects_successfully_after_local_connection_interrupted #3021

Open
reubenmiller opened this issue Jul 29, 2024 · 1 comment
Labels
bug Something isn't working

Comments

@reubenmiller
Copy link
Contributor

reubenmiller commented Jul 29, 2024

Describe the bug

Flaky unit tests:

Though a workaround has been added in a PR, which adds a sleep before interrupting the connection.

Fail

Below shows an example when the test fails.

    Finished `test` profile [unoptimized + debuginfo] target(s) in 0.50s
    Starting 1 test across 2 binaries (52 skipped; run ID: 22061f7c-17a5-42de-a646-e3e1ac124df8, nextest profile: default)
       START             tedge_mqtt_bridge::bridge bridge_reconnects_successfully_after_local_connection_interrupted

running 1 test
TEST: local_broker_port: 59103
TEST: cloud_broker_port: 59104
[2024-07-29T11:17:56Z INFO  tedge_mqtt_bridge::health] MQTT bridge connected to cloud broker
[2024-07-29T11:17:56Z INFO  tedge_mqtt_bridge] Bridge cloud connection "cloud" subscribing to [Filter = s/ds, Qos = AtLeastOnce]
[2024-07-29T11:17:56Z INFO  tedge_mqtt_bridge::health] MQTT bridge connected to local broker
[2024-07-29T11:17:56Z INFO  tedge_mqtt_bridge] Bridge cloud connection "local" subscribing to [Filter = c8y/s/us, Qos = AtLeastOnce]
[2024-07-29T11:17:56Z ERROR tedge_mqtt_bridge::health] MQTT bridge failed to connect to local broker: Mqtt state: Io error: Custom { kind: ConnectionAborted, error: "connection closed by peer" }
[2024-07-29T11:17:56Z ERROR tedge_mqtt_bridge::health] MQTT bridge failed to connect to local broker: I/O: connection closed by peer
[2024-07-29T11:17:56Z INFO  tedge_mqtt_bridge::health] MQTT bridge connected to local broker
[2024-07-29T11:17:56Z INFO  tedge_mqtt_bridge] Bridge cloud connection "local" subscribing to [Filter = c8y/s/us, Qos = AtLeastOnce]
thread 'bridge_reconnects_successfully_after_local_connection_interrupted' panicked at crates/extensions/tedge_mqtt_bridge/tests/bridge.rs:241:58:
called `Result::unwrap()` on an `Err` value: timed-out waiting for received message

Caused by:
    deadline has elapsed
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
test bridge_reconnects_successfully_after_local_connection_interrupted ... FAILED

failures:

failures:
    bridge_reconnects_successfully_after_local_connection_interrupted

test result: FAILED. 0 passed; 1 failed; 0 ignored; 0 measured; 3 filtered out; finished in 5.06s

        FAIL [   5.064s] tedge_mqtt_bridge::bridge bridge_reconnects_successfully_after_local_connection_interrupted
   Canceling due to test failure
------------
     Summary [   5.065s] 1 test run: 0 passed, 1 failed, 52 skipped
        FAIL [   5.064s] tedge_mqtt_bridge::bridge bridge_reconnects_successfully_after_local_connection_interrupted
error: test run failed

Pass

Below shows an example when the test passes.

    Finished `test` profile [unoptimized + debuginfo] target(s) in 0.47s
    Starting 1 test across 2 binaries (52 skipped; run ID: 8c895c93-30fb-4ea6-98c1-2dc2cd7d9299, nextest profile: default)
       START             tedge_mqtt_bridge::bridge bridge_reconnects_successfully_after_local_connection_interrupted

running 1 test
TEST: local_broker_port: 59091
TEST: cloud_broker_port: 59092
[2024-07-29T11:17:55Z INFO  tedge_mqtt_bridge::health] MQTT bridge connected to cloud broker
[2024-07-29T11:17:55Z INFO  tedge_mqtt_bridge] Bridge cloud connection "cloud" subscribing to [Filter = s/ds, Qos = AtLeastOnce]
[2024-07-29T11:17:55Z INFO  tedge_mqtt_bridge::health] MQTT bridge connected to local broker
[2024-07-29T11:17:55Z INFO  tedge_mqtt_bridge] Bridge cloud connection "local" subscribing to [Filter = c8y/s/us, Qos = AtLeastOnce]
[2024-07-29T11:17:55Z ERROR tedge_mqtt_bridge::health] MQTT bridge failed to connect to local broker: Mqtt state: Io error: Custom { kind: ConnectionAborted, error: "connection closed by peer" }
[2024-07-29T11:17:55Z ERROR tedge_mqtt_bridge::health] MQTT bridge failed to connect to local broker: I/O: Connection reset by peer (os error 54)
[2024-07-29T11:17:55Z INFO  tedge_mqtt_bridge::health] MQTT bridge connected to local broker
[2024-07-29T11:17:55Z INFO  tedge_mqtt_bridge] Bridge cloud connection "local" subscribing to [Filter = c8y/s/us, Qos = AtLeastOnce]
test bridge_reconnects_successfully_after_local_connection_interrupted ... ok

test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 3 filtered out; finished in 0.05s

        PASS [   0.056s] tedge_mqtt_bridge::bridge bridge_reconnects_successfully_after_local_connection_interrupted
------------
     Summary [   0.056s] 1 test run: 1 passed, 52 skipped

Notes

  • Use a fixed ports does not seem to help (so that should rule out a race condition between getting a "free port" and using it)

To Reproduce

A flaky test can be reproduced by re-running the tests until it fails. Sometimes it takes more than ~30 attempts

RUNS=0; while cargo nextest run -p tedge_mqtt_bridge "bridge_reconnects_successfully_after_local_connection_interrupted" --nocapture; do RUNS=$((RUNS+1)) ; echo repeating test run $RUNS; done; echo "Failed after $RUNS runs"

Expected behavior

The test should not fail sporadically.

Screenshots

Environment (please complete the following information):

  • OS [incl. version]: MacOS and Ubuntu 22.04
  • Hardware [incl. revision]: Macbook Pro
  • System-Architecture [e.g. result of "uname -a"]: aarch64 and x86_64
  • thin-edge.io version [e.g. 0.1.0]: 1.1.2~306+gfae4a56

Additional context

@reubenmiller
Copy link
Contributor Author

#3027 introduced running the Rust tests using cargo-nextest which allows marking specific tests as flaky so they are automatically retried on failure, as this will hopefully be a reliable workaround (as the fixed sleep workaround also sporadically failed).

Below is the new console output showing the flakiness of the bridge_reconnects_successfully_after_local_connection_interrupted test:

Caused by:
    0: timed-out waiting for received message
    1: deadline has elapsed

   RETRY 2/5 [         ] tedge_mqtt_bridge::bridge bridge_reconnects_successfully_after_local_connection_interrupted
  TRY 2 PASS [   0.264s] tedge_mqtt_bridge::bridge bridge_reconnects_successfully_after_local_connection_interrupted
------------
     Summary [  31.118s] 1294 tests run: 1294 passed (1 flaky, 1 leaky), 3 skipped
   FLAKY 2/5 [   0.264s] tedge_mqtt_bridge::bridge bridge_reconnects_successfully_after_local_connection_interrupted
    Finished report saved to codecov.json

Reference CI Run:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

1 participant