Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

benchmark_udp: hammering with low interval causes issues #16808

Open
kaspar030 opened this issue Sep 3, 2021 · 8 comments · Fixed by #16831
Open

benchmark_udp: hammering with low interval causes issues #16808

kaspar030 opened this issue Sep 3, 2021 · 8 comments · Fixed by #16831
Labels
Type: bug The issue reports a bug / The PR fixes a bug (including spelling errors)

Comments

@kaspar030
Copy link
Contributor

kaspar030 commented Sep 3, 2021

Description

I'm trying benchmark_udp on an nrf52840dk over usbus_cdc_ecm.
I'm seeing some reliability issues with high load.

With down to -i 30, everything is fine.
From -i 29, the shell becomes unresponsive, but the node still pings and I can run more benchmark_udp (restart host tool).
From -i 14, the node seems to freeze. Fixed with #16831.

Steps to reproduce the issue

compile tests/usbus_cdc_ecm with benchmark_udp:

BOARD=nrf52840dk USEMODULE="benchmark_udp" make -Ctests/usbus_cdc_ecm clean all flash -j8

in riots shell:

bench_start <host-link-local-addr> 12345,

then on host:

./benchmark_server -i 14 :: 12345

Expected results

Shell and node stay responsive.

Actual results

Shell freeze or (seemingly) full node freeze.

Versions

Compiled on arch with gcc version 11.2.0.

@kaspar030 kaspar030 changed the title cpu/nrf52840: hammering with benchmark_udp causes issues benchmark_udp: hammering with low interval causes issues Sep 6, 2021
@kaspar030
Copy link
Contributor Author

I'm seeing similar issues on samr21-xpro. I think bench_udp sets too small delays at some point, causing the r/x thread to always return from sock_recv() with ETIMEOUT. But I'm not sure.

@kfessel
Copy link
Contributor

kfessel commented Sep 7, 2021

maybe this is a misbehavior of xtimer_set in gnrc_sock_recv (or it is holding it wrong), if the timeout is short it might be smaller than XTIMER_BACKOFF this will lead to xtimer_spin which in turn will not give way for the recv to proccede and will therfor fire before any further processing could be done in gnrc_sock_recv

@kfessel
Copy link
Contributor

kfessel commented Sep 9, 2021

@kaspar030: maybe you can retest with #16831

@kaspar030
Copy link
Contributor Author

@kaspar030: maybe you can retest with #16831

That PR fixes the second issue (seemingly freezing node)! Thanks.

The shell stays unresponsive, though.
@benpicco probably the _listen_thread is still calling sock_udp_recv() too fast. Is the timeout actually necessary for anything else but evaluating the "running" variable? Maybe something fixed (like, 100ms) would do fine?

@kfessel
Copy link
Contributor

kfessel commented Sep 9, 2021

i think there is still part of this open,
but it might be that the rest is just not enough priority for the shell thread

@kfessel kfessel reopened this Sep 9, 2021
@kfessel
Copy link
Contributor

kfessel commented Sep 10, 2021

the rest seems to be a priority + xtimer_usleep(delay_us) problem in _send_thread in sys/test_utils/benchmark_udp/benchmark_udp.c:137

@kfessel
Copy link
Contributor

kfessel commented Sep 10, 2021

If the delay is less than XTIMER_BACKOFF the _send_thread through xtimer_delay -> will spin and take every cpu cycle from the lower priority shell

Decreasing the priority of the send_thread lower than shell will (at least on native) seem to let the shell take all the cycles as getchar does not switch the thread and is blocking RIOT #16834

@stale
Copy link

stale bot commented Apr 17, 2022

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. If you want me to ignore this issue, please mark it with the "State: don't stale" label. Thank you for your contributions.

@stale stale bot added the State: stale State: The issue / PR has no activity for >185 days label Apr 17, 2022
@maribu maribu added the Type: bug The issue reports a bug / The PR fixes a bug (including spelling errors) label May 22, 2023
@stale stale bot removed the State: stale State: The issue / PR has no activity for >185 days label May 22, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Type: bug The issue reports a bug / The PR fixes a bug (including spelling errors)
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants