Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix panicking after starting UDP server due to closed "halt" channel #595

Conversation

josecelano
Copy link
Member

@josecelano josecelano commented Jan 11, 2024

Relates to: #594

Context

We use a oneshot channel between the main app that launches the UDP server and the running UDP server. It could be used to send the "halt" message to the UDP server. When the app starts the server this server waits for that signal.

That code is not working because when the server starts waiting the channel is closed. That does not affect the server that is still running. And currently, we are not using that signal in production (only for test environments).

Details

The "halt" channel is immediately closed after starting the UDP tracker. So the app panics with:

thread 'tokio-runtime-worker' panicked at src/servers/signals.rs:52:25:
Failed to install stop signal: channel closed

The UDP is started but it's not possible to shut it down gracefully.

I've added a lot of debug lines in this PR:

$ cargo run
   Compiling torrust-tracker v3.0.0-alpha.12-develop (/home/josecelano/Documents/git/committer/me/github/torrust/torrust-tracker)
    Finished dev [optimized + debuginfo] target(s) in 7.05s
     Running `target/debug/torrust-tracker`
Loading default configuration file: `./share/default/config/tracker.development.sqlite3.toml` ...
2024-01-10T17:46:40.348930411+00:00 [torrust_tracker::bootstrap::logging][INFO] logging initialized.
2024-01-10T17:46:40.349651636+00:00 [UDP Tracker][DEBUG] Launcher starting ...
2024-01-10T17:46:40.349677556+00:00 [UDP Tracker][INFO] Starting on: udp://0.0.0.0:6969
2024-01-10T17:46:40.349682046+00:00 [UDP Tracker][INFO] Started on: udp://0.0.0.0:6969
2024-01-10T17:46:40.349684556+00:00 [UDP Tracker][DEBUG] Launcher finished ...
2024-01-10T17:46:40.349687856+00:00 [UDP Tracker][DEBUG] Waiting for packets ...
2024-01-10T17:46:40.349687736+00:00 [torrust_tracker::bootstrap::jobs::http_tracker][INFO] Note: Not loading Http Tracker Service, Not Enabled in Configuration.
2024-01-10T17:46:40.349691306+00:00 [UDP Tracker][DEBUG] Wait for launcher (UDP service) to finish ...
2024-01-10T17:46:40.349689536+00:00 [UDP Tracker][DEBUG] Server on socket address: udp://0.0.0.0:6969 waiting for halt signal ...
2024-01-10T17:46:40.349700506+00:00 [torrust_tracker::bootstrap::jobs::health_check_api][INFO] Starting Health Check API server: http://127.0.0.1:1313
thread 'tokio-runtime-worker' panicked at src/servers/signals.rs:52:25:
Failed to install stop signal: channel closed
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
2024-01-10T17:46:40.349793665+00:00 [torrust_tracker::bootstrap::jobs::health_check_api][INFO] Torrust Health Check API server started on: http://127.0.0.1:1313
2024-01-10T17:46:40.349793225+00:00 [UDP Tracker][DEBUG] Halt signal spawned task stopped on address: udp://0.0.0.0:6969

The function panicking is:

pub async fn shutdown_signal(rx_halt: tokio::sync::oneshot::Receiver<Halted>) {
    let halt = async {
        match rx_halt.await {
            Ok(signal) => signal,
            Err(err) => panic!("Failed to install stop signal: {err}"),
        }
    };

    tokio::select! {
        signal = halt => { info!("Halt signal processed: {}", signal) },
        () = global_shutdown_signal() => { info!("Global shutdown signal processed") }
    }
}

When the thread starts waiting for the signal it receives the error "channel closed".

I fixed a similar problem for the Tracker API and the HTTP Tracker:

Apparently the problem was the sender in the channel was dropped (I don't know why, it looks like a compiler optimization).

In the end, I think the problem was we were not waiting for the right spawned task. I've reorganized the code but the change that fixed the problem is:

        let task = tokio::spawn(async move {
            debug!(target: "UDP Tracker", "Launcher starting ...");

            let starting = launcher.start(tracker, tx_start, rx_halt).await;

            starting.await.expect("UDP server should have started running");

            launcher
        });

We needed to wait for the launcher to finish because in this case, the launcher is not the thread that runs the final server. So we were only waiting until the launcher was "launched" but not for the final task running the UDP server. That is the reason why the channel was closed prematurely.

I have also normalized the log output:

$ cargo run
    Finished dev [optimized + debuginfo] target(s) in 0.07s
     Running `target/debug/torrust-tracker`
Loading default configuration file: `./share/default/config/tracker.development.sqlite3.toml` ...
2024-01-11T17:12:11.134816964+00:00 [torrust_tracker::bootstrap::logging][INFO] logging initialized.
2024-01-11T17:12:11.135473883+00:00 [UDP Tracker][INFO] Starting on: udp://0.0.0.0:6969
2024-01-11T17:12:11.135494422+00:00 [UDP Tracker][INFO] Started on: udp://0.0.0.0:6969
2024-01-11T17:12:11.135503672+00:00 [torrust_tracker::bootstrap::jobs][INFO] TLS not enabled
2024-01-11T17:12:11.135587738+00:00 [HTTP Tracker][INFO] Starting on: http://0.0.0.0:7070
2024-01-11T17:12:11.135612497+00:00 [HTTP Tracker][INFO] Started on: http://0.0.0.0:7070
2024-01-11T17:12:11.135619586+00:00 [torrust_tracker::bootstrap::jobs][INFO] TLS not enabled
2024-01-11T17:12:11.135675454+00:00 [API][INFO] Starting on http://127.0.0.1:1212
2024-01-11T17:12:11.135688443+00:00 [API][INFO] Started on http://127.0.0.1:1212
2024-01-11T17:12:11.135701143+00:00 [Health Check API][INFO] Starting on: http://127.0.0.1:1313
2024-01-11T17:12:11.135718012+00:00 [Health Check API][INFO] Started on: http://127.0.0.1:1313
^C2024-01-11T17:12:17.463967431+00:00 [torrust_tracker][INFO] Torrust shutting down..
2024-01-11T17:12:17.463960892+00:00 [torrust_tracker::servers::signals][INFO] Global shutdown signal processed
2024-01-11T17:12:17.464011399+00:00 [torrust_tracker::servers::signals][INFO] Shutting down HTTP server on socket address: 0.0.0.0:7070
2024-01-11T17:12:17.464015379+00:00 [torrust_tracker::servers::signals][INFO] Sending graceful shutdown signal
!! shuting down in 90 seconds !!
2024-01-11T17:12:17.464035618+00:00 [torrust_tracker::servers::signals][INFO] Global shutdown signal processed
2024-01-11T17:12:17.464055757+00:00 [torrust_tracker::servers::health_check_api::server][INFO] Stopping Torrust Health Check API server o http://127.0.0.1:1313 ...
2024-01-11T17:12:17.464064287+00:00 [torrust_tracker::servers::signals][INFO] Global shutdown signal processed
2024-01-11T17:12:17.464088366+00:00 [torrust_tracker::servers::signals][INFO] Shutting down UDP server on socket address: udp://0.0.0.0:6969
2024-01-11T17:12:17.464067327+00:00 [torrust_tracker::servers::signals][INFO] Shutting down tracker API server on socket address: 127.0.0.1:1212
2024-01-11T17:12:17.464097095+00:00 [torrust_tracker::servers::signals][INFO] Sending graceful shutdown signal
!! shuting down in 90 seconds !!
2024-01-11T17:12:17.464105665+00:00 [Health Check API][INFO] Stopped server running on: http://127.0.0.1:1313
2024-01-11T17:12:17.464093226+00:00 [torrust_tracker::bootstrap::jobs::torrent_cleanup][INFO] Stopping torrent cleanup job..
2024-01-11T17:12:17.464154493+00:00 [torrust_tracker][INFO] Torrust successfully shutdown.

I'm using targets for some reasons:

  1. Easily identify service in logs
  2. Easily filter log by service
  3. Autodiscover services from logs

Copy link

codecov bot commented Jan 11, 2024

Codecov Report

Attention: 20 lines in your changes are missing coverage. Please review.

Comparison is base (49c961c) 80.64% compared to head (5fd0c84) 80.57%.

Files Patch % Lines
src/servers/udp/server.rs 69.23% 12 Missing ⚠️
src/bootstrap/jobs/udp_tracker.rs 0.00% 5 Missing ⚠️
src/bootstrap/jobs/health_check_api.rs 0.00% 3 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##           develop     #595      +/-   ##
===========================================
- Coverage    80.64%   80.57%   -0.08%     
===========================================
  Files          117      117              
  Lines         7831     7854      +23     
===========================================
+ Hits          6315     6328      +13     
- Misses        1516     1526      +10     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Added targets to all services especially when they start: [HTTP
Tracker], [UDP Tracker], etc.

```
Loading default configuration file: `./share/default/config/tracker.development.sqlite3.toml` ...
2024-01-11T17:12:11.134816964+00:00 [torrust_tracker::bootstrap::logging][INFO] logging initialized.
2024-01-11T17:12:11.135473883+00:00 [UDP Tracker][INFO] Starting on: udp://0.0.0.0:6969
2024-01-11T17:12:11.135494422+00:00 [UDP Tracker][INFO] Started on: udp://0.0.0.0:6969
2024-01-11T17:12:11.135503672+00:00 [torrust_tracker::bootstrap::jobs][INFO] TLS not enabled
2024-01-11T17:12:11.135587738+00:00 [HTTP Tracker][INFO] Starting on: http://0.0.0.0:7070
2024-01-11T17:12:11.135612497+00:00 [HTTP Tracker][INFO] Started on: http://0.0.0.0:7070
2024-01-11T17:12:11.135619586+00:00 [torrust_tracker::bootstrap::jobs][INFO] TLS not enabled
2024-01-11T17:12:11.135675454+00:00 [API][INFO] Starting on http://127.0.0.1:1212
2024-01-11T17:12:11.135688443+00:00 [API][INFO] Started on http://127.0.0.1:1212
2024-01-11T17:12:11.135701143+00:00 [Health Check API][INFO] Starting on: http://127.0.0.1:1313
2024-01-11T17:12:11.135718012+00:00 [Health Check API][INFO] Started on: http://127.0.0.1:1313
```
@josecelano
Copy link
Member Author

ACK 5fd0c84

@josecelano josecelano marked this pull request as ready for review January 11, 2024 17:19
@josecelano josecelano merged commit 3dd1402 into torrust:develop Jan 11, 2024
11 of 12 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

Error starting UDP tracker: Failed to install stop signal: channel closed
1 participant