Merge #395: Implement health check for the Index container
0e8b32b ci: [#254] add container healthcheck for index service (Jose Celano)

Pull request description:

  After [discussing how to implement a proper container `HEALTHCHECK` for the Index](#254), this final implementation:

  - Doesn't use `curl`, `wget`, or any other system dependency, in order to [avoid maintaining that dependency and introducing new attack vectors](https://blog.sixeyed.com/docker-healthchecks-why-not-to-use-curl-or-iwr/). It uses a new binary that you can run like this: `cargo run --bin health_check http://localhost:PORT/health_check`.
  - Adds an API to the tracker statistics importer so that other processes can communicate with the cronjob.
  - Exposes a `/health_check` endpoint in both APIs: the Index API and the Importer API (console cronjob).
  - The endpoint for the Index core API is public, but the one for the cronjob is only bound to localhost.

  You can run the `health_check` from inside the container with the following commands:

  For the API:

  ```console
  / # /usr/bin/health_check http://localhost:3001/health_check
  Health check ...
  STATUS: 200 OK
  ```

  For the Importer:

  ```console
  / # /usr/bin/health_check http://localhost:3002/health_check
  Health check ...
  STATUS: 200 OK
  ```

  ### NOTES

  - The Index application has two services: the REST API and the tracker statistics importer. The healthcheck should check both, and the API service should not need to know about the importer.
  - For the time being, we do not check the services thoroughly; we only ensure that they are running. In the future, we could add more checks to the `health_check` handlers, such as the database connection, the SMTP server, or the connection to the tracker (see the sketch after this list). However, I'm not sure we should check those things here: this healthcheck only ensures that the container's processes are up and running, and it may not be the right place to verify dependencies, which can fail at runtime anyway. A better approach might be to monitor each service independently.
  - The new `/health_check` endpoint could be used for monitoring.
  - Right now, it's only used to wait for the container to become `healthy` after `docker compose up`.
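
  A minimal, self-contained sketch of what a richer check could look like (this is not the handler added in this PR): an axum endpoint that also verifies one dependency before reporting `Ok`. It assumes axum 0.6 and tokio, as used elsewhere in this PR; the `check_database` helper, the bind address, and the response shape are illustrative assumptions.

  ```rust
  use axum::routing::get;
  use axum::{Json, Router};
  use serde_json::{json, Value};

  /// Hypothetical dependency check. A real implementation might run a
  /// `SELECT 1` against the connection pool or ping the tracker API.
  async fn check_database() -> bool {
      true
  }

  /// Extended health check: reports per-dependency status instead of only
  /// confirming that the process is up.
  async fn health_check_handler() -> Json<Value> {
      let database_status = if check_database().await { "Ok" } else { "Error" };

      Json(json!({
          "status": database_status,
          "checks": {
              "database": database_status
          }
      }))
  }

  #[tokio::main]
  async fn main() {
      let app = Router::new().route("/health_check", get(health_check_handler));

      // Hypothetical address; the real services take their host and port
      // from the application configuration.
      axum::Server::bind(&"127.0.0.1:3001".parse().unwrap())
          .serve(app.into_make_service())
          .await
          .unwrap();
  }
  ```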

ACKs for top commit:
  josecelano:
    ACK 0e8b32b

Tree-SHA512: cde75ae0f9901896e55db660a782b3b2d6030e5c09a9384dfadb5f8dee0ed650db7c842ce1e152b400b2ae8cc87047840d20e00facd700538d320c5bfa217d6c
josecelano committed Nov 20, 2023
2 parents 04b70ba + 0e8b32b commit fa3bba7
Showing 16 changed files with 220 additions and 34 deletions.
9 changes: 7 additions & 2 deletions Containerfile
@@ -85,7 +85,9 @@ COPY --from=build \
RUN cargo nextest run --workspace-remap /test/src/ --extract-to /test/src/ --no-run --archive-file /test/torrust-index.tar.zst
RUN cargo nextest run --workspace-remap /test/src/ --target-dir-remap /test/src/target/ --cargo-metadata /test/src/target/nextest/cargo-metadata.json --binaries-metadata /test/src/target/nextest/binaries-metadata.json

RUN mkdir -p /app/bin/; cp -l /test/src/target/release/torrust-index /app/bin/torrust-index
RUN mkdir -p /app/bin/; \
cp -l /test/src/target/release/torrust-index /app/bin/torrust-index; \
cp -l /test/src/target/release/health_check /app/bin/health_check;
# RUN mkdir -p /app/lib/; cp -l $(realpath $(ldd /app/bin/torrust-index | grep "libz\.so\.1" | awk '{print $3}')) /app/lib/libz.so.1
RUN chown -R root:root /app; chmod -R u=rw,go=r,a+X /app; chmod -R a+x /app/bin

@@ -99,11 +101,13 @@ ARG TORRUST_INDEX_PATH_CONFIG="/etc/torrust/index/index.toml"
ARG TORRUST_INDEX_DATABASE_DRIVER="sqlite3"
ARG USER_ID=1000
ARG API_PORT=3001
ARG IMPORTER_API_PORT=3002

ENV TORRUST_INDEX_PATH_CONFIG=${TORRUST_INDEX_PATH_CONFIG}
ENV TORRUST_INDEX_DATABASE_DRIVER=${TORRUST_INDEX_DATABASE_DRIVER}
ENV USER_ID=${USER_ID}
ENV API_PORT=${API_PORT}
ENV IMPORTER_API_PORT=${IMPORTER_API_PORT}
ENV TZ=Etc/UTC

EXPOSE ${API_PORT}/tcp
@@ -130,5 +134,6 @@ CMD ["sh"]
FROM runtime as release
ENV RUNTIME="release"
COPY --from=test /app/ /usr/
# HEALTHCHECK CMD ["/usr/bin/wget", "--no-verbose", "--tries=1", "--spider", "localhost:${API_PORT}/version"]
HEALTHCHECK --interval=5s --timeout=5s --start-period=3s --retries=3 \
CMD /usr/bin/health_check http://localhost:${API_PORT}/health_check && /usr/bin/health_check http://localhost:${IMPORTER_API_PORT}/health_check || exit 1
CMD ["/usr/bin/torrust-index"]
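
With this `HEALTHCHECK` in place, the health status reported by the container can be observed (and waited on) from the host. A quick illustration, assuming a recent Docker Compose (for `--wait`) and the container name used by the e2e scripts below:

```console
$ docker compose up -d --wait    # --wait blocks until services report a healthy status
$ docker inspect --format '{{.State.Health.Status}}' torrust-index-1
healthy
```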
8 changes: 4 additions & 4 deletions contrib/dev-tools/container/e2e/mysql/run-e2e-tests.sh
@@ -25,10 +25,10 @@ echo "Running E2E tests using MySQL ..."
./contrib/dev-tools/container/e2e/mysql/e2e-env-up.sh || exit 1

# Wait for containers to be healthy
./contrib/dev-tools/container/functions/wait_for_container_to_be_healthy.sh torrust-mysql-1 10 3
# todo: implement healthchecks for tracker and index and wait until they are healthy
#wait_for_container torrust-tracker-1 10 3
#wait_for_container torrust-idx-back-1 10 3
./contrib/dev-tools/container/functions/wait_for_container_to_be_healthy.sh torrust-mysql-1 10 3 || exit 1
# todo: implement healthchecks for the tracker and wait until it's healthy
#./contrib/dev-tools/container/functions/wait_for_container_to_be_healthy.sh torrust-tracker-1 10 3
./contrib/dev-tools/container/functions/wait_for_container_to_be_healthy.sh torrust-index-1 10 3 || exit 1
sleep 20s

# Just to make sure that everything is up and running
8 changes: 4 additions & 4 deletions contrib/dev-tools/container/e2e/sqlite/run-e2e-tests.sh
@@ -26,10 +26,10 @@ echo "Running E2E tests using SQLite ..."
./contrib/dev-tools/container/e2e/sqlite/e2e-env-up.sh || exit 1

# Wait for containers to be healthy
./contrib/dev-tools/container/functions/wait_for_container_to_be_healthy.sh torrust-mysql-1 10 3
# todo: implement healthchecks for tracker and index and wait until they are healthy
#wait_for_container torrust-tracker-1 10 3
#wait_for_container torrust-idx-back-1 10 3
./contrib/dev-tools/container/functions/wait_for_container_to_be_healthy.sh torrust-mysql-1 10 3 || exit 1
# todo: implement healthchecks for the tracker and wait until it's healthy
#./contrib/dev-tools/container/functions/wait_for_container_to_be_healthy.sh torrust-tracker-1 10 3
./contrib/dev-tools/container/functions/wait_for_container_to_be_healthy.sh torrust-index-1 10 3 || exit 1
sleep 20s

# Just to make sure that everything is up and running
1 change: 1 addition & 0 deletions share/default/config/index.container.mysql.toml
@@ -48,3 +48,4 @@ max_torrent_page_size = 30

[tracker_statistics_importer]
torrent_info_update_interval = 3600
port = 3002
1 change: 1 addition & 0 deletions share/default/config/index.container.sqlite3.toml
@@ -48,3 +48,4 @@ max_torrent_page_size = 30

[tracker_statistics_importer]
torrent_info_update_interval = 3600
port = 3002
1 change: 1 addition & 0 deletions share/default/config/index.development.sqlite3.toml
@@ -44,3 +44,4 @@ max_torrent_page_size = 30

[tracker_statistics_importer]
torrent_info_update_interval = 3600
port = 3002
37 changes: 15 additions & 22 deletions src/app.rs
@@ -19,8 +19,8 @@ use crate::services::user::{self, DbBannedUserList, DbUserProfileRepository, DbU
use crate::services::{proxy, settings, torrent};
use crate::tracker::statistics_importer::StatisticsImporter;
use crate::web::api::v1::auth::Authentication;
use crate::web::api::{start, Version};
use crate::{mailer, tracker};
use crate::web::api::Version;
use crate::{console, mailer, tracker, web};

pub struct Running {
pub api_socket_addr: SocketAddr,
@@ -46,8 +46,12 @@ pub async fn run(configuration: Configuration, api_version: &Version) -> Running

let settings = configuration.settings.read().await;

// From [database] config
let database_connect_url = settings.database.connect_url.clone();
let torrent_info_update_interval = settings.tracker_statistics_importer.torrent_info_update_interval;
// From [importer] config
let importer_torrent_info_update_interval = settings.tracker_statistics_importer.torrent_info_update_interval;
let importer_port = settings.tracker_statistics_importer.port;
// From [net] config
let net_ip = "0.0.0.0".to_string();
let net_port = settings.net.port;

@@ -153,29 +157,18 @@
ban_service,
));

// Start repeating task to import tracker torrent data and updating
// Start cronjob to import tracker torrent data and updating
// seeders and leechers info.

let weak_tracker_statistics_importer = Arc::downgrade(&tracker_statistics_importer);

let tracker_statistics_importer_handle = tokio::spawn(async move {
let interval = std::time::Duration::from_secs(torrent_info_update_interval);
let mut interval = tokio::time::interval(interval);
interval.tick().await; // first tick is immediate...
loop {
interval.tick().await;
if let Some(tracker) = weak_tracker_statistics_importer.upgrade() {
drop(tracker.import_all_torrents_statistics().await);
} else {
break;
}
}
});
let tracker_statistics_importer_handle = console::tracker_statistics_importer::start(
importer_port,
importer_torrent_info_update_interval,
&tracker_statistics_importer,
);

// Start API server
let running_api = web::api::start(app_data, &net_ip, net_port, api_version).await;

let running_api = start(app_data, &net_ip, net_port, api_version).await;

// Full running application
Running {
api_socket_addr: running_api.socket_addr,
api_server: running_api.api_server,
37 changes: 37 additions & 0 deletions src/bin/health_check.rs
@@ -0,0 +1,37 @@
//! Minimal `curl` or `wget` to be used for container health checks.
//!
//! It's convenient to avoid using third-party libraries because:
//!
//! - They are harder to maintain.
//! - They introduce new attack vectors.
use std::{env, process};

#[tokio::main]
async fn main() {
let args: Vec<String> = env::args().collect();
if args.len() != 2 {
eprintln!("Usage: cargo run --bin health_check <HEALTH_URL>");
eprintln!("Example: cargo run --bin health_check http://localhost:3002/health_check");
std::process::exit(1);
}

println!("Health check ...");

let url = &args[1].clone();

match reqwest::get(url).await {
Ok(response) => {
if response.status().is_success() {
println!("STATUS: {}", response.status());
process::exit(0);
} else {
println!("Non-success status received.");
process::exit(1);
}
}
Err(err) => {
println!("ERROR: {err}");
process::exit(1);
}
}
}
3 changes: 3 additions & 0 deletions src/config.rs
@@ -334,12 +334,15 @@ impl Default for Api {
pub struct TrackerStatisticsImporter {
/// The interval in seconds to get statistics from the tracker.
pub torrent_info_update_interval: u64,
/// The port the Importer API is listening on. Default to `3002`.
pub port: u16,
}

impl Default for TrackerStatisticsImporter {
fn default() -> Self {
Self {
torrent_info_update_interval: 3600,
port: 3002,
}
}
}
1 change: 1 addition & 0 deletions src/console/mod.rs
@@ -1 +1,2 @@
pub mod commands;
pub(crate) mod tracker_statistics_importer;
130 changes: 130 additions & 0 deletions src/console/tracker_statistics_importer.rs
@@ -0,0 +1,130 @@
//! Cronjob to import tracker torrent data and updating seeders and leechers
//! info.
//!
//! It has two services:
//!
//! - The importer which is the cronjob executed at regular intervals.
//! - The importer API.
//!
//! The cronjob sends a heartbeat signal to the API each time it is executed.
//! The last heartbeat signal time is used to determine whether the cronjob was
//! executed successfully or not. The API has a `health_check` endpoint which is
//! used when the application is running in containers.
use std::sync::{Arc, Mutex};

use axum::extract::State;
use axum::routing::{get, post};
use axum::{Json, Router};
use chrono::{DateTime, Utc};
use log::{error, info};
use serde_json::{json, Value};
use tokio::task::JoinHandle;

use crate::tracker::statistics_importer::StatisticsImporter;

const IMPORTER_API_IP: &str = "127.0.0.1";

#[derive(Clone)]
struct ImporterState {
/// Shared variable to store the timestamp of the last heartbeat sent
/// by the cronjob.
pub last_heartbeat: Arc<Mutex<DateTime<Utc>>>,
/// Interval between importation executions
pub torrent_info_update_interval: u64,
}

pub fn start(
importer_port: u16,
torrent_info_update_interval: u64,
tracker_statistics_importer: &Arc<StatisticsImporter>,
) -> JoinHandle<()> {
let weak_tracker_statistics_importer = Arc::downgrade(tracker_statistics_importer);

tokio::spawn(async move {
info!("Tracker statistics importer launcher started");

// Start the Importer API

let _importer_api_handle = tokio::spawn(async move {
let import_state = Arc::new(ImporterState {
last_heartbeat: Arc::new(Mutex::new(Utc::now())),
torrent_info_update_interval,
});

let app = Router::new()
.route("/", get(|| async { Json(json!({})) }))
.route("/health_check", get(health_check_handler))
.with_state(import_state.clone())
.route("/heartbeat", post(heartbeat_handler))
.with_state(import_state);

let addr = format!("{IMPORTER_API_IP}:{importer_port}");

info!("Tracker statistics importer API server listening on http://{}", addr);

axum::Server::bind(&addr.parse().unwrap())
.serve(app.into_make_service())
.await
.unwrap();
});

// Start the Importer cronjob

info!("Tracker statistics importer cronjob starting ...");

let interval = std::time::Duration::from_secs(torrent_info_update_interval);
let mut interval = tokio::time::interval(interval);

interval.tick().await; // first tick is immediate...

loop {
interval.tick().await;

info!("Running tracker statistics importer ...");

if let Err(e) = send_heartbeat(importer_port).await {
error!("Failed to send heartbeat from importer cronjob: {}", e);
}

if let Some(tracker) = weak_tracker_statistics_importer.upgrade() {
drop(tracker.import_all_torrents_statistics().await);
} else {
break;
}
}
})
}

/// Endpoint for container health check.
async fn health_check_handler(State(state): State<Arc<ImporterState>>) -> Json<Value> {
let margin_in_seconds = 10;
let now = Utc::now();
let last_heartbeat = state.last_heartbeat.lock().unwrap();

if now.signed_duration_since(*last_heartbeat).num_seconds()
<= (state.torrent_info_update_interval + margin_in_seconds).try_into().unwrap()
{
Json(json!({ "status": "Ok" }))
} else {
Json(json!({ "status": "Error" }))
}
}

/// The tracker statistics importer cronjob sends a heartbeat on each execution
/// to inform that it's alive. This endpoint handles receiving that signal.
async fn heartbeat_handler(State(state): State<Arc<ImporterState>>) -> Json<Value> {
let now = Utc::now();
let mut last_heartbeat = state.last_heartbeat.lock().unwrap();
*last_heartbeat = now;
Json(json!({ "status": "Heartbeat received" }))
}

/// Send a heartbeat from the importer cronjob to the importer API.
async fn send_heartbeat(importer_port: u16) -> Result<(), reqwest::Error> {
let client = reqwest::Client::new();
let url = format!("http://{IMPORTER_API_IP}:{importer_port}/heartbeat");

client.post(url).send().await?;

Ok(())
}
1 change: 1 addition & 0 deletions src/lib.rs
@@ -200,6 +200,7 @@
//!
//! [tracker_statistics_importer]
//! torrent_info_update_interval = 3600
//! port = 3002
//! ```
//!
//! For more information about configuration you can visit the documentation for the [`config`]) module.
1 change: 1 addition & 0 deletions src/web/api/v1/contexts/settings/mod.rs
@@ -75,6 +75,7 @@
//! },
//! "tracker_statistics_importer": {
//! "torrent_info_update_interval": 3600
//! "port": 3002
//! }
//! }
//! }
11 changes: 9 additions & 2 deletions src/web/api/v1/routes.rs
@@ -4,7 +4,8 @@ use std::sync::Arc;

use axum::extract::DefaultBodyLimit;
use axum::routing::get;
use axum::Router;
use axum::{Json, Router};
use serde_json::{json, Value};
use tower_http::compression::CompressionLayer;
use tower_http::cors::CorsLayer;

@@ -34,7 +35,8 @@ pub fn router(app_data: Arc<AppData>) -> Router {
.nest("/proxy", proxy::routes::router(app_data.clone()));

let router = Router::new()
.route("/", get(about_page_handler).with_state(app_data))
.route("/", get(about_page_handler).with_state(app_data.clone()))
.route("/health_check", get(health_check_handler).with_state(app_data))
.nest(&format!("/{API_VERSION_URL_PREFIX}"), v1_api_routes);

let router = if env::var(ENV_VAR_CORS_PERMISSIVE).is_ok() {
@@ -45,3 +47,8 @@

router.layer(DefaultBodyLimit::max(10_485_760)).layer(CompressionLayer::new())
}

/// Endpoint for container health check.
async fn health_check_handler() -> Json<Value> {
Json(json!({ "status": "Ok" }))
}
2 changes: 2 additions & 0 deletions tests/common/contexts/settings/mod.rs
@@ -82,6 +82,7 @@ pub struct Api {
#[derive(Deserialize, Serialize, PartialEq, Debug, Clone)]
pub struct TrackerStatisticsImporter {
pub torrent_info_update_interval: u64,
port: u16,
}

impl From<DomainSettings> for Settings {
@@ -185,6 +186,7 @@ impl From<DomainTrackerStatisticsImporter> for TrackerStatisticsImporter {
fn from(tracker_statistics_importer: DomainTrackerStatisticsImporter) -> Self {
Self {
torrent_info_update_interval: tracker_statistics_importer.torrent_info_update_interval,
port: tracker_statistics_importer.port,
}
}
}
