IH-533: Remove usage of forkexecd daemon to execute processes #5995

freddy77 · 2024-09-16T10:57:47Z

Forkexecd was written to avoid some issues with OCaml and multi-threading.
Instead, use C code to launch processes and avoid these issues.
The interface remains unchanged on the OCaml side, but the implementation relies entirely on C code.
vfork() is used to avoid performance and memory issues.
Processes are reaped directly.
The code automatically reaps child processes to avoid zombies.
One small helper is used to better separate OCaml and C code and to handle syslog redirection. This makes debugging easier in case of issues.
Syslog handling is done in a separate process, which allows restarting the toolstack while keeping launched programs running; note that even with the forkexecd daemon one process was used for this purpose.
Code tries to keep compatibility with forkexecd, in particular:

SIGPIPE is ignored in the parent;
/dev/null is open with O_WRONLY even for stdin;
file descriptors are limited to 1024.

We use close_range (if available) to reduce the number of system calls needed to close file descriptors.
The cgroup is set to prevent systemd from terminating processes on toolstack restart.
There's a fuzzer program to check the file remapping algorithm; for this reason the algorithm is in a separate file.

To turn internal debug logging on, set the FORKEXECD_DEBUG_LOGS C preprocessor macro to 1.

The syslog feature allows running a program and redirecting its output/error streams to the system log. It is triggered by passing the "syslog_stdout" argument to "Forkhelpers" functions such as "execute_command_get_output". To test this feature, a small preload library is used to redirect syslog writes. This allows testing a program without changing it. The log is redirected to /tmp/xyz instead of /dev/log. The name was chosen to allow for future static build redirection. The C program is used only for the test, so its code style takes this into account (specifically, it does not try to handle all redirection situations and all error paths).

Signed-off-by: Frediano Ziglio <frediano.ziglio@cloud.com>

ocaml/forkexecd/lib/forkhelpers.ml

edwintorok · 2024-09-16T12:04:59Z

ocaml/forkexecd/lib/fe_stubs.c

+        // child
+        if (pipe_fds[0] >= 0)
+            close(pipe_fds[0]);
+        execve(args[0], args, envs);


This is now a lot closer to vfork, and should avoid previous concerns about what functions can be safely called in-between :)
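
For readers unfamiliar with the constraint being discussed, the following is a rough sketch of the vfork-then-execve pattern, not the PR's actual code. The function name `spawn_and_wait` and the exit code 127 for a failed exec are assumptions made for illustration. The key point is that the child does nothing except call execve, or _exit if that fails.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper (not from the PR): run argv[0] and return its exit
 * status, or -1 if it could not be started or did not exit normally. */
int
spawn_and_wait(char *const argv[], char *const envp[])
{
    pid_t pid = vfork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* Child: it borrows the parent's memory until execve or _exit,
         * so it must not touch anything else here. */
        execve(argv[0], argv, envp);
        _exit(127); /* assumed convention: exec failed */
    }

    /* Parent: resumes once the child has exec'ed or exited. */
    int status;
    while (waitpid(pid, &status, 0) < 0) {
        if (errno != EINTR)
            return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```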

(I haven't reviewed the rest yet)

To reduce that part (C code called from OCaml), the only type passed is a list of strings (2 parameters), and the code doing the OCaml -> C conversion (and vice versa) is in a separate function from the C function doing the job, as suggested by the OCaml manual.

andyhhp · 2024-09-16T14:39:47Z

Very happy to see this making progress again. Two questions:

When you say "SIGPIPE is ignored in the parent;", which process is "the parent" in this context? Xapi definitely wants to be ignoring SIGPIPE, but other spawned processes probably should not have SIGPIPE played with behind their back.
"/dev/null is open with O_WRONLY even for stdin;" This is a plain bug and wants fixing. Does it want doing before or after this change? (i.e. does it want backporting?)

ocaml/forkexecd/helper/vfork_helper.h

ocaml/forkexecd/helper/syslog.h

edwintorok · 2024-09-16T15:09:46Z

ocaml/forkexecd/lib/fe_stubs.c

+}
+
+/*
+ * Wait a process with a given timeout.


Why can't this be implemented in OCaml? Is a binding to some function missing for this?

It could, but it was done in C because of resource constraints. Launching a thread in higher-level languages is not that light (note that there is additional code even to reduce C resource usage).

The reason for the thread is not that easy to grasp.

On newer Linux kernels (similarly to Windows) you could use a pidfd and a simple poll to wait for a process easily (on Windows you use the handle to the process). But that would require a newer kernel with a newer Glibc (and Linux, but that should not be an issue). Waiting for a process without pidfd and threads would require some global setting (like a signal handler) which, done in a library, is not nice (a library should not change global settings) and is racy.

I'm starting to think this would be nice as a comment in the code...

Where do these threads run? We wouldn't want to spawn additional C threads inside XAPI, if anything goes wrong there it could crash XAPI. Can we move the monitoring threads to forkexecd and query its status via the socket? (although if the spawned process is not a child of forkexec, monitoring its status might be more difficult).

In fact in the future we probably want to move most long running processes to systemd transient units (we just need a newer systemd that supports file descriptor passing), and we can leave the monitoring (and logging) to systemd.

We could also have a single dedicated OCaml thread in XAPI, that is responsible for waiting for all children and notifying the appropriate condition variables. In fact would calling Unix.wait in a loop on a dedicated thread achieve that? It wouldn't increase resource usage by much (a single thread should have a fixed cost), in fact with lots of processes it might even decrease the resources used compared to launching a C thread for each one.

This should then further reduce the size of the C code.

Sorry, your comment does not make sense; the intention is to remove forkexecd, it always has been, not to use it in different ways.

Having a dedicated thread in XAPI doesn't require retaining forkexecd, these were 2 separate suggestions:

keep the monitoring and vforking in forkexecd if that simplifies the code

use a dedicated thread in XAPI to avoid the need for extra C code

We can do the latter without doing the former.

That explanation makes sense. Your main target is reducing C code, specifically this part waiting for a process, potentially with a timeout. However, my OCaml skills are not that great, so I would probably end up with pretty bad OCaml code. Can you propose something?

I can look in more detail on Friday, in principle something like this (but untested, and probably needs a bit more error handling):

    let wait_pid_with_timeout ~timeout pid =
      let need_to_kill = Atomic.make true in
      let kill_pid () =
        if Atomic.get need_to_kill then
          (* Unix.kill takes the pid first, then the signal number *)
          try Unix.kill pid Sys.sigkill
          with Unix.Unix_error (Unix.ESRCH, _, _) -> ()
      in
      let some_unique_id = .... in
      let finally () = Xapi_periodic_scheduler.remove_from_queue some_unique_id in
      Xapi_periodic_scheduler.add_to_queue some_unique_id
        Xapi_periodic_scheduler.OneShot timeout kill_pid ;
      Fun.protect ~finally (fun () ->
          let _, status = Unix.waitpid [] pid in
          Atomic.set need_to_kill false ;
          status
      )

This has a race condition between the process exiting and the timeout kicking in at just the same time (although it is unlikely that we would also have gone through all pids, wrapped around and got something else in that spot in the meantime).
I see you improved that in your C code, so we could perhaps add a small Unixext.waitid binding for waitid, so that we can use waitid instead of waitpid, and then do it in 2 steps:

    let pid, status = Unixext.(waitid P_PID [WEXITED; WNOWAIT] pid) in
    Atomic.set need_to_kill false;
    let pid, status = Unix.waitpid [] pid in

ocaml/forkexecd/helper/vfork_helper.h

freddy77 · 2024-09-17T08:12:30Z

Very happy to see this making progress again. Two questions:

1. When you say "SIGPIPE is ignored in the parent;", which process is "the parent" in this context?  Xapi definitely wants to be ignoring SIGPIPE, but other spawned processes probably should not have SIGPIPE played with behind their back.

2. "/dev/null is open with O_WRONLY even for stdin;" This is a plain bug and wants fixing.  Does it want doing before or after this change?  (i.e. does it want backporting.)

Yes, normally you don't want these behaviors; however, I would prefer not to introduce any regression that could cause friction or issues. I bet that changing 2) won't cause any issue, but for 1) it's not exactly easy to say. For instance, thinking about migration, we launch a program that uses sockets, and at some point a socket will be closed, potentially causing SIGPIPE to be raised. Forkexecd was there "protecting" against SIGPIPE; removing this protection would cause crashes if the program itself forgot to handle that signal.
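
The SIGPIPE "protection" mentioned here can be illustrated with a small sketch (the function name `write_to_closed_pipe` is made up for this example). With SIGPIPE ignored, a write to a pipe or socket whose peer is gone fails with EPIPE instead of killing the process. An ignored disposition is also kept across execve, which is why a launched program inherits it from its parent.

```c
#include <errno.h>
#include <signal.h>
#include <unistd.h>

/* Hypothetical demo (not from the PR): write to a pipe with no reader
 * while SIGPIPE is ignored. Returns the errno reported by write(), or 0
 * if the write unexpectedly succeeded. Note that it leaves SIGPIPE
 * ignored in the calling process. */
int
write_to_closed_pipe(void)
{
    int fds[2];
    if (pipe(fds) < 0)
        return errno;

    signal(SIGPIPE, SIG_IGN); /* ignored dispositions survive execve */
    close(fds[0]);            /* no reader is left on the pipe */

    int err = write(fds[1], "x", 1) < 0 ? errno : 0;
    close(fds[1]);
    return err;
}
```

With the default disposition instead, the same write would terminate the process with SIGPIPE, which is the crash scenario described for programs that do not handle the signal themselves.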

They can easily get fixed later.

Testing was the main reason this project was stuck for months. Now the lab is in better shape, and we fixed some issues, but I would prefer to avoid going down that path again.
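
As for point 2, the observable effect of opening /dev/null write-only for stdin can be shown with a short sketch. The helper name `read_from_dev_null` is hypothetical. A program reading from a read-only /dev/null sees end-of-file, while one reading from a write-only descriptor gets an EBADF error.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper (not from the PR): open /dev/null with `flags`, try
 * to read one byte and return read()'s result, leaving errno as set by
 * read(). Returns -2 if /dev/null cannot be opened at all. */
ssize_t
read_from_dev_null(int flags)
{
    char byte;
    int fd = open("/dev/null", flags);
    if (fd < 0)
        return -2;

    ssize_t n = read(fd, &byte, 1);
    int saved = errno;
    close(fd);
    errno = saved;
    return n;
}
```

Calling it with O_RDONLY returns 0 (end-of-file), while O_WRONLY returns -1 with errno set to EBADF.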

ocaml/forkexecd/lib/fe_stubs.c

edwintorok · 2024-09-17T23:02:12Z

ocaml/forkexecd/lib/fe_stubs.c

+
+    pthread_mutex_lock(&tm->mtx);
+    int res  = tm->stop ? 0:
+        pthread_cond_timedwait(&tm->cond, &tm->mtx, &tm->deadline);


This needs a loop to check for tm->stop becoming true, or the deadline passing (ETIMEDOUT).

pthread_cond_(timed)wait can be woken up spuriously (and in fact one of the workarounds for the broken pthread in glibc >2.27 uses spurious wakeups to reduce the likelihood of hitting the glibc bug, so expect a lot more spurious wakeups on newer libc in XS9), so we always need to check that the change we've been waiting for has actually happened.

ocaml/forkexecd/lib/fe_stubs.c

edwintorok · 2024-09-17T23:12:50Z

ocaml/forkexecd/lib/fe_stubs.c

+    // Avoiding reaping the process will create a zombie process so
+    // the KILL would be directed to that.
+    siginfo_t info;
+    waitid(P_PID, pid, &info, WEXITED|WNOWAIT);


should handle the errors from this

edwintorok · 2024-09-17T23:15:55Z

ocaml/forkexecd/lib/fe_stubs.c

+} timeout_kill;
+
+static void *
+thread_proc_timeout_kill(void *arg)


XAPI already has a singleshot timer where we can register something to happen after a given time (in OCaml), so we don't necessarily need to spawn a separate thread to do that.
See 'make_timeboxed_rpc' for an example on how to use it.

freddy77 requested review from lindig, edwintorok, psafont and robhoes September 16, 2024 10:57

psafont reviewed Sep 16, 2024

ocaml/forkexecd/lib/forkhelpers.ml (resolved)

edwintorok reviewed Sep 16, 2024

last-genius reviewed Sep 16, 2024

ocaml/forkexecd/helper/vfork_helper.h (outdated, resolved)

last-genius reviewed Sep 16, 2024

ocaml/forkexecd/helper/vfork_helper.h (outdated, resolved)

last-genius reviewed Sep 16, 2024

ocaml/forkexecd/helper/syslog.h (resolved)

edwintorok reviewed Sep 16, 2024

contificate reviewed Sep 16, 2024

ocaml/forkexecd/helper/vfork_helper.h (resolved)

freddy77 force-pushed the forkexec_upstream branch from 4ea627d to 996b02f September 17, 2024 08:03

freddy77 force-pushed the forkexec_upstream branch from 996b02f to 1a457e5 September 17, 2024 11:51

edwintorok reviewed Sep 17, 2024

ocaml/forkexecd/lib/fe_stubs.c (outdated, resolved)

edwintorok reviewed Sep 17, 2024

ocaml/forkexecd/lib/fe_stubs.c (resolved)

edwintorok reviewed Sep 17, 2024

ocaml/forkexecd/lib/fe_stubs.c (resolved)

edwintorok reviewed Sep 17, 2024

freddy77 force-pushed the forkexec_upstream branch from 1a457e5 to a4a2f94 September 18, 2024 08:09
