[MacOS] Race condition in Process.get_connections randomly raises AccessDenied #1901

Closed
gimperiale opened this issue Jan 6, 2021 · 14 comments · Fixed by #1903

@gimperiale

Summary

  • OS: macOS Big Sur. I did not test this issue on other OSes.
  • Architecture: Intel 64bit
  • Psutil version: 5.8.0
  • Python version: 3.8.6
  • Type: core

Description

The following code starts a vanilla HTTP server, which runs in the foreground (doesn't fork/daemonize) and listens on a single TCP/IP port. The wrapper waits for the subprocess to start listening before it returns:

import psutil
import subprocess
import time

def start_server(argv):
    proc = subprocess.Popen(argv)
    pproc = psutil.Process(proc.pid)
    while True:  # poll until the child reports a listening socket
        for conn in pproc.connections():
            if conn.status == "LISTEN":
                return proc
        time.sleep(0.01)  # avoid a busy loop between polls

Expected behaviour

The above code either

  • returns after a while, or
  • raises psutil.NoSuchProcess, if the subprocess dies before it opens the port, or
  • hangs forever, if for some weird reason the subprocess fails to listen on the port and doesn't die as a consequence

Observed behaviour

Occasionally, if the subprocess dies in the middle of a call to Process.connections(), the above code raises psutil.AccessDenied. This makes no sense to me as I'm the user that started the process to begin with.
I know for sure that the subprocess died before it even started tampering with its sockets.

Stack trace:

myscript.py:8: in start_server
    for conn in pproc.connections():
lib/python3.8/site-packages/psutil/__init__.py:1162: in connections
    return self._proc.connections(kind)
lib/python3.8/site-packages/psutil/_psosx.py:344: in wrapper
    return fun(self, *args, **kwargs)
lib/python3.8/site-packages/psutil/_psosx.py:534: in connections
    rawlist = cext.proc_connections(self.pid, families, types)
lib/python3.8/contextlib.py:131: in __exit__
    self.gen.throw(type, value, traceback)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

proc = <psutil._psosx.Process object at 0x7fa70f88dec0>

    @contextlib.contextmanager
    def catch_zombie(proc):
        """There are some poor C APIs which incorrectly raise ESRCH when
        the process is still alive or it's a zombie, or even RuntimeError
        (those who don't set errno). This is here in order to solve:
        https://github.com/giampaolo/psutil/issues/1044
        """
        try:
            yield
        except (OSError, RuntimeError) as err:
            if isinstance(err, RuntimeError) or err.errno == errno.ESRCH:
                try:
                    # status() is not supposed to lie and correctly detect
                    # zombies so if it raises ESRCH it's true.
                    status = proc.status()
                except NoSuchProcess:
                    raise err
                else:
                    if status == _common.STATUS_ZOMBIE:
                        raise ZombieProcess(proc.pid, proc._name, proc._ppid)
                    else:
>                       raise AccessDenied(proc.pid, proc._name)
E                       psutil.AccessDenied: psutil.AccessDenied (pid=16109)

lib/python3.8/site-packages/psutil/_psosx.py:378: AccessDenied
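
As a caller-side workaround, any lookup error can be cross-checked against the Popen object, which knows for certain whether the child has exited. The following is a minimal sketch under stated assumptions: `wait_for_listen`, its parameters and the `grace` period are hypothetical names, and the connection lister and the exception classes are passed in so that the snippet does not depend on psutil internals.

```python
import subprocess
import time


def wait_for_listen(proc, list_connections, lookup_errors,
                    timeout=10.0, interval=0.01, grace=0.5):
    """Return proc once the child has a socket in LISTEN state.

    proc: the subprocess.Popen object of the child.
    list_connections: zero-argument callable returning objects that have
        a .status attribute (hypothetical parameter).
    lookup_errors: tuple of exception classes that may really mean
        "the process is gone" (hypothetical parameter).
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            conns = list_connections()
        except lookup_errors as err:
            # Give a dying child a moment to be reaped, then ask Popen
            # whether it has exited.
            try:
                proc.wait(timeout=grace)
            except subprocess.TimeoutExpired:
                raise err from None  # still alive: the error is genuine
            raise RuntimeError(
                f"child exited with code {proc.returncode}") from err
        if any(conn.status == "LISTEN" for conn in conns):
            return proc
        if proc.poll() is not None:
            raise RuntimeError(f"child exited with code {proc.returncode}")
        time.sleep(interval)
    raise TimeoutError("child did not start listening in time")
```

With psutil this might be called as wait_for_listen(proc, psutil.Process(proc.pid).connections, (psutil.NoSuchProcess, psutil.AccessDenied, psutil.ZombieProcess)). Note that psutil.Process(proc.pid) can itself raise NoSuchProcess if the child is already gone, so that call would need the same treatment.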
@gimperiale gimperiale added the bug label Jan 6, 2021
giampaolo added a commit that referenced this issue Jan 6, 2021
Signed-off-by: Giampaolo Rodola <g.rodola@gmail.com>
@giampaolo
Owner

giampaolo commented Jan 7, 2021

I submitted a PR (#1903) which I think should solve this issue, but since I cannot reproduce it, it would be good if you tried it first. Can you clone the repo and try that branch? Here's how to do it:

git clone git@github.com:giampaolo/psutil.git
cd psutil
git checkout osx-list-fds-refact
make install

@gimperiale
Author

@marouane-miftah can I leave the retest to you?

@marouane-miftah

yes

@marouane-miftah

I am still getting the same error with the new branch.

@giampaolo
Owner

I added some debug prints. Please git pull and run your script like this, then paste the output & traceback:

PSUTIL_DEBUG=1 python3 yourscript.py

@marouane-miftah

marouane-miftah commented Jan 7, 2021

@giampaolo, hope this helps.

../../miniconda3/envs/x/lib/python3.8/site-packages/pshell/procs.py:241: in wait_for_server
    conn.laddr.port for conn in proc.connections() if conn.status == "LISTEN"
../../miniconda3/envs/x/lib/python3.8/site-packages/psutil/__init__.py:1162: in connections
    return self._proc.connections(kind)
../../miniconda3/envs/x/lib/python3.8/site-packages/psutil/_psosx.py:344: in wrapper
    return fun(self, *args, **kwargs)
../../miniconda3/envs/x/lib/python3.8/site-packages/psutil/_psosx.py:534: in connections
    rawlist = cext.proc_connections(self.pid, families, types)
../../miniconda3/envs/x/lib/python3.8/contextlib.py:131: in __exit__
    self.gen.throw(type, value, traceback)


proc = <psutil._psosx.Process object at 0x7fc4206c1240>

@contextlib.contextmanager
def catch_zombie(proc):
    """There are some poor C APIs which incorrectly raise ESRCH when
    the process is still alive or it's a zombie, or even RuntimeError
    (those who don't set errno). This is here in order to solve:
    https://github.com/giampaolo/psutil/issues/1044
    """
    try:
        yield
    except (OSError, RuntimeError) as err:
        if isinstance(err, RuntimeError) or err.errno == errno.ESRCH:
            try:
                # status() is not supposed to lie and correctly detect
                # zombies so if it raises ESRCH it's true.
                status = proc.status()
            except NoSuchProcess:
                raise err
            else:
                if status == _common.STATUS_ZOMBIE:
                    raise ZombieProcess(proc.pid, proc._name, proc._ppid)
                else:
                    raise AccessDenied(proc.pid, proc._name)

E psutil.AccessDenied: psutil.AccessDenied (pid=23261)

../../miniconda3/envs/x/lib/python3.8/site-packages/psutil/_psosx.py:378: AccessDenied
----------------------------------------------------------------------------------------- Captured stderr setup ------------------------------------------------------------------------------------------
psutil-debug> sysctl(KERN_PROCARGS2) -> EINVAL translated to NSP
psutil-debug> sysctl(KERN_PROCARGS2) -> EINVAL translated to NSP
psutil-debug> sysctl(KERN_PROCARGS2) -> EINVAL translated to NSP
psutil-debug> sysctl(KERN_PROCARGS2) -> EINVAL translated to NSP
psutil-debug> sysctl(KERN_PROCARGS2) -> EINVAL translated to NSP
psutil-debug> sysctl(KERN_PROCARGS2) -> EINVAL translated to NSP
psutil-debug> sysctl(KERN_PROCARGS2) -> EINVAL translated to NSP
psutil-debug> sysctl(KERN_PROCARGS2) -> EINVAL translated to NSP
psutil-debug> sysctl(KERN_PROCARGS2) -> EINVAL translated to NSP
psutil-debug> sysctl(KERN_PROCARGS2) -> EINVAL translated to NSP
psutil-debug> sysctl(KERN_PROCARGS2) -> EINVAL translated to NSP
psutil-debug> sysctl(KERN_PROCARGS2) -> EINVAL translated to NSP
psutil-debug> sysctl(KERN_PROCARGS2) -> EINVAL translated to NSP
psutil-debug> sysctl(KERN_PROCARGS2) -> EINVAL translated to NSP

@giampaolo
Owner

giampaolo commented Jan 7, 2021

Mmm no. I don't see any debug message, and the original OSError exception coming from the C extension module got swallowed / hidden for some reason. It doesn't look like you're running this script directly (is this a test suite run?).
You should run it from the shell (and possibly not using conda) as in:

PSUTIL_DEBUG=1 python3 yourscript.py
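
When a test runner prints only the final exception, the original error can still be recovered from the exception object, since Python links chained exceptions through `__cause__` and `__context__`. A minimal sketch of walking that chain follows; the helper name `exception_chain` is hypothetical, and built-in exceptions stand in for the psutil ones.

```python
import errno


def exception_chain(exc):
    """Return [exc, its cause or context, ...], outermost first."""
    chain = []
    seen = set()
    while exc is not None and id(exc) not in seen:
        seen.add(id(exc))
        chain.append(exc)
        # An explicit "raise ... from" cause wins over the implicit context.
        exc = exc.__cause__ if exc.__cause__ is not None else exc.__context__
    return chain


def demo():
    try:
        try:
            # Stand-in for the error coming from the C extension.
            raise ProcessLookupError(errno.ESRCH, "No such process")
        except OSError:
            # Stand-in for the translation done while handling it.
            raise PermissionError("access denied")
    except PermissionError as outer:
        return exception_chain(outer)


for link in demo():
    print(type(link).__name__, link)
```

Called on the AccessDenied instance caught in the test, a helper like this would list the underlying OSError as the second entry, provided the chain was not explicitly suppressed.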

@marouane-miftah

@giampaolo, I have run it as you described, but I am not getting any extra DEBUG messages.

I also tried exporting the environment variable first, as shown below, and verified that it was still set after the error was triggered.

export PSUTIL_DEBUG=1; python myscript.py 
Traceback (most recent call last):
 (application traceback)
OSError: [Errno 48] error while attempting to bind on address ('::', 48588, 0, 0): address already in use
Traceback (most recent call last):
  File "lib/python3.8/site-packages/psutil/_psosx.py", line 365, in catch_zombie
    yield
  File "lib/python3.8/site-packages/psutil/_psosx.py", line 534, in connections
    rawlist = cext.proc_connections(self.pid, families, types)
ProcessLookupError: [Errno 3] No such process (originated from proc_pidinfo())

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "myscript.py", line 28, in <module>
    start_server(cmd_args)
  File "myscript.py", line 10, in start_server
    for conn in pproc.connections():
  File "lib/python3.8/site-packages/psutil/__init__.py", line 1162, in connections
    return self._proc.connections(kind)
  File "lib/python3.8/site-packages/psutil/_psosx.py", line 344, in wrapper
    return fun(self, *args, **kwargs)
  File "lib/python3.8/site-packages/psutil/_psosx.py", line 534, in connections
    rawlist = cext.proc_connections(self.pid, families, types)
  File "lib/python3.8/contextlib.py", line 131, in __exit__
    self.gen.throw(type, value, traceback)
  File "lib/python3.8/site-packages/psutil/_psosx.py", line 378, in catch_zombie
    raise AccessDenied(proc.pid, proc._name)
psutil.AccessDenied: psutil.AccessDenied (pid=31489)

echo $PSUTIL_DEBUG 
1
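
The chained traceback above shows the sequence: proc_pidinfo() fails with ESRCH, catch_zombie then calls proc.status(), and because that call still succeeds with a non-zombie status, the error is re-raised as AccessDenied. Below is a stand-alone model of that branch structure. It uses stand-in exception classes and a hypothetical `translate` helper that returns the exception that would be raised; it is not psutil's real code.

```python
import errno


class NoSuchProcess(Exception):
    """Stand-in for psutil.NoSuchProcess."""


class ZombieProcess(Exception):
    """Stand-in for psutil.ZombieProcess."""


class AccessDenied(Exception):
    """Stand-in for psutil.AccessDenied."""


STATUS_ZOMBIE = "zombie"


def translate(err, get_status):
    """Mirror the decisions catch_zombie makes for an error `err`.

    get_status is a hypothetical zero-argument callable playing the role
    of proc.status().
    """
    if isinstance(err, RuntimeError) or getattr(err, "errno", None) == errno.ESRCH:
        try:
            status = get_status()
        except NoSuchProcess:
            return err  # the process is really gone: keep the original
        if status == STATUS_ZOMBIE:
            return ZombieProcess()
        # Reached whenever status() still answers with a non-zombie state,
        # for example if the process has not finished exiting yet
        # (an assumption about the timing, not something the trace proves).
        return AccessDenied()
    return err


def gone():
    raise NoSuchProcess()


esrch = ProcessLookupError(errno.ESRCH, "No such process")

print(type(translate(esrch, lambda: "running")).__name__)      # AccessDenied
print(type(translate(esrch, lambda: STATUS_ZOMBIE)).__name__)  # ZombieProcess
print(type(translate(esrch, gone)).__name__)                   # ProcessLookupError
```

Only the third path reports the process as gone, which is why a process caught midway through exiting can surface as AccessDenied.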

giampaolo added a commit that referenced this issue Jan 8, 2021
This was done in order to solve #1044 and #1100, but its logic duplicates the one in the wrap_exceptions() decorator.

Also, this should solve #1901, as it erroneously translated NSP (NoSuchProcess) into AD (AccessDenied).

Signed-off-by: Giampaolo Rodola <g.rodola@gmail.com>
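
The direction this commit describes, a single decorator owning the translation of OS errors, can be sketched as below. This is an illustrative model, not psutil's actual wrap_exceptions(): the exception classes are stand-ins, and `translate_os_errors` and the sample `connections` function are hypothetical names.

```python
import errno
import functools


class NoSuchProcess(Exception):
    """Stand-in for psutil.NoSuchProcess."""


class AccessDenied(Exception):
    """Stand-in for psutil.AccessDenied."""


def translate_os_errors(fun):
    """Map low-level OSError subclasses onto domain exceptions."""
    @functools.wraps(fun)
    def wrapper(pid, *args, **kwargs):
        try:
            return fun(pid, *args, **kwargs)
        except ProcessLookupError as err:  # errno ESRCH
            raise NoSuchProcess(pid) from err
        except PermissionError as err:     # errno EPERM or EACCES
            raise AccessDenied(pid) from err
    return wrapper


@translate_os_errors
def connections(pid):
    # Simulates the C call failing because the process just exited.
    raise ProcessLookupError(errno.ESRCH, "No such process")


try:
    connections(16109)
except NoSuchProcess as exc:
    print("NoSuchProcess for pid", exc.args[0])
```

With one translation point, an ESRCH from the C layer maps to a single outcome, so a second layer cannot reinterpret it as a permission problem.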
@giampaolo
Owner

OK, I pushed some changes into osx-list-fds-refact branch. Can you retry?

@marouane-miftah

sorry @giampaolo still the same error.

@giampaolo
Owner

giampaolo commented Jan 8, 2021

Mmm weird.
This is my test case:

$ cat foo.py
import psutil
import subprocess

proc = subprocess.Popen(["python3", "server.py"])
pproc = psutil.Process(proc.pid)
while True:
    for conn in pproc.connections():
        if conn.status == "LISTEN":
            break
$ cat server.py
import socket
sock = socket.socket()
sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
sock.bind(('127.0.0.1', 8080))
sock.listen(10)

Run:

$ git rev-parse HEAD
7fb7ac3bc0986eb2e18a3179a695b081d92a19e6
$ make test TSCRIPT=foo.py

Before the patch it raised AccessDenied, now it raises NoSuchProcess (as it should).

@no-response

no-response bot commented Apr 23, 2021

This issue has been automatically closed because there has been no response for more information from the original author. Please reach out if you have or find the answers requested so that this can be investigated further.

@no-response no-response bot closed this as completed Apr 23, 2021
@giampaolo giampaolo reopened this Oct 5, 2021
giampaolo added a commit that referenced this issue Oct 5, 2021
Signed-off-by: Giampaolo Rodola <g.rodola@gmail.com>
@no-response

no-response bot commented Oct 5, 2021

This issue has been automatically closed because there has been no response for more information from the original author. Please reach out if you have or find the answers requested so that this can be investigated further.

@giampaolo
Owner

PR merged. This issue should now be fixed.
