Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Weird Asyncio errors on raspberry pi #8

Closed
masenf opened this issue Jan 3, 2023 · 12 comments
Closed

Weird Asyncio errors on raspberry pi #8

masenf opened this issue Jan 3, 2023 · 12 comments

Comments

@masenf
Copy link

masenf commented Jan 3, 2023

(Original issue report received via email)

I am using your release of kiss python library kiss3 here:
https://github.com/python-aprs/kiss3

I'm integrating it into my aprsd packet daemon:
https://github.com/craigerl/aprsd

It generally works fine, but I am running into weird asyncio errors on my raspberry pi debian.
I have run this exact code on macos as well as linux x86 w/o problems, but raspi fails.

when I try and start the kiss interface I get this:

File "/home/pi/aprsd/aprsd/client.py", line 217, in setup_connection
client = kiss.KISS3Client()
File "/home/pi/aprsd/aprsd/clients/kiss.py", line 18, in __init__
self.setup()
File "/home/pi/aprsd/aprsd/clients/kiss.py", line 49, in setup
self.kiss.start()
File "/home/pi/aprsd/.aprsd-venv/lib/python3.10/site-packages/kiss/classes.py", line 62, in start
_, self.protocol = self.loop.run_until_complete(
File "/home/pi/.pyenv/versions/3.10.9/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete
return future.result()
File "/home/pi/aprsd/.aprsd-venv/lib/python3.10/site-packages/kiss/kiss.py", line 241, in create_tcp_connection
return await _generic_create_connection(
File "/home/pi/aprsd/.aprsd-venv/lib/python3.10/site-packages/kiss/kiss.py", line 212, in _generic_create_connection
transport, protocol = await f(*args, **kwargs)
File "/home/pi/.pyenv/versions/3.10.9/lib/python3.10/asyncio/base_events.py", line 1030, in create_connection
infos = await self._ensure_resolved(
File "/home/pi/.pyenv/versions/3.10.9/lib/python3.10/asyncio/base_events.py", line 1412, in _ensure_resolved
return await loop.getaddrinfo(host, port, family=family, type=type,
File "/home/pi/.pyenv/versions/3.10.9/lib/python3.10/asyncio/base_events.py", line 863, in getaddrinfo
return await self.run_in_executor(
File "/home/pi/.pyenv/versions/3.10.9/lib/python3.10/asyncio/base_events.py", line 816, in run_in_executor
executor = concurrent.futures.ThreadPoolExecutor(
File "/home/pi/.pyenv/versions/3.10.9/lib/python3.10/concurrent/futures/__init__.py", line 49, in __getattr__
from .thread import ThreadPoolExecutor as te
File "/home/pi/.pyenv/versions/3.10.9/lib/python3.10/concurrent/futures/thread.py", line 37, in <module>
threading._register_atexit(_python_exit)
File "/home/pi/.pyenv/versions/3.10.9/lib/python3.10/threading.py", line 1504, in _register_atexit
raise RuntimeError("can't register atexit after shutdown")
RuntimeError: can't register atexit after shutdown
Exception ignored in: <function SyncFrameDecode.__del__ at 0x74f856e8>
Traceback (most recent call last):
File "/home/pi/aprsd/.aprsd-venv/lib/python3.10/site-packages/ax253/decode.py", line 211, in __del__
return self.stop()
File "/home/pi/aprsd/.aprsd-venv/lib/python3.10/site-packages/kiss/classes.py", line 56, in stop
self.protocol.transport.close()
File "/home/pi/aprsd/.aprsd-venv/lib/python3.10/site-packages/ax253/decode.py", line 183, in protocol
raise IOError(
OSError: Underlying connection is not active. Hint: did you call start()?
@masenf
Copy link
Author

masenf commented Jan 3, 2023

@hemna if you have a chance, would you post the python interpreter versions and do a pip freeze on the platforms you have tried (both working and non-working).

Also, if you have a reproducer that I can run to replicate this, I'd appreciate it. From the trace, it looks like the interpreter is exiting at the time start is being called, which makes me suspect threads are in use somewhere (I haven't had a chance to review the code in aprsd but I see it's importing threading for some things).

I'm suspecting that python/cpython#86813 may be related.

@hemna
Copy link

hemna commented Jan 3, 2023

Hi,
Yes aprsd is using threads for packet processing and responses and plugin filtering of responses.

I have tried the apt provided python, which is 3.9.2, but I also have tried pyenv installs of 3.8.16 and 3.10.9.
All have the same problem. I am using a virtual environment to isolate the python packages installed from the system

(.aprsd-venv) pi@digipi:~/aprsd $ pip freeze
add-trailing-comma==2.4.0
alabaster==0.7.12
-e git+ssh://git@github.com/craigerl/aprsd.git@6391c7eed691f912de49de0aa3e0a0fdb5a2edc2#egg=aprsd
aprslib==0.7.2
attrs==22.2.0
autoflake==1.5.3
ax253==0.1.5.post1
Babel==2.11.0
beautifulsoup4==4.11.1
bidict==0.22.0
bitarray==2.6.1
black==22.12.0
build==0.9.0
cachetools==5.2.0
certifi==2022.12.7
cffi==1.15.1
cfgv==3.3.1
chardet==5.1.0
charset-normalizer==2.1.1
click==8.1.3
click-completion==0.5.2
colorama==0.4.6
commonmark==0.9.1
ConfigArgParse==1.5.3
coverage==7.0.1
cryptography==38.0.4
dacite2==2.0.0
dataclasses==0.6
debtcollector==2.5.0
distlib==0.3.6
dnspython==2.2.1
docutils==0.19
eventlet==0.33.2
exceptiongroup==1.1.0
filelock==3.8.2
fixit==0.1.4
flake8==6.0.0
Flask==2.1.2
Flask-Classful==0.14.2
Flask-HTTPAuth==4.7.0
Flask-SocketIO==5.3.2
gray==0.13.0
greenlet==2.0.1
identify==2.5.11
idna==3.4
imagesize==1.4.1
IMAPClient==2.3.1
importlib-metadata==5.2.0
importlib-resources==5.10.1
iniconfig==1.1.1
isort==5.11.4
itsdangerous==2.1.2
Jinja2==3.1.2
jsonschema==4.17.3
kiss3==8.0.0
libcst==0.4.9
MarkupSafe==2.1.1
mccabe==0.7.0
mypy==0.991
mypy-extensions==0.4.3
netaddr==0.8.0
nodeenv==1.7.0
oslo.config==9.0.0
oslo.i18n==5.1.0
packaging==22.0
pathspec==0.10.3
pbr==5.11.0
pep517==0.13.0
pep8-naming==0.13.3
pip-tools==6.12.1
platformdirs==2.6.0
pluggy==1.0.0
plumbum==1.8.0
pre-commit==2.21.0
pycodestyle==2.10.0
pycparser==2.21
pyflakes==3.0.1
Pygments==2.13.0
pyOpenSSL==22.1.0
pyproject_api==1.2.1
pyrsistent==0.19.2
pyserial==3.5
pyserial-asyncio==0.6
pytest==7.2.0
pytest-cov==4.0.0
python-engineio==4.3.4
python-socketio==5.7.2
pytz==2022.7
pyupgrade==3.3.1
PyYAML==6.0
requests==2.28.1
rfc3986==2.0.0
rich==12.6.0
rpyc==5.3.0
shellingham==1.5.0
six==1.16.0
snowballstemmer==2.2.0
soupsieve==2.3.2.post1
Sphinx==5.3.0
sphinxcontrib-applehelp==1.0.2
sphinxcontrib-devhelp==1.0.2
sphinxcontrib-htmlhelp==2.0.0
sphinxcontrib-jsmath==1.0.1
sphinxcontrib-qthelp==1.0.3
sphinxcontrib-serializinghtml==1.1.5
stevedore==4.1.1
tabulate==0.9.0
thesmuggler==1.0.1
tokenize-rt==5.0.0
toml==0.10.2
tomli==2.0.1
tox==4.0.18
typing-inspect==0.8.0
typing_extensions==4.4.0
ua-parser==0.16.1
unify==0.5
untokenize==0.1.1
update-checker==0.18.0
urllib3==1.26.13
user-agents==2.2.0
virtualenv==20.17.1
Werkzeug==2.1.2
wrapt==1.14.1
zipp==3.11.0

@masenf
Copy link
Author

masenf commented Jan 5, 2023

just a note that recent cryptography wheels are failing to build on piwheels, so I pinned to 38.0.1

piwheels/piwheels#328

image

@masenf
Copy link
Author

masenf commented Jan 5, 2023

@hemna can you post a minimal aprsd config that reproduces this issue for you?

I checked out aprsd tonight (it's really cool BTW 👍) on my raspberry pi 4 and it actually worked perfectly for me.

pi@w7dg-aprs-02:~/code/python-aprs/aprsd $ uname -a
Linux w7dg-aprs-02 5.10.92-v7+ #1514 SMP Mon Jan 17 17:36:39 GMT 2022 armv7l GNU/Linux
pi@w7dg-aprs-02:~/code/python-aprs/aprsd $ python3 --version
Python 3.9.2

I'm using a config that explicitly sets enabled = false for aprs_network and sets up kiss_tcp against direwolf running on localhost.

(the only change I made was fixing the pin for cryptography==38.0.1, so I didn't need a rust compiler)

@hemna
Copy link

hemna commented Jan 5, 2023

yah cryptography sucks because of it being rust based and rust builds consistently puke on arm base systems. I don't have a current paired down reproducer of this.

what version of aprsd are you using? I'm working off of master as it will be 3.0.0

fwiw, aprsd is what is driving the APRS REPEAT service here:
http://aprsd-repeat.hemna.com/

@hemna
Copy link

hemna commented Jan 5, 2023

I just tried to reinstall after editing requirements.txt to pin cryptography to 38.0.1

(.aprsd-venv) pi@digipi:~/aprsd $ python --version
Python 3.9.2

Here is what I'm doing

git clone https://github.com/craigerl/aprsd
cd aprsd
vim requirements.txt <---- to pin cryptography to 38.0.1 to prevent compiling
make dev
source .aprsd-venv/bin/activate
aprsd server --loglevel DEBUG

01/05/2023 12:10:46 PM MainThread           DEBUG    GET client 'tcpkiss'                                                            client.py:244
01/05/2023 12:10:47 PM RX_MSG               DEBUG    Starting                                                                          aprsd.py:70
01/05/2023 12:10:47 PM ProcessPKT           DEBUG    Starting                                                                          aprsd.py:70
01/05/2023 12:10:47 PM RX_MSG               DEBUG    ==> setup_connection: call "{'self': <aprsd.client.KISSClient object at           trace.py:48
                                                     0x74126370>}"
01/05/2023 12:10:47 PM KeepAlive            DEBUG    Starting                                                                          aprsd.py:70
01/05/2023 12:10:47 PM RPCThread            DEBUG    Starting                                                                          aprsd.py:70
01/05/2023 12:10:47 PM RX_MSG               DEBUG    KISS(8.0.0) TCP Connection to localhost:8001                                       kiss.py:35
01/05/2023 12:10:47 PM LogMonitorThread     DEBUG    Starting                                                                          aprsd.py:70
01/05/2023 12:10:47 PM KeepAlive            INFO     WB4BOR-2 - Uptime 00:00:00 RX:0 TX:0 Tracker:0 Msgs TX:0 RX:0 Last:00:00:00  keep_alive.py:66
                                                     Email: N/A - RAM Current:62KB Peak:67KB Threads:5
01/05/2023 12:10:47 PM RX_MSG               DEBUG    Starting KISS interface connection                                                 kiss.py:48
01/05/2023 12:10:47 PM RX_MSG               DEBUG    <== setup_connection: exception (178ms) RuntimeError("can't register atexit after trace.py:61
                                                     shutdown")
Exception in thread RX_MSG:
Traceback (most recent call last):
  File "/usr/lib/python3.9/threading.py", line 954, in _bootstrap_inner
    self.run()
  File "/home/pi/aprsd/aprsd/threads/aprsd.py", line 72, in run
    can_loop = self.loop()
  File "/home/pi/aprsd/aprsd/threads/rx.py", line 39, in loop
    self._client.client.consumer(
  File "/home/pi/aprsd/aprsd/client.py", line 52, in client
    self._client = self.setup_connection()
  File "/home/pi/aprsd/aprsd/utils/trace.py", line 58, in trace_logging_wrapper
    result = f(*args, **kwargs)
  File "/home/pi/aprsd/aprsd/client.py", line 217, in setup_connection
    client = kiss.KISS3Client()
  File "/home/pi/aprsd/aprsd/clients/kiss.py", line 18, in __init__
    self.setup()
  File "/home/pi/aprsd/aprsd/clients/kiss.py", line 49, in setup
    self.kiss.start()
  File "/home/pi/aprsd/.aprsd-venv/lib/python3.9/site-packages/kiss/classes.py", line 62, in start
    _, self.protocol = self.loop.run_until_complete(
  File "/usr/lib/python3.9/asyncio/base_events.py", line 642, in run_until_complete
    return future.result()
  File "/home/pi/aprsd/.aprsd-venv/lib/python3.9/site-packages/kiss/kiss.py", line 241, in create_tcp_connection
    return await _generic_create_connection(
  File "/home/pi/aprsd/.aprsd-venv/lib/python3.9/site-packages/kiss/kiss.py", line 212, in _generic_create_connection
    transport, protocol = await f(*args, **kwargs)
  File "/usr/lib/python3.9/asyncio/base_events.py", line 1017, in create_connection
    infos = await self._ensure_resolved(
  File "/usr/lib/python3.9/asyncio/base_events.py", line 1396, in _ensure_resolved
    return await loop.getaddrinfo(host, port, family=family, type=type,
  File "/usr/lib/python3.9/asyncio/base_events.py", line 856, in getaddrinfo
    return await self.run_in_executor(
  File "/usr/lib/python3.9/asyncio/base_events.py", line 809, in run_in_executor
    executor = concurrent.futures.ThreadPoolExecutor(
  File "/usr/lib/python3.9/concurrent/futures/__init__.py", line 49, in __getattr__
    from .thread import ThreadPoolExecutor as te
  File "/usr/lib/python3.9/concurrent/futures/thread.py", line 37, in <module>
    threading._register_atexit(_python_exit)
  File "/usr/lib/python3.9/threading.py", line 1374, in _register_atexit
    raise RuntimeError("can't register atexit after shutdown")
RuntimeError: can't register atexit after shutdown

@hemna
Copy link

hemna commented Jan 5, 2023

my ~/.config/aprsd/aprsd.conf

I have direwolf running on the same host that I'm running aprsd on, hence host is set to localhost

[DEFAULT]

#
# From aprsd.conf
#

# Callsign to use for messages sent by APRSD (string value)
callsign = WB4BOR-2

# Enable saving of watch list, packet tracker between restarts.
# (boolean value)
#enable_save = true

# Save location for packet tracking files. (string value)
#save_location = ~/.config/aprsd

# Enable code tracing (boolean value)
trace_enabled = true

# Units for display, imperial or metric (string value)
#units = imperial

# Comma separated list of enabled plugins for APRSD.To enable
# installed external plugins add them here.The full python path to the
# class name must be used (list value)
#enabled_plugins = aprsd.plugins.email.EmailPlugin,aprsd.plugins.fortune.FortunePlugin,aprsd.plugins.location.LocationPlugin,aprsd.plugins.ping.PingPlugin,aprsd.plugins.query.QueryPlugin,aprsd.plugins.time.TimePlugin,aprsd.plugins.weather.OWMWeatherPlugin,aprsd.plugins.version.VersionPlugin,aprsd.plugins.notify.NotifySeenPlugin


[admin]

#
# From aprsd.conf
#

# Enable the Admin Web Interface (boolean value)
web_enabled = true

# The ip address to listen on (IP address value)
web_ip = 0.0.0.0

# The port to listen on (port value)
# Minimum value: 0
# Maximum value: 65535
web_port = 8002

# The admin user for the admin web interface (string value)
user = admin

# Admin interface password (string value)
password = <my password>


[aprs_fi]

#
# From aprsd.conf
#

# Get the apiKey from your aprs.fi account here:http://aprs.fi/account
# (string value)
apiKey = <My key entered here>


[aprs_network]

#
# From aprsd.conf
#

# Set enabled to False if there is no internet connectivity.This is
# useful for a direwolf KISS aprs connection only. (boolean value)
enabled = false

# APRS Username (string value)
#login = NOCALL

# APRS Password Get the passcode for your callsign here:
# https://apps.magicbug.co.uk/passcode (string value)
#password = <None>

# The APRS-IS hostname (hostname value)
#host = noam.aprs2.net

# APRS-IS port (port value)
# Minimum value: 0
# Maximum value: 65535
#port = 14580


[avwx_plugin]

#
# From aprsd.conf
#

# avwx-api is an opensource project that hasa hosted service here:
# https://avwx.rest/You can launch your own avwx-api in a containerby
# cloning the githug repo here:https://github.com/avwx-rest/AVWX-API
# (string value)
#apiKey = <None>

# The base url for the avwx API.  If you are hosting your ownHere is
# where you change the url to point to yours. (string value)
#base_url = https://avwx.rest


[email_plugin]

#
# From aprsd.conf
#

# (Required) Callsign to validate for doing email commands.Only this
# callsign can check email. This is also where the email notifications
# for new emails will be sent. (string value)
#callsign = <None>

# Enable the Email plugin? (boolean value)
#enabled = false

# Enable the Email plugin Debugging? (boolean value)
#debug = false

# Login username/email for IMAP server (string value)
#imap_login = <None>

# Login password for IMAP server (string value)
#imap_password = <None>

# Hostname/IP of the IMAP server (hostname value)
#imap_host = <None>

# Port to use for IMAP server (port value)
# Minimum value: 0
# Maximum value: 65535
#imap_port = 993

# Use SSL for connection to IMAP Server (boolean value)
#imap_use_ssl = true

# Login username/email for SMTP server (string value)
#smtp_login = <None>

# Login password for SMTP server (string value)
#smtp_password = <None>

# Hostname/IP of the SMTP server (hostname value)
#smtp_host = <None>

# Port to use for SMTP server (port value)
# Minimum value: 0
# Maximum value: 65535
#smtp_port = 465

# Use SSL for connection to SMTP Server (boolean value)
#smtp_use_ssl = true

# List of email shortcuts for checking/sending email For Exmaple:
# wb=walt@walt.com,cl=cl@cl.com
# Means use 'wb' to send an email to walt@walt.com (list value)
#email_shortcuts = <None>


[kiss_serial]

#
# From aprsd.conf
#

# Enable Serial KISS interface connection. (boolean value)
enabled = false

# Serial Device file to use.  /dev/ttyS0 (string value)
#device = /dev/ttyS0

# The Serial device baud rate for communication (integer value)
#baudrate = 9600


[kiss_tcp]

#
# From aprsd.conf
#

# Enable Serial KISS interface connection. (boolean value)
enabled = true

# The KISS TCP Host to connect to. (hostname value)
host = localhost

# The KISS TCP/IP network port (port value)
# Minimum value: 0
# Maximum value: 65535
#port = 8001


[logging]

#
# From aprsd.conf
#

# Date format for log entries (string value)
#date_format = %m/%d/%Y %I:%M:%S %p

# Enable Rich logging (boolean value)
#rich_logging = true

# File to log to (string value)
#logfile = <None>

# Log file format, unless rich_logging enabled. (string value)
#logformat = [%(asctime)s] [%(threadName)-20.20s] [%(levelname)-5.5s] %(message)s - [%(pathname)s:%(lineno)d]


[owm_weather_plugin]

#
# From aprsd.conf
#

# OWMWeatherPlugin api key to OpenWeatherMap's API.This plugin uses
# the openweathermap API to fetchlocation and weather information.To
# use this plugin you need to get an openweathermapaccount and
# apikey.https://home.openweathermap.org/api_keys (string value)
#apiKey = <None>


[query_plugin]

#
# From aprsd.conf
#

# The Ham callsign to allow access to the query plugin from RF.
# (string value)
#callsign = <None>


[rpc_settings]

#
# From aprsd.conf
#

# Enable RPC calls (boolean value)
#enabled = true

# The ip address to listen on (string value)
#ip = localhost

# The port to listen on (port value)
# Minimum value: 0
# Maximum value: 65535
#port = 18861

# Magic word to authenticate requests between client/server (string
# value)
#magic_word = CHANGEME!!!


[watch_list]

#
# From aprsd.conf
#

# Enable the watch list feature.  Still have to enable the correct
# plugin.  Built-in plugin to use is aprsd.plugins.notify.NotifyPlugin
# (boolean value)
#enabled = false

# Callsigns to watch for messsages (list value)
#callsigns = <None>

# The Ham Callsign to send messages to for watch list alerts. (string
# value)
#alert_callsign = <None>

# The number of packets to store. (integer value)
#packet_keep_count = 10

# Time to wait before alert is sent on new message for users in
# callsigns. (integer value)
#alert_time_seconds = 3600

@masenf
Copy link
Author

masenf commented Jan 5, 2023

Fascinating, that looks almost exactly like the setup that I tried. 🤔

What kind of pi are you using? What OS? 64-bit?

And if you have an extra SD card I'm curious if you can reproduce on the latest raspi os?

I'll try again after work with your exact steps and OS and see if I can repro.

@hemna
Copy link

hemna commented Jan 5, 2023

I am trying on a raspi 3b+

(.aprsd-venv) pi@digipi:~/aprsd $ uname -a
Linux digipi 5.15.61-v7+ #1579 SMP Fri Aug 26 11:10:59 BST 2022 armv7l GNU/Linux

(.aprsd-venv) pi@digipi:~/aprsd $ cat /proc/cpuinfo
processor	: 0
model name	: ARMv7 Processor rev 4 (v7l)
BogoMIPS	: 76.80
Features	: half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 idiva idivt vfpd32 lpae evtstrm crc32
CPU implementer	: 0x41
CPU architecture: 7
CPU variant	: 0x0
CPU part	: 0xd03
CPU revision	: 4

processor	: 1
model name	: ARMv7 Processor rev 4 (v7l)
BogoMIPS	: 76.80
Features	: half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 idiva idivt vfpd32 lpae evtstrm crc32
CPU implementer	: 0x41
CPU architecture: 7
CPU variant	: 0x0
CPU part	: 0xd03
CPU revision	: 4

processor	: 2
model name	: ARMv7 Processor rev 4 (v7l)
BogoMIPS	: 76.80
Features	: half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 idiva idivt vfpd32 lpae evtstrm crc32
CPU implementer	: 0x41
CPU architecture: 7
CPU variant	: 0x0
CPU part	: 0xd03
CPU revision	: 4

processor	: 3
model name	: ARMv7 Processor rev 4 (v7l)
BogoMIPS	: 76.80
Features	: half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 idiva idivt vfpd32 lpae evtstrm crc32
CPU implementer	: 0x41
CPU architecture: 7
CPU variant	: 0x0
CPU part	: 0xd03
CPU revision	: 4

Hardware	: BCM2835
Revision	: a02082
Serial		: 0000000099360b7a
Model		: Raspberry Pi 3 Model B Rev 1.2

(.aprsd-venv) pi@digipi:~/aprsd $ cat /etc/issue
Raspbian GNU/Linux 11 \n \l

@masenf
Copy link
Author

masenf commented Jan 7, 2023

vim requirements.txt <---- to pin cryptography to 38.0.1 to prevent compiling

I'm also now having to pin shellingham==1.5.0.post1 since the pinned version 1.5.0 has been yanked from pypi in the last few days.

additionally with make dev, libcst is failing to build for me, so going with make run


all that said, repro is achieved via aprsd server --loglevel DEBUG

01/07/2023 06:37:43 AM MainThread           DEBUG    GET client 'tcpkiss'                                                                                client.py:244
01/07/2023 06:37:43 AM RX_MSG               DEBUG    Starting                                                                                              aprsd.py:70
01/07/2023 06:37:43 AM RX_MSG               DEBUG    ==> setup_connection: call "{'self': <aprsd.client.KISSClient object at 0xb4407b68>}"                 trace.py:48
01/07/2023 06:37:43 AM ProcessPKT           DEBUG    Starting                                                                                              aprsd.py:70
01/07/2023 06:37:43 AM RX_MSG               DEBUG    KISS(8.0.0) TCP Connection to localhost:8001                                                           kiss.py:35
01/07/2023 06:37:43 AM KeepAlive            DEBUG    Starting                                                                                              aprsd.py:70
01/07/2023 06:37:43 AM RPCThread            DEBUG    Starting                                                                                              aprsd.py:70
01/07/2023 06:37:43 AM RX_MSG               DEBUG    Starting KISS interface connection                                                                     kiss.py:48
01/07/2023 06:37:43 AM LogMonitorThread     DEBUG    Starting                                                                                              aprsd.py:70
01/07/2023 06:37:43 AM KeepAlive            INFO     KF7HVM-2 - Uptime 00:00:00 RX:0 TX:0 Tracker:0 Msgs TX:0 RX:0 Last:00:00:00 Email: N/A - RAM     keep_alive.py:66
                                                     Current:58KB Peak:63KB Threads:5                                                                                 
01/07/2023 06:37:43 AM RX_MSG               DEBUG    <== setup_connection: exception (84ms) RuntimeError("can't register atexit after shutdown")           trace.py:61
Exception in thread RX_MSG:
Traceback (most recent call last):
  File "/usr/lib/python3.9/threading.py", line 954, in _bootstrap_inner
    self.run()
  File "/home/masen/aprsd/aprsd/threads/aprsd.py", line 72, in run
    can_loop = self.loop()
  File "/home/masen/aprsd/aprsd/threads/rx.py", line 39, in loop
    self._client.client.consumer(
  File "/home/masen/aprsd/aprsd/client.py", line 52, in client
    self._client = self.setup_connection()
  File "/home/masen/aprsd/aprsd/utils/trace.py", line 58, in trace_logging_wrapper
    result = f(*args, **kwargs)
  File "/home/masen/aprsd/aprsd/client.py", line 217, in setup_connection
    client = kiss.KISS3Client()
  File "/home/masen/aprsd/aprsd/clients/kiss.py", line 18, in __init__
    self.setup()
  File "/home/masen/aprsd/aprsd/clients/kiss.py", line 49, in setup
    self.kiss.start()
  File "/home/masen/aprsd/.aprsd-venv/lib/python3.9/site-packages/kiss/classes.py", line 62, in start
    _, self.protocol = self.loop.run_until_complete(
  File "/usr/lib/python3.9/asyncio/base_events.py", line 642, in run_until_complete
    return future.result()
  File "/home/masen/aprsd/.aprsd-venv/lib/python3.9/site-packages/kiss/kiss.py", line 241, in create_tcp_connection
    return await _generic_create_connection(
  File "/home/masen/aprsd/.aprsd-venv/lib/python3.9/site-packages/kiss/kiss.py", line 212, in _generic_create_connection
    transport, protocol = await f(*args, **kwargs)
  File "/usr/lib/python3.9/asyncio/base_events.py", line 1017, in create_connection
    infos = await self._ensure_resolved(
  File "/usr/lib/python3.9/asyncio/base_events.py", line 1396, in _ensure_resolved
    return await loop.getaddrinfo(host, port, family=family, type=type,
  File "/usr/lib/python3.9/asyncio/base_events.py", line 856, in getaddrinfo
    return await self.run_in_executor(
  File "/usr/lib/python3.9/asyncio/base_events.py", line 809, in run_in_executor
    executor = concurrent.futures.ThreadPoolExecutor(
  File "/usr/lib/python3.9/concurrent/futures/__init__.py", line 49, in __getattr__
    from .thread import ThreadPoolExecutor as te
  File "/usr/lib/python3.9/concurrent/futures/thread.py", line 37, in <module>
    threading._register_atexit(_python_exit)
  File "/usr/lib/python3.9/threading.py", line 1374, in _register_atexit
    raise RuntimeError("can't register atexit after shutdown")
RuntimeError: can't register atexit after shutdown

This is on a raspi 4B with the latest OS + updates

Linux raspberrypi 5.15.84-v7l+ #1613 SMP Thu Jan 5 12:01:26 GMT 2023 armv7l GNU/Linux

Interestingly, when I remove the --loglevel DEBUG the traceback is not printed and kiss3 actually proceeds to connect to direwolf and function normally.

Can you try without the --loglevel DEBUG and see if your able to workaround?

@masenf
Copy link
Author

masenf commented Jan 7, 2023

I made a small change for debugging

diff --git a/aprsd/client.py b/aprsd/client.py
index b8bbf82..eb035e2 100644
--- a/aprsd/client.py
+++ b/aprsd/client.py
@@ -2,6 +2,8 @@ import abc
 import logging
 import time
 
+from concurrent.futures import ThreadPoolExecutor
+
 import aprslib
 from aprslib.exceptions import LoginError
 from oslo_config import cfg

and now the traceback is slightly different, but to the same tune

Exception in thread RX_MSG:
Traceback (most recent call last):
  File "/usr/lib/python3.9/threading.py", line 954, in _bootstrap_inner
    self.run()
  File "/home/masen/aprsd/aprsd/threads/aprsd.py", line 72, in run
    can_loop = self.loop()
  File "/home/masen/aprsd/aprsd/threads/rx.py", line 39, in loop
    self._client.client.consumer(
  File "/home/masen/aprsd/aprsd/client.py", line 54, in client
    self._client = self.setup_connection()
  File "/home/masen/aprsd/aprsd/utils/trace.py", line 58, in trace_logging_wrapper
    result = f(*args, **kwargs)
  File "/home/masen/aprsd/aprsd/client.py", line 219, in setup_connection
    client = kiss.KISS3Client()
  File "/home/masen/aprsd/aprsd/clients/kiss.py", line 18, in __init__
    self.setup()
  File "/home/masen/aprsd/aprsd/clients/kiss.py", line 49, in setup
    self.kiss.start()
  File "/home/masen/aprsd/.aprsd-venv/lib/python3.9/site-packages/kiss/classes.py", line 62, in start
    _, self.protocol = self.loop.run_until_complete(
  File "/usr/lib/python3.9/asyncio/base_events.py", line 642, in run_until_complete
    return future.result()
  File "/home/masen/aprsd/.aprsd-venv/lib/python3.9/site-packages/kiss/kiss.py", line 241, in create_tcp_connection
    return await _generic_create_connection(
  File "/home/masen/aprsd/.aprsd-venv/lib/python3.9/site-packages/kiss/kiss.py", line 212, in _generic_create_connection
    transport, protocol = await f(*args, **kwargs)
  File "/usr/lib/python3.9/asyncio/base_events.py", line 1017, in create_connection
    infos = await self._ensure_resolved(
  File "/usr/lib/python3.9/asyncio/base_events.py", line 1396, in _ensure_resolved
    return await loop.getaddrinfo(host, port, family=family, type=type,
  File "/usr/lib/python3.9/asyncio/base_events.py", line 856, in getaddrinfo
    return await self.run_in_executor(
  File "/usr/lib/python3.9/asyncio/base_events.py", line 814, in run_in_executor
    executor.submit(func, *args), loop=self)
  File "/usr/lib/python3.9/concurrent/futures/thread.py", line 163, in submit
    raise RuntimeError('cannot schedule new futures after '
RuntimeError: cannot schedule new futures after interpreter shutdown

Looking at the code /usr/lib/python3.9/concurrent/futures/thread.py:163

            if _shutdown:
                raise RuntimeError('cannot schedule new futures after '
                                   'interpreter shutdown')

that _shutdown is an interpreter global that is only munged while a lock is held; from thread.py

 17 _threads_queues = weakref.WeakKeyDictionary()
 18 _shutdown = False
 19 # Lock that ensures that new workers are not created while the interpreter is
 20 # shutting down. Must be held while mutating _threads_queues and _shutdown.
 21 _global_shutdown_lock = threading.Lock()
 22 
 23 def _python_exit():
 24     global _shutdown
 25     with _global_shutdown_lock:
 26         _shutdown = True
 27     items = list(_threads_queues.items())
 28     for t, q in items:
 29         q.put(None)
 30     for t, q in items:
 31         t.join()
 32 
 33 # Register for `_python_exit()` to be called just before joining all
 34 # non-daemon threads. This is used instead of `atexit.register()` for
 35 # compatibility with subinterpreters, which no longer support daemon threads.
 36 # See bpo-39812 for context.
 37 threading._register_atexit(_python_exit)

So I added a call to traceback.print_stack() at the point where _shutdown = True and the stack looks like

_global_shudown_lock held by: <_MainThread(MainThread, stopped 3069897088)>
  File "/usr/lib/python3.9/threading.py", line 1415, in _shutdown
    atexit_call()
  File "/usr/lib/python3.9/concurrent/futures/thread.py", line 27, in _python_exit
    import traceback; traceback.print_stack()

And it occurs before kiss3 gets a chance to make its connection, it's actually interleaved with the other output:

01/07/2023 07:15:17 AM RX_MSG               DEBUG    KISS(8.0.0) TCP Connection to localhost:8001                                                           kiss.py:35
01/07/2023 07:15:17 AM KeepAlive            DEBUG    Starting                                                                                              aprsd.py:70
_global_shudown_lock held by: <_MainThread(MainThread, stopped 3070056832)>
01/07/2023 07:15:17 AM KeepAlive            INFO     KF7HVM-2 - Uptime 00:00:00 RX:0 TX:0 Tracker:0 Msgs TX:0 RX:0 Last:00:00:00 Email: N/A - RAM     keep_alive.py:66
                                                     Current:59KB Peak:63KB Threads:5                                                                                 
01/07/2023 07:15:17 AM RX_MSG               DEBUG    Starting KISS interface connection                                                                     kiss.py:48
01/07/2023 07:15:17 AM LogMonitorThread     DEBUG    Starting                                                                                              aprsd.py:70
01/07/2023 07:15:17 AM RPCThread            DEBUG    Starting                                                                                              aprsd.py:70
  File "/usr/lib/python3.9/threading.py", line 1415, in _shutdown
    atexit_call()
  File "/usr/lib/python3.9/concurrent/futures/thread.py", line 27, in _python_exit
    import traceback; traceback.print_stack()
01/07/2023 07:15:17 AM RX_MSG               DEBUG    <== setup_connection: exception (87ms) RuntimeError('cannot schedule new futures after interpreter    trace.py:61
                                                     shutdown')

With this new insight, it was clear to me that the MainThread was exiting, causing that _shutdown flag to be set before asyncio had a chance to even get set up. I ran the command again without DEBUG logging and observed that the MainThread sets _shutdown after the kiss connection comes up. This is a classic race condition. Since debug logging slows the program down enough, it allows the main thread to exit before the child threads have fully initialized.

Adding some code like the following can "beat the race"

diff --git a/aprsd/cmds/server.py b/aprsd/cmds/server.py
index ff5f74e..640eb28 100644
--- a/aprsd/cmds/server.py
+++ b/aprsd/cmds/server.py
@@ -1,6 +1,7 @@
 import logging
 import signal
 import sys
+import time
 
 import click
 from oslo_config import cfg
@@ -105,4 +106,7 @@ def server(ctx, flush):
         log_monitor = threads.log_monitor.LogMonitorThread()
         log_monitor.start()
 
+    LOG.debug("Waiting a second before exiting the MainThread...")
+    time.sleep(1)
+
     return 0

The raspi is slow enough to expose this race condition that was probably not observed on other, faster platforms.


However, I'm not sure just adding a sleep is the best way to proceed.

I admittedly don't do much direct work with threading in python these days, but I think general practice is to keep the MainThread alive as long as the process "should" be running, if it has no other purpose, it can be used to monitor the other threads.

I would probably refactor away the explicit SIGINT handling, and instead replace that with a wait on some threading.Event from the MainThread that you would fire when all of the tracked threads have exited. Then if you catch KeyboardInterrupt in the main thread while waiting for that event, call stop_all(), trigger the .save() calls on each thread, and then actually join() each thread from the MainThread.


As for this issue, I'm not seeing any immediate code fix needed on the kiss3 side.

@hemna
Copy link

hemna commented Jan 7, 2023

wow, thanks so much for the really in depth debugging of this. I was a bit stumped and it seemed there was an issue with asyncio, but clearly I messed up and failed to recognize the obvious here. You are exactly correct the main thread doesn't wait for all the other threads to complete before it bails. I have added a simple join() to one of my long running threads and that has solved it. Crazy stupid error on my part. I had been testing on my macbook pro and linux HP dl360 server for ages and hadn't run into this until I tested it on the rpi3.

I also didn't notice that shellingham was yanked from pypi. that must have been done recently as I have been updating the requirements.txt from the requirements.in (via make update-requirements).

I don't really get asyncio, as I'm a threading person....or at least I thought I was until this error! lol. Thanks a million.

@masenf masenf closed this as completed Jan 7, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants