Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

threadsafety issue with curve_keypair, libsodium, randombytes_close #4241

Closed
minrk opened this issue Aug 13, 2021 · 1 comment · Fixed by #4242
Closed

threadsafety issue with curve_keypair, libsodium, randombytes_close #4241

minrk opened this issue Aug 13, 2021 · 1 comment · Fixed by #4242

Comments

@minrk
Copy link
Member

minrk commented Aug 13, 2021

Please use this template for reporting suspected bugs or requests for help.

Issue description

#3001 reverted the refcounting of libsodium init/close, meaning that randombytes_close() can be called many times while libsodium is still in use, e.g. if zmq_curve_keypair() is called after zmq_context().

When running with libsodium, calling curve_keypair triggers zmq::random_close() which in turn unconditionally triggers randombytes_close(). Calls to e.g. crypto_box_keypair() for the internal keys in curve_server and curve_client do not call zmq::random_open() which means the following sequence occurs:

  • zmq_context(): sodium_init()
  • zmq_curve_keypair(): sodium_init(); randombytes_close()
  • zmq_connect(): crypto_box_keypair() with randombytes closed!

The same threadsafety issue can occur if one context is closed while another is in use.

This is not always a problem, as randombytes_close() is a no-op when getrandom is available, but this isn't always the case (e.g. macOS). This crypto_box_keypair() call can crash due to thread unsafety reinitializing the random data source.

From the libsodium docs:

Explicitly calling [randombytes_close] is almost never required.

It seems like the 'right' fix is to just remove the call to randombytes_close() when using libsodium. Doing so could leave at most one FD open on /dev/urandom, but not more than one. Otherwise, we need to restore the refcounting in #2636 for libsodium, which was removed in #3001, and/or call random::open/close on every curve call, not just context construct/destruct.

Environment

  • libzmq version (commit hash if unreleased): 4.3.4
  • libsodium version: 1.0.18
  • OS: macOS 11.1, linux (ubuntu-20.04 on GitHub Actions)

Minimal test code / Steps to reproduce the issue

Create a CURVE server and keep connecting clients until it crashes. Make sure to call zmq_curve_keypair() after starting a context to trigger the inappropraite use-after-close. In Python:

import itertools
import time
from threading import Thread

import zmq

server_public, server_private = zmq.curve_keypair()

url = 'tcp://127.0.0.1:5555'


def server():
    with zmq.Context() as ctx:
        with ctx.socket(zmq.PULL) as s:
            s.curve_server = True
            s.curve_publickey = server_public
            s.curve_secretkey = server_private
            s.bind(url)
            count = 0
            while True:
                try:
                    print(count, s.recv().decode("ascii"))
                except KeyboardInterrupt:
                    print("interrupted...")
                else:
                    count += 1


def client(ctx, i):
    client_public, client_private = zmq.curve_keypair() # calls randombytes_close
    with ctx.socket(zmq.PUSH) as s:
        s.curve_serverkey = server_public
        s.curve_publickey = client_public
        s.curve_secretkey = client_private
        s.connect(url) # uses closed randombytes, can crash
        s.send(str(i).encode('ascii'))


def main():
    Thread(target=server, daemon=True).start()
    time.sleep(1)
    with zmq.Context() as ctx:
        for i in itertools.count():
            # at most 100 sockets/sec
            # to avoid FD exhaustion waiting for message delivery
            time.sleep(1e-2)
            client(ctx, i)


if __name__ == "__main__":
    main()

What's the actual result? (include assertion message & call stack if applicable)

After a number of sockets (varies from 4 to 1000), crash with:

  * frame #0: 0x00007fff2037392e libsystem_kernel.dylib`__pthread_kill + 10
    frame #1: 0x00007fff203a25bd libsystem_pthread.dylib`pthread_kill + 263
    frame #2: 0x00007fff202f7406 libsystem_c.dylib`abort + 125
    frame #3: 0x0000000101e5c531 libsodium.23.dylib`sodium_misuse + 63
    frame #4: 0x0000000101e666fc libsodium.23.dylib`randombytes_sysrandom_stir + 167
    frame #5: 0x0000000101e66764 libsodium.23.dylib`randombytes_sysrandom_buf + 34
    frame #6: 0x0000000101e44f1a libsodium.23.dylib`crypto_box_curve25519xsalsa20poly1305_keypair + 26
    frame #7: 0x0000000101d66922 libzmq.5.dylib`zmq::curve_server_t::curve_server_t(this=0x0000000101045a00, session_=0x000000010103ec00, peer_address_=<unavailable>, options_=0x0000000101030618, downgrade_sub_=false) at curve_server.cpp:62:10 [opt]
    frame #8: 0x0000000101db540c libzmq.5.dylib`zmq::zmtp_engine_t::handshake_v3_x(this=0x0000000101030600, downgrade_sub_=false) at zmtp_engine.cpp:372:45 [opt]
    frame #9: 0x0000000101db52da libzmq.5.dylib`zmq::zmtp_engine_t::handshake_v3_1(this=<unavailable>) at zmtp_engine.cpp:428:32 [opt] [artificial]
    frame #10: 0x0000000101db46ca libzmq.5.dylib`zmq::zmtp_engine_t::handshake(this=0x0000000101030600) at zmtp_engine.cpp:132:10 [opt]
    frame #11: 0x0000000101d9bd0d libzmq.5.dylib`zmq::stream_engine_base_t::in_event_internal(this=0x0000000101030600) at stream_engine_base.cpp:253:13 [opt]
    frame #12: 0x0000000101d70a3b libzmq.5.dylib`zmq::kqueue_t::loop(this=0x00000001019274f0) at kqueue.cpp:218:30 [opt]
    frame #13: 0x0000000101da052d libzmq.5.dylib`thread_routine(arg_=0x0000000101927530) at thread.cpp:256:5 [opt]
    frame #14: 0x00007fff203a28fc libsystem_pthread.dylib`_pthread_start + 224
    frame #15: 0x00007fff2039e443 libsystem_pthread.dylib`thread_start + 15

What's the expected result?

No crash.

@minrk
Copy link
Member Author

minrk commented Aug 13, 2021

Adding another background thread that just issues new keys and/or opens and closes contexts makes the race much more likely (closer to every 1-20 client sockets):

import itertools
from threading import Thread

import zmq

server_public, server_private = zmq.curve_keypair()

url = 'tcp://127.0.0.1:5555'


def server():
    with zmq.Context() as ctx:
        with ctx.socket(zmq.PULL) as s:
            s.curve_server = True
            s.curve_publickey = server_public
            s.curve_secretkey = server_private
            s.bind(url)
            count = 0
            while True:
                try:
                    print(count, s.recv().decode("ascii"))
                except KeyboardInterrupt:
                    print("interrupted...")
                else:
                    count += 1


def spin_randombytes_close():
    while True:
        with zmq.Context():  # Context.term calls randombytes_close()
            pass
        # zmq.curve_keypair()  # calls randombytes_close()


def client(i):
    client_public, client_private = zmq.curve_keypair()
    with zmq.Context() as ctx:
        with ctx.socket(zmq.PUSH) as s:
            s.curve_serverkey = server_public
            s.curve_publickey = client_public
            s.curve_secretkey = client_private
            s.connect(url)
            s.send(str(i).encode('ascii'))


def main():
    Thread(target=spin_randombytes_close, daemon=True).start()
    Thread(target=server, daemon=True).start()
    for i in itertools.count():
        client(i)


if __name__ == "__main__":
    main()

@minrk minrk changed the title threadafety issue with curve_keypair, libsodium, randombytes_close threadsafety issue with curve_keypair, libsodium, randombytes_close Aug 16, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant