witness_node does not quit when Ctrl+C is pressed #1856

Closed
abitmore opened this issue Jul 19, 2019 · 9 comments · Fixed by #2204
Labels
3d Bug Classification indicating the existing implementation does not match the intention of the design

Comments

@abitmore
Member

Bug Description
If the witness_node is started with --rpc-endpoint, it sometimes (but not always) fails to quit when Ctrl+C is pressed. Pressing Ctrl+C a second time makes it quit.
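
For context, here is a minimal, self-contained sketch (hypothetical code, not the actual witness_node implementation) of the usual Ctrl+C pattern: the first SIGINT sets a flag and the main loop begins a clean shutdown; if teardown then blocks, the process appears to hang on the first Ctrl+C.

// Hypothetical sketch of typical SIGINT handling; not witness_node code.
#include <atomic>
#include <chrono>
#include <csignal>
#include <iostream>
#include <thread>

static std::atomic<bool> shutdown_requested{false};

extern "C" void handle_sigint(int)
{
    shutdown_requested.store(true);   // async-signal-safe: only set a flag
}

int main()
{
    std::signal(SIGINT, handle_sigint);
    while (!shutdown_requested.load())
        std::this_thread::sleep_for(std::chrono::milliseconds(100)); // main loop
    std::cout << "shutting down\n";
    // If a destructor that runs after this point joins a thread that never
    // exits, the process hangs here -- the symptom reported in this issue.
    return 0;
}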

Impacts
Describe which portion(s) of BitShares Core may be impacted by this bug. Please tick at least one box.

  • API (the application programming interface)
  • Build (the build process or something prior to compiled code)
  • CLI (the command line wallet)
  • Deployment (the deployment process after building such as Docker, Travis, etc.)
  • DEX (the Decentralized EXchange, market engine, etc.)
  • P2P (the peer-to-peer network for transaction/block propagation)
  • Performance (system or user efficiency, etc.)
  • Protocol (the blockchain logic, consensus, validation, etc.)
  • Security (the security of system or user data, etc.)
  • UX (the User Experience)
  • Other (please add below)

Steps To Reproduce
Steps to reproduce the behavior (example outlined below):

  1. Execute API call '...'
  2. Using JSON payload '...'
  3. Received response '...'
  4. See error in screenshot

Expected Behavior
A clear and concise description of what you expected to happen.

Screenshots (optional)
If applicable, add screenshots to help explain process flow and behavior.

Host Environment
Please provide details about the host environment. Much of this information can be found by running witness_node --version.

  • Host OS: [e.g. Ubuntu 18.04 LTS]
  • Host Physical RAM [e.g. 4GB]
  • BitShares Version: [e.g. 2.0.180425]
  • OpenSSL Version: [e.g. 1.1.0g]
  • Boost Version: [e.g. 1.65.1]

CORE TEAM TASK LIST

  • Evaluate / Prioritize Bug Report
  • Refine User Stories / Requirements
  • Define Test Cases
  • Design / Develop Solution
  • Perform QA/Testing
  • Update Documentation
@abitmore abitmore added the 3d Bug Classification indicating the existing implementation does not match the intention of the design label Jul 19, 2019
@abitmore abitmore added this to the 3.3.0 - Feature Release milestone Jul 19, 2019
@abitmore abitmore self-assigned this Jul 19, 2019
@abitmore
Member Author

Probably related: bitshares/bitshares-fc#136 (comment)

@abitmore
Member Author

abitmore commented Jul 19, 2019

I think the "hang" behavior mentioned in #1303 (comment) is the same bug.

@abitmore
Member Author

Stack backtrace when it occurs:

(gdb) thread apply all bt
 
Thread 5 (Thread 0x7fffe3fff700 (LWP 12802)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1  0x00000000010abf6b in boost::condition_variable::wait(boost::unique_lock<boost::mutex>&) ()
#2  0x00000000010adac9 in fc::thread_d::process_tasks() ()
#3  0x00000000010adbd4 in fc::thread_d::start_process_tasks(long) ()
#4  0x00000000017e7eb1 in make_fcontext ()
#5  0x0000000000000000 in ?? ()
 
Thread 1 (Thread 0x7ffff7fd8780 (LWP 12798)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1  0x000000000179a0c3 in boost::thread::join_noexcept() ()
#2  0x00000000010b5fa8 in fc::asio::default_io_service_scope::~default_io_service_scope() ()
#3  0x00007ffff66f3ff8 in __run_exit_handlers (status=0, listp=0x7ffff6a7e5f8 <__exit_funcs>, run_list_atexit=run_list_atexit@entry=true) at exit.c:82
#4  0x00007ffff66f4045 in __GI_exit (status=<optimized out>) at exit.c:104
#5  0x00007ffff66da837 in __libc_start_main (main=0xacddf0 <main>, argc=6, argv=0x7fffffffe4b8, init=<optimized out>, fini=<optimized out>,
    rtld_fini=<optimized out>, stack_end=0x7fffffffe4a8) at ../csu/libc-start.c:325
#6  0x0000000000ad3bb9 in _start ()

@nathanielhourt
Contributor

I think this goes away if I drop an fc::usleep(fc::milliseconds(100)); call at the end of ~websocket_server_impl() (websocket.cpp:303)... Can anyone confirm?
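
For reference, the proposed change would sit roughly here (the destructor body below is paraphrased; only the fc::usleep line is the actual suggestion):

websocket_server_impl::~websocket_server_impl()
{
   // ... existing cleanup of the listener and open connections (paraphrased) ...
   fc::usleep( fc::milliseconds(100) ); // proposed: give in-flight asio handlers time to drain
}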

@nathanielhourt
Contributor

Some comments: I am specifically dealing with the cli_test hang, which I find easiest to reproduce by running the cli_create_htlc test while a stress test runs in the background (I used the stress tool: stress --cpu 5 --vm 5). I'm not sure whether this is the same hang that affects witness_node.

@abitmore
Member Author

abitmore commented Aug 9, 2019

We can't add a sleep to ~websocket_server_impl() because it would affect API nodes.
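
One alternative to a fixed sleep, sketched below in plain C++ (a hypothetical helper, not the fc API): count in-flight handlers and wait on a condition variable with a bounded timeout, so an idle node shuts down immediately while a busy one still drains its handlers.

// Hypothetical helper, not fc/BitShares code: a bounded drain-wait instead
// of an unconditional sleep.
#include <chrono>
#include <condition_variable>
#include <mutex>

struct handler_tracker
{
    std::mutex              m;
    std::condition_variable cv;
    int                     in_flight = 0;

    void enter() { std::lock_guard<std::mutex> l(m); ++in_flight; }

    void leave()
    {
        std::lock_guard<std::mutex> l(m);
        if (--in_flight == 0) cv.notify_all();
    }

    // Returns true if all handlers finished before the deadline.
    bool wait_drained(std::chrono::milliseconds timeout)
    {
        std::unique_lock<std::mutex> l(m);
        return cv.wait_for(l, timeout, [this] { return in_flight == 0; });
    }
};

A destructor could then call wait_drained(std::chrono::milliseconds(100)) and return immediately when nothing is pending, instead of always paying the 100 ms.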

@nathanielhourt
Contributor

Do we know exactly what's causing the hang?

@pmconrad
Contributor

(gdb) thread apply all bt

Thread 3 (Thread 0x7fa64bfff700 (LWP 21084)):
#0  0x00007fa69f95f90d in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x000000000153a254 in boost::asio::detail::posix_event::wait<boost::asio::detail::conditionally_enabled_mutex::scoped_lock> (lock=..., this=<optimized out>)
    at /usr/include/boost/asio/detail/posix_event.hpp:106
#2  boost::asio::detail::conditionally_enabled_event::wait (lock=..., this=0x7fa66808c818)
    at /usr/include/boost/asio/detail/conditionally_enabled_event.hpp:89
#3  boost::asio::detail::scheduler::do_run_one (ec=..., this_thread=..., lock=..., 
    this=<optimized out>) at /usr/include/boost/asio/detail/impl/scheduler.ipp:409
#4  boost::asio::detail::scheduler::run (this=0x7fa66808c7b0, ec=...)
    at /usr/include/boost/asio/detail/impl/scheduler.ipp:154
#5  0x000000000153bd19 in boost::asio::io_context::run (this=<optimized out>)
    at /usr/include/boost/asio/impl/io_context.ipp:62
#6  boost::asio::detail::resolver_service_base::work_io_context_runner::operator() (
    this=<optimized out>) at /usr/include/boost/asio/detail/impl/resolver_service_base.ipp:32
#7  boost::asio::detail::posix_thread::func<boost::asio::detail::resolver_service_base::work_io_context_runner>::run (this=<optimized out>) at /usr/include/boost/asio/detail/posix_thread.hpp:86
#8  0x0000000001537e12 in boost::asio::detail::boost_asio_detail_posix_thread_function (
    arg=0x53c0870) at /usr/include/boost/asio/detail/impl/posix_thread.ipp:74
#9  0x00007fa69f959569 in start_thread () from /lib64/libpthread.so.0
#10 0x00007fa69e925a2f in clone () from /lib64/libc.so.6

Thread 2 (Thread 0x7fa688b65700 (LWP 21057)):
#0  0x00007fa69f95f90d in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x000000000152e44c in boost::condition_variable::wait (m=..., this=0x7fa678000b48)
    at /usr/include/boost/thread/pthread/condition_variable.hpp:81
#2  fc::thread_d::process_tasks (this=this@entry=0x7fa678000b30)
    at .../libraries/fc/src/thread/thread_d.hpp:615
#3  0x000000000152e5d7 in fc::thread_d::start_process_tasks (my=...)
    at .../libraries/fc/src/thread/thread_d.hpp:514
#4  0x00007fa6a0099fdf in make_fcontext () from /usr/lib64/libboost_context.so.1.66.0
#5  0x0000000000000000 in ?? ()

Thread 1 (Thread 0x7fa6a2153880 (LWP 21051)):
#0  0x00007fa69f95f90d in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007fa6a1030bc9 in boost::condition_variable::wait(boost::unique_lock<boost::mutex>&) ()
   from /usr/lib64/libboost_thread.so.1.66.0
#2  0x00007fa6a102a374 in boost::thread::join_noexcept() ()
   from /usr/lib64/libboost_thread.so.1.66.0
#3  0x00000000015338d0 in boost::thread::join (this=0x5186630)
    at /usr/include/boost/thread/detail/thread.hpp:775
#4  fc::asio::default_io_service_scope::~default_io_service_scope (
    this=0x44cc000 <fc::asio::default_io_service()::fc_asio_service>, __in_chrg=<optimized out>)
    at .../libraries/fc/src/asio.cpp:169
#5  0x00007fa69e865d78 in __run_exit_handlers () from /lib64/libc.so.6
#6  0x00007fa69e865dca in exit () from /lib64/libc.so.6
#7  0x00007fa69e84df91 in __libc_start_main () from /lib64/libc.so.6
#8  0x0000000000f2150a in _start () at ../sysdeps/x86_64/start.S:120

The default_io_service_scope is a static variable inside a function in fc, so its destructor is called extremely late, after main() has returned. Thread 1 is the main thread; it is waiting for the asio worker thread to terminate, and that thread in turn is waiting on a condition variable. I'm not sure what thread 2 is.

The only open file handles are stdin/out/err, the log files, a pipe, and some event/timer fds.

# ls -l /proc/21051/fd
total 0
lrwx------ 1 peter users 64 Sep 16 16:07 0 -> /dev/pts/7
lrwx------ 1 peter users 64 Sep 16 16:10 1 -> /dev/pts/7
lrwx------ 1 peter users 64 Sep 16 16:10 2 -> /dev/pts/7
l-wx------ 1 peter users 64 Sep 16 16:10 3 -> /tmp/bts-test/logs/node/node.log.20190916T100000
l-wx------ 1 peter users 64 Sep 16 16:10 4 -> /tmp/bts-test/logs/p2p/p2p.log.20190916T140000
lr-x------ 1 peter users 64 Sep 16 16:10 40 -> pipe:[15692281]
l-wx------ 1 peter users 64 Sep 16 16:10 41 -> pipe:[15692281]
lrwx------ 1 peter users 64 Sep 16 16:10 7 -> anon_inode:[eventfd]
lrwx------ 1 peter users 64 Sep 16 16:10 8 -> anon_inode:[eventpoll]
lrwx------ 1 peter users 64 Sep 16 16:10 9 -> anon_inode:[timerfd]
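
To illustrate the pattern described above (hypothetical, self-contained code, not the actual fc source): a function-local static owns a worker thread and joins it in its destructor, which runs from __run_exit_handlers() after main() returns; if the worker is never told to stop, join() blocks forever.

// Hypothetical illustration of the hang: a function-local static whose
// destructor joins a worker thread after main() has returned.
#include <atomic>
#include <chrono>
#include <thread>

struct io_scope
{
    std::atomic<bool> stop{false};
    std::thread       worker;

    io_scope() : worker([this] {
        while (!stop.load())
            std::this_thread::sleep_for(std::chrono::milliseconds(10)); // "io work"
    }) {}

    ~io_scope()
    {
        // stop.store(true);  // if this signal is missed (or races with exit)...
        worker.join();        // ...this join blocks forever in the exit handlers
    }
};

io_scope& default_io_service()
{
    static io_scope scope;    // destroyed only after main() returns
    return scope;
}

int main()
{
    default_io_service();
    return 0;                 // exit handlers then run ~io_scope(), which hangs here
}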

@abitmore
Member Author

Should be fixed by #2204.
