Debugger freezes when triggering timeout inside a timeout block with Puma server #877

st0012 · 2022-12-23T17:52:25Z

Your environment

ruby -v: 3.1.3
rdbg -v: 1.7.1

Describe the bug

When the following conditions all match, the debugger has a high possibility to freeze:

Inside a Rails app with Puma web server

The breakpoint is inside a timeout call, like

  def hello
    Timeout.timeout(3600) do
      binding.b
    end
  end

From the breakpoint, executes something that calls Timeout.timeout underneath

To Reproduce

I've documented reproduction steps in this repo.

Expected behavior

The debugger doesn't freeze.

Additional context

It may not be the root cause, but Timeout's thread being stopped by the debugger plays a major role here.

If we somehow modify this method to skip pausing the Timeout thread, like

    private def thread_stopper
      TracePoint.new(:line) do
        # run on each thread
        tc = ThreadClient.current
        next if tc.management?
        next unless tc.running?
        next if tc == @tc
        next if tc.thread.name && tc.thread.name.match?(/Timeout/)

        tc.on_pause
      end
    end

Then the issue doesn't occur.

The text was updated successfully, but these errors were encountered:

ko1 · 2022-12-23T19:01:48Z

Thank you for the report.

Is it difficult to repro within a single file? (without puma)
Can you show the backtrace when thread_stopper's TP stoppes?

by modifying like that:

   private def thread_stopper
      TracePoint.new(:line) do
        # run on each thread
        tc = ThreadClient.current
        next if tc.management?
        next unless tc.running?
        next if tc == @tc
        STDERR.puts caller.inspect if tc.thread.name.match?(/Timeout/)

        tc.on_pause
      end
    end

ko1 · 2022-12-23T19:05:41Z

Anyway, I assume that it is difficult to fix in theory.

st0012 · 2022-12-23T19:20:16Z

Is it difficult to repro within a single file? (without puma)

I tried it but didn't succeed. I suspect running it from a thread pool is one criteria and will experiment it later.

Can you show the backtrace when thread_stopper's TP stoppes?

"/Users/hung-wulo/.gem/ruby/3.1.2/gems/timeout-0.3.1/lib/timeout.rb:112:in `block (2 levels) in create_timeout_thread'", 
"/Users/hung-wulo/.gem/ruby/3.1.2/gems/timeout-0.3.1/lib/timeout.rb:111:in `synchronize'", 
"/Users/hung-wulo/.gem/ruby/3.1.2/gems/timeout-0.3.1/lib/timeout.rb:111:in `block in create_timeout_thread'"

Anyway, I assume that it is difficult to fix in theory.

Yeah I assumed that to be the case.

In the short-term I'll try to skip the Timeout thread pause to workaround it. Because many code paths in our app is wrapped inside a timeout block (from libs), and our devs constantly hang in those code paths.

st0012 · 2023-01-16T17:52:32Z

Sorry for the delay, I managed to reproduce this with a single Ruby file:

# test.rb
def timeout
  Timeout.timeout(3) do
    puts "Something that needs timeout"
  end
end

Thread.new do
  Timeout.timeout(100) do
    binding.b
  end
end

# just to prevent the main thread from existing too early
sleep 100

$ rdbg -n test.rb

[4, 13] in test.rb
     4|   end
     5| end
     6| 
     7| Thread.new do
     8|   Timeout.timeout(100) do
=>   9|     binding.b
    10|   end
    11| end
    12| 
    13| # just to prevent the main thread from existing too early
=>#0    block in <main> (2 levels) at test.rb:9
  #1    block {|exc=#<Timeout::Error: Timeout::Error>|} in timeout at ~/.gem/ruby/3.1.2/gems/timeout-0.3.1/lib/timeout.rb:199
  # and 5 frames (use `bt' command for all frames)
(rdbg) timeout
Something that needs timeout
nil
(rdbg) timeout
# hangs

It produces like 9 out of 10 times.

And if it helps, here's the output with some debugging messages from my patch in timeout:

$ rdbg -n test.rb
QUEUE_MUTEX locked? false
[Timeout.timeout] acquiring QUEUE_MUTEX
[Timeout.timeout] released QUEUE_MUTEX
QUEUE_MUTEX locked? false
[Timeout thread] acquiring QUEUE_MUTEX
[4, 13] in test.rb
     4|   end
     5| end
     6| 
     7| Thread.new do
     8|   Timeout.timeout(100) do
=>   9|     binding.b
    10|   end
    11| end
    12| 
    13| # just to prevent the main thread from existing too early
=>#0    block in <main> (2 levels) at test.rb:9
  #1    block {|exc=#<Timeout::Error: Timeout::Error>|} in timeout at ~/.gem/ruby/3.1.2/gems/timeout-0.3.1/lib/timeout.rb:199
  # and 5 frames (use `bt' command for all frames)
(rdbg) timeout
QUEUE_MUTEX locked? false
[Timeout.timeout] acquiring QUEUE_MUTEX
[Timeout.timeout] released QUEUE_MUTEX
QUEUE_MUTEX locked? false
Something that needs timeout
nil
(rdbg) timeout
QUEUE_MUTEX locked? true
[Timeout.timeout] acquiring QUEUE_MUTEX

I think the main cause is that the timeout thread would acquire the QUEUE_MUTEX and never let go, likely because the thread would be frozen by then. So the later Timeout.timeout calls wouldn't be able to acquire the lock for synchronize.

But I don't understand why the first timeout call can still acquire the lock and it only deadlocks in the second call 🤔

st0012 · 2023-03-26T17:39:32Z

The same problem doesn't seem to happen to byebug, even though it stops all threads as well.
It seems to use C-level APIs like rb_thread_stop for thread-control. I wonder if that'll be the solution for debug too?

ko1 · 2023-03-28T08:20:12Z

Now I understand the problem.

Simplified code:

th0 = Thread.new{sleep}
m = Mutex.new
th1 = Thread.new do
  m.lock
  sleep 1
  m.unlock
end

sleep 0.5
debugger do: 'm.lock'

In this case,

(1) m is locked by th1
(2) main thread entered the REPL
(3) REPL runs m.lock but th1 doesn't resumed, and m.unlock is not called forever.
(*) Because of th0, deadlock detector doesn't raise an error because th0 can make progress.

Solution 1: Restart all threads for the evaluation on REPL.
Solution 2: Do not stop all threads except the running thread (in this case, main thread).
Solution 3: Make timeout thread exceptional (do not stop)

(Solution 3) doesn't make sense because there are many similar cases.
(Solution 2) maybe make it difficult to debug thread programs.

So Solution 1? In fact the earlier versions restart all threads.

When a thread keeps a lock, and REPL runs a code which needs the lock, other threads should make a progress to release the lock. fix #877

* DAP: use Object#__send__ to avoid name conflicts * DAP: allow custom request extension in Session class * Fix bug that "trace line" does not work after stopping trace once * Increase timeout in debug_code * Fix warning about unused variable * Avoid memberless Struct Related to Ruby bug 19416. * Remove redundant raw detach test The test's subject is the `disconnect` request with `restart: false, terminateDebuggee: <true|false>` arguments, which has been covered by the tests in `disconnect_dap_test.rb`. So this test case has become obsolete. Co-authored-by: Andy Waite <13400+andyw8@users.noreply.github.com> * Always propagate keyword arguments in WebSocketServer * Fix `::Process::daemon` patch As discussed here: ruby@699a31e#r94469355 * Add test * Use local variables * Add test for info consts with an expression Co-authored-by: Andy Waite <andyw8@users.noreply.github.com> * Fix typo in README * ⬆️ Update ci ruby versions * ⏪ Revert Ruby 2.6 * ✏️ Fix typo * CDP: Support evaluating expression on non-current frame * CDP: support reattaching in Chrome DevTools Close ruby#800 * Improved stability for chrome debugging - Display the greeting message regardless of the status of invocation of chrome. This allows coming back to the debugger on a new tab when the window process by `UI_CDP.run_new_chrome` is killed. - Handle `Errno::ESRCH` in `UI_CDP.cleanup_reader`. When the process by `UI_CDP.run_new_chrome` is killed, re-killing it breaks your debugging session and in turn the "debugee". * Make `UI_ServerBase#puts` to behave like `STDERR#puts` `UI_ServerBase#greeting` calls `#puts` with no arguments, this breaks reconnections to the debugger. * Handle not existing $FILENAME in `Session#process_protocol_request` `Session#process_protocol_request` gracefully handles `Errno::ENOENT` when `eval`-ing `$FILENAME`, and it doens't exist. * free terminated ThreadClients fix ruby#899 * DAP: use Mutex when sending response When debug.gem tries to send response from multiple threads, the socket connection is closed. I confirmed this bug when using custom request from vscode-rdbg * Make sure to fail when remote debuggee does not exit after scenarios Because Errno::EPIPE is rescued while sending message to socket, protocol_test_case_test.rb does not pass. protocol_test_case_test.rb had been passed because ReaderThreadError was occurred and the debuggee process was still alive. Here is a scenario. After closing socket, terminated event was sent. However socket was closed, so debuggee process raised Errno::EPIPE and debugggee process was still alive. The test framework detected the status and failed. Thus I fixed so that the test framework does not kill the debuggee process unexpectedly. * Alias Session send methods in WebSocketServer Methods ``respond``, ``respond_fail`` and ``fire_event`` can be aliased to ``send_response``, ``send_fail_response`` and ``send_event``, respectively. * Fix timing bug on session_server creation `@session_server` should be assigned at first. `@session_server = Thread.current` in the session thread does not work because the creator thread can access to `@session_server` before it. * Fix useless assignments Found by running `rubocop --only=Lint/UselessAssignment` * Remove Tracer#puts as it's not used * Avoid raising debuggee not finished error if assertions failed When assertions failed, we should let the AssertionFailedError surface and not calling flunk so we can get correct failure messages. * Fix test builder * DAP: allow custom request extension in ThreadClient class * DAP: introduce Rdbg Record Inspector * Revert "DAP: introduce Rdbg Record Inspector" This reverts commit e29faba. * relax authority check to pass on the Windows. fix ruby/vscode-rdbg#169 * Add test for global variable support in DAP and CDP * Add test for instance variable ordering in DAP PR ruby#806 sorts instance variables by name before returning them. This commit adds a test that verifies this functionality under the DAP protocol. * Fix incorrect method name * dap/cdp_result -> protocol_result From ThreadClient it retunrs :dap_result or :cdp_result to the SESSION. This patch renames it to `:protocol_result` and it will be handled by `process_protocol_result`. * separate 'test_console' and 'test_test' * `test_all` should also run `test_test` * Omit slow tests for healty CI * add workflow * DAP: rename the method name of custom request Follow up for ruby#939 Change the name from "request_..." to "custom_dap_request_" in UI_DAP and ThreadClient for consistency * DAP: support custom request in session class UI_DAP -> Session: custom_dap_request_... Session -> ThreadClient: custom_dap_request_event_... Add "request_event" prefix to clarify it is a response (not Events in DAP) * Add test for DAP's command execution through evaluate request * Enqueue DAP's evaluate command right away * DAP: echo back the given command on `, debug_command` form. * remain breakpoints on reloaded files Breakpoints should be remained on reloaded files. To make sure maintaining loaded file names. fix ruby#870 * Use better approach to signal remote process' exit status 1. `remote_info.failed_process` stores symbol but we only use the value's presence to check if the process is force-killed. 2. The force-killed status is directly controlled by `kill_safely` through `kill_remote_debuggee`, which is directly called right before we check the status with `remote_info.failed_process`. Combining the two, we can just let `kill_safely` and `kill_remote_debuggee` to return the force-killed status and not storing it in `remote_info`, which already contains a bunch of information. This also eliminates the need to pass `test_info` to `kill_safely`, which makes related code easier to understand and maintain. * Remove unused failed_process attribute * Add "result" as an argument to custom_dap_request_event method The generated result in ThreadClient is passed as "result". We usually use it when returning responses to VS Code in Session class * restart all threads on eval When a thread keeps a lock, and REPL runs a code which needs the lock, other threads should make a progress to release the lock. fix ruby#877 * CDP: support remote debugging in different environment From investigation by @ko1-san, the path to set breakpoints seems to be sent in "url" field when debugging in different environment such as WSL(running debuggee) and Windows(running Chrome DevTools). I supported "url" field in this PR. * `-v` prints version and do something Without this patch, `rdbg -v target.rb` prints version and terminates the process. With this pach, `-v` prints version and starts debugging for `target.rb`. If no filename is given, terminates the process. * v1.7.2 * Revert "Workaround VS Code's breakpoint skipping issue" This reverts commit 4313d91. * Revert "Avoid locking all threads on debugger suspension" This reverts commit 1e0f45c. * Revert "Always insert preset commands when executing commands from DAP (#4)" This reverts commit 8eea4c9. --------- Co-authored-by: Naoto Ono <onoto1998@gmail.com> Co-authored-by: Takashi Kokubun <takashikkbn@gmail.com> Co-authored-by: Andy Waite <andyw8@users.noreply.github.com> Co-authored-by: Jeremy Evans <code@jeremyevans.net> Co-authored-by: Andy Waite <13400+andyw8@users.noreply.github.com> Co-authored-by: Mau Magnaguagno <maumagnaguagno@gmail.com> Co-authored-by: Vinicius Stock <vinicius.stock@shopify.com> Co-authored-by: yamashush <38120991+yamashush@users.noreply.github.com> Co-authored-by: Jose D. Gomez R <jose.gomez@suse.com> Co-authored-by: Koichi Sasada <ko1@atdot.net> Co-authored-by: Emily Samp <emily.giurleo@shopify.com>

st0012 · 2023-04-20T17:00:33Z

@ko1 I think the problem was not fully resolved with #947:

In console, if you type Arrow-Up it would still hang. This is because when doing so, reline uses Timeout.timeout underneath.
- For this problem we probably need a different solution? I'm not sure if we can detect the special keystrokes here to restart threads.
In VS Code, evaluating local variables could also cause hanging. I'm not able to reproduce it as reliably, but I've seen it happened a few times.

Would you consider making Solution 2 a configurable option?

ko1 · 2023-05-04T18:51:03Z

Would you consider making Solution 2 a configurable option?

It should be introduced carefully.
Anyway, please file new issue.

Timeout's implementation relies on Thread, which would conflict with `ruby/debug`'s thread-freezing implementation and has casued issues like - ruby/debug#877 - ruby/debug#934 - ruby/debug#1000 This commit avoids the issue by completely removing the use of Timeout.

`ruby/debug`'s thread-freezing implementation and has casued issues like - ruby/debug#877 - ruby/debug#934 - ruby/debug#1000 This commit avoids the issue by completely removing the use of Timeout.

Timeout's implementation relies on Thread, which would conflict with `ruby/debug`'s thread-freezing implementation and has casued issues like - ruby/debug#877 - ruby/debug#934 - ruby/debug#1000 This commit avoids the issue by completely removing the use of Timeout.

(ruby/reline#580) Timeout's implementation relies on Thread, which would conflict with `ruby/debug`'s thread-freezing implementation and has casued issues like - ruby/debug#877 - ruby/debug#934 - ruby/debug#1000 This commit avoids the issue by completely removing the use of Timeout. ruby/reline@d4f0cd3fe1

There are several commands that evaluate objects and call methods on them, such as: - info - ls (outline) - trace object - display If the called method acquires a mutex shared with another thread (e.g. when using Timeout), then it'd cause a deadlock as all threads are stopped. For example, if there's an ActiveRecord Relation object stored as a local variable, and the user runs the `info` command, then it'd call `inspect` on it and trigger a database query, it could cause a deadlock as described in ruby#877. This commit fixes the issue by unfreezing all threads before evaluating those commands.

There are several commands that evaluate objects and call methods on them, such as: - info - ls (outline) - trace object - display If the called method acquires a mutex shared with another thread (e.g. when using Timeout), then it'd cause a deadlock as all threads are stopped. For example, if there's an ActiveRecord Relation object stored as a local variable, and the user runs the `info` command, then it'd call `inspect` on it and trigger a database query, it could cause a deadlock as described in ruby#877. This commit fixes the issue by unfreezing all threads before evaluating those commands and freezing them again after receiving the result event.

There are several commands that evaluate objects and call methods on them, such as: - info - ls (outline) - trace object - display If the called method acquires a mutex shared with another thread (e.g. when using Timeout), then it'd cause a deadlock as all threads are stopped. For example, if there's an ActiveRecord Relation object stored as a local variable, and the user runs the `info` command, then it'd call `inspect` on it and trigger a database query, it could cause a deadlock as described in #877. This commit fixes the issue by unfreezing all threads before evaluating those commands and freezing them again after receiving the result event.

st0012 mentioned this issue Dec 25, 2022

key typing as ASCII sequences, and then freezes Ruby #871

Closed

ko1 added the bug Something isn't working label Mar 7, 2023

st0012 mentioned this issue Mar 23, 2023

Debug freezes after a few keystrokes #934

Open

ko1 mentioned this issue Mar 28, 2023

restart all threads on eval #947

Merged

ko1 added a commit that referenced this issue Mar 28, 2023

restart all threads on eval

fa6b096

When a thread keeps a lock, and REPL runs a code which needs the lock, other threads should make a progress to release the lock. fix #877

ko1 closed this as completed in #947 Mar 28, 2023

ko1 added a commit that referenced this issue Mar 28, 2023

restart all threads on eval

9d2a5c0

When a thread keeps a lock, and REPL runs a code which needs the lock, other threads should make a progress to release the lock. fix #877

st0012 mentioned this issue Apr 20, 2023

Avoid locking all threads on debugger suspension Shopify/debug#16

Merged

st0012 mentioned this issue Aug 14, 2023

Remove Timeout usage ruby/reline#580

Merged

st0012 mentioned this issue Sep 14, 2023

Require Reline 0.3.8+ to avoid frozen issue #1020

Merged

st0012 mentioned this issue Oct 26, 2023

Unfreeze threads for some object-evaluating commands #1030

Merged

st0012 mentioned this issue Nov 18, 2023

Fallback to non-colored inspect when coloring takes too long ruby/irb#759

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Debugger freezes when triggering timeout inside a timeout block with Puma server #877

Debugger freezes when triggering timeout inside a timeout block with Puma server #877

st0012 commented Dec 23, 2022

ko1 commented Dec 23, 2022

ko1 commented Dec 23, 2022

st0012 commented Dec 23, 2022

st0012 commented Jan 16, 2023 •

edited

Loading

st0012 commented Mar 26, 2023 •

edited

Loading

ko1 commented Mar 28, 2023 •

edited

Loading

st0012 commented Apr 20, 2023 •

edited

Loading

ko1 commented May 4, 2023

Debugger freezes when triggering timeout inside a timeout block with Puma server #877

Debugger freezes when triggering timeout inside a timeout block with Puma server #877

Comments

st0012 commented Dec 23, 2022

ko1 commented Dec 23, 2022

ko1 commented Dec 23, 2022

st0012 commented Dec 23, 2022

st0012 commented Jan 16, 2023 • edited Loading

st0012 commented Mar 26, 2023 • edited Loading

ko1 commented Mar 28, 2023 • edited Loading

st0012 commented Apr 20, 2023 • edited Loading

ko1 commented May 4, 2023

st0012 commented Jan 16, 2023 •

edited

Loading

st0012 commented Mar 26, 2023 •

edited

Loading

ko1 commented Mar 28, 2023 •

edited

Loading

st0012 commented Apr 20, 2023 •

edited

Loading