Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Debugger freezes when triggering timeout inside a timeout block with Puma server #877

Closed
st0012 opened this issue Dec 23, 2022 · 8 comments · Fixed by #947
Closed

Debugger freezes when triggering timeout inside a timeout block with Puma server #877

st0012 opened this issue Dec 23, 2022 · 8 comments · Fixed by #947
Labels
bug Something isn't working

Comments

@st0012
Copy link
Member

st0012 commented Dec 23, 2022

Your environment

  • ruby -v: 3.1.3
  • rdbg -v: 1.7.1

Describe the bug

When the following conditions all match, the debugger has a high possibility to freeze:

  • Inside a Rails app with Puma web server

  • The breakpoint is inside a timeout call, like

      def hello
        Timeout.timeout(3600) do
          binding.b
        end
      end
  • From the breakpoint, executes something that calls Timeout.timeout underneath

To Reproduce

I've documented reproduction steps in this repo.

Expected behavior

The debugger doesn't freeze.

Additional context

It may not be the root cause, but Timeout's thread being stopped by the debugger plays a major role here.

If we somehow modify this method to skip pausing the Timeout thread, like

    private def thread_stopper
      TracePoint.new(:line) do
        # run on each thread
        tc = ThreadClient.current
        next if tc.management?
        next unless tc.running?
        next if tc == @tc
        next if tc.thread.name && tc.thread.name.match?(/Timeout/)

        tc.on_pause
      end
    end

Then the issue doesn't occur.

@ko1
Copy link
Collaborator

ko1 commented Dec 23, 2022

Thank you for the report.

  • Is it difficult to repro within a single file? (without puma)
  • Can you show the backtrace when thread_stopper's TP stoppes?

by modifying like that:

   private def thread_stopper
      TracePoint.new(:line) do
        # run on each thread
        tc = ThreadClient.current
        next if tc.management?
        next unless tc.running?
        next if tc == @tc
        STDERR.puts caller.inspect if tc.thread.name.match?(/Timeout/)

        tc.on_pause
      end
    end

@ko1
Copy link
Collaborator

ko1 commented Dec 23, 2022

Anyway, I assume that it is difficult to fix in theory.

@st0012
Copy link
Member Author

st0012 commented Dec 23, 2022

Is it difficult to repro within a single file? (without puma)

I tried it but didn't succeed. I suspect running it from a thread pool is one criteria and will experiment it later.

Can you show the backtrace when thread_stopper's TP stoppes?

"/Users/hung-wulo/.gem/ruby/3.1.2/gems/timeout-0.3.1/lib/timeout.rb:112:in `block (2 levels) in create_timeout_thread'", 
"/Users/hung-wulo/.gem/ruby/3.1.2/gems/timeout-0.3.1/lib/timeout.rb:111:in `synchronize'", 
"/Users/hung-wulo/.gem/ruby/3.1.2/gems/timeout-0.3.1/lib/timeout.rb:111:in `block in create_timeout_thread'"

Anyway, I assume that it is difficult to fix in theory.

Yeah I assumed that to be the case.

In the short-term I'll try to skip the Timeout thread pause to workaround it. Because many code paths in our app is wrapped inside a timeout block (from libs), and our devs constantly hang in those code paths.

@st0012
Copy link
Member Author

st0012 commented Jan 16, 2023

Sorry for the delay, I managed to reproduce this with a single Ruby file:

# test.rb
def timeout
  Timeout.timeout(3) do
    puts "Something that needs timeout"
  end
end

Thread.new do
  Timeout.timeout(100) do
    binding.b
  end
end

# just to prevent the main thread from existing too early
sleep 100
$ rdbg -n test.rb

[4, 13] in test.rb
     4|   end
     5| end
     6| 
     7| Thread.new do
     8|   Timeout.timeout(100) do
=>   9|     binding.b
    10|   end
    11| end
    12| 
    13| # just to prevent the main thread from existing too early
=>#0    block in <main> (2 levels) at test.rb:9
  #1    block {|exc=#<Timeout::Error: Timeout::Error>|} in timeout at ~/.gem/ruby/3.1.2/gems/timeout-0.3.1/lib/timeout.rb:199
  # and 5 frames (use `bt' command for all frames)
(rdbg) timeout
Something that needs timeout
nil
(rdbg) timeout
# hangs

It produces like 9 out of 10 times.

And if it helps, here's the output with some debugging messages from my patch in timeout:

$ rdbg -n test.rb
QUEUE_MUTEX locked? false
[Timeout.timeout] acquiring QUEUE_MUTEX
[Timeout.timeout] released QUEUE_MUTEX
QUEUE_MUTEX locked? false
[Timeout thread] acquiring QUEUE_MUTEX
[4, 13] in test.rb
     4|   end
     5| end
     6| 
     7| Thread.new do
     8|   Timeout.timeout(100) do
=>   9|     binding.b
    10|   end
    11| end
    12| 
    13| # just to prevent the main thread from existing too early
=>#0    block in <main> (2 levels) at test.rb:9
  #1    block {|exc=#<Timeout::Error: Timeout::Error>|} in timeout at ~/.gem/ruby/3.1.2/gems/timeout-0.3.1/lib/timeout.rb:199
  # and 5 frames (use `bt' command for all frames)
(rdbg) timeout
QUEUE_MUTEX locked? false
[Timeout.timeout] acquiring QUEUE_MUTEX
[Timeout.timeout] released QUEUE_MUTEX
QUEUE_MUTEX locked? false
Something that needs timeout
nil
(rdbg) timeout
QUEUE_MUTEX locked? true
[Timeout.timeout] acquiring QUEUE_MUTEX

I think the main cause is that the timeout thread would acquire the QUEUE_MUTEX and never let go, likely because the thread would be frozen by then. So the later Timeout.timeout calls wouldn't be able to acquire the lock for synchronize.

But I don't understand why the first timeout call can still acquire the lock and it only deadlocks in the second call 🤔

@ko1 ko1 added the bug Something isn't working label Mar 7, 2023
@st0012
Copy link
Member Author

st0012 commented Mar 26, 2023

The same problem doesn't seem to happen to byebug, even though it stops all threads as well.
It seems to use C-level APIs like rb_thread_stop for thread-control. I wonder if that'll be the solution for debug too?

@ko1
Copy link
Collaborator

ko1 commented Mar 28, 2023

Now I understand the problem.

Simplified code:

th0 = Thread.new{sleep}
m = Mutex.new
th1 = Thread.new do
  m.lock
  sleep 1
  m.unlock
end

sleep 0.5
debugger do: 'm.lock'

In this case,

(1) m is locked by th1
(2) main thread entered the REPL
(3) REPL runs m.lock but th1 doesn't resumed, and m.unlock is not called forever.
(*) Because of th0, deadlock detector doesn't raise an error because th0 can make progress.


  • Solution 1: Restart all threads for the evaluation on REPL.
  • Solution 2: Do not stop all threads except the running thread (in this case, main thread).
  • Solution 3: Make timeout thread exceptional (do not stop)

(Solution 3) doesn't make sense because there are many similar cases.
(Solution 2) maybe make it difficult to debug thread programs.

So Solution 1? In fact the earlier versions restart all threads.

ko1 added a commit that referenced this issue Mar 28, 2023
When a thread keeps a lock, and REPL runs a code which needs the
lock, other threads should make a progress to release the lock.

fix #877
@ko1 ko1 closed this as completed in #947 Mar 28, 2023
ko1 added a commit that referenced this issue Mar 28, 2023
When a thread keeps a lock, and REPL runs a code which needs the
lock, other threads should make a progress to release the lock.

fix #877
st0012 added a commit to Shopify/debug that referenced this issue Mar 30, 2023
* DAP: use Object#__send__ to avoid name conflicts

* DAP: allow custom request extension in Session class

* Fix bug that "trace line" does not work after stopping trace once

* Increase timeout in debug_code

* Fix warning about unused variable

* Avoid memberless Struct

Related to Ruby bug 19416.

* Remove redundant raw detach test

The test's subject is the `disconnect` request with `restart: false,
terminateDebuggee: <true|false>` arguments, which has been covered by the
tests in `disconnect_dap_test.rb`. So this test case has become obsolete.

Co-authored-by: Andy Waite <13400+andyw8@users.noreply.github.com>

* Always propagate keyword arguments in WebSocketServer

* Fix `::Process::daemon` patch

As discussed here:
ruby@699a31e#r94469355

* Add test

* Use local variables

* Add test for info consts with an expression

Co-authored-by: Andy Waite <andyw8@users.noreply.github.com>

* Fix typo in README

* ⬆️ Update ci ruby versions

* ⏪ Revert Ruby 2.6

* ✏️ Fix typo

* CDP: Support evaluating expression on non-current frame

* CDP: support reattaching in Chrome DevTools

Close ruby#800

* Improved stability for chrome debugging

- Display the greeting message regardless of the status of invocation of
  chrome.

  This allows coming back to the debugger on a new tab when the window
  process by `UI_CDP.run_new_chrome` is killed.

- Handle `Errno::ESRCH` in `UI_CDP.cleanup_reader`.

  When the process by `UI_CDP.run_new_chrome` is killed, re-killing it
  breaks your debugging session and in turn the "debugee".

* Make `UI_ServerBase#puts` to behave like `STDERR#puts`

`UI_ServerBase#greeting` calls `#puts` with no arguments, this
breaks reconnections to the debugger.

* Handle not existing $FILENAME in `Session#process_protocol_request`

`Session#process_protocol_request` gracefully handles `Errno::ENOENT`
when `eval`-ing `$FILENAME`, and it doens't exist.

* free terminated ThreadClients

fix ruby#899

* DAP: use Mutex when sending response

When debug.gem tries to send response from multiple threads, the socket connection is closed. I confirmed this bug when using custom request from vscode-rdbg

* Make sure to fail when remote debuggee does not exit after scenarios

Because Errno::EPIPE is rescued while sending message to socket, protocol_test_case_test.rb does not pass. protocol_test_case_test.rb had been passed because ReaderThreadError was occurred and the debuggee process was still alive. Here is a scenario. After closing socket, terminated event was sent. However socket was closed, so debuggee process raised Errno::EPIPE and debugggee process was still alive. The test framework detected the status and failed.

Thus I fixed so that the test framework does not kill the debuggee process unexpectedly.

* Alias Session send methods in WebSocketServer 

Methods ``respond``, ``respond_fail`` and ``fire_event`` can be aliased to ``send_response``, ``send_fail_response`` and ``send_event``, respectively.

* Fix timing bug on session_server creation

`@session_server` should be assigned at first.

`@session_server = Thread.current` in the session thread does not
work because the creator thread can access to `@session_server`
before it.

* Fix useless assignments

  Found by running `rubocop --only=Lint/UselessAssignment`

* Remove Tracer#puts as it's not used

* Avoid raising debuggee not finished error if assertions failed

When assertions failed, we should let the AssertionFailedError surface
and not calling flunk so we can get correct failure messages.

* Fix test builder

* DAP: allow custom request extension in ThreadClient class

* DAP: introduce Rdbg Record Inspector

* Revert "DAP: introduce Rdbg Record Inspector"

This reverts commit e29faba.

* relax authority check

to pass on the Windows.
fix ruby/vscode-rdbg#169

* Add test for global variable support in DAP and CDP

* Add test for instance variable ordering in DAP

PR ruby#806 sorts instance variables by name before returning them. This
commit adds a test that verifies this functionality under the DAP
protocol.

* Fix incorrect method name

* dap/cdp_result -> protocol_result

From ThreadClient it retunrs :dap_result or :cdp_result to the SESSION.
This patch renames it to `:protocol_result` and it will be handled
by `process_protocol_result`.

* separate 'test_console' and 'test_test'

* `test_all` should also run `test_test`

* Omit slow tests for healty CI

* add workflow

* DAP: rename the method name of custom request

Follow up for ruby#939

Change the name from "request_..." to "custom_dap_request_" in UI_DAP and ThreadClient for consistency

* DAP: support custom request in session class

UI_DAP -> Session: custom_dap_request_...

Session -> ThreadClient: custom_dap_request_event_...

Add "request_event" prefix to clarify it is a response (not Events in DAP)

* Add test for DAP's command execution through evaluate request

* Enqueue DAP's evaluate command right away

* DAP: echo back the given command

on `, debug_command` form.

* remain breakpoints on reloaded files

Breakpoints should be remained on reloaded files. To make sure
maintaining loaded file names.

fix ruby#870

* Use better approach to signal remote process' exit status

1. `remote_info.failed_process` stores symbol but we only use the value's
   presence to check if the process is force-killed.
2. The force-killed status is directly controlled by `kill_safely` through
    `kill_remote_debuggee`, which is directly called right before we check
    the status with `remote_info.failed_process`.

Combining the two, we can just let `kill_safely` and `kill_remote_debuggee` to
return the force-killed status and not storing it in `remote_info`, which
already contains a bunch of information.

This also eliminates the need to pass `test_info` to `kill_safely`, which
makes related code easier to understand and maintain.

* Remove unused failed_process attribute

* Add "result" as an argument to custom_dap_request_event method

The generated result in ThreadClient is passed as "result". We usually use it when returning responses to VS Code in Session class

* restart all threads on eval

When a thread keeps a lock, and REPL runs a code which needs the
lock, other threads should make a progress to release the lock.

fix ruby#877

* CDP: support remote debugging in different environment

From investigation by @ko1-san, the path to set breakpoints seems to be sent in "url" field when debugging in different environment such as WSL(running debuggee) and Windows(running Chrome DevTools). I supported "url" field in this PR.

* `-v` prints version and do something

Without this patch, `rdbg -v target.rb` prints version and terminates
the process. With this pach, `-v` prints version and starts debugging
for `target.rb`.

If no filename is given, terminates the process.

* v1.7.2

* Revert "Workaround VS Code's breakpoint skipping issue"

This reverts commit 4313d91.

* Revert "Avoid locking all threads on debugger suspension"

This reverts commit 1e0f45c.

* Revert "Always insert preset commands when executing commands from DAP (#4)"

This reverts commit 8eea4c9.

---------

Co-authored-by: Naoto Ono <onoto1998@gmail.com>
Co-authored-by: Takashi Kokubun <takashikkbn@gmail.com>
Co-authored-by: Andy Waite <andyw8@users.noreply.github.com>
Co-authored-by: Jeremy Evans <code@jeremyevans.net>
Co-authored-by: Andy Waite <13400+andyw8@users.noreply.github.com>
Co-authored-by: Mau Magnaguagno <maumagnaguagno@gmail.com>
Co-authored-by: Vinicius Stock <vinicius.stock@shopify.com>
Co-authored-by: yamashush <38120991+yamashush@users.noreply.github.com>
Co-authored-by: Jose D. Gomez R <jose.gomez@suse.com>
Co-authored-by: Koichi Sasada <ko1@atdot.net>
Co-authored-by: Emily Samp <emily.giurleo@shopify.com>
@st0012
Copy link
Member Author

st0012 commented Apr 20, 2023

@ko1 I think the problem was not fully resolved with #947:

  • In console, if you type Arrow-Up it would still hang. This is because when doing so, reline uses Timeout.timeout underneath.
    • For this problem we probably need a different solution? I'm not sure if we can detect the special keystrokes here to restart threads.
  • In VS Code, evaluating local variables could also cause hanging. I'm not able to reproduce it as reliably, but I've seen it happened a few times.

Would you consider making Solution 2 a configurable option?

@ko1
Copy link
Collaborator

ko1 commented May 4, 2023

Would you consider making Solution 2 a configurable option?

It should be introduced carefully.
Anyway, please file new issue.

st0012 added a commit to ruby/reline that referenced this issue Aug 14, 2023
Timeout's implementation relies on Thread, which would conflict with
`ruby/debug`'s thread-freezing implementation and has casued issues like

- ruby/debug#877
- ruby/debug#934
- ruby/debug#1000

This commit avoids the issue by completely removing the use of Timeout.
st0012 added a commit to ruby/reline that referenced this issue Aug 14, 2023
`ruby/debug`'s thread-freezing implementation and has casued issues like

- ruby/debug#877
- ruby/debug#934
- ruby/debug#1000

This commit avoids the issue by completely removing the use of Timeout.
st0012 added a commit to ruby/reline that referenced this issue Aug 14, 2023
Timeout's implementation relies on Thread, which would conflict with
`ruby/debug`'s thread-freezing implementation and has casued issues like

- ruby/debug#877
- ruby/debug#934
- ruby/debug#1000

This commit avoids the issue by completely removing the use of Timeout.
st0012 added a commit to ruby/reline that referenced this issue Aug 14, 2023
Timeout's implementation relies on Thread, which would conflict with
`ruby/debug`'s thread-freezing implementation and has casued issues like

- ruby/debug#877
- ruby/debug#934
- ruby/debug#1000

This commit avoids the issue by completely removing the use of Timeout.
st0012 added a commit to ruby/reline that referenced this issue Aug 16, 2023
Timeout's implementation relies on Thread, which would conflict with
`ruby/debug`'s thread-freezing implementation and has casued issues like

- ruby/debug#877
- ruby/debug#934
- ruby/debug#1000

This commit avoids the issue by completely removing the use of Timeout.
tompng pushed a commit to ruby/reline that referenced this issue Aug 20, 2023
Timeout's implementation relies on Thread, which would conflict with
`ruby/debug`'s thread-freezing implementation and has casued issues like

- ruby/debug#877
- ruby/debug#934
- ruby/debug#1000

This commit avoids the issue by completely removing the use of Timeout.
matzbot pushed a commit to ruby/ruby that referenced this issue Aug 20, 2023
(ruby/reline#580)

Timeout's implementation relies on Thread, which would conflict with
`ruby/debug`'s thread-freezing implementation and has casued issues like

- ruby/debug#877
- ruby/debug#934
- ruby/debug#1000

This commit avoids the issue by completely removing the use of Timeout.

ruby/reline@d4f0cd3fe1
st0012 added a commit to Shopify/debug that referenced this issue Oct 26, 2023
There are several commands that evaluate objects and call methods on them,
such as:
- info
- ls (outline)
- trace object
- display

If the called method acquires a mutex shared with another thread (e.g.
when using Timeout), then it'd cause a deadlock as all threads are stopped.

For example, if there's an ActiveRecord Relation object stored as a local
variable, and the user runs the `info` command, then it'd call `inspect`
on it and trigger a database query, it could cause a deadlock as described
in ruby#877.

This commit fixes the issue by unfreezing all threads before evaluating
those commands.
st0012 added a commit to Shopify/debug that referenced this issue Oct 26, 2023
There are several commands that evaluate objects and call methods on them,
such as:
- info
- ls (outline)
- trace object
- display

If the called method acquires a mutex shared with another thread (e.g.
when using Timeout), then it'd cause a deadlock as all threads are stopped.

For example, if there's an ActiveRecord Relation object stored as a local
variable, and the user runs the `info` command, then it'd call `inspect`
on it and trigger a database query, it could cause a deadlock as described
in ruby#877.

This commit fixes the issue by unfreezing all threads before evaluating
those commands and freezing them again after receiving the result event.
st0012 added a commit to Shopify/debug that referenced this issue Oct 26, 2023
There are several commands that evaluate objects and call methods on them,
such as:
- info
- ls (outline)
- trace object
- display

If the called method acquires a mutex shared with another thread (e.g.
when using Timeout), then it'd cause a deadlock as all threads are stopped.

For example, if there's an ActiveRecord Relation object stored as a local
variable, and the user runs the `info` command, then it'd call `inspect`
on it and trigger a database query, it could cause a deadlock as described
in ruby#877.

This commit fixes the issue by unfreezing all threads before evaluating
those commands and freezing them again after receiving the result event.
ko1 pushed a commit that referenced this issue Nov 13, 2023
There are several commands that evaluate objects and call methods on them,
such as:
- info
- ls (outline)
- trace object
- display

If the called method acquires a mutex shared with another thread (e.g.
when using Timeout), then it'd cause a deadlock as all threads are stopped.

For example, if there's an ActiveRecord Relation object stored as a local
variable, and the user runs the `info` command, then it'd call `inspect`
on it and trigger a database query, it could cause a deadlock as described
in #877.

This commit fixes the issue by unfreezing all threads before evaluating
those commands and freezing them again after receiving the result event.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Development

Successfully merging a pull request may close this issue.

2 participants