Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Disable GC profiling on Ruby 3 by default due to Ruby VM Ractor-related bug #2354

Merged
merged 2 commits into from
Nov 8, 2022

Conversation

ivoanjo
Copy link
Member

@ivoanjo ivoanjo commented Nov 8, 2022

What does this PR do?:

This PR:

  • Adds a toggle that can be used to enable/disable the GC profiling feature added in [PROF-6260] Enable tracking of time spent in GC #2336
  • Enables the GC profiling feature by default on Ruby 2.x and disables the feature by default on Ruby 3.x
  • Exposes a new DD_PROFILING_FORCE_ENABLE_GC / settings.profiling.advanced.force_enable_gc_profiling setting to allow customers that don't use Ractors to manually re-enable the GC profiling feature.

Motivation:

While validating that the new profiler correctly works with Ractors, I discovered that adding specs that invoked Ractors caused the previously-built GC profiling feature (and attached specs) to start failing.

I was able to track this down to an actual VM bug, which I've reported upstream as https://bugs.ruby-lang.org/issues/19112.

To avoid impacting any customers (specifically, they would potentially observe profiles with incomplete data), I decided to disable GC profiling for Ruby 3.x customers for now.

I am hoping that the Ruby core team can be convinced to fix this issue and backport it to existing Ruby 3.x stable releases, so that we can again re-enable this feature for these users.

As an alternative, if this issue sticks around for a long time and we really want this feature to be on by default for Ruby 3.x customers, there are probably are ways for us to automatically detect if Ractors are in use or not. I chose not to pursue such a mechanism for now.

Additional Notes:

(None)

How to test the change?:

The change includes test coverage. You can also observe this feature by profiling a Ruby application:

  • On Ruby 2.x you will see garbage collection frames
  • On Ruby 3.x you will not see garbage collection frames unless you force enable the feature.

…argument

In Ruby, there's no common superclass for `TrueClass` and `FalseClass`,
and thus there's no `T_BOOL` or `T_BOOLEAN` we can use for
`ENFORCE_TYPE`.

Thus, I've added a specific `ENFORCE_BOOLEAN` helper to support this
case.

(Also removed the `_type` argument which never ended up being used
in `raise_unexpected_type`.)
…ed bug

**What does this PR do?**:

This PR:

* Adds a toggle that can be used to enable/disable the GC profiling
  feature added in #2336
* Enables the GC profiling feature by default on Ruby 2.x and
  disables the feature by default on Ruby 3.x
* Exposes a new `DD_PROFILING_FORCE_ENABLE_GC` /
  `settings.profiling.advanced.force_enable_gc_profiling` setting to
  allow customers that don't use Ractors to manually re-enable the
  GC profiling feature.

**Motivation**:

While validating that the new profiler correctly works with Ractors,
I discovered that adding specs that invoked Ractors caused the
previously-built GC profiling feature (and attached specs) to start
failing.

I was able to track this down to an actual VM bug, which I've
reported upstream as <https://bugs.ruby-lang.org/issues/19112>.

To avoid impacting any customers (specifically, they would
potentially observe profiles with incomplete data), I decided to
disable GC profiling for Ruby 3.x customers for now.

I am hoping that the Ruby core team can be convinced to fix this
issue and backport it to existing Ruby 3.x stable releases, so
that we can again re-enable this feature for these users.

As an alternative, if this issue sticks around for a long time and
we really want this feature to be on by default for Ruby 3.x
customers, there are probably are ways for us to automatically
detect if Ractors are in use or not. I chose not to pursue such
a mechanism for now.

**Additional Notes**:

(None)

**How to test the change?**:

The change includes test coverage. You can also observe this
feature by profiling a Ruby application:

* On Ruby 2.x you will see garbage collection frames
* On Ruby 3.x you will not see garbage collection frames unless you
  force enable the feature.
@ivoanjo ivoanjo requested a review from a team November 8, 2022 12:31
@github-actions github-actions bot added core Involves Datadog core libraries profiling Involves Datadog profiling labels Nov 8, 2022
Comment on lines -131 to +136
rb_define_singleton_method(collectors_cpu_and_wall_time_worker_class, "_native_initialize", _native_initialize, 2);
rb_define_singleton_method(collectors_cpu_and_wall_time_worker_class, "_native_initialize", _native_initialize, 3);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For context, the 2 to 3 change here is the number of positional arguments to the method. This is documented here.

Effectively there’s three ways to “wire up” a C function to be called as a Ruby method:

  • if you pass in a number >= 0 then that’s the number of positional arguments to the method. E.g. def foo(a, b, c) => rb_define_method(..., foo_implementation, 3) and foo_implementation is expected to have a signature that receives self + 3 other arguments, e.g. VALUE foo_implementation(VALUE self, VALUE a, VALUE b, VALUE c).
  • if you pass in -1 then Ruby will pass the arguments as a C array
  • if you pas in -2 then Ruby will pass in the arguments as a Ruby array (e.g. it’s the C equivalent of def foo(*args))

I moved it from 2 to 3 since I was passing the new gc_profiling_enabled value

@ivoanjo ivoanjo merged commit ec91fe9 into master Nov 8, 2022
@ivoanjo ivoanjo deleted the ivoanjo/workaround-ractor-tracepoint-issue branch November 8, 2022 13:47
@github-actions github-actions bot added this to the 1.6.0 milestone Nov 8, 2022
ivoanjo added a commit that referenced this pull request Dec 8, 2022
…to overhead

**What does this PR do?**:

We previously only enabled GC profiling by default on Ruby 2.x
(we disabled it on 3.x due to a bug, see #2354).

This PR changes the profiling configuration to never enable
GC profiling by default (thus bringing the Ruby 2.x and 3.x
defaults into sync).

**Motivation**:

I've had one new Ruby profiler alpha user report that GC profiling
had a quite big overhead, and I could replicate the issue on a
GC-heavy benchmark.

Since we'd like to get the new Ruby profiler out into customer's
hands in beta, I decided to disable this feature by default, and
come back to it after beta is out.

**Additional Notes**:

GC profiling can still be enabled via configuration.

**How to test the change?**:

Change includes test coverage. Furthermore, profiling any
application will no longer show the `Garbage Collection` frames in the
flamegraph by default.
ivoanjo added a commit that referenced this pull request Jan 11, 2024
**What does this PR do?**

This PR re-enables a garbage collection profiling spec in the
`CpuAndWallTimeWorker` that was disabled on Ruby 3.

This test was disabled in #2354 as this caused flaky tests due to
a Ruby VM bug related to Ractor GC.

Recently, this "flaky when combined with Ractors" behavior started
showing up in other places and @AlexJF came up with a better solution
in #3320 of separating out the Ractor tests.

Thus, we can now run this spec on Ruby 3.

**Motivation**:

Recently, while working on #3356, I noticed that if I commented out
the calls to `rb_postponed_job_[trigger|register_one]` in
`on_gc_event`, no specs failed.

This was pretty surprising to me -- as it was weird that we didn't
have test coverage for it.

After looking into it, I realized we did have specs for it -- but
we weren't running them on Ruby 3, which I was using for local
development.

The re-enabled spec correctly covers that situation.

**Additional Notes:**

N/A

**How to test the change?**

Validate that test now runs on Ruby 3 and test suite is still green.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core Involves Datadog core libraries profiling Involves Datadog profiling
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants