Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[PROF-8667] Split profiling tests into ractor and non-ractor suites. #3320

Merged
merged 6 commits into from
Dec 13, 2023

Conversation

AlexJF
Copy link
Contributor

@AlexJF AlexJF commented Dec 12, 2023

What does this PR do?

  • Separate the profiling test suite from main to allow for profiling-specific customizations.
  • Tag all tests that use Ractor.new with ractors => true tag.
  • Introduce two spec tasks for profiling:
    • profiling:main - Run all tests EXCLUDING ractors tagged tests.
    • profiling:ractors - Run all ractors tagged tests.

Motivation:

As soon as you execute Ractor.new, the VM transitions to a multi-ractor state. This transition cannot be reversed and will change the behaviour of the VM. For example, ObjectSpace::_id2ref will start to raise errors on unshareable objects.

This essentially introduces an unavoidable cross-test side-effect that may make tests flaky depending on execution order. By containing tests that create actors to their own rspec run, we scope this side-effect to those tests that would always trigger it anyway.

Additional Notes:

How to test the change?

For Datadog employees:

  • If this PR touches code that signs or publishes builds or packages, or handles
    credentials of any kind, I've requested a review from @DataDog/security-design-and-guidance.
  • This PR doesn't touch any of that.

Unsure? Have a question? Request a review!

@AlexJF AlexJF force-pushed the alexjf/prof-8667-heap-profiling-split-ractor-tests branch from 3c4f242 to bcae1c5 Compare December 12, 2023 11:13
@ivoanjo
Copy link
Member

ivoanjo commented Dec 12, 2023

Looks reasonable! The CI issue I believe it's because we're still trying to run the profiling stuff with spec:main but it's not getting compiled.

Perhaps we shouldn't run in spec:main? It's a bit of a misnomer to be honest (it's kinda "core + tracing + ...") but I don't think it would be a very hard edge, since anyone changing something that impacts profiling would still see it in CI, even if they ran spec:main and it said everything was OK when it wasn't.

Base automatically changed from alexjf/prof-8667-heap-profiling-part1 to master December 12, 2023 12:28
@AlexJF AlexJF force-pushed the alexjf/prof-8667-heap-profiling-split-ractor-tests branch from b17e4fa to 2f69cbd Compare December 12, 2023 12:39
@codecov-commenter
Copy link

codecov-commenter commented Dec 12, 2023

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (dfbc324) 98.23% compared to head (b02a143) 98.23%.
Report is 3 commits behind head on master.

Additional details and impacted files
@@           Coverage Diff           @@
##           master    #3320   +/-   ##
=======================================
  Coverage   98.23%   98.23%           
=======================================
  Files        1253     1253           
  Lines       72992    72982   -10     
  Branches     3430     3429    -1     
=======================================
- Hits        71701    71693    -8     
+ Misses       1291     1289    -2     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Member

@ivoanjo ivoanjo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 LGTM

@AlexJF AlexJF marked this pull request as ready for review December 12, 2023 14:36
@AlexJF AlexJF requested review from a team as code owners December 12, 2023 14:36
AlexJF and others added 3 commits December 12, 2023 18:06
RSpec was warning

> WARNING: An expectation of `:run` was set on `nil`.

because the setup task may not be loaded.

We can remove this `allow` because the whole component is being mocked,
so the setup task will never get run in this case.
@AlexJF AlexJF force-pushed the alexjf/prof-8667-heap-profiling-split-ractor-tests branch from dd4083b to b02a143 Compare December 13, 2023 13:01
@AlexJF AlexJF merged commit 14b40d4 into master Dec 13, 2023
218 checks passed
@AlexJF AlexJF deleted the alexjf/prof-8667-heap-profiling-split-ractor-tests branch December 13, 2023 13:28
@github-actions github-actions bot added this to the 1.19.0 milestone Dec 13, 2023
ivoanjo added a commit that referenced this pull request Jan 10, 2024
…rectly

**What does this PR do?**

This PR tweaks the RSpec configuration to skip any tests that make use
of ractors (that are tagged with `:ractor => true`) unless RSpec
is run with `-t ractors`.

**Motivation:**

As discussed in #3320,
ractors have a few bugs which cause flaky behavior on other specs
we currently have.

And thus in the above PR, we changed the way we run the profiling
specs in CI and via rake to make sure to separate tests using ractors
out.

BUT, such a separation did not apply to when running rspec directly,
which I do all the time. I even got confused at some point and thought
we had flaky tests, when what I was seeing was the flakiness that
happened when I ran all tests, together, including the ractor ones.

To improve this, this PR adds an automatic skip to ractor tests,
unless you specifically opt into them. This way, running
`bundle exec rspec <some file> ` won't accidentally run ractor specs,
unless you ask for them.

**Additional Notes:**

N/A

**How to test the change?**

```
$ bundle exec rspec --format=progress ./spec/datadog/profiling/native_extension_spec.rb
Run options: include {:focus=>true}

All examples were filtered out; ignoring {:focus=>true}

Randomized with seed 24705
..............*.*.

Pending: (Failures listed here are expected and do not affect your suite's status)

  1) Datadog::Profiling::NativeExtension ddtrace_rb_ractor_main_p when Ruby has support for Ractors on a background Ractor
     # Skipping ractor tests. Use rake spec:profiling:ractors or pass -t ractors to rspec to run.
     # ./spec/datadog/profiling/native_extension_spec.rb:101

  2) Datadog::Profiling::NativeExtension ddtrace_rb_ractor_main_p when Ruby has no support for Ractors
     # Behavior does not apply to current Ruby version
     # ./spec/datadog/profiling/native_extension_spec.rb:75

Finished in 0.41116 seconds (files took 0.41574 seconds to load)
18 examples, 0 failures, 2 pending

$ bundle exec rspec --format=progress -t ractors ./spec/datadog/profiling/native_extension_spec.rb
Run options: include {:focus=>true, :ractors=>true}

Randomized with seed 5699
spec/datadog/profiling/native_extension_spec.rb:98: warning: Ractor is experimental, and the behavior may change in future versions of Ruby! Also there are many implementation issues.
.

Finished in 0.01124 seconds (files took 0.42492 seconds to load)
1 example, 0 failures

$ bundle exec rake spec:profiling:ractors
 # ... etc
Run options: include {:focus=>true, :ractors=>true}

Randomized with seed 48950
spec/datadog/profiling/collectors/cpu_and_wall_time_worker_spec.rb:716: warning: Ractor is experimental, and the behavior may change in future versions of Ruby! Also there are many implementation issues.
...

Finished in 0.04489 seconds (files took 0.53091 seconds to load)
3 examples, 0 failures
```
ivoanjo added a commit that referenced this pull request Jan 11, 2024
**What does this PR do?**

This PR re-enables a garbage collection profiling spec in the
`CpuAndWallTimeWorker` that was disabled on Ruby 3.

This test was disabled in #2354 as this caused flaky tests due to
a Ruby VM bug related to Ractor GC.

Recently, this "flaky when combined with Ractors" behavior started
showing up in other places and @AlexJF came up with a better solution
in #3320 of separating out the Ractor tests.

Thus, we can now run this spec on Ruby 3.

**Motivation**:

Recently, while working on #3356, I noticed that if I commented out
the calls to `rb_postponed_job_[trigger|register_one]` in
`on_gc_event`, no specs failed.

This was pretty surprising to me -- as it was weird that we didn't
have test coverage for it.

After looking into it, I realized we did have specs for it -- but
we weren't running them on Ruby 3, which I was using for local
development.

The re-enabled spec correctly covers that situation.

**Additional Notes:**

N/A

**How to test the change?**

Validate that test now runs on Ruby 3 and test suite is still green.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants