Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add JIT debugging helpers, and other debugging output improvements #100123

Merged
merged 1 commit into from
Mar 22, 2024

Conversation

BruceForstall
Copy link
Member

  1. Add DOTNET_JitHoistLimit to specify the maximum number of hoisted expressions to allow before stopping.
  2. Add DOTNET_JitStressModeNamesAllow to allow specifying a set of STRESS mode names to allow. Use this in conjunction with a DOTNET_JitStress setting to allow JitStress to work as usual (with the usual weightings) but only for a specified set of stress modes. This is basically the opposite of DOTNET_JitStressModeNamesNot. One use is the enable JitStress, get a JitDump, see all the JitStress mode names that were enabled, then set DOTNET_JitStressModeNamesAllow to that set
    (e.g., STRESS_RANDOM_INLINE STRESS_GENERIC_VARN STRESS_UNWIND). Remove them one by one (or "binary search") to find the minimal set that causes a bug, to reduce noise in the dump and additional complexity in the IR and generated code.
  3. Fix a few JitDump cases missing dspPtr to create diffable dumps.
  4. Add DOTNET_JitInlineLimit support to RandomPolicy
  5. Add DOTNET_JitNoCSE2 support to CSE_HeuristicRandom
  6. Remove an extra newline from eePrintObjectDescription

1. Add `DOTNET_JitHoistLimit` to specify the maximum number of
hoisted expressions to allow before stopping.
2. Add `DOTNET_JitStressModeNamesAllow` to allow specifying a set of
STRESS mode names to allow. Use this in conjunction with a `DOTNET_JitStress`
setting to allow JitStress to work as usual (with the usual weightings)
but only for a specified set of stress modes. This is basically the
opposite of `DOTNET_JitStressModeNamesNot`. One use is the enable JitStress,
get a JitDump, see all the JitStress mode names that were enabled, then
set `DOTNET_JitStressModeNamesAllow` to that set
(e.g., `STRESS_RANDOM_INLINE STRESS_GENERIC_VARN STRESS_UNWIND`). Remove them
one by one (or "binary search") to find the minimal set that causes a bug, to
reduce noise in the dump and additional complexity in the IR and generated code.
3. Fix a few JitDump cases missing `dspPtr` to create diffable dumps.
4. Add `DOTNET_JitInlineLimit` support to `RandomPolicy`
5. Add `DOTNET_JitNoCSE2` support to `CSE_HeuristicRandom`
6. Remove an extra newline from `eePrintObjectDescription`
@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Mar 22, 2024
Copy link
Contributor

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

@BruceForstall
Copy link
Member Author

@AndyAyersMS PTAL
cc @dotnet/jit-contrib

@jakobbotsch
Copy link
Member

jakobbotsch commented Mar 22, 2024

2. Add DOTNET_JitStressModeNamesAllow to allow specifying a set of STRESS mode names to allow. Use this in conjunction with a DOTNET_JitStress setting to allow JitStress to work as usual (with the usual weightings) but only for a specified set of stress modes. This is basically the opposite of DOTNET_JitStressModeNamesNot. One use is the enable JitStress, get a JitDump, see all the JitStress mode names that were enabled, then set DOTNET_JitStressModeNamesAllow to that set
(e.g., STRESS_RANDOM_INLINE STRESS_GENERIC_VARN STRESS_UNWIND). Remove them one by one (or "binary search") to find the minimal set that causes a bug, to reduce noise in the dump and additional complexity in the IR and generated code.

DOTNET_JitStressModeNames can already be used for this. Do we need more variants? My usual approach to any JitStress issue is to identify the responsible method(s) with DOTNET_JitStressRange, then the responsible stress modes with DOTNET_JitStressModeNames.

@AndyAyersMS
Copy link
Member

I agree with Jakob, once you've narrowed a stress failure down to one specific method then you can force enable the set of the modes that were randomly enabled and then whittle down that set to isolate which modes are required.

Is there some use case where we want to approach this the other way around, try and whittle down the mode set first and then pinpoint the method?

Copy link
Member

@AndyAyersMS AndyAyersMS left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes LGTM.

I'm ok with keeping the new config for stress allow, but as noted elsewhere I'm not sure it's needed.

@BruceForstall
Copy link
Member Author

DOTNET_JitStressModeNames can already be used for this. Do we need more variants? My usual approach to any JitStress issue is to identify the responsible method(s) with DOTNET_JitStressRange, then the responsible stress modes with DOTNET_JitStressModeNames.

This does not work. The stress modes in DOTNET_JitStressModeNames are forced on unconditionally at any point they are queried. (Note, fyi, that they are additive to any existing DOTNET_JitStress mode, if set, unless DOTNET_JitStressModeNamesOnly=1 is also set.).

Thus, for a call to compStressCompile(STRESS_PROMOTE_FEWER_STRUCTS, 50), it should be true about 50% of the time. When you set DOTNET_JitStressModeNames it is true 100% of the time. This behavior is different than the behavior that would occur if that stress mode kicked in under a generic DOTNET_JitStress=1.

What you're doing you probably want to use the new DOTNET_JItStresModeNamesAllow instead.

fwiw, I normally did what you suggest. But it was always frustrating that a test failure that repros with DOTNET_JitStress=1 would not repro with DOTNET_JitStressModeNames=A B C for the exact same enabled stress modes. That's because the named stress modes would kick in at different rates and the IR and generated code would therefore differ.

@BruceForstall
Copy link
Member Author

/ba-g Build Analysis doesn't seem to be rerunning after issue opened

@jakobbotsch
Copy link
Member

Thus, for a call to compStressCompile(STRESS_PROMOTE_FEWER_STRUCTS, 50), it should be true about 50% of the time. When you set DOTNET_JitStressModeNames it is true 100% of the time. This behavior is different than the behavior that would occur if that stress mode kicked in under a generic DOTNET_JitStress=1.

I don't think that's true. If compStressCompile(x) returns true in a compilation, then it will always return true in that same compilation. There is no probability involved outside selecting which compilations enable what stress modes. If you have already narrowed down to a specific method, then there is no difference between them anymore assuming we know the set that is enabled randomly for that method. Am I wrong?

@AndyAyersMS
Copy link
Member

Yeah, that is my impression too, the enabled stress modes are not "random" but are a deterministic function of the stress mode config value and the (root) method hash. They only seem random when you look across methods.

@BruceForstall
Copy link
Member Author

I don't think that's true. If compStressCompile(x) returns true in a compilation, then it will always return true in that same compilation.

Not true.

Remember, compStressCompile takes two arguments: area and weight. For any single compile, a "hash" is computed based on the method hash, stress mode area, and JitStress value, modulo 100, so we get a number [0..100). "hash" is the same throughout the compilation, but weight varies by callsite. If hash < weight, we stress. That means that for compStressCompile(STRESS_GENERIC_VARN, v1) and compStressCompile(STRESS_GENERIC_VARN, v2), if we stress for v1, and v1 < v2, then we stress for v2, but not necessarily vice-versa. E.g., if we compute hash=50, then compStressCompile(STRESS_GENERIC_VARN, 10) does not stress but compStressCompile(STRESS_GENERIC_VARN, 75) does.

const unsigned hash = (info.compMethodHash() ^ compStressAreaHash(stressArea) ^ stressLevel) % MAX_STRESS_WEIGHT;
assert(hash < MAX_STRESS_WEIGHT && weight <= MAX_STRESS_WEIGHT);
return (hash < weight);

@BruceForstall BruceForstall merged commit f7334c4 into dotnet:main Mar 22, 2024
107 of 110 checks passed
@BruceForstall BruceForstall deleted the AddDebuggingHelpers branch March 22, 2024 17:34
@jakobbotsch
Copy link
Member

I don't think that's true. If compStressCompile(x) returns true in a compilation, then it will always return true in that same compilation.

Not true.

Remember, compStressCompile takes two arguments: area and weight. For any single compile, a "hash" is computed based on the method hash, stress mode area, and JitStress value, modulo 100, so we get a number [0..100). "hash" is the same throughout the compilation, but weight varies by callsite. If hash < weight, we stress. That means that for compStressCompile(STRESS_GENERIC_VARN, v1) and compStressCompile(STRESS_GENERIC_VARN, v2), if we stress for v1, and v1 < v2, then we stress for v2, but not necessarily vice-versa. E.g., if we compute hash=50, then compStressCompile(STRESS_GENERIC_VARN, 10) does not stress but compStressCompile(STRESS_GENERIC_VARN, 75) does.

const unsigned hash = (info.compMethodHash() ^ compStressAreaHash(stressArea) ^ stressLevel) % MAX_STRESS_WEIGHT;
assert(hash < MAX_STRESS_WEIGHT && weight <= MAX_STRESS_WEIGHT);
return (hash < weight);

That makes sense. Are there any stress modes where we vary the weight at different call sites? I was under the impression that we don't really do that.

@BruceForstall
Copy link
Member Author

Here are all the unique calls (unique as in different stress mode + weight combination). We do vary the weights, especially for STRESS_GENERIC_VARN.

compStressCompile(STRESS_64RSLT_MUL, 20)
compStressCompile(STRESS_BB_PROFILE, 15)
compStressCompile(STRESS_BYREF_PROMOTION, 25)
compStressCompile(STRESS_CATCH_ARG, 5)
compStressCompile(STRESS_CHK_FLOW_UPDATE, 30)
compStressCompile(STRESS_CHK_REIMPORT, 15)
compStressCompile(STRESS_CLONE_EXPR, 30)
compStressCompile(STRESS_DBL_ALN, 20)
compStressCompile(STRESS_DO_WHILE_LOOPS, 30)
compStressCompile(STRESS_EMITTER, 1)
compStressCompile(STRESS_EMITTER, 50)
compStressCompile(STRESS_FOLD, 50)
compStressCompile(STRESS_FORCE_INLINE, 0)
compStressCompile(STRESS_GENERIC_CHECK, 0)
compStressCompile(STRESS_GENERIC_VARN, 15)
compStressCompile(STRESS_GENERIC_VARN, 20)
compStressCompile(STRESS_GENERIC_VARN, 30)
compStressCompile(STRESS_GENERIC_VARN, 5)
compStressCompile(STRESS_GENERIC_VARN, 50)
compStressCompile(STRESS_GENERIC_VARN, 60)
compStressCompile(STRESS_IF_CONVERSION_COST, 25)
compStressCompile(STRESS_IF_CONVERSION_INNER_LOOPS, 25)
compStressCompile(STRESS_LCL_FLDS, 5)
compStressCompile(STRESS_LEGACY_INLINE, 50)
compStressCompile(STRESS_MAKE_CSE, MAX_STRESS_WEIGHT)
compStressCompile(STRESS_MAKE_CSE, percentage)
compStressCompile(STRESS_MERGED_RETURNS, 50)
compStressCompile(STRESS_MIN_OPTS, 5)
compStressCompile(STRESS_NO_OLD_PROMOTION, 10)
compStressCompile(STRESS_NULL_OBJECT_CHECK, 30)
compStressCompile(STRESS_OPT_BOOLS_COMPARE_CHAIN_COST, 25)
compStressCompile(STRESS_OPT_BOOLS_GC, 20)
compStressCompile(STRESS_OPT_BOOLS_GC, 50)
compStressCompile(STRESS_OPT_REPEAT, 10)
compStressCompile(STRESS_PHYSICAL_PROMOTION, 25)
compStressCompile(STRESS_PHYSICAL_PROMOTION_COST, 25)
compStressCompile(STRESS_POISON_IMPLICIT_BYREFS, 25)
compStressCompile(STRESS_PROFILER_CALLBACKS, 5)
compStressCompile(STRESS_PROMOTE_FEWER_STRUCTS, 50)
compStressCompile(STRESS_RANDOM_INLINE, 50)
compStressCompile(STRESS_REVERSE_FLAG, 60)
compStressCompile(STRESS_SPLIT_TREES_RANDOMLY, 10)
compStressCompile(STRESS_SPLIT_TREES_REMOVE_COMMAS, 10)
compStressCompile(STRESS_SSA_INFO, 20)
compStressCompile(STRESS_STORE_BLOCK_UNROLLING, 50)
compStressCompile(STRESS_SWITCH_CMP_BR_EXPANSION, 50)
compStressCompile(STRESS_TAILCALL, 5)
compStressCompile(STRESS_UNROLL_LOOPS, 50)
compStressCompile(STRESS_UNSAFE_BUFFER_CHECKS, 25)
compStressCompile(STRESS_UNWIND, 10)
compStressCompile(STRESS_UNWIND, 5)
compStressCompile(STRESS_VN_BUDGET, 50)

@jakobbotsch
Copy link
Member

Doesn't it mean it's impossible to reliably create test cases with those stress modes enabled in the required way? You are forced to rely on randomness that easily breaks with test refactorings (as we have seen with a number of test cases using JitStress before). It seems like we should split these different calls into separate stress modes so that they can be reliably controlled when we need to turn stress failures into test cases.

@BruceForstall
Copy link
Member Author

You're suggesting that, given the following calls, say:

compStressCompile(STRESS_GENERIC_VARN, 15)
compStressCompile(STRESS_GENERIC_VARN, 20)

if a test repro requires the 2nd but not the first, then you can't just use DOTNET_JitStressModeNames=STRESS_GENERIC_VARN.

If we got rid of weights and created new stress modes for any cases that currently use the same mode with different weights, that would solve the problem.

@jakobbotsch
Copy link
Member

You're suggesting that, given the following calls, say:

compStressCompile(STRESS_GENERIC_VARN, 15)
compStressCompile(STRESS_GENERIC_VARN, 20)

if a test repro requires the 2nd but not the first, then you can't just use DOTNET_JitStressModeNames=STRESS_GENERIC_VARN.

If we got rid of weights and created new stress modes for any cases that currently use the same mode with different weights, that would solve the problem.

Right... My mental model was always that stress modes are binary things, either enabled or disabled, and that the weight you pass to compStressCompile indicates the fraction of compilations in which the stress mode should be enabled. So the weight is really associated with the stress mode, not the call site, and it doesn't really make sense to use different weights in different call sites (which sort of end up meaning that the same stress mode is both enabled and disabled).

Perhaps something to fix up if we ever end up in the situation where this matters...

BruceForstall added a commit to BruceForstall/runtime that referenced this pull request Mar 22, 2024
@github-actions github-actions bot locked and limited conversation to collaborators Apr 23, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants