Test failure: JIT/opt/Vectorization/UnrollEqualsStartsWIth/UnrollEqualsStartsWIth.dll #86265
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch

Issue details:

coreclr windows x86 Checked jitstress2_jitstressregs0x1000 @ Windows.10.Amd64.Open
https://dev.azure.com/dnceng-public/public/_build/results?buildId=272907&view=ms.vss-test-web.build-test-results-tab

Looks like the failure of this test brought down the whole run.

A different run, coreclr windows x86 Checked jitstress2_jitstressregs1 @ Windows.10.Amd64.Open, crashed.

Another failure: coreclr windows x86 Checked jitstress2_jitstressregs8 @ Windows.10.Amd64.Open
https://dev.azure.com/dnceng-public/public/_build/results?buildId=273272&view=ms.vss-test-web.build-test-results-tab

@markples Another issue with merging? cc @EgorBo

---
Note: this test may be quite slow in stress modes, although it used to work fine.

---
Do you think there is a JitStress/merged-test-group interaction here? This test is interesting: it internally has its own merged-test-group implementation and apparently contains 127512 tests. They are executed via reflection, so a normal failure would only terminate one of the internal (sub-)batches, but a crash takes out the entire group. This infrastructure seems much heavier than the new wrapper code. I think converting this to xunit-style tests would reduce the code, though a 100k+ test group would be a new stress on the system. To be clear, I'm not suggesting any of that as a fix for the test failure - just something I noticed while looking at the test.

---
This has been inconsistent to repro across configurations. I have gotten it with -c Checked -lc Release and only in the merged group (and not in a "merged" group with just the one test). Visual Studio shuts down when this test is executed (and the report-bug mechanism isn't properly searching for matching bugs for me). It isn't reproing under windbg. There are a number of exceptions that are thrown and caught throughout the merged group - divide-by-zeros and access violations (for null). [edit - part of the above was incorrect.] It seems like there is an actual problem even if merged groups are exposing it, though since it is jitstress it might be a jitstress issue and not a product one.

---
The actual failure, which the logs don't seem to capture (or maybe I've ended up in a different configuration?), is an assertion failure at src/coreclr/vm/siginfo.cpp line 4928:

```cpp
void PromoteCarefully(promote_func   fn,
                      PTR_PTR_Object ppObj,
                      ScanContext*   sc,
                      uint32_t       flags /* = GC_CALL_INTERIOR*/ )
{
    LIMITED_METHOD_CONTRACT;

    //
    // Sanity check that the flags contain only these three values
    //
    assert((flags & ~(GC_CALL_INTERIOR|GC_CALL_PINNED)) == 0);

    //
    // Sanity check that GC_CALL_INTERIOR FLAG is set
    //
    assert(flags & GC_CALL_INTERIOR);

#if !defined(DACCESS_COMPILE)
    //
    // Sanity check the stack scan limit
    //
>>  assert(sc->stack_limit != 0);  <<
```

This suggests that the difficulty reproing it might have to do with GC timing. Does jitstress do anything odd with GC (triggering, reporting, etc.)? I.e., is it likely a jitstress or product issue? I had previously seen a bit of a pattern where I can get the failure to happen the first time after building in src/tests but not again after that. It wasn't 100%, and it seems like confusing information since the test runs for a few seconds before failing (which makes some sort of cold start seem less likely). It's certainly an interesting assertion to hit, though.

---
It appears to hit with multiple different JitStressRegs values. Maybe it does, or would, hit without JitStressRegs set if run enough? It also appears to hit only with JitStress=2. JitStress doesn't do anything specific related to GC: it doesn't trigger GC more often, though it might change methods between fully and partially interruptible. GC reporting changes because the generated code changes.

---
Whether the "first time after building" observation is accurate or not, I was able to use it to get a case in windbg. Below is the stack trace. Note that, at first glance anyway, this doesn't look like a typical GC bug (misreporting, heap corruption, etc.), as the assertion is about the stack limit.

---
Maybe @dotnet/gc or @janvorli have insight?

---
@markples if you can share the dump and the contents of artifacts\tests\coreclr\Windows.x64.XXXXX\Tests\Core_Root, I can take a look.

---
Thanks @janvorli. I've copied a few key points from the discussion above:

---
b539509 has the same symptoms but seems to be equally difficult to repro.

---
I can see that the issue is caused by the topmost explicit frame of the thread that GC is scanning being an inactive InlinedCallFrame. So it seems the code getting the stack_limit is wrong (runtime/src/coreclr/vm/gcenv.ee.cpp, lines 117 to 126 in 29d485c).
It seems it should look like this:

```cpp
Frame* pTopFrame = pThread->GetFrame();
Object ** topStack = (Object **)pTopFrame;
if (InlinedCallFrame::FrameHasActiveCall(pTopFrame))
{
    // It is an InlinedCallFrame with active call. Get SP from it.
    InlinedCallFrame* pInlinedFrame = (InlinedCallFrame*)pTopFrame;
    topStack = (Object **)pInlinedFrame->GetCallSiteSP();
}
sc->stack_limit = (uintptr_t)topStack;
```

---
Why is this failing now? Did codegen for inlining change recently in a way that is now triggering this issue?

---
Maybe something (possibly related to the jitstress) has made GCs happen at places around pinvokes that were not reported as GC-safe before and now are.

---
I don't immediately see anything from scanning the log for pinvoke and inliner. #85975 by @jkotas touches the inliner but doesn't look like it was supposed to have any changes in behavior. @dotnet/jit-contrib, is anyone aware of changes around pinvokes?

Chronologically, it looks like the switch to merged test groups is a likely factor. Also, I was unable to repro the problem with merged test groups turned off, but the repro is flaky, so that's not conclusive. Merged test groups have two main effects: (1) there are a few more methods on the stack while tests are executing, and (2) other tests will have executed in the same process before a particular test is run. So one theory would be that the issue was already there but a GC was never triggered at that point because the individual test hadn't done enough allocation yet. I constructed a merged test group with just the single test, and it didn't repro, but again the flaky nature only makes this a partial case for (2) over (1).

The failure showed up in both this test and b539509 (two copies of it in different merged groups). Both have some arrays and exceptions, though I don't think the active part of b539509 is throwing immediately near the failure point (not sure about UnrollEqualsStartsWith).

---
@markples since you are able to repro the issue, would you be willing to test the change I've shared above to verify that it fixes the problem?

---
The issue hasn't reproed after several attempts. Thank you for the investigation and fix! For reference, here are my steps:

setup:

each iteration:

---
@janvorli I do not think that this fix is quite right. If the top InlinedCallFrame is inactive, it can be somewhere in the middle of the stack, and there can be a lot of managed code frames between the actual top of the stack and the stack_limit. Similarly, if the thread has no Frames, we will set the stack_limit to (Frame*)-1. We should either compute

---
Here is a repro that hits the assert reliably within a few seconds (compile with /o+, run with TieredCompilation=0 on an x86 debug or checked build):

---
We are reading an uninitialized value on the stack. The assert is triggered when the uninitialized value happens to be zero.
@jkotas you are right that the case when |
@jkotas I am unable to reproduce the problem using the repro that you've shared above. The only thing I am not sure about is what you meant by "compile with /o+". I am building it using
Btw, after looking at the original dump with the issue again, the real culprit is actually elsewhere. During the GC scan, we should have some frame other than an inactive inlined call frame on top of the stack. I could see that, besides the crash, the GC stack walk started in the middle of the managed stack.
It still repros for me. Here are the exact repro steps that I have just tried:

Result:

---
There is a problem with computing stack_limit in the presence of an inactive InlinedCallFrame. We were trying to call GetCallSiteSP on that frame to get the call-site SP, but when the inlined call frame is not active (there is no call to native code in progress), that method returns random junk.

But even if we checked for an active call before reading the value, it would not get a correct stack limit on x86 Windows when there was no active call and the inlined call frame was the topmost one. That can happen with hardware exception handling on Windows x86: on other targets we always have another explicit frame on top of the explicit frames stack, but on Windows x86 we don't use the FaultingExceptionFrame for hardware exceptions, so the inactive inlined call frame could be the topmost one when the GC starts to scan stack roots.

Since stack_limit was introduced to fix a Unix-specific problem, I have fixed that by disabling the stack_limit usage for x86 Windows.

Close dotnet#86265
Commits in the PR:

* Fix stack_limit handling (commit message as above)
* Add back the stack_limit field to keep the contract unchanged
* Reflect PR feedback - minimize the number of ifdefs
* Reflect PR feedback

Co-authored-by: Jan Vorlicek <janvorli@microsoft.com>