[mono][gc] Objects larger than LOS section size cause the LOS heap to grow without bound #97991

Closed
lambdageek opened this issue Feb 5, 2024 · 4 comments · Fixed by #98154

Comments


lambdageek commented Feb 5, 2024

Consider this program:

long bigger_than_los_section = 32 * 1024 * 1024;
long bigger_than_small_obj = 100 * 1024;
long small_obj = 50;

long alloc_size = bigger_than_small_obj;

long cnt = 0;
long[] data = new long[1024*1024*32];
var end = DateTime.Now + TimeSpan.FromMilliseconds(10000);
do
{
    // these are just dummy computations to keep CPU busy
    Interlocked.Increment(ref cnt);
    data = new long[alloc_size];
    data[0] = cnt;
    // force some GC
    if(cnt % 10000 == 0)
    {
        GC.Collect();
        var total = GC.GetTotalMemory(true);
        Console.WriteLine("GC.GetTotalMemory: "+total);
    }
}
while(DateTime.Now<end);

If alloc_size is small_obj, the LOS size stays constant, as we would expect:

$ MONO_LOG_MASK=gc MONO_LOG_LEVEL=debug dotnet run
...
Mono: GC_MAJOR: (user request) time 0.05ms, stw 0.08ms los size: 1024K in use: 24K
...
Mono: GC_MAJOR: (user request) time 0.05ms, stw 0.06ms los size: 1024K in use: 24K

If alloc_size is bigger_than_small_obj, the LOS size also stays constant:

$ MONO_LOG_MASK=gc MONO_LOG_LEVEL=debug dotnet run
...
Mono: GC_MAJOR_CONCURRENT_FINISH: (finishing) time 0.23ms, stw 0.12ms los size: 3072K in use: 1625K
...
Mono: GC_MAJOR_CONCURRENT_FINISH: (finishing) time 0.24ms, stw 0.14ms los size: 3072K in use: 1625K

If alloc_size is bigger_than_los_section (32 * 1024 * 1024 longs, i.e. a 256 MB payload), the LOS heap size grows without bound, even though the new long[alloc_size] is never actually placed in a LOS section (!):

$ MONO_LOG_MASK=gc MONO_LOG_LEVEL=debug dotnet run
...
Mono: GC_MAJOR_CONCURRENT_FINISH: (finishing) time 0.20ms, stw 0.13ms los size: 525344K in use: 524313K
...
Mono: GC_MAJOR_CONCURRENT_FINISH: (finishing) time 0.19ms, stw 0.13ms los size: 787504K in use: 786457K
...
Mono: GC_MAJOR_CONCURRENT_FINISH: (finishing) time 0.14ms, stw 0.09ms los size: 1049664K in use: 1048601K
...
Mono: GC_MAJOR_CONCURRENT_FINISH: (finishing) time 0.20ms, stw 0.14ms los size: 1573984K in use: 1572889K
...

This showed up in a MT WASM test #97970


ghost commented Feb 5, 2024

Tagging subscribers to this area: @BrzVlad
See info in area-owners.md if you want to be subscribed.

Issue Details

I believe this is due to this code:

sgen_ensure_free_space (size, GENERATION_OLD);
#ifdef USE_MALLOC
obj = g_malloc (size + sizeof (LOSObject));
memset (obj, 0, size + sizeof (LOSObject));
#else
if (size > LOS_SECTION_OBJECT_LIMIT) {
	size_t obj_size = size + sizeof (LOSObject);
	int pagesize = mono_pagesize ();
	size_t alloc_size = SGEN_ALIGN_UP_TO (obj_size, pagesize);
	if (sgen_memgov_try_alloc_space (alloc_size, SPACE_LOS)) {
		obj = (LOSObject *)sgen_alloc_os_memory (alloc_size, (SgenAllocFlags)(SGEN_ALLOC_HEAP | SGEN_ALLOC_ACTIVATE), NULL, MONO_MEM_ACCOUNT_SGEN_LOS);
		if (obj) {
			sgen_los_memory_usage_total += alloc_size;
			obj = randomize_los_object_start (obj, obj_size, alloc_size, pagesize);
		}
	}
} else {
	obj = get_los_section_memory (size + sizeof (LOSObject));

We always call sgen_ensure_free_space, even when size is bigger than LOS_SECTION_OBJECT_LIMIT, so we may keep asking the OS for more memory even though the request can never fit into that memory anyway, and then we never return those allocations to the OS.
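
A minimal sketch of the kind of guard that suggests, reusing only the identifiers from the snippet above (illustrative only; the actual fix in #98154 may look different):

	/* Hypothetical rearrangement: only ask for section headroom when the object
	 * will actually live in a LOS section. Huge objects get their own
	 * page-aligned OS allocation, so growing the section space cannot help. */
	if (size > LOS_SECTION_OBJECT_LIMIT) {
		size_t obj_size = size + sizeof (LOSObject);
		int pagesize = mono_pagesize ();
		size_t alloc_size = SGEN_ALIGN_UP_TO (obj_size, pagesize);
		if (sgen_memgov_try_alloc_space (alloc_size, SPACE_LOS)) {
			obj = (LOSObject *)sgen_alloc_os_memory (alloc_size, (SgenAllocFlags)(SGEN_ALLOC_HEAP | SGEN_ALLOC_ACTIVATE), NULL, MONO_MEM_ACCOUNT_SGEN_LOS);
			/* ... accounting and randomize_los_object_start as before ... */
		}
	} else {
		/* only the section-backed path needs the free-space check */
		sgen_ensure_free_space (size, GENERATION_OLD);
		obj = get_los_section_memory (size + sizeof (LOSObject));
	}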

This showed up in a MT WASM test #97970

Author: lambdageek
Assignees: -
Labels: area-GC-mono
Milestone: -


lambdageek commented Feb 5, 2024

It's weird that we keep printing a larger and larger LOS size: sgen_los_free_object does sgen_los_memory_usage_total -= size; in the case where size > LOS_SECTION_OBJECT_LIMIT.
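
Paraphrasing that free path (this is not a verbatim quote of sgen-los.c, just the shape the sentence above describes):

	/* huge objects are handed back to the OS and the bookkeeping counter is
	 * reduced, so in principle the reported total should not keep growing */
	if (size > LOS_SECTION_OBJECT_LIMIT) {
		/* ... release the OS memory backing the object ... */
		sgen_los_memory_usage_total -= size;
	}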


I thought maybe the problem was in the call to sgen_ensure_free_space before the size > LOS_SECTION_OBJECT_LIMIT check in this code:

sgen_ensure_free_space (size, GENERATION_OLD);
#ifdef USE_MALLOC
obj = g_malloc (size + sizeof (LOSObject));
memset (obj, 0, size + sizeof (LOSObject));
#else
if (size > LOS_SECTION_OBJECT_LIMIT) {
	size_t obj_size = size + sizeof (LOSObject);
	int pagesize = mono_pagesize ();
	size_t alloc_size = SGEN_ALIGN_UP_TO (obj_size, pagesize);
	if (sgen_memgov_try_alloc_space (alloc_size, SPACE_LOS)) {
		obj = (LOSObject *)sgen_alloc_os_memory (alloc_size, (SgenAllocFlags)(SGEN_ALLOC_HEAP | SGEN_ALLOC_ACTIVATE), NULL, MONO_MEM_ACCOUNT_SGEN_LOS);
		if (obj) {
			sgen_los_memory_usage_total += alloc_size;
			obj = randomize_los_object_start (obj, obj_size, alloc_size, pagesize);
		}
	}
} else {
	obj = get_los_section_memory (size + sizeof (LOSObject));

But I don't see sgen_ensure_free_space actually asking the memory governor for more space, so I guess it's something in sgen-los itself.


lambdageek commented Feb 6, 2024

Update: with MONO_ENV_OPTIONS=--interp the LOS size climbs to a higher point and then oscillates there:

$ MONO_ENV_OPTIONS=--interp MONO_LOG_MASK=gc MONO_LOG_LEVEL=debug dotnet run
...
Mono: GC_MAJOR_CONCURRENT_FINISH: (finishing) time 0.21ms, stw 0.07ms los size: 525344K in use: 524313K
...
Mono: GC_MAJOR_CONCURRENT_FINISH: (finishing) time 0.23ms, stw 0.16ms los size: 787504K in use: 786457K
...
Mono: GC_MAJOR_CONCURRENT_FINISH: (finishing) time 0.23ms, stw 0.17ms los size: 2098304K in use: 2097177K
...
Mono: GC_MAJOR_CONCURRENT_FINISH: (finishing) time 0.23ms, stw 0.16ms los size: 2360464K in use: 2359321K
...
Mono: GC_MAJOR_CONCURRENT_FINISH: (finishing) time 0.24ms, stw 0.18ms los size: 2360464K in use: 2359321K
...
Mono: GC_MAJOR_CONCURRENT_FINISH: (finishing) time 0.22ms, stw 0.16ms los size: 2098304K in use: 2097177K
...
Mono: GC_MAJOR_CONCURRENT_FINISH: (finishing) time 0.24ms, stw 0.17ms los size: 1836144K in use: 1835033K
...
Mono: GC_MAJOR_CONCURRENT_FINISH: (finishing) time 0.25ms, stw 0.18ms los size: 2360464K in use: 2359321K
...
Mono: GC_MAJOR_CONCURRENT_FINISH: (finishing) time 0.24ms, stw 0.17ms los size: 2360464K in use: 2359321K
...
Mono: GC_MAJOR_CONCURRENT_FINISH: (finishing) time 0.23ms, stw 0.16ms los size: 2098304K in use: 2097177K

With the interpreter it climbs higher and then oscillates around 2098304K.
This is on osx-arm64 with the .NET 8.0.1 SDK.


I think it's a problem with the test case. This variant runs with a constant LOS size:

using System.Runtime.CompilerServices;

long bigger_than_los_section = 32 * 1024 * 1024;
long bigger_than_small_obj = 100 * 1024;
long small_obj = 50;

long alloc_size = bigger_than_los_section;

long cnt = 0;
long[] data = null;
AssignData(ref data, alloc_size);
var end = DateTime.Now + TimeSpan.FromMilliseconds(10000);
do
{
    // these are just dummy computations to keep CPU busy
    Interlocked.Increment(ref cnt);
    AssignData(ref data, alloc_size);
    data[0] = cnt;
    // force some GC
    if(cnt % 10000 == 0)
    {
        GC.Collect();
        // GC.WaitForPendingFinalizers();
        // GC.Collect();
        // GC.WaitForPendingFinalizers();
        var total = GC.GetTotalMemory(true);
        Console.WriteLine("GC.GetTotalMemory: "+total);
    }
}
while(DateTime.Now<end);


[MethodImpl(MethodImplOptions.NoInlining)]
static void AssignData (ref long[] d, long size, int recurse = 20)
{
    if (recurse >= 0) {
        AssignData(ref d, size, recurse - 1);
    } else {
        d = new long[size];
    }
}
...
Mono: GC_MAJOR_CONCURRENT_FINISH: (finishing) time 0.22ms, stw 0.14ms los size: 525344K in use: 524313K
...
Mono: GC_MAJOR_CONCURRENT_FINISH: (finishing) time 0.14ms, stw 0.08ms los size: 1049664K in use: 1048601K
...
Mono: GC_MAJOR_CONCURRENT_FINISH: (finishing) time 0.23ms, stw 0.16ms los size: 787504K in use: 786457K
...
Mono: GC_MAJOR_CONCURRENT_FINISH: (finishing) time 0.23ms, stw 0.17ms los size: 787504K in use: 786457K
...
Mono: GC_MAJOR_CONCURRENT_FINISH: (finishing) time 0.23ms, stw 0.17ms los size: 787504K in use: 786457K

It starts a bit lower, spikes, and then reaches a steady state of 787504K.

@pavelsavara @BrzVlad


lambdageek commented Feb 6, 2024

I can get the interpreter to hold steady at the same LOS size as the JIT if I change the recursion depth to 200 instead of 20:

Mono: GC_MAJOR_CONCURRENT_FINISH: (finishing) time 0.22ms, stw 0.13ms los size: 787504K in use: 786457K
...
Mono: GC_MAJOR_CONCURRENT_FINISH: (finishing) time 0.20ms, stw 0.12ms los size: 787504K in use: 786457K
