
Remove hard_limit_for_bookkeeping #77480

Merged · 5 commits · Dec 12, 2022

Conversation

PeterSolMS (Contributor)

As it turns out, it's not so easy to set aside a hard limit for the GC's bookkeeping, because some regions will be partially committed and so will need proportionally more bookkeeping information.

So this removes the hard_limit_for_bookkeeping and associated fields, and instead commits memory for the mark array eagerly, even if BGC is not in progress.
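For illustration, here is a minimal, self-contained sketch of the "commit the mark array eagerly" idea; the real change lives in gc.cpp and uses its existing helpers, so the stand-in types and the commit_mark_array_for_region helper below are hypothetical.

```cpp
// Sketch only: stand-in types; the actual implementation is in gc.cpp.
#include <cstdint>

struct heap_segment
{
    uint32_t flags = 0;
};

const uint32_t heap_segment_flags_ma_committed = 0x1;

// Hypothetical helper: commits the slice of the mark array covering this
// region; returns false if the commit fails (e.g. the hard limit is hit).
bool commit_mark_array_for_region (heap_segment* region);

bool on_new_region (heap_segment* region)
{
    // Previously the mark array slice was only committed when background GC
    // needed it; with this change it is committed as soon as the region is
    // created, even if no BGC is in progress.
    if ((region->flags & heap_segment_flags_ma_committed) == 0)
    {
        if (!commit_mark_array_for_region (region))
            return false;   // surfaces as out-of-memory to the caller
        region->flags |= heap_segment_flags_ma_committed;
    }
    return true;
}
```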

@ghost commented Oct 26, 2022

Tagging subscribers to this area: @dotnet/gc
See info in area-owners.md if you want to be subscribed.

Issue Details

As it turns out, it's not so easy to set aside a hard limit for the GC's bookkeeping, because some regions will be partially committed and so will need proportionally more bookkeeping information.

So this removes the hard_limit_for_bookkeeping and associated fields, and instead commits memory for the mark array eagerly, even if BGC is not in progress.

Author: PeterSolMS
Assignees: PeterSolMS
Labels: area-GC-coreclr

Milestone: -

@cshung (Member) left a comment


I liked this idea more than trying to set aside memory for bookkeeping. The one thing I am not sure about is that now we are committing the mark array for every region. If background GC is happening anyway, it might be fine because we will be committing them anyway, but what if we don't?

src/coreclr/gc/gcpriv.h (outdated review thread, resolved)
```cpp
dprintf (GC_TABLE_LOG, ("new seg %Ix, mark_array is %Ix",
                        heap_segment_mem (region), mark_array));
if (((region->flags & heap_segment_flags_ma_committed) == 0) &&
    !commit_mark_array_new_seg (__this, region))
```
Member

Shall we guard this under gc_can_use_concurrent?
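A sketch of what that guard might look like, applied to the snippet above (whether it is actually needed is what the next reply addresses, and the surrounding code in gc.cpp may differ):

```cpp
// Fragment only: same variables as the snippet above, with the suggested
// gc_can_use_concurrent guard added in front.
if (gc_can_use_concurrent &&
    ((region->flags & heap_segment_flags_ma_committed) == 0) &&
    !commit_mark_array_new_seg (__this, region))
{
    // commit of the mark array failed; handled as before
}
```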

Member

commit_mark_array_new_seg shouldn't do anything if concurrent is disabled. However, looking at the code, we are not initializing background_saved_highest_address in init_gc_heap, which we should. @PeterSolMS could you please add this (i.e., init background_saved_highest_address and background_saved_lowest_address to 0 in init_gc_heap like we do with WKS GC)? We could still guard this with gc_can_use_concurrent just to indicate our intention, but it's not essential.
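For reference, the requested initialization would be something along these lines inside init_gc_heap (fragment only; the exact placement and surrounding code differ):

```cpp
// Mirror what WKS GC does: start with no saved background GC range.
background_saved_lowest_address  = 0;
background_saved_highest_address = 0;
```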

PeterSolMS (Contributor, Author)

I just pushed changes with the initialization of background_saved_lowest_address and background_saved_highest_address, and the restoration of current_total_committed_bookkeeping.

@Maoni0 (Member) commented Oct 26, 2022

> If background GC is happening anyway, it might be fine because we will be committing them anyway, but what if we don't?

That is the drawback with doing this. But if concurrent GC is enabled, it's very likely we'll need to do a BGC at some point, so it's not too bad.

…ments back.

Initialize background_saved_lowest_address and background_saved_highest_address.

Removed unrelated bug fix which is addressed by a separate PR.
@PeterSolMS (Contributor, Author)

I think I found a hole in our logic - clear_region_info (called by return_free_region) will decommit the mark array under high memory load. When get_free_region tries to commit it again, we'll fail and get the same AV as before. Perhaps this means we'll have to keep the mark array committed, i.e. disable the logic that decommits the mark array under high memory load?

…info.

The issue is that the GC needs at least one free region, and it needs the mark array committed for that region. So when it tries to get a region at the end and we cannot commit the mark array, we cannot actually get a free region. Currently we AV in this case, but there really isn't a good option to recover at this point. So it's better to keep the mark array committed and perhaps fail with an OOM exception.
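A minimal sketch contrasting the two behaviors described here; all names below are illustrative stand-ins, not the actual gc.cpp code.

```cpp
// Sketch: returning a free region with and without decommitting its
// slice of the mark array.
struct region_info
{
    bool mark_array_committed = true;
};

// Old behavior (simplified): under high memory load, clear_region_info
// also decommitted the mark array; a later get_free_region then had to
// re-commit it, and if that commit failed we hit an access violation.
void return_free_region_old (region_info& r, bool high_memory_load)
{
    if (high_memory_load)
        r.mark_array_committed = false;
}

// New behavior (simplified): leave the mark array committed so the region
// stays immediately usable; memory exhaustion then surfaces as an OOM
// exception instead of an AV.
void return_free_region_new (region_info& /* r */)
{
    // mark array commit state is intentionally left untouched
}
```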
@PeterSolMS (Contributor, Author)

With this change, we get much closer to actually exhausting the available memory in hard limit cases than before.

In a test case with the hard limit set to 471,859,200 bytes, we only get to 0x13c1d000 = 331,468,800 committed bytes before running out of memory. The reason is that the memory set aside for bookkeeping is not sufficient, so we run into a situation where the mark array for a region cannot be committed.

With this change, we get to 471,842,816 committed bytes before running out. "!dumpheap -stat" reports 462,803,767 bytes in objects, of which 10,684,992 bytes are in free objects, so we have a total of 452,118,775 bytes in useful objects. 6,078,464 bytes are used for bookkeeping purposes (mainly for the mark array).
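(From the figures above: before the change we committed 331,468,800 / 471,859,200 ≈ 70% of the hard limit before failing; after the change we committed 471,842,816 / 471,859,200 ≈ 99.997% of it.)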

@frankbuckley (Contributor)

> With this change, we get much closer to actually exhausting the available memory in hard limit cases than before.
>
> In a test case with the hard limit set to 471,859,200 bytes, we only get to 0x13c1d000 = 331,468,800 committed bytes before running out of memory...
>
> With this change, we get to 471,842,816 committed bytes before running out...

Will this be backported to 7.0?

@ghost locked as resolved and limited conversation to collaborators on Jan 12, 2023
@mangod9 (Member) commented Apr 24, 2023

/backport to release/7.0-staging

@github-actions bot unlocked this conversation on Apr 24, 2023
@github-actions (Contributor)

Started backporting to release/7.0-staging: https://github.com/dotnet/runtime/actions/runs/4789151761

@github-actions (Contributor)

@mangod9 backporting to release/7.0-staging failed, the patch most likely resulted in conflicts:

$ git am --3way --ignore-whitespace --keep-non-patch changes.patch

Applying: As it turns out, it's not so easy to set aside a hard limit for the GC's bookkeeping, because some regions will be partially comitted and so will need more bookkeeping information as a percentage.
Using index info to reconstruct a base tree...
M	src/coreclr/gc/gc.cpp
M	src/coreclr/gc/gcpriv.h
Falling back to patching base and 3-way merge...
Auto-merging src/coreclr/gc/gcpriv.h
Auto-merging src/coreclr/gc/gc.cpp
CONFLICT (content): Merge conflict in src/coreclr/gc/gc.cpp
error: Failed to merge in the changes.
hint: Use 'git am --show-current-patch=diff' to see the failed patch
Patch failed at 0001 As it turns out, it's not so easy to set aside a hard limit for the GC's bookkeeping, because some regions will be partially comitted and so will need more bookkeeping information as a percentage.
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".
Error: The process '/usr/bin/git' failed with exit code 128

Please backport manually!

@github-actions (Contributor)

@mangod9 an error occurred while backporting to release/7.0-staging, please check the run log for details!

Error: git am failed, most likely due to a merge conflict.

@github-actions bot locked as resolved and limited conversation to collaborators on Apr 24, 2023

Successfully merging this pull request may close these issues.

Fail to get_free_region in thread_final_regions due to low memory for mark array might lead to AV.