Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Revert "Don't expose guard pages to malloc_stack API consumers" #55555

Merged
merged 3 commits into from
Aug 29, 2024

Conversation

vtjnash
Copy link
Sponsor Member

@vtjnash vtjnash commented Aug 21, 2024

Reverts #54591

This cause the runtime to misbehave and crash, since all of the consumers of this information in the runtime assumed that the guard pages are accounted for correctly as part of the reserved allocation. Nothing in the runtime ever promised that it is valid to access the pages beyond the current redzone (indeed, ASAN would forbid it as well).

@vtjnash vtjnash added the backport 1.11 Change should be backported to release-1.11 label Aug 21, 2024
@lgoettgens
Copy link
Contributor

Cc @fingolfin

@giordano giordano added the kind:revert This reverts a previously merged PR. label Aug 21, 2024
@fingolfin
Copy link
Contributor

I have a hard time understanding the PR description, probably because I am misunderstanding something fundamental, sorry :-/. So some questions:

This cause the runtime to misbehave and crash, since all of the consumers of this information in the runtime

What are these consumers? The description makes it sound as if everything should burn and explode, but at least the Julia test suite seemed to pass.

assumed that the guard pages are accounted for correctly as part of the reserved allocation.

I guess what is "correct" depends on the definition of the interface. I read this as saying: "the semantics of malloc_stack and free_stack changed, but some code relied on the old semantics and got broken by the patch". Is that it?

Nothing in the runtime ever promised that it is valid to access the pages beyond the current redzone (indeed, ASAN would forbid it as well).

I have trouble following you at all here. Access which pages? Which redzones (is that ASAN terminology)?

Anyway: if this is reverted, then I'll dust of my old alternative patch, which modifies jl_active_task_stack to remove the guard page.

Just to explain, the goal of the removed patch was to to simplify scanning task stacks (for a conservative GC) for references to objects. For that we want to stop before hitting the guard page.

@fingolfin
Copy link
Contributor

(To be clear, I am not objecting to the reversal, I just would like to understand to make sure there is not some deeper issue in my stack scanning code)

@vtjnash
Copy link
Sponsor Member Author

vtjnash commented Aug 22, 2024

The task scanning should stop once it hits the current stack pointer, and never go all of the way to reaching these addresses. The code elsewhere detects if faults occur on the stack and treats them differently than other segfaults.

@vtjnash
Copy link
Sponsor Member Author

vtjnash commented Aug 22, 2024

The redzone is the number of bytes past the current stack pointer that are also used by the function (https://en.wikipedia.org/wiki/Red_zone_(computing))

@JeffBezanson
Copy link
Sponsor Member

Yes, I agree --- you should only scan up to the actual stack pointer, and excluding guard pages was just a hack to attempt to make scanning work without knowing the stack pointer.

@JeffBezanson
Copy link
Sponsor Member

Also,
55555
55555
55555
55555
55555
!

@fingolfin
Copy link
Contributor

We'd be happy to only scan to the stack pointer, but AFAIK so far it isn't available, or is it now? We've performed the conservative scanning of task stacks for many years now, and it worked well enough.
That said, if the SP is available (how?), then from my perspective ideally jl_active_task_stack would be adjusted to use it to adjust its return values

@KristofferC KristofferC mentioned this pull request Aug 26, 2024
33 tasks
@vtjnash
Copy link
Sponsor Member Author

vtjnash commented Aug 29, 2024

All the code in base (e.g. the unmap here and signal handling elsewhere) wants to know the full size of the memory region being managed, based upon the observed need to add jl_guard_size to every use of the size and base pointer of the region (lines diff here)

@vtjnash vtjnash merged commit e921dc8 into master Aug 29, 2024
5 of 8 checks passed
@vtjnash vtjnash deleted the revert-54591-mh/guard_pages branch August 29, 2024 18:12
void* stk = mmap(0, bufsz, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
if (stk == MAP_FAILED)
return MAP_FAILED;

#ifdef JL_USE_GUARD_PAGE
Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This used to be #if !defined(JL_HAVE_UCONTEXT) && !defined(JL_HAVE_SIGALTSTACK) before #54591

Copy link
Sponsor Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, those changed in a subsequent PR that I didn't want to revert

KristofferC pushed a commit that referenced this pull request Sep 12, 2024
Reverts #54591

This cause the runtime to misbehave and crash, since all of the
consumers of this information in the runtime assumed that the guard
pages are accounted for correctly as part of the reserved allocation.
Nothing in the runtime ever promised that it is valid to access the
pages beyond the current redzone (indeed, ASAN would forbid it as well).
KristofferC added a commit that referenced this pull request Sep 17, 2024
Backported PRs:
- [x] #55480 <!-- Fix push! for OffsetVectors, add tests for push! and
append! on AbstractVector -->
- [x] #55443 <!-- Add test for upper/lower/titlecase and fix call -->
- [x] #55524 <!-- Set `.jl` sources as read-only during installation -->
- [x] #55500 <!-- make jl_thread_suspend_and_get_state safe -->
- [x] #55506 <!-- Fix indexing in _mapreducedim for OffsetArrays -->
- [x] #55564 <!-- Empty out loaded_precompiles dict instead of asserting
it's empty. -->
- [x] #55567 <!-- Initialize threadpools correctly during sysimg build
-->
- [x] #55596 <!-- Fast bounds-check for CartesianIndex ranges -->
- [x] #55605 <!-- Reroute Symmetric/Hermitian + Diagonal through
triangular -->
- [x] #55640 <!-- win: move stack_overflow_warning to the backtrace
fiber -->
- [x] #55715 <!-- Add precompile signatures to Markdown to reduce
latency. -->
- [x] #55593 <!-- Fix invalidations for FileIO -->
- [x] #55555 <!-- Revert "Don't expose guard pages to malloc_stack API
consumers" -->
- [x] #55720 <!-- Fix `pkgdir` for extensions -->
- [x] #55729 <!-- Avoid confounding compilation side effects of
`@time_imports` -->
- [x] #55718 <!-- Fix `@time_imports` extension recognition -->
- [x] #55522 <!-- Fix tr for Symmetric/Hermitian block matrices -->

Contains multiple commits, manual intervention needed:
- [ ] #55509 <!-- Fix cong implementation to be properly random and not
just cycling. -->

Non-merged PRs with backport label:
- [ ] #55641 <!-- fall back to slower stat filesize if optimized
filesize fails -->
- [ ] #55534 <!-- Set stdlib sources as read-only during installation
-->
- [ ] #55499 <!-- propagate the terminal's `displaysize` to the
`IOContext` used by the REPL -->
- [ ] #55458 <!-- Allow for generically extracting unannotated string
-->
- [ ] #55457 <!-- Make AnnotateChar equality consider annotations -->
- [ ] #55453 <!-- Privatise the annotations API, for StyledStrings -->
- [ ] #55355 <!-- relocation: account for trailing path separator in
depot paths -->
- [ ] #55220 <!-- `isfile_casesensitive` fixes on Windows -->
- [ ] #55169 <!-- `propertynames` for SVD respects private argument -->
- [ ] #54457 <!-- Make `String(::Memory)` copy -->
- [ ] #53957 <!-- tweak how filtering is done for what packages should
be precompiled -->
- [ ] #51479 <!-- prevent code loading from lookin in the versioned
environment when building Julia -->
- [ ] #50813 <!-- More doctests for Sockets and capitalization fix -->
- [ ] #50157 <!-- improve docs for `@inbounds` and
`Base.@propagate_inbounds` -->
- [ ] #41244 <!-- Fix shell `cd` error when working dir has been deleted
-->
@KristofferC KristofferC mentioned this pull request Sep 24, 2024
30 tasks
@KristofferC KristofferC removed the backport 1.11 Change should be backported to release-1.11 label Sep 24, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind:revert This reverts a previously merged PR.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants