sys/tlsf: move heap initialization for tlsf-malloc to auto-init #12021
This looks reasonable.
The application should not need to bother with initializing the memory allocator.
@haukepetersen This may not be enough: there is code that runs before auto init. The standard library itself could be using malloc for one-off allocation of buffers. See #4490, #5796. Also see my PoC branch tlsf-init-experiments and in particular this file.
Is this really a concern though? If it did, it would crash already without this patch, just by using
@benpicco it sort of does, see #5796 (comment) Of course, this patch is better than nothing. Still the right thing would be to do this before the libc initialization:
Yes, it is correct that this PR does not help for any Including the
I'm working on a solution based on my previous work. I will include a test too.
@@ -20,6 +20,11 @@

#include "auto_init.h"

#ifdef MODULE_TLSF_MALLOC
#include "tlsf-malloc.h"
static uint8_t _tlsf_pool[TLSF_MALLOC_HEAPSIZE];
On platforms other than native this can be derived from the _sheap and _eheap linker symbols, which has the advantage of automatically using all available heap.
@@ -93,6 +98,10 @@

void auto_init(void)
{
#ifdef MODULE_TLSF_MALLOC
    /* NOTE: this should be initialized before any other modules */
This is right, but is not enough. It should be initialized before the C library itself (that is what .preinit_array is for).
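A minimal sketch of that ordering, assuming a GCC or Clang toolchain producing ELF binaries. The names `demo_pool`, `demo_init_pool`, `demo_preinit_entry`, `demo_require_pool` and `DEMO_HEAPSIZE` are hypothetical and not part of this PR; a real initializer would hand the memory to TLSF instead of setting a flag.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical pool size; the PR uses TLSF_MALLOC_HEAPSIZE instead. */
#define DEMO_HEAPSIZE 4096

static uint8_t demo_pool[DEMO_HEAPSIZE];
static int demo_pool_ready;   /* set once the pool has been set up */
static int demo_init_calls;   /* counts how often the setup really ran */

/* Stand-in for the allocator setup. It is idempotent, so a later call from
 * auto_init() or from the application is harmless. */
void demo_init_pool(void)
{
    if (demo_pool_ready) {
        return;
    }
    demo_pool[0] = 0;         /* placeholder for handing demo_pool to TLSF */
    demo_pool_ready = 1;
    demo_init_calls++;
}

/* Function pointers placed in .preinit_array are called by the startup code
 * before the ordinary constructors in .init_array run. */
__attribute__((used, section(".preinit_array")))
static void (*demo_preinit_entry)(void) = demo_init_pool;

/* Stand-in for an allocation entry point: the pool must already exist. */
void *demo_require_pool(void)
{
    assert(demo_pool_ready);
    return demo_pool;
}

int demo_pool_is_ready(void)   { return demo_pool_ready; }
int demo_pool_init_calls(void) { return demo_init_calls; }
```

Making the initializer idempotent keeps an explicit call from auto-init or from an application safe even when the early hook has already run.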
Sounds good. Are you planning to PR this anytime soon (like this week)? If so, there is no need to keep this PR...
Yes, this week. I already got it working on Friday, but I'm trying to figure out how to specify how much heap to use, maybe you can help me. I'm using Heap grows upwards and stack downwards and they eventually meet and when this happens things go boom. Keep in mind tlsf does not have any way to grow a memory pool, only to add new ones.
IMO we should do this without
As discussed offline, this is not the case in RIOT, all stacks are statically allocated at runtime.
This test is currently failing because of RIOT-OS#4490, RIOT-OS#5796 and RIOT-OS#12021.

When using TLSF as the system allocator it should be initialized:
- Automatically, as that is what the user expects.
- Early in the boot process, since the C library mallocs internal buffers.

Failing to do so will lead to a crash, as the issues and this test show.

The test is blacklisted and will be whitelisted in the next commit with the fix.
The TLSF allocator needs to be initialized before use. This is an issue when it is used as the default system allocator, since the user expects to be able to call malloc right away. This is made worse by the fact that the C library uses malloc internally to create buffers, and that may happen before the user's code has a chance to run. As a consequence, even calling printf when using USEMODULE=tlsf-malloc will lead to a crash.

A mechanism is needed to:
1. Initialize the pool early.
2. Determine which memory should be used as a heap and reserve it.

Issue (1) is solved by adding the initializer to the C library's `.preinit_array`, which is a cross-file array of function pointers that run before the library is initialized, that is, before _init(). See the newlib source code for more details.

Point (2) is important because TLSF does not support growing the pool, only adding new ones. We would like to initialize it with a pool as big as possible.

In native, (2) is handled by defining a static array of fixed size (given by TLSF_NATIVE_HEAPSIZE). Memory is plentiful in native and we don't care about the overhead of zeroing out this array.

On embedded targets using newlib (this may work on other platforms, I only tested ARM), `sbrk()` is used to find the start of the heap and reserve it, and the `_eheap` linker symbol is used to determine the end of the usable heap. An array is a bad choice here because its size would be board dependent and hard to determine without build-system magic, and because it would be zeroed by default, making the boot sequence much longer.

sbrk() does nothing more than move a pointer that marks the fraction of the space between _sheap and _eheap that is reserved. Since we are using the whole heap, it might be tempting to just use the symbols to derive the pool location and size and to sidestep sbrk(), especially since the memory allocation functions are expected to be the only users of such a feature.

That "trick" would make the OS impossible to debug in case there was a mistake and some of the original allocation functions slipped through without being overridden. If sbrk is used to reserve the entirety of the space, then that rogue function will try to call it and fail, as no more heap is available. In fact, this is how I found out that I was overriding the wrong functions (put a breakpoint in sbrk and show a traceback). If sbrk is sidestepped, one would get nasty memory corruption errors that are impossible to debug.

A third option could be to use the heap space directly and not define sbrk. This is beyond the scope of this change, but is probably the route to go for platforms that do not define this call (but first do a thorough investigation of how the libc works on that platform).

Messing with the global system allocator is not an easy thing to do. I would say that tlsf-malloc is ATM _only_ supported in native and cortex-m.

Testing procedure: Run `tests/pkg_tlsf_malloc`.

Fixes: RIOT-OS#4490, RIOT-OS#5796.
Closes: RIOT-OS#12021
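The reservation argument above can be illustrated with a small host-side simulation. This is only a sketch under stated assumptions: `demo_heap`, `demo_sbrk` and `demo_reserve_all` are hypothetical names, the array stands in for the region between `_sheap` and `_eheap`, and, unlike a real `sbrk()`, this simplified version rejects negative increments.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simulated heap region; on a real target the bounds would come from the
 * _sheap and _eheap linker symbols. */
static uint8_t demo_heap[1024];
static uint8_t *demo_brk = demo_heap;   /* current break, starts at heap start */

/* Hypothetical sbrk(): moves the break and fails once the heap is used up.
 * Simplification: negative increments are rejected. */
void *demo_sbrk(ptrdiff_t incr)
{
    uint8_t *heap_end = demo_heap + sizeof(demo_heap);
    uint8_t *old = demo_brk;

    if (incr < 0 || incr > heap_end - demo_brk) {
        return (void *)-1;               /* same failure value as POSIX sbrk */
    }
    demo_brk += incr;
    return old;
}

/* Reserve everything between the current break and the end of the heap and
 * report the region, so it can be handed to the allocator as a single pool.
 * Returns the pool size in bytes, or 0 on failure. */
size_t demo_reserve_all(void **pool_start)
{
    assert(pool_start != NULL);

    uint8_t *start = demo_sbrk(0);       /* current break = start of free heap */
    size_t size = (size_t)((demo_heap + sizeof(demo_heap)) - start);

    if (demo_sbrk((ptrdiff_t)size) == (void *)-1) {
        return 0;
    }
    *pool_start = start;
    return size;
}
```

After `demo_reserve_all()` has claimed the region, any leftover caller that still goes through `demo_sbrk()` gets the failure value instead of silently sharing memory with the pool, which is the debugging aid the commit message relies on.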
I'm not bold enough to make that claim 😁.
Contribution description
Currently, when `tlsf-malloc` is used, the initialization of the heap memory is left to the application. This poses a problem when modules/packages use `malloc` and are initialized from auto-init: `tlsf-malloc` crashes, as no memory has been allocated to it at that point.

This PR solves this issue by moving the buffer allocation for `tlsf-malloc` into auto-init, before any other modules are initialized. The buffer size is determined by `TLSF_MALLOC_HEAPSIZE`, which is set to 4k by default but can simply be redefined from the build system...

Testing procedure
As of now, `examples/ccn-lite-relay` is the only application in RIOT using `tlsf-malloc`. For verification, simply compile this PR and compare the used memory with master; it should be identical. Next you can alter the value for `TLSF_MALLOC_HEAPSIZE`, and the RAM usage of the build should change by the difference between the new value and 10240.

Reproducing the crash behavior pointed out above is slightly more difficult, as the code is not yet PRed. I found it when using NimBLE and CCN-lite at the same time, as NimBLE has some rare `malloc` calls in its initialization code, and those fail once `tlsf-malloc` is used...

Issues/PRs references
See issues #4490, #5796.