Design a more general solution to memory management #378

Open
antocuni opened this issue Nov 11, 2021 · 0 comments
Labels
enhancement (New feature or request), heavydb (Related to heavydb server)


The current memory management design of rbc is prone to memory leaks and we should design a better way to deal with allocation of custom objects.
Unless I am mistaken, the current rbc type support is as follows:

  • primitive types such as ints and floats are supported. These don't require any memory allocation
  • more complex python types such as list, dict and str are not supported
  • the only supported types which require memory allocation are omnisci_backend.Buffer and its subclasses, such as Array.

In the following I will refer only to Array, but everything applies equally to other current and future types which require memory allocation.

Consider the following code:

from numba import types
from rbc.omnisci_backend import Array
@omnisci('float64[](void)')
def func():
    a = Array(10, types.float64)
    b = Array(10, types.float64)
    c = Array(10, types.float64)
    return c

The current strategy works as follows:

  1. the compiler searches for all the calls to Array and records the LLVM Value where the result of the allocation is stored. Conceptually, it's equivalent to remembering that the variables a, b, and c must eventually be freed
  2. it realizes that c is returned to the caller and must survive
  3. it inserts calls to free(a) and free(b) before returning

(Note: this is the theory. At the moment of writing, it is buggy and doesn't work, see #377)
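The net effect of steps 1-3 can be modeled in plain Python. This is a hypothetical sketch, where Array and free() are simple stand-ins for the runtime allocation/deallocation calls; the real pass of course operates at the LLVM IR level:

```python
# Hypothetical model of what the escape-analysis pass effectively produces.
# `allocations` plays the role of the list of recorded LLVM Values.

allocations = []

def Array(n, dtype):
    buf = {"size": n, "dtype": dtype}  # stand-in for a real buffer
    allocations.append(buf)            # step 1: record the allocation
    return buf

def free(buf):
    allocations.remove(buf)

def func():
    a = Array(10, "float64")
    b = Array(10, "float64")
    c = Array(10, "float64")
    # step 2: c escapes (it is returned), so it must survive;
    # step 3: free() is inserted for the non-escaping allocations
    free(a)
    free(b)
    return c

result = func()
assert allocations[0] is result  # only the returned Array survives
```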

This strategy works only for very simple cases, but it is fundamentally problematic and cannot be generalized to more complex ones. Solving this in general is basically equivalent to solving the halting problem, so we need a better way.

Normally, virtual machines solve the problem by using a runtime garbage collector and/or reference counting, but I don't think this is reasonable for our use case. Let me write down some ideas/proposals for how to solve the issue.

Proposal 0: improve the current algorithm

I call it proposal 0 because I don't think it's really an option, but I add it for completeness. We could try to improve our current algorithm to catch the most common use cases. Maybe it is even possible to detect unsupported use cases and emit a compile-time error in those cases.

Pros:

  • mostly transparent for the end user (when it works)

Cons:

  • hard to implement
  • compile-time errors which might be hard to track down and/or confusing for the user
  • if we fail to write our logic correctly, there is a high risk of causing memory leaks and/or crashes

Proposal 1: explicit memory management

We pretend that the language used to write user functions is Python, but actually it is a statically typed subset of it which is not far from C. So, one option is to stop pretending and let users explicitly free() their memory. Something like this:

from numba import types
from rbc.omnisci_backend import Array
@omnisci('float64[](void)')
def func():
    a = Array(10, types.float64)
    b = Array(10, types.float64)
    c = Array(10, types.float64)
    free(a)   # or maybe even del a
    free(b)
    return c

If we want to be fancy we can even provide some syntactic niceties (although I don't know how easy it is to add this functionality to numba):

from numba import types
from rbc.omnisci_backend import Array
@omnisci('float64[](void)')
def func():
    with Array(10, types.float64) as a:
        with Array(10, types.float64) as b:
            c = Array(10, types.float64)
    # a and b are automatically freed
    return c
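Supporting the with-statement would amount to giving Array the context-manager protocol. A rough pure-Python sketch of the intended semantics (this toy Array and its free() method are hypothetical, only meant to illustrate the lifetime rules):

```python
class Array:
    """Toy stand-in for rbc's Array, illustrating the proposed
    context-manager semantics; not the real implementation."""
    live = 0  # count of currently-allocated arrays

    def __init__(self, size, dtype):
        self.size = size
        self.dtype = dtype
        Array.live += 1

    def free(self):
        Array.live -= 1

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.free()  # freed on block exit, even on exceptions
        return False

def func():
    with Array(10, "float64") as a:
        with Array(10, "float64") as b:
            c = Array(10, "float64")
    # a and b have been freed here
    return c

c = func()
assert Array.live == 1  # only c is still alive
```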

Pros:

  • the semantics is very easy to understand and it's obvious to C/C++ programmers
  • easy to implement

Cons:

  • error prone: this can lead to leaks and/or crashes. Possibly, this might even crash the omniscidb server itself
  • the semantics are confusing for a casual reader: it looks like you are writing Python, but as soon as you introduce the @omnisci decorator you suddenly need to manage your memory, which is unexpected

Proposal 2: explicit ownership rules (a.k.a. "The Rust Way")

We could augment our typesystem with ownership rules. Rust has similar concepts.
Basically, the idea is that each Array has one and only one owner, which is responsible for deallocating it. When you pass the value around, you pass a borrowed reference, unless you explicitly give away the ownership. When you return the value, you are also returning the ownership. The typesystem can also statically check that you cannot free the object as long as there are borrowed references around.

This is basically proposal 0 "done right", because it no longer depends on heuristics but on typesystem rules which can be proven correct. To fully work, we need some syntax to transfer the ownership, and possibly a way to free the object early (which basically means transferring the ownership to nobody).
Also, some innocent-looking code will not work, e.g.:

def func(condition):
    a = Array(10, types.float64)
    if condition:
        b = a
    else:
        b = Array(20, types.float64)
    return b

The code above would be rejected by the typesystem because b has two different ownership rules in the two branches of the if.
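To make the ownership rule concrete, here is a toy Python model of single ownership with explicit transfer. Everything here (the Owned class, its move() method) is hypothetical, purely to illustrate the semantics the typesystem would enforce statically:

```python
class Owned:
    """Toy model of single ownership: each buffer has exactly one owner,
    and transferring ownership invalidates the previous handle."""
    def __init__(self, size):
        self.size = size
        self.moved = False

    def move(self):
        # transfer ownership; the old handle becomes unusable
        if self.moved:
            raise RuntimeError("use after move")
        self.moved = True
        new = Owned(self.size)
        return new

a = Owned(10)
b = a.move()        # ownership transferred to b
assert b.size == 10
try:
    a.move()        # a no longer owns the buffer
    reached = False
except RuntimeError:
    reached = True  # the runtime check stands in for a compile-time error
assert reached
```

In the real proposal these checks would happen at compile time, so the "use after move" error above would be a type error rather than a runtime exception.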

Pros:

  • user defined functions can be statically proven correct (at least from the point of view of memory management)

Cons:

  • hard to implement
  • it is harder to teach the concept to the end user
  • the user might have to fight with the compiler in order to convince the typesystem that their code is correct

Proposal 3.1: free-it-all strategy (a.k.a. "poor man's GC")

The idea behind this proposal is that we are not interested in a general memory allocation strategy. What we really want is to be able to call UDF/UDTF without leaking memory, and we can do that with very well defined lifetime rules:

  1. User functions can allocate as many Arrays as they like
  2. The only way for an Array to escape to the "external world" is to be returned to the caller
  3. All other allocated Arrays are automatically destroyed when the function exits

This means, for example, that user functions are not allowed to persist Arrays between calls, e.g. by putting them in some global state. But I don't think that is possible at all at the moment anyway.

One possible way to implement this is to associate all allocations to an explicit "memory manager" which you get from the outside:

def func(mgr):
    # mgr is the memory manager
    a = Array(mgr, 10, types.float64)
    b = omnisci_np.zeros(mgr, 10, types.float64)
    ...

In the example above, all the memory associated with mgr will be freed when the caller decides so, and it is impossible to allocate memory without one (because the allocation functions expect it). If you need to call a function which internally allocates, you must pass mgr explicitly, as in the call to omnisci_np.zeros() above.

The idea is that the manager is created by omniscidb at the point in which it calls UDF/UDTF. Something like this (in some sort of pseudocode):

void call_udf(udf_t function) {
    MemoryManager mgr = CreateMemoryManager();
    auto result = function(mgr, other_arguments, ...);
    mgr.keep_alive(result); // to make sure it is not freed
    mgr.free_everything();
    // do something with result
}

@guilhermeleobas suggests that we can reuse the TableManager for that, since there is already a way to pass it to the UDTF.
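In Python terms, the manager and the call protocol sketched above could look like this. The names (MemoryManager, keep_alive, free_everything, allocate_array) are taken from the pseudocode or invented for illustration, not from any existing API:

```python
class MemoryManager:
    """Toy memory manager: every allocation is registered with it, and
    free_everything() releases everything that was not kept alive."""
    def __init__(self):
        self.allocated = []
        self.kept = []

    def allocate_array(self, size, dtype):
        buf = {"size": size, "dtype": dtype}  # stand-in for a real buffer
        self.allocated.append(buf)
        return buf

    def keep_alive(self, buf):
        self.kept.append(buf)

    def free_everything(self):
        # identity check, so equal-looking buffers are not confused
        self.allocated = [b for b in self.allocated
                          if any(b is k for k in self.kept)]

def call_udf(func):
    """Mirror of the call_udf() pseudocode above."""
    mgr = MemoryManager()
    result = func(mgr)
    mgr.keep_alive(result)   # make sure the result is not freed
    mgr.free_everything()
    return mgr, result

def my_udf(mgr):
    a = mgr.allocate_array(10, "float64")   # freed automatically
    b = mgr.allocate_array(10, "float64")   # freed automatically
    return mgr.allocate_array(10, "float64")

mgr, result = call_udf(my_udf)
assert mgr.allocated[0] is result  # only the returned array survives
```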

Pros:

  • mostly transparent for the end users, apart from the fact that they have to pass this mgr around everywhere
  • it is trivial to prove that it works
  • it makes it possible to implement very fast allocation strategies. For example, a MemoryManager could decide to allocate a fixed chunk of RAM and keep a pointer indicating how much of it has been used; malloc() can then be implemented by just incrementing the pointer (and allocating a new chunk of RAM if we have reached the end of the current one), which is much faster than the stdlib malloc(). (NOTE: this is more or less how the PyPy GC allocates memory, and it's one of the reasons it's so fast.)
  • easy to implement

Cons:

  • you have to pass around mgr everywhere
  • you cannot have the standard signature for some of the Array API functions, because you need to pass mgr to those which can possibly allocate memory (such as zeros in the example above)
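The fast allocation strategy mentioned in the pros can be sketched as a bump allocator. This is a simplified toy model (it never grows the arena, and is not PyPy's actual GC):

```python
class BumpAllocator:
    """Toy bump allocator: a fixed arena plus an offset; malloc just
    bumps the offset, and freeing everything resets it to zero."""
    def __init__(self, arena_size):
        self.arena = bytearray(arena_size)  # one big chunk of memory
        self.offset = 0

    def malloc(self, nbytes):
        if self.offset + nbytes > len(self.arena):
            raise MemoryError("arena exhausted")  # real impl: grab a new chunk
        start = self.offset
        self.offset += nbytes   # allocation is just a pointer bump
        return memoryview(self.arena)[start:start + nbytes]

    def free_everything(self):
        self.offset = 0  # all allocations die at once

alloc = BumpAllocator(1024)
a = alloc.malloc(100)
b = alloc.malloc(200)
assert alloc.offset == 300   # two bumps, no bookkeeping per allocation
alloc.free_everything()
assert alloc.offset == 0
```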

Proposal 3.2: implicit free-it-all strategy

This is the same as 3.1, but we hack at the code generated by numba to implicitly pass mgr around. So for example, you call zeros(10, types.float64) and the call automatically becomes zeros(mgr, 10, types.float64).
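One way to model the implicit rewriting in plain Python is to bind mgr into the allocation functions before the user code runs. This is a hypothetical sketch (compile_udf, zeros_impl, and the list-based mgr are invented here; the real thing would live inside numba's typing/lowering machinery):

```python
import functools

def zeros_impl(mgr, size, dtype):
    # the "real" allocating function, which needs the manager;
    # mgr is modeled as a plain list of live buffers
    buf = [0] * size
    mgr.append(buf)
    return buf

def compile_udf(user_func):
    """Wrap a UDF so that mgr is created per call and implicitly bound
    into zeros(); the user never sees it."""
    def wrapper():
        mgr = []
        zeros = functools.partial(zeros_impl, mgr)  # mgr injected here
        result = user_func(zeros)
        # free-it-all on exit, keeping only the returned buffer
        mgr[:] = [b for b in mgr if b is result]
        return mgr, result
    return wrapper

@compile_udf
def func(zeros):
    a = zeros(10, "float64")     # user writes zeros(10, dtype)...
    return zeros(5, "float64")   # ...mgr is passed behind the scenes

mgr, result = func()
assert len(result) == 5 and mgr[0] is result
```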

Pros:

  • completely transparent for the end users
  • you can use the standard Array API signatures

Cons:

  • harder to implement (I don't even know if numba provides the needed hooks)
  • explicit is better than implicit. If the user sees the mgr, it is trivial to explain that the memory belongs to it. If it's a hidden implementation detail, the lifetime rules suddenly become magical and/or obscure

Proposal 3.3: global mgr

Same as Proposal 3.2, but instead of passing it around, we rely on a global (or thread-local) mgr to be reachable. I think this quickly leads to problems if we have concurrent and/or parallel and/or recursive calls, so it should just be avoided.

Conclusions

Personally, I think that the best option is Proposal 3.1, the "free-it-all strategy". The cons don't look too bad to me, and the pros are very important, especially the fact that it's easy to implement, which means fewer bugs.

@antocuni added the enhancement and heavydb labels Nov 11, 2021
antocuni added a commit that referenced this issue Dec 20, 2021
#380)

- add rbc.stdlib.array_api, which rbc users can import to use the array-like API

- the array api is no longer specific to omnisci, now it works also in normal @rjit functions. This makes it much easier to develop and test

- introduce rbclib. This is the embryo of a "runtime library" to be called by the code generated by rbc. In particular, it provides rbclib_allocate_varlen_buffer, which can be used to allocate arrays for non-omnisci targets

- store the names of the allocation/deallocation array functions in the TargetInfo. This makes it possible to call the appropriate allocate_varlen_buffer function depending on whether we are targeting omnisci or not

- introduce the "debug allocator" which keeps track of allocations/deallocations and detects memory leaks (see e.g. issue #378)