
[DISCUSSION] libcudf column abstraction redesign #1443

Closed
jrhemstad opened this issue Apr 17, 2019 · 55 comments · Fixed by #2207
Labels
feature request New feature or request libcudf Affects libcudf (C++/CUDA) code. proposal Change current process or code

Comments

@jrhemstad
Contributor

jrhemstad commented Apr 17, 2019

Creating and interfacing with the gdf_column C struct is rife with problems. We need better abstraction(s) to make libcudf a safer and more pleasant library to use and develop.

Rather than adding support for any new column types, the initial goal of a cudf::column design is to ease the current pain points in creating and interfacing with the column data structure in libcudf.

Goals:

  • Identify pain points with existing gdf_column structure
  • Derive requirements for an abstraction or set of abstractions to ease those pain points
  • Define an API design that satisfies the requirements
  • Provide a working implementation of the design

Non-Goals

  • Derive requirements to support new column types, e.g., variable width elements, compressed columns, etc.
  • Support delayed materialization or lazy evaluation

Note that a “Non-Goal” is not something that we want to expressly forbid in our redesign; rather, it is simply not the focus of the current effort. Whenever possible, we can make design decisions that will enable these “Non-Goals” sometime in the future, so long as those decisions do not compromise the timely accomplishment of the above “Goals”.

Process

1. Gather pain points

  • Those who wish to participate should list 3-5 pain points (in priority order) that they would like to solve with the column redesign.
    • Note that choosing to participate implies a commitment to putting in the effort to derive requirements and provide feedback on designs, i.e., if you want something to change, you’re expected to put in the work to make it happen.
  • Pain points should be submitted by responding to this issue.
  • @jrhemstad will take responsibility for gathering pain points and distilling/organizing based on functional area.
  • Proposed Deadline: 0.7 release

2. Derive requirements

  • Distill pain points into satisfiable requirements
  • @jrhemstad will take responsibility for providing an initial draft of requirements from pain points and distributing for feedback.
  • Stakeholders will provide feedback on requirements and iterate until consensus is reached on initial requirements
  • Proposed Deadline: 0.8 Release

3. Design Mock Up

  • Create draft interface of class(es) that attempt to satisfy requirements.
  • APIs should be fully Doxymented.
  • Code does not need to function nor compile
  • Design should be submitted via a PR to cuDF
  • TBD will take responsibility for providing an initial interface design
  • Stakeholders will provide feedback and iterate until consensus is reached on design
  • Proposed Deadline: 0.8 Release

4. Implementation

  • Implement the agreed upon interface
  • Should provide Google Test unit tests
  • Implementation/testing will likely expose necessary design changes
  • Implementation should be submitted as a PR to cuDF
  • TBD will take responsibility for implementing/testing the design
  • Stakeholders will review implementation PR until consensus is reached
  • Proposed Deadline 0.8 Release

5. Initial Refactor

  • Two candidate libcudf features shall be chosen for refactoring to use the new cudf::column abstraction
  • Two developers (TBD) will take responsibility for refactoring the features (one each) to use the newly designed abstraction(s) and submitting a cuDF PR for review. At least one of the developers shall be different from the developer who designed and implemented the column abstraction.
  • Any required design changes exposed in refactoring shall be discussed in the PR
  • Stakeholders will review refactored feature until consensus is reached
  • TBD will be responsible for creating/amending a style guide with lessons learned and best practices for refactoring a feature using gdf_column to the new abstraction(s)
  • Proposed Deadline: 0.9 Release

6. Full Refactor

  • Remaining libcudf features will be refactored one at a time to use the new column abstraction(s)
  • The style guide mentioned above will be distributed to all libcudf developers to provide guidance in this refactoring effort
  • This will be an ongoing process that likely will not be fully complete for several releases
@jrhemstad jrhemstad added feature request New feature or request help wanted proposal Change current process or code libcudf Affects libcudf (C++/CUDA) code. labels Apr 17, 2019
@jrhemstad
Contributor Author

jrhemstad commented Apr 17, 2019

My pain points:

  1. Users are burdened with explicitly allocating/freeing device memory for columns. Ownership of the memory is ill-defined and inconsistent.
  2. Basic interfacing with a column requires touching raw pointers, i.e., being required to do things like static_cast<T*>(col->data), if(nullptr == col->valid), gdf_is_valid(col->valid, i), etc.
  3. Allocating and interacting with the null bitmask requires explicit action by users such as calling gdf_valid_allocation_size and gdf_num_bitmask_elements.
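
To make pain points 2 and 3 concrete, here is a minimal host-side sketch of the idioms callers must hand-write today. The struct and helpers are simplified stand-ins for the real gdf_column declarations (the real gdf_valid_allocation_size also pads its result), not libcudf code:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Simplified stand-in for the gdf_column struct described above.
struct gdf_column_like {
    void*    data;   // type-erased (device) pointer in real libcudf
    uint8_t* valid;  // packed null bitmask; may be nullptr
    size_t   size;   // number of elements
};

// The kind of bit arithmetic callers must invoke by hand (cf. gdf_is_valid):
// one validity bit per row, nullptr mask means "all valid".
inline bool is_valid(const uint8_t* valid, size_t i) {
    return valid == nullptr || ((valid[i / 8] >> (i % 8)) & 1);
}

// The kind of explicit sizing helper (cf. gdf_valid_allocation_size, here
// without padding) users must remember to call before allocating a bitmask.
inline size_t valid_allocation_size(size_t num_rows) {
    return (num_rows + 7) / 8;  // bytes needed to hold num_rows bits
}
```

An encapsulated cudf::column would hide both helpers behind member functions, so forgetting them stops being possible.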

@kkraus14
Collaborator

Pain Points

  1. Numba is currently used all over for memory allocation / deallocation. We would much rather have that owned by libcudf while maintaining interoperability with Numba Device Arrays.
  2. Python column classes are completely disjointed from libcudf. Should ideally reference a cudf::column object to be used via Cython for better cudf<->libcudf interop.
  3. Type specific implementations currently exist in Python for types like Strings and Categoricals, but ideally should be in cudf::column for better maintainability and cross-language functionality.

@eyalroz
Collaborator

eyalroz commented Apr 17, 2019

Pain points

  1. Baked-in meta-data/statistics: A null count might be a useful thing, but - so are the min and max values, or the number of distinct values. These should be in a separate data structure, or some mix-in that a subclass of the column class could adopt.
  2. No single-bit-width columns. Factor-8 waste of space? That's outrageous; egregious; preposterous.
  3. Huge amount of idiosyncratic code for the validity indicators. We could just use the fact that it's another (non-nullable) column and invoke existing code (which should support 1-bit-wide columns).
  4. No compositionality. For example, a column of strings (or variable-length data) could be represented by a column of lengths, or by a column of offsets into a column of single characters (not the only possible representations); a nullable column - discussed in the previous item (and again, that's not the only reasonable representation) <- Addressing this fully would be a gargantuan task. But ignoring it completely is not a good idea either.
  5. A supposed "column" today can be in an invalid state, so you have to check columns whenever you get them in.
  6. Not using modern C++ vocabulary types like optional and variant to express aspects of columns' semantics. It's the idiomatic thing to do.
  7. Difficult to switch from working with type-erased to type-imbued columns and vice versa.
  8. Mutability - it isn't well defined, nor enforced, who and when, if at all, should be able to change a column (the host-side members and the on-device data) once it's been constructed.
  9. Column names. This may be a nice feature for debugging, but in reality - columns would be identified by some sort of taxonomy, e.g. schema+table+intra-table name - not by a single name. And it's not very useful to just keep the third name. We certainly don't want people to start encoding complex structures into the name. And in our code - we don't know what names to give output columns anyway. So let's drop the names
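
The compositional string representation from item 4 can be sketched host-side in a few lines. Everything here (the struct name, the choice of offsets over lengths) is illustrative, not a proposed libcudf API:

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// One possible compositional string column: a child column of concatenated
// character data plus a child column of offsets. String i occupies the
// half-open range [offsets[i], offsets[i+1]) of 'chars'.
struct string_column_sketch {
    std::vector<char>    chars;
    std::vector<int32_t> offsets;  // always size()+1 entries

    std::string element(size_t i) const {
        return std::string(chars.begin() + offsets[i],
                           chars.begin() + offsets[i + 1]);
    }
};

string_column_sketch make_strings(const std::vector<std::string>& in) {
    string_column_sketch col;
    col.offsets.push_back(0);
    for (const auto& s : in) {
        col.chars.insert(col.chars.end(), s.begin(), s.end());
        col.offsets.push_back(static_cast<int32_t>(col.chars.size()));
    }
    return col;
}
```

The point is that both children are themselves plain columns, so existing column machinery could operate on them.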

@harrism
Member

harrism commented Apr 17, 2019

Pain Points

Mostly reiterating what I think are the most important things to fix first. Basically, the problem is that gdf_column lacks encapsulation.

  1. Explicit, manual, and low-level allocation and deallocation of column memory. Numba was used for this on the Python side, which required RMM to provide a monkey-patched numba workalike API for Python-side numba.cuda.device_array allocation.
  2. Returning new columns from cudf algorithms is risky because most routines still take pre-allocated column memory as an argument. Users may think that because it's returned they don't have to free the memory, which would lead to leaks.
  3. Managing valid masks is error prone and explicit.
  4. Because it's not a proper class, the API for columns is via free functions, which mostly have silly names and inconsistent interfaces.

@revans2
Contributor

revans2 commented Apr 17, 2019

  1. It is not clear who owns a block of memory and who does not.
    • It is nice to have the ability to ignore the functions provided and take ownership of the memory to do things like coalescing allocations and host-to-device transfers.
  2. Documentation. I am okay with having the guts of the column on display. The issue is that I have to reach into the guts to move data back and forth between the host and device, and it is not fully explained anywhere what those guts really are, or what contract I have to follow. This is mostly around strings and string categories, which you have to special-case. But it becomes more critical as we add in things outside the scope of this redesign.
  3. Sorry I really only have 2 for now.

@harrism
Member

harrism commented Apr 17, 2019

I have an idea regarding step 5. Revise to:

5. Initial Refactor

  1. Two candidate libcudf features shall be chosen for refactoring to use the new cudf::column abstraction
  2. Two developers (TBD) will take responsibility for refactoring the features (one each) to use the newly designed abstraction(s) and submitting a cuDF PR for review. At least one of the developers shall be different from the developer who designed and implemented the column abstraction.
  3. Any required design changes exposed in refactoring shall be discussed in the PR...

I think this will help flush out more issues in the abstraction. We could keep it at one candidate feature, done by a different engineer than the column implementer, but two would be nice.

@harrism
Member

harrism commented Apr 17, 2019

I also would like to tighten up the schedule a bit. I would like to get the basic abstraction designed and implemented by 0.8 release. If we can't do that, then I think the scope is too broad and should be narrowed. Since most of our pain points agree, I don't think the scope is too large at this point.

@jrhemstad
Contributor Author

Updated with @harrism's feedback about schedule and the initial refactor process.

@jlowe
Member

jlowe commented Apr 17, 2019

Pain Points

  1. Device memory management is explicit and error-prone, and I would like to see automated output allocation. However there are some use-cases with pre-allocated output targets, and it would be nice to not preclude those.
  2. Related to the previous, the management of CPU-side objects associated with gdf_column can be messy. Who is responsible for freeing the column name? Should it be possible to create a string category column without a corresponding NVCategory and if/when that should be freed when the column is freed?
  3. Validity masks are exposed more than I think they need to be.
  4. The lack of a C++ wrapper that bundles the relevant methods and simplifies proper handling of resource allocations within code that can throw.

@eyalroz
Collaborator

eyalroz commented Apr 18, 2019

@jlowe and everyone else: About column names - do we actually use column names at all in libcudf? I'm not asking about Python code, just the C/C++.

@kovaltan
Contributor

My pain points:

  1. bitmap iterator and/or class (validity bitmask handling nulls)
    The lack of a bitmap iterator makes the implementation messy when the API handles nulls.
    It would be easy to handle nulls if we used a const bitmap iterator together with a transform iterator.
    A validity bitmask class would also be useful for managing nulls, including allocation/deallocation helpers

  2. input column check at each function
    gdf functions usually have sanity checks for input/output columns; it would be better if cudf::column had a sanity-check API:
    GDF_REQUIRE(nullptr != col_in, GDF_DATASET_EMPTY);
    GDF_REQUIRE(nullptr == col_in->valid || 0 == col_in->null_count, GDF_VALIDITY_UNSUPPORTED);
    or
    CUDF_EXPECTS(col != nullptr, "Input column is null");

  3. Returning new columns from cudf algorithms is risky
    Same as @harrism
    The allocation of the columns should be outside of the algorithms.

  4. cudf::test::column_wrapper<cudf::wrapper types> does not accept an integer initializer like {0,1,2,3}
    This is somewhat off-topic, but it would be better if we had it.
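
The bitmap-iterator idea in point 1 can be illustrated host-side: wrap the packed mask so null handling becomes an ordinary iterator dereference rather than hand-written bit arithmetic. This is a toy sketch (in real libcudf code one would presumably build on thrust iterators), and all names are made up:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Read-only iterator over a packed validity bitmask: *it yields true for
// valid rows, false for nulls.
class bit_iterator {
    const uint8_t* mask_;
    size_t pos_;
public:
    bit_iterator(const uint8_t* mask, size_t pos) : mask_(mask), pos_(pos) {}
    bool operator*() const { return (mask_[pos_ / 8] >> (pos_ % 8)) & 1; }
    bit_iterator& operator++() { ++pos_; return *this; }
    bool operator!=(const bit_iterator& o) const { return pos_ != o.pos_; }
};

// Null-aware sum: invalid rows contribute the identity, exactly the pattern
// a transform iterator over (data, validity) pairs would give for free.
template <typename T>
T null_aware_sum(const T* data, const uint8_t* mask, size_t n) {
    T sum{};
    bit_iterator it(mask, 0);
    for (size_t i = 0; i < n; ++i, ++it)
        if (*it) sum += data[i];
    return sum;
}
```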

@eyalroz
Collaborator

eyalroz commented Apr 18, 2019

@kovaltan :

  • You suggest that the allocation of columns should occur outside of "the algorithms" (by which I take it you mean the implementation of cudf API functions). But is that a pain point about the column abstraction we use? And - do you think this should be resolved through the design of a column class?
  • About the input checks: Note that, for now, we have PR [REVIEW] Adding some trivial gdf_column-related utility functions #1390, and specifically validate(my_gdf_column) which performs all of these checks.

@kovaltan
Contributor

@eyalroz ,

  • You suggest that the allocation of columns should occur outside of "the algorithms" (by which I take it you mean the implementation of cudf API functions). But is that a pain point about the column abstraction we use? And - do you think this should be resolved through the design of a column class?

By "the algorithms" I mean things like cudf::scan(), not the cudf::column class.
For now, we pass a pre-allocated output column to cudf::scan().
I think it should be pre-allocated, not allocated inside cudf::scan().
The cudf::column class can allocate device memory, and may have a device_memory_owner flag.
cudf::column should free the device memory if device_memory_owner = true.

That would be really nice!

@jlowe
Member

jlowe commented Apr 18, 2019

@eyalroz

About column names - do we actually use column names at all in libcudf?

The column names are filled in by the csv and parquet parsers. There needs to be some way to convey the names discovered during parsing back to the caller. However I could see some debate whether it really needs to be stored directly in the column structure.

@revans2
Contributor

revans2 commented Apr 18, 2019

@eyalroz and @kovaltan
I do agree that a column should have some knowledge about ownership of the memory it references.
I also realize that

Support delayed materialization or lazy evaluation

is not a part of the design goals here. But I think it will become an important feature likely combined with JITing so we should at least think ahead to how this might work and not paint ourselves into a corner. One of the key pieces of doing the lazy evaluation is that it can eliminate intermediate results and as such will not need to allocate the corresponding memory.

So if we want to be able to eliminate memory allocation we cannot pre-allocate it for every operation. I would propose, instead, that the operation itself do the device memory allocation. Not allocate the column class, but allocate the device memory the column ultimately points to. We could structure the column class so that it knows how to "allocate" and "free" memory which can be overridden by the user creating it. That should give us the flexibility to do just about anything we want with memory management of the columns, but also lets the algorithms themselves decide when to allocate memory, and even how much memory to allocate.

@jrhemstad
Contributor Author

jrhemstad commented Apr 18, 2019

Returning new columns from cudf algorithms is risky
Same as @harrism
The allocation of the columns should be out side of the algorithms.

@kovaltan I think you may have misunderstood Mark's comment here:

Returning new columns from cudf algorithms is risky because most routines still take pre-allocated column memory as an argument. Users may think that because it's returned they don't have to free the memory, which would lead to leaks.

I believe he was referring to how presently returning a gdf_column from a function is risky because we don't have any gdf_column destructor, so the burden falls on the caller to free the column's underlying memory.

For example:

// This is leaky because it requires the caller to remember to free the column's device memory
gdf_column output = gdf_algorithm(input...);

However, we have identified that it is desirable to be able to return outputs from functions rather than use output parameters (for various reasons: clearer style, delayed JIT evaluation, etc.). Therefore, we want to make that safer via ownership semantics.

As such, we likely will be moving in the direction that algorithms allocate the device memory for outputs.

To be clear:

// This is safe because lifetime management of the device memory is automated
cudf::column output = cudf::algorithm( input... );
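
The ownership semantics being described can be sketched with host memory standing in for device memory (names like owning_column and some_algorithm are illustrative, not the proposed cudf API). The essential point is that the returned object's destructor releases the allocation, so the caller cannot leak it:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <memory>

struct free_deleter { void operator()(void* p) const { std::free(p); } };

// RAII column sketch: owns its buffer, move-only, frees on destruction.
class owning_column {
    std::unique_ptr<void, free_deleter> data_;
    size_t size_;
public:
    explicit owning_column(size_t bytes)
        : data_(std::malloc(bytes)), size_(bytes) {}
    void*  data() const { return data_.get(); }
    size_t size() const { return size_; }
};

// Hypothetical algorithm: allocates its own output and returns it by value.
owning_column some_algorithm(size_t n) {
    owning_column out(n * sizeof(int));
    return out;  // moved/elided out; buffer freed whenever the result dies
}
```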

@eyalroz
Collaborator

eyalroz commented Apr 18, 2019

@revans2 :

I ... realize that .... delayed materialization or lazy evaluation ... is not a part of the design goals here. But I think it will become an important feature ... so we should at least think ahead to how this might work and not paint ourselves into a corner.

You are preaching to the choir - I definitely agree (in fact I would love to work on faster JITing using PTX, regardless of cudf). This is part of why I've proposed removing some of the features of columns for the basic/fundamental abstraction - flexibility for later use.

One of the key pieces of doing the lazy evaluation is that it can eliminate intermediate results and as such will not need to allocate the corresponding memory. So if we want to be able to eliminate memory allocation we cannot pre-allocate it for every operation.

But why are we certain that the allocating code will be part of what gets JITed together? Clearly, to construct operators by JITting we would use composable building-blocks which are not the functions in cudf/functions.h. So I would not be worried about this point.

I would propose, instead, that the operation itself do the device memory allocation. Not allocate the column class, but allocate the device memory the column ultimately points to. We could structure the column class so that it knows how to "allocate" and "free" memory which can be overridden by the user creating it.

I'm thinking about a design which is reminiscent of std::unique_ptr - where a choice of a different template parameter allows for a different de/allocation mechanism. If I followed you correctly, I think that should allow for the behavior you described.
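
To illustrate that std::unique_ptr-like direction (purely a host-side sketch under my own naming, not a proposal text): the de/allocation mechanism becomes a template parameter, so a caller could swap in, say, an RMM-backed allocator without changing the column type's interface:

```cpp
#include <cassert>
#include <cstddef>
#include <memory>

// Column sketch parameterized on its allocator, analogous to how
// std::unique_ptr is parameterized on its deleter.
template <typename Allocator = std::allocator<std::byte>>
class basic_column {
    Allocator  alloc_;
    std::byte* data_  = nullptr;
    size_t     bytes_ = 0;
public:
    explicit basic_column(size_t bytes, Allocator a = {})
        : alloc_(a), data_(alloc_.allocate(bytes)), bytes_(bytes) {}
    ~basic_column() { if (data_) alloc_.deallocate(data_, bytes_); }
    basic_column(const basic_column&)            = delete;  // owning, non-copyable
    basic_column& operator=(const basic_column&) = delete;
    size_t size_bytes() const { return bytes_; }
};
```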

@eyalroz
Collaborator

eyalroz commented Apr 18, 2019

@jlowe :

The column names are filled in by the csv and parquet parsers. There needs to be some way to convey the names discovered during parsing back to the caller.

Certainly, but the parsers are not part of libcudf; so why should this extra information be passed to libcudf if it doesn't get used there at all?

@harrism
Member

harrism commented Apr 19, 2019

Certainly, but the parsers are not part of libcudf; so why should this extra information be passed to libcudf if it doesn't get used there at all?

Since when? The CSV, Parquet, and soon ORC and JSON-lines parsers are all part of libcudf.

@eyalroz
Collaborator

eyalroz commented Apr 19, 2019

@harrism : Yes, right, I misspoke. That is part of the libcudf codebase. But the library itself, or rather the API functions which process columns, don't take CSVs and parse them, nor do they make calls to CSV-parsing code. And the names aren't used (it seems).

So, what if the names were to simply be parsed into a separate array in the csv_read_args structure?

@kovaltan
Contributor

@jrhemstad

Ah, I had misunderstood Mark's comment.
I have only one concern about allocating device memory inside cudf::algorithm().
If allocation happens in cudf::algorithm(), it's hard for users to manage device memory allocation themselves,
though I'm not sure users want that, and there could be some way to pass a user-defined allocator to cudf::algorithm().

// This is safe because lifetime management of the device memory is automated
cudf::column* output = cudf::algorithm( input... );

Pros:

  • easy to use: users can use the output column without allocating it themselves
  • safe: deleting the cudf::column can free the underlying device memory
  • input/output dtype/size/nulls validation is not needed

Cons:

  • it's hard for users to manage device memory allocation by themselves

@jrhemstad
Contributor Author

I have only one concern about allocating device memory inside of cudf::algorithm().
If allocating is in cudf::algorithm(), it's hard for users to manage device memory allocation by themselves, though I'm not sure if the users want, and there would be some way to pass user defined allocator to cudf::algorithm().

You've alluded to the solution. It's orthogonal to the column design, but ultimately we would like it such that you can pass a custom allocator into any cudf algorithm, similar to how you can with Thrust.
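
As a toy host-side illustration of that allocator-parameter idea (all names here - counting_allocator, double_all - are made up for the example, and host malloc stands in for device allocation):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <vector>

// Caller-supplied allocator; this one also counts bytes so the caller can
// observe where the algorithm's output memory comes from.
struct counting_allocator {
    size_t* bytes_allocated;  // caller-owned counter
    void* allocate(size_t n) { *bytes_allocated += n; return std::malloc(n); }
    void  deallocate(void* p, size_t) { std::free(p); }
};

// Hypothetical algorithm: doubles each element, obtaining scratch/output
// memory through the allocator passed in, Thrust-style.
template <typename Alloc>
std::vector<int> double_all(const std::vector<int>& in, Alloc& alloc) {
    int* out = static_cast<int*>(alloc.allocate(in.size() * sizeof(int)));
    for (size_t i = 0; i < in.size(); ++i) out[i] = in[i] * 2;
    std::vector<int> result(out, out + in.size());
    alloc.deallocate(out, in.size() * sizeof(int));
    return result;
}
```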

@OlivierNV
Contributor

Regarding column names: I think they should eventually not be part of the column itself but part of a parent array of columns that would include the schema, containing the column names and their relationships (this will also allow preserving schema information when reading/writing columnar formats).

@nsakharnykh
Collaborator

Pain points

  1. Touching raw pointers and underlying data structures of the column
    a. Difficult to track down out-of-bounds and null access errors
    b. Difficult to evaluate other internal column formats (e.g. a list of chunks instead of one flat array)
    c. Difficult to update the validity mask from a CUDA kernel
  2. Memory management
    a. Who owns the memory allocated for the column, when it should be freed
    b. What happens when someone is trying to copy data (shallow vs deep copy semantics)
    c. What are the mechanisms to realloc the column (resize/expand)

@eyalroz
Collaborator

eyalroz commented Apr 25, 2019

it should provide common functionality to those systems. Cudf already does some of this with partition, which really has no value outside of a system like these.

The thing is, libcudf doesn't provide all possible common functionality, and AFAIK it's not supposed to. Granted, though, the exact scope of libcudf doesn't seem to be put in writing well enough (certainly not in the repository's README.md...)

Also, again, can you be specific about concrete optimization opportunities which will be missed due to columns being immutable after construction?

@thomcom
Contributor

thomcom commented May 2, 2019

Pain points

  1. gdf_columns are specifically that - columns. This makes indexing by rows, a common operation, expensive and meticulous at the python level of cudf. Transpose is now supported, but is an extremely expensive operation. Dozens of Pandas API calls support an axis argument, trivially alternating between the index or columns of a Dataframe or Series. It would be nice if the new take on gdf_column wasn't, literally, columns, but supported the ability to "swap axis" in an inexpensive single operation.
  2. Names - Pandas MultiIndex allows for multiple columns with the same name - something that can't be achieved easily in our current setup. We need an abstraction that is more complex than 1:1 between columns-as-allocated and the set of column-identifiers. A column identifier that is simultaneously a name and an index might cover all necessary use cases.
  3. Width x Height concerns - This is maybe a long shot, but CUDF is severely limited by dataframe orientation. Wide matrices quickly lose their edge over CPU based computing. Relating to Collaborations on columnar data structures  #1 - columns should have arbitrary axes, and the most efficient implementation should be chosen either by the developer or by the algorithm.
  4. There's a lot of complaints about memory management, for good reason. I work 95+% of the time in python so 😎 but memory pools, reference counting and garbage collection, anyone? Objective-C makes a great example of how to do this well. Our gains are found during the GPU accelerated convolution of many frame values - I don't think that performance loss from GC would be noticeable.

@eyalroz
Collaborator

eyalroz commented May 2, 2019

@thomcom : Following your listing of points:

  1. Can you explain what you mean by "indexing by rows"? Do you mean accessing tuples - individual rows of a table - rather than dealing with disparate columns?
    Also, as for transposing - that would mean replacing, say, a single column with 1 million elements with 1 million columns of a single element each. Certainly that's not supported... at least not in libcudf :-(
  2. I think you're talking about the Python side here, because in libcudf we don't access columns by name. I also think you can even read from a CSV with multiple columns having the same name (not sure though).
  3. This relates to what you said about transposition. Essentially, libcudf handles columns, not two- or multi-dimensional data; and this is central to the design, beyond the column abstraction even. A library for multi-dimensional data is a different kind of beast than a library for manipulating columns; so you're suggesting a change in the library's entire scope. I am not for or against such a change, but here we're just considering a column abstraction for libcudf with its current scope. However, you might consider the encoding of a matrix in a three-column table, with columns for "matrix_row", "matrix_column" and "value".
  4. Modern C++ avoids the need for garbage collection, using smart pointers and other RAII classes, which are freed exactly when they stop being used. This is (mostly) encapsulated in the Resource Management section of the C++ Core Guidelines. You could argue that this is insufficient/inappropriate for our case and a garbage collector is required - but you would need to make a convincing argument, since a garbage collector is a large amount of very serious code, with all sorts of overhead (for developers and at runtime).
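
The three-column matrix encoding from point 3, in miniature (a COO-style struct-of-columns; everything here is illustrative, not a libcudf interface). Note that under this encoding a "transpose" is just swapping which child column plays the role of rows:

```cpp
#include <cassert>
#include <utility>
#include <vector>

// A matrix encoded as a table of three columns: (matrix_row, matrix_column, value).
struct matrix_as_table {
    std::vector<int>    row;
    std::vector<int>    col;
    std::vector<double> value;

    void insert(int r, int c, double v) {
        row.push_back(r);
        col.push_back(c);
        value.push_back(v);
    }

    // Transpose without touching any values: swap the row and column columns.
    void transpose() { std::swap(row, col); }
};
```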

@harrism
Member

harrism commented May 3, 2019

Actually @eyalroz we DO have transpose in libcudf but it's slow for large input columns, for obvious reasons. In general, @thomcom I think we would love to be able to provide efficient multi-axis indexing, but if you think about it, this is a very hard problem. It's easy in, say tensor libraries because everything in the tensor has the same datatype. But in a typed language, how do you efficiently transpose a table of columns with different types so you can access the row memory efficiently? You end up with columns that have to support multiple data types. Not pretty however you take it on...

@eyalroz
Collaborator

eyalroz commented May 3, 2019

Actually @eyalroz we DO have transpose in libcudf but it's slow for large input columns, for obvious reasons.

Hmm, cognitive dissonance on my part there. Couldn't bring myself to remember something which didn't make sense... #1603 .

@thomcom
Contributor

thomcom commented May 3, 2019

Thanks for your response @eyalroz.

  1. From the perspective of your average ML developer, accessing a tuple - a row of a table, doesn't have a real distinction from accessing a column. A.T.T = A after all. From a math perspective the orientation only matters to determine order of operations. At the user level it would be nice to have the ability to treat rows and columns interchangeably, including, at least, fast transpose. I partially understand your perspective coming from a database design standpoint.

  2. Ok - at the python level column naming has been causing me some real headaches - sounds like it is my problem (literally).

  3. This is specifically why I wanted to bring it up and at least get it thought about here. Following up on @harrism as well - it is possible to encode rows and columns together in contiguous data and index into it in O(1). I agree it could get complicated, and obviously single-type columns are much more efficient than columns with mixed type. I want you guys thinking about it and knowing that Pandas people are asking for it, because transpose and row/column major conversions are used very frequently and they're slow like a barge in python, unless the library is written to just swap labels in python-native data (like Pandas). Then it's just slow like python.

I'm sorry if I'm way out in left field here. I have a fantasy that with careful design we could, for example, transpose 10GB of columns in a few operations, simply by shuffling labels around. This would be powerful.

  1. The most common pain point has been memory management. I'm not terribly up-to-date in C++11 or 14, but I don't get the impression that "Do RAII better" is the answer for our many developers. I suggested reference counting and memory pools because it sounded like the best of both worlds in this context. Developers can simply pull new objects cudf::column cool_column = coolalg(args...) and not worry about managing them, or they can allocate a memory pool and do whatever manual allocations they want, trusting that when the memory pool has been destructed then they have cleaned up after themselves.

@thomcom
Contributor

thomcom commented May 3, 2019

Regarding all my "let's support matrix-styled data" - if this is gdf_column and it is going to stay column-only, I understand. :)

@jrhemstad
Contributor Author

jrhemstad commented May 22, 2019

@revans2 @jlowe @felipeblazing

@harrism and I are discussing whether or not an allocator should be exposed to libcudf users to control how columns are allocated. I.e., should the libcudf functions accept an allocator parameter to control how device memory is allocated for output cudf::columns that are returned from libcudf functions.

If RMM w/ pool mode is "fast enough" such that memory allocations are no longer a bottleneck, then is there another reason you'd need control over how the memory is allocated? Would it be a showstopper to not provide control over how the device memory is allocated? (Just making sure we're covering our bases and understand all the use cases).

@eyalroz
Collaborator

eyalroz commented May 22, 2019

@jrhemstad : What about @felipeblazing 's allocator use scenarios mentioned in this comment?

@jlowe
Member

jlowe commented May 22, 2019

The two cases I can think of where allocators could be useful:

  1. Managing one's own temp space as @felipeblazing called out here. I've heard RMM needs to synchronize on a stream for some allocation/free scenarios even in pool mode. For that use case, would RMM's pool mode and a sequence of allocations and frees of the same size be essentially just as fast as the manually managed temp space that was allocated once?

  2. Building a large column by concatenating a series of outputs as they are generated. For example, there are 5 columns that each need some operation performed and the outputs concatenated into one big column at the end. Instead of paying for 2x the output space and the full copy via gdf_column_concat, the code could allocate the large output column and execute the 5 operations into the output column's space at the appropriate offset for each. Yes, there are issues with this when validity vectors are involved and the column sizes aren't a multiple of 8. It also violates the principle of column immutability as the app code mutates the final output column over the 5 operations, so I could see how one could argue this isn't a valid use-case.
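
Use case 2 can be sketched with host memory (fused_concat and negate_into are made-up names; the validity-mask alignment caveat above is deliberately ignored): pre-allocate one large output buffer and have each operation write at its own offset, avoiding the 2x space and copy of a separate concat pass.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical per-chunk operation: negate n values from in into out.
void negate_into(const int* in, int* out, size_t n) {
    for (size_t i = 0; i < n; ++i) out[i] = -in[i];
}

std::vector<int> fused_concat(const std::vector<std::vector<int>>& chunks) {
    size_t total = 0;
    for (const auto& c : chunks) total += c.size();
    std::vector<int> out(total);  // single large allocation up front
    size_t offset = 0;
    for (const auto& c : chunks) {
        // Each operation writes directly into the big buffer at its offset.
        negate_into(c.data(), out.data() + offset, c.size());
        offset += c.size();
    }
    return out;  // already "concatenated": no second copy needed
}
```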

@harrism
Member

harrism commented Jun 21, 2019

The design is in progress, but moving this to 0.9 to keep it open.
