Add Parquet-based cache serializer #638
Technically, if the batch is on the CPU, then it is the responsibility of the producer to close it, not the consumer.
This is one of the odd things with Spark: they expect the producer to close the batch, whereas we expect the consumer to do it. They did this so that they could reuse the memory for a batch whenever possible, mostly because batches are used in only a limited number of locations and with a very limited batch size.
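The distinction between the two ownership conventions can be sketched as follows. This is a hedged toy illustration, not spark-rapids or Spark code: the `Batch` class and both generator functions are hypothetical stand-ins for the real `ColumnarBatch` lifecycle.

```python
class Batch:
    """Hypothetical stand-in for a columnar batch with a close() lifecycle."""
    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True


def producer_owned(inputs):
    """Spark-style contract: the producer owns the batch, closing (and
    possibly reusing) its memory after the consumer is done with it."""
    reusable = Batch()
    for _ in inputs:
        # The consumer must not hold a reference past this yield; the
        # producer may reuse the same memory for the next batch.
        yield reusable
    reusable.close()


def consumer_owned(inputs):
    """The opposite contract: each yielded batch is fresh, and the
    consumer is responsible for calling close() on it."""
    for _ in inputs:
        yield Batch()
```

Under the producer-owned contract, a consumer that closes the batch itself (or keeps it alive) would conflict with the producer's reuse of that memory, which is exactly the use-after-free hazard discussed below.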
Are you saying not to close it? If it were being closed by the producer, shouldn't that have caused a double-free error?
I don't think this code path is ever tested, but it would probably result in the CPU Parquet writer trying to use freed memory.
You cannot use a map for this: there is no one-to-one mapping between a batch and an internal row. You will have to return an iterator instead. Please look at how GpuColumnarToRowExec does this conversion, because it should be much more performant.
Also, we need some tests that exercise this code path.
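The shape of the problem can be sketched like this. This is a hedged toy illustration, not the GpuColumnarToRowExec implementation: `ColumnarBatch` here is a hypothetical stand-in, and the real conversion is far more involved (and far more performant).

```python
from typing import Iterator, List, Tuple


class ColumnarBatch:
    """Toy column-major batch: a list of columns, each a list of values.
    Hypothetical stand-in for Spark's ColumnarBatch."""
    def __init__(self, columns: List[List[int]]):
        self.columns = columns
        self.num_rows = len(columns[0]) if columns else 0


def batch_to_rows(batches: Iterator[ColumnarBatch]) -> Iterator[Tuple[int, ...]]:
    # One batch fans out into many rows, so the conversion must produce
    # an iterator of rows; a map keyed by batch cannot express this.
    for batch in batches:
        for i in range(batch.num_rows):
            yield tuple(col[i] for col in batch.columns)
```

Because the result is a lazy iterator, rows are materialized one at a time rather than buffered per batch, which is the same general shape the columnar-to-row exec node uses.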