
ColumnarBatch to CachedBatch and back #1001

Merged: 8 commits into branch-0.3 on Oct 27, 2020

Conversation

razajafri (Collaborator)

Write ColumnarBatch to CachedBatch and Read CachedBatch into ColumnarBatch

When writing a ColumnarBatch to a CachedBatch, I convert it to a row iterator and essentially write it out as InternalRows. A more performant approach might be to write the data in columnar form, but that can be explored as a follow-on.
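As a rough illustration of the row-based path described above, here is a minimal sketch assuming Spark's `ColumnarBatch` and `UnsafeProjection` APIs are on the classpath (the helper name `batchToRows` is hypothetical, not part of this PR):

```scala
import org.apache.spark.sql.catalyst.InternalRow
import org.apache.spark.sql.catalyst.expressions.UnsafeProjection
import org.apache.spark.sql.types.StructType
import org.apache.spark.sql.vectorized.ColumnarBatch

// Sketch only: walk the batch row by row via rowIterator() and copy each
// row out, as the row-based caching path described above would.
def batchToRows(batch: ColumnarBatch, schema: StructType): Iterator[InternalRow] = {
  val proj = UnsafeProjection.create(schema)
  val it = batch.rowIterator() // java.util.Iterator[InternalRow]
  new Iterator[InternalRow] {
    def hasNext: Boolean = it.hasNext
    // copy() is required because the projection reuses its output buffer
    def next(): InternalRow = proj(it.next()).copy()
  }
}
```

The per-row `copy()` is one reason a future columnar write path could be faster, as noted above.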

Sign off empty-commit

Signed-off-by: Raza Jafri <rjafri@nvidia.com>

Write ColumnarBatch to CachedBatch and Read CachedBatch into ColumnarBatch

Sign off empty-commit

Signed-off-by: Raza Jafri <rjafri@nvidia.com>
revans2 (Collaborator) left a comment:
I didn't get through all of the code; there is a lot to cover. I'll try to spend some more time on this soon.

@jlowe jlowe changed the title [REVIEW] ColumnarBatch to CachedBatch and back ColumnarBatch to CachedBatch and back Oct 22, 2020
razajafri (Collaborator, Author)

> I didn't get through all of the code; there is a lot to cover. I'll try to spend some more time on this soon.

Let me clean it up more and add docs.

@razajafri razajafri mentioned this pull request Oct 22, 2020
@sameerz sameerz added the performance A performance related task/issue label Oct 22, 2020
Removed RapidsVectorizedColumnReader in favor of reflection on VectorizedColumnReader
razajafri (Collaborator, Author)

@revans2 @jlowe I think I have addressed all of your concerns. Can you PTAL?

```scala
val num = Math.min(capacity.toLong, totalCountLoadedSoFar - rowsReturned).toInt
for (i <- columnReaders.indices) {
  if (columnReaders(i) != null) {
    val readBatchMethod =
```
razajafri (Collaborator, Author) commented:

I have moved this up to the class level so we don't have to look it up every time. Update coming soon.
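A minimal sketch of that class-level caching idea, using plain Java reflection (the class and method names here are illustrative; the real VectorizedColumnReader.readBatch signature varies across Spark versions and is not shown in this thread):

```scala
import java.lang.reflect.Method

// Hypothetical shim: resolve the (package-private) readBatch method once,
// when the shim is constructed, instead of inside the per-batch loop.
class ReaderShim(readerClass: Class[_]) {
  // Cached at the class level so the reflective lookup happens only once.
  private val readBatchMethod: Method = {
    val m = readerClass.getDeclaredMethod("readBatch", classOf[Int])
    m.setAccessible(true)
    m
  }

  // Each call reuses the cached Method; only invoke() runs per batch.
  def readBatch(reader: AnyRef, total: Int): AnyRef =
    readBatchMethod.invoke(reader, Integer.valueOf(total))
}
```

Caching the `Method` avoids repeating the `getDeclaredMethod` string lookup on every batch, which is the change the comment above describes.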

razajafri (Collaborator, Author)

build

razajafri (Collaborator, Author)

build

revans2
revans2 previously approved these changes Oct 26, 2020
integration_tests/src/main/python/cache_test.py (outdated review thread, resolved)
razajafri (Collaborator, Author)

build

razajafri (Collaborator, Author)

@jlowe I think I have addressed all your concerns.

@revans2 can you bless it again?

@razajafri razajafri merged commit aa4558c into NVIDIA:branch-0.3 Oct 27, 2020
@razajafri razajafri deleted the cache-plug-columnar-cpu branch October 28, 2020 04:50
sperlingxx pushed a commit to sperlingxx/spark-rapids that referenced this pull request Nov 20, 2020

Write ColumnarBatch to CachedBatch and Read CachedBatch into ColumnarBatch

Sign off empty-commit

Signed-off-by: Raza Jafri <rjafri@nvidia.com>

nartal1 pushed a commit to nartal1/spark-rapids that referenced this pull request Jun 9, 2021 (same commit message)

nartal1 pushed a commit to nartal1/spark-rapids that referenced this pull request Jun 9, 2021 (same commit message)
tgravescs pushed a commit to tgravescs/spark-rapids that referenced this pull request Nov 30, 2023
…IDIA#1001)

Signed-off-by: spark-rapids automation <70000568+nvauto@users.noreply.github.com>
Labels
performance A performance related task/issue
4 participants