Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix reading ORC/PARQUET over empty clipped schema #5674

Merged

Conversation

sperlingxx
Copy link
Collaborator

@sperlingxx sperlingxx commented May 27, 2022

Fixes #5672
Fixes #5694

Fix reading ORC/PARQUET over empty clipped schema through bypassing the call of cuDF table reader.

Signed-off-by: sperlingxx lovedreamf@gmail.com

Signed-off-by: sperlingxx <lovedreamf@gmail.com>
@sperlingxx
Copy link
Collaborator Author

build

@jlowe jlowe added this to the May 23 - Jun 3 milestone May 27, 2022
Copy link
Member

@jlowe jlowe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looks OK but there are questions about the consistency of acquiring the semaphore on empty batches. Could address this as part of #4568.

// not reading any data, so return a degenerate ColumnarBatch with the row count
if (currentChunkMeta.numTotalRows == 0) {
None
} else {
val rows = currentChunkMeta.numTotalRows.toInt
// Someone is going to process this data, even if it is just a row count
GpuSemaphore.acquireIfNecessary(TaskContext.get(), metrics(SEMAPHORE_WAIT_TIME))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is inconsistent with the semantics of the ORC scan below. In this case, it will acquire the semaphore despite returning a row-count-only batch, while the ORC case below will avoid acquiring the semaphore unless it actually generates null columns.

Copy link
Collaborator Author

@sperlingxx sperlingxx May 30, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aligned all GPU semaphore acquisitions for empty batches.

case meta: HostMemoryEmptyMetaData =>
// Not reading any data, but add in partition data if needed
// Someone is going to process this data, even if it is just a row count
GpuSemaphore.acquireIfNecessary(TaskContext.get(), metrics(SEMAPHORE_WAIT_TIME))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same inconsistent acquisition on empty batches issue here.

Comment on lines 1933 to 1935
// Not reading any data, but add in partition data if needed
// Someone is going to process this data, even if it is just a row count
GpuSemaphore.acquireIfNecessary(TaskContext.get(), metrics(SEMAPHORE_WAIT_TIME))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same semaphore acquisition on empty batch consistency here.

@sameerz sameerz added the bug Something isn't working label May 27, 2022
Signed-off-by: sperlingxx <lovedreamf@gmail.com>
@sperlingxx
Copy link
Collaborator Author

build

tgravescs
tgravescs previously approved these changes May 31, 2022
Copy link
Collaborator

@tgravescs tgravescs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

needs upmerged

@tgravescs
Copy link
Collaborator

tgravescs commented May 31, 2022

@sperlingxx does this also fix #5694?

If not can you please take a look at that issue.

@sperlingxx
Copy link
Collaborator Author

@sperlingxx does this also fix #5694?

If not can you please take a look at that issue.

Yes, @tgravescs. It will fix #5694 as well.

@sperlingxx
Copy link
Collaborator Author

I apologize that I did an upmerge to 22.08 by mistake. To fix the blunder, I had to run a force update to shift the base.

@sperlingxx
Copy link
Collaborator Author

build

2 similar comments
@sperlingxx
Copy link
Collaborator Author

build

@sperlingxx
Copy link
Collaborator Author

build

Signed-off-by: sperlingxx <lovedreamf@gmail.com>
@sperlingxx
Copy link
Collaborator Author

build

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
4 participants