Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make the sync marker uniform for the Avro coalescing reader #5428

Merged
merged 2 commits into from
May 7, 2022

Conversation

firestarman
Copy link
Collaborator

@firestarman firestarman commented May 5, 2022

This PR is to fix the issue #5312 by appending the same sync marker to every block that being coalesced into a single Avro file.

It has mainly

  • introduced a new class named BatchContext, and a new method createBatchContext in the parent class of the coalescing reader. This new method returns a BatchContext who lives during the whole stage of building a coalesced memory file. By this design, the merged Avro header can be shared with the steps of estimating the output size, writing file header and building the block data for creating a batch.
  • updated the copyBlocksData and GPU Avro coalescing reader to support writing a given sync marker.
  • added the related tests.

Performance on Local

  • CPU 12 cores, and one GPU (Titan V, with 12GB memory)

  • Non-partitioned 2000 avro files, 4.4GB in total in LOCAL storage

    CPU PERFILE COALESCING (non-uniform-sync) COALESCING (uniform-sync)
    time(sec) 27.844 24.758 16.005 15.713

The numbers above show this change will not lead to any perf regression for the coalescing reader.

closes #5312

Signed-off-by: Firestarman firestarmanllc@gmail.com

Signed-off-by: Firestarman <firestarmanllc@gmail.com>
@firestarman
Copy link
Collaborator Author

build

@firestarman firestarman added the bug Something isn't working label May 5, 2022
@jlowe
Copy link
Member

jlowe commented May 5, 2022

I'm confused on the reported performance results. Shouldn't this be reporting the performance of the GPU coalescing reader before/after this change rather than comparing the performance against the CPU? Seems like we're changing way too many variables between the two setups to isolate the performance impact of any single change (i.e.: changing CPU to GPU, non-coalescing to coalescing, uniform sync vs. non-uniform sync, etc.).

Signed-off-by: Firestarman <firestarmanllc@gmail.com>
@firestarman
Copy link
Collaborator Author

firestarman commented May 6, 2022

I'm confused on the reported performance results. Shouldn't this be reporting the performance of the GPU coalescing reader before/after this change rather than comparing the performance against the CPU? Seems like we're changing way too many variables between the two setups to isolate the performance impact of any single change (i.e.: changing CPU to GPU, non-coalescing to coalescing, uniform sync vs. non-uniform sync, etc.).

I simply thought it could show us even with this change, we still get better perf than CPU.
Updated to align with the reports of PR #5306, plus an additional column to show the perf with this change.

@firestarman firestarman requested a review from jlowe May 6, 2022 01:29
@firestarman
Copy link
Collaborator Author

build

@firestarman firestarman merged commit aa3dbab into NVIDIA:branch-22.06 May 7, 2022
@firestarman firestarman deleted the coal-sync branch May 7, 2022 01:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
2 participants