
Fix a hanging issue when processing empty data. #841

Merged
merged 2 commits into from
Sep 25, 2020

Conversation

firestarman
Collaborator

@firestarman firestarman commented Sep 24, 2020

The output iterator waits on the batch queue when `hasNext` is called, and is supposed to be woken up when the Python runner inserts something into the batch queue. But that insertion never happens if the input data is empty, so the iterator hangs forever.

The solution is to have the Python runner always wake up the output iterator after it finishes writing the data, by calling the newly added API `finish()`.

Also added a test for it. The 'small_data' set is small enough that some tasks get no data when running.

Signed-off-by: Firestarman firestarmanllc@gmail.com
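The fix can be sketched with a minimal blocking batch queue. This is a simplified illustration, not the spark-rapids code; the `BatchQueue` class and its method names here are hypothetical stand-ins for the real Scala implementation:

```python
# Sketch of the hang and the fix: a consumer blocks on a batch queue in
# has_next(), and without an explicit finish() signal from the producer
# it would wait forever when the input is empty.
import threading
from collections import deque

class BatchQueue:
    def __init__(self):
        self._batches = deque()
        self._finished = False
        self._cond = threading.Condition()

    def add(self, batch):
        # Producer inserts a batch and wakes up the consumer.
        with self._cond:
            self._batches.append(batch)
            self._cond.notify()

    def finish(self):
        # Wake up any waiting consumer even if no batch was ever added.
        # Without this call, empty input means no notify() ever fires.
        with self._cond:
            self._finished = True
            self._cond.notify_all()

    def has_next(self):
        with self._cond:
            # Without checking the `finished` flag, this wait would never
            # return when the producer writes no data.
            while not self._batches and not self._finished:
                self._cond.wait()
            return bool(self._batches)

    def next(self):
        with self._cond:
            return self._batches.popleft()

# Producer that gets empty input: it writes nothing, but must still
# call finish() so the consumer's has_next() can return False.
q = BatchQueue()
producer = threading.Thread(target=q.finish)
producer.start()
print(q.has_next())  # False -- returns instead of hanging
producer.join()
```

The `finished` flag plus `notify_all()` is the essential part of the fix: the consumer's wait loop now has a second exit condition that does not depend on data ever arriving.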

@firestarman firestarman linked an issue Sep 24, 2020 that may be closed by this pull request
@sameerz sameerz added the bug Something isn't working label Sep 24, 2020
@firestarman
Collaborator Author

build

revans2
revans2 previously approved these changes Sep 24, 2020
Collaborator

@revans2 revans2 left a comment


Great work finding this.

@firestarman
Collaborator Author

build

The output iterator waits on the batch queue when `hasNext` is called, and is supposed to be woken up when the Python runner inserts something into the batch queue. But that insertion never happens if the input data is empty, so the iterator hangs forever.

The solution is to have the Python runner always wake up the output iterator after it finishes writing the data, by calling the newly added API `finish()`.

Signed-off-by: Firestarman <firestarmanllc@gmail.com>

The 'small_data' set is small enough that some tasks get no data when running.

For now, only test this for the Scalar type, which just implements the columnar pipeline.

Signed-off-by: Firestarman <firestarmanllc@gmail.com>
@firestarman firestarman changed the title [WIP] Fix a hanging issue when processing empty data. Fix a hanging issue when processing empty data. Sep 25, 2020
@firestarman
Collaborator Author

build

@firestarman
Collaborator Author

@revans2 Added the test for it. Could you take another look?

@firestarman
Collaborator Author

build

@revans2
Collaborator

revans2 commented Sep 25, 2020

build

@firestarman firestarman merged commit 779b9fa into NVIDIA:branch-0.3 Sep 25, 2020
@firestarman firestarman deleted the fix-hang-issue branch September 25, 2020 23:39
NvTimLiu pushed a commit to NvTimLiu/spark-rapids that referenced this pull request Oct 16, 2020
sperlingxx pushed a commit to sperlingxx/spark-rapids that referenced this pull request Nov 20, 2020
nartal1 pushed a commit to nartal1/spark-rapids that referenced this pull request Jun 9, 2021
nartal1 pushed a commit to nartal1/spark-rapids that referenced this pull request Jun 9, 2021
tgravescs pushed a commit to tgravescs/spark-rapids that referenced this pull request Nov 30, 2023
…IDIA#841)

Signed-off-by: spark-rapids automation <70000568+nvauto@users.noreply.github.com>
Labels
bug Something isn't working
Development

Successfully merging this pull request may close these issues.

[BUG] udf_cudf_test::test_with_column fails with IPC error
3 participants