-
Notifications
You must be signed in to change notification settings - Fork 887
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[REVIEW] Fix ORC reader issue with decimal type #6466
[REVIEW] Fix ORC reader issue with decimal type #6466
Conversation
Please update the changelog in order to start CI tests. View the gpuCI docs here. |
@harrism should we push this 0.16 as this is priority-0 and and change is just one line. |
@rgsl888prabhu is there a small test file we could add to cover the error case here? |
It is a 64KB file with which we can reproduce, would it be fine to add it ? |
Is it possible to make a smaller repro? |
I am trying to create smaller file to test it. |
Codecov Report
@@ Coverage Diff @@
## branch-0.16 #6466 +/- ##
===============================================
- Coverage 84.45% 83.00% -1.46%
===============================================
Files 82 95 +13
Lines 13846 14901 +1055
===============================================
+ Hits 11694 12368 +674
- Misses 2152 2533 +381
Continue to review full report at Codecov.
|
Hmm. That doesn't look right: n is the number of valid runs. It breaks out of the loop without incrementing it if the payload for this run would exceed the amount of data available in the buffer. What are the values in the failure case ? |
This issue occurs when numvals=1024, so for n=1023, pos > max_pos and it would break. But at n=1023, we have already updated the value val[n] with lastpos, and as this is using zero indexing, number of run would always be |
@OlivierNV And as you suggested, changing the |
It's storing the last offset but not using it (which is likely problem since that same location is initially used to store the scale). |
github is experiencing issues, and that's why my commit is not yet being shown in this PR. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Please select |
@rgsl888prabhu did you forget to check in the test files? |
I see the Python test file change |
The Python test file refers to: |
@vuule He meant orc format files for tests. @kkraus14 I just uploaded them. |
@vuule do you want to review before we mark this as ready to merge? |
@rgsl888prabhu Do you think we should check for perf regression before merging this? Looks unlikely to me, but I don't know the context well. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good stuff. Thanks for finding smaller repro files.
Thanks to @OlivierNV. |
Will try it and let you know. |
Tried with a ORC dataset with just one column with There is not much change in spped. older
newer
|
Number of values being read was't considering last update while breaking, which caused the subsequent calls behave oddly, leading to improper access of memory segments.
closes #5892