-
Notifications
You must be signed in to change notification settings - Fork 230
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add udf jar to nightly integration tests #1420
Conversation
Signed-off-by: Jason Lowe <jlowe@nvidia.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall this looks fine, but we need to either make it so the tests pass without this jar in place (skip tests if it cannot fine the jar) or we need to update the docs
https://github.com/NVIDIA/spark-rapids/blob/branch-0.4/integration_tests/README.md
to indicate that this is required. Preferably we do both so that they know it is optional but adds in more tests. I am fine if we do this as a follow on fix.
The concern I have with this approach is it will be oh-so-easy to not notice the test is always skipping in practice and therefore this functionality goes completely untested. |
Signed-off-by: Jason Lowe <jlowe@nvidia.com>
I understand your concern. That is partly why I added in the extra flags to print out info about all of the skipped tests too. From the perspective of a user tring to run this on their own cluster I would prefer to have graceful degredation? Also as a side note we do it on all of the other dependencies. If you didn't set the time zone to UTC then all the timestamp tests are skipped. If you didn't include the integration tests jar then TPCH, TPCDS, TPCxBB and the mortgage tests are skipped. Perhaps what we need then is a flag to say this is a full test setup fail instead of skipping if these things are missing. |
+1 for that approach. I'll file a tracking issue for it now. Part of the difficulty of skipping this test automatically is Python doesn't have a great way to know if the UDF jar is there or not. The lookup is happening on the JVM side, so we'd either need to do some hacky proxy reflection in Python and catch errors, mark the test as xfail just in case it fails, or make the user pass a custom pytest flag to either run or avoid these tests. |
Sorry I just checked the code and I was wrong we don't actually do that. I thought we did. What happens with the JVM bridge if the class is not there you get a python error about how the object does not exist, which should be simple enough to detect, but I would be fine with just updating the docs. and having a follow on issue for a flag with fallback. |
Integration docs are updated. I caught a 0.3->0.4 merge issue where the CUDA classifier was accidentally dropped from the cudf jar in the cmdline examples which is also fixed. I filed #1421 to track improving the integration tests so we avoid silently skipping tests due to a bad environment setup. |
Signed-off-by: Jason Lowe <jlowe@nvidia.com>
@revans2 @tgravescs since we have a tracking issue for my concern in #1421, I went ahead and made the RAPIDS UDF tests automatically skip if it fails to load the UDF function. When we fix the tracking issue, we can change this skip to a fail if the "don't skip tests that are not expected to be skipped" flag is passed on the command-line. |
build |
* Add udf jar to nightly integration tests Signed-off-by: Jason Lowe <jlowe@nvidia.com> * Update integration test docs for udf-examples jar Signed-off-by: Jason Lowe <jlowe@nvidia.com> * Skip RAPIDS UDF tests if UDF fails to load Signed-off-by: Jason Lowe <jlowe@nvidia.com>
* Add udf jar to nightly integration tests Signed-off-by: Jason Lowe <jlowe@nvidia.com> * Update integration test docs for udf-examples jar Signed-off-by: Jason Lowe <jlowe@nvidia.com> * Skip RAPIDS UDF tests if UDF fails to load Signed-off-by: Jason Lowe <jlowe@nvidia.com>
…IDIA#1420) Signed-off-by: spark-rapids automation <70000568+nvauto@users.noreply.github.com>
#1393 added a new udf jar that is needed when running the integration tests. The nightly integration tests need to pass this extra jar to the
spark-submit
command.