[BUG] test_substring_column failed #8147

Closed
pxLi opened this issue Apr 19, 2023 · 4 comments · Fixed by #8175

pxLi commented Apr 19, 2023

Describe the bug
Failed in the integration test job rapids_databricks_nightly-dev-github, run ID: 698.

The CPU and GPU outputs do not match:

[2023-04-19T07:55:03.160Z] =================================== FAILURES ===================================
[2023-04-19T07:55:03.160Z] ____________________________ test_substring_column _____________________________
[2023-04-19T07:55:03.160Z] [gw0] linux -- Python 3.8.10 /usr/bin/python
[2023-04-19T07:55:03.160Z] 
[2023-04-19T07:55:03.160Z]     def test_substring_column():
[2023-04-19T07:55:03.160Z]         str_gen = mk_str_gen('.{0,30}')
[2023-04-19T07:55:03.160Z] >       assert_gpu_and_cpu_are_equal_collect(
[2023-04-19T07:55:03.160Z]             lambda spark: three_col_df(spark, str_gen, int_gen, int_gen).selectExpr(
[2023-04-19T07:55:03.160Z]                 'SUBSTRING(a, b, c)',
[2023-04-19T07:55:03.160Z]                 'SUBSTRING(a, b, 0)',
[2023-04-19T07:55:03.160Z]                 'SUBSTRING(a, b, 5)',
[2023-04-19T07:55:03.160Z]                 'SUBSTRING(a, b, -5)',
[2023-04-19T07:55:03.160Z]                 'SUBSTRING(a, b, 100)',
[2023-04-19T07:55:03.160Z]                 'SUBSTRING(a, b, -100)',
[2023-04-19T07:55:03.160Z]                 'SUBSTRING(a, b, NULL)',
[2023-04-19T07:55:03.160Z]                 'SUBSTRING(a, 0, c)',
[2023-04-19T07:55:03.160Z]                 'SUBSTRING(a, 5, c)',
[2023-04-19T07:55:03.160Z]                 'SUBSTRING(a, -5, c)',
[2023-04-19T07:55:03.160Z]                 'SUBSTRING(a, 100, c)',
[2023-04-19T07:55:03.160Z]                 'SUBSTRING(a, -100, c)',
[2023-04-19T07:55:03.160Z]                 'SUBSTRING(a, NULL, c)',
[2023-04-19T07:55:03.160Z]                 'SUBSTRING(\'abc\', b, c)',
[2023-04-19T07:55:03.160Z]                 'SUBSTRING(\'abc\', 1, c)',
[2023-04-19T07:55:03.160Z]                 'SUBSTRING(\'abc\', 0, c)',
[2023-04-19T07:55:03.160Z]                 'SUBSTRING(\'abc\', 5, c)',
[2023-04-19T07:55:03.160Z]                 'SUBSTRING(\'abc\', -1, c)',
[2023-04-19T07:55:03.160Z]                 'SUBSTRING(\'abc\', -5, c)',
[2023-04-19T07:55:03.161Z]                 'SUBSTRING(\'abc\', NULL, c)',
[2023-04-19T07:55:03.161Z]                 'SUBSTRING(\'abc\', b, 10)',
[2023-04-19T07:55:03.161Z]                 'SUBSTRING(\'abc\', b, -10)',
[2023-04-19T07:55:03.161Z]                 'SUBSTRING(\'abc\', b, 2)',
[2023-04-19T07:55:03.161Z]                 'SUBSTRING(\'abc\', b, 0)',
[2023-04-19T07:55:03.161Z]                 'SUBSTRING(\'abc\', b, NULL)',
[2023-04-19T07:55:03.161Z]                 'SUBSTRING(\'abc\', b)',
[2023-04-19T07:55:03.161Z]                 'SUBSTRING(a, b)'))
[2023-04-19T07:55:03.161Z] 
[2023-04-19T07:55:03.161Z] ../../src/main/python/string_test.py:351: 
[2023-04-19T07:55:03.161Z] _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
[2023-04-19T07:55:03.161Z] ../../src/main/python/asserts.py:562: in assert_gpu_and_cpu_are_equal_collect
[2023-04-19T07:55:03.161Z]     _assert_gpu_and_cpu_are_equal(func, 'COLLECT', conf=conf, is_cpu_first=is_cpu_first)
[2023-04-19T07:55:03.161Z] ../../src/main/python/asserts.py:493: in _assert_gpu_and_cpu_are_equal
[2023-04-19T07:55:03.161Z]     assert_equal(from_cpu, from_gpu)
[2023-04-19T07:55:03.161Z] ../../src/main/python/asserts.py:106: in assert_equal
[2023-04-19T07:55:03.161Z]     _assert_equal(cpu, gpu, float_check=get_float_check(), path=[])
[2023-04-19T07:55:03.161Z] ../../src/main/python/asserts.py:42: in _assert_equal
[2023-04-19T07:55:03.161Z]     _assert_equal(cpu[index], gpu[index], float_check, path + [index])
[2023-04-19T07:55:03.161Z] ../../src/main/python/asserts.py:35: in _assert_equal
[2023-04-19T07:55:03.161Z]     _assert_equal(cpu[field], gpu[field], float_check, path + [field])
[2023-04-19T07:55:03.161Z] _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
[2023-04-19T07:55:03.161Z] 
[2023-04-19T07:55:03.161Z] cpu = '', gpu = '\x95\x8bìÁ'
[2023-04-19T07:55:03.161Z] float_check = <function get_float_check.<locals>.<lambda> at 0x7f6e7a553b80>
[2023-04-19T07:55:03.161Z] path = [1, 'substring(a, 5, c)']
[2023-04-19T07:55:03.161Z] 
[2023-04-19T07:55:03.161Z]     def _assert_equal(cpu, gpu, float_check, path):
[2023-04-19T07:55:03.161Z]         t = type(cpu)
[2023-04-19T07:55:03.161Z]         if (t is Row):
[2023-04-19T07:55:03.161Z]             assert len(cpu) == len(gpu), "CPU and GPU row have different lengths at {} CPU: {} GPU: {}".format(path, len(cpu), len(gpu))
[2023-04-19T07:55:03.161Z]             if hasattr(cpu, "__fields__") and hasattr(gpu, "__fields__"):
[2023-04-19T07:55:03.161Z]                 assert cpu.__fields__ == gpu.__fields__, "CPU and GPU row have different fields at {} CPU: {} GPU: {}".format(path, cpu.__fields__, gpu.__fields__)
[2023-04-19T07:55:03.161Z]                 for field in cpu.__fields__:
[2023-04-19T07:55:03.161Z]                     _assert_equal(cpu[field], gpu[field], float_check, path + [field])
[2023-04-19T07:55:03.161Z]             else:
[2023-04-19T07:55:03.161Z]                 for index in range(len(cpu)):
[2023-04-19T07:55:03.161Z]                     _assert_equal(cpu[index], gpu[index], float_check, path + [index])
[2023-04-19T07:55:03.161Z]         elif (t is list):
[2023-04-19T07:55:03.161Z]             assert len(cpu) == len(gpu), "CPU and GPU list have different lengths at {} CPU: {} GPU: {}".format(path, len(cpu), len(gpu))
[2023-04-19T07:55:03.161Z]             for index in range(len(cpu)):
[2023-04-19T07:55:03.161Z]                 _assert_equal(cpu[index], gpu[index], float_check, path + [index])
[2023-04-19T07:55:03.161Z]         elif (t is tuple):
[2023-04-19T07:55:03.161Z]             assert len(cpu) == len(gpu), "CPU and GPU list have different lengths at {} CPU: {} GPU: {}".format(path, len(cpu), len(gpu))
[2023-04-19T07:55:03.161Z]             for index in range(len(cpu)):
[2023-04-19T07:55:03.161Z]                 _assert_equal(cpu[index], gpu[index], float_check, path + [index])
[2023-04-19T07:55:03.161Z]         elif (t is pytypes.GeneratorType):
[2023-04-19T07:55:03.161Z]             index = 0
[2023-04-19T07:55:03.161Z]             # generator has no zip :( so we have to do this the hard way
[2023-04-19T07:55:03.161Z]             done = False
[2023-04-19T07:55:03.161Z]             while not done:
[2023-04-19T07:55:03.161Z]                 sub_cpu = None
[2023-04-19T07:55:03.161Z]                 sub_gpu = None
[2023-04-19T07:55:03.161Z]                 try:
[2023-04-19T07:55:03.161Z]                     sub_cpu = next(cpu)
[2023-04-19T07:55:03.161Z]                 except StopIteration:
[2023-04-19T07:55:03.161Z]                     done = True
[2023-04-19T07:55:03.161Z]     
[2023-04-19T07:55:03.161Z]                 try:
[2023-04-19T07:55:03.161Z]                     sub_gpu = next(gpu)
[2023-04-19T07:55:03.161Z]                 except StopIteration:
[2023-04-19T07:55:03.161Z]                     done = True
[2023-04-19T07:55:03.161Z]     
[2023-04-19T07:55:03.161Z]                 if done:
[2023-04-19T07:55:03.161Z]                     assert sub_cpu == sub_gpu and sub_cpu == None, "CPU and GPU generators have different lengths at {}".format(path)
[2023-04-19T07:55:03.161Z]                 else:
[2023-04-19T07:55:03.161Z]                     _assert_equal(sub_cpu, sub_gpu, float_check, path + [index])
[2023-04-19T07:55:03.161Z]     
[2023-04-19T07:55:03.161Z]                 index = index + 1
[2023-04-19T07:55:03.161Z]         elif (t is dict):
[2023-04-19T07:55:03.162Z]             # The order of key/values is not guaranteed in python dicts, nor are they guaranteed by Spark
[2023-04-19T07:55:03.162Z]             # so sort the items to do our best with ignoring the order of dicts
[2023-04-19T07:55:03.162Z]             cpu_items = list(cpu.items()).sort(key=_RowCmp)
[2023-04-19T07:55:03.162Z]             gpu_items = list(gpu.items()).sort(key=_RowCmp)
[2023-04-19T07:55:03.162Z]             _assert_equal(cpu_items, gpu_items, float_check, path + ["map"])
[2023-04-19T07:55:03.162Z]         elif (t is int):
[2023-04-19T07:55:03.162Z]             assert cpu == gpu, "GPU and CPU int values are different at {}".format(path)
[2023-04-19T07:55:03.162Z]         elif (t is float):
[2023-04-19T07:55:03.162Z]             if (math.isnan(cpu)):
[2023-04-19T07:55:03.162Z]                 assert math.isnan(gpu), "GPU and CPU float values are different at {}".format(path)
[2023-04-19T07:55:03.162Z]             else:
[2023-04-19T07:55:03.162Z]                 assert float_check(cpu, gpu), "GPU and CPU float values are different {}".format(path)
[2023-04-19T07:55:03.162Z]         elif isinstance(cpu, str):
[2023-04-19T07:55:03.162Z] >           assert cpu == gpu, "GPU and CPU string values are different at {}".format(path)
[2023-04-19T07:55:03.162Z] E           AssertionError: GPU and CPU string values are different at [1, 'substring(a, 5, c)']
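
The mismatch in the log is at row 1, column `substring(a, 5, c)`: the CPU returned an empty string while the GPU returned four characters. As a reading aid, here is a minimal pure-Python sketch of the SUBSTRING semantics as commonly understood for Spark SQL (1-based start, 0 treated like 1, negative start counted from the end, non-positive length giving an empty string). `spark_substring` is a hypothetical name and not part of the test framework or the plugin, and the model ignores 32-bit overflow handling.

```python
def spark_substring(s, pos, length):
    """Approximate model of SUBSTRING(s, pos, length); an assumption, not Spark's code."""
    if s is None or pos is None or length is None:
        return None  # any NULL argument yields NULL
    n = len(s)
    # pos is 1-based; 0 behaves like 1; a negative pos counts back from the end.
    if pos > 0:
        start = pos - 1
    elif pos < 0:
        start = n + pos
    else:
        start = 0
    end = start + length
    # Non-positive length, or a start beyond the end, gives an empty string.
    if end <= start or start >= n:
        return ""
    # Clamp a negative start/end to the beginning of the string.
    return s[max(start, 0):max(end, 0)]


print(repr(spark_substring("abcdefgh", 5, 3)))  # start inside the string
print(repr(spark_substring("abc", 5, 3)))       # start past the end
print(repr(spark_substring("abcdefgh", 5, 0)))  # zero length
```

Under this model, an empty CPU result for a start of 5 means either that `c` was zero or negative, or that `a` had fewer than 5 characters. The log does not show the input values, so it cannot tell which case applied here.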

Steps/Code to reproduce bug
Please provide a list of steps or a code sample to reproduce the issue.
Avoid posting private or sensitive data.
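
No repro steps were given in this report. When trying to reproduce locally, it may help to list every mismatching cell instead of stopping at the first failed assertion. The following is a small sketch under the assumption that the CPU and GPU results have already been collected and converted to lists of dicts (for example with PySpark's `Row.asDict()`). `find_mismatches` is a hypothetical helper, not part of `asserts.py`, and the sample values are made up for illustration.

```python
def find_mismatches(cpu_rows, gpu_rows):
    """Return (row_index, column, cpu_value, gpu_value) for every differing cell."""
    mismatches = []
    for i, (cpu_row, gpu_row) in enumerate(zip(cpu_rows, gpu_rows)):
        for col in cpu_row:
            cpu_val = cpu_row[col]
            gpu_val = gpu_row.get(col)
            if cpu_val != gpu_val:
                mismatches.append((i, col, cpu_val, gpu_val))
    return mismatches


# Illustrative data only; these values are not taken from the failing run.
cpu = [{"substring(a, 5, c)": "xy"}, {"substring(a, 5, c)": ""}]
gpu = [{"substring(a, 5, c)": "xy"}, {"substring(a, 5, c)": "garbage"}]

for row_index, column, cpu_val, gpu_val in find_mismatches(cpu, gpu):
    print(f"row {row_index}, {column}: cpu={cpu_val!r} gpu={gpu_val!r}")
```

The `(row_index, column)` pair plays the same role as the `path` list printed by `_assert_equal` in the log.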

Expected behavior
The test passes in CI.

pxLi added the bug, "? - Needs Triage", and test labels on Apr 19, 2023
@tgravescs
Collaborator

This is failing in the integration tests as well, so it affects more than just Databricks.


jlowe commented Apr 19, 2023

This might be related to rapidsai/cudf#13057, but I have not verified it yet. Testing it now.

@jlowe jlowe self-assigned this Apr 19, 2023

jlowe commented Apr 19, 2023

Yep, reverting the cudf PR allows the test to pass. I'll work on a C++ repro for libcudf.


jlowe commented Apr 19, 2023

Filed rapidsai/cudf#13173
