
[BUG] test_no_fallback_when_ansi_enabled failed in databricks #3611

Closed
abellina opened this issue Sep 22, 2021 · 3 comments · Fixed by #3615
Assignees
Labels
bug Something isn't working P0 Must have for release

Comments

@abellina
Collaborator

abellina commented Sep 22, 2021

@razajafri found this in one of his PRs: the CPU and GPU results do not match for test_no_fallback_when_ansi_enabled, which I added in #3597:

First few rows from the CPU:

 [Row(a=None, first(b)=118, last(b)=507, min(b)=118, max(b)=507), Row(a=1, first(b)=507, last(b)=507, min(b)=507, max(b)=507),  Row(a=2, first(b)=848, last(b)=848, min(b)=848, max(b)=848),

First few rows from the GPU:

[Row(a=1, first(b)=507, last(b)=507, min(b)=507, max(b)=507), Row(a=2, first(b)=848, last(b)=848, min(b)=848, max(b)=848)

Last row on the GPU:

Row(a=None, first(b)=118, last(b)=507, min(b)=118, max(b)=507

I am not entirely sure how this is happening, given that the query does a coalesce(1) and an orderBy on every column.

@abellina abellina added bug Something isn't working ? - Needs Triage Need team to review and classify P0 Must have for release labels Sep 22, 2021
@abellina abellina self-assigned this Sep 22, 2021
@abellina abellina added this to the Sep 13 - Sep 24 milestone Sep 22, 2021
@jlowe
Member

jlowe commented Sep 22, 2021

I am not entirely sure how this is happening, after a coalesce(1) and orderBy(every column)

But the orderBy is not the last thing in the query?

        df = gen_df(spark, [('a', data_gen), ('b', data_gen)], length=100)
        # coalescing because first/last are not deterministic
        df = df.coalesce(1).orderBy("a", "b")
        return df.groupBy('a').agg(f.first("b"), f.last("b"), f.min("b"), f.max("b"))

What's preventing a Spark implementation from hash-aggregating the grouping, making the output row order non-deterministic because it's dumping hash table contents?
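Since the aggregation output order is not guaranteed, the comparison has to ignore row order. A minimal plain-Python sketch (no Spark required) of an order-insensitive comparison with a None-safe sort key, roughly what an ignore_order-style check does; all names here are illustrative, not the plugin's actual helpers:

```python
# Compare two result sets ignoring row order, tolerating None in any column.

def none_safe_key(row):
    # Map each value to (is_not_None, value) so None sorts before any
    # concrete value and None is never compared directly against an int.
    return tuple((v is not None, v) for v in row)

def rows_equal_ignore_order(cpu_rows, gpu_rows):
    # Sort both sides with the same None-safe key, then compare.
    return sorted(cpu_rows, key=none_safe_key) == sorted(gpu_rows, key=none_safe_key)

# Rows mimic (a, first(b), last(b), min(b), max(b)) from the issue:
# same rows on CPU and GPU, but the a=None row comes last on the GPU.
cpu = [(None, 118, 507, 118, 507), (1, 507, 507, 507, 507), (2, 848, 848, 848, 848)]
gpu = [(1, 507, 507, 507, 507), (2, 848, 848, 848, 848), (None, 118, 507, 118, 507)]
```

With this comparison, cpu and gpu above compare equal even though a positional list comparison would fail.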

@abellina
Collaborator Author

OK, @jlowe is absolutely right. Just adding the ignore_order marker here should do it.

@razajafri
Collaborator

Do you want me to do this as part of my #3330? Adding a skip is the same amount of work as adding ignore_order.
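To illustrate what such a marker buys over a positional assert, here is a toy, self-contained sketch of an ignore_order-style decorator: it sorts both result lists with a None-safe key before asserting equality, so order differences between engines don't fail the test. The real spark-rapids marker works at the pytest level; the decorator and test names below are purely illustrative:

```python
import functools

def ignore_order(test_fn):
    # Wrap a test that returns (cpu_rows, gpu_rows) and compare the two
    # result lists after a None-safe sort, ignoring row order.
    @functools.wraps(test_fn)
    def wrapper(*args, **kwargs):
        cpu_rows, gpu_rows = test_fn(*args, **kwargs)
        key = lambda row: tuple((v is not None, v) for v in row)
        assert sorted(cpu_rows, key=key) == sorted(gpu_rows, key=key)
    return wrapper

@ignore_order
def test_groupby_agg():
    # Same rows, different order, as in this issue: should pass.
    cpu = [(None, 118, 507), (1, 507, 507)]
    gpu = [(1, 507, 507), (None, 118, 507)]
    return cpu, gpu

test_groupby_agg()
```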
