
[FEA] custom kernel for date/timestamp formatting/parsing #10032
Open · revans2 opened this issue Dec 12, 2023 · 5 comments
Labels: feature request (New feature or request)

Comments

revans2 (Collaborator) commented Dec 12, 2023

Is your feature request related to a problem? Please describe.
Spark uses Java for date/timestamp parsing and formatting. We have been using a cuDF kernel that accepts formats compatible with Python/C++, but the Java format patterns are very different, so we have to map between them. Some Java patterns are unambiguous on their own but become ambiguous once mapped into the formats cuDF supports. We really should write our own kernel that does what Spark/Java does directly.
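For illustration, here is the kind of mismatch involved; the mapping shown is a hand-picked example, not the plugin's actual translation table:

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

public class FormatMismatch {
    public static void main(String[] args) {
        // Java pattern "yyyy-MM-dd" roughly maps to cuDF/strptime "%Y-%m-%d",
        // but patterns like a bare "y" (one-or-more year digits) have no
        // exact strptime equivalent, so the mapped format can accept or
        // reject strings that Java would not.
        DateTimeFormatter javaFmt = DateTimeFormatter.ofPattern("yyyy-MM-dd");
        System.out.println(LocalDate.parse("2023-12-12", javaFmt)); // 2023-12-12
    }
}
```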

res-life (Collaborator) commented Dec 26, 2023

One related issue: #10083

For the date 10000-01-01 with format yyyy-MM-dd, the Java API gets +10000-01-01, while cuDF gets 0000-01-01.
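For reference, Java's DateTimeFormatter emits an explicit sign once the year no longer fits the four-digit pattern, which is where the leading + comes from; a minimal repro:

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

public class YearOverflow {
    public static void main(String[] args) {
        DateTimeFormatter fmt = DateTimeFormatter.ofPattern("yyyy-MM-dd");
        // Years wider than the pattern get a leading '+' in Java:
        System.out.println(LocalDate.of(10000, 1, 1).format(fmt)); // +10000-01-01
        System.out.println(LocalDate.of(2023, 1, 1).format(fmt));  // 2023-01-01
    }
}
```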

andygrove (Contributor) commented Jan 8, 2024

Some notes on parsing dates from JSON, based on #9975

Depending on the Spark version, there are different code paths based on whether a dateFormat is specified. Some of the differences we need to handle are:

  • We sometimes need to support single-digit months and days, and sometimes we require two digits (see the sketch after this list)
  • We sometimes need to trim all leading and trailing whitespace, sometimes we only trim specific whitespace chars, and sometimes we don't trim at all
  • Sometimes we perform a cast instead of a parse, and in that case we support the special values "epoch", "now", "today", "yesterday", and "tomorrow" (definitely an edge case, because it doesn't make much sense to store relative terms like these in a JSON file)
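As a sketch of the first bullet (illustrative only, not the plugin's code), Java expresses the one- vs. two-digit distinction through the pattern width:

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;

public class DigitWidth {
    public static void main(String[] args) {
        // "M"/"d" accept one or two digits; "MM"/"dd" require exactly two.
        DateTimeFormatter lenient = DateTimeFormatter.ofPattern("yyyy-M-d");
        DateTimeFormatter strict  = DateTimeFormatter.ofPattern("yyyy-MM-dd");

        System.out.println(LocalDate.parse("2023-1-5", lenient)); // 2023-01-05
        try {
            LocalDate.parse("2023-1-5", strict);
        } catch (DateTimeParseException e) {
            System.out.println("two-digit pattern rejects 2023-1-5");
        }
    }
}
```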

res-life (Collaborator) commented:

> Sometimes we perform a cast instead of a parse, and in that case we support the special values "epoch", "now", "today", "yesterday", and "tomorrow" (definitely an edge case, because it doesn't make much sense to store relative terms like these in a JSON file)

When casting string to timestamp, only Spark 3.1.x supports the special values you mentioned; Spark 3.2.0 and later do not support them.

revans2 (Collaborator, Author) commented Jan 19, 2024

> only Spark 3.1.x supports the special values you mentioned

Are we just not going to support the special values in Spark 3.1 and document it, or are we going to do special post-processing to fix them up?

res-life (Collaborator) commented:

> Are we just not going to support the special values in Spark 3.1 and document it, or are we going to do special post-processing to fix them up?

I suggest doing special post-processing:
For "now" and "epoch", the values are not time zone aware.
For "today"/"tomorrow"/"yesterday", the values are time zone aware. Generate them in Java in the default time zone, then replace each matched string with the corresponding value.
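A minimal sketch of that post-processing for the DATE case (class and method names are hypothetical; for timestamps, "epoch" and "now" would instead resolve to zone-independent fixed/current instants):

```java
import java.time.LocalDate;
import java.time.ZoneId;
import java.util.HashMap;
import java.util.Map;

public class SpecialValueFixup {
    // Hypothetical helper: compute the replacement for each special string.
    // The zone only matters for the day-based values.
    static Map<String, LocalDate> specialDates(ZoneId zone) {
        Map<String, LocalDate> m = new HashMap<>();
        LocalDate today = LocalDate.now(zone);
        m.put("epoch", LocalDate.ofEpochDay(0)); // 1970-01-01, zone-independent
        m.put("today", today);
        m.put("now", today); // for the DATE type, "now" is also today's date
        m.put("yesterday", today.minusDays(1));
        m.put("tomorrow", today.plusDays(1));
        return m;
    }

    public static void main(String[] args) {
        specialDates(ZoneId.systemDefault())
            .forEach((k, v) -> System.out.println(k + " -> " + v));
    }
}
```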
