[BUG] Support years with up to 7 digits when casting from String to Date in Spark 3.2 #3382
Comments
The change here is that Spark 3.0 / 3.1 only supports 4-digit years when casting from string to date, but Spark 3.2 supports between 4 and 7 digits. See apache/spark@c9813f7 for more information. The test output is confusing because Spark 3.0
Spark 3.2
|
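To make the behavior change concrete, here is a minimal sketch in plain Python (no Spark involved) of the two acceptance rules described above. It is a simplified model under stated assumptions: it only covers the full `yyyy-[m]m-[d]d` shape, ignores any sign prefix and partial forms, and does not validate month or day ranges. The names `FOUR_DIGIT_YEAR`, `FOUR_TO_SEVEN_DIGIT_YEAR`, and `accepted` are made up for illustration and are not Spark or plugin APIs.

```python
import re

# Simplified model of the year-width rule only (assumption: full
# year-month-day strings, no sign prefix, no range checks on month/day).
FOUR_DIGIT_YEAR = re.compile(r"\d{4}-\d{1,2}-\d{1,2}")           # Spark 3.0 / 3.1 per this issue
FOUR_TO_SEVEN_DIGIT_YEAR = re.compile(r"\d{4,7}-\d{1,2}-\d{1,2}")  # Spark 3.2 per this issue


def accepted(s: str, pattern: re.Pattern) -> bool:
    """Return True if the trimmed string has the shape the pattern allows."""
    return pattern.fullmatch(s.strip()) is not None


if __name__ == "__main__":
    for value in ["2021-08-30", "12345-08-30", "12345678-01-01"]:
        print(value,
              accepted(value, FOUR_DIGIT_YEAR),
              accepted(value, FOUR_TO_SEVEN_DIGIT_YEAR))
```

A string such as `12345-08-30` is rejected by the first rule and accepted by the second, which is the kind of input where CPU and GPU results can diverge.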
Hi @andygrove, I checked the timestamp implementation of cuDF. For now, it only supports parsing strings with fixed-length specifiers; for the "Y" specifier, the length is 4. Supporting variable-length specifiers looks like a significant amount of work on the cuDF side. Shall I file an issue in the cuDF repo? |
Alternatively, perhaps we can manually extract the year part, then narrow the year values down to fit the 4-digit range, and add back the excess years through the cuDF API. |
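One possible reading of this idea, sketched in plain Python rather than cuDF: split off the year, map it to a 4-digit year with the same leap-year pattern, parse that with a fixed-width parser, then add back the removed years as whole 400-year Gregorian cycles (146097 days each). This is only an illustration of the arithmetic, not the code from any PR. The function name `days_since_epoch` is hypothetical, and `datetime.date` stands in for the fixed-width 4-digit parse.

```python
from datetime import date

DAYS_PER_400_YEARS = 146097  # length of one full Gregorian leap-year cycle
EPOCH = date(1970, 1, 1)


def days_since_epoch(s: str) -> int:
    """Parse 'y..y-mm-dd' (4 to 7 digit year) into days since 1970-01-01."""
    year_str, month_str, day_str = s.strip().split("-")
    if not (4 <= len(year_str) <= 7 and year_str.isdigit()):
        raise ValueError(f"unsupported year width: {s!r}")
    year = int(year_str)

    # Narrow the year into 2000..2399, which always fits in 4 digits and
    # keeps the same leap-year layout as the original year.
    cycles, offset = divmod(year - 2000, 400)
    narrowed = date(2000 + offset, int(month_str), int(day_str))

    # Add back the years that were removed, expressed in days.
    return (narrowed - EPOCH).days + cycles * DAYS_PER_400_YEARS


if __name__ == "__main__":
    print(days_since_epoch("1970-01-01"))
    print(days_since_epoch("10000-01-01"))
```

Shifting by multiples of 400 years is what keeps leap days aligned; shifting by an arbitrary number of years would produce wrong results around February 29.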
I think this is another one of those places where we are going to need to do something custom to actually fix it. cuDF has been very adamant in the past that the date formats follow the C standard library formats and, as was stated before, variable length is not something that they are willing to support, both because of the possibility of ambiguity in the formats and because of being standards based. If variable width is supported and we have the pattern Perhaps in the short term we add in a config so users can opt into the smaller date cast functionality in 3.2. |
I still see these failures, even after @sperlingxx's PR went in. |
I have a "fix" for this that has us fall back to the CPU in cases where we cannot support it, but gives the user a config to override it. I am working on updating the tests now. Not sure if I will get done before the end of day or not (meetings). |
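The decision described here (fall back to the CPU unless the user opts in) could look roughly like the following sketch. Everything in it is hypothetical: `CastConfig`, the flag `allow_four_digit_years_only`, and `run_string_to_date_on_gpu` are illustrative names, not the plugin's real config keys or API.

```python
from dataclasses import dataclass


@dataclass
class CastConfig:
    # Hypothetical opt-in: the user accepts 4-digit-year handling on Spark 3.2+.
    allow_four_digit_years_only: bool = False


def run_string_to_date_on_gpu(spark_version: tuple, conf: CastConfig) -> bool:
    """Return True if the string-to-date cast may stay on the GPU."""
    if spark_version < (3, 2):
        # Older Spark only accepts 4-digit years, so the behaviors match.
        return True
    # On 3.2+ the GPU cannot match 5 to 7 digit years, so fall back to the
    # CPU unless the user has explicitly opted in.
    return conf.allow_four_digit_years_only


if __name__ == "__main__":
    print(run_string_to_date_on_gpu((3, 1), CastConfig()))
    print(run_string_to_date_on_gpu((3, 2), CastConfig()))
    print(run_string_to_date_on_gpu((3, 2), CastConfig(allow_four_digit_years_only=True)))
```

Defaulting the flag to off keeps results correct by default, at the cost of performance for users whose data never contains extended years.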
Describe the bug
We have test failures in CastOpSuite now that Spark 3.2 supports years with up to 7 digits when casting from string to date.
Steps/Code to reproduce bug
Run CastOpSuite with Spark 3.2.
Expected behavior
Tests should pass.
Environment details (please complete the following information)
N/A
Additional context
Spark commit: apache/spark@c9813f7
See issue #3406 for supporting other changes in SPARK-35780.