Custom kernel for converting timestamps b/n UTC and non-UTC non-DST timezones #1553

NVnavkumar · 2023-11-14T02:09:36Z

This PR adds a custom kernel that converts timestamps to and from UTC using timezone transitions. It caches the timezone transition database from Java for use on the GPU, so that results stay compatible with the Spark implementation.

It passes the test suite used for the CPU POC (TimeZoneSuite.scala) in the spark-rapids codebase, which checks compatibility with Apache Spark (see NVIDIA/spark-rapids#9739 for updates to that test suite).
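For readers who want a CPU-side reference for what the kernel has to match, the following is a minimal sketch built only on `java.time`. The class and method names (`CpuReferenceConversion`, `hasNoRecurringDst`, `fromUtcSeconds`, `toUtcSeconds`) are hypothetical and are not part of this PR. Reading "non-DST" as "no recurring transition rules" is an assumption.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZoneOffset;

// Hypothetical helper, not part of this PR: CPU reference behavior via java.time.
final class CpuReferenceConversion {
  private CpuReferenceConversion() {}

  // Assumption: a "non-DST" zone is one whose rules have no recurring transition
  // rules, only a finite list of historical transitions.
  static boolean hasNoRecurringDst(ZoneId zone) {
    return zone.getRules().getTransitionRules().isEmpty();
  }

  // UTC epoch seconds -> wall-clock seconds in the zone (encoded as if UTC).
  static long fromUtcSeconds(long utcSeconds, ZoneId zone) {
    int offset = zone.getRules().getOffset(Instant.ofEpochSecond(utcSeconds)).getTotalSeconds();
    return utcSeconds + offset;
  }

  // Wall-clock seconds in the zone -> UTC epoch seconds.
  // Gaps and overlaps are resolved by java.time's own atZone rules.
  static long toUtcSeconds(long localSeconds, ZoneId zone) {
    LocalDateTime local = LocalDateTime.ofEpochSecond(localSeconds, 0, ZoneOffset.UTC);
    return local.atZone(zone).toEpochSecond();
  }
}
```

With `Asia/Shanghai`, which has historical transitions but no recurring rules, `fromUtcSeconds(0, zone)` yields 28800 and `toUtcSeconds(28800, zone)` yields 0.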

Signed-off-by: Navin Kumar <navink@nvidia.com>

NVnavkumar · 2023-11-16T08:02:54Z

build

revans2 · 2023-11-16T15:26:17Z

src/main/java/com/nvidia/spark/rapids/jni/GpuTimeZoneDB.java

+   * parts of the database. I prefer the former solution at least until we see a performance hit
+   * where we are waiting on the database to finish loading.
+   */
+  public static void cacheDatabase() {


Can we have a way to pass in a HostMemoryAllocator to this so we can do retry if needed in the future?

Note that this is fine to do as a follow on PR. We just need it for host memory limits at some point.

Filed #1570

revans2 · 2023-11-16T15:30:27Z

src/main/java/com/nvidia/spark/rapids/jni/GpuTimeZoneDB.java

+            new HostColumnVector.BasicType(false, DType.INT32));
+        HostColumnVector.DataType resultType =
+            new HostColumnVector.ListType(false, childType);
+        HostColumnVector fixedTransitions = HostColumnVector.fromLists(resultType,


Do we want/need a way to make this so we don't warn about leaking this? We are looking at making leaks fail unit tests. In Spark a lot of times there are races when trying to shut down things, especially if there is a failure.

This also is fine to do as a follow on issue.

Filed #1571

revans2 · 2023-11-16T15:37:02Z

src/main/java/com/nvidia/spark/rapids/jni/GpuTimeZoneDB.java

+          try {
+            zoneId = ZoneId.of(tzId).normalized(); // we use the normalized form to dedupe
+          } catch (ZoneRulesException e) {
+            continue;


Can we have a comment about when this would happen? It feels odd to just eat it and skip the timezone.

It actually would never happen in this case. This try/catch might have been added by the IDE, but it's not necessary here. This is an exception that occurs when you pass in an invalid Timezone Id to this method that can't be found in the IANA database.

I should add that the source of these IDs is Java itself (TimeZone.getAvailableIDs()), so the data comes from the same place.

Never mind, it seems that this data is somewhat inconsistent (probably because there is some ambiguity to be resolved, i.e. the 3-letter abbreviations, which are available but deprecated). Maybe we should file a follow-up issue to handle that case?

This all depends on how many time zones we are going to be able to support, and if we end up supporting them dynamically or not. A comment and a follow on issue should be fine. I am happy to have them skipped for now, but eventually we will need a full fix.
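The inconsistency discussed in this thread can be observed directly. Below is a minimal sketch that walks `TimeZone.getAvailableIDs()`, resolves each ID with `ZoneId.of(...).normalized()`, and records the IDs that raise `ZoneRulesException` instead of silently dropping them. The class name `ZoneIdScan` is hypothetical, and which IDs end up skipped depends on the JDK and its tzdata version.

```java
import java.time.ZoneId;
import java.time.zone.ZoneRulesException;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.TimeZone;

// Hypothetical helper, not part of this PR.
final class ZoneIdScan {
  // Normalized zones that java.time could resolve (deduplicated).
  final Set<ZoneId> resolved = new LinkedHashSet<>();
  // IDs known to java.util.TimeZone but rejected by ZoneId.of.
  final List<String> skipped = new ArrayList<>();

  static ZoneIdScan scan() {
    ZoneIdScan result = new ZoneIdScan();
    for (String tzId : TimeZone.getAvailableIDs()) {
      try {
        result.resolved.add(ZoneId.of(tzId).normalized());
      } catch (ZoneRulesException e) {
        // Typically legacy short IDs that ZoneId.of only accepts with an alias map
        // such as ZoneId.SHORT_IDS; kept here so they can be logged or reported.
        result.skipped.add(tzId);
      }
    }
    return result;
  }
}
```

Recording the skipped IDs rather than only `continue`-ing makes it easier to write the comment and follow-up issue requested above.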

revans2 · 2023-11-16T15:58:13Z

src/main/java/com/nvidia/spark/rapids/jni/GpuTimeZoneDB.java

+
+  // TODO: Deprecate this API when we support all timezones 
+  // (See https://github.com/NVIDIA/spark-rapids/issues/6840)
+  public static boolean isSupportedTimeZone(ZoneId desiredTimeZone) {


It is going to take a lot to really get rid of this. We are likely going to have to have some special case processing for ZoneOffsets. From what I can tell ZoneId.of uses ZoneOffset.of if the string looks like a ZoneOffset. There also appears to be some parsing going on for UT, GMT, and UTC offsets that I don't fully understand yet. We might need a follow on issue to look at how to dynamically look at offsets for UTC.
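The offset parsing mentioned here can be probed with a few `java.time` calls. This is a minimal sketch, with hypothetical names (`OffsetZoneCheck`, `isFixedOffset`, `fixedOffsetSeconds`), of one way to detect zones that reduce to a fixed offset. It relies on `ZoneId.normalized()`, which returns a `ZoneOffset` when the zone's rules are a fixed offset.

```java
import java.time.ZoneId;
import java.time.ZoneOffset;

// Hypothetical helper, not part of this PR.
final class OffsetZoneCheck {
  private OffsetZoneCheck() {}

  // True for plain offsets ("+08:00"), prefixed offsets ("UTC+08:00", "GMT-05:30"),
  // and fixed zones such as "UTC", because normalized() collapses them to a ZoneOffset.
  static boolean isFixedOffset(ZoneId zone) {
    return zone.normalized() instanceof ZoneOffset;
  }

  // Offset in seconds for a fixed-offset zone; rejects zones with transitions.
  static int fixedOffsetSeconds(ZoneId zone) {
    ZoneId normalized = zone.normalized();
    if (!(normalized instanceof ZoneOffset)) {
      throw new IllegalArgumentException("Not a fixed-offset zone: " + zone);
    }
    return ((ZoneOffset) normalized).getTotalSeconds();
  }
}
```

For example, `ZoneId.of("+08:00")` is already a `ZoneOffset`, while `ZoneId.of("UTC+08:00")` is a region-style ID that only becomes a `ZoneOffset` after `normalized()`. A fixed-offset zone needs no transition table at all, which is one possible route for the special-case processing described above.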

revans2 · 2023-11-16T16:21:38Z

src/main/java/com/nvidia/spark/rapids/jni/GpuTimeZoneDB.java

+public class GpuTimeZoneDB {
+
+  private CompletableFuture<Map<String, Integer>> zoneIdToTableFuture;
+  private CompletableFuture<HostColumnVector> fixedTransitionsFuture;


So what exactly is the data type stored here? It looks to be a LIST<STRUCT<startSeconds: INT64, endSeconds: INT64, offsetSeconds: INT64>>?
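To make the question concrete, here is a minimal host-side sketch of a per-zone transition list derived from `ZoneRules.getTransitions()`. The row layout (`utcEpochSecond`, `offsetBeforeSeconds`, `offsetAfterSeconds`) and the names `TransitionRows` and `Row` are assumptions for illustration. They are not the column layout this PR actually stores.

```java
import java.time.ZoneId;
import java.time.zone.ZoneOffsetTransition;
import java.util.ArrayList;
import java.util.List;

// Hypothetical layout, not the PR's actual column type.
final class TransitionRows {
  private TransitionRows() {}

  static final class Row {
    final long utcEpochSecond;     // instant of the transition, in UTC seconds
    final int offsetBeforeSeconds; // offset in effect just before the transition
    final int offsetAfterSeconds;  // offset in effect from the transition onward

    Row(long utcEpochSecond, int offsetBeforeSeconds, int offsetAfterSeconds) {
      this.utcEpochSecond = utcEpochSecond;
      this.offsetBeforeSeconds = offsetBeforeSeconds;
      this.offsetAfterSeconds = offsetAfterSeconds;
    }
  }

  // One list per zone; the outer LIST would hold one such list for each zone index.
  static List<Row> forZone(ZoneId zone) {
    List<Row> rows = new ArrayList<>();
    for (ZoneOffsetTransition t : zone.getRules().getTransitions()) {
      rows.add(new Row(t.toEpochSecond(),
          t.getOffsetBefore().getTotalSeconds(),
          t.getOffsetAfter().getTotalSeconds()));
    }
    return rows;
  }
}
```

`getTransitions()` returns the transitions in chronological order, so each per-zone list is already sorted and can be binary searched.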

NVnavkumar · 2023-11-16T22:38:13Z

build

revans2

Looking good.

ttnghia

Please format C++ code using clang-format v16. The style file should be ./src/main/cpp/.clang-format.

ttnghia · 2023-11-16T22:55:57Z

src/main/cpp/src/timezones.cu

+ * 
+ * @tparam typestamp_type type of the input and output timestamp
+ * @param timestamp input timestamp
+ * @param transitions the transitions 


Missing two more @param.

ttnghia · 2023-11-16T22:57:25Z

src/main/cpp/src/GpuTimeZoneDBJni.cpp

+        auto input = reinterpret_cast<cudf::column_view const*>(input_handle);
+        auto transitions = reinterpret_cast<cudf::table_view const*>(transitions_handle);
+        auto index = static_cast<cudf::size_type>(tz_index);


It is recommended to use auto const.

ttnghia · 2023-11-21T18:38:19Z

src/main/java/com/nvidia/spark/rapids/jni/GpuTimeZoneDB.java

+  }
+
+
+  public static void shutdown() {


Also should this be synchronized?

It's synchronized in the close method.
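The pattern described in the reply, an unsynchronized static `shutdown()` that delegates to a synchronized instance `close()`, might look roughly like the sketch below. The class `TimeZoneCacheSketch` and its fields are hypothetical stand-ins, not the actual `GpuTimeZoneDB` code.

```java
// Hypothetical stand-in for a singleton cache; not the PR's GpuTimeZoneDB.
final class TimeZoneCacheSketch implements AutoCloseable {
  private static final TimeZoneCacheSketch INSTANCE = new TimeZoneCacheSketch();

  private boolean closed;
  private int releaseCount; // counts how often resources were actually released

  private TimeZoneCacheSketch() {}

  static TimeZoneCacheSketch getInstance() {
    return INSTANCE;
  }

  // Static entry point; thread safety comes from the synchronized close() below.
  static void shutdown() {
    INSTANCE.close();
  }

  // Idempotent: a second call, or a racing call from another thread, is a no-op.
  @Override
  public synchronized void close() {
    if (!closed) {
      closed = true;
      releaseCount++; // a real implementation would free host or device buffers here
    }
  }

  synchronized int releaseCount() {
    return releaseCount;
  }
}
```

Because the guard lives inside `close()`, calling `shutdown()` repeatedly during a racy teardown releases the resources only once.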

ttnghia · 2023-11-21T18:40:35Z

src/main/java/com/nvidia/spark/rapids/jni/GpuTimeZoneDB.java

+    Table transitions = instance.getTransitions();
+    ColumnVector result = new ColumnVector(convertTimestampColumnToUTC(input.getNativeView(),
+        transitions.getNativeView(), tzIndex));
+    transitions.close();


I feel this is very expensive since we upload data to GPU and create a new transition table every time we call this function. Can we cache the transition table inside instance?

Yeah. I plan to update this in a future PR, since right now we just need to see whether we are computing the right thing on the GPU. I think it's still an open question how to cache it and what makes sense. I will file a follow-up issue on optimizing this.

Also, right now the functionality will be hidden behind a configuration flag, so there is time to optimize before fully exposing.

Filed this as #1588.
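One possible shape for the caching suggested in this thread is a small memoizing holder, sketched below. `CachedTable` is a hypothetical name, and the loader stands in for whatever builds and uploads the transitions table. Ownership and closing of a real GPU table are deliberately left out.

```java
import java.util.function.Supplier;

// Hypothetical memoizing holder; not part of this PR.
final class CachedTable<T> {
  private final Supplier<T> loader;
  private T cached;

  CachedTable(Supplier<T> loader) {
    this.loader = loader;
  }

  // Builds the value on first use and returns the same instance afterwards.
  synchronized T get() {
    if (cached == null) {
      cached = loader.get();
    }
    return cached;
  }

  // Drops the cached value, for example at shutdown or when the database is reloaded.
  synchronized void invalidate() {
    cached = null;
  }
}
```

With this in place, each conversion call would reuse the table returned by `get()` instead of rebuilding it, and the expensive load would run again only after `invalidate()`.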

ttnghia · 2023-11-21T22:29:36Z

src/main/cpp/src/timezones.cu

+#include <cudf/column/column.hpp>
+#include <cudf/column/column_device_view.cuh>
+#include <cudf/column/column_factories.hpp>
+#include <cudf/detail/null_mask.hpp>
+#include <cudf/lists/list_device_view.cuh>
+#include <cudf/lists/lists_column_device_view.cuh>
+#include <cudf/table/table.hpp>
+#include <cudf/types.hpp>
+#include <rmm/cuda_stream_view.hpp>
+#include <rmm/exec_policy.hpp>
+#include <thrust/binary_search.h>
+
+#include "timezones.hpp"


The headers should be grouped in a "near to far" order: local headers first, then cudf_test, then cudf/, then thrust, then rmm, and finally the C++ standard library headers. This applies to all C++ files.

NVnavkumar · 2023-11-21T22:53:39Z

build

NVnavkumar · 2023-11-21T22:59:08Z

build

NVnavkumar · 2023-11-21T23:23:13Z

build

ttnghia · 2023-11-21T23:46:42Z

src/main/cpp/src/timezones.hpp

+#include <rmm/cuda_stream_view.hpp>
+
+#include <cstddef>
+


Suggested change

NVnavkumar · 2023-11-21T23:49:16Z

build

NVnavkumar added 13 commits November 9, 2023 12:21

Semi-working kernel for timestamp timezone conversion

616e368

Signed-off-by: Navin Kumar <navink@nvidia.com>

Updated gtest with transition list

ac5aa18

Refactor tests to use transitions as fixture

8c53cce

Add more items in the column to test each transition

c87f7fc

Updated unit gtests for timezone kernel

558b882

Implementation of GpuTimeZoneDB with matching interface with CPU POC.

e33bb3a

Add minimal convert from UTC test

ca8502a

Signed-off-by: Navin Kumar <navink@nvidia.com>

Fix wrong offset bug in creating transition DB and update tests to sy…

3a22b6d

…nc with real timezone DB Signed-off-by: Navin Kumar <navink@nvidia.com>

Cleanup and sync test with CPP version.

10476bc

Signed-off-by: Navin Kumar <navink@nvidia.com>

Merge branch 'branch-23.12' into gpu-timezone-non-repeating-transition

2f0f32a

Fix bug that happens when we pass a timestamp on the transition time …

3094fe7

…exactly, switch to upper bound. Update tests for edge case. Signed-off-by: Navin Kumar <navink@nvidia.com>
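The edge case named in this commit message can be illustrated on the host. The sketch below assumes a simplified layout, not the kernel's actual one: `transitions[i]` is the UTC second at which `offsets[i + 1]` takes effect, and `offsets[0]` applies before the first transition. All names are hypothetical.

```java
// Hypothetical host-side illustration; the real lookup runs in the CUDA kernel.
final class OffsetLookup {
  private OffsetLookup() {}

  // Index of the first element strictly greater than key (upper bound).
  static int upperBound(long[] sorted, long key) {
    int lo = 0;
    int hi = sorted.length;
    while (lo < hi) {
      int mid = (lo + hi) >>> 1;
      if (sorted[mid] <= key) {
        lo = mid + 1; // an element equal to key counts as already passed
      } else {
        hi = mid;
      }
    }
    return lo;
  }

  // offsets must have exactly transitions.length + 1 entries.
  static int offsetAt(long[] transitions, int[] offsets, long utcSeconds) {
    return offsets[upperBound(transitions, utcSeconds)];
  }
}
```

With transitions `{100, 200}` and offsets `{0, 3600, 7200}`, a timestamp of exactly 100 maps to 3600, the offset that starts at the transition. A lower-bound search would return index 0 for the same input and pick the previous offset, which matches the kind of bug this commit describes.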

Update timezone handling for convert to UTC and update tests

5b7f09e

Signed-off-by: Navin Kumar <navink@nvidia.com>

Internalize the daemon thread running to cache the timezone db

d78159a

Signed-off-by: Navin Kumar <navink@nvidia.com>

NVnavkumar mentioned this pull request Nov 16, 2023

Update timezone test framework to support both GPU and CPU POC NVIDIA/spark-rapids#9739

Closed

revans2 reviewed Nov 16, 2023

NVnavkumar added 6 commits November 16, 2023 10:37

Fix null pointer exception by creating the instance automatically

8b016b7

Signed-off-by: Navin Kumar <navink@nvidia.com>

Fix the visibility of these methods.

058c5cd

Signed-off-by: Navin Kumar <navink@nvidia.com>

Add comment to note the type of the column vector stored in the datab…

9d71bd4

…ase. Signed-off-by: Navin Kumar <navink@nvidia.com>

Remove the TIMESTAMP_DAYS code here.

21f7364

Signed-off-by: Navin Kumar <navink@nvidia.com>

Update this. I think the subtracting one second now doesn't make sens…

2c13b6d

…e, because the transition would still happen on that exact time. Signed-off-by: Navin Kumar <navink@nvidia.com>

Update tests to handle around the instant of transition.

7afbf1c

Signed-off-by: Navin Kumar <navink@nvidia.com>

NVnavkumar marked this pull request as ready for review November 16, 2023 22:38

NVnavkumar changed the title ~~[WIP] Custom kernel for converting timestamps b/n UTC and non-UTC non-DST timezones~~ Custom kernel for converting timestamps b/n UTC and non-UTC non-DST timezones Nov 16, 2023

revans2 reviewed Nov 16, 2023

NVnavkumar mentioned this pull request Nov 16, 2023

[FEA] Add retry to GPU timezone database caching operation #1570

Open

ttnghia reviewed Nov 16, 2023

ttnghia reviewed Nov 21, 2023, with review threads on:

src/main/cpp/src/timezones.hpp (outdated, resolved)

src/main/cpp/tests/timezones.cpp (outdated, resolved)

src/main/cpp/tests/timezones.cpp (outdated, resolved)

src/main/java/com/nvidia/spark/rapids/jni/GpuTimeZoneDB.java (outdated, resolved)

src/main/java/com/nvidia/spark/rapids/jni/GpuTimeZoneDB.java (resolved)

src/main/java/com/nvidia/spark/rapids/jni/GpuTimeZoneDB.java (outdated, resolved)

NVnavkumar added 4 commits November 21, 2023 11:00

Merge branch 'branch-23.12' of github.com:NVIDIA/spark-rapids-jni int…

5588d06

…o gpu-timezone-non-repeating-transition

Address some feedback with some cleanup

a190226

Signed-off-by: Navin Kumar <navink@nvidia.com>

Refactor into template function

1b76f84

Signed-off-by: Navin Kumar <navink@nvidia.com>

Address more feedback by adding some aliases and cleanup

6b4a496

Signed-off-by: Navin Kumar <navink@nvidia.com>

ttnghia reviewed Nov 21, 2023

NVnavkumar added 2 commits November 21, 2023 14:44

Header reordering by near-to-far

7355eb4

Address some Java feedback

44ce6ae

Signed-off-by: Navin Kumar <navink@nvidia.com>

NVnavkumar mentioned this pull request Nov 21, 2023

[FEA] Cache transitions table for GpuTimeZoneDB to keep on the GPU throughout the query #1588

Open

Fix comment

cf0449c

Signed-off-by: Navin Kumar <navink@nvidia.com>

ttnghia reviewed Nov 21, 2023

src/main/cpp/src/timezones.cu (outdated, resolved)

fix formatting of JNI CPP file

965e92a

Signed-off-by: Navin Kumar <navink@nvidia.com>

Addressing more feedback, refactor again.

c3bfa14

Signed-off-by: Navin Kumar <navink@nvidia.com>

ttnghia reviewed Nov 21, 2023

ttnghia previously approved these changes Nov 21, 2023

Pre-commit clang-format

a165262

Signed-off-by: Navin Kumar <navink@nvidia.com>

NVnavkumar dismissed ttnghia’s stale review via a165262 November 21, 2023 23:48

ttnghia approved these changes Nov 21, 2023

NVnavkumar merged commit 0fa5796 into NVIDIA:branch-23.12 Nov 22, 2023
2 checks passed
