
add native split_part to the cross db utils #299

Closed · dave-connors-3 opened this issue Mar 20, 2023 · 2 comments
Labels: enhancement (New feature or request), Stale

@dave-connors-3 (Contributor)

Describe the feature

Right now, dbt offers native support for many cross-db macros, including split_part. The implementation in dbt-spark is fairly complex, given that there is no native support for split_part in Spark. It gets especially complex for negative arguments to the function, relying on some heavy string parsing to translate a negative argument into the appropriate index in the array. Given that Databricks natively supports this function, including negative part_numbers, it may be a good idea to implement it here to avoid that complex logic.
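
For illustration (a hypothetical example, not taken from this issue; expected output per Databricks SQL's documented behavior), the built-in function already counts from the end of the string when the part number is negative:

    -- assumes current Databricks SQL semantics for split_part
    SELECT split_part('11.12.13', '.', 3);   -- '13'
    SELECT split_part('11.12.13', '.', -1);  -- '13' (counts from the end)
    SELECT split_part('11.12.13', '.', -3);  -- '11'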

Describe alternatives you've considered

Do nothing, inherit from dbt-spark

Additional context

Please include any other relevant context here.

Who will this benefit?

package maintainers like me!

Are you interested in contributing this feature?

For sure

dave-connors-3 added the enhancement (New feature or request) label on Mar 20, 2023
@dbeatty10 (Contributor)

dbt-labs/dbt-spark #689 added support for a negative part_number argument to dbt-spark. As @dave-connors-3 mentioned, dbt-databricks will just inherit this logic, so it is not strictly necessary to override it.

But providing a native implementation might be as simple as copy-pasting the logic from the default implementation:

dbt/include/databricks/macros/utils/split_part.sql

{% macro databricks__split_part(string_text, delimiter_text, part_number) %}

    split_part(
        {{ string_text }},
        {{ delimiter_text }},
        {{ part_number }}
        )

{%- endmacro %}

Or just inherit the default from dbt-core:

{% macro databricks__split_part(string_text, delimiter_text, part_number) %}

    {{ dbt.default__split_part(string_text, delimiter_text, part_number) }}

{%- endmacro %}
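
Either way, models keep calling the cross-db macro and dbt dispatches to the databricks__ implementation when one exists. A hypothetical usage (the model, column, and ref() names below are made up for illustration):

    -- hypothetical model using the cross-db macro
    select
        {{ dbt.split_part(string_text="email", delimiter_text="'@'", part_number=2) }} as email_domain
    from {{ ref('users') }}

With the native version above, that call should compile to something like split_part(email, '@', 2) on Databricks.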

⚠️ Caveat

I didn't check one way or the other whether the semantics of split_part in Databricks would necessitate bifurcated logic like the dbt-spark implementation.

This inherited test in dbt-databricks includes a negative test case as of dbt-core v1.6, so it should catch cases where any new implementation is off.


github-actions bot commented Jan 8, 2024

This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please remove the stale label or comment on the issue.

github-actions bot added the Stale label on Jan 8, 2024
benc-db closed this as not planned (Won't fix, can't repro, duplicate, stale) on Jan 12, 2024