Added support for generic types in SqlBackend
#272
Conversation
This PR adds the ability to use rich dataclasses like:

```python
@dataclass
class Foo:
    first: str
    second: bool | None

@dataclass
class Nested:
    foo: Foo
    mapping: dict[str, int]
    array: list[int]
```
❌ 34/35 passed, 1 failed, 3 skipped, 38m35s total
❌ test_appends_complex_types: databricks.sdk.errors.platform.BadRequest: [INSUFFICIENT_PERMISSIONS] Insufficient privileges: (2.218s)
Running from acceptance #383
Have some comments, no implementation changes. Please add an integration test showing that the tables can be created for a dataclass with complex types
```python
prefix = f"{prefix}: "
raise StructInferError(f"{prefix}unsupported type: {type_ref.__name__}")

def _infer_nullable(self, type_ref: type, path: list[str]) -> SqlType:
```
I think we infer a union type here, but we only support the nullable case as an outcome:
```diff
-def _infer_nullable(self, type_ref: type, path: list[str]) -> SqlType:
+def _infer_union(self, type_ref: type, path: list[str]) -> SqlType:
```
we're using spark types here
Checking my observations so far.
```python
if isinstance(value, float):
    return f"{value:0.2f}"
if isinstance(value, str):
    value = str(value).replace("'", "''")
```
This always makes me nervous, because most flavours of SQL have esoteric and unexpected ways of escaping, and we find ourselves by the coffee machine talking about little Bobby Tables.
I suspect that `\` also needs to be escaped. There might be more. 😕
further PRs ;)
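A minimal sketch of the concern above: escape the backslash before the quote delimiter, since Spark SQL string literals treat `\` as an escape character. The helper name is hypothetical, not this PR's implementation.

```python
def escape_sql_string(value: str) -> str:
    # Escape the escape character first, then double the quote delimiter;
    # doing it in the other order would double-escape the added backslashes.
    return value.replace("\\", "\\\\").replace("'", "''")

print(escape_sql_string("O'Brien"))  # O''Brien
```

Other control characters may still need handling, which is why parameterized statements are usually preferred when the engine supports them.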
```python
return ", ".join(data)

@classmethod
def _value_to_sql(cls, value: Any) -> str:
```
I understand this approach, and I know it's not being introduced in this PR but is rather how things already work. Still, I'd note that it leaves itself open to type confusion: the schema (and types) depend on the type hints, whereas the control flow for conversion depends on the values, and we never verify that the values have the expected types.
Is this harmless in this context? Can it be abused? I'm not sure. Because the context is fairly open, this is hard to verify, and that often makes people nervous. This is why most modern serde frameworks use the schema as the blueprint during serialisation and deserialisation rather than relying on, or trusting, the runtime types of values.
PR is welcome ;)
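The suggested schema-first check could look roughly like this: walk the dataclass fields and verify each runtime value against its hint before serialising. `check_row` is illustrative, not part of this PR, and it is simplified to plain (non-generic, non-union) field types.

```python
from dataclasses import dataclass, fields, is_dataclass
from typing import get_origin

def check_row(row) -> None:
    """Raise TypeError if a field's runtime value doesn't match its hint (sketch)."""
    assert is_dataclass(row), "expected a dataclass instance"
    for field in fields(type(row)):
        value = getattr(row, field.name)
        # For generics like list[int], compare against the origin (list);
        # unions and nested generics are out of scope for this sketch.
        expected = get_origin(field.type) or field.type
        if isinstance(expected, type) and not isinstance(value, expected):
            raise TypeError(
                f"{field.name}: expected {expected.__name__}, got {type(value).__name__}"
            )

@dataclass
class Row:
    name: str
    count: int

check_row(Row("ok", 1))       # passes silently
# check_row(Row("ok", "1"))   # would raise TypeError
```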
```python
if isinstance(value, list):
    values = ", ".join(cls._value_to_sql(v) for v in value)
    return f"ARRAY({values})"
```
Should this allow tuples?
we don't support tuples
```python
if isinstance(value, list):
    values = ", ".join(cls._value_to_sql(v) for v in value)
    return f"ARRAY({values})"
if isinstance(value, dict):
```
As with list: should we allow non-dict mappings here? (Named tuples and mapping proxies spring to mind.)
no, because we can't infer types from those just yet
```python
if type_ref is set:
    raise StructInferError("Cannot determine element type of set. Rewrite as: set[XXX]")
```
I don't think the rest of the code supports sets?
Apparently we do support them: `(set[str], "ARRAY<STRING>")` passed without any modifications.
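Checking that observation: a bare `set` carries no element type, while `set[str]` does, so it can map to `ARRAY<STRING>` just like a list. The inference function below is a hypothetical sketch, not the PR's code.

```python
from typing import get_args, get_origin

# Illustrative primitive mapping; assumed, not taken from the PR.
_PRIMITIVES = {str: "STRING", int: "LONG", bool: "BOOLEAN"}

def infer_collection(type_ref) -> str:
    """Map list[X] / set[X] to ARRAY<...>; reject bare containers (sketch)."""
    if type_ref is set or type_ref is list:
        # Matches the guard in the PR: no element type to infer from.
        raise TypeError(f"Cannot determine element type of {type_ref.__name__}")
    if get_origin(type_ref) in (set, list):
        (element,) = get_args(type_ref)
        return f"ARRAY<{_PRIMITIVES[element]}>"
    raise TypeError(f"unsupported: {type_ref!r}")

print(infer_collection(set[str]))  # ARRAY<STRING>
```

Since `get_origin(set[str])` is `set`, the generic form flows through the same branch as `list[X]`, which explains why the test passed unmodified.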
* Added functionality to export any dashboards-as-code into CSV ([#269](#269)). The `DashboardMetadata` class now includes a new method, `export_to_zipped_csv`, which enables exporting any dashboard as CSV files in a ZIP archive. This method accepts `sql_backend` and `export_path` as parameters and exports dashboard queries to CSV files in the specified ZIP archive by iterating through tiles and fetching dashboard queries if the tile is a query. To ensure the proper functioning of this feature, unit tests and manual testing have been conducted. A new test, `test_dashboards_export_to_zipped_csv`, has been added to verify the correct export of dashboard data to a CSV file.
* Added support for generic types in `SqlBackend` ([#272](#272)). In this release, we've added support for using rich dataclasses, including those with optional and generic types, in the `SqlBackend` of the `StatementExecutionBackend` class. The new functionality is demonstrated in the `test_supports_complex_types` unit test, which creates a `Nested` dataclass containing various complex data types, such as nested dataclasses, `datetime` objects, `dict`, `list`, and optional fields. This enhancement is achieved by updating the `save_table` method to handle the conversion of complex dataclasses to SQL statements. To facilitate type inference, we've introduced a new `StructInference` class that converts Python dataclasses and built-in types to their corresponding SQL Data Definition Language (DDL) representations. This addition simplifies data definition and manipulation operations while maintaining type safety and compatibility with various SQL data types.