
Added support for generic types in SqlBackend #272

Merged: 2 commits merged from feat/nested into main on Sep 11, 2024
Conversation

@nfx (Contributor) commented Sep 11, 2024

This PR adds the ability to use rich dataclasses like:

```python
from dataclasses import dataclass


@dataclass
class Foo:
    first: str
    second: bool | None


@dataclass
class Nested:
    foo: Foo
    mapping: dict[str, int]
    array: list[int]
```
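For orientation, a minimal sketch of how such a dataclass could be written through the backend. The `StatementExecutionBackend` constructor and the `save_table` call are inferred from the release notes at the bottom of this page, so treat the exact argument order and the table/warehouse names as illustrative rather than authoritative:

```python
from databricks.sdk import WorkspaceClient
from databricks.labs.lsql.backends import StatementExecutionBackend

# Assumed wiring: a statement-execution backend bound to a SQL warehouse.
backend = StatementExecutionBackend(WorkspaceClient(), "my-warehouse-id")

rows = [
    Nested(
        foo=Foo(first="a", second=None),
        mapping={"x": 1},
        array=[1, 2, 3],
    ),
]

# save_table derives the table schema (STRUCT/MAP/ARRAY columns) from the
# dataclass type hints (the CREATE TABLE statement in the integration-test
# log below shows the kind of DDL this produces) and then appends the rows.
backend.save_table("hive_metastore.some_schema.nested", rows, Nested, mode="append")
```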

github-actions bot commented Sep 11, 2024

❌ 34/35 passed, 1 failed, 3 skipped, 38m35s total

❌ test_appends_complex_types: databricks.sdk.errors.platform.BadRequest: [INSUFFICIENT_PERMISSIONS] Insufficient privileges: (2.218s)
databricks.sdk.errors.platform.BadRequest: [INSUFFICIENT_PERMISSIONS] Insufficient privileges:
User does not have permission CREATE,USAGE on database `TEST_SCHEMA`. SQLSTATE: 42501
16:18 DEBUG [databricks.sdk] Loaded from environment
16:18 DEBUG [databricks.sdk] Ignoring pat auth, because metadata-service is preferred
16:18 DEBUG [databricks.sdk] Ignoring basic auth, because metadata-service is preferred
16:18 DEBUG [databricks.sdk] Attempting to configure auth: metadata-service
16:18 INFO [databricks.sdk] Using Databricks Metadata Service authentication
[gw9] linux -- Python 3.10.14 /home/runner/work/lsql/lsql/.venv/bin/python
16:18 DEBUG [databricks.sdk] Loaded from environment
16:18 DEBUG [databricks.sdk] Ignoring pat auth, because metadata-service is preferred
16:18 DEBUG [databricks.sdk] Ignoring basic auth, because metadata-service is preferred
16:18 DEBUG [databricks.sdk] Attempting to configure auth: metadata-service
16:18 INFO [databricks.sdk] Using Databricks Metadata Service authentication
16:18 DEBUG [databricks.labs.lsql.backends] [api][execute] CREATE TABLE IF NOT EXISTS hive_metastore.TEST_SCHEMA.txrOb (foo STRUCT<first:STRING,second:BOOLEAN>... (134 more bytes)
16:18 DEBUG [databricks.labs.lsql.core] Executing SQL statement: CREATE TABLE IF NOT EXISTS hive_metastore.TEST_SCHEMA.txrOb (foo STRUCT<first:STRING,second:BOOLEAN> NOT NULL, since DATE NOT NULL, created TIMESTAMP NOT NULL, mapping MAP<STRING,LONG> NOT NULL, array ARRAY<LONG> NOT NULL) USING DELTA
16:18 DEBUG [databricks.sdk] POST /api/2.0/sql/statements/
> {
>   "format": "JSON_ARRAY",
>   "statement": "CREATE TABLE IF NOT EXISTS hive_metastore.TEST_SCHEMA.txrOb (foo STRUCT<first:STRING,second:BOOLEAN>... (134 more bytes)",
>   "warehouse_id": "TEST_DEFAULT_WAREHOUSE_ID"
> }
< 200 OK
< {
<   "statement_id": "01ef7059-863c-1703-bf7a-63d64fd1bd7a",
<   "status": {
<     "error": {
<       "error_code": "BAD_REQUEST",
<       "message": "[INSUFFICIENT_PERMISSIONS] Insufficient privileges:\nUser does not have permission CREATE,USAGE o... (37 more bytes)"
<     },
<     "state": "FAILED"
<   }
< }

Running from acceptance #383

@JCZuurmond (Contributor) left a comment

Have some comments, no implementation changes. Please add an integration test showing that the tables can be created for a dataclass with complex types.

src/databricks/labs/lsql/structs.py (resolved)
src/databricks/labs/lsql/structs.py (outdated, resolved)
prefix = f"{prefix}: "
raise StructInferError(f"{prefix}unsupported type: {type_ref.__name__}")

def _infer_nullable(self, type_ref: type, path: list[str]) -> SqlType:
Contributor:

I think we infer a union type here, but we only support nullable as the outcome:

Suggested change:
- def _infer_nullable(self, type_ref: type, path: list[str]) -> SqlType:
+ def _infer_union(self, type_ref: type, path: list[str]) -> SqlType:

Contributor Author:

We're using Spark types here.

tests/unit/test_backends.py (outdated, resolved)
tests/unit/test_structs.py (outdated, resolved)
tests/unit/test_structs.py (resolved)
src/databricks/labs/lsql/backends.py (resolved)
@asnare (Contributor) left a comment

Checking my observations so far.

src/databricks/labs/lsql/backends.py (outdated, resolved)
src/databricks/labs/lsql/backends.py (outdated, resolved)
src/databricks/labs/lsql/backends.py (outdated, resolved)
if isinstance(value, float):
    return f"{value:0.2f}"
if isinstance(value, str):
    value = str(value).replace("'", "''")
Contributor:

This always makes me nervous, because with most flavours of SQL there are esoteric and unexpected ways of escaping, and we find ourselves by the coffee machine talking about little Bobby Tables.

I suspect that \ also needs to be escaped. There might be more. 😕
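For illustration only (not code from this PR), a small sketch of why quote doubling alone may not be enough; `escape_literal` is a hypothetical stand-in for the string branch quoted above:

```python
def escape_literal(value: str) -> str:
    # Hypothetical helper: wrap in quotes and double any embedded single quotes.
    return "'" + value.replace("'", "''") + "'"

# The classic injection attempt is neutralised by quote doubling...
print(escape_literal("Robert'); DROP TABLE students; --"))
# ...but in dialects where backslash escapes characters inside string literals
# (Spark SQL's default behaviour), a value ending in a backslash can swallow
# the closing quote, which is why \ likely needs escaping as well.
print(escape_literal("ends with a backslash \\"))
```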

Contributor Author:

Further PRs ;)

return ", ".join(data)

@classmethod
def _value_to_sql(cls, value: Any) -> str:
Contributor:

I understand this approach, and that it's not being introduced in this PR but is rather the existing way things work. However, I would like to note that this approach leaves itself open to type confusion, because the schema (and types) depends on the type hints whereas the control flow for conversion depends on the values, and we never verify that the values have the expected types.

Is this harmless in this context? (Can it be abused? I'm not sure. Because the context is kind of open, this is hard to verify, and that often makes people nervous. But this is why most modern serde frameworks use the schema as the blueprint during serialisation and deserialisation rather than relying on or trusting the runtime types of values.)
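A rough sketch of the schema-driven alternative this comment alludes to (illustrative only, not this PR's code): conversion is keyed off the declared dataclass field types, and mismatched runtime values are rejected instead of silently taking a different branch. All names here are hypothetical:

```python
import dataclasses


@dataclasses.dataclass
class Row:
    name: str
    count: int


def field_to_sql(declared: type, value: object) -> str:
    # Dispatch on the declared field type, not on type(value), and verify the
    # runtime value actually matches before serialising it.
    if not isinstance(value, declared):
        raise TypeError(f"expected {declared.__name__}, got {type(value).__name__}")
    if declared is str:
        return "'" + str(value).replace("'", "''") + "'"
    if declared is int:
        return str(value)
    raise TypeError(f"unsupported declared type: {declared!r}")


row = Row(name="a", count="not-an-int")  # dataclasses do not enforce types at runtime
for field in dataclasses.fields(Row):
    field_to_sql(field.type, getattr(row, field.name))  # raises TypeError on `count`
```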

Contributor Author:

PR is welcome ;)

Comment on lines +179 to +181
if isinstance(value, list):
    values = ", ".join(cls._value_to_sql(v) for v in value)
    return f"ARRAY({values})"

Contributor:

Should this allow tuples?

Contributor Author:

we don't support tuples

if isinstance(value, list):
    values = ", ".join(cls._value_to_sql(v) for v in value)
    return f"ARRAY({values})"
if isinstance(value, dict):

Contributor:

As with list, should we allow non-dict mappings here? (Named tuples and mapping-proxy spring to mind.)

Contributor Author:

no, because we can't infer types from those just yet

Comment on lines +108 to +109
if type_ref is set:
    raise StructInferError("Cannot determine element type of set. Rewrite as: set[XXX]")

Contributor:

I don't think the rest of the code supports sets?

Contributor Author:

Apparently we do support them: the test case `(set[str], "ARRAY<STRING>")` passed without any modifications.
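For context, a self-contained sketch of that kind of parametrized check. `infer_ddl` below is a toy stand-in written for this example, not the library's `StructInference`, but it produces the same ARRAY/MAP DDL strings seen in this thread:

```python
import typing

import pytest

# Toy mapping of Python collection hints to Spark SQL DDL; only a stand-in so
# the test below is runnable on its own.
_PRIMITIVES = {int: "LONG", str: "STRING", bool: "BOOLEAN"}


def infer_ddl(type_ref) -> str:
    origin, args = typing.get_origin(type_ref), typing.get_args(type_ref)
    if origin in (list, set):  # sets are flattened to ARRAYs, per the reply above
        return f"ARRAY<{infer_ddl(args[0])}>"
    if origin is dict:
        return f"MAP<{infer_ddl(args[0])},{infer_ddl(args[1])}>"
    return _PRIMITIVES[type_ref]


@pytest.mark.parametrize(
    "type_ref, ddl",
    [
        (list[int], "ARRAY<LONG>"),
        (set[str], "ARRAY<STRING>"),
        (dict[str, int], "MAP<STRING,LONG>"),
    ],
)
def test_infers_collection_ddl(type_ref, ddl):
    assert infer_ddl(type_ref) == ddl
```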

tests/unit/test_structs.py (resolved)
@nfx merged commit fe7bf4d into main on Sep 11, 2024
6 of 8 checks passed
@nfx deleted the feat/nested branch on September 11, 2024 16:27
nfx added a commit that referenced this pull request Sep 11, 2024
* Added Functionality to export any dashboards-as-code into CSV ([#269](#269)). The `DashboardMetadata` class now includes a new method, `export_to_zipped_csv`, which enables exporting any dashboard as CSV files in a ZIP archive. This method accepts `sql_backend` and `export_path` as parameters and exports dashboard queries to CSV files in the specified ZIP archive by iterating through tiles and fetching dashboard queries if the tile is a query. To ensure the proper functioning of this feature, unit tests and manual testing have been conducted. A new test, `test_dashboards_export_to_zipped_csv`, has been added to verify the correct export of dashboard data to a CSV file.
* Added support for generic types in `SqlBackend` ([#272](#272)). In this release, we've added support for using rich dataclasses, including those with optional and generic types, in the `SqlBackend` of the `StatementExecutionBackend` class. The new functionality is demonstrated in the `test_supports_complex_types` unit test, which creates a `Nested` dataclass containing various complex data types, such as nested dataclasses, `datetime` objects, `dict`, `list`, and optional fields. This enhancement is achieved by updating the `save_table` method to handle the conversion of complex dataclasses to SQL statements. To facilitate type inference, we've introduced a new `StructInference` class that converts Python dataclasses and built-in types to their corresponding SQL Data Definition Language (DDL) representations. This addition simplifies data definition and manipulation operations while maintaining type safety and compatibility with various SQL data types.
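A minimal usage sketch for the `export_to_zipped_csv` method described in the first bullet above. Loading the dashboard folder via `DashboardMetadata.from_path` is an assumption about the surrounding dashboards-as-code API, and the warehouse id and paths are placeholders:

```python
from pathlib import Path

from databricks.sdk import WorkspaceClient
from databricks.labs.lsql.backends import StatementExecutionBackend
from databricks.labs.lsql.dashboards import DashboardMetadata

# Backend used to run the dashboard queries whose results get exported.
sql_backend = StatementExecutionBackend(WorkspaceClient(), "my-warehouse-id")

# Assumption: the dashboards-as-code folder is loaded with from_path; the
# release note only documents export_to_zipped_csv(sql_backend, export_path).
metadata = DashboardMetadata.from_path(Path("dashboards/my_dashboard"))
metadata.export_to_zipped_csv(sql_backend, Path("exports/my_dashboard.zip"))
```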
@nfx mentioned this pull request Sep 11, 2024
nfx added a commit that referenced this pull request Sep 11, 2024