Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[feature request] easier API to set table properties #502

Closed
kevinjqliu opened this issue Mar 6, 2024 · 6 comments · Fixed by #503
Closed

[feature request] easier API to set table properties #502

kevinjqliu opened this issue Mar 6, 2024 · 6 comments · Fixed by #503

Comments

@kevinjqliu
Copy link
Contributor

Feature Request / Improvement

Background

Currently, there are two ways to set table properties.

  1. in create_table, pass in properties as dictionary

    table = catalog.create_table(random_identifier, table_schema_nested, properties={"format-version": "1"})

  2. using table set_properties transaction, pass in as key/value pair

    with table.transaction() as transaction:
    transaction.set_properties(abc="def")

With the introduction of TableProperties, it would be great to make overriding table properties easier.
With the current methods, we can only override table properties during create_table since this is not valid python:

with table.transaction() as transaction: 
    transaction.set_properties(TableProperties.PARQUET_ROW_GROUP_SIZE_BYTES=1000) 

Proposed solution

Allow set_properties to take in a dictionary. And also allow the dictionary value to be non-string type (similar to #469)

with table.transaction() as transaction: 
    transaction.set_properties({TableProperties.PARQUET_ROW_GROUP_SIZE_BYTES: 1000}) 
@kevinjqliu
Copy link
Contributor Author

Another workaround can be to use

with tbl.transaction() as transaction:
    transaction._apply((SetPropertiesUpdate(updates=properties),))

@sungwy
Copy link
Collaborator

sungwy commented Mar 7, 2024

I think this is a great idea @kevinjqliu . Most of the table properties have . in the key anyways, and hence we can't pass them as kwargs anyways. Changing the input parameter to a dictionary feels a lot more straightforward.

@HonahX
Copy link
Contributor

HonahX commented Mar 7, 2024

Hi @kevinjqliu @syun64 I think currently we can also set properties using dictionaries, just need ** in front of the dictionary:

properties: Dict[str, str] = {...}
with table.transaction() as transaction: 
    transaction.set_properties(**properties) 

This also works for properties having . in the key.

We do similar thing for catalog properties: load_catalog, example passing in glue session properties

Do you think the above way is good enough or the ** thing is still not very intuitive for users? If we finally choose to change the input parameter to dict, shall we also do the same for catalog?

@sungwy
Copy link
Collaborator

sungwy commented Mar 7, 2024

@HonahX - yes, that's right.

We can currently achieve functionally the same thing by providing the properties as **properties into the function call. I guess my point is that, if most of the properties have special characters in them so that they must be passed in as dictionary key values anyways, wouldn't it feel more straightforward for the users if we defined properties as a dictionary type in the first place?

Or does it still make sense to keep it as it is because something like below is possible:

load_catalog( "my_catalog", type="rest", uri="http://localhost:8181")

or

load_catalog( "my_catalog", type="rest", uri="http://localhost:8181", **{
            "s3.endpoint": "http://localhost:9000",
            "s3.access-key-id": "admin",
            "s3.secret-access-key": "password",
        }

Where users would appreciate the ability to pass in simpler catalog properties like type and uri as simple kwargs?

@kevinjqliu
Copy link
Contributor Author

oh wow, thanks @HonahX. I didn't think to pass the properties dictionary as kwargs.

Do you think the above way is good enough or the ** thing is still not very intuitive for users?

I feel like its hard to make the connection from dictionary to kwargs. And having the ability to pass the dictionary feels more
intuitive. #503 makes it possible to pass in a dictionary to set_properties.

If we finally choose to change the input parameter to dict, shall we also do the same for catalog?

I'd +1 that change. We might have to support both dictionary and kwargs to keep it backwards compatible, similar to #503

if most of the properties have special characters in them so that they must be passed in as dictionary key values anyways, wouldn't it feel more straightforward for the users if we defined properties as a dictionary type in the first place?

+1

@HonahX
Copy link
Contributor

HonahX commented Mar 8, 2024

Thanks @syun64 @kevinjqliu for the thoughts and explanation. Since users seem to benefit from dictionaries and kwargs, I think it is good idea to support both (as in #503 )

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants