What is Table Identifier? #341

Closed
kevinjqliu opened this issue Jan 31, 2024 · 3 comments
Comments

@kevinjqliu
Contributor

Question

In test_base.py, the Table Identifier, TEST_TABLE_IDENTIFIER, is a tuple of 4 elements:

TEST_TABLE_IDENTIFIER = ("com", "organization", "department", "my_table")

In most other places, the Table identifier is either a tuple of 2 elements (database_name, table_name) or a string with 2 parts, database_name.table_name.
See more examples

In #289, I wanted to implement the location for the in-memory catalog the same way as the other catalog implementations, by using the _resolve_table_location function.

location = self._resolve_table_location(location, database_name, table_name)

However, because the Table Identifier is a 4-element tuple, I cannot parse the database name using identifier_to_database_and_table:

database_name, table_name = self.identifier_to_database_and_table(identifier)
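To illustrate the problem, here is a minimal sketch of a strict two-part split. `split_database_and_table` and `try_split` are hypothetical stand-ins, not the actual `identifier_to_database_and_table` implementation; the sketch assumes the method only accepts exactly two parts.

```python
from typing import Tuple

TEST_TABLE_IDENTIFIER = ("com", "organization", "department", "my_table")


def split_database_and_table(identifier: Tuple[str, ...]) -> Tuple[str, str]:
    """Hypothetical stand-in: accept only (database_name, table_name)."""
    if len(identifier) != 2:
        raise ValueError(
            f"Expected (database_name, table_name), got {len(identifier)} parts: {identifier}"
        )
    database_name, table_name = identifier
    return database_name, table_name


def try_split(identifier: Tuple[str, ...]) -> str:
    """Report whether the strict split succeeds for the given identifier."""
    try:
        database_name, table_name = split_database_and_table(identifier)
        return f"ok: {database_name}.{table_name}"
    except ValueError as exc:
        return f"error: {exc}"
```

Under that assumption, a two-part identifier such as ("db", "tbl") splits cleanly, while the 4-element TEST_TABLE_IDENTIFIER is rejected.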

What is the proper spec of Table Identifier? And what part of it represents the database_name?

In the Java implementation, TableIdentifier is made up of two parts, the namespace and the name, where Namespace is a list of strings.

@Fokko
Contributor

Fokko commented Feb 1, 2024

Hey @kevinjqliu. As you already concluded, most catalogs use only two parts: one namespace and the table name. The REST catalog supports hierarchical namespaces, and of course we can add that to the InMemory one as well.

What is the proper spec of Table Identifier? And what part of it represents the database_name?

The last element represents the table name, and everything before it is the namespace.
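A minimal sketch of that rule, assuming the identifier is already a tuple of strings. `split_namespace_and_table` is a hypothetical helper written for illustration, not a function from the library.

```python
from typing import Tuple


def split_namespace_and_table(identifier: Tuple[str, ...]) -> Tuple[Tuple[str, ...], str]:
    """Hypothetical helper: last element is the table name, the rest is the namespace."""
    if not identifier:
        raise ValueError("Identifier must contain at least a table name")
    namespace = tuple(identifier[:-1])  # may be empty for a bare table name
    table_name = identifier[-1]
    return namespace, table_name


namespace, table_name = split_namespace_and_table(
    ("com", "organization", "department", "my_table")
)
```

For the 4-element test identifier this gives the namespace ("com", "organization", "department") and the table name "my_table".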

@kevinjqliu
Contributor Author

kevinjqliu commented Feb 2, 2024

So if I understand correctly, TableIdentifier consists of the namespace and the table name. The namespace can have multiple parts, for example ("com", "apache", "iceberg") or "com.apache.iceberg". The table name is a single part, e.g. "foo".

If this is the case, is "database name" part of the namespace, and possibly the last element of the namespace?
There are a few references to "database name" in the codebase and I want to understand the relation, particularly for the _resolve_table_location function exposed by catalog/__init__.py.

And a follow-up question:

The REST catalog supports hierarchical namespaces

What is the definition of "hierarchical namespaces" here?

@kevinjqliu
Contributor Author

Answering my own questions above.

A TableIdentifier has the form (namespace, ..., database_name, table_name). table_name is the last element of the tuple, and database_name is the second-to-last element. Everything before that is part of the namespace.
A TableIdentifier can be represented either as a tuple or as a single string with its parts separated by dots (.).
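As a rough sketch of the two representations and of the "second to last element" reading of database_name: the helpers `to_tuple`, `to_dotted`, and `database_and_table` below are hypothetical names chosen for illustration, and the naive split on "." assumes that no individual part contains a dot.

```python
from typing import Tuple, Union

Identifier = Union[str, Tuple[str, ...]]


def to_tuple(identifier: Identifier) -> Tuple[str, ...]:
    """Normalize a dotted string or a tuple into a tuple of parts."""
    if isinstance(identifier, str):
        # Assumption: parts never contain "." themselves.
        return tuple(identifier.split("."))
    return tuple(identifier)


def to_dotted(identifier: Identifier) -> str:
    """Render an identifier as a single dot-separated string."""
    return ".".join(to_tuple(identifier))


def database_and_table(identifier: Identifier) -> Tuple[str, str]:
    """Take the last two parts as (database_name, table_name)."""
    parts = to_tuple(identifier)
    if len(parts) < 2:
        raise ValueError(f"Need at least database_name and table_name, got: {parts}")
    return parts[-2], parts[-1]
```

With this reading, "com.organization.department.my_table" yields database_name "department" and table_name "my_table", and the remaining leading parts stay in the namespace.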

A namespace can be "hierarchical" because it has multiple levels, such as (foo) or (foo, bar) or (foo, bar, baz).
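One way to picture the levels is to list every prefix of a namespace tuple. This is only an illustrative sketch; `namespace_levels` is a hypothetical helper, not part of any catalog API.

```python
from typing import List, Tuple


def namespace_levels(namespace: Tuple[str, ...]) -> List[Tuple[str, ...]]:
    """Return each level of a hierarchical namespace, from outermost to innermost."""
    return [tuple(namespace[:depth]) for depth in range(1, len(namespace) + 1)]


levels = namespace_levels(("foo", "bar", "baz"))
```

Here `levels` holds ("foo",), ("foo", "bar"), and ("foo", "bar", "baz"), matching the three examples above.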
