Improve definition of dependency table column names and dtypes #419

Closed
hagenw opened this issue May 28, 2024 · 0 comments · Fixed by #420
Labels
enhancement New feature or request

hagenw commented May 28, 2024

As noted in #416 (review), the way we currently define the column names and dtypes of the dependency table is awkward:

audb/audb/core/define.py

Lines 23 to 64 in 44df511

class DependField:
    r"""Fields stored in dependency table."""

    ARCHIVE = 0
    BIT_DEPTH = 1
    CHANNELS = 2
    CHECKSUM = 3
    DURATION = 4
    FORMAT = 5
    REMOVED = 6
    SAMPLING_RATE = 7
    TYPE = 8
    VERSION = 9


DEPEND_FIELD_NAMES = {
    DependField.ARCHIVE: "archive",
    DependField.BIT_DEPTH: "bit_depth",
    DependField.CHANNELS: "channels",
    DependField.CHECKSUM: "checksum",
    DependField.DURATION: "duration",
    DependField.FORMAT: "format",
    DependField.REMOVED: "removed",
    DependField.SAMPLING_RATE: "sampling_rate",
    DependField.TYPE: "type",
    DependField.VERSION: "version",
}

DEPEND_FIELD_DTYPES = {
    DependField.ARCHIVE: "string[pyarrow]",
    DependField.BIT_DEPTH: "int32[pyarrow]",
    DependField.CHANNELS: "int32[pyarrow]",
    DependField.CHECKSUM: "string[pyarrow]",
    DependField.DURATION: "float64[pyarrow]",
    DependField.FORMAT: "string[pyarrow]",
    DependField.REMOVED: "int32[pyarrow]",
    DependField.SAMPLING_RATE: "int32[pyarrow]",
    DependField.TYPE: "int32[pyarrow]",
    DependField.VERSION: "string[pyarrow]",
}

DEPEND_INDEX_DTYPE = "object"
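
To illustrate the indirection in the layout quoted above: getting from a column name to its dtype means joining two dictionaries on the integer field constant. The sketch below uses a shortened copy of the definitions (three of the ten fields); `column_dtypes` is a name made up for this illustration and does not exist in audb.

```python
# Shortened copy of the definitions quoted above (three of the ten fields).
class DependField:
    ARCHIVE = 0
    DURATION = 4
    VERSION = 9


DEPEND_FIELD_NAMES = {
    DependField.ARCHIVE: "archive",
    DependField.DURATION: "duration",
    DependField.VERSION: "version",
}
DEPEND_FIELD_DTYPES = {
    DependField.ARCHIVE: "string[pyarrow]",
    DependField.DURATION: "float64[pyarrow]",
    DependField.VERSION: "string[pyarrow]",
}

# Hypothetical helper mapping (not part of audb):
# column name -> dtype has to be assembled by joining both
# dictionaries on the integer field constant.
column_dtypes = {
    DEPEND_FIELD_NAMES[field]: DEPEND_FIELD_DTYPES[field]
    for field in DEPEND_FIELD_NAMES
}
print(column_dtypes)
```

Every new column has to be added in three places (the class, the names, and the dtypes), and nothing enforces that the three stay in sync.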

I guess the main motivation for the current structure was to mirror how DependType and DEPEND_TYPE_NAMES are defined. But I would rather change those definitions as well than stay with the current solution.

One possible solution would be to create a single dictionary that maps each column name to its dtype.
The only thing that does not fit directly is DEPEND_INDEX_DTYPE: the index is a special column, and we need to see whether it can be added to the dictionary.
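
A minimal sketch of what such a single dictionary could look like, using the column names and dtypes quoted above. The name `DEPEND_COLUMN_DTYPES` is an assumption made for this illustration, not an existing audb definition, and keeping the index dtype as a separate constant is only one of the options mentioned.

```python
# Hypothetical replacement for DependField, DEPEND_FIELD_NAMES and
# DEPEND_FIELD_DTYPES: one mapping from column name to dtype.
# Dictionaries preserve insertion order (Python 3.7+),
# so the mapping also fixes the column order.
DEPEND_COLUMN_DTYPES = {
    "archive": "string[pyarrow]",
    "bit_depth": "int32[pyarrow]",
    "channels": "int32[pyarrow]",
    "checksum": "string[pyarrow]",
    "duration": "float64[pyarrow]",
    "format": "string[pyarrow]",
    "removed": "int32[pyarrow]",
    "sampling_rate": "int32[pyarrow]",
    "type": "int32[pyarrow]",
    "version": "string[pyarrow]",
}

# Assumption: the index dtype stays a separate constant,
# because the index is not a regular column.
DEPEND_INDEX_DTYPE = "object"

# Column names follow directly from the mapping.
columns = list(DEPEND_COLUMN_DTYPES)
print(columns)
```

With pandas, a mapping of this shape could be passed directly to `DataFrame.astype`, which accepts a dictionary of column name to dtype.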
