Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve definition of dependency table entries #420

Merged
merged 6 commits into from
Aug 14, 2024
Merged

Improve definition of dependency table entries #420

merged 6 commits into from
Aug 14, 2024

Conversation

hagenw
Copy link
Member

@hagenw hagenw commented May 28, 2024

Closes #419

The following improvements are done to all settings inside audb.core.define that are related to the dependency table:

  • Move all related entries under the # Dependencies section
  • Use the consistent wording DEPENDENCY in all related variables
  • Add docstrings to all related variables
  • Use a dictionary to define column name -> dtypes mappings
  • Use a dictionary to define dependency type name -> dependency type integer mappings

The idea of the previous implementation was to have a central place, where we can change a name if needed.
E.g. define.DependField.ARCHIVE = "archive" could be changed to define.DependField.ARCHIVE = "zipfile",
without the need to change it at any other place of the code. In principle, this makes sense, but with the dependency table, we are anyway not allowed to change those names, as they are also stored inside the dependency table as column names.
In my opinion, this means, we also don't have to ensure that we have a central place to define the name,
and can write "archive" instead of define.DependField.ARCHIVE, which also improves readability of the code. And allows us to specify the dictionaries, introduced in this pull request.

@hagenw hagenw marked this pull request as draft May 28, 2024 10:01
@hagenw
Copy link
Member Author

hagenw commented Jun 4, 2024

This is an example implementation, how we might solve #419. @ChristianGeng if you have a better idea, feel free to open a different pull request, or add your comments here.

@hagenw hagenw marked this pull request as ready for review August 14, 2024 09:25
Copy link
Member

@ChristianGeng ChristianGeng left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • The old DEPEND_TYPE_NAMES is not needed any more and can be hardcoded to

"meta", "media" etc. without a loss in generality.

  • I personally like moving DEPENDENCIES -> DEPENDENCY to singular. There seems to be no clear convention on that topic, neither in python pep nor in the SQL community. Some people recommend to use plural for tables, and singular for columns in order to distinguish between te two. But even if it does not help to disambiguate I like reading it better

  • Representing DEPENDENCY_TABLE as a dict is very fortunate: like this is looks very much like a table scheme. In addition, in case that we decide to move to polars one day, this will rather simplify things, as polars uses a dict for data frame scheme definition. Saying that, polars uses a collections.OrderedDict. I do not see the advantage of that, as all newer python version will respect the order of k,v creation and not return random permutations of key names. So using a dict I take it is fine. Also the dict stresses the data structure nature in comparison to principally unrelated attributes.

  • The additional docstrings I find come in handy too. They are not intended to be part of the sphinx pages, but are great in their functions as reminders

    Therefore I would simply approve this.

@hagenw hagenw merged commit 99eb738 into main Aug 14, 2024
8 checks passed
@hagenw hagenw deleted the improve-define branch August 14, 2024 12:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Improve definition of dependency table column names and dtypes
2 participants