RFC: Revamp the repository modules structure #232

llucax · 2023-11-02T13:07:38Z

This is based on #231, so please only look at the last commit. It might also be useful to look at the branch code instead of the diff: https://github.com/llucax/frequenz-channels-python/tree/structure-rfc/src/frequenz/channels

These are the most notable changes:

- The `_base_classes` module is split into the `_receiver` and `_sender` modules.
- Sender and receiver exceptions are moved from the `_exceptions` module to the new `_sender` and `_receiver` modules.
- The `frequenz.channels` package now only exposes the new `_receiver` and `_sender` modules, and the `_exceptions` and `_select` modules.
- All channel and receiver modules are moved to the `frequenz.channels` package and made public.
- All public nested classes are moved to the top level of the corresponding module.

Advantages of this structure:

- It completely removes circular dependencies.
- It avoids importing unnecessary code. In Python, importing means executing code, so even when it is done only at startup, it adds some overhead.

  Also, by not importing unnecessary code, we can potentially add real optional dependencies. For example, if a project doesn't need a file watcher, it could avoid pulling in the unnecessary `awatch` dependency. This is not done in this PR, but it could be done in the future.
- By having the channels and receivers in their own modules, we can move public nested classes to the top level of the corresponding module without having to add superfluous prefixes to support classes.
- Removing nested classes avoids hacky constructions, like requiring `from __future__ import annotations` or writing types as strings, and it stops confusing the `mkdocstrings` tool when extracting and cross-linking docs.
- The `frequenz.channels` package exposes all the classes that are used once you have set up your channels, so imports should still be pretty terse in most cases: only `frequenz.channels` needs to be imported in modules that just take some receivers and iterate or select over them.
- It makes the docs easier to navigate and discover, as the table of contents becomes much smaller for each module and the module list is shown on the left.

Old vs. New: (screenshots of the docs navigation, not preserved here)
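To illustrate the point about nested classes, here is a minimal sketch. The names `NestedFileWatcher`, `FileWatcher` and `Event` are hypothetical and only stand in for the real classes; the comment about string annotations applies to interpreters that evaluate annotations eagerly.

```python
from dataclasses import dataclass


class NestedFileWatcher:
    """Hypothetical nested layout, similar in spirit to the old structure."""

    @dataclass(frozen=True)
    class Event:
        path: str

    # The qualified name `NestedFileWatcher.Event` is not bound yet while the
    # class body runs, so with eagerly evaluated annotations it has to be
    # written as a string (or deferred with
    # `from __future__ import annotations`).
    def make_event(self, path: str) -> "NestedFileWatcher.Event":
        return NestedFileWatcher.Event(path)


# Flat layout: imagine this part lives in its own `file_watcher` module, so
# the support class can keep the short name `Event` without any prefix.
@dataclass(frozen=True)
class Event:
    path: str


class FileWatcher:
    def make_event(self, path: str) -> Event:
        return Event(path)


nested_event = NestedFileWatcher().make_event("a.txt")
flat_event = FileWatcher().make_event("a.txt")
```

With the flat layout, users would refer to the support class through its module (something like `file_watcher.Event`), which reads about the same as the nested name but needs no special handling in annotations.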

Since the `name` is only for debugging purposes, it shouldn't be required. If no `name` is specified, a name will be created based on the `id()` of the channel, so channels can be easily uniquely identified. Also makes the `name` accessible via a read-only property. Signed-off-by: Leandro Lucarella <luca-frequenz@llucax.com>

Signed-off-by: Leandro Lucarella <luca-frequenz@llucax.com>

UUIDs are expensive to create and not really necessary in this case as we are just keeping a local list of unique objects, so using the object's hash is enough. We also use an `id(self)`-based name by default if a `name` was not provided when creating a `new_receiver()`. A simple test using `timeit` shows a 2 orders of magnitude improvement in Python 3.11:

```console
$ python -m timeit -s "import uuid" "uuid.uuid4()"
100000 loops, best of 5: 2.67 usec per loop
$ python -m timeit "hash(__name__)"
10000000 loops, best of 5: 19.6 nsec per loop
$ python -m timeit "id(__name__)"
10000000 loops, best of 5: 20.1 nsec per loop
```

Signed-off-by: Leandro Lucarella <luca-frequenz@llucax.com>

This is only used to create the underlying `Broadcast` channel's names, so instead of using 2 separate strings, just use a plain `name` as with other channels. Also like with other channels, make the `name` optional and default to a more readable `id(self)` representation (using `_` separators). Signed-off-by: Leandro Lucarella <luca-frequenz@llucax.com>

Use underscore separators to format the `id(self)`. Also only use the default if `name` is `None`. Before, an empty string would also be changed to the default, but if a user passed an empty string, it is better to leave it untouched. Signed-off-by: Leandro Lucarella <luca-frequenz@llucax.com>
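As a rough sketch of the naming rule described above (the `DemoChannel` class is hypothetical and not the real channel implementation): the default is built from `id(self)` using the `_` grouping format, and it only kicks in when `name` is `None`.

```python
from typing import Optional


class DemoChannel:
    """Hypothetical channel used only to illustrate the naming rule."""

    def __init__(self, *, name: Optional[str] = None) -> None:
        # Only fall back to the default when no name was given at all;
        # an explicit empty string is kept untouched.
        self._name: str = f"{id(self):_}" if name is None else name

    @property
    def name(self) -> str:
        """Read-only name, meant for debugging and logs."""
        return self._name


unnamed = DemoChannel()
empty = DemoChannel(name="")
named = DemoChannel(name="power-data")
```

Here `unnamed.name` is the object's `id()` with `_` as thousands separator, `empty.name` stays `""` and `named.name` is the string that was passed in.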

When debugging and logging objects it is very useful to get a descriptive and clear string representation. The new representation uses the class name and the user defined name (if any) for `str` and a `repr` that tries to show how the class was created but also important internal state. Signed-off-by: Leandro Lucarella <luca-frequenz@llucax.com>
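A minimal sketch of what such a pair of representations could look like. The `DemoAnycast` class and the exact output format are assumptions made for illustration, not the format used in the actual commit.

```python
class DemoAnycast:
    """Hypothetical channel; the exact output format is an assumption."""

    def __init__(self, *, name: str, limit: int = 10) -> None:
        self._name = name
        self._limit = limit
        self._closed = False

    def __str__(self) -> str:
        # Short form: class name plus the user-defined name.
        return f"{type(self).__name__}:{self._name}"

    def __repr__(self) -> str:
        # Long form: how the object was created, plus some internal state.
        return (
            f"{type(self).__name__}(name={self._name!r}, limit={self._limit!r})"
            f":<closed={self._closed!r}>"
        )


channel = DemoAnycast(name="events", limit=5)
short_form = str(channel)
debug_form = repr(channel)
```

With something like this in place, a logging call can pass the object itself as an argument (for example with `%s` or `%r`) instead of building a string by hand.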

This is so it is closer to the other public attributes (properties) of the class. Also remove the mention of the default, as the default belongs in the `__init__` argument docs. Signed-off-by: Leandro Lucarella <luca-frequenz@llucax.com>

Signed-off-by: Leandro Lucarella <luca-frequenz@llucax.com>

llucax · 2023-11-02T13:14:38Z

BTW, I was thinking of merging (no pun intended) both Merge and MergeNamed into the merge module, but I didn't want to add more changes to the RFC; we can do further fine-tuning if we agree to go with it.

llucax · 2023-11-06T11:12:03Z

So we discussed this and some people still preferred a flatter hierarchy, so we decided on the following:

Keep the _base_classes.py split.

Rationale: Avoid circular dependencies.

Move all symbols to the top level, except for timer and file_watcher, which are in their own modules.

Rationale:
- Neither is a real channel nor a utility that works generically on channels; they are more like message generators fed by input from the outside world.
- Having them in separate modules allows for making watchfiles an optional dependency.
- Having each in its own module avoids having to add prefixes to support classes (like file_watcher.Event -> FileWatcherEvent).

Alternatives: Keep both in a util or extensions package. Discarded because it would pull in the watchfiles dependency even if you only want to use the Timer class. Also, with Google-style imports, `from frequenz.channels import extensions` gives the module a too generic name.
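The optional-dependency argument can be sketched with a throwaway package built in a temporary directory. Everything here is hypothetical: `demo_channels` stands in for the real package and `demo_missing_dependency` for a third-party dependency that is not installed.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def build_demo_package(root: Path) -> None:
    """Create a tiny package whose `file_watcher` module needs an extra dependency."""
    package = root / "demo_channels"
    package.mkdir()
    # The top level only defines core symbols and imports nothing optional.
    (package / "__init__.py").write_text("class Broadcast:\n    pass\n")
    # Only this module imports the (here: missing) optional dependency.
    (package / "file_watcher.py").write_text(
        "import demo_missing_dependency\n"
        "class FileWatcher:\n    pass\n"
    )


def try_import(module_name: str) -> bool:
    """Return whether the module could be imported."""
    try:
        importlib.import_module(module_name)
    except ImportError:
        return False
    return True


with tempfile.TemporaryDirectory() as tmp:
    build_demo_package(Path(tmp))
    sys.path.insert(0, tmp)
    importlib.invalidate_caches()
    try:
        core_ok = try_import("demo_channels")
        watcher_ok = try_import("demo_channels.file_watcher")
    finally:
        sys.path.remove(tmp)
```

In this sketch the top-level import succeeds even though the dependency is missing, and only the explicit import of the `file_watcher` module fails. If both lived in one shared module, every user would hit the import error.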

Although it is not part of the restructuring itself, we also decided to:

Remove Bidirectional and Peekable

Rationale: They were created as a hack for use cases that are not needed anymore. Pending: check that Peekable can really be removed in the SDK.
- Remove Peekable #233
- Remove Bidirectional #234

Future work:

Make timer and file_watcher separate Python packages: frequenz-channels-timer and frequenz-channels-file-watcher. This way pulling in dependencies is more explicit, and it is also clear that they are not considered core components.

This might need support from repo-config, as we need to see how we generate docs, etc. so we leave it out of 1.0.

llucax · 2023-11-08T08:39:25Z

Superseded by #235.

llucax added 21 commits November 2, 2023 11:11

Use deque for typing

f355fd3

There is no need to import `typing.Deque` in newer Python versions. Signed-off-by: Leandro Lucarella <luca-frequenz@llucax.com>

Make some arguments keyword-only

f054789

We do this when the arguments could be easily mistaken or it is hard to know what they are for at the call site, for example when the type is a `str` or `int`, and for most optional arguments. Signed-off-by: Leandro Lucarella <luca-frequenz@llucax.com>
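A small sketch of the idea, using a hypothetical `describe_channel()` function (not part of the library): the bare `*` in the signature forces callers to spell out the argument names.

```python
from typing import Optional


def describe_channel(*, name: Optional[str] = None, limit: int = 10) -> str:
    """Hypothetical helper: every argument after `*` is keyword-only."""
    label = "unnamed" if name is None else name
    return f"{label} (limit={limit})"


# Clear at the call site: it is obvious what "events" and 5 mean.
description = describe_channel(name="events", limit=5)

# Passing the same values positionally is rejected with a TypeError.
try:
    describe_channel("events", 5)  # type: ignore[misc]
    positional_rejected = False
except TypeError:
    positional_rejected = True
```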

Make exception messages str

e537ea8

There is no reason to have them of `Any` type, anybody can explicitly convert any object to `str` using `str(obj)` if they want to do so. Signed-off-by: Leandro Lucarella <luca-frequenz@llucax.com>

Replace __anext__() with anext()

6e680cf

`anext()` wasn't available in older Python versions. Signed-off-by: Leandro Lucarella <luca-frequenz@llucax.com>

Rename recv argument as receiver

c654ab7

For the rest of the API we use full names instead of abbreviated names, so better to do the same here for consistency and clarity. Signed-off-by: Leandro Lucarella <luca-frequenz@llucax.com>

Add is_closed property to channels

6c9b84f

It might be useful to be able to test if a channel is closed or not. Signed-off-by: Leandro Lucarella <luca-frequenz@llucax.com>

Add missing types to members in __init__()

be89c5c

Signed-off-by: Leandro Lucarella <luca-frequenz@llucax.com>

Anycast: Clarify what happens when the buffer is full

3a9952b

Signed-off-by: Leandro Lucarella <luca-frequenz@llucax.com>

Anycast: Don't store the _limit

5aded0e

The `_limit` is already accessible via the `_deque`'s `maxsize`. Signed-off-by: Leandro Lucarella <luca-frequenz@llucax.com>

Rename maxsize to limit

b525072

`maxsize` comes from `deque` and it is not very clear, and doesn't follow the snake_case convention. Also exposes the `limit` for `Anycast` as a (read-only) property. Signed-off-by: Leandro Lucarella <luca-frequenz@llucax.com>

Anycast: Add optional name argument

92122df

This is only for debugging purposes, to show in the string representation and logs, and it is added to match other channels. Signed-off-by: Leandro Lucarella <luca-frequenz@llucax.com>

Add name property to bidirectional

5380ebc

Signed-off-by: Leandro Lucarella <luca-frequenz@llucax.com>

Broadcast: Add the latest message as a property

2a03e51

Make the `latest` message accessible through a read-only property. Signed-off-by: Leandro Lucarella <luca-frequenz@llucax.com>

Use the string representation for logging

b2ea4fa

Now that we have nice string representations, it is no longer needed to build one every time we log something. Signed-off-by: Leandro Lucarella <luca-frequenz@llucax.com>

Update release notes

a698896

Signed-off-by: Leandro Lucarella <luca-frequenz@llucax.com>

github-actions bot added the part:docs, part:tests, part:channels and part:core labels Nov 2, 2023

llucax force-pushed the structure-rfc branch from 3b28017 to f7880b8 Compare November 2, 2023 13:39

github-actions bot added the part:tooling label Nov 2, 2023

llucax self-assigned this Nov 7, 2023

llucax closed this Nov 8, 2023

llucax added this to the v1.0.0 milestone Nov 8, 2023

llucax added the resolution:wontfix label Nov 8, 2023

llucax deleted the structure-rfc branch January 16, 2024 08:23

llucax modified the milestones: v1.0.0, Dropped Jan 29, 2024
