Skip to content
This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

Refactor database helper functions so they can be checked by mypy #14002

Open
DMRobertson opened this issue Sep 30, 2022 · 4 comments
Open

Refactor database helper functions so they can be checked by mypy #14002

DMRobertson opened this issue Sep 30, 2022 · 4 comments
Labels
A-Database DB stuff like queries, migrations, new/remove columns, indexes, unexpected entries in the db T-Task Refactoring, removal, replacement, enabling or disabling functionality, other engineering tasks. Z-Dev-Wishlist Makes developers' lives better, but doesn't have direct user impact

Comments

@DMRobertson
Copy link
Contributor

DMRobertson commented Sep 30, 2022

This issue contains notes on a type annotation problem I spent some time thinking about. Comments are welcome.

Background

In #13892 I made a mistake where I tried invoke a function f using runInteraction but gave it the arguments for another function g with a different signature. (This is the diff that corrected it.) I was surprised that mypy didn't catch it, since we've got better coverage of the source code and we make more use of mypy.

Problem

Annotating some of these DB helper functions with ParamSpec doesn't work. E.g. given

async def runWithConnection(
self,
func: Callable[..., R],
*args: Any,
db_autocommit: bool = False,
isolation_level: Optional[int] = None,
**kwargs: Any,
) -> R:

and the place where we use func

db_conn = LoggingDatabaseConnection(
conn, self.engine, "runWithConnection"
)
return func(db_conn, *args, **kwargs)

the annotations to use would be

func: Callable[Concatenate[LoggingDatabaseConnection, P], R]
args: P.args
kwargs: P.kwargs

But these don't work:

synapse/storage/database.py:887: error: Name "P.args" is not defined  [name-defined]
synapse/storage/database.py:890: error: Name "P.kwargs" is not defined  [name-defined]

Mypy's error message is crap here. I think what's going on here is buried in PEP 612:

Note that this also why we have to reject signatures of the form (*args: P.args, s: str, **kwargs: P.kwargs)

There are some examples---I find it tough to follow---but in short, I think you can only have *P.args and **P.kwargs directly adjacent in a function signature. Anything after *args, like s above, has to be a keyword-only argument. But what happens if func accepts a keyword parameter called s too? Ambiguity and pain.

For an simplified example, see here.

Proposal

If we want to have mypy check these helper functions, I think their last three arguments need to be func, *args and **kwargs; everything else needs to be a mandatory(?) position parameter before func. The downside of doing this is that every call site needs to explicitly list out a bunch of mandatory arguments which weren't mandatory before.

new_transaction is an example of such a function taking its arguments in this way:

def new_transaction(
self,
conn: LoggingDatabaseConnection,
desc: str,
after_callbacks: List[_CallbackListEntry],
async_after_callbacks: List[_AsyncCallbackListEntry],
exception_callbacks: List[_CallbackListEntry],
func: Callable[Concatenate[LoggingTransaction, P], R],
*args: P.args,
**kwargs: P.kwargs,
) -> R:

(though its call sites are quite verbose).

One way we could avoid the verbosity is to provide "simple" and "advanced" versions of a function. Take runInteraction for example.

async def runInteraction(
self,
desc: str,
func: Callable[..., R],
*args: Any,
db_autocommit: bool = False,
isolation_level: Optional[int] = None,
**kwargs: Any,
) -> R:

The majority of its call sites don't specify db_autocommit or isolation_level values. Maybe we could split the function up into

    async def run_interaction_simple(
        self,
        desc: str,
        func: Callable[..., R],
        *args: Any,
        **kwargs: Any,
    ) -> R:
        ...

    async def run_interaction_advanced(
        self,
        desc: str,
        db_autocommit: bool,
        isolation_level: Optional[int],
        func: Callable[..., R],
        *args: Any,
        **kwargs: Any,
    ) -> R:
...

Both of these are in a form that I think mypy can meaningfully process, and the former avoids us repeatedly passing in False, None at the majority of call sites.

@DMRobertson DMRobertson added T-Task Refactoring, removal, replacement, enabling or disabling functionality, other engineering tasks. Z-Dev-Wishlist Makes developers' lives better, but doesn't have direct user impact labels Sep 30, 2022
@DMRobertson
Copy link
Contributor Author

DMRobertson commented Oct 1, 2022

I have tried the above out. There were more places passing in db_autocommit=True than I realised, though these were mostly helper functions.

Running mypy on 907041c (See https://github.com/matrix-org/synapse/commits/dmr/typing/run-interaction) gives me the following output. The first one looks like a mypy bug(?). but I think the rest are all either incorrect annotations or bugs.

synapse/storage/database.py:1687: error: Argument "allow_none" to "runInteraction_advanced" of "DatabasePool" has incompatible type "bool"; expected "Literal[False]"  [arg-type]
synapse/storage/database.py:1777: error: Argument 6 to "runInteraction_advanced" of "DatabasePool" has incompatible type "Optional[Dict[str, Any]]"; expected "Dict[str, Any]"  [arg-type]
synapse/storage/database.py:1992: error: Argument 7 to "runInteraction" of "DatabasePool" has incompatible type "Iterable[Iterable[Any]]"; expected "Collection[Iterable[Any]]"  [arg-type]
synapse/storage/database.py:1992: note: "Iterable" is missing following "Collection" protocol members:
synapse/storage/database.py:1992: note:     __contains__, __len__
synapse/storage/databases/main/pusher.py:676: error: Argument 3 to "runInteraction" of "DatabasePool" has incompatible type "Sequence[int]"; expected "List[int]"  [arg-type]
synapse/storage/databases/state/store.py:554: error: Argument 4 to "runInteraction" of "DatabasePool" has incompatible type "Optional[Mapping[Tuple[str, str], str]]"; expected "Mapping[Tuple[str, str], str]"  [arg-type]
synapse/storage/databases/main/roommember.py:408: error: Argument 4 to "runInteraction" of "DatabasePool" has incompatible type "Collection[str]"; expected "List[str]"  [arg-type]
synapse/storage/databases/main/presence.py:120: error: Argument 3 to "runInteraction" of "DatabasePool" has incompatible type "Sequence[int]"; expected "List[int]"  [arg-type]
synapse/storage/databases/main/end_to_end_keys.py:240: error: Argument 3 to "runInteraction" of "DatabasePool" has incompatible type "Collection[Tuple[str, Optional[str]]]"; expected "Collection[Tuple[str, str]]"  [arg-type]
synapse/storage/databases/main/event_federation.py:1506: error: Argument 2 to "runInteraction" of "DatabasePool" has incompatible type "bool"; expected "Callable[[LoggingTransaction, VarArg(<nothing>), KwArg(<nothing>)], <nothing>]"  [arg-type]
synapse/storage/databases/main/event_federation.py:1507: error: Argument 3 to "runInteraction" of "DatabasePool" has incompatible type "None"; expected <nothing>  [arg-type]
synapse/storage/databases/main/event_federation.py:1508: error: Argument 4 to "runInteraction" of "DatabasePool" has incompatible type "Callable[[LoggingTransaction, str, str, str], None]"; expected <nothing>  [arg-type]
synapse/storage/databases/main/event_federation.py:1509: error: Argument 5 to "runInteraction" of "DatabasePool" has incompatible type "str"; expected <nothing>  [arg-type]
synapse/storage/databases/main/event_federation.py:1510: error: Argument 6 to "runInteraction" of "DatabasePool" has incompatible type "str"; expected <nothing>  [arg-type]
synapse/storage/databases/main/event_federation.py:1511: error: Argument 7 to "runInteraction" of "DatabasePool" has incompatible type "str"; expected <nothing>  [arg-type]
synapse/storage/databases/main/push_rule.py:357: error: Argument 10 to "runInteraction" of "DatabasePool" has incompatible type "Optional[str]"; expected "str"  [arg-type]
synapse/storage/databases/main/push_rule.py:358: error: Argument 11 to "runInteraction" of "DatabasePool" has incompatible type "Optional[str]"; expected "str"  [arg-type]
synapse/module_api/__init__.py:835: error: Argument 2 to "runInteraction" of "DatabasePool" has incompatible type "Callable[P, T]"; expected "Callable[[LoggingTransaction, **P], T]"  [arg-type]
Found 17 errors in 9 files (checked 734 source files)

@erikjohnston
Copy link
Member

Can we instead ban the use of *args? i.e. something like:

async def runInteraction( 
     self, 
     desc: str, 
     func: Callable[..., R], 
     *,
     db_autocommit: bool = False, 
     isolation_level: Optional[int] = None, 
     **kwargs: Any, 
 ) -> R: 
    ...

Though that method might involve so many changes that its not worth it?

@DMRobertson
Copy link
Contributor Author

Can we instead ban the use of *args? i.e. something like:

async def runInteraction( 
     self, 
     desc: str, 
     func: Callable[..., R], 
     *,
     db_autocommit: bool = False, 
     isolation_level: Optional[int] = None, 
     **kwargs: Any, 
 ) -> R: 
    ...

Though that method might involve so many changes that its not worth it?

As written, that annotation doesn't get you any extra type checking, because there's nothing linking kwargs to func.

I don't think one can have P.kwargs without a preceeding P.args (though I have no tried it). Quoting PEP 612:

Furthermore, because the default kind of parameter in Python ((x: int)) may be addressed both positionally and through its name, two valid invocations of a (*args: P.args, **kwargs: P.kwargs) function may give different partitions of the same set of parameters. Therefore, we need to make sure that these special types are only brought into the world together, and are used together, so that our usage is valid for all possible partitions.

(emphasis mine).

@erikjohnston
Copy link
Member

Ah yeah, I meant to add the P.kwargs to it.

I'd be slightly surprised if we couldn't have keyword argument only stuff, but I've not checked.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
A-Database DB stuff like queries, migrations, new/remove columns, indexes, unexpected entries in the db T-Task Refactoring, removal, replacement, enabling or disabling functionality, other engineering tasks. Z-Dev-Wishlist Makes developers' lives better, but doesn't have direct user impact
Projects
None yet
Development

No branches or pull requests

3 participants