From de7b1b3896896d25705c1dd75d1946fd8e1d5e1a Mon Sep 17 00:00:00 2001
From: msm
Date: Wed, 21 Aug 2024 15:34:11 +0200
Subject: [PATCH] Push some documentation

---
 docs/advanced_concepts.rst     | 71 ++++++++++++++++++++++++++++++++++
 docs/task_headers_payloads.rst | 51 ++++++++++++++++--------
 2 files changed, 105 insertions(+), 17 deletions(-)

diff --git a/docs/advanced_concepts.rst b/docs/advanced_concepts.rst
index 3d75953..94c8373 100644
--- a/docs/advanced_concepts.rst
+++ b/docs/advanced_concepts.rst
@@ -246,3 +246,74 @@ You can enable it by setting:
 - :code:`KARTON_KARTON_DEBUG` environment value to "1"
 - :code:`debug` parameter to `1` in the :code:`[karton]` config section
 - :code:`--debug` command-line parameter
+
+
+Negated filter patterns
+-----------------------
+
+There is one more pattern syntax, which is no longer documented in the :code:`Filter Patterns` section.
+It is possible to define negated filters, which are handled in a special way. For example, consider the following filters:
+
+.. code-block:: python
+
+    # Special ("old style") negation
+    [
+        {"foo": "bar", "platform": "!linux"},
+        {"foo": "bar", "platform": "!windows"}
+    ]
+
+Depending on how you think this should work, the behavior may be surprising. In particular, this is **not** equivalent to:
+
+.. code-block:: python
+
+    # Regular ("new style") negation (this is intentionally WRONG, see below)
+    [
+        {"foo": "bar", "platform": {"$not": "linux"}},
+        {"foo": "bar", "platform": {"$not": "windows"}},
+    ]
+
+That's because negated filters are handled in a very special way, but :code:`$not` is not. Let's use the following task as an example:
+
+.. code-block:: python
+
+    {
+        "foo": "bar",
+        "platform": "linux"
+    }
+
+Recall that filters are checked top to bottom, and if at least one pattern matches, the task will be accepted by a consumer.
+Using regular ("new style") patterns, the matching will proceed as follows:
+
+- A task is checked against the first filter. :code:`foo` matches, but the filter explicitly rejects tasks with :code:`"platform": "linux"`, so this filter does not match.
+- A task is checked against the second filter. :code:`foo` matches, and the platform - :code:`linux` - is not equal to :code:`windows`, so the task is accepted.
+
+Whoops! This is probably not what the programmer intended. In comparison, "old style" filters will always reject a task if it matches at least one negated filter.
+This sounds nice, but like every special case, it may cause unpleasant surprises. This is especially true when combining "old style" and "new style" patterns.
+That's why it's currently recommended to only use "new style" filters - they do everything "old style" filters can, and much more.
+
+In this case, the proper way to get the desired behavior with "new style" filters is:
+
+.. code-block:: python
+
+    # Regular ("new style") negation
+    [
+        {
+            "foo": "bar",
+            "platform": {"$not": {"$or": ["linux", "windows"]}},
+        }
+    ]
+
+It's a bit more verbose, but at least it should be very clear what is happening: we want :code:`foo` equal to :code:`bar`, and :code:`platform` **not** equal to either :code:`windows` or :code:`linux`.
+In this case there are no special cases, and matching checks every filter top to bottom independently, as usual.
+
+.. warning::
+
+    "Old style" negations are only supported at the top level! Combining them with "new style" filters will not work. The exclamation mark is not considered a special character in this case.
+
+    In fact, we're not even sure how :code:`{"$or": ["!windows", "!linux"]}` *should* behave.
+
+.. note::
+
+    Since "new style" patterns were introduced in Karton version 5.4.1, "old style" negations are not recommended and should be considered deprecated.
+
+    Nevertheless, Karton still supports them and they will keep working indefinitely. So don't worry, there are no breaking changes here.
diff --git a/docs/task_headers_payloads.rst b/docs/task_headers_payloads.rst
index 38fbe86..1c601b3 100644
--- a/docs/task_headers_payloads.rst
+++ b/docs/task_headers_payloads.rst
@@ -88,12 +88,10 @@ Starting from 5.0.0, consumer filters support basic wildcards and exclusions.
 Pattern                  Meaning
 ------------------------ ------------------------------------------------------------------------------
 ``{"foo": "bar"}``       matches 'bar' value of 'foo' header
-``{"foo": "!bar"}``      matches any value other than 'bar' in 'foo' header
 ``{"foo": "ba?"}``       matches 'ba' value followed by any character
 ``{"foo": "ba*"}``       matches 'ba' value followed by any substring (including empty)
 ``{"foo": "ba[rz]"}``    matches 'ba' value followed by 'r' or 'z' character
 ``{"foo": "ba[!rz]"}``   matches 'ba' value followed by any character other than 'r' or 'z'
-``{"foo": "!ba[!rz]"}``  matches any value of 'foo' header that doesn't match to the "bar[!rz]" pattern
 ======================== ==============================================================================
 
 Filter logic can be used to fulfill specific use-cases:
@@ -104,27 +102,46 @@ Filter logic can be used to fulfill specific use-cases:
 ``[]``                               matches no tasks (no headers allowed). Can be used to turn off queue and consume tasks left.
 ``[{}]``                             matches any task (no header conditions). Can be used to intercept all tasks incoming to Karton.
 ``[{"foo": "bar"}, {"foo": "baz"}]`` 'foo' header is required and must have 'bar' or 'baz' value.
-``[{"foo": "!*"}]``                  'foo' header must be not defined.
 ==================================== ==============================================================================
 
-Excluding (negated) filters come with specific corner-cases. Regular filters require specific value to be defined in header, while
-negated filters are accepting all possible values except specified in filter.
+.. versionadded:: 5.4.1
 
-================================================================================== =============================================================================================================================================
- ``filters`` value                                                                 Meaning
------------------------------------------------------------------------------------ ---------------------------------------------------------------------------------------------------------------------------------------------
-``[{"type": "sample", "stage": "!*"}]``                                            matches only tasks that have type 'sample' but no 'stage' key
-``[{"platform": "!linux"}, {"platform": "!windows"}]``                             matches **all** tasks (even with no headers) but not these with platform 'linux' or 'windows'
-``[{"foo": "bar", "platform": "!linux"}, {"foo": "bar", "platform": "!windows"}]`` 'foo' header is required and must have 'bar' value, but platform can't be 'linux' or 'windows'
-``[{"foo": "bar", "platform": "!linux"}, {"foo": "baz", "platform": "!windows"}]`` 'foo' header is required and must have 'bar' value and no 'linux' in platform key, or foo must be 'baz', but then platform can't be 'windows'
-================================================================================== =============================================================================================================================================
+Sometimes a more flexible behavior is necessary. This should be done with caution, as Karton can handle quite complex
+workflows without resorting to this. The need to use complex task filtering rules may mean that one is doing something not in the "spirit" of Karton.
 
-.. warning::
+The advanced filter syntax is based on MongoDB syntax. See `MongoDB documentation`_
+for a detailed explanation.
+
+In Karton, the following operators are allowed:
+
+- Comparison: :code:`$eq`, :code:`$ne`, :code:`$gt`, :code:`$gte`, :code:`$lt`, :code:`$lte`
+- Logical: :code:`$and`, :code:`$or`, :code:`$not`, :code:`$nor`
+- Array: :code:`$in`, :code:`$nin`, :code:`$all`, :code:`$elemMatch`, :code:`$size`
+- Miscellaneous: :code:`$type`, :code:`$mod`, :code:`$regex`, :code:`$elemMatch`
+
+For some concrete examples, consider these filters:
 
-    It's recommended to use only strings in filter and header values
+.. code-block:: python
+
+    filters = [
+        { # checks if `version` header is a number greater than 3
+            "type": "sample",
+            "version": {"$gt": 3},
+        },
+        { # checks if `tags` header contains both "emotet" and "dump"
+            "type": "sample",
+            "tags": {"$all": ["emotet", "dump"]},
+        },
+        { # checks if `platform` header is either "win32" or "linux"
+            "type": "sample",
+            "platform": {"$in": ["win32", "linux"]},
+        },
+        { # checks if `respects` header contains a prime number of letters "f"
+            "type": "sample",
+            "respects": {"$not": {"$regex": r"^f?$|^(ff+?)\1+$"}}
+        },
+    ]
 
-    Although some of non-string types are allowed, they will be converted to string for comparison
-    which may lead to unexpected results.
 
 Task payload
 ------------