Push some documentation
msm-code committed Aug 21, 2024
1 parent 5ac7e8a commit de7b1b3
Showing 2 changed files with 105 additions and 17 deletions.
71 changes: 71 additions & 0 deletions docs/advanced_concepts.rst
Expand Up @@ -246,3 +246,74 @@ You can enable it by setting:
- :code:`KARTON_KARTON_DEBUG` environment value to "1"
- :code:`debug` parameter to `1` in the :code:`[karton]` config section
- :code:`--debug` command-line parameter


Negated filter patterns
-----------------------

There is one more pattern syntax that is no longer documented in the :code:`Filter Patterns` section.
It is possible to define a negated filter, and such filters are handled in a special way. For example, consider the following filters:

.. code-block:: python

    # Special ("old style") negation
    [
        {"foo": "bar", "platform": "!linux"},
        {"foo": "bar", "platform": "!windows"}
    ]

Depending on your expectations, this may behave surprisingly. In particular, it is **not** equivalent to:

.. code-block:: python

    # Regular ("new style") negation (this is intentionally WRONG, see below)
    [
        {"foo": "bar", "platform": {"$not": "linux"}},
        {"foo": "bar", "platform": {"$not": "windows"}},
    ]

That's because "old style" negated filters are handled in a special way, while :code:`$not` is not. Consider the following task as an example:

.. code-block:: python

    {
        "foo": "bar",
        "platform": "linux"
    }

Recall that filters are checked top to bottom, and if at least one pattern matches, the task will be accepted by a consumer.
Using regular ("new style") patterns, the matching will proceed as follows:

- The task is checked against the first filter. :code:`foo` matches, but the filter explicitly rejects tasks with :code:`"platform": "linux"`, so this filter does not match.
- The task is checked against the second filter. :code:`foo` matches, and the platform, :code:`linux`, is not equal to :code:`windows`, so the task is accepted.

Whoops! This is probably not what the programmer intended. In comparison, "old style" filters will always reject a task if it matches at least one negated filter.
This sounds nice but, like every special case, it may cause unpleasant surprises. This is especially true when combining "old style" and "new style" patterns.
That's why it's currently recommended to only use "new style" filters: they do everything "old style" filters can, and much more.
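
The "new style" walk-through can be reproduced with a toy matcher. This is a minimal sketch and not Karton's implementation: ``value_matches`` and ``filter_matches`` are hypothetical helpers that only understand literal values, :code:`$not` and :code:`$or`.

```python
def value_matches(condition, value):
    """Toy evaluator: literal equality plus the $not and $or operators only."""
    if isinstance(condition, dict):
        if "$not" in condition:
            return not value_matches(condition["$not"], value)
        if "$or" in condition:
            return any(value_matches(c, value) for c in condition["$or"])
        raise ValueError(f"operator not modelled in this sketch: {condition!r}")
    return condition == value


def filter_matches(task_filter, headers):
    """A single filter matches when every one of its conditions holds."""
    return all(
        value_matches(cond, headers.get(key)) for key, cond in task_filter.items()
    )


# The intentionally WRONG "new style" filters from the text.
wrong_filters = [
    {"foo": "bar", "platform": {"$not": "linux"}},
    {"foo": "bar", "platform": {"$not": "windows"}},
]
task_headers = {"foo": "bar", "platform": "linux"}

# Each filter is evaluated independently; one match is enough to accept.
per_filter = [filter_matches(f, task_headers) for f in wrong_filters]
print(per_filter)       # [False, True]
print(any(per_filter))  # True: the linux task slips through
```

The second filter alone is enough to accept the task, which is exactly the surprise described in the list.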

In this case, the proper way to get the desired behavior with "new-style" filters is:

.. code-block:: python

    # Regular ("new style") negation
    [
        {
            "foo": "bar",
            "platform": {"$not": {"$or": ["linux", "windows"]}},
        }
    ]

It's a bit more verbose, but at least it should be very clear what is happening: we want :code:`foo` equal to :code:`bar`, and :code:`platform` **not** equal to either :code:`windows` or :code:`linux`.
Here there are no special cases: matching checks every filter top to bottom independently, as usual.
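
To make the difference concrete, here is a toy model of the "old style" veto rule next to a hand-written equivalent of the single "new style" filter. It is a sketch built only from the description in this section: ``split_old_style``, ``old_style_accepts`` and ``new_style_accepts`` are hypothetical names, wildcards are ignored, and the real Karton matching code may differ in details.

```python
def split_old_style(task_filter):
    """Separate plain conditions from top-level "!value" negations."""
    positive, negated = {}, {}
    for key, value in task_filter.items():
        if isinstance(value, str) and value.startswith("!"):
            negated[key] = value[1:]
        else:
            positive[key] = value
    return positive, negated


def old_style_accepts(filters, headers):
    """Assumed rule: a task that hits any negated filter is rejected outright."""
    def holds(conditions):
        return all(headers.get(k) == v for k, v in conditions.items())

    accepted = False
    for task_filter in filters:
        positive, negated = split_old_style(task_filter)
        if not holds(positive):
            continue
        if negated and holds(negated):
            return False  # veto, regardless of what other filters say
        accepted = True
    return accepted


def new_style_accepts(headers):
    """Plain-Python reading of the single "new style" filter shown above."""
    return (
        headers.get("foo") == "bar"
        and headers.get("platform") not in ("linux", "windows")
    )


old_filters = [
    {"foo": "bar", "platform": "!linux"},
    {"foo": "bar", "platform": "!windows"},
]
tasks = [
    {"foo": "bar", "platform": "linux"},
    {"foo": "bar", "platform": "windows"},
    {"foo": "bar", "platform": "macos"},
    {"foo": "bar"},
    {"foo": "baz", "platform": "macos"},
]
old_results = [old_style_accepts(old_filters, t) for t in tasks]
new_results = [new_style_accepts(t) for t in tasks]
print(old_results)  # [False, False, True, True, False]
print(new_results)  # [False, False, True, True, False]
```

Under these assumptions both variants agree on every sample task, but only the "new style" version gets there without a special case.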

.. warning::

    "Old style" negations are only supported at the top level! Combining them with "new style" filters will not work: inside "new style" operators, the exclamation mark is not considered a special character.

    In fact, we're not even sure how :code:`{"$or": ["!windows", "!linux"]}` *should* behave.

.. note::

    Since "new style" patterns were introduced in Karton version 5.4.1, "old style" negations are not recommended and should be considered deprecated.

    Nevertheless, Karton still supports them and they will keep working indefinitely. So don't worry, there are no breaking changes here.
51 changes: 34 additions & 17 deletions docs/task_headers_payloads.rst
Expand Up @@ -88,12 +88,10 @@ Starting from 5.0.0, consumer filters support basic wildcards and exclusions.
Pattern                  Meaning
------------------------ ------------------------------------------------------------------------------
``{"foo": "bar"}``       matches 'bar' value of 'foo' header
``{"foo": "!bar"}``      matches any value other than 'bar' in 'foo' header
``{"foo": "ba?"}``       matches 'ba' value followed by any character
``{"foo": "ba*"}``       matches 'ba' value followed by any substring (including empty)
``{"foo": "ba[rz]"}``    matches 'ba' value followed by 'r' or 'z' character
``{"foo": "ba[!rz]"}``   matches 'ba' value followed by any character other than 'r' or 'z'
``{"foo": "!ba[!rz]"}``  matches any value of 'foo' header that doesn't match to the "bar[!rz]" pattern
======================== ==============================================================================

Filter logic can be used to fulfill specific use-cases:
Expand All @@ -104,27 +102,46 @@ Filter logic can be used to fulfill specific use-cases:
``[]``                               matches no tasks (no headers allowed). Can be used to turn off queue and consume tasks left.
``[{}]``                             matches any task (no header conditions). Can be used to intercept all tasks incoming to Karton.
``[{"foo": "bar"}, {"foo": "baz"}]`` 'foo' header is required and must have 'bar' or 'baz' value.
``[{"foo": "!*"}]``                  'foo' header must be not defined.
==================================== ==============================================================================
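
The ``[]`` and ``[{}]`` rows follow directly from the "at least one filter must match, and every condition in it must hold" rule. The snippet below is a minimal sketch of that rule using exact string equality only; ``matches`` is a hypothetical helper, not a Karton function.

```python
def matches(headers, filters):
    """Accept when any filter has all of its conditions satisfied."""
    return any(
        all(headers.get(key) == value for key, value in task_filter.items())
        for task_filter in filters
    )


task = {"foo": "baz"}
either = [{"foo": "bar"}, {"foo": "baz"}]

print(matches(task, []))      # False: any() over no filters is never true
print(matches(task, [{}]))    # True: all() over no conditions is always true
print(matches(task, either))  # True: the second filter matches
print(matches({}, either))    # False: the 'foo' header is required
```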

Excluding (negated) filters come with specific corner-cases. Regular filters require specific value to be defined in header, while
negated filters are accepting all possible values except specified in filter.
.. versionadded:: 5.4.1

================================================================================== =============================================================================================================================================
``filters`` value Meaning
---------------------------------------------------------------------------------- ---------------------------------------------------------------------------------------------------------------------------------------------
``[{"type": "sample", "stage": "!*"}]`` matches only tasks that have type 'sample' but no 'stage' key
``[{"platform": "!linux"}, {"platform": "!windows"}]`` matches **all** tasks (even with no headers) but not these with platform 'linux' or 'windows'
``[{"foo": "bar", "platform": "!linux"}, {"foo": "bar", "platform": "!windows"}]`` 'foo' header is required and must have 'bar' value, but platform can't be 'linux' or 'windows'
``[{"foo": "bar", "platform": "!linux"}, {"foo": "baz", "platform": "!windows"}]`` 'foo' header is required and must have 'bar' value and no 'linux' in platform key, or foo must be 'baz', but then platform can't be 'windows'
================================================================================== =============================================================================================================================================
Sometimes a more flexible behavior is necessary. This should be done with caution, as Karton can handle quite complex
workflows without resorting to this. The need to use complex task filtering rules may mean that one is doing something not in the "spirit" of Karton.

.. warning::
The advanced filter syntax is based on MongoDB syntax. See the `MongoDB documentation <https://www.mongodb.com/docs/manual/reference/operator/query/>`_
for a detailed explanation.

In case of Karton, the following operators are allowed:

- Comparison: :code:`$eq`, :code:`$ne`, :code:`$gt`, :code:`$gte`, :code:`$lt`, :code:`$lte`
- Logical: :code:`$and`, :code:`$or`, :code:`$not`, :code:`$nor`
- Array: :code:`$in`, :code:`$nin`, :code:`$all`, :code:`$elemMatch`, :code:`$size`
- Miscellaneous: :code:`$type`, :code:`$mod`, :code:`$regex`

For some concrete examples, consider these filters:

It's recommended to use only strings in filter and header values

.. code-block:: python

    filters = [
        {  # checks if `version` header is a number greater than 3
            "type": "sample",
            "version": {"$gt": 3},
        },
        {  # checks if `tags` header contains both "emotet" and "dump"
            "type": "sample",
            "tags": {"$all": ["emotet", "dump"]},
        },
        {  # checks if `platform` header is either "win32" or "linux"
            "type": "sample",
            "platform": {"$in": ["win32", "linux"]},
        },
        {  # checks if `respects` header contains a prime number of letters "f"
            "type": "sample",
            "respects": {"$not": {"$regex": r"^f?$|^(ff+?)\1+$"}},
        },
    ]

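
To see what these example filters accept, the sketch below evaluates them with a toy interpreter. It is not Karton's matcher: ``check`` and ``matches`` are hypothetical helpers that model only the operators used above, and :code:`$regex` is assumed to behave like Python's ``re.search``.

```python
import re


def check(condition, value):
    """Toy evaluator for $gt, $all, $in, $regex and $not only."""
    if not isinstance(condition, dict):
        return condition == value
    results = []
    for op, arg in condition.items():
        if op == "$gt":
            results.append(isinstance(value, (int, float)) and value > arg)
        elif op == "$all":
            results.append(
                isinstance(value, list) and all(item in value for item in arg)
            )
        elif op == "$in":
            results.append(value in arg)
        elif op == "$regex":
            results.append(
                isinstance(value, str) and re.search(arg, value) is not None
            )
        elif op == "$not":
            results.append(not check(arg, value))
        else:
            raise ValueError(f"operator {op!r} is not modelled in this sketch")
    return all(results)


def matches(task_filter, headers):
    """A filter matches when every header condition holds."""
    return all(check(cond, headers.get(key)) for key, cond in task_filter.items())


example_filters = [
    {"type": "sample", "version": {"$gt": 3}},
    {"type": "sample", "tags": {"$all": ["emotet", "dump"]}},
    {"type": "sample", "platform": {"$in": ["win32", "linux"]}},
    {"type": "sample", "respects": {"$not": {"$regex": r"^f?$|^(ff+?)\1+$"}}},
]

# The regex matches runs of "f" whose length is 0, 1 or composite,
# so the negated filter keeps the prime lengths.
accepted = [
    n
    for n in range(12)
    if matches(example_filters[3], {"type": "sample", "respects": "f" * n})
]
print(accepted)  # [2, 3, 5, 7, 11]
```

Because the pattern is anchored and then negated, a value containing any other character, such as ``"abc"``, also passes the last filter in this sketch.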
Although some of non-string types are allowed, they will be converted to string for comparison
which may lead to unexpected results.
Task payload
------------