Implement mongodb query syntax for task filters (#258)
msm-cert authored Sep 10, 2024
1 parent c7d56b3 commit 89a885d
Showing 8 changed files with 764 additions and 88 deletions.
73 changes: 73 additions & 0 deletions docs/advanced_concepts.rst
Expand Up @@ -246,3 +246,76 @@ You can enable it by setting:
- :code:`KARTON_KARTON_DEBUG` environment value to "1"
- :code:`debug` parameter to `1` in the :code:`[karton]` config section
- :code:`--debug` command-line parameter


Negated filter patterns
-----------------------

.. versionadded:: 5.4.1

There is one more pattern syntax, which is no longer documented in the :code:`Filter Patterns` section.
It is possible to define negated filters, which are handled in a special way. For example, consider the following filters:

.. code-block:: python

    # Special ("old style") negation
    [
        {"foo": "bar", "platform": "!linux"},
        {"foo": "bar", "platform": "!windows"},
    ]

Depending on how you expect this to work, the behavior may be surprising. In particular, this is **not** equivalent to:

.. code-block:: python

    # Regular ("new style") negation (this is intentionally WRONG, see below)
    [
        {"foo": "bar", "platform": {"$not": "linux"}},
        {"foo": "bar", "platform": {"$not": "windows"}},
    ]

That's because negated "old style" filters are handled in a very special way, but :code:`$not` is not. Let's use the following task as an example:

.. code-block:: python

    {
        "foo": "bar",
        "platform": "linux"
    }

Recall that filters are checked top to bottom, and if at least one pattern matches, the task will be accepted by a consumer.
Using regular ("new style") patterns, the matching proceeds as follows:

- Check against the first filter: :code:`foo` matches, but the filter explicitly rejects tasks with :code:`platform: linux`.
- Check against the second filter: :code:`foo` matches, and the platform (:code:`linux`) is not equal to :code:`windows`, so the task is accepted.

Whoops! This is probably not what the programmer intended. In comparison, "old style" filters will always reject a task if it matches at least one negated filter.
This sounds nice, but like every special case, it may cause unpleasant surprises. This is especially true when combining "old style" and "new style" patterns.
That's why it's currently recommended to only use "new style" filters - they do everything "old style" filters can, and much more.

In this case, the proper way to get the desired behavior with "new-style" filters is:

.. code-block:: python

    # Regular ("new style") negation
    [
        {
            "foo": "bar",
            "platform": {"$not": {"$or": ["linux", "windows"]}},
        }
    ]

It's a bit more verbose, but at least it should be very clear what is happening: we want :code:`foo` equal to :code:`bar`, and :code:`platform` **not** equal to either :code:`windows` or :code:`linux`.
In this case there are no special cases, and matching checks every filter top to bottom independently, as usual.
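The difference between the two "new style" variants discussed above can be checked with a small toy matcher. This is only a minimal sketch: it understands plain equality, :code:`$not` and :code:`$or` on a single header value, the helper names :code:`value_matches` and :code:`task_accepted` are hypothetical, and it is not Karton's actual matching code.

```python
def value_matches(condition, value):
    """Toy evaluation of one condition against one header value (illustrative only)."""
    if isinstance(condition, dict):
        if "$not" in condition:
            return not value_matches(condition["$not"], value)
        if "$or" in condition:
            return any(value_matches(c, value) for c in condition["$or"])
        raise ValueError(f"unsupported operator in {condition!r}")
    # Plain values are compared for equality.
    return condition == value


def task_accepted(filters, headers):
    """A task is accepted if at least one filter matches on all of its keys."""
    return any(
        all(value_matches(cond, headers.get(key)) for key, cond in flt.items())
        for flt in filters
    )


task = {"foo": "bar", "platform": "linux"}

# Two independent filters: the second one lets the linux task through.
naive = [
    {"foo": "bar", "platform": {"$not": "linux"}},
    {"foo": "bar", "platform": {"$not": "windows"}},
]

# A single filter that excludes both platforms at once.
combined = [
    {"foo": "bar", "platform": {"$not": {"$or": ["linux", "windows"]}}},
]

print("naive accepts linux task:", task_accepted(naive, task))
print("combined accepts linux task:", task_accepted(combined, task))
print("combined accepts macos task:",
      task_accepted(combined, {"foo": "bar", "platform": "macos"}))
```

In this toy model the naive list accepts the :code:`linux` task because each filter is evaluated independently, while the combined filter rejects it.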

.. warning::

    "Old style" negations are only supported at the top level! Combining them with "new style" filters will not work: the exclamation mark is not considered a special character in that case.

    In fact, we're not even sure how :code:`{"$or": ["!windows", "!linux"]}` *should* behave.

.. note::

    Since "new style" patterns were introduced in Karton version 5.4.1, "old style" negations are not recommended and should be considered deprecated.

    Nevertheless, Karton still supports them and they will keep working indefinitely. So don't worry, there are no breaking changes here.
81 changes: 65 additions & 16 deletions docs/task_headers_payloads.rst
Expand Up @@ -88,12 +88,10 @@ Starting from 5.0.0, consumer filters support basic wildcards and exclusions.
Pattern Meaning
------------------------ ------------------------------------------------------------------------------
``{"foo": "bar"}`` matches 'bar' value of 'foo' header
``{"foo": "!bar"}`` matches any value other than 'bar' in 'foo' header
``{"foo": "ba?"}`` matches 'ba' value followed by any character
``{"foo": "ba*"}`` matches 'ba' value followed by any substring (including empty)
``{"foo": "ba[rz]"}`` matches 'ba' value followed by 'r' or 'z' character
``{"foo": "ba[!rz]"}`` matches 'ba' value followed by any character other than 'r' or 'z'
``{"foo": "!ba[!rz]"}`` matches any value of 'foo' header that doesn't match to the "bar[!rz]" pattern
======================== ==============================================================================

Filter logic can be used to fulfill specific use-cases:
Expand All @@ -104,27 +102,78 @@ Filter logic can be used to fulfill specific use-cases:
``[]`` matches no tasks (no headers allowed). Can be used to turn off queue and consume tasks left.
``[{}]`` matches any task (no header conditions). Can be used to intercept all tasks incoming to Karton.
``[{"foo": "bar"}, {"foo": "baz"}]`` 'foo' header is required and must have 'bar' or 'baz' value.
``[{"foo": "!*"}]`` 'foo' header must be not defined.
==================================== ==============================================================================

Excluding (negated) filters come with specific corner-cases. Regular filters require specific value to be defined in header, while
negated filters are accepting all possible values except specified in filter.
.. versionadded:: 5.4.1

================================================================================== =============================================================================================================================================
``filters`` value Meaning
---------------------------------------------------------------------------------- ---------------------------------------------------------------------------------------------------------------------------------------------
``[{"type": "sample", "stage": "!*"}]`` matches only tasks that have type 'sample' but no 'stage' key
``[{"platform": "!linux"}, {"platform": "!windows"}]`` matches **all** tasks (even with no headers) but not these with platform 'linux' or 'windows'
``[{"foo": "bar", "platform": "!linux"}, {"foo": "bar", "platform": "!windows"}]`` 'foo' header is required and must have 'bar' value, but platform can't be 'linux' or 'windows'
``[{"foo": "bar", "platform": "!linux"}, {"foo": "baz", "platform": "!windows"}]`` 'foo' header is required and must have 'bar' value and no 'linux' in platform key, or foo must be 'baz', but then platform can't be 'windows'
================================================================================== =============================================================================================================================================
Sometimes more flexible filtering is necessary. Use it with caution, as Karton can handle quite complex
workflows without resorting to it. The need for complex task filtering rules may mean that one is doing something not in the "spirit" of Karton.

The advanced filter syntax is based on MongoDB syntax. See `MongoDB documentation <https://www.mongodb.com/docs/manual/reference/operator/query/>`_
for a detailed explanation.

In case of Karton, the following operators are allowed:

- Comparison: :code:`$eq`, :code:`$ne`, :code:`$gt`, :code:`$gte`, :code:`$lt`, :code:`$lte`
- Logical: :code:`$and`, :code:`$or`, :code:`$not`, :code:`$nor`
- Array: :code:`$in`, :code:`$nin`, :code:`$all`, :code:`$elemMatch`, :code:`$size`
- Miscellaneous: :code:`$type`, :code:`$mod`, :code:`$regex`

For some concrete examples, consider these filters:

.. code-block:: python

    filters = [
        {   # checks if `version` header is a number greater than 3
            "type": "sample",
            "version": {"$gt": 3},
        },
        {   # checks if `tags` header contains both "emotet" and "dump"
            "type": "sample",
            "tags": {"$all": ["emotet", "dump"]},
        },
        {   # checks if `platform` header is either "win32" or "linux"
            "type": "sample",
            "platform": {"$in": ["win32", "linux"]},
        },
        {   # checks if `respect` header contains a prime number of letters "f"
            "type": "sample",
            "respect": {"$not": {"$regex": r"^f?$|^(ff+?)\1+$"}},
        },
    ]

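The last filter relies on a classic regex trick: the pattern matches runs of "f" whose length is *not* prime, and :code:`$not` inverts the result. The pattern itself can be checked with Python's standard :code:`re` module. The sketch below only exercises the regular expression, not Karton's filter evaluation, and :code:`is_prime_run` is a hypothetical helper name.

```python
import re

# Matches "", "f", and any run of "f" whose length is a composite number.
NON_PRIME = re.compile(r"^f?$|^(ff+?)\1+$")


def is_prime_run(value: str) -> bool:
    """True if `value` consists only of "f" and its length is prime.

    The explicit character check is an addition for this sketch; the
    filter in the example above relies on the regex alone.
    """
    return set(value) == {"f"} and NON_PRIME.search(value) is None


# Print the result for a few run lengths.
for n in range(1, 12):
    print(n, is_prime_run("f" * n))
```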
.. warning::

It's recommended to use only strings in filter and header values
Filter styles don't mix well, and wildcard patterns only work at the top level.
For example, the following won't work as expected:

.. code-block:: python

    filters = [
        {"version": {"$or": ["win*", "linux*"]}},
    ]

Instead you have to use regex explicitly:

.. code-block:: python

    filters = [{
        "version": {
            "$or": [
                {"$regex": "win*"},
                {"$regex": "linux*"},
            ],
        }
    }]

Or just:

.. code-block:: python

    filters = [
        {"version": {"$regex": "win*|linux*"}},
    ]
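When translating a wildcard pattern into a regex by hand, keep in mind that :code:`*` means different things in the two syntaxes: in a wildcard pattern it stands for any substring, while in a regex it repeats the preceding character. The sketch below uses only Python's standard :code:`fnmatch` and :code:`re` modules, not Karton itself, and assumes that :code:`$regex` follows MongoDB's unanchored search semantics.

```python
import fnmatch
import re

glob_pattern = "win*"

# Regex equivalent of the wildcard pattern, as produced by the standard library.
translated = re.compile(fnmatch.translate(glob_pattern))

# The same text read directly as a regex: "wi" followed by zero or more "n".
literal = re.compile(glob_pattern)

for value in ["win32", "windows", "darwin", "wireless"]:
    print(
        value,
        "wildcard:", bool(translated.match(value)),
        "regex search:", bool(literal.search(value)),
    )
```

Under these assumptions, an anchored pattern such as :code:`^win` is a closer regex equivalent of the wildcard :code:`win*`.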
Although some of non-string types are allowed, they will be converted to string for comparison
which may lead to unexpected results.
Task payload
------------
4 changes: 4 additions & 0 deletions karton/core/karton.py
Expand Up @@ -8,6 +8,7 @@
import traceback
from typing import Any, Callable, Dict, List, Optional, Tuple, cast

from . import query
from .__version__ import __version__
from .backend import KartonBackend, KartonBind, KartonMetrics
from .base import KartonBase, KartonServiceBase
Expand Down Expand Up @@ -122,6 +123,9 @@ def __init__(
if self.filters is None:
raise ValueError("Cannot bind consumer on Empty binds")

# Dummy conversion to make sure the filters are well-formed.
query.convert(self.filters)

self.persistent = (
self.config.getboolean("karton", "persistent", self.persistent)
and not self.debug