Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Task tree and iterators: backend performance improvements #207

Closed
wants to merge 22 commits into from

Conversation

psrok1
Copy link
Member

@psrok1 psrok1 commented Feb 21, 2023

Initial plans:

What is done:

  • Added karton.task:<root_id>:<task_id> to speed up querying for single analysis status. This change comes with new task field called revision. It's required because there are cases where task method needs to be aware if task was found under <root_id>:<task_id> or just <task_id>:

    • Task update (KartonBackend.register_task) might be done by Karton service with different version (karton-system, karton-dashboard) which may introduce inconsistency.
    • Task deletion is a bit more complicated when identifiers are mixed.

    In the same time, this identifier wasn't stored anywhere in Task itself. I was afraid about inconsistencies due to storing fquid as a separate field so finally decided to introduce revision versioning which can be also useful for future modifications

  • Optimized task deserialization:

    • Introduced quick task deserialization (parse_resources=False) without parsing resources and with use of orjson. custom_hook used by json.loads for Resource deserialization has significant impact on performance (and is not supported by orjson)
    • During deserialization, we pass Task attribute values directly to the constructor (so they're not set twice). In addition, introduced __slots__ to speed up attribute access.

    Benchmark with 30k tasks loaded into Karton:

    # Karton 5.0.1
    In [30]: timeit.timeit(lambda: list(backend.get_all_tasks()), number=10)
    Out[30]: 5.391155833960511
    
    # Karton 5.1
    In [30]: timeit.timeit(lambda: list(backend.get_all_tasks()), number=10)
    Out[30]: 4.257205436006188
    
    # Karton 5.1 with parse_resources=False
    In [6]: timeit.timeit(lambda: backend.get_all_tasks(parse_resources=False), number=10)
    Out[6]: 3.729756842018105
    
    # Karton 5.1 with orjson + parse_resources=False
    In [7]: timeit.timeit(lambda: backend.get_all_tasks(parse_resources=False), number=10)
    Out[7]: 2.8112106900662184
  • Added iter_tasks family of methods that use iterators to make less memory footprint (inspired by SystemService and backend improvements #193). In addition, this version uses SCAN instead of KEYS to not block Redis for too long when there are lots of tasks. Possible inconsistencies due to operating on cursor instead of fixed list should not be a problem.

  • karton.core.inspect.KartonState is lazy-loading things and has additional method get_analysis if you want to load data only about specific root_uid. Main properties still load all tasks though, because we still need to deserialize everything to filter out FINISHED/CRASHED tasks.

  • Added new set to Redis called karton.assigned:<identity> that keeps references to tasks that were routed to consumer with given identity. It optimizes access to tasks processed by single consumer, but... it doesn't come with any huge benefits. It can be used to speed up access to https://karton-dashboard/queue/<identity> and fixes some GC issues (Removing hanging "started" tasks for non-persistent kartoniks when all instances are gone #10), but GC and Karton main view still need to process all the tasks in Redis so it's not a big deal.

  • Bumped Python version used by karton-system Docker image to 3.11 (see https://docs.python.org/3/whatsnew/3.11.html#optimizations. I haven't made any benchmark tho).

@psrok1 psrok1 marked this pull request as draft March 2, 2023 14:59
@psrok1 psrok1 marked this pull request as ready for review March 2, 2023 15:14
Dockerfile Outdated Show resolved Hide resolved
.github/workflows/test.yml Outdated Show resolved Hide resolved
karton/core/inspect.py Outdated Show resolved Hide resolved
karton/core/inspect.py Outdated Show resolved Hide resolved
karton/core/inspect.py Outdated Show resolved Hide resolved
karton/core/inspect.py Outdated Show resolved Hide resolved
karton/system/system.py Outdated Show resolved Hide resolved
karton/core/task.py Outdated Show resolved Hide resolved
Co-authored-by: Michał Praszmo <michalpr@cert.pl>
@nazywam
Copy link
Member

nazywam commented Jun 9, 2023

Some of the issues seen while testing this in a bunch of different setups:

  • old karton-system + new karton services - When a service spawns a new task it uses the fully qualified task id (fquid), because the old karton-system cannot handle them they are stuck in dispatched state and are not cleaned up correctly.
  • new karton-system + old karton services - The old services don't know how to properly read the fquid task from their queues which leads to karton tasks being stuck in "spawned" state.

TL;DR In order to deploy the newly-proposed fully qualified task ids, we'd have to either add some kind of heuristics to determine whether the consumer "knows" fquid or we'd have to just upgrade all services and karton-system at once (seem non-trivial)

@rakovskij-stanislav
Copy link
Contributor

For me it was kinda obvious that such major update requires both client-side and server-side update.

We can solve it by adding two features:

  • client-side check that server version is at least x.y.z, otherwise raise ServerIncompatibilityException with the suggestion to use lower version of karton on client-side
  • server-side check on new worker with force unbind the client version of karton-core of it is lower than a.b.c.

@psrok1
Copy link
Member Author

psrok1 commented Jun 11, 2023

Thanks for feedback, I'll try to separate non-breaking changes from this PR into another one and implement some safety checks here.

karton/core/task.py Outdated Show resolved Hide resolved
@psrok1
Copy link
Member Author

psrok1 commented May 13, 2024

To be continued in #255

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Get task tree status
3 participants