EQL Thread Usage #59708

imotov · 2020-07-16T15:00:53Z

I would like to discuss thread usage in EQL. The current situation is this:

1. The initial processing between receiving the user request and the fieldCaps call is performed on the transport_worker thread.
2. After the information about mappings is received, we process these mappings on the management thread. This is where mapping merging and plan analysis occur and the search request is formulated.
3. The search request is sent to Elasticsearch; when the response is received, we analyze it, reformat it, and return it to the user on the search thread.
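To make the three hops concrete, here is a minimal sketch of the flow described above. It uses plain JDK executors named after the pools in question; `ThreadStagesSketch`, `namedPool` and the stage labels are made up for illustration and are not EQL or Elasticsearch code.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Illustrative only: JDK executors stand in for the Elasticsearch thread pools.
class ThreadStagesSketch {

    // Hypothetical helper: a single-thread pool whose thread carries the given name.
    static ExecutorService namedPool(String name) {
        return Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, name);
            t.setDaemon(true);
            return t;
        });
    }

    // Runs the three stages in order and records "stage@thread" for each one.
    static List<String> run() {
        ExecutorService transport = namedPool("transport_worker");
        ExecutorService management = namedPool("management");
        ExecutorService search = namedPool("search");
        List<String> trace = new CopyOnWriteArrayList<>();
        try {
            CompletableFuture
                // Stage 1: parse the request, then ask for field capabilities.
                .runAsync(() -> trace.add("parse@" + Thread.currentThread().getName()), transport)
                // Stage 2: field caps came back; merge mappings, analyze, plan.
                .thenRunAsync(() -> trace.add("plan@" + Thread.currentThread().getName()), management)
                // Stage 3: search response came back; build the user response.
                .thenRunAsync(() -> trace.add("respond@" + Thread.currentThread().getName()), search)
                .join();
        } finally {
            transport.shutdown();
            management.shutdown();
            search.shutdown();
        }
        return trace;
    }

    public static void main(String[] args) {
        run().forEach(System.out::println);
    }
}
```

Each stage runs on whichever pool delivers the previous response, which is why the work ends up spread over three pools without any explicit decision being made.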

I think that reusing the transport thread for the initial processing is justifiable since we don't do any heavy lifting there, but I am not sure about the second and third steps.


costin · 2020-08-20T14:46:52Z

After discussing this, we concluded that EQL (and QL in general) should be more careful around what thread it uses.
While the transport_worker thread usage is okayish, the usage of the management one is not. Also with EQL doing multiple queries for a sequence, the use of the search thread becomes problematic.

imotov · 2020-08-25T14:43:29Z

I wonder if @elastic/es-core-infra have some recommendations on how we should deal with threads in EQL.

jaymode · 2020-08-25T15:03:34Z

I think threads and which one to use is a problem we have in many different places. I did some thinking about this recently when introducing a new threadpool for system reads and a lot of times we just throw things into the generic threadpool and call it done :(. Thank you for starting this discussion and thinking about the usage of threads by EQL. In order to provide the most useful input, can the steps be explained a bit more?

For example, in terms of mapping merging - what exactly is this doing? What is involved in the plan analysis?

costin · 2020-08-25T15:41:47Z

Thanks for stepping in Jay.

In terms of thread usage, EQL currently has 3 main stages:

1. transport_worker
The EQL request is received. The query is parsed and the grammar checked. Before doing proper analysis, the mapping of the target index/alias is requested.

2. management
Once the mappings are returned, they are merged into one virtual mapping (as the target can hit multiple indices and currently we do have to do conflict resolution - this might be something that will change in the future).
Once the mapping is created and no conflicts are discovered, the process resumes: the plan gets analyzed and optimized, and planning occurs. The actual search queries are sent.

3. search
As the results are returned, the sequencing / correlation happens. This is where most of the work/execution time is spent since, as the data comes back, some of it will be dropped (it doesn't form sequences) while some will be kept. But most importantly new queries are created and the process repeated until either we're running out of data or the number of requested results has been found.

I think 1 is fine; however, with 2 I'm concerned the mappings themselves can be a problem. Otherwise, this process is quite fast from a computation POV (we're talking ms here). 3 is the biggest concern since EQL does trigger other queries as it navigates through the data and, in case of issues, the problem might be hard to isolate or might spread to other areas.
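A rough sketch of the loop in stage 3, assuming a made-up async page source in place of real search requests: each response is filtered, and a follow-up query is issued from the response callback until the data runs out or enough results were found. `SequenceLoopSketch`, `collect` and the `match` prefix check are placeholders, not the actual sequence matcher.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.function.IntFunction;

// Illustrative only: an in-memory page source stands in for search requests.
class SequenceLoopSketch {

    // Keeps querying page after page until a page is empty or `limit` hits are kept.
    static CompletableFuture<List<String>> collect(
            IntFunction<CompletableFuture<List<String>>> source, int limit) {
        return step(source, 0, limit, new ArrayList<>());
    }

    private static CompletableFuture<List<String>> step(
            IntFunction<CompletableFuture<List<String>>> source,
            int page, int limit, List<String> found) {
        return source.apply(page).thenCompose(hits -> {
            for (String hit : hits) {
                // Placeholder for sequence matching: non-matching hits are dropped.
                if (found.size() < limit && hit.startsWith("match")) {
                    found.add(hit);
                }
            }
            boolean done = hits.isEmpty() || found.size() >= limit;
            // The follow-up query is created here, on the thread handling the response.
            return done
                ? CompletableFuture.completedFuture(found)
                : step(source, page + 1, limit, found);
        });
    }

    // Small demo over four fixed pages; an out-of-range page returns no hits.
    static List<String> demo(int limit) {
        List<List<String>> pages = List.of(
            List.of("match-1", "noise-1"),
            List.of("noise-2", "match-2"),
            List.of("match-3", "match-4"),
            List.of("match-5"));
        IntFunction<CompletableFuture<List<String>>> source = page ->
            CompletableFuture.supplyAsync(
                () -> page < pages.size() ? pages.get(page) : List.<String>of());
        return collect(source, limit).join();
    }

    public static void main(String[] args) {
        System.out.println(demo(3));
    }
}
```

Nothing in the loop blocks: a thread is only occupied while a response is being processed, and the next query is handed off asynchronously.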

jaymode · 2020-08-25T18:43:12Z

Thanks for the details Costin.

transport_worker - I agree that the work done when a request is received is fine.

management - I do not believe that the mappings should be an issue. Can you elaborate on the mappings being a problem in terms of threading? (BTW it looks like EQL is actually pulling the field capabilities and not mappings if I am looking in the right place). Also, it doesn't look like EQL is blocking or anything like that so I don't really see a reason to move to another thread.

search - I'd say this is the right choice. If you think about the nature of the search threads we run searches and aggregations on these threads. If EQL is triggering other queries asynchronously then it won't be tying up search threads indefinitely. Searches need to process data as it comes back (reduce phase from shards). There is a cost to process the data but I think that's still part of search. It is almost like EQL is search with more phases/processing especially when I think about this phrase from your response:

the process repeated until either we're running out of data or the number of requested results has been found.

That said maybe EQL could fit into its own bucket in terms of expense in comparison to other searches. With async_search more slow searches will be executed and there has been consideration that these need some sort of prioritization, see #37867 and there is a mention of EQL by the security team as well.

imotov · 2020-08-25T19:57:36Z

Regarding the management thread, we are indeed pulling field capabilities, and then we perform a potentially CPU-intensive operation: analyzing the pulled fields, formulating search requests, and sending them back to the server. It is not blocking. (It also seems that analysis of EQL is less CPU-intensive than SQL.)
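For a feel of what the merge step on the management thread involves, here is a simplified sketch. It assumes a flat `index -> (field -> type)` input and treats any type mismatch as a conflict; the real field caps response and EQL's actual conflict resolution rules are richer than this, and `FieldMergeSketch` is a made-up name.

```java
import java.util.Map;
import java.util.TreeMap;

// Illustrative only: merges per-index field types into one virtual mapping.
class FieldMergeSketch {

    // perIndex: index name -> (field name -> type name).
    // Returns field -> type; throws if two indices disagree on a field's type.
    static Map<String, String> merge(Map<String, Map<String, String>> perIndex) {
        Map<String, String> merged = new TreeMap<>();
        Map<String, String> firstSeenIn = new TreeMap<>();
        for (Map.Entry<String, Map<String, String>> index : perIndex.entrySet()) {
            for (Map.Entry<String, String> field : index.getValue().entrySet()) {
                String previous = merged.putIfAbsent(field.getKey(), field.getValue());
                if (previous == null) {
                    // First time this field is seen; remember where it came from.
                    firstSeenIn.put(field.getKey(), index.getKey());
                } else if (!previous.equals(field.getValue())) {
                    throw new IllegalStateException("Field [" + field.getKey() + "] is ["
                        + previous + "] in [" + firstSeenIn.get(field.getKey()) + "] but ["
                        + field.getValue() + "] in [" + index.getKey() + "]");
                }
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        Map<String, Map<String, String>> caps = new TreeMap<>();
        caps.put("logs-1", Map.of("pid", "long"));
        caps.put("logs-2", Map.of("pid", "long", "name", "keyword"));
        System.out.println(merge(caps));
    }
}
```

The work is a single pass over in-memory maps with no IO, so the cost grows with the number of indices and fields rather than with anything that could block the thread.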

costin · 2020-08-26T09:16:09Z

It sounds like steps 1 and 3 are okay in terms of execution. With a future move to async search, using our own thread-pool for actual search queries might be even less of an issue.
That leaves the use of management thread which is used to return the data from field caps and then is used to assemble the actual query.
With large mappings this might be problematic, though since no IO is done it is still quite fast (less than 10 ms, I would say) even on large mappings, which put more pressure on memory than on CPU.
However moving that to a different thread pool (generic?) seems excessive...

jaymode · 2020-08-26T14:38:09Z

However moving that to a different thread pool (generic?) seems excessive...

+1. Moving to a different thread just adds more overhead, and we should strive to process this quickly and free the memory up sooner rather than later if it is going to add memory pressure.
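The two options can be sketched with plain JDK futures: continue on the thread that delivers the field caps response, or hand the work to another pool. The pool names match the discussion, but `HandOffSketch` and `namedPool` are hypothetical and the executors are not the Elasticsearch ones.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Illustrative only: compares staying on the completing thread with a hand-off.
class HandOffSketch {

    // Hypothetical helper: a single-thread pool whose thread carries the given name.
    static ExecutorService namedPool(String name) {
        return Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, name);
            t.setDaemon(true);
            return t;
        });
    }

    // Returns { thread used without hand-off, thread used with hand-off }.
    static String[] run() {
        ExecutorService management = namedPool("management");
        ExecutorService generic = namedPool("generic");
        try {
            CompletableFuture<String> fieldCaps = new CompletableFuture<>();
            // Registered before completion, so it runs on the completing thread.
            CompletableFuture<String> inline =
                fieldCaps.thenApply(caps -> Thread.currentThread().getName());
            // Explicit hand-off: the work is queued on the other pool instead.
            CompletableFuture<String> handedOff =
                fieldCaps.thenApplyAsync(caps -> Thread.currentThread().getName(), generic);
            // The response "arrives" on the management thread.
            management.execute(() -> fieldCaps.complete("caps"));
            return new String[] { inline.join(), handedOff.join() };
        } finally {
            management.shutdown();
            generic.shutdown();
        }
    }

    public static void main(String[] args) {
        String[] names = run();
        System.out.println("inline: " + names[0] + ", handed off: " + names[1]);
    }
}
```

The hand-off costs an extra queue submission and context switch, and the response stays referenced until the second pool gets around to it, which is the memory concern mentioned above.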

costin · 2020-08-27T10:44:18Z

Sounds to me like the current situation is acceptable, so I'm closing this ticket.
If somehow this is not the case, @jaymode or @imotov please open it up again.

Thanks,

imotov added team-discuss :Analytics/EQL EQL querying labels Jul 16, 2020

costin removed the team-discuss label Aug 20, 2020

costin closed this as completed Aug 27, 2020