Use parallel ES indexing #185

rsevilla87 · 2020-05-26T15:55:56Z

In a simple 10-minute test with serial indexing I got:

2020-05-26T15:24:29Z - INFO     - MainProcess - trigger_fio: fio has successfully finished sample 1 executing for jobname write and results are in the dir /tmp/fiod-6c4f84f7-b18d-5c45-b2f8-b251633c7612/fiojob-write-4KiB-1/1/write
2020-05-26T15:26:19Z - INFO     - MainProcess - run_snafu: Indexed results - 27466 success, 0 duplicates, 0 failures, with 0 retries.

and with parallel indexing:

2020-05-26T15:09:48Z - INFO     - MainProcess - trigger_fio: fio has successfully finished sample 1 executing for jobname write and results are in the dir /tmp/fiod-6c4f84f7-b18d-5c45-b2f8-b251633c7612/fiojob-write-4KiB-1/1/write
2020-05-26T15:10:25Z - INFO     - MainProcess - run_snafu: Indexed results - 27458 success, 0 duplicates, 0 failures, with 0 retries.
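To make the comparison concrete, here is a minimal sketch that computes the elapsed time between the two log lines of each run and a rough documents-per-second figure. The timestamps and document counts are copied from the logs above; `elapsed_seconds` is a helper written for this illustration only. Note the interval runs from fio finishing to the "Indexed results" message, so it is an upper bound on the indexing time and may include other post-processing.

```python
from datetime import datetime

FMT = "%Y-%m-%dT%H:%M:%SZ"  # timestamp format used in the snafu logs above


def elapsed_seconds(start, end):
    """Seconds between two log timestamps (illustrative helper)."""
    return (datetime.strptime(end, FMT) - datetime.strptime(start, FMT)).total_seconds()


# Serial run: fio finished at 15:24:29, indexing reported at 15:26:19.
serial = elapsed_seconds("2020-05-26T15:24:29Z", "2020-05-26T15:26:19Z")
# Parallel run: fio finished at 15:09:48, indexing reported at 15:10:25.
parallel = elapsed_seconds("2020-05-26T15:09:48Z", "2020-05-26T15:10:25Z")

print(serial, parallel)  # 110.0 37.0
# Approximate documents per second for each run.
print(round(27466 / serial), round(27458 / parallel))  # 250 742
```

That is roughly a 3x reduction for this single-node test; it says nothing about behaviour with many clients.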

rht-perf-ci · 2020-05-26T17:49:51Z

Results for SNAFU CI Test

Test	Result	Runtime
fio_wrapper	PASS	00:11:51
fs_drift_wrapper	FAIL	00:06:10
hammerdb	PASS	00:07:50
iperf	PASS	00:03:05
pgbench_wrapper	PASS	00:04:02
smallfile_wrapper	PASS	00:05:21
sysbench	PASS	00:03:29
uperf_wrapper	PASS	00:16:10
ycsb_wrapper	PASS	00:09:43

bengland2 · 2020-05-26T20:25:36Z

@rsevilla87 good idea but does it scale? If you were doing this with 30 nodes instead of 1, is py_es_bulk able to back off if ES is overloaded? I think so (@acalhounRH what do you think?) but has anyone tried it?

aakarshg · 2020-05-27T16:36:03Z

@portante can you please review this? I distinctly remember you finding a boatload of problems with parallel indexing and suggesting that we stick with serial indexing.

portante · 2020-05-27T16:46:56Z

Client-side indexing is hard to make scale. Unless you control all the clients, you cannot enforce the right level of parallelism for each one, and an Elasticsearch instance can end up swamped.

If parallel_bulk gives the exact same semantics as streaming_bulk, the code will likely work. But failure conditions and retries are where you will run into problems.
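On the retry point: the stock elasticsearch-py `streaming_bulk` helper accepts `max_retries`, `initial_backoff`, and `max_backoff` and retries documents rejected with HTTP 429, while `parallel_bulk`, as far as I know, exposes no equivalent options. The sketch below only illustrates the capped exponential schedule those parameters describe. `backoff_delays` is a hypothetical helper, not part of any library, and whether py_es_bulk follows the same schedule is not established in this thread.

```python
def backoff_delays(max_retries, initial_backoff=2.0, max_backoff=600.0):
    """Return the wait (in seconds) before each retry attempt.

    Hypothetical helper for illustration: the delay doubles on every
    attempt and is capped at max_backoff.
    """
    return [
        min(max_backoff, initial_backoff * 2 ** (attempt - 1))
        for attempt in range(1, max_retries + 1)
    ]


# Four retries with the default values used in this sketch.
print(backoff_delays(4))  # [2.0, 4.0, 8.0, 16.0]
# A low cap flattens the schedule once it is reached.
print(backoff_delays(4, max_backoff=5.0))  # [2.0, 4.0, 5.0, 5.0]
```

With many clients retrying on the same schedule, the retries themselves can arrive in bursts, which is one reason per-client parallelism is hard to tune.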

rht-perf-ci · 2020-05-28T10:04:23Z

Results for SNAFU CI Test

Test	Result	Runtime
fio_wrapper	PASS	00:10:47
fs_drift_wrapper	FAIL	00:03:26
hammerdb	PASS	00:06:46
iperf	PASS	00:02:59
pgbench_wrapper	PASS	00:04:01
smallfile_wrapper	PASS	00:05:37
sysbench	PASS	00:02:47
uperf_wrapper	PASS	00:17:10
ycsb_wrapper	PASS	00:10:06

acalhounRH · 2020-05-28T15:14:10Z

I have already added parallel_bulk indexing with PR #173

rsevilla87 · 2020-05-28T15:33:12Z

> I have already added parallel_bulk indexing with PR #173

This one enables parallelism optionally, disabled by default. Do you want me to wait for #173 or move forward with this one?

acalhounRH · 2020-05-28T15:57:54Z

I would prefer to wait for #173, if you don't mind.

aakarshg · 2020-05-28T16:10:34Z

> I would prefer to wait for #173, if you don't mind.

@acalhounRH can you please update your PR in that case to support enabling parallelism optionally, but defaulting to serial given @portante's comments above.

acalhounRH · 2020-05-28T17:53:36Z

> > I would prefer to wait for #173, if you don't mind.
>
> @acalhounRH can you please update your PR in that case to support enabling parallelism optionally, but defaulting to serial given @portante's comments above.

This will require a change in both ripsaw and snafu: ripsaw to set the environment variable, and snafu to check it and switch between parallel and streaming indexing.
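A minimal sketch of what the snafu side of that switch could look like, assuming the elasticsearch-py helpers are used directly. The variable name `ES_PARALLEL` and both function names are hypothetical, chosen only for illustration; the actual names in ripsaw and snafu may differ.

```python
import os


def use_parallel_indexing(environ=os.environ):
    """True only when the (hypothetical) ES_PARALLEL variable opts in.

    Anything unset or unrecognised falls back to serial indexing,
    matching the "default to serial" request above.
    """
    value = environ.get("ES_PARALLEL", "false")
    return value.strip().lower() in ("1", "true", "yes")


def index_documents(es, actions, environ=os.environ):
    """Index actions with the selected helper; return (successes, failures)."""
    # Imported here so the toggle can be used without the client installed.
    from elasticsearch import helpers

    if use_parallel_indexing(environ):
        results = helpers.parallel_bulk(es, actions, raise_on_error=False)
    else:
        results = helpers.streaming_bulk(es, actions, raise_on_error=False)

    successes = failures = 0
    # Both helpers are generators yielding (ok, item) tuples per document.
    for ok, _item in results:
        if ok:
            successes += 1
        else:
            failures += 1
    return successes, failures
```

In this sketch ripsaw would only need to set `ES_PARALLEL=true` in the pod environment to opt in; leaving it unset keeps the serial path.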

rsevilla87 added the "ok to test" label (Kick off our CI framework) on May 26, 2020

rsevilla87 force-pushed the parallel-bulk branch from 9e168d9 to 52732de on May 26, 2020 16:01

Use parallel ES indexing

87c7bbc

rsevilla87 force-pushed the parallel-bulk branch from 52732de to 87c7bbc on May 28, 2020 08:54

rsevilla87 closed this May 28, 2020

aakarshg mentioned this pull request May 28, 2020

Add get prometheus data to run snafu #173

Merged
