Use parallel ES indexing #185

Closed
rsevilla87 wants to merge 1 commit

Conversation

rsevilla87
Member

@rsevilla87 rsevilla87 commented May 26, 2020

In a simple 10-minute test I got:

2020-05-26T15:24:29Z - INFO     - MainProcess - trigger_fio: fio has successfully finished sample 1 executing for jobname write and results are in the dir /tmp/fiod-6c4f84f7-b18d-5c45-b2f8-b251633c7612/fiojob-write-4KiB-1/1/write
2020-05-26T15:26:19Z - INFO     - MainProcess - run_snafu: Indexed results - 27466 success, 0 duplicates, 0 failures, with 0 retries.

And with parallel indexing:

2020-05-26T15:09:48Z - INFO     - MainProcess - trigger_fio: fio has successfully finished sample 1 executing for jobname write and results are in the dir /tmp/fiod-6c4f84f7-b18d-5c45-b2f8-b251633c7612/fiojob-write-4KiB-1/1/write
2020-05-26T15:10:25Z - INFO     - MainProcess - run_snafu: Indexed results - 27458 success, 0 duplicates, 0 failures, with 0 retries.
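To make the comparison explicit, the gap between "fio finished" and "Indexed results" in each pair of log lines can be computed directly. This is only a rough sketch: it assumes that gap is dominated by indexing time, and the run labels `baseline` and `parallel` are mine, not the author's.

```python
from datetime import datetime

FMT = "%Y-%m-%dT%H:%M:%SZ"  # timestamp format used in the log lines above


def elapsed_seconds(start: str, end: str) -> int:
    """Whole seconds between two log timestamps."""
    delta = datetime.strptime(end, FMT) - datetime.strptime(start, FMT)
    return int(delta.total_seconds())


# (fio finished, indexing summary logged, documents indexed), copied from the logs.
# The labels are assumptions: "baseline" is the first run, "parallel" the second.
runs = {
    "baseline": ("2020-05-26T15:24:29Z", "2020-05-26T15:26:19Z", 27466),
    "parallel": ("2020-05-26T15:09:48Z", "2020-05-26T15:10:25Z", 27458),
}


def summarize(runs):
    """Map each run name to (elapsed seconds, documents per second)."""
    out = {}
    for name, (start, end, docs) in runs.items():
        secs = elapsed_seconds(start, end)
        out[name] = (secs, docs / secs)
    return out


for name, (secs, rate) in summarize(runs).items():
    print(f"{name}: {secs} s, about {rate:.0f} docs/s")
```

On these two samples the gap shrinks from 110 seconds to 37 seconds, roughly a threefold improvement. It is a single run of each, so treat it as indicative only.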

@rsevilla87 added the "ok to test" label (Kick off our CI framework) on May 26, 2020
@rht-perf-ci

Results for SNAFU CI Test

| Test | Result | Runtime |
|---|---|---|
| fio_wrapper | PASS | 00:11:51 |
| fs_drift_wrapper | FAIL | 00:06:10 |
| hammerdb | PASS | 00:07:50 |
| iperf | PASS | 00:03:05 |
| pgbench_wrapper | PASS | 00:04:02 |
| smallfile_wrapper | PASS | 00:05:21 |
| sysbench | PASS | 00:03:29 |
| uperf_wrapper | PASS | 00:16:10 |
| ycsb_wrapper | PASS | 00:09:43 |

@bengland2
Contributor

@rsevilla87 good idea but does it scale? If you were doing this with 30 nodes instead of 1, is py_es_bulk able to back off if ES is overloaded? I think so (@acalhounRH what do you think?) but has anyone tried it?
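For reference, "backing off" here usually means retrying a rejected bulk request after an exponentially growing, capped delay. The sketch below is a generic illustration and not py_es_bulk's actual code; `Overloaded`, `send_with_backoff`, and `flaky_send` are hypothetical names. The parameter names `max_retries`, `initial_backoff`, and `max_backoff` mirror those that elasticsearch-py's `streaming_bulk` helper accepts, but whether py_es_bulk sets them is not stated in this thread.

```python
import random
import time


class Overloaded(Exception):
    """Hypothetical stand-in for an HTTP 429 (too many requests) response."""


def send_with_backoff(send, chunk, max_retries=5, initial_backoff=1.0,
                      max_backoff=60.0, sleep=time.sleep):
    """Call send(chunk), retrying with capped exponential backoff on Overloaded."""
    for attempt in range(max_retries + 1):
        try:
            return send(chunk)
        except Overloaded:
            if attempt == max_retries:
                raise  # give up: the caller decides what to do with the chunk
            delay = min(max_backoff, initial_backoff * 2 ** attempt)
            # Jitter keeps many clients from retrying in lockstep.
            sleep(delay * random.uniform(0.5, 1.0))


# Usage with a fake sender that is overloaded twice, then accepts the chunk.
attempts = {"n": 0}


def flaky_send(chunk):
    attempts["n"] += 1
    if attempts["n"] <= 2:
        raise Overloaded()
    return len(chunk)


delays = []  # record the delays instead of actually sleeping
indexed = send_with_backoff(flaky_send, ["doc1", "doc2"], sleep=delays.append)
```

With 30 clients, the jitter matters as much as the backoff itself: without it, all clients that were rejected together retry together.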

@aakarshg
Contributor

@portante can you please review this? I distinctly remember you finding a boatload of problems with parallel indexing and suggesting that we stick with serial indexing.

@portante

Client-side indexing is hard to make scale. Unless you control all the clients, it is difficult to set the right level of parallelism for each one, and the wrong level can swamp an Elasticsearch instance.

If parallel_bulk gives exactly the same semantics as streaming_bulk, the code will likely work. But failure conditions and retries are where you will run into problems.
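A small sketch of why failures get harder with parallel workers: chunks are in flight concurrently, so rejected documents have to be collected per chunk and retried separately, and every retry adds load on top of whatever the other workers are already sending. This is a stdlib-only illustration, not the elasticsearch-py helpers; `index_parallel` and the `send` callable (assumed to return the documents the server rejected) are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def chunks(items, size):
    """Split a list into consecutive chunks of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def index_parallel(send, docs, chunk_size=500, thread_count=4):
    """Send chunks concurrently and return (success_count, failed_docs).

    `send(chunk)` is a hypothetical callable that returns the list of
    documents from the chunk that were rejected.
    """
    parts = chunks(docs, chunk_size)
    successes, failed = 0, []
    with ThreadPoolExecutor(max_workers=thread_count) as pool:
        # pool.map yields results in submission order, even though the
        # chunks themselves may complete in any order.
        for chunk, rejected in zip(parts, pool.map(send, parts)):
            successes += len(chunk) - len(rejected)
            failed.extend(rejected)
    # The caller must decide how (and how often) to retry `failed`.
    return successes, failed


def reject_multiples_of_four(chunk):
    """Fake sender: rejects every document whose id is divisible by 4."""
    return [doc for doc in chunk if doc % 4 == 0]


ok, failed = index_parallel(reject_multiples_of_four, list(range(10)), chunk_size=3)
```

In the serial case there is only one chunk in flight, so a failure naturally pauses everything behind it. Here the other workers keep sending while the failed documents wait for a retry policy.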

@rht-perf-ci

Results for SNAFU CI Test

| Test | Result | Runtime |
|---|---|---|
| fio_wrapper | PASS | 00:10:47 |
| fs_drift_wrapper | FAIL | 00:03:26 |
| hammerdb | PASS | 00:06:46 |
| iperf | PASS | 00:02:59 |
| pgbench_wrapper | PASS | 00:04:01 |
| smallfile_wrapper | PASS | 00:05:37 |
| sysbench | PASS | 00:02:47 |
| uperf_wrapper | PASS | 00:17:10 |
| ycsb_wrapper | PASS | 00:10:06 |

@acalhounRH
Contributor

I have already added parallel_bulk indexing with PR #173

@rsevilla87
Member Author

rsevilla87 commented May 28, 2020

> I have already added parallel_bulk indexing with PR #173

This one makes parallelism optional and disabled by default. Do you want me to wait for #173, or move forward with this one?

@acalhounRH
Contributor

I would prefer to wait for #173, if you don't mind.

@rsevilla87 closed this on May 28, 2020
@aakarshg
Contributor

> I would prefer to wait for #173, if you don't mind.

@acalhounRH, in that case can you please update your PR to make parallelism optional, defaulting to serial given @portante's comments above?

@acalhounRH
Contributor

> > I would prefer to wait for #173, if you don't mind.
>
> @acalhounRH, in that case can you please update your PR to make parallelism optional, defaulting to serial given @portante's comments above?

This will require a change in both ripsaw and snafu: ripsaw to set the environment variable, and snafu to check it and switch between parallel and streaming indexing.
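The snafu side of that switch could look roughly like the sketch below. It is an illustration only: the variable name `ES_PARALLEL_INDEXING` and both function names are hypothetical, since the thread does not say what the variable will be called. The returned strings are the names of the two bulk helpers discussed above, `parallel_bulk` and `streaming_bulk`.

```python
import os

# Values treated as "enabled"; anything else, including an unset variable,
# keeps the serial default requested in the discussion above.
TRUE_VALUES = {"1", "true", "yes", "on"}


def parallel_indexing_enabled(environ=os.environ, var="ES_PARALLEL_INDEXING"):
    """Return True only if the (hypothetical) variable is set to a true value."""
    return environ.get(var, "").strip().lower() in TRUE_VALUES


def pick_bulk_helper(environ=os.environ):
    """Name of the bulk helper to use: serial streaming unless opted in."""
    if parallel_indexing_enabled(environ):
        return "parallel_bulk"
    return "streaming_bulk"


print(pick_bulk_helper({}))                                # default: serial
print(pick_bulk_helper({"ES_PARALLEL_INDEXING": "true"}))  # opted in
```

Defaulting to serial when the variable is missing or unrecognized means a typo in the ripsaw configuration cannot accidentally enable parallel indexing.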
