Add get prometheus data to run snafu #173

acalhounRH · 2020-04-18T01:53:38Z

Added code to get Prometheus data and index it into Elasticsearch. Note that this PR enables the functionality but does not use it; follow-on PRs will address specific tool triggers for Prometheus data collection.

The expected data flow: at the conclusion of a sample loop, the tool wrapper yields a tuple (action, "get_prometheus_trigger"), which triggers the collection of Prometheus data.

"action" (dict) would contain the following:

    action: {
        "uuid": <uuid>,
        "user": <user>,
        "clustername": <clustername>,
        "starttime": <datetime>,  # datetime.utcnow().strftime('%s')
        "endtime": <datetime>,
        "test_config": {...}
    }

test_config would contain the sample-specific test parameters that correspond to the current time window.
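A minimal sketch of the flow described above, assuming a generator-style wrapper. Only the `(action, "get_prometheus_trigger")` tuple shape and the `action` keys come from the description; the names `run_samples` and `handle` are hypothetical, and `int(time.time())` is used here for epoch seconds in place of `datetime.utcnow().strftime('%s')`.

```python
import time
import uuid


def run_samples(user, clustername, samples):
    """Hypothetical tool wrapper: yield one Prometheus trigger per sample."""
    run_uuid = str(uuid.uuid4())
    for test_config in samples:
        starttime = int(time.time())  # epoch seconds at sample start
        # ... the benchmark sample would run here ...
        endtime = int(time.time())  # epoch seconds at sample end
        action = {
            "uuid": run_uuid,
            "user": user,
            "clustername": clustername,
            "starttime": starttime,
            "endtime": endtime,
            "test_config": test_config,
        }
        yield action, "get_prometheus_trigger"


def handle(actions):
    """Hypothetical consumer: collect the time windows that need Prometheus data."""
    windows = []
    for action, trigger in actions:
        if trigger == "get_prometheus_trigger":
            windows.append((action["starttime"], action["endtime"]))
    return windows
```

A caller could iterate over `run_samples(...)` and hand each window to whatever code queries Prometheus; how run_snafu actually dispatches the trigger is not shown in this description.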

@jtaleric @bengland2 @dry923 @aakarshg

acalhounRH · 2020-04-20T14:42:53Z

/rerun all

aakarshg · 2020-04-20T15:15:06Z

/rerun all

rht-perf-ci · 2020-04-20T16:41:37Z

Results for SNAFU CI Test

Test	Result	Runtime
fio_wrapper	PASS	00:16:25
fs_drift_wrapper	PASS	00:07:11
hammerdb	PASS	00:07:20
iperf	PASS	00:04:49
pgbench_wrapper	PASS	00:03:58
smallfile_wrapper	PASS	00:07:17
sysbench	PASS	00:03:30
uperf_wrapper	PASS	00:18:56
ycsb_wrapper	PASS	00:11:14

rht-perf-ci · 2020-04-21T03:31:47Z

Results for SNAFU CI Test

Test	Result	Runtime
fio_wrapper	PASS	00:15:59
fs_drift_wrapper	PASS	00:06:56
hammerdb	PASS	00:07:27
iperf	PASS	00:04:36
pgbench_wrapper	PASS	00:04:02
smallfile_wrapper	PASS	00:07:12
sysbench	PASS	00:02:51
uperf_wrapper	FAIL	00:24:02
ycsb_wrapper	PASS	00:11:07

acalhounRH · 2020-04-22T13:38:05Z

/rerun all

bengland2

Overall structure of the code is good, I think. I like the get_valid_es_document code reuse.

This is missing documentation: how does the benchmark developer switch over to this? How does the user make use of it? What are the advantages of doing it this way for the user? Since most Kubernetes users are familiar with Prometheus, I think we have to provide some justification for doing things this way. If the answer is short, put it in README.md; if not, put it in a separate .md file and reference it from README.md, right?

What kinds of Prometheus data can you handle? If you see data that you can't handle, do you alert the user to it and handle it without crashing?
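To make the question concrete, here is a rough sketch of defensive handling, not the code in this PR. It assumes the input is the `data` object of a Prometheus HTTP API query response, whose `resultType` is one of "matrix", "vector", "scalar", or "string"; the function name `to_documents` and the output document layout are hypothetical.

```python
import logging

logger = logging.getLogger(__name__)


def to_documents(data):
    """Flatten a Prometheus API `data` object into a list of flat dicts.

    Handles "matrix" and "vector" results; anything else is logged and skipped.
    """
    result_type = data.get("resultType")
    docs = []
    if result_type not in ("matrix", "vector"):
        logger.warning("skipping unsupported Prometheus resultType: %r", result_type)
        return docs
    for series in data.get("result", []):
        labels = series.get("metric", {})
        # matrix series carry "values" (a list of pairs); vector series carry one "value"
        if result_type == "matrix":
            samples = series.get("values", [])
        else:
            samples = [series["value"]] if "value" in series else []
        for timestamp, raw in samples:
            try:
                value = float(raw)  # sample values arrive as strings, e.g. "1.5" or "NaN"
            except (TypeError, ValueError):
                logger.warning("skipping non-numeric sample %r for %s", raw, labels)
                continue
            docs.append({"metric": labels, "timestamp": timestamp, "value": value})
    return docs
```

In this sketch, unsupported result types and unparsable samples produce a warning and are dropped rather than raising, so one bad series does not abort the whole collection.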

BTW, I could be wrong in my inline comments, but I think others might have similar questions. At a minimum, comments or documentation should explain, right?

utils/get_prometheus_data.py

rht-perf-ci · 2020-04-22T15:51:19Z

Results for SNAFU CI Test

Test	Result	Runtime
fio_wrapper	PASS	00:16:11
fs_drift_wrapper	PASS	00:06:39
hammerdb	PASS	00:07:07
iperf	PASS	00:07:40
pgbench_wrapper	PASS	00:04:14
smallfile_wrapper	PASS	00:06:50
sysbench	PASS	00:03:04
uperf_wrapper	FAIL	00:30:07
ycsb_wrapper	PASS	00:10:16

utils/get_prometheus_data.py

test-requirements.txt

utils/get_prometheus_data.py

utils/py_es_bulk.py

rht-perf-ci · 2020-05-28T18:52:12Z

Results for SNAFU CI Test

Test	Result	Runtime
fio_wrapper	FAIL	00:04:54
fs_drift_wrapper	FAIL	00:04:02
hammerdb	FAIL	00:12:54
iperf	PASS	00:02:49
pgbench_wrapper	FAIL	00:03:11
smallfile_wrapper	FAIL	00:04:14
sysbench	PASS	00:04:16
uperf_wrapper	FAIL	00:03:36
ycsb_wrapper	FAIL	00:14:22

rht-perf-ci · 2020-05-28T19:58:44Z

Results for SNAFU CI Test

Test	Result	Runtime
fio_wrapper	FAIL	00:04:23
fs_drift_wrapper	FAIL	00:04:13
hammerdb	FAIL	00:11:35
iperf	PASS	00:03:05
pgbench_wrapper	FAIL	00:03:29
smallfile_wrapper	FAIL	00:05:02
sysbench	PASS	00:02:20
uperf_wrapper	FAIL	00:03:57
ycsb_wrapper	FAIL	00:14:17

acalhounRH · 2020-05-29T02:23:26Z

/rerun all

rht-perf-ci · 2020-05-29T03:33:36Z

Results for SNAFU CI Test

Test	Result	Runtime
fio_wrapper	PASS	00:11:32
fs_drift_wrapper	PASS	00:06:33
hammerdb	PASS	00:06:32
iperf	PASS	00:03:45
pgbench_wrapper	PASS	00:03:19
smallfile_wrapper	PASS	00:06:06
sysbench	PASS	00:02:28
uperf_wrapper	PASS	00:17:00
ycsb_wrapper	PASS	00:06:25

rht-perf-ci · 2020-06-01T16:59:37Z

Results for SNAFU CI Test

Test	Result	Runtime
fio_wrapper	PASS	00:11:49
fs_drift_wrapper	PASS	00:05:32
hammerdb	PASS	00:06:04
iperf	PASS	00:02:51
pgbench_wrapper	PASS	00:03:07
smallfile_wrapper	PASS	00:06:11
sysbench	PASS	00:01:56
uperf_wrapper	FAIL	00:07:41
ycsb_wrapper	PASS	00:07:11

acalhounRH · 2020-06-11T20:30:00Z

/rerun all

acalhounRH · 2020-06-11T21:00:43Z

Functionally tested the PR: no issues running fio with and without parallel indexing.

@aakarshg
@dry923

Ready to merge, pending your review.

rht-perf-ci · 2020-06-11T21:49:17Z

Results for SNAFU CI Test

Test	Result	Runtime
fio_wrapper	PASS	00:10:57
fs_drift_wrapper	PASS	00:06:25
hammerdb	PASS	00:07:14
iperf	PASS	00:03:30
pgbench_wrapper	PASS	00:03:31
smallfile_wrapper	PASS	00:06:58
sysbench	FAIL	00:01:17
uperf_wrapper	FAIL	00:07:25
ycsb_wrapper	PASS	00:08:13

acalhounRH · 2020-06-24T17:50:16Z

I don't have write access, someone will have to resolve the conflict with run_snafu.py. @aakarshg

bengland2 · 2020-06-24T19:40:22Z

@acalhounRH I'll give a quick try

bengland2

I want to see how well Grafana dashboards are going to work with this. I talked with Alex: we need to get to the point where we are looking at real queries from some benchmark and seeing how well Prometheus data integrates into Elasticsearch. But we have to start doing something to kick the habit of InfluxDB plus copying Prometheus data by hand, and this seems like a good start. I also want to see a better way to get a comprehensive, current list of Prometheus metrics; I don't understand how to do that yet. Some documentation of how this is done would be nice.

aakarshg · 2020-06-25T22:14:53Z

> I don't have write access, someone will have to resolve the conflict with run_snafu.py. @aakarshg

Hey, this is a rebase issue, so you'll need to pull down the PR and the master branch and do a rebase (fixing the merge conflicts).

rsevilla87 · 2020-08-12T09:40:42Z

/rerun all

rsevilla87 · 2020-08-12T09:43:25Z

Hey @acalhounRH, I added some commits to this PR required to solve the dependency issues you're hitting.

comet-perf-ci · 2020-08-12T11:59:45Z

Results for SNAFU CI Test

Test	Result	Runtime
src/fio_wrapper	PASS	00:13:50
src/fs_drift_wrapper	PASS	00:09:22
src/hammerdb	PASS	00:07:40
src/iperf	PASS	00:05:55
src/pgbench_wrapper	PASS	00:05:43
src/smallfile_wrapper	PASS	00:09:49
src/sysbench	PASS	00:02:56
src/uperf_wrapper	PASS	00:27:33
src/vegeta_wrapper	PASS	00:08:34
src/ycsb_wrapper	FAIL	00:12:11

Signed-off-by: Raul Sevilla <rsevilla@redhat.com>

comet-perf-ci · 2020-08-12T13:48:29Z

Results for SNAFU CI Test

Test	Result	Runtime
src/fio_wrapper	PASS	00:13:36
src/fs_drift_wrapper	PASS	00:09:13
src/hammerdb	PASS	00:07:32
src/iperf	PASS	00:06:06
src/pgbench_wrapper	PASS	00:05:58
src/smallfile_wrapper	PASS	00:09:44
src/sysbench	PASS	00:03:03
src/uperf_wrapper	PASS	00:28:44
src/vegeta_wrapper	PASS	00:08:57
src/ycsb_wrapper	PASS	00:08:22

bengland2 · 2020-08-12T15:02:50Z

Thx Raul!

rsevilla87 · 2020-08-12T15:17:01Z

Hey @acalhounRH, in addition to the linting issues, there are code issues as well:

./src/utils/get_prometheus_data.py:42:89: F821 undefined name 'disable_ssl'
            self.pc = PrometheusConnect(url=self.url, headers=self.headers, disable_ssl=disable_ssl)
./src/utils/get_prometheus_data.py:104:66: F821 undefined name 'ROC'
                                    "rate_of_change_per_second": ROC,
./src/utils/py_es_bulk.py:177:13: F821 undefined name 'logger'
            logger.warn(resp)

Can you please take a look?
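For illustration only, a sketch of how two of the F821 reports above could be resolved: define a module-level logger, and keep the SSL flag as an instance attribute instead of a bare name. `PrometheusClientConfig` and `connection_kwargs` are hypothetical stand-ins, not names from this PR; the sketch also uses `logger.warning`, since `logger.warn` is a deprecated alias.

```python
import logging

# Module-level logger, so calls such as logger.warning(resp) resolve (fixes F821 'logger').
logger = logging.getLogger(__name__)


class PrometheusClientConfig:
    """Hypothetical stand-in for the class that builds the Prometheus connection."""

    def __init__(self, url, headers=None, disable_ssl=True):
        self.url = url
        self.headers = headers
        # Store the flag on the instance and refer to it as self.disable_ssl,
        # instead of a bare `disable_ssl` name that is undefined in other methods.
        self.disable_ssl = disable_ssl

    def connection_kwargs(self):
        """Keyword arguments that would be handed to the client constructor."""
        return {"url": self.url, "headers": self.headers, "disable_ssl": self.disable_ssl}
```

The third report (`ROC`) would go away either by computing the value before use or by dropping the field.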

acalhounRH · 2020-08-12T15:36:59Z

Will start fixing those issues. Thanks for the help with resolving the dependency issues.

Set disable_ssl to true, remove rate of change reference, include initialization of logger

comet-perf-ci · 2020-08-12T18:25:02Z

Results for SNAFU CI Test

Test	Result	Runtime
src/fio_wrapper	FAIL	00:06:37
src/fs_drift_wrapper	FAIL	00:06:25
src/hammerdb	FAIL	00:13:33
src/iperf	PASS	00:06:11
src/pgbench_wrapper	PASS	00:05:43
src/smallfile_wrapper	FAIL	00:06:27
src/sysbench	PASS	00:03:06
src/uperf_wrapper	FAIL	00:08:09
src/vegeta_wrapper	FAIL	00:11:40
src/ycsb_wrapper	FAIL	00:12:57

Signed-off-by: Raul Sevilla <rsevilla@redhat.com>

comet-perf-ci · 2020-08-13T10:18:22Z

Results for SNAFU CI Test

Test	Result	Runtime
src/fio_wrapper	PASS	00:15:00
src/fs_drift_wrapper	PASS	00:09:38
src/hammerdb	PASS	00:08:30
src/iperf	PASS	00:06:26
src/pgbench_wrapper	PASS	00:06:33
src/smallfile_wrapper	PASS	00:10:04
src/sysbench	PASS	00:03:10
src/uperf_wrapper	PASS	00:29:19
src/vegeta_wrapper	PASS	00:13:54
src/ycsb_wrapper	PASS	00:10:25

Signed-off-by: Raul Sevilla <rsevilla@redhat.com>

comet-perf-ci · 2020-08-13T12:35:15Z

Results for SNAFU CI Test

Test	Result	Runtime
src/fio_wrapper	PASS	00:14:30
src/fs_drift_wrapper	PASS	00:09:23
src/hammerdb	PASS	00:07:27
src/iperf	PASS	00:07:22
src/pgbench_wrapper	PASS	00:08:20
src/smallfile_wrapper	FAIL	00:01:02
src/sysbench	PASS	00:04:31
src/uperf_wrapper	FAIL	00:01:03
src/vegeta_wrapper	FAIL	00:00:42
src/ycsb_wrapper	FAIL	00:00:33

Signed-off-by: Raul Sevilla <rsevilla@redhat.com>

comet-perf-ci · 2020-08-13T16:40:07Z

Results for SNAFU CI Test

Test	Result	Runtime
src/fio_wrapper	PASS	00:15:51
src/fs_drift_wrapper	PASS	00:12:43
src/hammerdb	PASS	00:09:48
src/iperf	PASS	00:06:05
src/pgbench_wrapper	PASS	00:08:23
src/smallfile_wrapper	FAIL	00:09:42
src/sysbench	PASS	00:04:28
src/uperf_wrapper	PASS	00:37:35
src/vegeta_wrapper	PASS	00:09:16
src/ycsb_wrapper	PASS	00:08:43

acalhounRH · 2020-08-14T03:03:22Z

Took a look at the smallfile CI test and it's not obvious what exactly the issue was. Could anyone take a look? Is this another dependency issue?

rsevilla87 · 2020-08-14T09:21:47Z

/rerun all

comet-perf-ci · 2020-08-14T11:09:52Z

Results for SNAFU CI Test

Test	Result	Runtime
src/fio_wrapper	PASS	00:14:22
src/fs_drift_wrapper	PASS	00:09:24
src/hammerdb	PASS	00:08:01
src/iperf	PASS	00:06:16
src/pgbench_wrapper	PASS	00:05:36
src/smallfile_wrapper	PASS	00:09:42
src/sysbench	PASS	00:03:29
src/uperf_wrapper	PASS	00:29:53
src/vegeta_wrapper	PASS	00:09:10
src/ycsb_wrapper	PASS	00:08:35

rsevilla87

LGTM!

jtaleric

LGTM

dry923

LGTM

aakarshg added the "ok to test" label (Kick off our CI framework) Apr 20, 2020

bengland2 self-assigned this Apr 22, 2020

bengland2 suggested changes Apr 22, 2020

aakarshg reviewed Apr 22, 2020

acalhounRH mentioned this pull request May 28, 2020

Use parallel ES indexing #185

Closed