Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add wal write system call metrics observation (main) #17618

Merged
merged 1 commit into from
Mar 31, 2024

Conversation

Akiqqqqqqq
Copy link

@Akiqqqqqqq Akiqqqqqqq commented Mar 20, 2024

xref. #17615

add new metrics: wal_write_duration_seconds

wal_write_duration_seconds represent total delay of:

syscall.write(padding_info + record_data_with_padding)

  • delay of fsync is not included in wal_write_duration_seconds
  • data: the raft log to write to the WAL on disk at one single raftNode loop

@k8s-ci-robot
Copy link

Hi @Akiqqqqqqq. Thanks for your PR.

I'm waiting for a etcd-io member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@jmhbnz
Copy link
Member

jmhbnz commented Mar 20, 2024

/ok-to-test

@Akiqqqqqqq
Copy link
Author

/retest

Copy link
Member

@jmhbnz jmhbnz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM - Thanks @Akiqqqqqqq

@Akiqqqqqqq
Copy link
Author

@fuweid master, can help approve merge this PR?

@ahrtr
Copy link
Member

ahrtr commented Mar 24, 2024

Also update the metrics in writeUint64, just as what we already do for walWriteBytes?

nv, err := w.Write(buf)
walWriteBytes.Add(float64(nv))

@Akiqqqqqqq
Copy link
Author

Akiqqqqqqq commented Mar 25, 2024

Also update the metrics in writeUint64, just as what we already do for walWriteBytes?

nv, err := w.Write(buf)
walWriteBytes.Add(float64(nv))

will do

Copy link
Member

@ahrtr ahrtr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

Thanks

@ahrtr
Copy link
Member

ahrtr commented Mar 25, 2024

cc @serathius

serathius
serathius previously approved these changes Mar 25, 2024
@serathius serathius dismissed their stale review March 25, 2024 14:27

Want to ask question

nv, err := w.Write(buf)
walWriteSec.Observe(time.Since(start).Seconds())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should there be two separate calls to Observe? I think those are part of a single write. I think it would be better to have single call with time measured before writeUint64.

	start := time.Now()
	if err = writeUint64(e.bw, lenField, e.uint64buf); err != nil {
		return err
	}
	if padBytes != 0 {
		data = append(data, make([]byte, padBytes)...)
	}
	n, err = e.bw.Write(data)
	walWriteSec.Observe(time.Since(start).Seconds())

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see you followed the same flow as for walWriteBytes, however the difference is that number of calls to cumulative .Add doesn't matter while .Observe on histogram does.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Observing a write of writeUint64 as separate call would skew the histogram, as it's expected to write much smaller than size of data, then your 50%ile would be just writeUint64.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, that make sense, will fix later

Copy link
Member

@serathius serathius left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please fix measurement of write so there is only 1 call to .Observe.

Signed-off-by: Qiuyu Wu <qiuyu.wu@shopee.com>
Copy link
Member

@fuweid fuweid left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@ahrtr ahrtr merged commit cc32fa8 into etcd-io:main Mar 31, 2024
39 checks passed
@ahrtr
Copy link
Member

ahrtr commented Mar 31, 2024

Please also add a changelog for 3.6 if you don't backport the fix to 3.5; otherwise add a changelog for 3.5 after backporting.

@Akiqqqqqqq
Copy link
Author

Please also add a changelog for 3.6 if you don't backport the fix to 3.5; otherwise add a changelog for 3.5 after backporting.

will do later

@Akiqqqqqqq
Copy link
Author

Akiqqqqqqq commented Apr 7, 2024

I backported to 3.5 and raised a changelog: #17728
plz check

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

6 participants