WAL analysis tool improvements #9295

Closed
wenjiaswe opened this issue Feb 7, 2018 · 8 comments

@wenjiaswe
Contributor

wenjiaswe commented Feb 7, 2018

We want to improve the WAL log analysis tool so that it serves as a better signal/input into workload analysis tooling.

Here is the current WAL log analysis tool available for etcd: etcd-dump-logs

@jpbetz @cheftako @mml

@xiang90
Contributor

xiang90 commented Feb 7, 2018

> performance bottle neck.

Read-path operations won't be recorded in the WAL files. For a read-heavy workload, the WAL might not help in finding the performance bottleneck.

Do you know what exactly you would like to improve over today's tool you listed above?

@jpbetz
Contributor

jpbetz commented Feb 7, 2018

Yep. The WAL log only contains the linearized range reads, no watches or any of the other non-linearized reads.

We might rephrase the purpose of these improvements as: "improve WAL log analysis tool to serve as a better signal/input into workload analysis tooling". We don't have any workload analysis tooling yet, but we're looking to do some ad hoc data crunching to better understand the shapes of etcd workloads for a variety of real-world Kubernetes clusters, and were thinking of using the WAL log as one of our inputs.

Totally agree that we need more detail on what specifically we intend to improve.

@xiang90
Contributor

xiang90 commented Feb 7, 2018

@jpbetz

We already moved l-read (linearizable read) out of the raft serialized execution path to allow concurrent reads (and more parallelism if you have multiple cores available). As a result, l-reads are no longer in the raft log.

@jpbetz
Contributor

jpbetz commented Feb 8, 2018

@xiang90 Thanks. I'll try to catch up on the concurrent read improvements today; I'm a bit behind.

Does/should etcd have some sort of access logging, maybe disabled by default, that could be used as a more comprehensive recording of a workload?

@xiang90
Contributor

xiang90 commented Feb 8, 2018

@jpbetz

There is an issue for audit logging #5019. We probably can extend it to be a workload recorder.

@wenjiaswe
Contributor Author

#9628 added an entry-type flag to the current etcd-dump-logs, so users can set the flag to list only the entry types they are interested in. I would consider it just a small part of the fix for this issue. It does NOT address the audit logging in #5019 yet. I will keep working on WAL log analysis as well as etcd workload analysis. Should I create several measurable issues instead of referring to this issue for all related fixes? @jpbetz @gyuho @xiang90
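
For illustration only, the kind of entry-type filtering described here can be sketched in Go roughly as follows. This is not the code from #9628: the `Entry` struct, the type labels, and the `filterByType` helper are hypothetical stand-ins and are not etcd or etcd-dump-logs APIs.

```go
package main

import "fmt"

// Entry is a hypothetical, simplified stand-in for a decoded WAL entry.
type Entry struct {
	Index uint64
	Type  string // illustrative label, e.g. "Normal" or "ConfigChange"
}

// filterByType returns the entries whose Type appears in wanted,
// preserving their original order. An empty wanted list means
// "no filter", so all entries are returned unchanged.
func filterByType(entries []Entry, wanted []string) []Entry {
	if len(wanted) == 0 {
		return entries
	}
	set := make(map[string]bool, len(wanted))
	for _, w := range wanted {
		set[w] = true
	}
	var out []Entry
	for _, e := range entries {
		if set[e.Type] {
			out = append(out, e)
		}
	}
	return out
}

func main() {
	// Made-up sample data; a real tool would decode these from WAL files.
	entries := []Entry{
		{Index: 1, Type: "ConfigChange"},
		{Index: 2, Type: "Normal"},
		{Index: 3, Type: "Normal"},
	}
	for _, e := range filterByType(entries, []string{"Normal"}) {
		fmt.Printf("%d\t%s\n", e.Index, e.Type)
	}
}
```

Keeping the filter as a plain function over already-decoded entries means the same step could be reused by any later workload analysis, independent of how the entries were read.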

@gyuho
Contributor

gyuho commented Apr 25, 2018

Let's close this when #9628 gets merged, since it already adds filtering. And can you create new issues that you want to work on?

@wenjiaswe
Contributor Author

@gyuho sure, here it is: #9631
