Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

docs: Improve ORAS diagnose experience #1483

Open
wants to merge 12 commits into
base: main
Choose a base branch
from

Conversation

FeynmanZhou
Copy link
Member

@FeynmanZhou FeynmanZhou commented Aug 19, 2024

What this PR does / why we need it:

This proposal document aims to:

  1. Identify the issues associated with the current implementation of the --verbose and --debug options.
  2. Clarify the concepts of verbose output and debug logs.
  3. List the guiding principles to write comprehensive, clear, and conducive verbose output and debug logs for effective diagnosis.
  4. Propose solutions to improve the diagnostic experience for ORAS CLI users and developers.
  5. Add examples for proposed changes (more examples will be appended).

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #1091

Signed-off-by: Feynman Zhou <feynmanzhou@microsoft.com>
Copy link

codecov bot commented Aug 19, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 86.05%. Comparing base (60b1369) to head (12bfc72).
Report is 18 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1483      +/-   ##
==========================================
+ Coverage   85.54%   86.05%   +0.51%     
==========================================
  Files         113      118       +5     
  Lines        3936     4224     +288     
==========================================
+ Hits         3367     3635     +268     
- Misses        342      352      +10     
- Partials      227      237      +10     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Signed-off-by: Feynman Zhou <feynmanzhou@microsoft.com>
Signed-off-by: Feynman Zhou <feynmanzhou@microsoft.com>
Copy link

@toddysm toddysm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMHO, if a user decides to get verbose output, he or she will be prepared to receive a lot of information in the output (including very technical information - for example, curl gives details about the encryption algorithm). I am still not convinced that we need to complicate the interface to suppress some of the output.

docs/proposals/diagnose-experience.md Show resolved Hide resolved
docs/proposals/diagnose-experience.md Outdated Show resolved Hide resolved
docs/proposals/diagnose-experience.md Outdated Show resolved Hide resolved
docs/proposals/diagnose-experience.md Outdated Show resolved Hide resolved
docs/proposals/diagnose-experience.md Outdated Show resolved Hide resolved
docs/proposals/diagnose-experience.md Outdated Show resolved Hide resolved
> Request headers:
"User-Agent": "oras/1.2.0+Homebrew"
"Accept": "application/vnd.docker.distribution.manifest.v2+json, application/vnd.docker.distribution.manifest.list.v2+json, application/vnd.oci.image.manifest.v1+json, application/vnd.oci.image.index.v1+json, application/vnd.oci.artifact.manifest.v1+json"

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to insert separators like -----?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would recommend looking at other tools. There are so many nicer options -
image

Copy link
Member Author

@FeynmanZhou FeynmanZhou Sep 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I propose to add an empty line as separator between each request and response for readability since this is a very common design. Do you have better ideas on improving the debug logs' readability?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe an empty line between incoming and outgoing message

docs/proposals/diagnose-experience.md Show resolved Hide resolved
docs/proposals/diagnose-experience.md Outdated Show resolved Hide resolved
- Example: `DEBUG: [2023-10-01T12:00:00Z] Starting metadata retrieval for repository oras-demo`

### 2. **Capture API-Specific Details**
- **API Requests:** Log detailed information about API requests made to the registry server, including the HTTP method, endpoint, headers, and body (excluding sensitive information).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO, we should not exclude sensitive information for debug output for two reasons:

  • it is nearly impossible to redact all sensitive information
  • it may actually be useful to see the sensitive information in the logs

As some kind of compromise maybe attempt no redaction at the highest level of debug if we were to have levels. Ansible used this policy and I think it was very useful.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will it be insecure and non-compliant to log sensitive information in ORAS debug logs?

docs/proposals/diagnose-experience.md Show resolved Hide resolved
docs/proposals/diagnose-experience.md Outdated Show resolved Hide resolved
docs/proposals/diagnose-experience.md Outdated Show resolved Hide resolved
docs/proposals/diagnose-experience.md Outdated Show resolved Hide resolved
Signed-off-by: Feynman Zhou <feynmanzhou@microsoft.com>
Signed-off-by: Feynman Zhou <feynmanzhou@microsoft.com>
Signed-off-by: Feynman Zhou <feynmanzhou@microsoft.com>
Signed-off-by: Feynman Zhou <feynmanzhou@microsoft.com>
Signed-off-by: Feynman Zhou <feynmanzhou@microsoft.com>
Signed-off-by: Feynman Zhou <feynmanzhou@microsoft.com>
docs/proposals/diagnose-experience.md Outdated Show resolved Hide resolved
docs/proposals/diagnose-experience.md Outdated Show resolved Hide resolved
- Poor readability of debug logs. No separator lines between request and response information. Users need to add separator lines manually for readability. See the relevant issue [#1382](https://github.com/oras-project/oras/issues/1382).
- Critical information is missing in debug logs. For example, the [error code](https://github.com/opencontainers/distribution-spec/blob/main/spec.md#error-codes) and metadata of the processed resource object (e.g. image manifest) are not displayed.
- The detailed operation information is missing in verbose output. For example, how many and where are objects processed. Less or no verbose output of ORAS commands in some use cases.
- Timestamp of each request and response is missing in debug logs, which is hard to trace historical operation and trace the sequence of events accurately.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It depends on the OS and environment.

You may observe the differences by running the following commands:

oras manifest fetch -d --descriptor ghcr.io/oras-project/oras:v1.2.0
oras manifest fetch -d --descriptor ghcr.io/oras-project/oras:v1.2.0 2>&1 | cat

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oras manifest fetch -d --descriptor ghcr.io/oras-project/oras:v1.2.0

gives

DEBU[0000] Request #0

while

oras manifest fetch -d --descriptor ghcr.io/oras-project/oras:v1.2.0 2>&1 | cat

reports

time=2024-09-14T15:17:52+08:00 level=debug msg=Request #0

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on this example, it looks like timestamp is not visible to users in debug logs by default. Users have to output the timestamp information with an additional command cat.

How about making timestamp information to each request and response by default?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It depends on logrus. @Wwwsylvia Could you provide options in logrus?

Copy link
Member

@Wwwsylvia Wwwsylvia Sep 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just tested a bit. We can enable TextFormatter.FullTimestamp to achieve the goal:

logger.SetFormatter(&logrus.TextFormatter{
		DisableQuote:  true,
+		FullTimestamp: true,
})

As a result, logs in TTY will show as:

DEBU[2024-09-23T18:56:49+08:00] Request #0

And logs in non-TTY will show as:

time=2024-09-23T18:59:56+08:00 level=debug msg=Request #0

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thus, it is technically feasible to make full timestamp information to each request and response by default.

docs/proposals/diagnose-experience.md Show resolved Hide resolved
Comment on lines +81 to +82
- **State Logs:** Log important state information, function names, important variable state, file name and line number before and after key operations or decisions.
- Example: `DEBUG: Current retry attempt: 1, Max retries: 3`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This does not apply to implementation inside oras-go. In other words, no details in oras.Copy() will be output. The reason is simple that logging is tightly coupled with the specific logging implementation while oras-go should be generic.

Other than logurs, there are other logging implementations like log, log/slog, zap, apex/log, etc. Unfortunately, they don't share the same interface.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that all logs will be hooks in real implementations.

docs/proposals/diagnose-experience.md Outdated Show resolved Hide resolved
- Example: `DEBUG: Current registry URL: https://myregistry.io, Timeout setting:

### 6. **Avoid Logging Sensitive Information**
- **Privacy:** Abstain from logging sensitive information such as passwords, personal data, or security tokens.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Define "sensitive information".

docs/proposals/diagnose-experience.md Outdated Show resolved Hide resolved
docs/proposals/diagnose-experience.md Outdated Show resolved Hide resolved
docs/proposals/diagnose-experience.md Outdated Show resolved Hide resolved
docs/proposals/diagnose-experience.md Show resolved Hide resolved
docs/proposals/diagnose-experience.md Outdated Show resolved Hide resolved
docs/proposals/diagnose-experience.md Outdated Show resolved Hide resolved
Signed-off-by: Feynman Zhou <feynmanzhou@microsoft.com>
Signed-off-by: Feynman Zhou <feynmanzhou@microsoft.com>
docs/proposals/diagnose-experience.md Outdated Show resolved Hide resolved
@@ -52,7 +52,7 @@ Logs focus on providing technical details for in-depth diagnosing and troublesho
## Proposals for ORAS CLI

- Deprecate the global flag `--verbose` and only remain `--debug` to avoid ambiguity. Based on the concept above, it is reasonable to continue using `--debug` to enable the output of `DEBUG` level logs in ORAS as it is in ORAS. Meanwhile, This change will make the diagnose experience much more straightforward and less breaking since only ORAS `pull/push/attach/discover` commands have verbose output.
- The existing output of `--verbose` in several ORAS commands `pull/push/attach/discover` can still be reserved. Introducing an additional flag to these commands.
- Introduce a new flag `--detail` to replace the existing global flag `--verbose` of commands like `pull`, `push`, `attach`, and `discover` for detailed output.
Copy link
Member

@Wwwsylvia Wwwsylvia Sep 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to discuss, how about --long?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think --detail is a good option, since it will cause similar confusion as --verbose. Questions like is it about detailed logs or about detailed other outputs?

One option is output the --verbose content by default, then we do not need any flags for verbose output that are not logs. This also solves the problem that some commands have --verbose flags, some don't.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will --debug be a global flag, or just available for some commands?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of --detail, I think we can have --quiet for pull, push, and attach.

discover is different since the --verbose takes effect on the metadata output instead of the status output. In that case, we can just remove --verbose and introduce new values for --format such as --format tree-full. It is also a good time to revisit the outputs of oras discover.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree with @shizhMSFT . Current verbose output in pull, push, attach is progress information but in discover it is metadata output. They are different cases.


### Output

There are three types of output in ORAS CLI:
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think log is also one type of output. An "output” of a command refers to the information that is displayed or returned after the command is executed, it can be logs, formatted data, error messages.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Logs are logs, and logs may not be displayed or returned. They may be collected in systemd, or sent to fluentd.

> Request headers:
"User-Agent": "oras/1.2.0+Homebrew"
"Accept": "application/vnd.docker.distribution.manifest.v2+json, application/vnd.docker.distribution.manifest.list.v2+json, application/vnd.oci.image.manifest.v1+json, application/vnd.oci.image.index.v1+json, application/vnd.oci.artifact.manifest.v1+json"

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe an empty line between incoming and outgoing message

@@ -52,7 +52,7 @@ Logs focus on providing technical details for in-depth diagnosing and troublesho
## Proposals for ORAS CLI

- Deprecate the global flag `--verbose` and only remain `--debug` to avoid ambiguity. Based on the concept above, it is reasonable to continue using `--debug` to enable the output of `DEBUG` level logs in ORAS as it is in ORAS. Meanwhile, This change will make the diagnose experience much more straightforward and less breaking since only ORAS `pull/push/attach/discover` commands have verbose output.
- The existing output of `--verbose` in several ORAS commands `pull/push/attach/discover` can still be reserved. Introducing an additional flag to these commands.
- Introduce a new flag `--detail` to replace the existing global flag `--verbose` of commands like `pull`, `push`, `attach`, and `discover` for detailed output.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think --detail is a good option, since it will cause similar confusion as --verbose. Questions like is it about detailed logs or about detailed other outputs?

One option is output the --verbose content by default, then we do not need any flags for verbose output that are not logs. This also solves the problem that some commands have --verbose flags, some don't.


### Guiding Principles

Here are the guiding principles to write debug logs.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would suggest we can just select a popular logging library or tool and follow the formats that are defined by the library or tool, because I don't think it is worth spending time to develop/customize our own logging formats, unless it is easy to customize using the library or tool.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree. Currently, we use logrus, which is used by docker daemon / CLI. There are also other log implementations like zap, apex, slog, etc.

docs/proposals/diagnose-experience.md Outdated Show resolved Hide resolved
@@ -52,7 +52,7 @@ Logs focus on providing technical details for in-depth diagnosing and troublesho
## Proposals for ORAS CLI

- Deprecate the global flag `--verbose` and only remain `--debug` to avoid ambiguity. Based on the concept above, it is reasonable to continue using `--debug` to enable the output of `DEBUG` level logs in ORAS as it is in ORAS. Meanwhile, This change will make the diagnose experience much more straightforward and less breaking since only ORAS `pull/push/attach/discover` commands have verbose output.
- The existing output of `--verbose` in several ORAS commands `pull/push/attach/discover` can still be reserved. Introducing an additional flag to these commands.
- Introduce a new flag `--detail` to replace the existing global flag `--verbose` of commands like `pull`, `push`, `attach`, and `discover` for detailed output.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will --debug be a global flag, or just available for some commands?

docs/proposals/diagnose-experience.md Outdated Show resolved Hide resolved
docs/proposals/diagnose-experience.md Outdated Show resolved Hide resolved
docs/proposals/diagnose-experience.md Outdated Show resolved Hide resolved

### Output

There are three types of output in ORAS CLI:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Logs are logs, and logs may not be displayed or returned. They may be collected in systemd, or sent to fluentd.


### Logs

Logs focus on providing technical details for in-depth diagnosing and troubleshooting issues. It is intended for developers or technical users who need to understand the inner workings of the tool. Debug logs are detailed and technical, often including HTTP request and response from interactions between client and server, as well as code-specific information. In general, there are different levels of log including `DEBUG`, `INFO`, `WARNING`, `ERROR`, but only `DEBUG` level log is used in ORAS, which is controlled by the flag `--debug`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some log implementation has the TRACE level, and the FATAL level.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed the description as follows:

In general, there are different levels of logs. Logrus has been used by ORAS, which has seven logging levels: Trace, Debug, Info, Warning, Error, Fatal and Panic, but only DEBUG level log is used in ORAS, which is controlled by the flag --debug.

- Introduce a new flag `--detail` to replace the existing global flag `--verbose` of commands like `pull`, `push`, `attach`, and `discover` for detailed output.
- Add separator lines between each request and response for readability.
- Add timestamp of each request and response to the beginning of each request and response.
- Add the response body including [error code](https://github.com/opencontainers/distribution-spec/blob/main/spec.md#error-codes) and the metadata of processed OCI object (e.g. image manifest) to the debug logs
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does it mean? Is it applicable to OCI image layout?

- Add separator lines between each request and response for readability.
- Add timestamp of each request and response to the beginning of each request and response.
- Add the response body including [error code](https://github.com/opencontainers/distribution-spec/blob/main/spec.md#error-codes) and the metadata of processed OCI object (e.g. image manifest) to the debug logs
- Define guiding principles and conventions for writing clear and conductive debug logs
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the purpose / motivation of this?


### Guiding Principles

Here are the guiding principles to write debug logs.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree. Currently, we use logrus, which is used by docker daemon / CLI. There are also other log implementations like zap, apex, slog, etc.

Comment on lines +85 to +89
- **Catch and Log Exceptions:** Always catch exceptions and log them with relevant context and stack traces.
- Example: `ERROR: Exception occurred in fetchManifest: Network timeout while accessing /v2/oras-demo/manifests/latest`

- **Error Codes:** Include specific error codes to facilitate quick identification and resolution.
- Example: `ERROR: [ErrorCode: 504] Network timeout while accessing /v2/oras-demo/manifests/latest`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently, oras only use debug logs.

Comment on lines +238 to +240
ORAS Version: 1.2.0+Homebrew
Go version: go1.22.3
OS/Arch: linux/amd64
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing git information

Signed-off-by: Feynman Zhou <feynmanzhou@microsoft.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Improve verbose logging for oras commands
7 participants