Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

docs: Improve ORAS diagnose experience #1483

Open
wants to merge 12 commits into
base: main
Choose a base branch
from
258 changes: 258 additions & 0 deletions docs/proposals/diagnose-experience.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,258 @@
# Improve ORAS diagnose experience

ORAS currently offers two global options, `--verbose` and `--debug`, which enable users to generate verbose output and logs respectively. These features facilitate both users and developers in inspecting ORAS's performance, interactions with external services and internal systems, and in diagnosing issues by providing a clear picture of the tool’s operations.
FeynmanZhou marked this conversation as resolved.
Show resolved Hide resolved
FeynmanZhou marked this conversation as resolved.
Show resolved Hide resolved

Given the diverse roles and scenarios in which ORAS CLI is utilized, we have received feedback from users and developers to improve the diagnostic experience. Enhancing debug logs can significantly benefit ORAS users and developers by making diagnostics clearer and more unambiguous.

This proposal document aims to:

1. Identify the issues associated with the current implementation of the `--verbose` and `--debug` options.
2. Clarify the concepts of different types of output and logs for diagnostic purposes.
3. List the guiding principles to write comprehensive, clear, and conducive debug output and debug logs for effective diagnosis.
4. Propose solutions to improve the diagnostic experience for ORAS CLI users and developers.

## Problem Statement

Specifically, there are existing GitHub issues raised in the ORAS community.

- The user is confused about when to use `--verbose` and `--debug`. See the relevant issue [#1382](https://github.com/oras-project/oras/issues/1382).
- Poor readability of debug logs. No separator lines between request and response information. Users need to add separator lines manually for readability. See the relevant issue [#1382](https://github.com/oras-project/oras/issues/1382).
- Critical information is missing in debug logs. For example, the [error code](https://github.com/opencontainers/distribution-spec/blob/main/spec.md#error-codes) and metadata of the processed resource object (e.g. image manifest) are not displayed.
FeynmanZhou marked this conversation as resolved.
Show resolved Hide resolved
- The detailed operation information is missing in verbose output. For example, how many and where are objects processed. Less or no verbose output of ORAS commands in some use cases.
- Timestamp of each request and response is missing in debug logs, which is hard to trace historical operation and trace the sequence of events accurately.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It depends on the OS and environment.

You may observe the differences by running the following commands:

oras manifest fetch -d --descriptor ghcr.io/oras-project/oras:v1.2.0
oras manifest fetch -d --descriptor ghcr.io/oras-project/oras:v1.2.0 2>&1 | cat

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oras manifest fetch -d --descriptor ghcr.io/oras-project/oras:v1.2.0

gives

DEBU[0000] Request #0

while

oras manifest fetch -d --descriptor ghcr.io/oras-project/oras:v1.2.0 2>&1 | cat

reports

time=2024-09-14T15:17:52+08:00 level=debug msg=Request #0

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on this example, it looks like timestamp is not visible to users in debug logs by default. Users have to output the timestamp information with an additional command cat.

How about making timestamp information to each request and response by default?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It depends on logrus. @Wwwsylvia Could you provide options in logrus?

Copy link
Member

@Wwwsylvia Wwwsylvia Sep 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just tested a bit. We can enable TextFormatter.FullTimestamp to achieve the goal:

logger.SetFormatter(&logrus.TextFormatter{
		DisableQuote:  true,
+		FullTimestamp: true,
})

As a result, logs in TTY will show as:

DEBU[2024-09-23T18:56:49+08:00] Request #0

And logs in non-TTY will show as:

time=2024-09-23T18:59:56+08:00 level=debug msg=Request #0

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thus, it is technically feasible to make full timestamp information to each request and response by default.


## Concepts
FeynmanZhou marked this conversation as resolved.
Show resolved Hide resolved

At first, the output of ORAS flag `--verbose` and `--debug` should be clarified before restructuring them.

### Output

There are three types of output in ORAS CLI:
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think log is also one type of output. An "output” of a command refers to the information that is displayed or returned after the command is executed, it can be logs, formatted data, error messages.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Logs are logs, and logs may not be displayed or returned. They may be collected in systemd, or sent to fluentd.

FeynmanZhou marked this conversation as resolved.
Show resolved Hide resolved

- **Status output**: such as progress information, progress bar in pulling or pushing files
- **Metadata output**: showing what has been pulled (e.g. filename, digest, etc.) in specified format, such as JSON, text
- **Content output**: it is to output the raw data obtained from the remote registry server or file system, such as the pulled artifact content save as a file
FeynmanZhou marked this conversation as resolved.
Show resolved Hide resolved


The target users of these types of output are normal users. Currently, the output of ORAS `--verbose` flag belongs to status output since it contains detailed progress status. The output of `--verbose` only exists in oras `pull/push/attach` commands.
FeynmanZhou marked this conversation as resolved.
Show resolved Hide resolved
FeynmanZhou marked this conversation as resolved.
Show resolved Hide resolved

It is intended for end-users who want to observe the detailed file operation when using ORAS. It gives users a comprehensive view of what the tool is doing at every step and how long does it take when push or pull a file.

Since ORAS v1.2.0, progress bar is enabled in `pull/push/attach` by default, thus the ORAS output is already verbose in a terminal.

### Logs

Logs focus on providing technical details for in-depth diagnosing and troubleshooting issues. It is intended for developers or technical users who need to understand the inner workings of the tool. Debug logs are detailed and technical, often including HTTP request and response from interactions between client and server, as well as code-specific information. In general, there are different levels of log including `DEBUG`, `INFO`, `WARNING`, `ERROR`, but only `DEBUG` level log is used in ORAS, which is controlled by the flag `--debug`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some log implementation has the TRACE level, and the FATAL level.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed the description as follows:

In general, there are different levels of logs. Logrus has been used by ORAS, which has seven logging levels: Trace, Debug, Info, Warning, Error, Fatal and Panic, but only DEBUG level log is used in ORAS, which is controlled by the flag --debug.


- **Purpose**: Debug logs are specifically aim to facilitate ORAS developers to diagnose ORAS tool itself. They contain detailed technical information that is useful for troubleshooting problems.
- **Target users**: Primarily intended for developers or technical users who are trying to understand the inner workings of the code and identify the root cause of a possible issue with the tool itself.
- **Content**: Debug logs focus on providing context needed to troubleshoot issues, like variable values, execution paths, error stack traces, and internal states of the application.
- **Level of Detail**: Extremely detailed, providing insights into the application's internal workings and logic, often including low-level details that are essential for debugging.

## Proposals for ORAS CLI

- Deprecate the global flag `--verbose` and only remain `--debug` to avoid ambiguity. Based on the concept above, it is reasonable to continue using `--debug` to enable the output of `DEBUG` level logs in ORAS as it is in ORAS. Meanwhile, This change will make the diagnose experience much more straightforward and less breaking since only ORAS `pull/push/attach/discover` commands have verbose output.
- The existing output of `--verbose` in several ORAS commands `pull/push/attach/discover` can still be reserved. Introducing an additional flag to these commands.
FeynmanZhou marked this conversation as resolved.
Show resolved Hide resolved
- Add separator lines between each request and response for readability.
- Add timestamp of each request and response to the beginning of each request and response.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We probably just need to adjust the logrus implementation. /cc @Wwwsylvia for feasibility assessment.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes that is doable via the TextFormatter.FullTimestamp option.

- Add the response body including [error code](https://github.com/opencontainers/distribution-spec/blob/main/spec.md#error-codes) and the metadata of processed OCI object (e.g. image manifest) to the debug logs
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does it mean? Is it applicable to OCI image layout?

- Define guiding principles and conventions for writing clear and conductive debug logs
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the purpose / motivation of this?


### Guiding Principles

Here are the guiding principles to write debug logs.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would suggest we can just select a popular logging library or tool and follow the formats that are defined by the library or tool, because I don't think it is worth spending time to develop/customize our own logging formats, unless it is easy to customize using the library or tool.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree. Currently, we use logrus, which is used by docker daemon / CLI. There are also other log implementations like zap, apex, slog, etc.


### 1. **Timestamp Each Log Entry**
- **Precise Timing:** Ensure each log entry has a precise timestamp to trace the sequence of events accurately.
- Example: `DEBUG: [2023-10-01T12:00:00Z] Starting metadata retrieval for repository oras-demo`

### 2. **Capture API-Specific Details**
- **API Requests:** Log detailed information about API requests made to the registry server, including the HTTP method, endpoint, headers, and body (excluding sensitive information).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO, we should not exclude sensitive information for debug output for two reasons:

  • it is nearly impossible to redact all sensitive information
  • it may actually be useful to see the sensitive information in the logs

As some kind of compromise maybe attempt no redaction at the highest level of debug if we were to have levels. Ansible used this policy and I think it was very useful.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will it be insecure and non-compliant to log sensitive information in ORAS debug logs?

- Example: `DEBUG: [HTTPRequest] POST /v2/oras-demo/blobs/uploads/ Headers: {Content-Length: 524288}`

- **API Responses:** Log details about the API responses received, including status codes, headers, and response body (excluding sensitive information).
- Example: `DEBUG: [HTTPResponse] Status: 201 Created, Headers: {Location: /v2/oras-demo/blobs/uploads/uuid}, Body: {}`
FeynmanZhou marked this conversation as resolved.
Show resolved Hide resolved

### 3. **Log Before and After Critical Operations**
FeynmanZhou marked this conversation as resolved.
Show resolved Hide resolved
- **Operation Logs:** Log before performing critical operations and after completing them, including success or failure status.
- Example: `DEBUG: Starting upload of layer 2 of 3 for repository oras-demo`
- Example: `DEBUG: Successfully uploaded layer 2 of 3 for repository oras-demo`
FeynmanZhou marked this conversation as resolved.
Show resolved Hide resolved

- **State Logs:** Log important state information, function names, important variable state, file name and line number before and after key operations or decisions.
- Example: `DEBUG: Current retry attempt: 1, Max retries: 3`
FeynmanZhou marked this conversation as resolved.
Show resolved Hide resolved
Comment on lines +81 to +82
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This does not apply to implementation inside oras-go. In other words, no details in oras.Copy() will be output. The reason is simple that logging is tightly coupled with the specific logging implementation while oras-go should be generic.

Other than logurs, there are other logging implementations like log, log/slog, zap, apex/log, etc. Unfortunately, they don't share the same interface.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that all logs will be hooks in real implementations.


### 4. **Error and Exception Handling**
- **Catch and Log Exceptions:** Always catch exceptions and log them with relevant context and stack traces.
- Example: `ERROR: Exception occurred in fetchManifest: Network timeout while accessing /v2/oras-demo/manifests/latest`
FeynmanZhou marked this conversation as resolved.
Show resolved Hide resolved

- **Error Codes:** Include specific error codes to facilitate quick identification and resolution.
- Example: `ERROR: [ErrorCode: 504] Network timeout while accessing /v2/oras-demo/manifests/latest`
Comment on lines +85 to +89
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it log or error message? If it is error message, I think we've already covered that.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's debug log. Should error information of ORAS CLI also be included in the debug logs?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently, oras only use debug logs.


### 5. **Include Actionable Information**
- **Guidance:** Where possible, provide suggestions within logs about potential fixes or next steps for resolving issues.
- Example: `DEBUG: [Action] Check network connectivity and retry the operation. Error occurred while accessing /v2/oras-demo/manifests/latest`

- **Diagnostic Tips:** Include information that can assist in diagnosing issues like configuration settings, environment variables, or system states.
FeynmanZhou marked this conversation as resolved.
Show resolved Hide resolved
- Example: `DEBUG: Current registry URL: https://myregistry.io, Timeout setting:
FeynmanZhou marked this conversation as resolved.
Show resolved Hide resolved
Comment on lines +91 to +96
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Debug log is meant to be analyzed. There should not be any actionable information. The only action for users is to present the debug log to our oras developers.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Including actionable information in debug logs can be beneficial if it helps diagnose issues or improve the user's understanding of the tool’s behavior.

The only action for users is to present the debug log to our oras developers.

If the user is also a developer, will this design limits the diagnose efficiency in this situation?


### 6. **Avoid Logging Sensitive Information**
- **Privacy:** Abstain from logging sensitive information such as passwords, personal data, or security tokens.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Define "sensitive information".

- Example: `DEBUG: Attempting to authenticate user [UserID: usr123]` (without password details)

- **Compliance:** Ensure that logs adhere to relevant data protection and privacy regulations.
FeynmanZhou marked this conversation as resolved.
Show resolved Hide resolved
- Example: Anonymize or obfuscate sensitive customer data in logs.

## Investigation
FeynmanZhou marked this conversation as resolved.
Show resolved Hide resolved

To make sure the ORAS diagnose functions are natural and easier to use, it worth knowing how diagnose functions work in other popular client tools.

#### Curl

Curl only has a `--verbose` option to output verbose logs. No `--debug` option.

#### Docker and Podman

Docker provides two options `--debug` and `--log-level` to control debug logs output within different log levels, such as INFO, DEBUG, WARN, etc. No `--verbose` option. Docker has its own daemon service running in local so its logs might be much more complex.

#### Helm

Helm CLI tool provides a global flag `--debug` to enable verbose output.

#### Kubectl

Kubectl has a command `kubectl logs` to show logs of objects such as Pod and container. No separate flags for verbose output and debug logs.
FeynmanZhou marked this conversation as resolved.
Show resolved Hide resolved

## Examples in ORAS

This section lists the current behaviors of ORAS debug logs, proposes the suggested changes to ORAS CLI commands. More examples will be appended below.

### oras copy

Pick the first two requests and responses as examples:

```
oras copy ghcr.io/oras-project/oras:v1.2.0 --to-oci-layout oras-dev:v1.2.0 --debug
```

**Current debug log0-=s**
FeynmanZhou marked this conversation as resolved.
Show resolved Hide resolved

```
[DEBU0000] Request #0
> Request URL: "https://ghcr.io/v2/oras-project/oras/manifests/v1.2.0"
> Request method: "GET"
> Request headers:
"User-Agent": "oras/1.2.0+Homebrew"
"Accept": "application/vnd.docker.distribution.manifest.v2+json, application/vnd.docker.distribution.manifest.list.v2+json, application/vnd.oci.image.manifest.v1+json, application/vnd.oci.image.index.v1+json, application/vnd.oci.artifact.manifest.v1+json"
DEBU[0001] Response #0
< Response Status: "401 Unauthorized"
< Response headers:
"Content-Length": "73"
"X-Github-Request-Id": "9FC6:30019C:17C06:1C462:66AD0463"
"Content-Type": "application/json"
"Www-Authenticate": "Bearer realm=\"https://ghcr.io/token\",service=\"ghcr.io\",scope=\"repository:oras-project/oras:pull\""
"Date": "Fri, 02 Aug 2024 16:08:04 GMT"
DEBU[0001] Request #1
> Request URL: "https://ghcr.io/token?scope=repository%3Aoras-project%2Foras%3Apull&service=ghcr.io"
> Request method: "GET"
> Request headers:
"User-Agent": "oras/1.2.0+Homebrew"
DEBU[0002] Response #1
< Response Status: "200 OK"
< Response headers:
"Content-Type": "application/json"
"Docker-Distribution-Api-Version": "registry/2.0"
"Date": "Fri, 02 Aug 2024 16:08:05 GMT"
"Content-Length": "69"
"X-Github-Request-Id": "9FC6:30019C:17C0D:1C46C:66AD0464"
```

**Suggested debug logs:**

```
2024-08-02 23:56:02 > msg=Request #0
FeynmanZhou marked this conversation as resolved.
Show resolved Hide resolved
> Request URL: "https://ghcr.io/v2/oras-project/oras/manifests/v1.2.0"
> Request method: "GET"
> Request headers:
"User-Agent": "oras/1.2.0+Homebrew"
"Accept": "application/vnd.docker.distribution.manifest.v2+json, application/vnd.docker.distribution.manifest.list.v2+json, application/vnd.oci.image.manifest.v1+json, application/vnd.oci.image.index.v1+json, application/vnd.oci.artifact.manifest.v1+json"

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to insert separators like -----?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would recommend looking at other tools. There are so many nicer options -
image

Copy link
Member Author

@FeynmanZhou FeynmanZhou Sep 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I propose to add an empty line as separator between each request and response for readability since this is a very common design. Do you have better ideas on improving the debug logs' readability?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe an empty line between incoming and outgoing message


2024-08-02 23:56:03 < msg=Response #0
< Response Status: "401 Unauthorized"
< Response headers:
"Content-Length": "73"
"X-Github-Request-Id": "9FC6:30019C:17C06:1C462:66AD0463"
"Content-Type": "application/json"
"Www-Authenticate": "Bearer realm=\"https://ghcr.io/token\",service=\"ghcr.io\",scope=\"repository:oras-project/oras:pull\""
"Date": "Fri, 02 Aug 2024 23:56:03 GMT"
< Response body:
{
"errors": [
{
"code": "<UNAUTHORIZED>",
"message": "<message describing condition>",
"detail": "<unstructured>"
},
...
]
}

2024-08-02 23:56:02 > msg=Request #1
> Request URL: "https://ghcr.io/token?scope=repository%3Aoras-project%2Foras%3Apull&service=ghcr.io"
> Request method: "GET"
> Request headers:
"User-Agent": "oras/1.2.0+Homebrew"



2024-08-02 23:56:03 < msg=Response #1
< Response Status: "200 OK"
< Response headers:
"Content-Type": "application/json"
"Docker-Distribution-Api-Version": "registry/2.0"
"Date": "Fri, 02 Aug 2024 16:08:05 GMT"
"Content-Length": "69"
"X-Github-Request-Id": "9FC6:30019C:17C0D:1C46C:66AD0464"
< Response body:
{
"schemaVersion": 2,
"mediaType": "application/vnd.oci.image.index.v1+json",
"manifests": [
{
"mediaType": "application/vnd.oci.image.manifest.v1+json",
"digest": "sha256:42c524c48e0672568dbd2842d3a0cb34a415347145ee9fe1c8abaf65e7455b46",
"size": 1239,
"platform": {
"architecture": "amd64",
"os": "linux"
}
},
···
}


```

### Show user's environment details

Output the user's running environment details of ORAS such as operating system and architecture information would be helpful to help the ORAS developers locate the issue and reproduce easier.

For example, the operating system and architecture are supposed to be outputted in `oras version`:

```bash
$ oras version

ORAS Version: 1.2.0+Homebrew
Go version: go1.22.3
OS/Arch: Linux AMD64
FeynmanZhou marked this conversation as resolved.
Show resolved Hide resolved
FeynmanZhou marked this conversation as resolved.
Show resolved Hide resolved
```

## Q & A

**Q1:** Is it a common practice to use an environment variable like export ORAS_DEBUG=1 as a global switch for debug logs? What are the Pros and Cons of using this design?

**A:** Per our discussion in the ORAS community meeting, ORAS maintainers agreed to not introduce an additional environment variable as a global switch to enable debug logs since --debug is intuitive enough.

**Q2**: For the diagnose flag options, why deprecate `--verbose` and remain `--debug` as it is?

**A**: The major reason is that this change avoids overloading the flag `--verbose` and reduce ambiguity in ORAS diagnose experience. Moreover, the `--debug` is consistent with other popular container client tools, such as Helm and Docker. Deprecation of `--verbose` is less breaking than changing behaviors of `--debug`.
Loading