Add diagnostics callback for tlscommon configs and httptransport #207
Add diagnostics callbacks for TLS configuration to make it easier to include in diagnostics bundles. Diagnostics info will generally verify that the certs or CAs are able to load, and display some basic info such as cert names, expiry times, fingerprints, etc. The httptransport callback will attempt an HTTP request and return information collected from httptrace about the request.
Example of the hook functions and outputs added to a (draft) fleet-server PR: elastic/fleet-server#3587
I would like feedback, specifically if we need more information to help troubleshoot issues.
diagCertificate(logger, &c.Certificate)
diagCAs(logger, c.CAs)

return b.Bytes()
From the fleet-server demo, output is:
tlscommon.ServerConfig: Start diagnostics 2024-05-29 00:18:41.569890423 +0000 UTC
tlscommon.ServerConfig: verification_mode=full
tlscommon.ServerConfig: client_auth=none
tlscommon.ServerConfig: ca_sha256=[]
tlscommon.ServerConfig: CertificateSettings: checking certificate keypair
tlscommon.ServerConfig: CertificateSettings: certificate keypair OK.
tlscommon.ServerConfig: CertificateSettings: cert 0 - Subject: CN=fleet-server-dev,O=elastic-fleet
Issuer: CN=localhost,O=elastic-fleet
NotBefore: 2024-05-29 00:18:15 +0000 UTC
NotAfter: 2034-05-29 00:18:15 +0000 UTC
Fingerprint: EvAr8erNJZJ0BoXor1jnCl5X2WCU79ISMgwMITOEJBs=
SAN IP: []
SAN DNS: [fleet-server-dev]
tlscommon.ServerConfig: CertificateAuthorities: certificate_authorities provided.
tlscommon.ServerConfig: CertificateAuthorities: - cert 0 Subject: CN=localhost,O=elastic-fleet
IsCa: true
BasicConstraintsValid: true
NotBefore: 2024-05-29 00:18:15 +0000 UTC
NotAfter: 2034-05-29 00:18:15 +0000 UTC
Fingerprint: cSawoNEa2cSPV4Mwce4mRysrcBGtHBGcSeoBL+Cr48Q=
We want to make sure that certificate and CA information is provided in the diagnostics.
Slight update after I added CertDiagString:
tlscommon.ServerConfig: Start diagnostics 2024-05-29 22:10:07.333735518 +0000 UTC
tlscommon.ServerConfig: verification_mode=full
tlscommon.ServerConfig: client_auth=none
tlscommon.ServerConfig: ca_sha256=[]
tlscommon.ServerConfig: CertificateSettings: checking certificate keypair
tlscommon.ServerConfig: CertificateSettings: certificate keypair OK.
tlscommon.ServerConfig: CertificateSettings: cert 0 Subject=CN=fleet-server-dev,O=elastic-fleet
Issuer=CN=localhost,O=elastic-fleet
IsCA=false
BasicConstrintsValid=false
NotBefore=2024-05-29 22:08:32 +0000 UTC
NotAfter=2034-05-29 22:08:32 +0000 UTC
Fingerprint=w04CVUAd7Tfnd/AZ/A8YB9bCUAImGeGXBAUPnhEvGgM=
SAN IP=[]
SAN DNS=[fleet-server-dev]
SAN URI=[]
tlscommon.ServerConfig: CertificateAuthorities: certificate_authorities provided.
tlscommon.ServerConfig: CertificateAuthorities: - cert 0 Subject=CN=localhost,O=elastic-fleet
Issuer=CN=localhost,O=elastic-fleet
IsCA=true
BasicConstrintsValid=true
NotBefore=2024-05-29 22:08:23 +0000 UTC
NotAfter=2034-05-29 22:08:23 +0000 UTC
Fingerprint=QHVJrVpzEHBJTTr6F9RW6TnYbj9ckPD2BQ5JphTzh7M=
SAN IP=[]
SAN DNS=[localhost]
SAN URI=[]
This may or may not be helpful, so feel free to disregard. I'd like to see some failure scenarios, paired with the new diagnostics showing how they give the info needed to correctly diagnose the problem. I'm thinking of things like:
Add some error parsing to HTTPSettings.DiagRequests to attempt to give a human readable statement about the cause of any errors when gathering request diagnostics.
// isGoHTTPResp detects if the response is one that a go http.Server sends if an HTTP request is made to an HTTPS server.
// non Go servers may return a net.OpError instead.
func isGoHTTPResp(r *http.Response) bool {
This behaviour is part of Go servers, and also occurs for our current cloud deployments.
// diagError tries to diagnose the error and return a cause/possible cause in a human readable format.
// If no matching errors are found err.Error is returned.
func diagError(err error) string {
@leehinman, I've tried to add this to explain errors based on the scenarios you provided (with test cases for most of them). We can add more in the future as well.
Co-authored-by: Tiago Queiroz <me@tiago.life>
@lucabelluccini is there anything I should add/change to help with your troubleshooting efforts?
LGTM. Thanks for adding the test cases.
Hello, thanks for the heads up on this one and thank you for those efforts into troubleshooting TLS.
Example about (1). So we have:
We will get 3 TLS diags?
💚 Build Succeeded
Lovely - Thank you @michel-laterman & team ❤️
Add custom hooks to use diag hooks added in elastic/elastic-agent-libs#207 to provide additional files that contain information about the TLS certs used by the server's API, TLS information used when connecting to Elasticsearch, and a full trace to each specified Elasticsearch host.
What does this PR do?
Add diagnostics callbacks for TLS configuration to make it easier to
include in diagnostics bundles. Diagnostics info will generally verify
that the certs or CAs are able to load, and display some basic info such
as cert names, expiry times, fingerprints, etc. The httptransport callback
will attempt an HTTP request and return information collected from
httptrace about the request.
Why is it important?
Allows more information about TLS to be present in diagnostics to help troubleshoot any certificate issues.
Checklist