Kestrel: Segfault when specifying default certificate on Ubuntu 20.04 #81964

Closed
1 task done
VMelnalksnis opened this issue Jan 29, 2023 · 27 comments · Fixed by #82116

@VMelnalksnis

Is there an existing issue for this?

  • I have searched the existing issues

Describe the bug

After upgrading an ASP.NET Core project from .NET 6 to .NET 7, the application fails to start with a segmentation fault when specifying a default certificate for Kestrel.

Expected Behavior

The application works with a certificate on .NET 7 the same as it did on .NET 6.

Steps To Reproduce

  1. Start the application without specifying a certificate
  2. Add the following in appsettings.json:
{
  "Kestrel": {
    "Certificates": {
      "Default": {
        "Path": "/path/to/cert.p12",
        "Password": "password"
      }
    }
  }
}
  3. After the configuration has been reloaded, see Main process exited, code=killed, status=11/SEGV

Alternatively, specify the certificate from the start; the application then crashes during startup.

Exceptions (if any)

I've seen two types of segfaults in the hypervisor syslog. The first appears after updating appsettings.json:

kernel: .NET ThreadPool[3421307]: segfault at 0 ip 00007f32507860be sp 00007f3226bb58c0 error 4 in libcrypto.so.1.1[7f3250674000+19b000]
kernel: Code: 00 00 4c 89 f1 4c 89 fa e8 bf 24 07 00 85 c0 0f 84 bf 00 00 00 8b 54 24 1c 49 8d 7c 24 10 4c 89 fe e8 56 2b f0 ff 85 c0 74 52 <48> 63 75 00 48 8b 7d 08 45 31 c9 49 89 d8 4c 89 f1 4c 89 fa e8 49

The second appears when starting the application with a default certificate already configured:

kernel: Gnomeshade.WebA[3397350]: segfault at 0 ip 00007f705b6200be sp 00007fff3b39f640 error 4 in libcrypto.so.1.1[7f705b50e000+19b000]
kernel: Code: 00 00 4c 89 f1 4c 89 fa e8 bf 24 07 00 85 c0 0f 84 bf 00 00 00 8b 54 24 1c 49 8d 7c 24 10 4c 89 fe e8 56 2b f0 ff 85 c0 74 52 <48> 63 75 00 48 8b 7d 08 45 31 c9 49 89 d8 4c 89 f1 4c 89 fa e8 49

.NET Version

7.0.102

Anything else?

Running on Ubuntu 20.04.5 LTS LXC container on Proxmox 7.3-4.
libssl version:

~$ sudo apt show libssl1.1
Package: libssl1.1
Version: 1.1.1f-1ubuntu2.16
@amcasey
Member

amcasey commented Feb 3, 2023

@VMelnalksnis Can you tell us more about your project? I followed your steps with a simple React app (dotnet new react) on Ubuntu 20.04.5 (in WSL2, since I don't have Proxmox set up) and I was able to connect to the page and see my specified cert.

Have you tried your steps with a toy project (i.e. not your real one) to see whether it's somehow related to your project (vs proxmox, ubuntu, libssl, etc)?

@VMelnalksnis
Author

@amcasey I tried to run dotnet new web with the same configuration on the same container, and it also failed with a segmentation fault:

kernel: TestApplication[1956636]: segfault at 0 ip 00007f7f211e00be sp 00007fff92d9b940 error 4 in libcrypto.so.1.1[7f7f210ce000+19b000]
kernel: Code: 00 00 4c 89 f1 4c 89 fa e8 bf 24 07 00 85 c0 0f 84 bf 00 00 00 8b 54 24 1c 49 8d 7c 24 10 4c 89 fe e8 56 2b f0 ff 85 c0 74 52 <48> 63 75 00 48 8b 7d 08 45 31 c9 49 89 d8 4c 89 f1 4c 89 fa e8 49

After changing the target framework to net6.0 it worked. I also tried to publish with --no-self-contained and install dotnet-sdk-7.0 on the container, and it also failed with a segmentation fault.

The project is this one https://github.com/VMelnalksnis/Gnomeshade/tree/master/source/Gnomeshade.WebApi. The only things changed for Kestrel were content root, web root and cipher suites policy for TLS, but I also tried without them.

@amcasey
Member

amcasey commented Feb 6, 2023

@VMelnalksnis Thanks! It's very helpful to know that it repros in a new project. I'll play around with dotnet new web to see if I can trigger the crash.

@amcasey
Member

amcasey commented Feb 7, 2023

@VMelnalksnis Sorry, I'm still not seeing it. Is the appsettings JSON in the description just part of your file or the entire file?

If you can repro the issue consistently, can you collect and share a dump?

@ghost

ghost commented Feb 8, 2023

Hi @VMelnalksnis. We have added the "Needs: Author Feedback" label to this issue, which indicates that we have an open question for you before we can take further action. This issue will be closed automatically in 7 days if we do not hear back from you by then - please feel free to re-open it if you come back to this issue after that time.

@VMelnalksnis
Author

@amcasey No, that was only the part that caused the crash. I have other sections for other libraries as well, such as ElasticApm and ConnectionStrings, and I can post those as well if they're relevant. Here's the full Kestrel section:

"Kestrel": {
    "EndpointDefaults": {
        "Protocols": "Http1AndHttp2"
    },
    "Endpoints": {
        "Https": {
            "Url": "https://some.fully.qualified.name.com:443",
            "CheckCertificateRevocation": true,
            "ClientCertificateMode": "Optional",
            "SslProtocols": ["Tls13"]
        }
    },
    "Certificates": {
        "Default": {
            "Path": "/path/to/host/certificate.p12",
            "Password": "password"
        }
    }
}

I also tried changing each value separately: CheckCertificateRevocation, SslProtocols to Tls12, and the HTTPS port to 8443/443.

As for the dump, I'm not entirely sure where to look for it. The common answers for Ubuntu were not correct, I'm guessing because it's an LXC and/or on Proxmox; I'll keep looking.

@amcasey
Member

amcasey commented Feb 8, 2023

@VMelnalksnis Thanks! Since you have a repro with the dotnet new web template, do you have the option of testing it out on an Ubuntu machine/VM outside proxmox?

@amcasey
Member

amcasey commented Feb 9, 2023

We have some docs here about how to collect dotnet dumps, but this is a native crash, so I'm not sure they'll work in this case. I'll see if I can figure out how to collect a core dump on Ubuntu 20.04.

@amcasey
Member

amcasey commented Feb 9, 2023

Well, that was harder than it needed to be...

I managed to cobble together these commands to get core files to appear in /var/crash.

ulimit -c unlimited
sudo sysctl -w kernel.core_pattern=/var/crash/core.%e.%p.%h.%t

I think you can get back to normal with

ulimit -c 0
systemctl restart apport

Mostly from here. No warranty, express or implied. 😛

@VMelnalksnis
Author

@amcasey Thanks for the info - when running non-self-contained, setting DOTNET_DbgEnableMiniDump and ulimit -c unlimited was enough, and the dump path was controlled by DOTNET_DbgMiniDumpName.
Here's the dump - coredump.zip
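
For anyone reproducing this, a minimal sketch of that setup follows. The two variable names come from the comment above; the value 1, the /tmp/coredump path and the commented-out application name are assumptions for illustration only.

```shell
# Ask the .NET runtime to write a dump when the process crashes.
export DOTNET_DbgEnableMiniDump=1
# Hypothetical output path; any location writable by the process should do.
export DOTNET_DbgMiniDumpName=/tmp/coredump
# Lift the core file size limit for this shell and its children.
ulimit -c unlimited || echo "could not raise the core size limit" >&2

# Hypothetical application name; it inherits the settings above.
# dotnet ./TestApplication.dll
```

The variables only affect processes started from the same shell (or a service unit that sets them), so they need to be in place before the application is launched.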

@amcasey
Member

amcasey commented Feb 10, 2023

I'm seeing

libcoreclr!sigsegv_handler+0x260
libpthread_2_31+0x14420
libcrypto_so_1+0x18a0be
libSystem_Security_Cryptography_Native_OpenSsl!MakeCertId+0x1e
libSystem_Security_Cryptography_Native_OpenSsl!BuildOcspRequest+0x1e
libSystem_Security_Cryptography_Native_OpenSsl!CryptoNative_X509BuildOcspRequest+0x37
Interop+Crypto.<X509BuildOcspRequest>g____PInvoke|66_0(IntPtr, IntPtr)+0x4e
System_Net_Security_7f663ec70000!Interop+Crypto.X509BuildOcspRequest(IntPtr, IntPtr)+0x4f
System_Net_Security_7f663ec70000!System.Net.Security.SslStreamCertificateContext+<FetchOcspAsync>d__27.MoveNext()+0xdc
System_Private_CoreLib_7f663bad0000!System.Runtime.CompilerServices.AsyncMethodBuilderCore.Start[[Microsoft.AspNetCore.Server.Kestrel.Core.KestrelServerImpl+<StartAsync>d__30`1[[System.__Canon, System.Private.CoreLib]], Microsoft.AspNetCore.Server.Kestrel.Core]](<StartAsync>d__30`1<System.__Canon> ByRef)+0xac4d
System_Private_CoreLib_7f663bad0000!System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1[[System.__Canon, System.Private.CoreLib]].Start[[System.Net.Security.SslStreamCertificateContext+<FetchOcspAsync>d__27, System.Net.Security]](<FetchOcspAsync>d__27 ByRef)+0x20
System_Net_Security_7f663ec70000!System.Net.Security.SslStreamCertificateContext.FetchOcspAsync()+0x69
System_Net_Security_7f663ec70000!System.Net.Security.SslStreamCertificateContext.DownloadOcspAsync()+0x2ea
System_Net_Security_7f663ec70000!System.Net.Security.SslStreamCertificateContext.AddRootCertificate(System.Security.Cryptography.X509Certificates.X509Certificate2)+0x59
System_Net_Security_7f663ec70000!System.Net.Security.SslStreamCertificateContext.Create(System.Security.Cryptography.X509Certificates.X509Certificate2, System.Security.Cryptography.X509Certificates.X509Certificate2Collection, Boolean, System.Net.Security.SslCertificateTrust, Boolean)+0x438
Microsoft_AspNetCore_Server_Kestrel_Core_7f663dd50000!Microsoft.AspNetCore.Server.Kestrel.Https.Internal.HttpsConnectionMiddleware..ctor(Microsoft.AspNetCore.Connections.ConnectionDelegate, Microsoft.AspNetCore.Server.Kestrel.Https.HttpsConnectionAdapterOptions, Microsoft.Extensions.Logging.ILoggerFactory)+0x1c0

@amcasey
Member

amcasey commented Feb 10, 2023

Based on the stack, I'm guessing dotnet/runtime will want to take a look. Probably @bartonjs?

@ghost ghost added the untriaged New issue has not been triaged by the area owner label Feb 10, 2023
@amcasey amcasey transferred this issue from dotnet/aspnetcore Feb 10, 2023
@ghost

ghost commented Feb 10, 2023

Tagging subscribers to this area: @dotnet/area-system-security, @vcsjones
See info in area-owners.md if you want to be subscribed.

Author: VMelnalksnis
Assignees: amcasey
Labels: area-System.Security
Milestone: -

@amcasey amcasey removed their assignment Feb 10, 2023
@vcsjones
Member

This is new in .NET 7 because .NET 7 has server-side OCSP stapling, and this error stack indicates that it's trying to fetch an OCSP response from the CA for stapling purposes.

Is it possible for you to attach your libcrypto.so.1.1 that reproduced the issue (It should probably be in /usr/lib/x86_64-linux-gnu)?

Even better would be any steps to reproduce this. If you can include the public certificate from the p12 file, that would also be very helpful.

You can get the public certificate from the PKCS12 file with a command like:

openssl pkcs12 -nokeys -in /path/to/cert.p12

It should output the certificate as a -----BEGIN CERTIFICATE----- ... -----END CERTIFICATE----- block.
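
To see how many certificates the file bundles, the output of that command can be piped through a small counter. This is only a sketch: count_pem_certs is a hypothetical helper, the path and password are the placeholders from the issue description, and the count covers only what is stored in the file, which may differ from the chain the runtime eventually builds.

```shell
# Count PEM certificate blocks arriving on standard input.
count_pem_certs() {
    grep -c -- '-----BEGIN CERTIFICATE-----'
}

# Hypothetical usage with the placeholder path and password from the issue:
# openssl pkcs12 -nokeys -in /path/to/cert.p12 -passin pass:password | count_pem_certs
```

The -passin option avoids the interactive password prompt, which is convenient when the check is scripted.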

@VMelnalksnis
Author

Here's the library and the certificate: data.zip

@vcsjones
Member

vcsjones commented Feb 12, 2023

Okay, so digging in here a little bit:

  18a0bc:	74 52                	je     18a110 <OCSP_cert_id_new@@OPENSSL_1_1_0+0x110>
  18a0be:	48 63 75 00          	movslq 0x0(%rbp),%rsi
  18a0c2:	48 8b 7d 08          	mov    0x8(%rbp),%rdi
  18a0c6:	45 31 c9             	xor    %r9d,%r9d
  18a0c9:	49 89 d8             	mov    %rbx,%r8
  18a0cc:	4c 89 f1             	mov    %r14,%rcx
  18a0cf:	4c 89 fa             	mov    %r15,%rdx
  18a0d2:	e8 49 2d fd ff       	callq  15ce20 <EVP_Digest@@OPENSSL_1_1_0>

We're basically failing (approximately) in OpenSSL here:

https://github.com/openssl/openssl/blob/36eadf1f84daa965041cce410b4ff32cbda4ef08/crypto/ocsp/ocsp_lib.c#L72-L74

I say approximately because I wasn't able to find the exact sources for your distro's OpenSSL libcrypto, but was able to follow the offsets thanks to the provided binary.
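
As a rough illustration of following the offsets, the kernel log lines in the issue description can be tied to the disassembly with plain shell arithmetic. The addresses below are copied from the first kernel log line and from the libcrypto frame in the stack trace; offset_in_mapping is a hypothetical helper, and reading the 0x78000 difference as the file offset of the executable mapping is an assumption that something like readelf -lW on the library would be needed to confirm.

```shell
# Offset of the faulting instruction within the mapping named in the kernel log.
# Arguments: instruction pointer (ip) and mapping start, both as 0x-prefixed hex.
offset_in_mapping() {
    printf '0x%x\n' $(( $1 - $2 ))
}

# ip and mapping start from the ".NET ThreadPool" kernel log line.
offset_in_mapping 0x7f32507860be 0x7f3250674000   # prints 0x1120be

# The stack trace reports libcrypto_so_1+0x18a0be; the gap between the two:
printf '0x%x\n' $(( 0x18a0be - 0x1120be ))        # prints 0x78000
```

All three kernel log lines in this thread give the same 0x1120be offset, which is consistent with the crash always hitting the same instruction.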

segfault at 0

So we're null dereferencing somewhere in if (!EVP_Digest(issuerKey->data, issuerKey->length, md, &i, dgst, NULL))

md, i are stack variables, so they aren't null. dgst can't be null because we know that's just EVP_sha1. But can issuerKey be null? Seems like it can happen.

It all starts here in managed code:

IntPtr subject = Certificate.Handle;
IntPtr issuer = caCert.Handle;
using (SafeOcspRequestHandle ocspRequest = Interop.Crypto.X509BuildOcspRequest(subject, issuer))

That eventually makes its way to OCSP_cert_to_id. It does some up-front work to get issuer names and serial numbers, but there is this:

ikey = X509_get0_pubkey_bitstr(issuer);
return OCSP_cert_id_new(dgst, iname, ikey, serial);

If issuer is NULL, then X509_get0_pubkey_bitstr also returns NULL, and that NULL is passed straight in to OCSP_cert_id_new.

Going back to the managed side:

if (_ocspUrls is null && _ca is not null)

We check if _ca is not null... but we don't actually check to see if it is a valid handle. Perhaps _ca = new X509Certificate() or has otherwise been disposed, somehow.

From the native stack, it looks like the issuer certificate has a NULL handle, but I'm less certain how we get in to that state on the managed side. Maybe we disposed the intermediate that was chosen for _ca here:

I'm going to keep digging in to this, but thought I would post what I have so far in case someone (@bartonjs) has a psychic debugging moment.

@vcsjones
Member

vcsjones commented Feb 12, 2023

As some supporting research, if I do something like this:

FILE *fp = fopen("/Users/vcsjones/Downloads/data/certificate.crt", "r");
assert(fp);

X509 *cert = PEM_read_X509(fp, NULL, NULL, NULL);
assert(cert);

OCSP_cert_to_id(NULL, cert, NULL); // Pass in NULL issuer

Then it fails at the exact same offset. So NULL is getting set for the X509* issuer somehow.

@vcsjones
Member

I can repro this now.

@vcsjones
Member

Steps to reproduce:

  1. Check out https://github.com/vcsjones/dotnet-runtime-81964
  2. docker build
  3. docker run

@vcsjones vcsjones added the bug label Feb 13, 2023
@vcsjones
Member

Yep. The issuer is NULL.

issuerKey=0x0000000000000000

* thread #1, name = 'app', stop reason = signal SIGSEGV: invalid address (fault address: 0x0)
  * frame #0: 0x0000ffbed5687150 libcrypto.so.1.1`OCSP_cert_id_new(dgst=0x0000ffbed579fa78, issuerName=0x0000aaaaaae2ac60, issuerKey=0x0000000000000000, serialNumber=0x0000aaaaaadfa008) at ocsp_lib.c:73:10
    frame #1: 0x0000ffbed588d100 libSystem.Security.Cryptography.Native.OpenSsl.so`CryptoNative_X509BuildOcspRequest + 80

@vcsjones vcsjones self-assigned this Feb 13, 2023
@bartonjs
Member

I synced up with @vcsjones offline earlier, and it sounds like he has a strong grip on the problem and the solution.

@vcsjones vcsjones removed the untriaged New issue has not been triaged by the area owner label Feb 14, 2023
@vcsjones vcsjones added this to the 8.0.0 milestone Feb 14, 2023
@ghost ghost added the in-pr There is an active PR which will close this issue when it is merged label Feb 18, 2023
@vcsjones
Member

@VMelnalksnis

I just realized I never gave a proper write-up of the issue here, only obtuse debugging notes. Apologies.

The issue you are running in to is due to a new feature in .NET 7 called OCSP stapling. Unfortunately there is an issue with the implementation when handling an X.509 chain that has exactly two certificates in it, an end-entity and a root, which your certificate chain appears to have.

This has been fixed for .NET 8, and we are in the process of attempting to get it serviced in to a future .NET 7 release. If or when that happens I can't say for certain, but I'll give an update here when a servicing decision has been made.

In the meantime, you have a few options.

  1. Stay on .NET 6 until a .NET version with the fix is available (either .NET 8, or a serviced version of .NET 7)

  2. Disable revocation checking for the server certificate. As far as I can tell, Kestrel's JSON configuration doesn't expose the setting we need; however, configuring Kestrel in code allows us to do so:

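    using System.Net.Security;
    using System.Security.Cryptography.X509Certificates;
    using Microsoft.AspNetCore.Server.Kestrel.Core;

    var builder = WebApplication.CreateBuilder(args);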
    
    builder.WebHost.ConfigureKestrel(serverOptions => {
        serverOptions.ListenAnyIP(5000, listenOptions => {
            static ValueTask<SslServerAuthenticationOptions> HttpsCallback(
                SslStream stream,
                SslClientHelloInfo clientHelloInfo,
                object? state,
                CancellationToken cancellationToken)
            {
                SslServerAuthenticationOptions options = new()
                {
                    // NOTE: hardcoded password for certificate is for illustration purposes only.
                    ServerCertificate = new X509Certificate2("/app/leaf.p12", "potato"),
                    CertificateRevocationCheckMode = X509RevocationMode.NoCheck,
                };
                return new ValueTask<SslServerAuthenticationOptions>(options);
            }
            listenOptions.UseHttps(HttpsCallback, state: null!);
            listenOptions.Protocols = HttpProtocols.Http1AndHttp2AndHttp3;
        });
    });

    The important bit there is setting CertificateRevocationCheckMode to NoCheck on the SslServerAuthenticationOptions. Essentially what you are doing here is disabling OCSP stapling, along with revocation checking of your server certificate.

  3. Change your CA to issue off of an intermediate X.509 certificate so there are at least three certificates in your chain.

@VMelnalksnis
Author

Thanks for the detailed write-up and workaround. I guess I should finally fix my setup and create an offline root CA.

@Qowy

Qowy commented Mar 7, 2023

Thank you for this workaround.
I just ran into the exact same issue on a Raspberry Pi with a .pfx cert without an intermediate CA.

Could this maybe be documented somewhere more prominently (if it won't be fixed in a .NET 7 release)?
I only found this issue by googling for "OCSP_cert_id_new" after running my app through GDB.

@vcsjones
Member

vcsjones commented Mar 8, 2023

@Qowy based on the servicing pull request, it looks like it will be merged in for the 7.0.5 servicing release.

@ghost ghost removed the in-pr There is an active PR which will close this issue when it is merged label Mar 9, 2023
@vcsjones
Member

vcsjones commented Mar 9, 2023

The fix has been merged for the 7.0.5 release.

@vcsjones vcsjones closed this as completed Mar 9, 2023
@ghost ghost locked as resolved and limited conversation to collaborators Apr 8, 2023
@karelz
Member

karelz commented May 27, 2023

Fixed in main (8.0) in PR #82116 and in 7.0.5 in PR #82277.

@karelz karelz modified the milestones: 8.0.0, 7.0.x May 27, 2023