This repository has been archived by the owner on Mar 28, 2020. It is now read-only.

dep: update etcd 3.2.16, kubernetes 1.10.0, k8s codegen 1.10 #1727

Merged
merged 3 commits into from
Mar 12, 2018

Conversation

hongchaodeng
Member

@hongchaodeng hongchaodeng commented Dec 5, 2017

No description provided.

@hongchaodeng
Member Author

hongchaodeng commented Dec 5, 2017

@etcd-bot retest this please

Jenkins config issue.

Gopkg.lock Outdated
-revision = "b709581f82a77c0ff00790d1446c05719fed714d"
-version = "v1.10.9"
+revision = "107df09c5f137b9dfe53b7a4c25dd4d79f81390f"
+version = "v1.12.40"

v1.12.41 is out

Gopkg.lock Outdated
revision = "fa29b1d70f0beaddd4c7021607cc3c3be8ce94b8"
packages = ["auth/authpb","clientv3","etcdserver/api/v3rpc/rpctypes","etcdserver/etcdserverpb","mvcc/mvccpb","pkg/tlsutil","pkg/transport"]
revision = "694728c496e22dfa5719c78ff23cc982e15bcb2f"
version = "v3.2.10"

v3.2.11 is out

Gopkg.toml Outdated
@@ -16,11 +16,11 @@

[[constraint]]
name = "github.com/coreos/etcd"
-version = "3.1.9"
+version = "v3.2.10"

v3.2.11

Gopkg.toml Outdated

[[constraint]]
name = "github.com/aws/aws-sdk-go"
-version = "1.10.9"
+version = "v1.12.40"

v1.12.41

@hongchaodeng hongchaodeng changed the title dep: update etcd 3.2.10 and aws-sdk-go 1.12.40 dep: update etcd 3.2.12 and aws-sdk-go 1.12.54 Jan 2, 2018
@hongchaodeng hongchaodeng changed the title dep: update etcd 3.2.12 and aws-sdk-go 1.12.54 dep: update etcd 3.2.16, kubernetes 1.10.0-beta.1, aws-sdk-go 1.13.8, k8s codegen 1.10 Mar 5, 2018
@hongchaodeng
Member Author

Two changes here:

  • Update the deps: etcd 3.2 and k8s 1.10 have aligned dependencies, so it is easier to bump them together.
  • In k8s 1.10 the generated code has changed, so I need to build a new codegen container, gcr.io/coreos-k8s-scale-testing/codegen:1.10, and point to it here.

@hongchaodeng
Member Author

There is a TLS issue.
etcd server log:

WARNING: 2018/03/05 23:20:11 Failed to dial 0.0.0.0:2379: connection error: desc = "transport: authentication handshake failed: remote error: tls: bad certificate"; please retry.

Might be relevant: etcd-io/etcd#8603

@hongchaodeng
Member Author

hongchaodeng commented Mar 6, 2018

Tried with etcd 3.3.1, which gives more and better error messages:

2018-03-06 00:09:53.475001 I | embed: rejected connection from "127.0.0.1:44500" (error "tls: failed to verify client's certificate: x509: certificate specifies an incompatible key usage", ServerName "")
WARNING: 2018/03/06 00:09:53 Failed to dial 0.0.0.0:2379: connection error: desc = "transport: authentication handshake failed: remote error: tls: bad certificate"; please retry.
2018-03-06 00:09:53.730840 I | embed: rejected connection from "10.28.2.119:55094" (error "remote error: tls: internal error", ServerName "example-d9hvr87bh4.example.default.svc")
2018-03-06 00:10:06.742990 I | embed: rejected connection from "10.28.2.119:55120" (error "remote error: tls: internal error", ServerName "example-d9hvr87bh4.example.default.svc")

The operator failed to create the etcd client:

creating etcd client failed: context deadline exceeded

@hongchaodeng hongchaodeng changed the title dep: update etcd 3.2.16, kubernetes 1.10.0-beta.1, aws-sdk-go 1.13.8, k8s codegen 1.10 dep: update etcd 3.2.16, kubernetes 1.10.0-beta.1, k8s codegen 1.10 Mar 6, 2018
@hongchaodeng
Member Author

hongchaodeng commented Mar 6, 2018

Update:
It seems this was a dep mistake: dep 0.4 changed the semantics of version constraints: https://golang.github.io/dep/docs/Gopkg.toml.html#version

If so, we also need to bump the test infra to dep 0.4. Done.

@hongchaodeng
Member Author

I have pinned the grpc version in Gopkg.toml:

[[override]]
  name = "google.golang.org/grpc"
  version = "=v1.7.5"

Otherwise it somehow gets bumped to "1.10.0".

But I'm still hitting the TLS issue:

2018-03-06 05:26:12.671477 I | embed: rejected connection from "127.0.0.1:52016" (error "tls: failed to verify client's certificate: x509: certificate specifies an incompatible key usage", ServerName "")
WARNING: 2018/03/06 05:26:12 Failed to dial 0.0.0.0:2379: connection error: desc = "transport: authentication handshake failed: remote error: tls: bad certificate"; please retry.
2018-03-06 05:26:15.048265 I | embed: rejected connection from "10.28.4.252:33988" (error "remote error: tls: internal error", ServerName "example-mx55hzcd7h.example.default.svc")

@hongchaodeng
Member Author

hongchaodeng commented Mar 6, 2018

We dug into the TLS handshake process.

Here's some background: the server requests a certificate from the client and verifies the key usage of the cert:
https://github.com/golang/go/blob/6732fcc06df713fc737cee5c5860bad87599bc6d/src/crypto/tls/handshake_server.go#L731

We found that after bumping the client version as in this PR, cert.ExtKeyUsage is different.
Previously it was ClientAuth; now it is ServerAuth. The key usage to verify against is ClientAuth, which is why Verify() failed:
https://github.com/golang/go/blob/6732fcc06df713fc737cee5c5860bad87599bc6d/src/crypto/x509/verify.go#L340

and printed:

failed to verify client's certificate: x509: certificate specifies an incompatible key usage

@hongchaodeng
Member Author

After more digging: the incompatible key usage and tls: bad certificate errors do not stop etcd from serving client requests.

When the operator tries to create an etcd client and fails, etcd logs tls: internal error:

2018-03-07 23:26:29.598194 I | embed: rejected connection from "10.28.2.219:55624" (error "remote error: tls: internal error", ServerName "example-q7ncrlm48d.example.default.svc")

@hongchaodeng
Member Author

OK. After I directly injected the certs into the etcd operator's volumes and created the TLSConfig without reading secrets, it works. I guess this is the direction to go. I will debug the secret-reading path.

@hongchaodeng
Member Author

I have made it work.

The issue was that we previously deleted the certs directory after creating the TLSConfig. But it seems the etcd 3.2.16 client package needs to read those files again later. After removing the code that deletes the certs dir, it works.

@hongchaodeng
Member Author

hongchaodeng commented Mar 8, 2018

More details on the code:

TLSConfig has a lazy func GetClientCertificate():

https://github.com/coreos/etcd/blob/a537163e9e67a145035c3f6611788a6746044dfc/pkg/transport/listener.go#L180-L181

GetClientCertificate() doc:
https://github.com/golang/go/blob/c8aec4095e089ff6ac50d18e97c3f46561f14f48/src/crypto/tls/common.go#L372-L385

It is called when a server requests a certificate from a client.

When it is called, it loads the cert files from disk:

tlsutil.NewCert(info.CertFile, info.KeyFile, info.parseFunc)

Thus, we can't delete the cert files.

@hongchaodeng hongchaodeng changed the title dep: update etcd 3.2.16, kubernetes 1.10.0-beta.1, k8s codegen 1.10 dep: update etcd 3.2.16, kubernetes 1.10.0, k8s codegen 1.10 Mar 8, 2018
@hasbro17
Contributor

@hongchaodeng SGTM. Do we need to update the default etcd base image to 3.2.16 as well?

@hongchaodeng
Member Author

Do we need to update the default etcd base image to 3.2.16 as well?

That would be ideal.
Let's update the client first, then the server.

@hongchaodeng
Member Author

@hasbro17
Can you merge this?


3 participants