This repository has been archived by the owner on Feb 5, 2020. It is now read-only.

Tectonic console broken on vSphere #3080

Closed
bodgit opened this issue Mar 7, 2018 · 10 comments

@bodgit

bodgit commented Mar 7, 2018

Is this a BUG REPORT or FEATURE REQUEST?

BUG REPORT

Versions

Tectonic version (release or commit hash):

1.8.7-tectonic.2

Terraform version (terraform version):

0.10.8

Platform (aws|azure|openstack|metal|vmware):

vmware

What happened?

The install completes cleanly (after working around #3051); however, the Tectonic console is inaccessible.

What you expected to happen?

Tectonic console should be accessible

How to reproduce it (as minimally and precisely as possible)?

Follow the instructions here: https://github.com/coreos/tectonic-docs/blob/master/Documentation/install/vmware/vmware-terraform.md

PR #2911 changed the HTTPS ingress port from 443 to 32000; however, the Tectonic console pods still try to use 443 to access the identity service, so they never come up.

I think the intention was that a load balancer should be used somewhere to balance port 443 as a service across port 32000 on the worker nodes, and that the DNS for the ingress domain should point at the load balancer. However, that isn't mentioned in the documentation.

Reverting #2911 made the console work again, matching the documentation.

Anything else we need to know?

My original Google Groups topic is here: https://groups.google.com/d/topic/coreos-user/bsmWjYqdOCs/discussion

This has all of the details of my setup.


@squat
Contributor

squat commented Mar 7, 2018

@lander2k2 as I wrote in #3016, PR #2911 clearly introduced this bug. It changes the ingress controller strategy to nodePort for vmware; however, there is no load balancer to PNAT the requests from 443 to 32000, so the console is never able to reach identity at https://:443/identity.

Should we revert the change?
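
A quick way to check which situation a cluster is in is to probe both ports on the ingress domain. The following is only a minimal sketch: the hostname `tectonic.example.com` is a placeholder, and the helper names `port_open` and `diagnose` are made up for illustration, not part of Tectonic or its installer.

```python
import socket


def port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS resolution failures.
        return False


def diagnose(https_open: bool, nodeport_open: bool) -> str:
    """Map the two probe results to a short hint (illustrative wording only)."""
    if https_open:
        return "443 reachable: hostPort ingress or a load balancer is answering"
    if nodeport_open:
        return "only 32000 reachable: nodePort ingress with nothing forwarding 443"
    return "neither port reachable: check DNS and the worker nodes"


def check_ingress(host: str) -> str:
    """Probe 443 and 32000 on the ingress domain and summarise the result."""
    return diagnose(port_open(host, 443), port_open(host, 32000))


# Example (placeholder hostname, replace with your ingress domain):
# print(check_ingress("tectonic.example.com"))
```

If only 32000 answers, that matches the behaviour described above: the ingress is exposed as a nodePort and nothing forwards 443 to it.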

@MikaSoinetsalo

@squat - yes, I can confirm this. I'm using the latest versions of both Tectonic (tectonic_1.8.7-tectonic.2) and CoreOS (v1632.3.0), along with the "builtin" Terraform (v0.10.7). The platform is vSphere 6.5. I had the same issue as the others (everything else was working as expected, but the console and Prometheus were in CrashLoop), and the solution was to change NodePort to HostPort in the VMware platform configuration file.

@squat
Contributor

squat commented Mar 14, 2018

@MikaNikulin thanks for the input. This bug is verified both analytically and practically. I wanted to give @lander2k2 a chance to chime in before reverting the change.

@lander2k2
Contributor

@squat Sorry I didn't chime in earlier. If we revert this change, it will break the installer for some enterprise users.

@bodgit Your assumption that a load balancer should be used is correct. Using hostPort on worker nodes in production is not a good idea [1].

I would suggest we update the documentation rather than revert the change.

[1] https://kubernetes.io/docs/concepts/configuration/overview/#services

@lander2k2
Contributor

Opened a PR on docs repo for this: coreos/tectonic-docs#150

@bodgit
Author

bodgit commented Mar 15, 2018

@lander2k2 I agree with your point about production; however, it becomes a lot harder to kick the tyres and test it if I also need to set up a load balancer (which doesn't currently exist for me). Having the configuration tweaks necessary to run without a load balancer mentioned in the documentation is acceptable, though.

@galingit

Hi guys, I'm experiencing an issue similar to the one described above, but for me the error in the console container is this:

2018/03/16 14:10:55 http: Provider config sync still failing, retrying in 16s: missing required field subject_types_supported

Any clues?

@squat
Contributor

squat commented Mar 16, 2018

@bodgit if that Terraform option is acceptable given the documented ideal flow, then let's close this. I just merged @lander2k2's PR with the updated documentation.

squat closed this as completed Mar 16, 2018
@squat
Contributor

squat commented Mar 16, 2018

@galingit this means that the console thinks it is able to contact the identity server, but that the response is malformed in some way. You will need to ensure that when you curl https:///identity/.well-known/openid-configuration you get a valid OIDC JSON config. Please open a new issue so we can track it.
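
For anyone hitting the same `missing required field subject_types_supported` error, the check can be scripted. This is a rough sketch, not something from the Tectonic codebase: the function names and the base URL in the usage comment are hypothetical. The field list follows the metadata that OpenID Connect Discovery 1.0 marks as REQUIRED (`token_endpoint` is left out because the spec only requires it when flows other than Implicit are used).

```python
import json
import urllib.request

# Provider metadata fields marked REQUIRED by OpenID Connect Discovery 1.0.
REQUIRED_FIELDS = (
    "issuer",
    "authorization_endpoint",
    "jwks_uri",
    "response_types_supported",
    "subject_types_supported",
    "id_token_signing_alg_values_supported",
)


def missing_fields(config: dict) -> list:
    """Return the required discovery fields that are absent from config."""
    return [field for field in REQUIRED_FIELDS if field not in config]


def fetch_discovery(base_url: str, timeout: float = 5.0) -> dict:
    """Fetch and parse <base_url>/.well-known/openid-configuration.

    Raises an exception if the request fails or the body is not JSON, which
    is itself a sign that something other than the identity service answered.
    """
    url = base_url.rstrip("/") + "/.well-known/openid-configuration"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Example (hypothetical base URL, replace with your ingress domain):
# config = fetch_discovery("https://tectonic.example.com/identity")
# print(missing_fields(config) or "all required fields present")
```

Note that `urllib` verifies TLS certificates by default, so a cluster using a self-signed certificate will fail the fetch before any field check happens.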

@MikaSoinetsalo

MikaSoinetsalo commented Mar 16, 2018

@galingit Double-check your DNS settings, and then check this section of your terraform.tfvars file:
// The domain name which resolves to Tectonic Ingress (i.e. worker node(s))
tectonic_vmware_ingress_domain = "host.domain.com"
The string above should contain, for example, host.domain.com and not host-k8s.domain.com, which should instead be used here:
// The domain name which resolves to controller node(s)
tectonic_vmware_controller_domain = "host-k8s.domain.com"
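
The DNS side of this can be checked with a few lines of Python. This is a sketch under assumptions: the helper names are invented for illustration, and the hostnames and IP addresses in the usage comment are placeholders rather than values from anyone's setup in this thread.

```python
import socket


def resolve_ipv4(name: str) -> set:
    """Return the set of IPv4 addresses that name resolves to."""
    infos = socket.getaddrinfo(
        name, None, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr is (address, port) for IPv4.
    return {info[4][0] for info in infos}


def unexpected_addresses(resolved: set, expected: set) -> set:
    """Return resolved addresses that are not in the expected node set."""
    return resolved - expected


# Example (placeholder names and addresses):
# workers = {"10.0.0.21", "10.0.0.22"}
# controllers = {"10.0.0.11"}
# print(unexpected_addresses(resolve_ipv4("host.domain.com"), workers))
# print(unexpected_addresses(resolve_ipv4("host-k8s.domain.com"), controllers))
```

An empty set for both lookups means the ingress domain only points at worker nodes and the controller domain only at controller nodes.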
