Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

configs which contain jaeger config will hang the router when updated #331

Closed
garypen opened this issue Jan 17, 2022 · 0 comments · Fixed by #337
Closed

configs which contain jaeger config will hang the router when updated #331

garypen opened this issue Jan 17, 2022 · 0 comments · Fixed by #337
Assignees
Labels

Comments

@garypen
Copy link
Contributor

garypen commented Jan 17, 2022

Describe the bug

To Reproduce
Create a configuration for the router which contains jaeger opentelemetry. e.g.:

# Configuration of the router's HTTP server
server:
  # The socket address and port to listen on
  # Defaults to 127.0.0.1:4000
  listen: 127.0.0.1:4000

  # Cross origin request headers
  cors:
    # Set to false to disallow any origin and rely exclusively on `origins`
    # Defaults to true
    allow_any_origin: true
    # List of accepted origins
    origins:
      - https://studio.apollographql.com
    # Set to true to add the `Access-Control-Allow-Credentials` header
    allow_credentials: false
    # The headers to allow.
    # Defaults to the required request header for Apollo Studio: Content-Type
    allow_headers: [ fake, content-type, apollographql-client-name, apollographql-client-version ]
    # Allowed request methods
    # Defaults to GET, POST, OPTIONS.
    methods: [ GET, POST, OPTIONS ]
    # Which response headers should be made available to scripts running in the browser,
    # in response to a cross-origin request.
    expose_headers:

# Names and URLs of all subgraphs
# By default, this information is parsed from the supergraph schema
# Provide it here to override subgraph names or URLs
subgraphs:
  # Defines a subgraph named "accounts"
  accounts:
    # The URL of the accounts subgraph's GraphQL endpoint
    routing_url: http://localhost:4001/graphql
  # Defines a second subgraph named "products"
  products:
    routing_url: http://localhost:4003/graphql

# OpenTelemetry configuration, see next section
# opentelemetry:
opentelemetry:
  jaeger:

Save this in a yaml config file, e.g.: config_with_jaeger.yml

Now run the router as follows:

cargo run -- -c config_with_jaeger.yml -w -s ./examples/local.graphql

The important part is to enable config watching (-w) so that changes to the config file cause the router to respond. With the router running, edit the config file to remove the opentelemetry entry. Observe the server respond. Now add the opentelemetry entry back into the configuration and note that the router does not respond. At this point if you try to shut down the router, Ctrl-C, it will not respond and you can only shut down the router with kill -9.

Expected behavior
I expect to be able to change configuration details and have the router respond when it is watching configuration. I also expect to be able to cause the router to shut down with Ctrl-C at any time.

Output
Sample interaction with router (configuration edits performed in a separate terminal)

garypen@Garys-MBP router % cargo run -- -c config_with_jaeger.yml -w -s ./examples/local.graphql
    Finished dev [unoptimized + debuginfo] target(s) in 0.22s
     Running `target/debug/router -c config_with_jaeger.yml -w -s ./examples/local.graphql`
2022-01-17T09:46:58.361383Z  INFO router: Starting Apollo Router
*******************************************************************
⚠️  Experimental software, not YET recommended for production use ⚠️
*******************************************************************
2022-01-17T09:46:58.365296Z  WARN apollo_router::configuration: overriding URL from subgraph accounts at http://localhost:4001 with URL from the configuration file: http://localhost:4001/graphql
2022-01-17T09:46:58.365312Z  WARN apollo_router::configuration: overriding URL from subgraph products at http://localhost:4003 with URL from the configuration file: http://localhost:4003/graphql
2022-01-17T09:46:58.458733Z  INFO router: Listening on http://127.0.0.1:4000 🚀
2022-01-17T09:47:05.648084Z  INFO apollo_router::state_machine: Reloading configuration
2022-01-17T09:47:05.648329Z  WARN apollo_router::configuration: overriding URL from subgraph accounts at http://localhost:4001 with URL from the configuration file: http://localhost:4001/graphql
2022-01-17T09:47:05.648366Z  WARN apollo_router::configuration: overriding URL from subgraph products at http://localhost:4003 with URL from the configuration file: http://localhost:4003/graphql
2022-01-17T09:47:05.765945Z  INFO apollo_router::http_server_factory: previous server is closed
^C^C^C^C^C^Z
zsh: suspended  cargo run -- -c config_with_jaeger.yml -w -s ./examples/local.graphql
garypen@Garys-MBP router % kill -9 %1

Desktop (please complete the following information):

  • OS: Mac OS X
  • Version Darwin Garys-MBP.mhs 21.2.0 Darwin Kernel Version 21.2.0: Sun Nov 28 20:28:41 PST 2021; root:xnu-8019.61.5~1/RELEASE_ARM64_T6000 arm64

Additional context
It may be that this is an issue with OTLP telemetry configuration as well. It may be a more general problem, which is not telemetry specific - hard to say without more investigation.

@garypen garypen added the triage label Jan 17, 2022
garypen added a commit that referenced this issue Jan 17, 2022
Turns out that the code hangs in main branch and is not related to my
changes. See: #331 for more details.

I don't like these statemachine changes, but until the fix for #331 is
figured out I may as well retain them.
@garypen garypen changed the title changing config to remove and re-add jaeger opentelemetry details hangs telemetry configs which contain jaeger config will hang the router when updated Jan 18, 2022
garypen added a commit that referenced this issue Jan 18, 2022
Incorporate the fix for #331 and re-structure the code a little.

Framework for statistic reporting is now working reliably with router
configuration changes. Main outstanding work is:
 - Implement normalization
 - Implement real statistics
 - Add support for OTLP configuration
garypen added a commit that referenced this issue Jan 19, 2022
 - use spawn_blocking() in preference to spawning a thread
 - update the comment block to refer directly to issue #331
garypen added a commit that referenced this issue Jan 19, 2022
…#337)

* configs which contain jaeger config will hang the router when updated

If a configuration is updated which contains jaeger configuration, then
at some point a synchronous call to:
  opentelemetry::global::set_tracer_provider()
will arise. Even though this call is made from a synchronous function,
the root context is asynchronous and it seems to cause the runtime to
mis-behave: i.e.: hang...

If we make the call from a newly spawned thread (no performance concerns
here because re-configuration is not hot code) then the problem goes
away.

* Address review feedback

 - use spawn_blocking() in preference to spawning a thread
 - update the comment block to refer directly to issue #331

fixes: #337
@abernix abernix added bug and removed triage labels Feb 4, 2022
tinnou pushed a commit to Netflix-Skunkworks/router that referenced this issue Oct 16, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants