
Address field in listener not working (upstream connect error or disconnect/reset before headers) #815

Closed
vijayendrabvs opened this issue Apr 22, 2017 · 19 comments
Labels
question Questions that are neither investigations, bugs, nor enhancements

Comments

@vijayendrabvs
Contributor

I'm not sure whether this is related to #326; I'm referencing that issue since it has the same error message, but on the face of it they seem to be different.

I'm trying to get the address field in a listener to work, without success. I've written a simple shell script as a test harness (Linux only) that does the following (a rough sketch of these steps follows the list):

  1. Spins up an Envoy container named envoyct1 with a default config and installs the curl and python packages in it.
  2. Uses nsenter to plumb two hardcoded IPs on eth0 of the Envoy container that was spun up. These two IPs simulate two VIPs.
  3. Copies over a new Envoy config that has two listeners configured with the two IPs, on port 80.
  4. Spins up a backend Python server on port 9001 for the /service/1 backend.
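
Roughly, the harness does something like this (a minimal sketch - the image tag, the /24 prefix, and the local config filename may differ slightly from the attached setup_ifaces.sh, which is the authoritative version):

# 1. start an Envoy container with a default config
docker run -d --name envoyct1 lyft/envoy
# 2. plumb two hardcoded "VIPs" onto the container's eth0 via nsenter
PID=$(docker inspect -f '{{.State.Pid}}' envoyct1)
nsenter -t "$PID" -n ip addr add 192.45.67.89/24 dev eth0
nsenter -t "$PID" -n ip addr add 192.45.67.90/24 dev eth0
# 3. copy in a config with two listeners bound to the VIPs on port 80 and hot-restart Envoy with it
docker cp envoy-multiple-listener-config.json envoyct1:/usr/local/conf/envoy/
docker exec -d envoyct1 /usr/local/bin/envoy -c /usr/local/conf/envoy/envoy-multiple-listener-config.json --restart-epoch 1
# 4. start the Python backend for /service/1 on port 9001
docker exec -d envoyct1 python -m SimpleHTTPServer 9001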

When I docker exec into envoyct1 and run curl <VIP1>/service/1, I expect to get a 404, but instead I see this error -

bash-4.3# curl 192.45.67.90/service/1
upstream connect error or disconnect/reset before headersbash-4.3#

If I spin up a Python server on a different port and curl the IP:port, it works -

bash-4.3# python -m SimpleHTTPServer 9002
Serving HTTP on 0.0.0.0 port 9002 ...
192.45.67.90 - - [22/Apr/2017 01:44:27] "GET / HTTP/1.1" 200 -

bash-4.3# curl 192.45.67.90:9002
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 3.2 Final//EN"><html>
<title>Directory listing for /</title>
<body>
<h2>Directory listing for /</h2>
<hr>
<ul>
<li><a href=".dockerenv">.dockerenv</a>
<li><a href="bin/">bin/</a>
<li><a href="dev/">dev/</a>
<li><a href="etc/">etc/</a>
<li><a href="home/">home/</a>
<li><a href="lib/">lib/</a>
<li><a href="lib64/">lib64/</a>
<li><a href="media/">media/</a>
<li><a href="mnt/">mnt/</a>
<li><a href="proc/">proc/</a>
<li><a href="root/">root/</a>
<li><a href="run/">run/</a>
<li><a href="sbin/">sbin/</a>
<li><a href="srv/">srv/</a>
<li><a href="sys/">sys/</a>
<li><a href="tmp/">tmp/</a>
<li><a href="usr/">usr/</a>
<li><a href="var/">var/</a>
</ul>
<hr>
</body>
</html>
bash-4.3#

So this doesn't look like a network configuration issue (the curl is being issued from inside the envoy container).

Is this an Envoy config issue, or something else?

When I tried to debug this using gdb and a debug Envoy build, it looked like a worker thread handling the connection request somewhere in the connection_manager_impl.cc chain sees a socket close event and so emits this error. I'm not sure why it would see a socket close event.

Am I doing something wrong with the config? Can someone please take a look?

BTW, it doesn't matter whether I have one or two listeners in my config file; the result is the same. It also doesn't matter whether I plumb the VIPs or not - using a plain 127.0.0.10 loopback IP yields the same result.

I'm attaching the harness as a zip file. Unzip it and simply run ./setup_ifaces.sh; it will spin up an Envoy Alpine container and do the rest of the plumbing. If you run ./setup_ifaces.sh ubuntu, it will pull the lyft/envoy Ubuntu image instead and do the same thing there.

So basically, this happens across ubuntu/alpine, loopback/eth0. Any pointers/help would be much appreciated.

Thanks!

setup_envoy_multiple_listener.zip

@mattklein123 mattklein123 added the question Questions that are neither investigations, bugs, nor enhancements label Apr 23, 2017
@mattklein123
Member

"upstream connect error or disconnect/reset before headers" means that Envoy cannot connect to the upstream that is being routed to. Your listener config is probably fine. I would use a combination of the /stats and /clusters admin endpoint output to debug further, and verify that you can connect to your backend services from within the Envoy container.

@vijayendrabvs
Contributor Author

@mattklein123 Thanks for taking a look! The text below is a bit long owing to the outputs I've pasted - thanks in advance for reading through them!

When I look at the /clusters output, I see service1 and service2 there, with a series of entries for 127.0.0.1:9001 (the Python backend service). For that host, the cx_connect_fail stat is 0 - if this were a connectivity issue on the Envoy side, that shouldn't be 0, correct?

bash-4.3# curl 127.0.0.10:8001/clusters
service1::default_priority::max_connections::1024
service1::default_priority::max_pending_requests::1024
service1::default_priority::max_requests::1024
service1::default_priority::max_retries::3
service1::high_priority::max_connections::1024
service1::high_priority::max_pending_requests::1024
service1::high_priority::max_requests::1024
service1::high_priority::max_retries::3
service1::127.0.0.1:9001::cx_active::0
service1::127.0.0.1:9001::cx_connect_fail::0
service1::127.0.0.1:9001::cx_total::0
service1::127.0.0.1:9001::rq_active::0
service1::127.0.0.1:9001::rq_timeout::0
service1::127.0.0.1:9001::rq_total::0
service1::127.0.0.1:9001::health_flags::healthy
service1::127.0.0.1:9001::weight::1
service1::127.0.0.1:9001::zone::
service1::127.0.0.1:9001::canary::false
service1::127.0.0.1:9001::success_rate::-1
service2::default_priority::max_connections::1024
service2::default_priority::max_pending_requests::1024
service2::default_priority::max_requests::1024
service2::default_priority::max_retries::3
service2::high_priority::max_connections::1024
service2::high_priority::max_pending_requests::1024
service2::high_priority::max_requests::1024
service2::high_priority::max_retries::3
bash-4.3#

In the /stats output, I see a few counters that seem relevant here; I'm pasting only those below (the complete output is in a separate excerpt further down) -

cluster.service1.max_host_weight: 1
cluster.service1.membership_change: 1
cluster.service1.membership_healthy: 1
cluster.service1.membership_total: 1
cluster.service1.update_attempt: 49
cluster.service1.update_failure: 0
cluster.service1.update_success: 49

The membership_healthy value shows 1, which I infer means that Envoy can see the backend host in the service1 cluster - is that the case?

What do the update attempts refer to? They also seem to have succeeded 100% of the time (49 attempts).

complete output -

bash-4.3# curl 127.0.0.10:8001/stats  | grep service1
cluster.service1.lb_healthy_panic: 0
cluster.service1.lb_local_cluster_not_ok: 0
cluster.service1.lb_recalculate_zone_structures: 0
cluster.service1.lb_zone_cluster_too_small: 0
cluster.service1.lb_zone_no_capacity_left: 0
cluster.service1.lb_zone_number_differs: 0
cluster.service1.lb_zone_routing_all_directly: 0
cluster.service1.lb_zone_routing_cross_zone: 0
cluster.service1.lb_zone_routing_sampled: 0
cluster.service1.max_host_weight: 1
cluster.service1.membership_change: 1
cluster.service1.membership_healthy: 1
cluster.service1.membership_total: 1
cluster.service1.update_attempt: 49
cluster.service1.update_failure: 0
cluster.service1.update_success: 49
cluster.service1.upstream_cx_active: 0
cluster.service1.upstream_cx_close_header: 0
cluster.service1.upstream_cx_connect_fail: 0
cluster.service1.upstream_cx_connect_timeout: 0
cluster.service1.upstream_cx_destroy: 0
cluster.service1.upstream_cx_destroy_local: 0
cluster.service1.upstream_cx_destroy_local_with_active_rq: 0
cluster.service1.upstream_cx_destroy_remote: 0
cluster.service1.upstream_cx_destroy_remote_with_active_rq: 0
cluster.service1.upstream_cx_destroy_with_active_rq: 0
cluster.service1.upstream_cx_http1_total: 0
cluster.service1.upstream_cx_http2_total: 0
cluster.service1.upstream_cx_max_requests: 0
cluster.service1.upstream_cx_none_healthy: 0
cluster.service1.upstream_cx_overflow: 0
cluster.service1.upstream_cx_protocol_error: 0
cluster.service1.upstream_cx_rx_bytes_buffered: 0
cluster.service1.upstream_cx_rx_bytes_total: 0
cluster.service1.upstream_cx_total: 0
cluster.service1.upstream_cx_tx_bytes_buffered: 0
cluster.service1.upstream_cx_tx_bytes_total: 0
cluster.service1.upstream_rq_active: 0
cluster.service1.upstream_rq_cancelled: 0
cluster.service1.upstream_rq_maintenance_mode: 0
cluster.service1.upstream_rq_pending_active: 0
cluster.service1.upstream_rq_pending_failure_eject: 0
cluster.service1.upstream_rq_pending_overflow: 0
cluster.service1.upstream_rq_pending_total: 0
cluster.service1.upstream_rq_per_try_timeout: 0
cluster.service1.upstream_rq_retry: 0
cluster.service1.upstream_rq_retry_overflow: 0
cluster.service1.upstream_rq_retry_success: 0
cluster.service1.upstream_rq_rx_reset: 0
cluster.service1.upstream_rq_timeout: 0
cluster.service1.upstream_rq_total: 0
cluster.service1.upstream_rq_tx_reset: 0
bash-4.3#

The thing is, I'm able to curl the backend service (it runs in the same Envoy container) from within that container, on both VIPs and on localhost, without any issues -

Here's the netstat output to begin with -

bash-4.3# hostname
a24847dd0490
bash-4.3# netstat -apn
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 0.0.0.0:8001            0.0.0.0:*               LISTEN      39/envoy
tcp        0      0 0.0.0.0:9001            0.0.0.0:*               LISTEN      47/python
tcp        0      0 192.45.67.90:80         0.0.0.0:*               LISTEN      39/envoy
tcp        0      0 192.45.67.89:80         0.0.0.0:*               LISTEN      39/envoy
tcp        3      0 127.0.0.1:10000         0.0.0.0:*               LISTEN      1/envoy
tcp       85      0 127.0.0.1:10000         127.0.0.1:43004         CLOSE_WAIT  -
tcp       88      0 127.0.0.1:10000         127.0.0.1:43006         CLOSE_WAIT  -
tcp       80      0 127.0.0.1:10000         127.0.0.1:43002         CLOSE_WAIT  -
udp        0      0 172.17.0.3:55605        10.254.58.55:53         ESTABLISHED 39/envoy
udp        0      0 172.17.0.3:59536        10.241.16.126:53        ESTABLISHED 39/envoy
udp        0      0 172.17.0.3:47713        10.254.58.54:53         ESTABLISHED 39/envoy
Active UNIX domain sockets (servers and established)
Proto RefCnt Flags       Type       State         I-Node   PID/Program name     Path
unix  2      [ ]         DGRAM                    657299   39/envoy             @envoy_domain_socket_1
unix  2      [ ]         DGRAM                    666235   1/envoy              @envoy_domain_socket_0
bash-4.3#

Here's ps -

bash-4.3# ps aux
PID   USER     TIME   COMMAND
    1 root       0:00 /usr/local/bin/envoy -c /usr/local/conf/envoy/google_com_proxy.json
   39 root       0:00 /usr/local/bin/envoy -c /usr/local/conf/envoy/envoy-multiple-listener-config.json --restart-epoch 1
   47 root       0:00 python -m SimpleHTTPServer 9001
   61 root       0:00 bash
   83 root       0:00 ps aux
bash-4.3#

Now I'm hitting the backend Python SimpleHTTPServer directly on port 9001 via one of the VIPs (192.45.67.89) -

bash-4.3# curl 192.45.67.89:9001
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 3.2 Final//EN"><html>
<title>Directory listing for /</title>
<body>
<h2>Directory listing for /</h2>
<hr>
<ul>
<li><a href=".dockerenv">.dockerenv</a>
<li><a href="bin/">bin/</a>
<li><a href="dev/">dev/</a>
<li><a href="etc/">etc/</a>
<li><a href="home/">home/</a>
<li><a href="lib/">lib/</a>
<li><a href="lib64/">lib64/</a>
<li><a href="media/">media/</a>
<li><a href="mnt/">mnt/</a>
<li><a href="proc/">proc/</a>
<li><a href="root/">root/</a>
<li><a href="run/">run/</a>
<li><a href="sbin/">sbin/</a>
<li><a href="srv/">srv/</a>
<li><a href="sys/">sys/</a>
<li><a href="tmp/">tmp/</a>
<li><a href="usr/">usr/</a>
<li><a href="var/">var/</a>
</ul>
<hr>
</body>
</html>
bash-4.3#

Next, I'm hitting the backend Python SimpleHTTPServer directly on port 9001 via the other VIP (192.45.67.90), again from within the Envoy container -

bash-4.3# curl 192.45.67.90:9001
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 3.2 Final//EN"><html>
<title>Directory listing for /</title>
<body>
<h2>Directory listing for /</h2>
<hr>
<ul>
<li><a href=".dockerenv">.dockerenv</a>
<li><a href="bin/">bin/</a>
<li><a href="dev/">dev/</a>
<li><a href="etc/">etc/</a>
<li><a href="home/">home/</a>
<li><a href="lib/">lib/</a>
<li><a href="lib64/">lib64/</a>
<li><a href="media/">media/</a>
<li><a href="mnt/">mnt/</a>
<li><a href="proc/">proc/</a>
<li><a href="root/">root/</a>
<li><a href="run/">run/</a>
<li><a href="sbin/">sbin/</a>
<li><a href="srv/">srv/</a>
<li><a href="sys/">sys/</a>
<li><a href="tmp/">tmp/</a>
<li><a href="usr/">usr/</a>
<li><a href="var/">var/</a>
</ul>
<hr>
</body>
</html>
bash-4.3#

But when I try to go via the VIP on port 80 -

bash-4.3# curl -vvv 192.45.67.90:80/service/1
*   Trying 192.45.67.90...
* TCP_NODELAY set
* Connected to 192.45.67.90 (192.45.67.90) port 80 (#0)
> GET /service/1 HTTP/1.1
> Host: 192.45.67.90
> User-Agent: curl/7.52.1
> Accept: */*
>
< HTTP/1.1 503 Service Unavailable
< content-length: 57
< content-type: text/plain
< date: Sun, 23 Apr 2017 21:34:13 GMT
< server: envoy
<
* Curl_http_done: called premature == 0
* Connection #0 to host 192.45.67.90 left intact
upstream connect error or disconnect/reset before headersbash-4.3#

Why is Envoy returning a 503 when it should be able to reach the backend service?

Finally, the admin access log doesn't show any new entry when I curl the VIP/service/1 path - I'm guessing that's expected, since it only covers the admin listener. Are there any other logs I can enable to view Envoy connection activity?

bash-4.3# curl 192.45.67.90/service/1
upstream connect error or disconnect/reset before headersbash-4.3#
bash-4.3# cat /var/log/envoy/admin_access.log
[2017-04-22T00:08:14.583Z] "GET / HTTP/1.1" 404 - 0 530 0 - "172.17.0.3" "curl/7.52.1" "-" "127.0.0.10:8001" "-"
[2017-04-23T21:29:19.803Z] "GET /clusters HTTP/1.1" 200 - 0 1195 0 - "172.17.0.3" "curl/7.52.1" "-" "127.0.0.1:8001" "-"
[2017-04-23T21:29:35.917Z] "GET /admin HTTP/1.1" 404 - 0 530 0 - "172.17.0.3" "curl/7.52.1" "-" "127.0.0.1:8001" "-"
[2017-04-23T21:29:49.917Z] "GET /server_info HTTP/1.1" 200 - 0 36 0 - "172.17.0.3" "curl/7.52.1" "-" "127.0.0.1:8001" "-"
[2017-04-23T21:44:24.729Z] "GET / HTTP/1.1" 404 - 0 530 0 - "172.17.0.3" "curl/7.52.1" "-" "127.0.0.10:8001" "-"
[2017-04-23T21:44:34.330Z] "GET /clusters HTTP/1.1" 200 - 0 1195 0 - "172.17.0.3" "curl/7.52.1" "-" "127.0.0.10:8001" "-"
[2017-04-23T21:47:00.993Z] "GET /admin HTTP/1.1" 404 - 0 530 0 - "172.17.0.3" "curl/7.52.1" "-" "127.0.0.10:8001" "-"
[2017-04-23T21:47:03.843Z] "GET /stats HTTP/1.1" 200 - 0 9511 0 - "172.17.0.3" "curl/7.52.1" "-" "127.0.0.10:8001" "-"
[2017-04-23T21:47:12.991Z] "GET /stats HTTP/1.1" 200 - 0 9510 0 - "172.17.0.3" "curl/7.52.1" "-" "127.0.0.10:8001" "-"
[2017-04-23T21:47:57.548Z] "GET /stats HTTP/1.1" 200 - 0 9510 0 - "172.17.0.3" "curl/7.52.1" "-" "127.0.0.10:8001" "-"
bash-4.3#
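
(For the record, it seems per-listener access logging can be enabled inside the http_connection_manager filter config itself - a minimal sketch below in the v1-style JSON used here, with an arbitrary log path - and running Envoy with -l debug also surfaces more connection-level activity.)

"access_log": [
  {
    "path": "/var/log/envoy/ingress_access.log"
  }
]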

@mattklein123
Member

I skimmed through this quickly and I don't see any call to service1 at all in the stats above, so they are probably going to service2. I can't tell without seeing the full config, full dump of stats, and full dump of clusters output. I won't be able to help you further in this issue. If someone else doesn't help I would try Gitter for more interactive help. This is a configuration or docker setup issue.

@vijayendrabvs
Contributor Author

No problem, thanks @mattklein123! I'll post this on Gitter.

@vijayendrabvs
Contributor Author

@mattklein123 This issue was caused by my plumbing subinterfaces without configuring any routing on them. Using the docker network create and connect commands instead resolved the connectivity issues, and we were able to bring up multiple listeners. Thanks for your help on this!
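
Roughly, the working approach looked like this (a minimal sketch; the network and container names are placeholders, not the exact ones from the harness):

docker network create envoy-net
docker network connect envoy-net envoyct1
docker network connect envoy-net backend-ct
docker exec -it envoyct1 curl -v http://backend-ct:9001/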

@aambhaik

@vijayendrabvs I am running into the same problem. My Go service is accessible from within the service container on port 9096, but not through the Envoy front-proxy container, and I get exactly the same response as you reported.

Can you provide any details on the resolution please?

@danesavot

danesavot commented Feb 17, 2018

I'm running into the same issue today.

I can access the service from within its container using curl, but I'm not able to access it through the Envoy container via http://localhost:10000/symphony.
My envoy.yaml:

static_resources:
  listeners:
  - address:
      socket_address:
        address: 0.0.0.0
        port_value: 10000
    filter_chains:
    - filters:
      - name: envoy.http_connection_manager
        config:
          codec_type: auto
          stat_prefix: ingress_http
          route_config:
            name: local_route
            virtual_hosts:
            - name: backend
              domains:
              - "*"
              routes:
              - match:
                  prefix: "/symphony"
                route:
                  cluster: symphony
              - match:
                  prefix: "/service/2"
                route:
                  cluster: service2
          http_filters:
          - name: envoy.router
            config: {}
  clusters:
  - name: symphony
    connect_timeout: 0.25s
    type: STATIC
    lb_policy: round_robin
    http2_protocol_options: {}
    hosts:
    - socket_address:
        address: 10.129.16.178
        port_value: 8080
  - name: service2
    connect_timeout: 0.25s
    type: strict_dns
    lb_policy: round_robin
    http2_protocol_options: {}
    hosts:
    - socket_address:
        address: service2
        port_value: 80
admin:
  access_log_path: "/dev/null"
  address:
    socket_address:
      address: 0.0.0.0
      port_value: 1

@AmerbankDavd

Same issue here - is there a way to solve this?
I can reach my service from within the container with wget -qO- localhost:80/ping, but I get the error when curling the ingress: upstream connect error or disconnect/reset before headers.

@danesavot

I resolved my issue by removing http2_protocol_options: {}
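
That is, in the cluster definition above, dropping (or commenting out) this line - with http2_protocol_options set, Envoy talks HTTP/2 to the upstream, and an HTTP/1.1-only backend will reset the connection:

  - name: symphony
    connect_timeout: 0.25s
    type: STATIC
    lb_policy: round_robin
    # http2_protocol_options: {}
    hosts:
    - socket_address:
        address: 10.129.16.178
        port_value: 8080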

@AmerbankDavd

Where did you change that option?

@danesavot

Share your Envoy config file. I will take a look.

@AmerbankDavd

Envoy is used in my Istio container, but I don't know where to find its config file.

@johnzheng1975
Contributor

@danesavot, where can you find the Envoy config file in the istio-proxy container, and how do you change it?
Do we need to modify it and build the istio-proxy container ourselves?

@johnzheng1975
Contributor

@AmerbankDavd Have you resolved this?

@johnzheng1975
Contributor

In my Istio 0.5.1, there is no http2_protocol_options: {} at all.

kubectl exec -ti istio-pilot-676d495bf8-9c2px -c istio-proxy -n istio-system -- cat /etc/istio/proxy/envoy_pilot.json
{
  "listeners": [
    {
      "address": "tcp://0.0.0.0:15003",
      "name": "tcp_0.0.0.0_15003",
      "filters": [
        {
          "type": "read",
          "name": "tcp_proxy",
          "config": {
            "stat_prefix": "tcp",
            "route_config": {
              "routes": [
                {
                  "cluster": "in.8080"
                }
              ]
            }
          }
        }
      ],
      "bind_to_port": true
    }
  ],
  "admin": {
    "access_log_path": "/dev/stdout",
    "address": "tcp://127.0.0.1:15000"
  },
  "cluster_manager": {
    "clusters": [
      {
        "name": "in.8080",
        "connect_timeout_ms": 1000,
        "type": "static",
        "lb_type": "round_robin",
        "hosts": [
          {
            "url": "tcp://127.0.0.1:8080"
          }
        ]
      }
    ]
  }
}

@oinke

oinke commented Mar 2, 2019

I have added all the recommended changes to get the hello world example to run in this repo: https://github.com/oinke/gprc-hello

The terminal still shows:
server_1 | E0302 08:41:34.225022613 7 http_server_filter.cc:271] GET request without QUERY
and when I browse localhost:8080 I can see
upstream connect error or disconnect/reset before headers. reset reason: remote reset

Running on macOS Mojave 10.14.2 with Docker version 18.09.2, build 6247962

@dio
Member

dio commented Mar 2, 2019

@oinke It seems like your issue is not related to this one. I have posted a PR (oinke/gprc-hello#1) to your repo.

@bryanmacfarlane

bryanmacfarlane commented Oct 20, 2020

@danesavot I also resolved this by commenting out the empty http2 options. Huge thanks!

# http2_protocol_options: {}

Outside of that, for everyone else: if you're running containers on the host, check out Docker networking: https://docs.docker.com/network/network-tutorial-standalone/

I created a custom Docker bridge network, ran the other containers with --network, then jumped into the Envoy container and made sure I could curl them by name.
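
Something like this (a minimal sketch - the image and container names are placeholders, and it assumes curl is available inside the Envoy image):

docker network create my-bridge
docker run -d --name service1 --network my-bridge my-service-image
docker run -d --name envoy --network my-bridge -p 10000:10000 my-envoy-image
docker exec -it envoy curl -v http://service1/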

The empty http2 options line came from the Envoy tutorial.

@zakhenry

I had this error even though I could connect with ping, curl, grpcurl, etc. without a problem. The issue turned out to be the line

    connect_timeout: 0.25s

which is present in pretty much all Envoy YAML demos.

In my case I was experimenting with Envoy configuration locally (in New Zealand) and connecting to a gRPC service in eu-west-1, which fundamentally has a higher connection time than a quarter of a second. Raising that timeout fixed the issue. Hope that helps someone else!
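
For example (a minimal sketch; the cluster name is a placeholder, and 5s is just a value comfortably above the real cross-region connection latency):

  clusters:
  - name: my_grpc_service
    connect_timeout: 5s   # raised from the 0.25s used in most demos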
