
[Ingest Manager] Enforce maximum open connections for some Fleet routes #71221

Closed · wants to merge 3 commits

Conversation

@jfsiii (Contributor) commented on Jul 9, 2020

Starts with #70775 for the platform lifecycle hooks and adds code from #70495.

Fleet-specific code is in f24ac84.

Draft because it's missing some of the decrements and doesn't return to 0 connections. See comments.
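
For reviewers skimming the diff, the core idea is a per-route counter that sheds load with a 429 once a cap is reached. A minimal sketch of that idea in TypeScript (withConnectionLimit, MAX_CONCURRENT_CONNECTIONS, and the handler shape are illustrative names, not the actual code in f24ac84):

// Illustrative sketch only; the real change lives in f24ac84 and builds on
// the platform lifecycle hooks from #70775.
const MAX_CONCURRENT_CONNECTIONS = 100; // hypothetical cap

let concurrentRequests = 0;

type Handler = (req: unknown) => Promise<{ status: number; body: string }>;

function withConnectionLimit(handler: Handler): Handler {
  return async (req) => {
    if (concurrentRequests >= MAX_CONCURRENT_CONNECTIONS) {
      // Over the cap: reject immediately instead of queueing more work.
      return { status: 429, body: 'Too Many Requests' };
    }
    concurrentRequests++;
    try {
      return await handler(req);
    } finally {
      // The decrement must run on every exit path (success, throw, abort);
      // if any path skips it, the count drifts upward, which is exactly
      // the draft issue described above.
      concurrentRequests--;
    }
  };
}

The 429s in the 500-connection run below show the cap being enforced.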

@jfsiii (Contributor, Author) commented on Jul 9, 2020

The incrementing and decrementing aren't balanced, and the concurrentRequests count drifts upward over time.
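
My working theory on the shape of the bug (a hypothetical hapi-style sketch, not the PR's actual hook wiring): the increment fires for every request, but the decrement is tied to an event that doesn't fire for connections that are reset or time out:

import Hapi from '@hapi/hapi';

let concurrentRequests = 0;
const server = Hapi.server({ port: 5601 });

// Increment on every request that reaches the server.
server.ext('onRequest', (request, h) => {
  concurrentRequests++;
  return h.continue;
});

// Decrement only when a response is emitted. Requests whose sockets are
// reset, aborted, or timed out may never produce this event, so under
// heavy load the counter climbs and never returns to 0.
server.events.on('response', () => {
  concurrentRequests--;
});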

Hit /enroll with 50 connections for 5 minutes (max connections set to 100):
 hey -c 50 -z 5m -m POST -H 'kbn-xsrf: <string>' -H 'Authorization: ApiKey Y0NrTk1ITUJtOVFQOUE4ZWJvLTQ6bEQ1TllhbUJTaUdJUDVqTDk1cUdFUQ==' -H 'Content-Type: application/json' -d '{
    "type": "PERMANENT",
    "metadata": {
        "local": {
            "host": "localhost",
            "ip": "127.0.0.1",
            "system": "Darwin 18.7.0",
            "memory": 34359738368,
            "elastic": {"agent": {"version": "8.0.0"} }
        },
        "user_provided": {
            "dev_agent_version": "0.0.1",
            "region": "us-east"
        }
    }
}' 'http://localhost:5601/api/ingest_manager/fleet/agents/enroll'

Summary:
  Total:	300.0870 secs
  Slowest:	3.6310 secs
  Fastest:	0.9974 secs
  Average:	1.5436 secs
  Requests/sec:	32.3906

  Total data:	5491800 bytes
  Size/request:	565 bytes

Response time histogram:
  0.997 [1]	|
  1.261 [5203]	|■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
  1.524 [900]	|■■■■■■■
  1.788 [0]	|
  2.051 [37]	|
  2.314 [2791]	|■■■■■■■■■■■■■■■■■■■■■
  2.578 [769]	|■■■■■■
  2.841 [5]	|
  3.104 [13]	|
  3.368 [0]	|
  3.631 [1]	|


Latency distribution:
  10% in 1.0733 secs
  25% in 1.0988 secs
  50% in 1.2276 secs
  75% in 2.1402 secs
  90% in 2.2802 secs
  95% in 2.3964 secs
  99% in 2.5039 secs

Details (average, fastest, slowest):
  DNS+dialup:	0.0000 secs, 0.9974 secs, 3.6310 secs
  DNS-lookup:	0.0000 secs, 0.0000 secs, 0.0017 secs
  req write:	0.0000 secs, 0.0000 secs, 0.0003 secs
  resp wait:	1.5435 secs, 0.9974 secs, 3.6309 secs
  resp read:	0.0000 secs, 0.0000 secs, 0.0015 secs

Status code distribution:
  [200]	9720 responses

Logs for the above run: they show 0 open connections after the test stopped.

Hit /enroll with 500 connections for 5 minutes (max connections set to 100):
hey -c 500 -z 5m -m POST -H 'kbn-xsrf: <string>' -H 'Authorization: ApiKey Y0NrTk1ITUJtOVFQOUE4ZWJvLTQ6bEQ1TllhbUJTaUdJUDVqTDk1cUdFUQ==' -H 'Content-Type: application/json' -d '{
    "type": "PERMANENT",
    "metadata": {
        "local": {
            "host": "localhost",
            "ip": "127.0.0.1",
            "system": "Darwin 18.7.0",
            "memory": 34359738368,
            "elastic": {"agent": {"version": "8.0.0"} }
        },
        "user_provided": {
            "dev_agent_version": "0.0.1",
            "region": "us-east"
        }
    }
}' 'http://localhost:5601/api/ingest_manager/fleet/agents/enroll'


Summary:
  Total:	302.0166 secs
  Slowest:	19.9917 secs
  Fastest:	0.3924 secs
  Average:	4.1819 secs
  Requests/sec:	73342.2436

  Total data:	1870756 bytes
  Size/request:	119 bytes

Response time histogram:
  0.392 [1]	|
  2.352 [2697]	|■■■■■■■■■■■■■■■■
  4.312 [6759]	|■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
  6.272 [4709]	|■■■■■■■■■■■■■■■■■■■■■■■■■■■■
  8.232 [1010]	|■■■■■■
  10.192 [183]	|■
  12.152 [57]	|
  14.112 [41]	|
  16.072 [48]	|
  18.032 [73]	|
  19.992 [55]	|


Latency distribution:
  10% in 1.7412 secs
  25% in 2.9998 secs
  50% in 3.6789 secs
  75% in 5.3358 secs
  90% in 6.2026 secs
  95% in 7.3973 secs
  99% in 15.1635 secs

Details (average, fastest, slowest):
  DNS+dialup:	0.0003 secs, 0.3924 secs, 19.9917 secs
  DNS-lookup:	0.0122 secs, 0.0000 secs, 0.2994 secs
  req write:	0.0008 secs, 0.0000 secs, 0.1663 secs
  resp wait:	4.1553 secs, 0.3861 secs, 19.9667 secs
  resp read:	0.0039 secs, 0.0000 secs, 0.2027 secs

Status code distribution:
  [200]	1454 responses
  [429]	14179 responses

Error distribution:
  [466]	Post http://localhost:5601/api/ingest_manager/fleet/agents/enroll: dial tcp 127.0.0.1:5601: socket: too many open files
  [699]	Post http://localhost:5601/api/ingest_manager/fleet/agents/enroll: dial tcp [::1]:5601: connect: connection refused
  [14678]	Post http://localhost:5601/api/ingest_manager/fleet/agents/enroll: dial tcp [::1]:5601: socket: too many open files
  [22118609]	Post http://localhost:5601/api/ingest_manager/fleet/agents/enroll: dial tcp: lookup localhost: no such host
  [492]	Post http://localhost:5601/api/ingest_manager/fleet/agents/enroll: net/http: request canceled (Client.Timeout exceeded while awaiting headers)

Log from the above run (max connections set to 100): the logs show 100 (not 0) open connections after the test stopped.

Hit /enroll with 500 connections for 5 minutes (max connections set to 1000):
hey -c 500 -z 5m -m POST -H 'kbn-xsrf: <string>' -H 'Authorization: ApiKey Y0NrTk1ITUJtOVFQOUE4ZWJvLTQ6bEQ1TllhbUJTaUdJUDVqTDk1cUdFUQ==' -H 'Content-Type: application/json' -d '{
    "type": "PERMANENT",
    "metadata": {
        "local": {
            "host": "localhost",
            "ip": "127.0.0.1",
            "system": "Darwin 18.7.0",
            "memory": 34359738368,
            "elastic": {"agent": {"version": "8.0.0"} }
        },
        "user_provided": {
            "dev_agent_version": "0.0.1",
            "region": "us-east"
        }
    }
}' 'http://localhost:5601/api/ingest_manager/fleet/agents/enroll'

Summary:
  Total:	302.0432 secs
  Slowest:	19.9219 secs
  Fastest:	1.3980 secs
  Average:	6.0786 secs
  Requests/sec:	93635.7291

  Total data:	6144940 bytes
  Size/request:	565 bytes

Response time histogram:
  1.398 [1]	|
  3.250 [146]	|■
  5.103 [2334]	|■■■■■■■■■■■■■■■
  6.955 [6113]	|■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
  8.808 [1927]	|■■■■■■■■■■■■■
  10.660 [254]	|■■
  12.512 [6]	|
  14.365 [28]	|
  16.217 [18]	|
  18.070 [16]	|
  19.922 [33]	|


Latency distribution:
  10% in 4.3191 secs
  25% in 5.2099 secs
  50% in 6.0041 secs
  75% in 6.8137 secs
  90% in 7.6883 secs
  95% in 8.2170 secs
  99% in 10.4541 secs

Details (average, fastest, slowest):
  DNS+dialup:	0.0002 secs, 1.3980 secs, 19.9219 secs
  DNS-lookup:	0.0033 secs, 0.0000 secs, 0.3403 secs
  req write:	0.0004 secs, 0.0000 secs, 0.3333 secs
  resp wait:	6.0640 secs, 1.3902 secs, 19.9068 secs
  resp read:	0.0025 secs, 0.0000 secs, 0.1354 secs

Status code distribution:
  [200]	10876 responses

Error distribution:
  [440]	Post http://localhost:5601/api/ingest_manager/fleet/agents/enroll: dial tcp 127.0.0.1:5601: socket: too many open files
  [344]	Post http://localhost:5601/api/ingest_manager/fleet/agents/enroll: dial tcp [::1]:5601: connect: connection refused
  [7725]	Post http://localhost:5601/api/ingest_manager/fleet/agents/enroll: dial tcp [::1]:5601: socket: too many open files
  [28262206]	Post http://localhost:5601/api/ingest_manager/fleet/agents/enroll: dial tcp: lookup localhost: no such host
  [445]	Post http://localhost:5601/api/ingest_manager/fleet/agents/enroll: net/http: request canceled (Client.Timeout exceeded while awaiting headers)

Log from the above run (max connections set to 1000): the logs show 440 (not 0) open connections after the test stopped.

In that case, it's notable that the load-test program also reported 440 'too many open files' errors:

  [440]	Post http://localhost:5601/api/ingest_manager/fleet/agents/enroll: dial tcp 127.0.0.1:5601: socket: too many open files

@kobelb @restrry Any ideas why the concurrent-requests count keeps climbing? I don't know our/hapi's lifecycle very well. Are there different or additional hooks we should use? Can we change something about the places where we increment and decrement now?
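
For comparison, one pattern that might balance the books (a sketch assuming a hapi onRequest extension with access to the raw Node response; the once-guard is the important part):

import Hapi from '@hapi/hapi';

let concurrentRequests = 0;
const server = Hapi.server({ port: 5601 });

server.ext('onRequest', (request, h) => {
  concurrentRequests++;
  let decremented = false;
  const decrement = () => {
    // Guard so the counter only ever goes down once per request, no matter
    // how many termination events fire.
    if (!decremented) {
      decremented = true;
      concurrentRequests--;
    }
  };
  // 'finish' fires when the response is fully sent; 'close' fires when the
  // underlying connection is torn down (including aborts and resets). Which
  // events fire varies by Node version, hence listening for both.
  request.raw.res.once('finish', decrement);
  request.raw.res.once('close', decrement);
  return h.continue;
});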

@ph (Contributor) commented on Jul 9, 2020

Lots of changes here! cc @roncohen

@jfsiii (Contributor, Author) commented on Jul 9, 2020

> Lots of changes here! cc @roncohen

@ph I tried to address this in the description. This branch starts from an existing PR, #70775. The only Fleet-specific changes are in f24ac84.

@ph (Contributor) commented on Jul 9, 2020

@kobelb If you can take a look that would be great!

@kobelb (Contributor) commented on Jul 9, 2020

Hey @ph, I took a look at what was likely causing the missing decrements this morning. The relevant conversation starts here: #70495 (comment)

@kibanamachine (Contributor) commented

💚 Build Succeeded

Build metrics: ✅ unchanged

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

@jfsiii (Contributor, Author) commented on Jul 14, 2020

Closing this in favor of #71552, since #70775 landed first.

@jfsiii closed this on Jul 14, 2020.
Labels: Team:Fleet (Team label for Observability Data Collection Fleet team)