Error while running the ALL stage on CI builds #1188
Comments
Both the Python and PHP stages fail with the following message, which is the very same error/stacktrace as in the ALL stage.
|
EDIT: This is now at elastic/kibana#107300 |
We think we have a fix for this now. We're just waiting for new images to be built and hopefully the ITs should come back. So, we're blocked until those images are built and we can test them. |
It does not look like the fix to Kibana startup time had the desired effect. The next area of exploration is going to be to try an upgrade of Docker across the workers, in case this is somehow related to a bug in Docker disrupting connectivity between services. |
Learned from reading this issue that |
Hi. Should we re-open this or start a separate issue? The IT tests for some languages (at least Node.js and Java) are still failing occasionally with:
In a Slack discussion, @graphaelli noticed that some of these errors are with the
The README for opbeans-app.sh says (emphasis mine):
Given that, it doesn't strike me as totally valid to just exclude starting Kibana from this test script. Thoughts? |
@cachedout Regarding #1245 setting
However, I'm pretty ignorant on the intent of "opbeans-app.sh", so if others think excluding Kibana here is fine, then cool. Has there been any understanding of or digging into why the Kibana container is not starting -- or not starting fast enough? |
Hi @trentm There's been a pretty significant amount of debugging around the Kibana startup issue. Much of it is linked to this issue in comments above.
The largest problem that we have here is that it's very hard to replicate this problem locally, so it's hard to get an environment where we can get the Kibana folks usable information. (I've seen it occur once locally out of about 200 tries.) To make matters worse, I can't even replicate it using the APM Integration Test suite on a manually provisioned worker. The only place we ever see this is in the CI pipeline for the APM Integration Test suite. We've done things like trying to turn up the logging, but haven't been able to capture any additional information.
Kibana is tracking a couple of issues right now which seem to present themselves in similar ways. Here is the most recent and probably the most relevant:
It's certainly possible that Kibana will fix this upstream and that will resolve the issue. However, I also noticed on our end that we originally wrote test startup scripts for agents which included
I will cc: @v1v here, who originally wrote the README entry that you reference. I'm not sure whether he meant "the Elastic stack" (ES+KB, etc.) or if by "stack" he meant the set of opbeans applications.
Regardless, it's certainly fine to shelve this PR and hope that Kibana fixes this upstream, but I don't think there's much more we can do from our end outside of working around it as this PR does. |
The original PR was #763. The idea of the validation was to ensure all the pieces work as expected:
The reason was to detect breaking changes earlier in case the APM agent version affected the opbeans app. Though, there are no tests that verify the Opbeans App in the APM-ITs; the APM agent team didn't see any reason to add any tests. If the latest |
Thanks for the information. As long as none of the APM agent or Opbeans App integration tests are testing central config (that would require a Kibana, or later Fleet server, I guess), then it sounds good to me to add
Here is my understanding of what is/was failing. Apologies if this was already known and/or written down somewhere -- I didn't see it. Don't feel obliged to answer the questions I have here. They are only things I would investigate further if issues persist after merging #1245
Aside: Interestingly it seems that docker-compose config version 3 has dropped support for having a
|
We are now tracking what we believe to be the true cause of this issue upstream in the Kibana repo: |
It seems it's the apm-server container, which is not healthy on compose startup:
CI Job: https://apm-ci.elastic.co/blue/organizations/jenkins/apm-integration-test-downstream/detail/master/9929/pipeline
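For anyone trying to reproduce the "container is not healthy on compose startup" symptom outside of CI, a minimal sketch of a readiness poll is shown below. It is not the check the integration test suite actually uses. The function names `wait_until_healthy` and `http_ok` are hypothetical, and the URL in the usage comment assumes an apm-server reachable on its default port 8200 on localhost.

```python
import time
import urllib.request
from typing import Callable


def wait_until_healthy(
    check: Callable[[], bool],
    timeout: float = 120.0,
    interval: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll `check` until it returns True or `timeout` seconds elapse.

    Connection-level failures (OSError and subclasses, which include
    urllib.error.URLError) are treated as "not ready yet".
    """
    deadline = clock() + timeout
    while True:
        try:
            if check():
                return True
        except OSError:
            pass  # service not accepting connections yet
        if clock() >= deadline:
            return False
        sleep(interval)


def http_ok(url: str, request_timeout: float = 5.0) -> bool:
    """Hypothetical helper: True if a GET on `url` answers with HTTP 200."""
    with urllib.request.urlopen(url, timeout=request_timeout) as response:
        return response.status == 200


# Usage (assumed URL, adjust to the service under test):
#   ready = wait_until_healthy(lambda: http_ok("http://localhost:8200/"))
```

Logging how long the poll takes on each run may help tell "never starts" apart from "starts, but slower than the compose healthcheck allows".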