Failing test: Chrome X-Pack Observability UI Functional Tests.x-pack/test/observability_functional/apps/observability/pages/overview/alert_table·ts - ObservabilityApp Observability overview Without alerts navigate and open alerts section #140507

kibanamachine · 2022-09-12T13:26:34Z

A test failed on a tracked branch

Error: retry.try timeout: TimeoutError: The element [data-test-subj="internalAlertsPageLoading"] was still present when it should have disappeared.
Wait timed out after 10165ms
    at /var/lib/buildkite-agent/builds/kb-n2-4-spot-cf11dbdfc244df0c/elastic/kibana-on-merge/kibana/node_modules/selenium-webdriver/lib/webdriver.js:906:17
    at processTicksAndRejections (node:internal/process/task_queues:96:5)
    at onFailure (test/common/services/retry/retry_for_success.ts:17:9)
    at retryForSuccess (test/common/services/retry/retry_for_success.ts:59:13)
    at RetryService.try (test/common/services/retry/retry.ts:31:12)
    at waitForAlertsTableLoadingToDisappear (x-pack/test/functional/services/observability/overview/common.ts:57:5)
    at Object.openAlertsSectionAndWaitToAppear (x-pack/test/functional/services/observability/overview/common.ts:70:5)
    at Context.<anonymous> (x-pack/test/observability_functional/apps/observability/pages/overview/alert_table.ts:35:9)
    at Object.apply (node_modules/@kbn/test/target_node/src/functional_test_runner/lib/mocha/wrap_function.js:87:16)

First failure: CI Build - main

The text was updated successfully, but these errors were encountered:

kibanamachine · 2022-09-12T14:45:31Z

New failure: CI Build - main

elasticmachine · 2022-09-12T14:55:04Z

Pinging @elastic/actionable-observability (Team: Actionable Observability)

spalger · 2022-09-12T14:55:48Z

@maryam-saeidi looks like this test is failing already and was added in #140024, going to skip, would you mind checking out the flakiness of this?

spalger · 2022-09-12T14:55:52Z

/skip

kibanamachine · 2022-09-12T14:56:23Z

Skipped

main: 30af54d

kibanamachine · 2022-09-12T15:06:34Z

New failure: CI Build - main

maryam-saeidi · 2022-09-12T15:09:25Z

@spalger Sure, I'll check it.

maryam-saeidi · 2022-09-19T12:16:28Z

Result of first round of investigation:

I’ve added a log in the else section to see what is the response in case a test fails and we end up there:

Here is the result in the log for one of the failed functional test: (after selecting log for a failed test, search for =======> else response: in the mentioned link)

{
  "id": "FmNZS3dJQ0FJUjVtVXlZTW1UVnhxU0EbVzBaR2xRcnVRTk9SQjN6VEkzSFk1Zzo3NDI2",
  "rawResponse": {
    "took": 6,
    "timed_out": false,
    "terminated_early": false,
    "num_reduce_phases": 0,
    "_shards": {
      "total": 3,
      "successful": 0,
      "skipped": 0,
      "failed": 0
    },
    "hits": {
      "total": 0,
      "max_score": null,
      "hits": []
    }
  },
  "isPartial": true,
  "isRunning": true,
  "total": 3,
  "loaded": 0,
  "isRestored": false
}

So it is not a complete response based on this condition:

isCompleteResponse: Boolean(response && response.isRunning && response.isPartial)

And it is also not a failure based on this condition:

isErrorResponse: !response || !response.rawResponse || (!response.isRunning && !!response.isPartial)

Hence ending up in else branch.

Now I am checking the meaning of isPartial and isRunning. Also, I ran the test again without disabling loader in case we are in else to see if it will try to fetch the alerts again.
Also, I want to take a look at other alerts table tests to see if they are enabled and working properly.

maryam-saeidi · 2022-09-19T12:45:27Z

When I removed disabling loading from else, the test failed less (only 1 in 10) and in the failed test see this log:

{
  "rawResponse": {
    "took": 5,
    "timed_out": false,
    "_shards": {
      "total": 3,
      "successful": 3,
      "skipped": 2,
      "failed": 0
    },
    "hits": {
      "total": 0,
      "max_score": null,
      "hits": []
    }
  },
  "isPartial": false,
  "isRunning": false,
  "total": 3,
  "loaded": 3,
  "isRestored": false
}

So it seems like a complete response, the question now is why in the screenshot, we still have a loading without alerts table:

It's good to know that the API debug log happens at [00:02:29] and the test fails at [00:03:00]:

maryam-saeidi · 2022-09-20T11:40:09Z

When I was checking other alerts table functional tests, I noticed that they are fine and this issue only happens for the overview page, which the difference is having the table in an accordion 😕

After discussing it with @XavierM, he had 2 proposals to check:

What if instead of waiting for the loading state to stop, try to check the existence of the component that we want to check later?
What if we don't need to open the accordion during the functional test?

For 1, I don't think that would be an issue, because, in the snapshot, we still see the loading state, so other parts of the table will not be visible. Also, I remember, I had an issue with the test failing so I added the step to check if loading is still there, so this step was added because of the failure.

For 2, I changed the code to always have the accordion open and see what will happen in that case, and now tests are passing 🤯 (https://buildkite.com/elastic/kibana-flaky-test-suite-runner/builds/1244)
So I assume this difference is causing an issue @XavierM @spalger any idea how to fix this flakiness?

spalger · 2022-09-21T13:40:56Z

I can't imagine option 1 helping since the test is failing after 30 seconds and then accordion is still loading. Last time we talked we assumed that there was some flaw in the UI that lead to the accordion never exiting the loading state in some cases, have you proved that isn't the case @maryam-saeidi? Maybe we can just try extending the timeout to 2 minutes? If the test still fails I really don't think it's the test and the two solutions you've proposed are implying that the problem is with the tests, not the UI.

maryam-saeidi · 2022-09-21T15:44:24Z

@spalger yes, I've added a log based on our discussion and based on the screenshot here, it seems we fetch the data correctly in the failed test but the loading is still there even after 30 secs.

The strange part is, the tests are passing when we load the table at the page load time but if we put the table in accordion and the accordion is closed at the page load time, then we have the flaky test after opening the accordion. 😖

spalger · 2022-09-21T20:02:21Z

the tests are passing when we load the table at the page load time but if we put the table in accordion and the accordion is closed at the page load time

Still sounds like a UI issue to me, one that users are likely to encounter in conditions similar to CI

maryam-saeidi · 2022-09-23T12:38:04Z

When I logged the isLoading variable, it seems the request is successful and the alert response is back but for some reason, the component sets the isLoading to true again: (here)

spalger · 2022-09-23T13:35:37Z

Nice find @maryam-saeidi! Have you inspected all the places where the isLoading state is set in the component?

maryam-saeidi · 2022-09-23T13:41:41Z

There is only one place, I've added more logs to see whether I can identify what change causes extra loading for the component.
The most confusing part is the fact that it only happens if the alert component is in an accordion and if that accordion is initially closed on the page. It does not make sense to me at all, have you heard of an issue caused by a component being hidden?! Can rendering be different when the component is hidden?
I also tried slow network locally for many times but it didn't happen there. (It happens 1 out of 10 times in functional tests)

spalger · 2022-09-23T13:46:06Z

Sounds to me like maybe a useEffect in something that's rendered inside that accordion might have a race condition. This sounds like a classic race condition without knowing more about exactly what has access to what. If you want to send me some links to components I can look around and see if I can spot anything suspicious.

maryam-saeidi · 2022-09-23T13:49:25Z

Ah, that would be great, here you can find the 2 files that I've added logs for debugging:
https://github.com/elastic/kibana/pull/140638/files

Component: x-pack/plugins/triggers_actions_ui/public/application/sections/alerts_table/alerts_table_state.tsx
Hook for fetching data: x-pack/plugins/triggers_actions_ui/public/application/sections/alerts_table/hooks/use_fetch_alerts.tsx

maryam-saeidi · 2022-09-23T14:07:33Z

It seems this difference is causing the re-rendering: (link)

@spalger if it is a race condition, shouldn't it also happen when the component is outside of the accordion?

spalger · 2022-09-23T16:26:12Z

@maryam-saeidi Ok, I think I found it: https://github.com/elastic/kibana/blob/main/x-pack/plugins/triggers_actions_ui/public/application/sections/alerts_table/hooks/use_fetch_alerts.tsx#L175-L176

If asyncSearch() or refetch.current() is called when data && data.search doesn't resolve to true then loading will be set to true even though a request is not being sent.

I think if we move the dispatch({ type: 'loading', loading: true }); line into the if() body then it should only set the loading state if it's actually going to make a request.

maryam-saeidi · 2022-09-26T09:00:50Z

@spalger Wow, interesting! Let me give it a try.

maryam-saeidi · 2022-09-27T09:55:55Z

@spalger unfortunately, it still has the same issue. (flaky test runner)

The most confusing part is the fact that the issue only happens when the alerts table is in an accordion which is closed at the start. So I am wondering what is different in the initially hidden state that causes this issue.

spalger · 2022-09-27T14:27:09Z

I am wondering what is different in the initially hidden state that causes this issue.

I'm pretty sure that when the accordion is hidden the content isn't rendered, so the primary change is that the content is being rendered some time after the page was rendered, but something it's happening a little later and sometimes it's happening a little sooner based on the speed of selenium opening the accordion.

Either way, I think focusing on how the loading state is shown, and tracking back to how that is occurring, is the most fruitful path to follow.

I realized I had stopped looking for how the loading state is represented when I saw the isLoading variable, but that's not the only think that causes the loading indicator to be rendered. This is where the loading indicator is rendered:

kibana/x-pack/plugins/triggers_actions_ui/public/application/sections/alerts_table/alerts_table_state.tsx

Lines 300 to 302 in b854cd8

    
           {(isLoading || isBrowserFieldDataLoading) && ( 
        
             <EuiProgress size="xs" color="accent" data-test-subj="internalAlertsPageLoading" /> 
        
           )}

The other variable used there, isBrowserFieldDataLoading, is coming from this hook:

kibana/x-pack/plugins/triggers_actions_ui/public/application/sections/alerts_table/hooks/use_fetch_browser_fields_capabilities.tsx

Line 27 in d5dd241

export const useFetchBrowserFieldCapabilities = ({

In that hook it looks like we have another case where an error will cause the component to always be in a "loading" state.

Do we have any insight into how that hook is working?

maryam-saeidi · 2022-09-27T14:43:23Z

isBrowserFieldDataLoading was added recently and the issue was happening even before introducing isBrowserFieldDataLoading.

spalger · 2022-09-27T15:43:16Z

Okay, but that doesn't change the fact that the component is rendering a loading component indefinitely so the answer is in the logic somewhere...

maryam-saeidi · 2022-09-28T08:34:43Z

Yes, I see your point.
I also want to check whether the request was actually made and if it was aborted.
Based on the logs, aborted was false when the issue happened.

I will check it later this week and keep you posted about the result.

maryam-saeidi · 2022-10-05T15:41:46Z

It seems this fix also solved the flaky test issue.

Flaky test runner:

I am still curious why the second request is not getting any response but I think the fix is avoiding this second request since the time hasn't been changed, so the second request should not trigger the re-fetching as it was stated here.

cc @spalger, @XavierM, @jcger

…#145828) Fixes #140507 This issue was fixed in the refactoring that I've done [here](https://github.com/elastic/kibana/pull/143840/files#diff-ab4cbe22bf9a1c9bbaf3a3cb32b5aa5b0c8b8e33c34b88ed3c99dbde6fedadb9L236). Now we only send request when a filter changes not on every render. (More info in the ticket) [Flaky test runner link](https://buildkite.com/elastic/kibana-flaky-test-suite-runner/builds?branch=refs%2Fpull%2F145828%2Fhead)

kibanamachine added the failed-test A test failure on a tracked branch, potentially flaky-test label Sep 12, 2022

botelastic bot added the needs-team Issues missing a team label label Sep 12, 2022

spalger added the Team: Actionable Observability - DEPRECATED For Observability Alerting and SLOs use "Team:obs-ux-management", for AIops "Team:obs-knowledge" label Sep 12, 2022

botelastic bot removed the needs-team Issues missing a team label label Sep 12, 2022

kibanamachine added blocker skipped-test v8.5.0 labels Sep 12, 2022

kibanamachine added a commit that referenced this issue Sep 12, 2022

skip failing test suite (#140507)

30af54d

maryam-saeidi self-assigned this Sep 12, 2022

yngrdyn pushed a commit to yngrdyn/kibana that referenced this issue Sep 13, 2022

skip failing test suite (elastic#140507)

fc8d83e

maryam-saeidi mentioned this issue Sep 13, 2022

[Actionable Observability] Fix overview functional test #140638

Closed

maryam-saeidi mentioned this issue Nov 21, 2022

[Actionable observability] Fix observability overview functional test #145828

Merged

maryam-saeidi closed this as completed in #145828 Nov 22, 2022

kibanamachine removed blocker skipped-test v8.5.0 labels Nov 22, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Failing test: Chrome X-Pack Observability UI Functional Tests.x-pack/test/observability_functional/apps/observability/pages/overview/alert_table·ts - ObservabilityApp Observability overview Without alerts navigate and open alerts section #140507

Failing test: Chrome X-Pack Observability UI Functional Tests.x-pack/test/observability_functional/apps/observability/pages/overview/alert_table·ts - ObservabilityApp Observability overview Without alerts navigate and open alerts section #140507

kibanamachine commented Sep 12, 2022 •

edited

Loading

kibanamachine commented Sep 12, 2022

elasticmachine commented Sep 12, 2022

spalger commented Sep 12, 2022 •

edited

Loading

spalger commented Sep 12, 2022

kibanamachine commented Sep 12, 2022

kibanamachine commented Sep 12, 2022

maryam-saeidi commented Sep 12, 2022

maryam-saeidi commented Sep 19, 2022 •

edited

Loading

maryam-saeidi commented Sep 19, 2022 •

edited

Loading

maryam-saeidi commented Sep 20, 2022 •

edited

Loading

spalger commented Sep 21, 2022 •

edited

Loading

maryam-saeidi commented Sep 21, 2022

spalger commented Sep 21, 2022

maryam-saeidi commented Sep 23, 2022

spalger commented Sep 23, 2022

maryam-saeidi commented Sep 23, 2022

spalger commented Sep 23, 2022

maryam-saeidi commented Sep 23, 2022 •

edited

Loading

maryam-saeidi commented Sep 23, 2022 •

edited

Loading

spalger commented Sep 23, 2022

maryam-saeidi commented Sep 26, 2022

maryam-saeidi commented Sep 27, 2022

spalger commented Sep 27, 2022

maryam-saeidi commented Sep 27, 2022

spalger commented Sep 27, 2022

maryam-saeidi commented Sep 28, 2022

maryam-saeidi commented Oct 5, 2022

Failing test: Chrome X-Pack Observability UI Functional Tests.x-pack/test/observability_functional/apps/observability/pages/overview/alert_table·ts - ObservabilityApp Observability overview Without alerts navigate and open alerts section #140507

Failing test: Chrome X-Pack Observability UI Functional Tests.x-pack/test/observability_functional/apps/observability/pages/overview/alert_table·ts - ObservabilityApp Observability overview Without alerts navigate and open alerts section #140507

Comments

kibanamachine commented Sep 12, 2022 • edited Loading

kibanamachine commented Sep 12, 2022

elasticmachine commented Sep 12, 2022

spalger commented Sep 12, 2022 • edited Loading

spalger commented Sep 12, 2022

kibanamachine commented Sep 12, 2022

kibanamachine commented Sep 12, 2022

maryam-saeidi commented Sep 12, 2022

maryam-saeidi commented Sep 19, 2022 • edited Loading

maryam-saeidi commented Sep 19, 2022 • edited Loading

maryam-saeidi commented Sep 20, 2022 • edited Loading

spalger commented Sep 21, 2022 • edited Loading

maryam-saeidi commented Sep 21, 2022

spalger commented Sep 21, 2022

maryam-saeidi commented Sep 23, 2022

spalger commented Sep 23, 2022

maryam-saeidi commented Sep 23, 2022

spalger commented Sep 23, 2022

maryam-saeidi commented Sep 23, 2022 • edited Loading

maryam-saeidi commented Sep 23, 2022 • edited Loading

spalger commented Sep 23, 2022

maryam-saeidi commented Sep 26, 2022

maryam-saeidi commented Sep 27, 2022

spalger commented Sep 27, 2022

maryam-saeidi commented Sep 27, 2022

spalger commented Sep 27, 2022

maryam-saeidi commented Sep 28, 2022

maryam-saeidi commented Oct 5, 2022

kibanamachine commented Sep 12, 2022 •

edited

Loading

spalger commented Sep 12, 2022 •

edited

Loading

maryam-saeidi commented Sep 19, 2022 •

edited

Loading

maryam-saeidi commented Sep 19, 2022 •

edited

Loading

maryam-saeidi commented Sep 20, 2022 •

edited

Loading

spalger commented Sep 21, 2022 •

edited

Loading

maryam-saeidi commented Sep 23, 2022 •

edited

Loading

maryam-saeidi commented Sep 23, 2022 •

edited

Loading