Failing test: Chrome X-Pack UI Functional Tests.x-pack/test/functional/apps/lens/lens_reporting·ts - lens app lens reporting should not cause PDF reports to fail #59229

Closed
kibanamachine opened this issue Mar 3, 2020 · 38 comments
Assignees
Labels
failed-test A test failure on a tracked branch, potentially flaky-test Feature:Lens Team:Visualizations Visualization editors, elastic-charts and infrastructure

Comments

@kibanamachine
Contributor

kibanamachine commented Mar 3, 2020

A test failed on a tracked branch

Error: retry.try timeout: TimeoutError: Waiting for element to be located By(css selector, [data-test-subj="downloadCompletedReportButton"])
Wait timed out after 61133ms
    at /dev/shm/workspace/kibana/node_modules/selenium-webdriver/lib/webdriver.js:841:17
    at process._tickCallback (internal/process/next_tick.js:68:7)
    at onFailure (/dev/shm/workspace/kibana/test/common/services/retry/retry_for_success.ts:28:9)
    at retryForSuccess (/dev/shm/workspace/kibana/test/common/services/retry/retry_for_success.ts:68:13)

First failure: Jenkins Build

@kibanamachine kibanamachine added the failed-test A test failure on a tracked branch, potentially flaky-test label Mar 3, 2020
@elasticmachine
Contributor

Pinging @elastic/kibana-test-triage (failed-test)

@kibanamachine
Contributor Author

New failure: Jenkins Build

@kibanamachine
Contributor Author

New failure: Jenkins Build

@kibanamachine
Contributor Author

New failure: Jenkins Build

@timroes timroes added Feature:Lens Team:Visualizations Visualization editors, elastic-charts and infrastructure labels Mar 3, 2020
@elasticmachine
Contributor

Pinging @elastic/kibana-app (Team:KibanaApp)

@LeeDr
Contributor

LeeDr commented Mar 3, 2020

Some log details:

[00:24:02]                 │ proc [kibana]   log   [22:32:26.674] [error][execute][k7cgwe3m1a5vc2c687apc7mw][printable_pdf][reporting] Error: TypeError: Cannot read property 'unsubscribe' of undefined
[00:24:02]                 │ proc [kibana]     at SafeSubscriber._next (http://localhost:6141/bundles/kibana.bundle.js:2:2082626)
[00:24:02]                 │ proc [kibana]     at SafeSubscriber.__tryOrUnsub (http://localhost:6141/bundles/plugin/licensing/licensing.plugin.js:15:11563)
[00:24:02]                 │ proc [kibana]     at SafeSubscriber.next (http://localhost:6141/bundles/plugin/licensing/licensing.plugin.js:15:10003)
[00:24:02]                 │ proc [kibana]     at Subscriber._next (http://localhost:6141/bundles/plugin/licensing/licensing.plugin.js:15:8392)
[00:24:02]                 │ proc [kibana]     at Subscriber.next (http://localhost:6141/bundles/plugin/licensing/licensing.plugin.js:15:7989)
[00:24:02]                 │ proc [kibana]     at MapSubscriber._next (http://localhost:6141/bundles/plugin/licensing/licensing.plugin.js:15:27303)
[00:24:02]                 │ proc [kibana]     at MapSubscriber.Subscriber.next (http://localhost:6141/bundles/plugin/licensing/licensing.plugin.js:15:7989)
[00:24:02]                 │ proc [kibana]     at FilterSubscriber._next (http://localhost:6141/bundles/plugin/licensing/licensing.plugin.js:15:32187)
[00:24:02]                 │ proc [kibana]     at FilterSubscriber.Subscriber.next (http://localhost:6141/bundles/plugin/licensing/licensing.plugin.js:15:7989)
[00:24:02]                 │ proc [kibana]     at PairwiseSubscriber._next (http://localhost:6141/bundles/plugin/licensing/licensing.plugin.js:15:245063)
[00:24:02]                 │ proc [kibana]   log   [22:32:26.680] [error][error][esqueue][queue-worker][reporting] k7cg1mj51a5vc2c6873y0mnr - Failure occurred on job k7cgwe3m1a5vc2c687apc7mw: Error: TypeError: Cannot read property 'unsubscribe' of undefined
[00:24:02]                 │ proc [kibana]     at SafeSubscriber._next (http://localhost:6141/bundles/kibana.bundle.js:2:2082626)
[00:24:02]                 │ proc [kibana]     at SafeSubscriber.__tryOrUnsub (http://localhost:6141/bundles/plugin/licensing/licensing.plugin.js:15:11563)
[00:24:02]                 │ proc [kibana]     at SafeSubscriber.next (http://localhost:6141/bundles/plugin/licensing/licensing.plugin.js:15:10003)
[00:24:02]                 │ proc [kibana]     at Subscriber._next (http://localhost:6141/bundles/plugin/licensing/licensing.plugin.js:15:8392)
[00:24:02]                 │ proc [kibana]     at Subscriber.next (http://localhost:6141/bundles/plugin/licensing/licensing.plugin.js:15:7989)
[00:24:02]                 │ proc [kibana]     at MapSubscriber._next (http://localhost:6141/bundles/plugin/licensing/licensing.plugin.js:15:27303)
[00:24:02]                 │ proc [kibana]     at MapSubscriber.Subscriber.next (http://localhost:6141/bundles/plugin/licensing/licensing.plugin.js:15:7989)
[00:24:02]                 │ proc [kibana]     at FilterSubscriber._next (http://localhost:6141/bundles/plugin/licensing/licensing.plugin.js:15:32187)
[00:24:02]                 │ proc [kibana]     at FilterSubscriber.Subscriber.next (http://localhost:6141/bundles/plugin/licensing/licensing.plugin.js:15:7989)
[00:24:02]                 │ proc [kibana]     at PairwiseSubscriber._next (http://localhost:6141/bundles/plugin/licensing/licensing.plugin.js:15:245063)
[00:24:02]                 │ proc [kibana]   log   [22:32:26.681] [warning][esqueue][queue-worker][reporting] k7cg1mj51a5vc2c6873y0mnr - Failing job k7cgwe3m1a5vc2c687apc7mw
[00:24:02]                 │ proc [kibana]   log   [22:32:26.714] [error][execute][k7cgwe3m1a5vc2c687apc7mw][printable_pdf][reporting] waitForSelector .application failed on http://localhost:6141/app/kibana?_t=1583274723695#/dashboard/c27849f0-e523-11e9-9af5-2b261e1eb063?_g=(filters:!(),refreshInterval:(pause:!t,value:0),time:(from:now-15m,to:now))&_a=(description:'',filters:!(),fullScreenMode:!f,options:(hidePanelTitles:!f,useMargins:!t),panels:!((embeddableConfig:(),gridData:(h:15,i:e5966143-f050-40eb-b4e7-f94469c1426c,w:24,x:0,y:0),id:'74b0f140-e523-11e9-9af5-2b261e1eb063',panelIndex:e5966143-f050-40eb-b4e7-f94469c1426c,type:lens,version:'8.0.0-SNAPSHOT'),(embeddableConfig:(),gridData:(h:15,i:f9ff2513-2de9-4944-a8d1-a9fead4b020d,w:24,x:24,y:0),id:'84320f00-e523-11e9-9af5-2b261e1eb063',panelIndex:f9ff2513-2de9-4944-a8d1-a9fead4b020d,type:lens,version:'8.0.0-SNAPSHOT'),(embeddableConfig:(),gridData:(h:15,i:b62d7d36-83ca-415b-8971-34d891b09a1a,w:24,x:0,y:15),id:'9325b9d0-e523-11e9-9af5-2b261e1eb063',panelIndex:b62d7d36-83ca-415b-8971-34d891b09a1a,type:lens,version:'8.0.0-SNAPSHOT')),query:(language:kuery,query:''),timeRestore:!f,title:'Lens%20reportz',viewMode:view)&forceNow=2020-03-03T22:32:17.505Z
[00:24:02]                 │ proc [kibana]   log   [22:32:26.719] [error][browser-driver][execute][k7cgwe3m1a5vc2c687apc7mw][printable_pdf][reporting] error deleting user data directory at [/tmp/chromium-IdmKxT]: [Error: Cannot delete files/directories outside the current working directory. Can be overridden with the `force` option.]
[00:24:02]                 │ proc [kibana]   log   [22:32:26.720] [info][esqueue][queue-worker][reporting] k7cg1mj51a5vc2c6873y0mnr - Job marked as failed: /.reporting-2020.03.01/k7cgwe3m1a5vc2c687apc7mw
[00:24:54]                 │ debg --- retry.try error: Waiting for element to be located By(css selector, [data-test-subj="downloadCompletedReportButton"])
[00:24:54]                 │      Wait timed out after 61240ms
[00:24:54]                 │ debg TestSubjects.getAttribute(downloadCompletedReportButton, href)
[00:24:54]                 │ debg TestSubjects.find(downloadCompletedReportButton)
[00:24:54]                 │ debg Find.findByCssSelector('[data-test-subj="downloadCompletedReportButton"]') with timeout=60000
[00:25:55]                 │ debg --- retry.try error: Waiting for element to be located By(css selector, [data-test-subj="downloadCompletedReportButton"])
[00:25:55]                 │      Wait timed out after 61186ms
[00:25:56]                 │ info Taking screenshot "/dev/shm/workspace/kibana/x-pack/test/functional/screenshots/failure/lens app  lens reporting should not cause PDF reports to fail.png"
[00:25:56]                 │ info Current URL is: http://localhost:6141/app/kibana#/dashboard/c27849f0-e523-11e9-9af5-2b261e1eb063?_g=(filters:!(),refreshInterval:(pause:!t,value:0),time:(from:now-15m,to:now))&_a=(description:%27%27,filters:!(),fullScreenMode:!f,options:(hidePanelTitles:!f,useMargins:!t),panels:!((embeddableConfig:(),gridData:(h:15,i:e5966143-f050-40eb-b4e7-f94469c1426c,w:24,x:0,y:0),id:%2774b0f140-e523-11e9-9af5-2b261e1eb063%27,panelIndex:e5966143-f050-40eb-b4e7-f94469c1426c,type:lens,version:%278.0.0-SNAPSHOT%27),(embeddableConfig:(),gridData:(h:15,i:f9ff2513-2de9-4944-a8d1-a9fead4b020d,w:24,x:24,y:0),id:%2784320f00-e523-11e9-9af5-2b261e1eb063%27,panelIndex:f9ff2513-2de9-4944-a8d1-a9fead4b020d,type:lens,version:%278.0.0-SNAPSHOT%27),(embeddableConfig:(),gridData:(h:15,i:b62d7d36-83ca-415b-8971-34d891b09a1a,w:24,x:0,y:15),id:%279325b9d0-e523-11e9-9af5-2b261e1eb063%27,panelIndex:b62d7d36-83ca-415b-8971-34d891b09a1a,type:lens,version:%278.0.0-SNAPSHOT%27)),query:(language:kuery,query:%27%27),timeRestore:!f,title:%27Lens%20reportz%27,viewMode:view)
[00:25:56]                 │ info Saving page source to: /dev/shm/workspace/kibana/x-pack/test/functional/failure_debug/html/lens app  lens reporting should not cause PDF reports to fail.html
[00:25:56]                 └- ✖ fail: "lens app  lens reporting should not cause PDF reports to fail"

@LeeDr
Contributor

LeeDr commented Mar 3, 2020

And a screenshot:
image

@spalger
Contributor

spalger commented Mar 3, 2020

This has been low-key failing for a while and is hard-core failing now

last 30 days:
image

Skipped

master: d9a05af
7.x/7.7: 02790fd

spalger added a commit that referenced this issue Mar 3, 2020
spalger added a commit that referenced this issue Mar 3, 2020
(cherry picked from commit d9a05af)
@tsullivan
Member

Hi, the test is failing because the dashboard is rendering with a TypeError logged, and when Reporting sees that, it thinks the page has failed.

See:

const pageError$ = Rx.fromEvent<Error>(page, 'error').pipe(mergeMap(err => Rx.throwError(err)));

If I ignore the exit$ observable during the screen capturing in Reporting, the test will pass.

  1. We should soften the exit case of Reporting so that it captures the screen even if there are errors on the page.
  2. We should find out why "TypeError: Cannot read property 'unsubscribe' of undefined" is happening in the Dashboard.
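
The difference between failing on a page error and merely recording it can be sketched as follows. This is a minimal illustration, not the Reporting plugin's code: `FakePage`, `captureWithPolicy`, and `ErrorPolicy` are hypothetical names, and a plain listener list stands in for the RxJS observable so the snippet has no dependencies.

```typescript
// Hypothetical stand-in for a browser page that reports uncaught errors.
type ErrorListener = (err: Error) => void;

class FakePage {
  private listeners: ErrorListener[] = [];

  // Registers a listener and returns a function that removes it again.
  onError(listener: ErrorListener): () => void {
    this.listeners.push(listener);
    return () => {
      this.listeners = this.listeners.filter((l) => l !== listener);
    };
  }

  emitError(err: Error): void {
    this.listeners.forEach((l) => l(err));
  }
}

type ErrorPolicy = 'fail' | 'log';

// Runs `capture` while watching the page for errors.
// 'fail': the first page error rejects the whole job.
// 'log':  page errors are collected and the capture may still finish.
async function captureWithPolicy(
  page: FakePage,
  capture: () => Promise<string>,
  policy: ErrorPolicy
): Promise<{ result: string; pageErrors: Error[] }> {
  const pageErrors: Error[] = [];
  let rejectOnError: (err: Error) => void = () => {};
  // The executor runs synchronously, so rejectOnError is set before any error can arrive.
  const failed = new Promise<never>((_, reject) => {
    rejectOnError = reject;
  });

  const unsubscribe = page.onError((err) => {
    pageErrors.push(err);
    if (policy === 'fail') rejectOnError(err);
  });

  try {
    const result = await Promise.race([capture(), failed]);
    return { result, pageErrors };
  } finally {
    // Always detach the listener, whether the capture succeeded or not.
    unsubscribe();
  }
}
```

With the `'log'` policy the caller still receives `pageErrors` and can decide whether to attach a warning to the finished report.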

The unsubscribe of undefined is here:

@wylieconlon
Contributor

cc @jgowdyelastic who has worked on the ML licensing code #59275

@wylieconlon
Contributor

It looks like this test is a real failure, not a flaky test. Master is currently broken.

@wylieconlon
Contributor

Fixed by #59365

@kibanamachine
Contributor Author

New failure: Jenkins Build

@kibanamachine
Contributor Author

New failure: Jenkins Build

@kibanamachine
Contributor Author

New failure: Jenkins Build

@spalger spalger added Team:Reporting Services and removed Feature:Lens failed-test A test failure on a tracked branch, potentially flaky-test labels Mar 24, 2020
@elasticmachine
Contributor

Pinging @elastic/kibana-reporting-services (Team:Reporting Services)

@kibanamachine kibanamachine reopened this Apr 19, 2021
@kibanamachine
Contributor Author

New failure: Jenkins Build

@flash1293 flash1293 added Feature:Lens Team:Visualizations Visualization editors, elastic-charts and infrastructure labels Apr 20, 2021
@flash1293 flash1293 self-assigned this Apr 20, 2021
@flash1293
Contributor

flash1293 commented Apr 20, 2021

Failed because of "network changed" error:

[00:57:10] │ERROR browser[SEVERE] https://38b80fbd79fb4c91bae06b4642d4d093.apm.us-east-1.aws.cloud.es.io/intake/v2/rum/events - Failed to load resource: net::ERR_NETWORK_CHANGED
[00:57:10] │ERROR browser[SEVERE] http://localhost:61131/api/saved_objects/_find?fields=title&per_page=10000&type=index-pattern - Failed to load resource: net::ERR_NETWORK_CHANGED
[00:57:10] │ERROR browser[SEVERE] http://localhost:61131/internal/security/me - Failed to load resource: net::ERR_NETWORK_CHANGED

Flaky since end of March across all branches >= 7.12:
Screenshot 2021-04-20 at 09 52 07

@spalger what's usually the cause of network changed problems? Seems like something bad is happening to the Kibana under test.

@flash1293
Contributor

flash1293 commented Apr 21, 2021

The latest failure of this is another kind of error:

06:48:20 │ proc [kibana] log [04:48:20.533] [error][browser-driver][execute-job][headless-browser-console][knqz6mbs1zcf25040c442k34][plugins][printablePdf][printable_pdf][reporting][runTask] Refused to execute inline script because it violates the following Content Security Policy directive: "script-src 'unsafe-eval' 'self'". Either the 'unsafe-inline' keyword, a hash ('sha256-P5polb1UreUSOe5V/Pv7tc+yeZuJXiOi/3fqhGsU7BE='), or a nonce ('nonce-...') is required to enable inline execution.
06:48:20 │ proc [kibana]
06:48:23 │ proc [kibana] log [04:48:22.786] [error][execute-job][knqz6mbs1zcf25040c442k34][plugins][printablePdf][printable_pdf][reporting][runTask] Reporting encountered an error on the page: Error: Error: Alert type "xpack.ml.anomaly_detection_alert" is not registered.
06:48:23 │ proc [kibana] at loadAlertType (http://localhost:6111/42300/bundles/plugin/alerting/8.0.0/alerting.plugin.js:2:61358)
06:48:23 │ proc [kibana] at async Object.registerNavigation (http://localhost:6111/42300/bundles/plugin/alerting/8.0.0/alerting.plugin.js:2:61991)
06:48:23 │ proc [kibana] log [04:48:22.788] [error][plugins][reporting][runTask] [Error: Scheduling retry. Retries remaining: 1.: Reporting encountered an error on the page: Error: Error: Alert type "xpack.ml.anomaly_detection_alert" is not registered.
06:48:23 │ proc [kibana] at loadAlertType (http://localhost:6111/42300/bundles/plugin/alerting/8.0.0/alerting.plugin.js:2:61358)
06:48:23 │ proc [kibana] at async Object.registerNavigation (http://localhost:6111/42300/bundles/plugin/alerting/8.0.0/alerting.plugin.js:2:61991)]
06:48:23 │ proc [kibana] log [04:48:22.789] [info][plugins][reporting][runTask] Rescheduling knqz6mbs1zcf25040c442k34 to retry after error.
06:48:23 │ proc [kibana] log [04:48:22.792] [error][execute-job][knqz6mbs1zcf25040c442k34][plugins][printablePdf][printable_pdf][reporting][runTask] Error: Protocol error (Runtime.callFunctionOn): Target closed.
06:48:23 │ proc [kibana] at /dev/shm/workspace/kibana-build-xpack-1/node_modules/puppeteer/lib/cjs/puppeteer/common/Connection.js:208:63
06:48:23 │ proc [kibana] at new Promise (<anonymous>)

@tsullivan do you have an idea what could cause this? Not sure how we can stabilize reporting in this regard.

@spalger
Contributor

spalger commented Apr 21, 2021

net::ERR_NETWORK_CHANGED is usually caused by Docker tweaking the network interfaces available on the machine, which Chrome observes and for some reason treats as a reason to abort all in-progress network requests. It's ultimately unavoidable until we move to isolated workers in Buildkite.
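
When triaging a reopened failure, it can help to separate this kind of infrastructure noise from genuine page errors before digging in. The sketch below is one possible way to do that over raw log lines. The `classifyFailure` helper and its category names are hypothetical, and the patterns are taken only from the log excerpts quoted in this issue.

```typescript
type FailureKind = 'network-changed' | 'page-error' | 'unknown';

// Patterns come from the log excerpts in this thread; extend as new causes appear.
// Assumption: a network change is checked first because aborted requests can
// themselves lead to follow-on page errors.
const PATTERNS: Array<{ kind: FailureKind; pattern: RegExp }> = [
  { kind: 'network-changed', pattern: /net::ERR_NETWORK_CHANGED/ },
  { kind: 'page-error', pattern: /Reporting encountered an error on the page/ },
];

// Returns the first matching category, or 'unknown' if no pattern matches.
function classifyFailure(logLines: string[]): FailureKind {
  for (const { kind, pattern } of PATTERNS) {
    if (logLines.some((line) => pattern.test(line))) return kind;
  }
  return 'unknown';
}
```

A result of `'unknown'` means the log needs a manual look, since neither known cause was found.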

@tsullivan
Member

tsullivan commented Apr 21, 2021

> 06:48:23 │ proc [kibana] log [04:48:22.786] [error][execute-job][knqz6mbs1zcf25040c442k34][plugins][printablePdf][printable_pdf][reporting][runTask] Reporting encountered an error on the page: Error: Error: Alert type "xpack.ml.anomaly_detection_alert" is not registered.
>
> @tsullivan do you have an idea what could cause this? Not sure how we can stabilize reporting in this regard.

This came up from a pageerror (uncaught exception) event handler: https://github.com/elastic/kibana/blob/master/x-pack/plugins/reporting/server/browsers/chromium/driver_factory/index.ts#L219:L228

The error type is documented here: https://pptr.dev/#?product=Puppeteer&version=v5.4.1&show=api-event-pageerror

I don't think Reporting should stop handling this type of error, or be changed to simply log the error instead of failing the job when it happens. Ideally, these kinds of things would have better testing and be fixed before they can be surfaced by Reporting.

@flash1293
Contributor

flash1293 commented Apr 21, 2021

Thanks for the context @tsullivan !

@elastic/kibana-alerting-services do you know why this error is showing up in the Lens reporting tests? Error: Alert type "xpack.ml.anomaly_detection_alert" is not registered

Maybe this is an isolation problem and some of the Alerting test state is leaking into the Lens tests.

@mikecote
Contributor

@flash1293 I'm taking a look now. I'm investigating an idea as to why this may be happening.

@mikecote
Contributor

@flash1293 My theory didn't work out. The root place that would cause this error based on the logs you provided would be https://github.com/elastic/kibana/blob/master/x-pack/plugins/ml/public/plugin.ts#L185. But I wasn't able to make it throw the same error as shown. I tried different roles, licenses, SSL configurations, reports vs browser, etc.

Can you point me to where in CI you found those logs? I couldn't find them myself.

@flash1293
Contributor

@mikecote Sure, it happened on this PR for example: https://kibana-ci.elastic.co/job/elastic+kibana+pipeline-pull-request/121295/execution/node/711/log/

I go to the build stats cluster and search for "not cause PDF reports to fail" to find recent builds where this failed. It happens quite regularly at the moment:

Screenshot 2021-04-22 at 14 14 54

@mikecote
Contributor

mikecote commented Apr 22, 2021

@flash1293 thanks! I'm working on a PR as a first step that removes the throwing and replaces it with a console log instead: #98005. I still find it interesting how the alerting plugin gets into such a situation.
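
The general shape of that kind of change, reporting a missing registry entry instead of throwing, might look like the sketch below. It is an illustration only and not the code from #98005; `AlertTypeRegistry`, `AlertTypeModel`, `getOrThrow`, and `getOrWarn` are hypothetical names.

```typescript
// Hypothetical shape of a registered alert type.
interface AlertTypeModel {
  id: string;
  description: string;
}

class AlertTypeRegistry {
  private readonly types = new Map<string, AlertTypeModel>();

  register(type: AlertTypeModel): void {
    this.types.set(type.id, type);
  }

  // Throwing variant: an unregistered id becomes an uncaught exception
  // whenever the caller does not handle it.
  getOrThrow(id: string): AlertTypeModel {
    const type = this.types.get(id);
    if (!type) throw new Error(`Alert type "${id}" is not registered.`);
    return type;
  }

  // Softer variant: report the problem and let the caller skip this type.
  getOrWarn(id: string, log: (msg: string) => void = console.warn): AlertTypeModel | undefined {
    const type = this.types.get(id);
    if (!type) log(`Alert type "${id}" is not registered.`);
    return type;
  }
}
```

Callers of the softer variant have to handle `undefined`, which makes the missing-type case explicit at the call site.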

@mikecote
Contributor

@flash1293 I merged a fix so it doesn't throw unhandled exceptions (#98005). This should unblock or fix the flakiness here. It will backport to 7.14. Let me know if it needs to go into 7.13 as well 🙏

@flash1293
Contributor

@mikecote Thanks, that's great! I checked and it only failed on 7.x and master recently - I think those are good for now. I'll go ahead and close this issue, let's see whether it stays that way. Thank you so much for looking into it quickly

@kibanamachine
Contributor Author

New failure: Jenkins Build

@kibanamachine kibanamachine reopened this Apr 26, 2021
@flash1293
Contributor

flash1293 commented Apr 27, 2021

@spalger The latest failure is caused by the dreaded network changed error again:

proc [kibana] log [21:32:59.939] [error][browser-driver][execute-job][headless-browser-console][knz4a0d41qaca7e49e3dtth0][plugins][printable_pdf][reporting] Failed to load resource: net::ERR_NETWORK_CHANGED

Is there any mitigation (except for migrating to Buildkite)? I would love to somehow stabilize this test in the short term.

@flash1293
Contributor

Closing for now as it failed because of ERR_NETWORK_CHANGED - if it gets reopened, make sure to check the failure.
