Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Flaky test failure result in error of GetLinkable #4093

Closed
soulomoon opened this issue Feb 24, 2024 · 4 comments
Closed

Flaky test failure result in error of GetLinkable #4093

soulomoon opened this issue Feb 24, 2024 · 4 comments
Assignees
Labels
component: hls-eval-plugin flaky test type: bug Something isn't right: doesn't work as intended, documentation is missing/outdated, etc..

Comments

@soulomoon
Copy link
Collaborator

soulomoon commented Feb 24, 2024

update

=>IsEvaluating is running and already get the files for evaluations,
=>enqeueue file for evaluation
=>setSomethingModified dirty the IsEvaluating key in shakeExtre
=>IsEvaluating is finished, diry key removed from shakeExtra
=>session restarting and flushing the dirty keys of shakeExtra to hls-graph database
=>IsEvaluating is not updating since hls-graph database do not know it is dirty.

=======================
This is the issue discovered by @jhrcek
we can run the test

cabal test hls-eval-plugins-test --test-options="-p /:i behaves exactly the same as  :info/"

Today I spent some time looking into flaky test failures in CI.
Downloaded the logs from all failed testing jobs over the past 14 days.
The good news is that there are like 10 tests that keep failing relatively often, so fixing some of those could significantly reduce the overall flakiness :)

One of the most frequently failing tests is this group of tests whose failures have the same root cause.

They essentially run bunch of eval plugin's code lenses and then wait for workspace/applyEdit message from the server, which very often comes, but from time to time (can reproduce this in like once every 10 runs) it doesn't come. In such cases there is this error in server logs:

2024-02-13T12:32:05.980696Z | Error | eval: Internal Error: BadDependency "GetLinkable"

Few lines before that there's this: diagnostic message: "message": "/home/jhrcek/.cache/ghcide/test-0.1.0.0-inplace-05a647850b95fcd2b14858d164f368e84f7f8463/TI_Info.hi.core: getFileStatus: does not exist (No such file or directory)",

There's only one usage of this this GetLinkable thing in eval plugin here

I'm pretty sure the "file doesn't exist" exception is coming from this line

This merge from @wz1000 alleviate the situation #4076
but

In the end I got the (improved) error after 180 iterations of running that test. But it seems to occur less frequently now.
"message": "called GetLinkable for a file without a linkable: NormalizedFilePath "/tmp/extra-dir-10246150986263/TI_Info.hs"\nCallStack (from HasCallStack):\n error, called at src/Development/IDE/Core/Rules.hs:1120:18 in ghcide-2.6.0.0-inplace:Development.IDE.Core.Rules"

@wz1000 point out it may orgins from #3423, we trying to fix #3423 in pr #4087
but @jhrcek reported the error of GetLinkable still exist in the test.

Ok, I executed the flaky eval tests 500x from master and then 500x from Patricks PR. It doesn't seem to improve the situation significantly.

On master 2 of 500 test runs failed with GetLinkable error.
On the PR branch 6 of 500 test runs failed with that error.

image

@soulomoon soulomoon added type: bug Something isn't right: doesn't work as intended, documentation is missing/outdated, etc.. status: needs triage labels Feb 24, 2024
@soulomoon soulomoon changed the title Falkey test faileure result in error of GetLinkable Flaky test faileure result in error of GetLinkable Feb 24, 2024
@soulomoon
Copy link
Collaborator Author

soulomoon commented Feb 24, 2024

Although I have seen the error in the github action.
But I failed to reproduce the bug in windows wsl and macos m1 with my script to run the test.

echo "" > test.log
echo "" > test.err.log
for i in {1..500}; do
    /Users/ares/src/soulomoonHLS/dist-newstyle/build/aarch64-osx/ghc-9.4.8/haskell-language-server-2.6.0.0/t/hls-eval-plugin-tests/build/hls-eval-plugin-tests/hls-eval-plugin-tests -p "/i behaves exactly the same as/" >> test.log 2>> test.err.log
    echo "Iteration: $i" >> test.log
    echo "Iteration: $i" >> test.err.log
done

Is there something wrong with my runs🤔.
If not,
@jhrcek do you mind sharing your os version, I want to be able to reproduce it locally so I can add some traces to see what actually happened.

@jhrcek
Copy link
Collaborator

jhrcek commented Feb 25, 2024

Sure, I'm using Linux, Fedora 36, x86_64 arch.

@jhrcek jhrcek changed the title Flaky test faileure result in error of GetLinkable Flaky test failure result in error of GetLinkable Mar 12, 2024
@jhrcek
Copy link
Collaborator

jhrcek commented Apr 17, 2024

@wz1000 as I said it's still reproducible with master, but it requires couple dozen test iteration to trigger the failure.
I used these steps (tried both with ghc 9.4.8 and 9.8.2):

# recommended to build test binary separately and then run it in a loop (to avoid running cabal test in a loop)
cabal build hls-eval-plugin-tests
find dist-newstyle -name hls-eval-plugin-tests -type f
# ./dist-newstyle/build/x86_64-linux/ghc-9.4.8/haskell-language-server-2.7.0.0/t/hls-eval-plugin-tests/build/hls-eval-plugin-tests/hls-eval-plugin-tests
# Run tests in a loop
for i in {1..100}; do LSP_TEST_LOG_MESSAGES=1 LSP_TEST_LOG_STDERR=1 ./dist-newstyle/build/x86_64-linux/ghc-9.4.8/haskell-language-server-2.7.0.0/t/hls-eval-plugin-tests/build/hls-eval-plugin-tests/hls-eval-plugin-tests -p'/:i behaves exactly the same as :info/' || { echo "Command failed at iteration $i"; break; }; done

Eventually (for me it failed after 29/71 iterations with 9.4.8/9.8.2) it fails, and the logs contain:

ThreadId 403 | 2024-04-17T13:02:14.285792Z | Error | eval: Internal Error: BadDependency "GetLinkable"
...
 "message": "called GetLinkable for a file without a linkable: NormalizedFilePath \"/tmp/hls-test-root/extra-dir-56738026422965/TI_Info.hs\"\nCallStack (from HasCallStack):\n  error, called at src/Development/IDE/Core/Rules.hs:1068:18 in ghcide-2.7.0.0-inplace:Development.IDE.Core.Rules"

soulomoon added a commit that referenced this issue May 10, 2024
…nd rule values [flaky test #4185 #4093] (#4190)

The main problem is the out of sync state in the build system. Several states involved, the shake database running result state for key. shake extra's dirty state for key and shake extra's rule values state.
To stablize the build system. This PR force some of the updates of these state into a single STM block.

1. collect dirtykeys and ship it to session restart directly. No more invalid removal before session restart. Fixing Flaky test failure result in error of GetLinkable #4093
2. move the dirtykey removal and rules values updating to hls-graph by adding a call back fucntion to RunResult. Properly handle the dirtykeys and rule value state after session start and closely followed by another session restart Fixing ghcide-tests' addDependentFile test #4194
3. properly handle clean up by wrapping the asyncWithCleanUp to refreshDeps

---------

Co-authored-by: wz1000 <zubin.duggal@gmail.com>
Co-authored-by: Jan Hrcek <2716069+jhrcek@users.noreply.github.com>
Co-authored-by: Michael Peyton Jones <me@michaelpj.com>
@soulomoon
Copy link
Collaborator Author

fixed by #4190

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
component: hls-eval-plugin flaky test type: bug Something isn't right: doesn't work as intended, documentation is missing/outdated, etc..
Projects
None yet
Development

No branches or pull requests

3 participants