Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

pyright-lsp may enter a strange state reporting library not found when there are sub-project inside a folder #4994

Closed
zhou13 opened this issue Apr 23, 2023 · 23 comments
Labels
addressed in next version Issue is fixed and will appear in next published version bug Something isn't working

Comments

@zhou13
Copy link

zhou13 commented Apr 23, 2023

Describe the bug
Let's say we have a project structure like this:

$ tree
[ 288]  .
├── [   0]  example.py
├── [ 192]  projects
│   ├── [  49]  example.py
│   ├── [ 226]  requirements.txt
│   └── [ 160]  sac
│       ├── [   0]  __init__.py
│       └── [  21]  datasets.py
├── [  31]  pyproject.toml

I attached the example files in Archive.zip

With certain operation, pyright-lsp may enter a strange state reporting system library like os not found.

To Reproduce
The following sequence has 100% chance to trigger the bug on my Mac:

  1. enable pyright in nvim
  2. nvim projects/example.py example.py
  3. switch to next buffer with :bnext or :BufferPrev
  4. switch back with bprev or :BufferPrev
  5. perform a format operation using isort inside vim
  6. pyright-lsp will enter a strange state reporting os not found.

Expected behavior

pyright-lsp should not enter a strange state reporting os not found.

Screenshots or Code

render1682238804660

VS Code extension or command-line
I am using vim-lsp with pyright. I attached the lsp log for analysis, which contains all the communication in a json-like format:
lsp.log

I am not sure whether it is possible to write a reproducible script by communicating with pyright-lsp in command-line tools. I will be happy to do that if there is a pointer.

Additional context

  • If I delete projects/requirements.txt, then this bug is not reproducible.
  • step 5 is needed to trigger textDocument/didChange
@erictraut
Copy link
Collaborator

I don't know anything about vim, so I'm trying to diagnose this based on the provided logs. Based on the symptoms, I suspect that the delta changes to the document are not being properly applied to the in-memory contents, so pyright is analyzing a corrupted source file.

Here's why I think this is the case.

The starting text in the document "./projects/example.py" is:

import os
from sac.datasets import Data

os
Data

Here's the RPC message sent to the language server:

{
  jsonrpc = "2.0",
  method = "textDocument/didChange",
  params = {
    contentChanges = { {
        range = {
          ["end"] = {
            character = 4,
            line = 4
          },
          start = {
            character = 0,
            line = 1
          }
        },
        rangeLength = 38,
        text = "\nfrom sac.datasets import Data\n\nos\nData\n"
      }, {
        range = {
          ["end"] = {
            character = 0,
            line = 7
          },
          start = {
            character = 0,
            line = 6
          }
        },
        rangeLength = 1,
        text = ""
      } },
    textDocument = {
      uri = "file:///Users/yichaozhou/Sources/reproduce-bug/projects/example.py",
      version = 5
    }
  }
}

This message looks fine to me. The first change should replace the contents of line 1 through line 4 (where line numbers are zero-based) with the string "\nfrom sac.datasets import Data\n\nos\nData\n". The second change should simply strip the final LF from the file. After these deltas are applied, the resulting source file should be as follows (with the final LF stripped).

import os
from sac.datasets import Data

os
Data

From the screen shot, I can see that pyright is reporting the "os" is not defined diagnostic at a location that is one line above where the os symbol is located in the editor's window. That leads me to believe that pyright is actually analyzing a file that looks like:

from sac.datasets import Data

os
Data

Note that the first line has been stripped out when it should still be present.

If my theory is correct, the next question is why the internal state of the file contents have been corrupted when applying these deltas. When applying deltas, pyright relies on the vscode-languageserver-textdocument library, which is maintained by the VS Code team. I suspect that it's not doing the right thing in this case. This is a bit surprising given that this library is well tested and used by many language servers, but I don't have any other working theories at the moment.

Perhaps someone from the pylance team could look at this further and let me know if you have any other theories or debugging suggestions. @debonte, @rchiodo, @heejaechang

@erictraut erictraut added the needs investigation Requires additional investigation to determine course of action label Apr 23, 2023
@zhou13
Copy link
Author

zhou13 commented Apr 23, 2023

Hi Eric, Thank you for the analysis! I'd like to mention that project nesting (the existence of projects/requirements.txt and open ./example.py) is necessary to trigger this bug. Without either of them, this bug is not reproducible. Maybe that is one of the reasons this bug hasn't been reported.

@erictraut
Copy link
Collaborator

I wonder if the presence of two files with overlapping names is related? Could you try renaming the outer file something like example1.py and see if that makes the problem go away?

It looks from the screen shot that you're on a Mac? I ask because MacOS file systems are case insensitive. I'm not sure that matters here, but it's a useful clue.

@zhou13
Copy link
Author

zhou13 commented Apr 23, 2023

@erictraut No, the same filename is not related to the bug. I just renamed ./example.py to ./foobar.py and could still reproduce this bug with the procedure mentioned.

Yes, I am on a Mac.

I also tested other Python language servers: jedi and python-language-server. They are fine.

@zhou13
Copy link
Author

zhou13 commented Apr 24, 2023

I have a hypothesis that when I performed step 3-4, pyright or pylance changed the "project root" from ./projects to ./. After receiving textDocument/didChange, instead of applying it to the file in ./projects/example.py, it might apply it to a file like ./example.py due to the change of project root, which can result in this behavior I got.

@erictraut erictraut added the language server issue Related to language server features label Apr 25, 2023
@heejaechang
Copy link
Collaborator

heejaechang commented Apr 25, 2023

so, from the log, what I am seeing is that

There are 2 workspaces

 message = 'Starting service instance "/Users/yichaozhou/Sources/reproduce-bug/projects"',
message = 'Starting service instance "/Users/yichaozhou/Sources/reproduce-bug"',

one is added when server is created

workspaceFolders = { {
        name = "/Users/yichaozhou/Sources/reproduce-bug/projects",
        uri = "file:///Users/yichaozhou/Sources/reproduce-bug/projects"
      } }

the other is added when reproduce-bug/example.py is opened through didChangeWorkspaceFolders

 method = "workspace/didChangeWorkspaceFolders",
  params = {
    event = {
      added = { {
          name = "/Users/yichaozhou/Sources/reproduce-bug",
          uri = "file:///Users/yichaozhou/Sources/reproduce-bug"
        } },
      removed = {}
    }
  }

using this info, I have setup the repro env with the given zip file and ran isort from vscode market place on it. and it worked fine. and when I do, I get this change request

[Trace - 2:18:48 PM] Sending notification 'textDocument/didChange'.
Params: {
    "textDocument": {
        "uri": "file:///c%3A/pyTem/repro/Archive/projects/example.py",
        "version": 8
    },
    "contentChanges": [
        {
            "range": {
                "start": {
                    "line": 1,
                    "character": 0
                },
                "end": {
                    "line": 1,
                    "character": 0
                }
            },
            "rangeLength": 0,
            "text": "\n"
        }
    ]
}

that is different than what you have in the log.

contentChanges = { {
        range = {
          ["end"] = {
            character = 4,
            line = 4
          },
          start = {
            character = 0,
            line = 1
          }
        },
        rangeLength = 38,
        text = "\nfrom sac.datasets import Data\n\nos\nData\n"
      }, {
        range = {
          ["end"] = {
            character = 0,
            line = 7
          },
          start = {
            character = 0,
            line = 6
          }
        },
        rangeLength = 1,
        text = ""
      } },

the first weird thing is (it could be just how it log things), contentChanges used { instead of [ for array. the second one is for the same effect, rather than just asking us to insert \n, it asks us to replace from sac.datasets import Data\n\nos\nData\n with \nfrom sac.datasets import Data\n\nos\nData\n

The end impact looks like the whole content we have is replaced with \nfrom sac.datasets import Data\n\nos\nData\n rather than incrementally update the content with given range.

my gut feeling is didChange params we get is malformed. but not 100% sure.

@zhou13
Copy link
Author

zhou13 commented Apr 25, 2023

For used { instead of [ for array, I think this is due to the way lua's table works. I believe it is still a list since you can see the comma in the line }, {.

According to my hypothesis, even if everything is exactly the same, the following msg

        {
            "range": {
                "start": {
                    "line": 1,
                    "character": 0
                },
                "end": {
                    "line": 1,
                    "character": 0
                }
            },
            "rangeLength": 0,
            "text": "\n"
        }

cannot trigger the bug, since it is just adding an \n to an empty file. Could you try the following step like adding some text at the end of the file like the gif?

render1682459471729

You can see the line number of the error is wrong.

@zhou13
Copy link
Author

zhou13 commented Apr 25, 2023

Attach the new log.
lsp.log

Note that although this is long, one of response is around 5000 lines.

@heejaechang
Copy link
Collaborator

cannot trigger the bug, since it is just adding an \n to an empty file. Could you try the following step like adding some text at the end of the file like the gif?

this is not adding \n to empty file, it is adding blank line between import os and from sac.datasets import Data

@zhou13
Copy link
Author

zhou13 commented Apr 26, 2023

this is not adding \n to empty file, it is adding blank line between import os and from sac.datasets import Data

Yes, this might be the case if everything works correctly in the vscode for you. But I am trying to provide a reason to explain the bug I saw:

When I performed step 3-4, pyright or pylance changed the "project root" from ./projects to ./. After receiving textDocument/didChange, instead of applying it to the file in ./projects/example.py, it might apply it to a file like ./example.py due to the change of project root, which can result in this behavior I got.

I don't think this is a problem that neovim is generating a problematic didChange msg since if this is the case, at least not the error like line number is offset by one, I can only reproduce the problem with pyright, but not with jedi.

@heejaechang
Copy link
Collaborator

@zhou13 can you point me in the log where this happens

When I performed step 3-4, pyright or pylance changed the "project root" from ./projects to ./. After receiving textDocument/didChange, instead of applying it to the file in ./projects/example.py, it might apply it to a file like ./example.py due to the change of project root, which can result in this behavior I got.

we do not change cwd once we set it when our LSP starts for our LSP process.

@erictraut
Copy link
Collaborator

Also, once the files are open in the editor, we're working from in-memory contents. We don't go back to the file system unless/until the file has been closed in the editor. So the theory about applying the delta to the wrong file doesn't sound right to me.

@zhou13, are you able to rebuild pyright from source? If so, I could show you where you can add a bit of additional logging code that would help diagnose the problem.

@zhou13
Copy link
Author

zhou13 commented Apr 26, 2023

Yes, I can rebuild pyright from source.

@erictraut
Copy link
Collaborator

Please check out the latest sources for pyright. Then go to the source file packages/pyright-internal/src/analyzer/sourceFile.ts and search for the setClientVersion method. Within this method, look for the line that calls TextDocument.update. This is the one that applies the delta update received from the lsp client. I'd like you to add some logging around this call.

            this._console.log(`Updating file contents for ${this._filePath}`);
            this._console.log('Old file contents:');
            this._console.log(this._clientDocument.getText());
            this._clientDocument = TextDocument.update(this._clientDocument, contents, version);
            this._console.log('New file contents:');
            this._console.log(this._clientDocument.getText());

Rebuild pyright and repro the bug. Let's see what clues these logs give us.

@zhou13
Copy link
Author

zhou13 commented Apr 26, 2023

Actually, I found a way to reliably reproduce this bug without neovim by just replaying stdio. I can successfully reproduce it on both linux and mac.

reproduce-bug.zip

To do that, you first need a unix-like machine and unzip the file to /private/tmp. (That folder is on mac by default and you need to create it on Linux. not sure if it works in WSL.)

The file structure should looks like this:

$ tree /private/tmp/reproduce-bug
[ 352]  /private/tmp/reproduce-bug
├── [   0]  foobar.py
├── [ 218]  input.json
├── [   8]  log-arguments
├── [8.7K]  log-stdin
├── [5.4K]  log-stdout
├── [ 192]  projects
│   ├── [  49]  example.py
│   ├── [ 226]  requirements.txt
│   └── [ 192]  sac
│       ├── [   0]  __init__.py
│       └── [  21]  datasets.py
└── [  31]  pyproject.toml

Simply run

cd /private/tmp/reproduce-bug
awk '{print $0; system("sleep .3");}' log-stdin | pyright-langserver --stdio;

to replay the problem.

Note that sleep .3 is necessary. With things like sleep .1, the bug disappears. Larger sleep time also works.

Here is the example putout:


Content-Length: 119

{"jsonrpc":"2.0","method":"window/logMessage","params":{"type":3,"message":"Pyright language server 1.1.304 starting"}}Content-Length: 174

{"jsonrpc":"2.0","method":"window/logMessage","params":{"type":3,"message":"Server root directory: /usr/local/Cellar/pyright/1.1.304/libexec/lib/node_modules/pyright/dist/"}}Content-Length: 144

{"jsonrpc":"2.0","method":"window/logMessage","params":{"type":3,"message":"Starting service instance \"/private/tmp/reproduce-bug/projects\""}}Content-Length: 1046

{"jsonrpc":"2.0","id":1,"result":{"capabilities":{"textDocumentSync":2,"definitionProvider":{"workDoneProgress":true},"declarationProvider":{"workDoneProgress":true},"typeDefinitionProvider":{"workDoneProgress":true},"referencesProvider":{"workDoneProgress":true},"documentSymbolProvider":{"workDoneProgress":true},"workspaceSymbolProvider":{"workDoneProgress":true},"hoverProvider":{"workDoneProgress":true},"documentHighlightProvider":{"workDoneProgress":true},"renameProvider":{"prepareProvider":true,"workDoneProgress":true},"completionProvider":{"triggerCharacters":[".","[","\"","'"],"resolveProvider":true,"workDoneProgress":true,"completionItem":{"labelDetailsSupport":true}},"signatureHelpProvider":{"triggerCharacters":["(",",",")"],"workDoneProgress":true},"codeActionProvider":{"codeActionKinds":["quickfix","source.organizeImports"],"workDoneProgress":true},"executeCommandProvider":{"commands":[],"workDoneProgress":true},"callHierarchyProvider":true,"workspace":{"workspaceFolders":{"supported":true,"changeNotifications":true}}}}}Content-Length: 157

{"jsonrpc":"2.0","id":0,"method":"workspace/configuration","params":{"items":[{"scopeUri":"file:///private/tmp/reproduce-bug/projects","section":"python"}]}}Content-Length: 157

{"jsonrpc":"2.0","id":1,"method":"workspace/configuration","params":{"items":[{"scopeUri":"file:///private/tmp/reproduce-bug/projects","section":"python"}]}}Content-Length: 166

{"jsonrpc":"2.0","id":2,"method":"workspace/configuration","params":{"items":[{"scopeUri":"file:///private/tmp/reproduce-bug/projects","section":"python.analysis"}]}}Content-Length: 166

{"jsonrpc":"2.0","id":3,"method":"workspace/configuration","params":{"items":[{"scopeUri":"file:///private/tmp/reproduce-bug/projects","section":"python.analysis"}]}}Content-Length: 158

{"jsonrpc":"2.0","id":4,"method":"workspace/configuration","params":{"items":[{"scopeUri":"file:///private/tmp/reproduce-bug/projects","section":"pyright"}]}}Content-Length: 158

{"jsonrpc":"2.0","id":5,"method":"workspace/configuration","params":{"items":[{"scopeUri":"file:///private/tmp/reproduce-bug/projects","section":"pyright"}]}}Content-Length: 99

{"jsonrpc":"2.0","method":"window/logMessage","params":{"type":3,"message":"Found 3 source files"}}Content-Length: 99

{"jsonrpc":"2.0","method":"window/logMessage","params":{"type":3,"message":"Found 3 source files"}}Content-Length: 36

{"jsonrpc":"2.0","id":2,"result":[]}Content-Length: 162

{"jsonrpc":"2.0","method":"textDocument/publishDiagnostics","params":{"uri":"file:///private/tmp/reproduce-bug/projects/example.py","version":0,"diagnostics":[]}}Content-Length: 135

{"jsonrpc":"2.0","method":"window/logMessage","params":{"type":3,"message":"Starting service instance \"/private/tmp/reproduce-bug\""}}Content-Length: 148

{"jsonrpc":"2.0","id":6,"method":"workspace/configuration","params":{"items":[{"scopeUri":"file:///private/tmp/reproduce-bug","section":"python"}]}}Content-Length: 157

{"jsonrpc":"2.0","id":7,"method":"workspace/configuration","params":{"items":[{"scopeUri":"file:///private/tmp/reproduce-bug","section":"python.analysis"}]}}Content-Length: 149

{"jsonrpc":"2.0","id":8,"method":"workspace/configuration","params":{"items":[{"scopeUri":"file:///private/tmp/reproduce-bug","section":"pyright"}]}}Content-Length: 151

{"jsonrpc":"2.0","method":"window/logMessage","params":{"type":3,"message":"Loading pyproject.toml file at /private/tmp/reproduce-bug/pyproject.toml"}}Content-Length: 178

{"jsonrpc":"2.0","method":"window/logMessage","params":{"type":1,"message":"Pyproject file \"/private/tmp/reproduce-bug/pyproject.toml\" is missing \"[tool.pyright]\" section."}}Content-Length: 99

{"jsonrpc":"2.0","method":"window/logMessage","params":{"type":3,"message":"Found 4 source files"}}Content-Length: 36

{"jsonrpc":"2.0","id":3,"result":[]}Content-Length: 36

{"jsonrpc":"2.0","id":4,"result":[]}Content-Length: 152

{"jsonrpc":"2.0","method":"textDocument/publishDiagnostics","params":{"uri":"file:///private/tmp/reproduce-bug/foobar.py","version":0,"diagnostics":[]}}Content-Length: 36

{"jsonrpc":"2.0","id":5,"result":[]}Content-Length: 458

{"jsonrpc":"2.0","method":"textDocument/publishDiagnostics","params":{"uri":"file:///private/tmp/reproduce-bug/projects/example.py","version":5,"diagnostics":[{"range":{"start":{"line":3,"character":0},"end":{"line":3,"character":2}},"message":"\"os\" is not defined","severity":2,"code":"reportUndefinedVariable","source":"Pyright","codeDescription":{"href":"https://github.com/microsoft/pyright/blob/main/docs/configuration.md#reportUndefinedVariable"}}]}}Content-Length: 162

{"jsonrpc":"2.0","method":"textDocument/publishDiagnostics","params":{"uri":"file:///private/tmp/reproduce-bug/projects/example.py","version":5,"diagnostics":[]}}Content-Length: 36

@erictraut
Copy link
Collaborator

@zhou13, thanks for the repro. I figured out the underlying cause. Now we need to determine how to fix it.

The problem is that the source file is part of two different workspaces, but it is opened in the first workspace before the language server is made aware of the second workspace. We have two separate language server instances — one for each of the two workspaces, but only the first one has received the "open file" command along with the initial contents of the file. When we apply the delta change to both instances, the first one works fine, but the second one applies the delta to an empty file.

@erictraut
Copy link
Collaborator

@heejaechang, this is related to the conversation we had recently about sharing file contents across workspaces in the case where a file is associated with more than one workspace. Our discussion was focussed on memory and CPU, but it's now clear that there's a correctness issue as well.

Perhaps we should maintain a map of open files and their contents within the languageServerBase and simply send the updated contents to each of the service instances. That way, the text would be shared between them.

@heejaechang
Copy link
Collaborator

tagging @rchiodo

Perhaps we should maintain a map of open files and their contents within the languageServerBase and simply send the updated contents to each of the service instances. That way, the text would be shared between them.

I believe Rich has fixed the issue in this PR - 68a8e98#diff-8a9e7373556006db29659ed8820a21073840a944192d2b49977c032276f1f979R1283

so we should already do that. there might be a bug in getContainingWorkspacesForFile though.

@erictraut
Copy link
Collaborator

erictraut commented Apr 26, 2023

I don't think Rich's change addresses this problem. That logic is able to send "open file" and "document changed" events to all relevant workspaces, but if the workspace list changes and a new workspace is added between an "open file" and a "document changed" event, it doesn't synthesize an "open file" event for the new workspace.

@erictraut erictraut added bug Something isn't working and removed needs investigation Requires additional investigation to determine course of action labels Apr 26, 2023
@heejaechang
Copy link
Collaborator

heejaechang commented Apr 26, 2023

ah. I miss understood. yes. the file is already opened when new workspace is added.

This can't happen with vscode since vscode always reload LSP server on workspace changes. that's probably why we didn't notice this issue.

Perhaps we should maintain a map of open files and their contents within the languageServerBase and simply send the updated contents to each of the service instances. That way, the text would be shared between them.

ya, when we get change event, we probably need to see whether workspaces we got all has the file opened, otherwise, open the file first with the right content we get from one of workspace that has it opened.

erictraut pushed a commit that referenced this issue Apr 27, 2023
…responsible for applying any delta changes and updating the contents for each of the workspaces associated with that file. This allows source code contents to be shared between workspaces, which reduces memory usage. It also eliminates a correctness issue that can occur if new workspaces are added after files are open. It addresses #4994.
@erictraut
Copy link
Collaborator

I have a proposed fix ready for review in this PR.

erictraut pushed a commit that referenced this issue Apr 27, 2023
…responsible for applying any delta changes and updating the contents for each of the workspaces associated with that file. This allows source code contents to be shared between workspaces, which reduces memory usage. It also eliminates a correctness issue that can occur if new workspaces are added after files are open. It addresses #4994.
@erictraut erictraut removed the language server issue Related to language server features label Apr 28, 2023
@erictraut erictraut added the addressed in next version Issue is fixed and will appear in next published version label Apr 28, 2023
@erictraut
Copy link
Collaborator

PR is merged. This will be addressed in the next release.

@erictraut
Copy link
Collaborator

This is addressed in pyright 1.1.306, which I just published.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
addressed in next version Issue is fixed and will appear in next published version bug Something isn't working
Projects
None yet
Development

No branches or pull requests

3 participants