
Maintain a list of tokens #133

Closed
vosen opened this issue May 14, 2015 · 3 comments


vosen commented May 14, 2015

Most of our functionality that relies on token output just lexes whatever text it needs ad hoc. It'd be much better to maintain a list of tokens for every span. I've got some Rust lexing code lying around, but it'll need some improvements and wrapping (handling newlines as separate tokens, incremental mode, some kind of cache for "give me a list of tokens that intersect this span" queries, etc.).
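To make the "tokens that intersect this span" query concrete, here is a minimal, hypothetical sketch (the project itself is C#; `Token` and `TokenCache` are illustrative names, not the actual code). Tokens are kept sorted by start offset, so the query is a binary search for the first candidate plus a linear walk:

```python
import bisect
from typing import NamedTuple

class Token(NamedTuple):
    start: int   # absolute character offset
    length: int
    kind: str    # e.g. "kw", "ident", "comment"

class TokenCache:
    def __init__(self, tokens):
        # Tokens are assumed non-overlapping; sort them by start offset.
        self.tokens = sorted(tokens)
        self.starts = [t.start for t in self.tokens]

    def intersecting(self, span_start, span_end):
        """Return all tokens overlapping the half-open span [span_start, span_end)."""
        # First candidate: the token whose start is at or before span_start...
        i = bisect.bisect_right(self.starts, span_start) - 1
        # ...unless it ends before the span begins.
        if i >= 0 and self.tokens[i].start + self.tokens[i].length <= span_start:
            i += 1
        i = max(i, 0)
        out = []
        while i < len(self.tokens) and self.tokens[i].start < span_end:
            out.append(self.tokens[i])
            i += 1
        return out

cache = TokenCache([Token(0, 3, "kw"), Token(4, 5, "ident"), Token(10, 8, "comment")])
print([t.kind for t in cache.intersecting(5, 12)])   # ['ident', 'comment']
```

The same shape works for the incremental case; only the backing store (here a flat list) changes.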

@vosen vosen self-assigned this May 14, 2015

vosen commented Oct 18, 2015

This issue turns out to be worse than I initially thought.
Currently we implement syntax highlighting through ITaggerProvider. We always assumed that our ITagger<T> would be called line-by-line on every load and change, which, while wasteful, would be correct.
It turns out that's wrong: ITagger<T> can be called at any time, with any span VS wants. With large files in particular, VS will skip tagging some lines.
In practice, this breaks highlighting of multiline tokens (/**/ comments, strings) in large documents.
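To illustrate the failure mode (a hypothetical sketch, not the extension's code): a lexer that starts fresh at whatever span the editor hands it has no way to know it is inside a multi-line token, while one that carries state from the start of the document classifies the same offset correctly.

```python
text = "let a = 1;\n/* a long\n   comment */\nlet b = 2;\n"

def naive_kind_at(text, offset):
    """Classify offset by lexing only its own line, with no carried-over state."""
    line_start = text.rfind("\n", 0, offset) + 1
    line = text[line_start:text.find("\n", line_start)]
    # A stateless per-line lexer only sees a comment if the opener is on this line.
    return "comment" if "/*" in line else "code"

def correct_kind_at(text, offset):
    """Classify offset using state accumulated from the beginning of the document."""
    opener = text.rfind("/*", 0, offset + 1)
    if opener != -1 and text.find("*/", opener) >= offset:
        return "comment"
    return "code"

middle = text.index("comment")          # an offset inside the /* ... */ body
print(naive_kind_at(text, middle))      # "code" -- wrong, the state was lost
print(correct_kind_at(text, middle))    # "comment"
```

Whenever VS asks only for the middle lines of a large file, the tagger behaves like `naive_kind_at` unless it maintains state across spans.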

So, since ITagger<T> needs rework anyway, I thought I'd do this properly and tackle this issue.

We can make do with our current lexer, but the crux of the problem is implementing some kind of incremental cache for tokens. We should store the tokens in something more sophisticated than a List<T>: I don't want to copy a list of thousands of tokens every time a character is entered.
Lately I've been spelunking through decompilations of Microsoft.VisualStudio.Text.Data.dll and its various friends, and I think I've got a handle on possible implementations.

  • GetLineNumberFromPosition(..) seems to be relatively cheap. For small documents it sums up an array of line lengths; for larger documents it does some kind of search on a binary-tree index of spans. We can rely on this and split the lexed document into lines, where each line contains an ITrackingPoint at the start of the line plus a List<(Span, TokenType)> for the tokens in that line.
  • Another solution is to simply chuck all the tokens into a binary tree, keeping each token as (ITrackingPoint, Length, TokenType).
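The first option above can be sketched in miniature. This is a hypothetical Python model, not the repository's code: VS's ITrackingPoint is replaced by plain line indices, and `lex_line` is a toy lexer that only understands /* */ comments. The point is the shape of the cache and its invalidation: after an edit, re-lex from the edited line and stop as soon as the per-line result converges with what's cached.

```python
def lex_line(line, state):
    """Toy lexer: 'code' runs and /* */ block comments that may span lines."""
    tokens, i = [], 0
    while i < len(line):
        if state == "comment":
            end = line.find("*/", i)
            if end == -1:
                tokens.append((i, len(line) - i, "comment"))
                i = len(line)
            else:
                tokens.append((i, end + 2 - i, "comment"))
                i, state = end + 2, "normal"
        else:
            start = line.find("/*", i)
            if start == -1:
                tokens.append((i, len(line) - i, "code"))
                i = len(line)
            else:
                if start > i:
                    tokens.append((i, start - i, "code"))
                i, state = start, "comment"
    return tokens, state

class LineCache:
    def __init__(self, lex_line):
        self.lex_line = lex_line
        self.lines = []          # per line: (tokens, end-of-line lexer state)

    def relex(self, lines, first_dirty):
        """Re-lex from the first edited line until the state converges."""
        state = self.lines[first_dirty - 1][1] if first_dirty else "normal"
        for i in range(first_dirty, len(lines)):
            tokens, state = self.lex_line(lines[i], state)
            entry = (tokens, state)
            old = self.lines[i] if i < len(self.lines) else None
            if i < len(self.lines):
                self.lines[i] = entry
            else:
                self.lines.append(entry)
            if entry == old:
                break            # cache matches again: later lines still valid
        del self.lines[len(lines):]

cache = LineCache(lex_line)
cache.relex(["a /* x", "y */ b"], 0)
print([[kind for _, _, kind in toks] for toks, _ in cache.lines])
# [['code', 'comment'], ['comment', 'code']]
cache.relex(["a  x", "y */ b"], 0)   # the /* opener on line 0 was deleted
print([[kind for _, _, kind in toks] for toks, _ in cache.lines])
# [['code'], ['code']]
```

The second relex shows the interesting case: deleting the opener changes line 0's end-of-line state, so invalidation has to spill into line 1 even though its text never changed.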

I've tried both solutions; the first one seems simple but is complicated in practice (mostly due to handling multi-line tokens). I've got invalidation implemented for both approaches. The code is at https://github.com/vosen/VSEditorLexingModel/ and I hope to wrap it up soon.

@vosen vosen mentioned this issue Oct 18, 2015
@briansmith

I have noticed the problem where syntax highlighting goes wrong with multi-line tokens.

The technique that I've seen used multiple times is this one:
https://github.com/smartmobili/parsing/blob/master/VisualStudioIntegration/PkgDef%20Editor/C%23/PkgDefLanguage.cs#L128

Note that I believe Visual Studio automatically does the right thing except with multi-line tokens, since it doesn't understand multi-line tokens. So, I think it is sufficient to just special-case the multi-line token case.
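The linked technique boils down to: cache each line's end-of-line lexer state, and when re-tagging a line leaves it in a different state than before, notify the editor that the following line is stale so it gets re-requested (in VS, by raising the ITagger<T>.TagsChanged event); the invalidation then cascades through the multi-line token. A hypothetical Python sketch of that control flow, with `on_tags_changed` standing in for TagsChanged and a deliberately toy `lex_line`:

```python
def lex_line(line, state):
    # Toy per-line lexer (assumes at most one /* or */ per line): returns
    # the tags for the line plus the lexer state at the end of the line.
    tag = "comment" if state == "comment" or "/*" in line else "code"
    if "/*" in line:
        state = "comment"
    if "*/" in line:
        state = "normal"
    return [tag], state

class ForwardPropagatingTagger:
    def __init__(self, lex_line, on_tags_changed):
        self.lex_line = lex_line
        self.on_tags_changed = on_tags_changed  # editor callback: re-tag this line
        self.end_states = {}                    # line number -> cached end state

    def get_tags(self, lines, line_no):
        # State entering this line is the recorded state leaving the previous one.
        state = self.end_states.get(line_no - 1, "normal")
        tags, end_state = self.lex_line(lines[line_no], state)
        if self.end_states.get(line_no, "normal") != end_state and line_no + 1 < len(lines):
            # The multi-line state leaked past this line: ask the editor to
            # re-request the next line instead of leaving it stale.
            self.on_tags_changed(line_no + 1)
        self.end_states[line_no] = end_state
        return tags

changed = []
tagger = ForwardPropagatingTagger(lex_line, changed.append)
lines = ["let a = 1;", "/* opened", "still inside", "closed */ let b;"]
# Simulate VS asking only for line 1 (where the user just typed "/*"):
tagger.get_tags(lines, 1)
print(changed)   # [2] -- the tagger requested re-tagging of the next line
```

Each re-requested line can trigger the next, so only the lines actually affected by the multi-line token get re-tagged.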

@vosen vosen mentioned this issue Nov 3, 2015

vosen commented Nov 21, 2015

Closed by #197

@vosen vosen closed this as completed Nov 21, 2015