
Semantic Tokens API #86415

Closed
alexdima opened this issue Dec 5, 2019 · 43 comments
Labels: api-finalization, feature-request (Request for new features or functionality), on-testplan, semantic-tokens (Semantic tokens issues)

@alexdima
Member

alexdima commented Dec 5, 2019

This issue tracks the API proposal for semantic tokens.


Sample: semantic-tokens-sample


API: vscode.proposed.d.ts


Tokens representation

A file can contain many tokens, perhaps even hundreds of thousands. To keep the memory cost of describing semantic tokens low, we avoid allocating an object per token and instead represent all the tokens of a file as a single array of integers. Furthermore, the position of each token is expressed relative to the token before it, because most tokens remain stable relative to each other when edits are made in a file.


In short, each token takes 5 integers to represent, so a specific token i in the file consists of the following array indices:

  • at index 5*i - deltaLine: token line number, relative to the previous token
  • at index 5*i+1 - deltaStart: token start character; relative to the previous token's start character if they are on the same line, otherwise relative to 0
  • at index 5*i+2 - length: the length of the token. A token cannot be multiline.
  • at index 5*i+3 - tokenType: will be looked up in SemanticTokensLegend.tokenTypes. We currently ask that tokenType < 65536.
  • at index 5*i+4 - tokenModifiers: each set bit will be looked up in SemanticTokensLegend.tokenModifiers

How to encode tokens

Here is an example for encoding a file with 3 tokens in a uint32 array:

   { line: 2, startChar:  5, length: 3, tokenType: "property",  tokenModifiers: ["private", "static"] },
   { line: 2, startChar: 10, length: 4, tokenType: "type",      tokenModifiers: [] },
   { line: 5, startChar:  2, length: 7, tokenType: "class",     tokenModifiers: [] }
  1. First of all, a legend must be devised. This legend must be provided up-front and capture all possible token types.
    For this example, we will choose the following legend which must be passed in when registering the provider:
   tokenTypes: ['property', 'type', 'class'],
   tokenModifiers: ['private', 'static']
  2. The first transformation step is to encode tokenType and tokenModifiers as integers using the legend. Token types are looked up by index, so a tokenType value of 1 means tokenTypes[1]. Multiple token modifiers can be set by using bit flags, so a tokenModifier value of 3 is first viewed as binary 0b00000011, which means [tokenModifiers[0], tokenModifiers[1]] because bits 0 and 1 are set. Using this legend, the tokens now are:
   { line: 2, startChar:  5, length: 3, tokenType: 0, tokenModifiers: 3 },
   { line: 2, startChar: 10, length: 4, tokenType: 1, tokenModifiers: 0 },
   { line: 5, startChar:  2, length: 7, tokenType: 2, tokenModifiers: 0 }
  3. The next step is to represent each token relative to the previous token in the file. In this case, the second token is on the same line as the first token, so its startChar is made relative to the startChar of the first token: 10 - 5 = 5. The third token is on a different line than the second token, so its startChar is not altered:
   { deltaLine: 2, deltaStartChar: 5, length: 3, tokenType: 0, tokenModifiers: 3 },
   { deltaLine: 0, deltaStartChar: 5, length: 4, tokenType: 1, tokenModifiers: 0 },
   { deltaLine: 3, deltaStartChar: 2, length: 7, tokenType: 2, tokenModifiers: 0 }
  4. Finally, the last step is to inline each of the 5 fields for a token into a single array, which is a memory-friendly representation:
   // 1st token,  2nd token,  3rd token
   [  2,5,3,0,3,  0,5,4,1,0,  3,2,7,2,0 ]
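
Putting the steps together, here is a minimal sketch of this transformation in TypeScript (the RawToken shape and encodeTokens helper are illustrative, not part of the API; tokens are assumed to be sorted by position and all type/modifier names are assumed to be present in the legend):

   interface RawToken {
     line: number;
     startChar: number;
     length: number;
     tokenType: string;
     tokenModifiers: string[];
   }

   // Encode absolute tokens into the delta-encoded 5-integers-per-token format.
   function encodeTokens(
     tokens: RawToken[],
     legend: { tokenTypes: string[]; tokenModifiers: string[] }
   ): Uint32Array {
     const data = new Uint32Array(tokens.length * 5);
     let prevLine = 0;
     let prevChar = 0;
     for (let i = 0; i < tokens.length; i++) {
       const t = tokens[i];
       const deltaLine = t.line - prevLine;
       // deltaStart is relative to the previous token only when both are on the same line
       const deltaStart = deltaLine === 0 ? t.startChar - prevChar : t.startChar;
       // each modifier sets one bit, indexed by its position in the legend
       let modifiers = 0;
       for (const m of t.tokenModifiers) {
         modifiers |= 1 << legend.tokenModifiers.indexOf(m);
       }
       data[5 * i] = deltaLine;
       data[5 * i + 1] = deltaStart;
       data[5 * i + 2] = t.length;
       data[5 * i + 3] = legend.tokenTypes.indexOf(t.tokenType);
       data[5 * i + 4] = modifiers;
       prevLine = t.line;
       prevChar = t.startChar;
     }
     return data;
   }

Running this over the three tokens above with the example legend yields exactly [ 2,5,3,0,3, 0,5,4,1,0, 3,2,7,2,0 ].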
Updating tokens

Instead of always returning all the tokens in a file, a DocumentSemanticTokensProvider can implement updateSemanticTokens and return incremental updates to the previously provided semantic tokens.


How tokens change when the document changes

Let's look at how tokens might change.

Continuing with the above example, suppose a new line was inserted at the top of the file.
That would make all the tokens move down by one line (notice how the line has changed for each one):

   { line: 3, startChar:  5, length: 3, tokenType: "property", tokenModifiers: ["private", "static"] },
   { line: 3, startChar: 10, length: 4, tokenType: "type",     tokenModifiers: [] },
   { line: 6, startChar:  2, length: 7, tokenType: "class",    tokenModifiers: [] }

Thanks to the delta encoding of positions, only the first integer of the encoding changes:

   // 1st token,  2nd token,  3rd token
   [  3,5,3,0,3,  0,5,4,1,0,  3,2,7,2,0 ]

It is possible to express these new tokens in terms of an edit applied to the previous tokens:

   [  2,5,3,0,3,  0,5,4,1,0,  3,2,7,2,0 ] // old tokens
   [  3,5,3,0,3,  0,5,4,1,0,  3,2,7,2,0 ] // new tokens

   edit: { start:  0, deleteCount: 1, data: [3] } // replace integer at offset 0 with 3

Furthermore, let's assume that a new token has appeared on line 4:

   { line: 3, startChar:  5, length: 3, tokenType: "property", tokenModifiers: ["private", "static"] },
   { line: 3, startChar: 10, length: 4, tokenType: "type",      tokenModifiers: [] },
   { line: 4, startChar:  3, length: 5, tokenType: "property", tokenModifiers: ["static"] },
   { line: 6, startChar:  2, length: 7, tokenType: "class",    tokenModifiers: [] }

The integer encoding of the tokens is:

   // 1st token,  2nd token,  3rd token,  4th token
   [  3,5,3,0,3,  0,5,4,1,0,  1,3,5,0,2,  2,2,7,2,0 ]

Again, it is possible to express these new tokens in terms of an edit applied to the previous tokens:

   [  3,5,3,0,3,  0,5,4,1,0,  3,2,7,2,0 ]               // old tokens
   [  3,5,3,0,3,  0,5,4,1,0,  1,3,5,0,2,  2,2,7,2,0 ]  // new tokens

   edit: { start: 10, deleteCount: 1, data: [1,3,5,0,2,2] } // replace integer at offset 10 with [1,3,5,0,2,2]

NOTE: Multiple document edits may accumulate before VS Code decides to invoke the semantic tokens provider again.
NOTE: If the provider cannot compute SemanticTokensEdits, it can "give up" and return all the tokens in the document again.
NOTE: All edits in SemanticTokensEdits contain indices in the old integers array, so they all refer to the previous result state.
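
One simple way a provider might compute such an edit (a sketch of the general idea, not the algorithm VS Code or any particular provider uses) is to trim the common prefix and suffix of the old and new integer arrays and emit a single edit for the middle:

   interface TokensEdit {
     start: number;       // offset into the old integer array
     deleteCount: number; // number of integers to delete
     data: number[];      // integers to insert
   }

   // Compute one edit that turns oldData into newData, by trimming the
   // common prefix and common suffix. Returns null when nothing changed.
   function computeEdit(oldData: number[], newData: number[]): TokensEdit | null {
     const maxCommon = Math.min(oldData.length, newData.length);
     let prefix = 0;
     while (prefix < maxCommon && oldData[prefix] === newData[prefix]) {
       prefix++;
     }
     let suffix = 0;
     while (
       suffix < maxCommon - prefix &&
       oldData[oldData.length - 1 - suffix] === newData[newData.length - 1 - suffix]
     ) {
       suffix++;
     }
     if (prefix === oldData.length && prefix === newData.length) {
       return null; // arrays are identical
     }
     return {
       start: prefix,
       deleteCount: oldData.length - prefix - suffix,
       data: newData.slice(prefix, newData.length - suffix),
     };
   }

Applied to the two examples above, this produces exactly { start: 0, deleteCount: 1, data: [3] } and { start: 10, deleteCount: 1, data: [1,3,5,0,2,2] }.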

@Vigilans
Member

Since microsoft/vscode-languageserver-node#367 proposed that the protocol let the server push notifications to the client, would it be fine for SemanticTokensProvider to provide an event that can be registered, just like TreeDataProvider does? e.g.:

treeDataProvider.onDidChangeTreeDataEvent.fire(item)

vs

semanticTokenProvider.onDidChangeSemanticTokensEvent.fire(document)

That way, the extension could request that VS Code re-render the semantic highlighting once a notification from the server is received.
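
(The proposal did gain such a hook: a provider can expose an optional onDidChangeSemanticTokens event. A minimal sketch of wiring it up; the notifyTokensChanged helper and the empty token computation are illustrative:)

import * as vscode from 'vscode';

class MyTokensProvider implements vscode.DocumentSemanticTokensProvider {
  private readonly didChange = new vscode.EventEmitter<void>();
  // VS Code subscribes to this event and re-requests tokens when it fires.
  readonly onDidChangeSemanticTokens = this.didChange.event;

  provideDocumentSemanticTokens(
    document: vscode.TextDocument
  ): vscode.ProviderResult<vscode.SemanticTokens> {
    // ...compute and return the tokens for the document...
    return new vscode.SemanticTokensBuilder().build();
  }

  // Called by the extension, e.g. when the language server pushes a notification.
  notifyTokensChanged(): void {
    this.didChange.fire();
  }
}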

@Vigilans
Member

Vigilans commented Jan 13, 2020

Hi, I've implemented a provider prototype combining the Java language server and the semantic tokens API in vscode-java. Just clone it and run the extension in the Insiders version, and the provider is ready for testing.

Demo: (screen recordings attached in the original issue)

@matklad

matklad commented Jan 14, 2020

Am I correct that SemanticTokensRequestOptions.ranges allows the client to specify the "viewport" / subset of the document, for which highlights are requested?

So, for example, if I "goto definition" to the new file, the editor will first ask to only color the small visible part of the document (for faster response), and then it'll ask the second query to color the whole file (so that scrolling doesn't show boring colorless text)?

This is a super-important optimization, 👍

Also I'd like to suggest that maybe, if we have ranges, we don't need SemanticTokensEdits at all? I imagine this can work like this:

  • an editor always asks to highlight the visible range of the document (which is O(1) worth of data)
  • when a file is opened for the first time, the editor asks to highlight the whole file, in background, such that scrolling shows colored code
  • on the editor side, we maintain this global highlighting by adjusting the ranges of the highlights. I.e., if the user types at the start of the document, we only re-ask about the first screenful of lines and just shift the ranges for the rest of the document. If the user jumps to the end of the document, we immediately show them the cached, shifted highlighting (which might be slightly off), and also ask the server to re-highlight the end region.
  • if an edit happens which invalidates the whole cached highlighting map, we re-ask the server to highlight the whole file.

I don't like the presence of SemanticTokensEdits for two reasons:

  • this is a stateful bit, which would be annoying to synchronize between the client and the server,
  • I fear that folks would focus on the "incrementally highlight the whole file" use case, which I believe is fundamentally less efficient than "from-scratch highlight of the visible range". It is less efficient because, although you can send the final highlighting as edits, you typically still do O(file length) processing to compute those edits, even in many happy cases (for example, if you declare a new global variable, you need to re-highlight all function bodies). In some pathological cases, even the diff itself would be O(file length) (adding or removing a single " in a language with multiline strings). In contrast, the viewport approach is always roughly O(1) processing, because you can only fit so many letters on the screen.
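
(For reference, viewport-style requests are served by a separate range provider in the API. A minimal registration sketch; the 'plaintext' selector, the legend contents, and the elided token scan are placeholders:)

import * as vscode from 'vscode';

const legend = new vscode.SemanticTokensLegend(
  ['property', 'type', 'class'],
  ['private', 'static']
);

const rangeProvider: vscode.DocumentRangeSemanticTokensProvider = {
  provideDocumentRangeSemanticTokens(document, range, token) {
    const builder = new vscode.SemanticTokensBuilder(legend);
    // Tokenize only the requested (roughly viewport-sized) range, keeping the
    // work O(visible text) rather than O(file length).
    // ...scan document.getText(range) and builder.push(...) for each token...
    return builder.build();
  },
};

vscode.languages.registerDocumentRangeSemanticTokensProvider(
  { language: 'plaintext' },
  rangeProvider,
  legend
);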

@rcjsuen
Contributor

rcjsuen commented Jan 31, 2020

How do I flag escape characters and/or escaped characters? I do not see it in the current list of token types or token modifiers.

registerTokenType('comment', nls.localize('comment', "Style for comments."), [['comment']]);
registerTokenType('string', nls.localize('string', "Style for strings."), [['string']]);
registerTokenType('keyword', nls.localize('keyword', "Style for keywords."), [['keyword.control']]);
registerTokenType('number', nls.localize('number', "Style for numbers."), [['constant.numeric']]);
registerTokenType('regexp', nls.localize('regexp', "Style for expressions."), [['constant.regexp']]);
registerTokenType('operator', nls.localize('operator', "Style for operators."), [['keyword.operator']]);
registerTokenType('namespace', nls.localize('namespace', "Style for namespaces."), [['entity.name.namespace']]);
registerTokenType('type', nls.localize('type', "Style for types."), [['entity.name.type'], ['support.type'], ['support.class']]);
registerTokenType('struct', nls.localize('struct', "Style for structs."), [['storage.type.struct']], 'type');
registerTokenType('class', nls.localize('class', "Style for classes."), [['entity.name.type.class']], 'type');
registerTokenType('interface', nls.localize('interface', "Style for interfaces."), [['entity.name.type.interface']], 'type');
registerTokenType('enum', nls.localize('enum', "Style for enums."), [['entity.name.type.enum']], 'type');
registerTokenType('typeParameter', nls.localize('typeParameter', "Style for type parameters."), [['entity.name.type', 'meta.type.parameters']], 'type');
registerTokenType('function', nls.localize('function', "Style for functions"), [['entity.name.function'], ['support.function']]);
registerTokenType('member', nls.localize('member', "Style for member"), [['entity.name.function.member'], ['support.function']]);
registerTokenType('macro', nls.localize('macro', "Style for macros."), [['entity.name.other.preprocessor.macro']], 'function');
registerTokenType('variable', nls.localize('variable', "Style for variables."), [['variable'], ['entity.name.variable']]);
registerTokenType('constant', nls.localize('constant', "Style for constants."), [['variable.other.constant']], 'variable');
registerTokenType('parameter', nls.localize('parameter', "Style for parameters."), [['variable.parameter']], 'variable');
registerTokenType('property', nls.localize('property', "Style for properties."), [['variable.other.property']], 'variable');
registerTokenType('label', nls.localize('labels', "Style for labels. "), undefined);
// default token modifiers
tokenClassificationRegistry.registerTokenModifier('declaration', nls.localize('declaration', "Style for all symbol declarations."), undefined);
tokenClassificationRegistry.registerTokenModifier('documentation', nls.localize('documentation', "Style to use for references in documentation."), undefined);
tokenClassificationRegistry.registerTokenModifier('static', nls.localize('static', "Style to use for symbols that are static."), undefined);
tokenClassificationRegistry.registerTokenModifier('abstract', nls.localize('abstract', "Style to use for symbols that are abstract."), undefined);
tokenClassificationRegistry.registerTokenModifier('deprecated', nls.localize('deprecated', "Style to use for symbols that are deprecated."), undefined);
tokenClassificationRegistry.registerTokenModifier('modification', nls.localize('modification', "Style to use for write accesses."), undefined);
tokenClassificationRegistry.registerTokenModifier('async', nls.localize('async', "Style to use for symbols that are async."), undefined);
tokenClassificationRegistry.registerTokenModifier('readonly', nls.localize('readonly', "Style to use for symbols that are readonly."), undefined);

@mattacosta
Contributor

mattacosta commented Feb 17, 2020

(screenshot of the experiment attached in the original issue)

Note: This is 1-2 weeks old. Some issues may no longer be relevant.

General feedback:

After trying out the little experiment above, here are some issues I ran across:

  1. Right off the bat, vscode-languageserver wasn't exporting some types I needed from the main package. I had to import directly from vscode-languageserver/lib/sematicTokens.proposed (and vscode-languageserver/lib/progress if I had implemented that part). Not sure if that was intentional or not, but either way there's going to be a conflict between the two interfaces named SemanticTokens once the other moves out of the Proposed namespace.
  2. The semantic-tokens-sample extension could've been more helpful. Specifically it could have:
    • Mentioned that editor.semanticHighlighting.enable was disabled by default. That was a nice facepalm moment.
    • Used package.json styling, which I only accidentally came across. It's only linked to as a "background information" link there. Perhaps the first post here could include a link to the wiki page as well?
  3. The semanticTokenModifiers contribution has unexpected behavior. For a while, my server was returning tokens with several modifiers that were not defined in that contribution. Rather than ignoring the modifier and possibly reverting to the TextMate style, it used the style of the last modifier listed there. Basically, in the image above, the first three parameters were the same color as $p7 and it was annoying trying to figure out why it was being matched even though the inspector showed the correct type and modifier.
  4. For the semanticTokenStyleDefaults contribution:
    • The description for the fontStyle property should probably mention that multiple styles should be delimited by spaces.
    • The description for the selector property could use clarification (what to put there, how are they "selected"). Since I was guessing at the time, I initially thought it was for TextMate selectors.

API thoughts:

  • I don't get why we're using [line, char, length, ...] that then conditionally switches to [unused, delta, length, ...] (at least for tokens on the same line) instead of a [delta, length, ...] representation everywhere (where the first token is relative to the start of the document). That should save memory and reduce the amount of data to transfer right?
  • Speaking of deltas, since they are relative to the end of the previous token, there can't be any overlapping ranges. I don't see this being an issue with everyday languages, but are we sure that it won't bite host languages that also embed other languages (especially languages that provide a foreign function interface and put said code into strings)?
  • "A token cannot be multiline." This presents two issues:
    1. For the vast majority of languages, lines are irrelevant; that is, they do not have weird one-line-only tokenization strategies like TM grammars. It just makes sense to let each one determine the kind and length of each token rather than force every language to create workarounds.
    2. Should this be a requirement however, you will have to provide a better definition of a line. Lines are ultimately a UI-level construct, so not all languages will have the same definition. For example, should the language use the Unicode definition of a line terminator (VT,FF,CR,LF,CRLF,LS,PS), or a smaller subset (CR,LF,CRLF,LS,PS), or how about an even smaller subset (LF,CRLF).

@alexdima
Member Author

@mattacosta Thank you for your feedback. Ping @dbaeumer for the language server and @aeschli for the token types / styling specific feedback.

Regarding the API specific feedback:

  • this does not switch to [unused, delta, length]. The format is: [deltaLine, deltaChar, length, tokenType, tokenModifiers]. So if a semantic token starts on row 10, character 5, and the next one starts on row 15, character 3, the second one will use [5, 3, ...]. But I agree the current format is not 100% optimal, given that 32 bits are reserved for deltaLine and another 32 bits are reserved for deltaChar, and those bits will mostly contain 0s. I considered using a variable-byte encoding strategy, kind of like UTF-8 but for integers with a continuation bit, but I thought that might be too complex for the extra gains. For a file with 100,000 tokens, the cost today is 500,000 32-bit integers, i.e. 2 MB. This could perhaps be reduced by an additional 40-50% using variable-byte encoding of numbers, but I believe it is good enough.
  • the choice to have tokens not overlap is intentional. Overlapping tokens would have been a pain to work with: binary search would have been difficult, adjusting them after an edit would have been more difficult, and in general the delta encoding strategy might not have worked too well. Today, the editor ends up storing the tokens in almost the same format they are provided in (it just reduces token type + token modifiers to a single 32-bit integer which has the styling applied). The continuous non-overlapping ranges are therefore kept in a simple array and can be adjusted relatively straightforwardly. We have explored semantic tokens for HTML, and at least there we didn't hit any problem with overlapping ones, because HTML stops where JS or CSS begin, etc.
  • the multiline decision has some backstory and is VS Code-specific. For semantic tokens, an (offset;length) kind of representation would have been more natural, because then (deltaOffset;length) for subsequent lines can be more easily explained as the gap between the previous token and the current one in character count. But we decided to be consistent with the existing VS Code APIs, which always use a (line;char) coordinate system. We then would have had 4 integers to represent a token position, like (startLine,startChar,endLine,endChar). We felt that this is wasteful and considered that in the vast majority of languages, semantic tokens (not syntactic tokens) are very likely to be single-line. This also worked well with the fact that the editor already has quite a bit of code that can deal with single-line tokens and adjust them after edits, etc. (the tokens coming in from the TM grammar). Also, it's not that multiline tokens cannot be expressed; they just cost one token per line, so a multiline token spanning 3 lines needs 3 tokens to be expressed. So we added the extra limitation that a semantic token can only be on a single line.
  • the line endings problem is a general problem for any VS Code API, i.e. for all the existing ones as well. All the APIs talk about (line;col) positions. VS Code uses \r\n, \n and \r as line endings.

Thanks again for the great feedback!

@dbaeumer
Member

@mattacosta

Regarding:

Right off the bat, vscode-languageserver wasn't exporting some types I needed from the main package. I had to import directly from vscode-languageserver/lib/sematicTokens.proposed (and vscode-languageserver/lib/progress if I had implemented that part). Not sure if that was intentional or not, but either way there's going to be a conflict between the two interfaces named SemanticTokens once the other moves out of the Proposed namespace.

Which type did you miss from semantic tokens? If it is the builder, it is exposed via ProposedFeatures.SemanticTokensBuilder.

I will export WorkDoneProgress so that you don't need to get it from the progress file.

@rcjsuen
Contributor

rcjsuen commented Feb 18, 2020

@dbaeumer Perhaps @mattacosta is referring to this.

(screenshot attached in the original issue)

I ended up doing this instead to move on.

import { SemanticTokens } from 'vscode-languageserver-protocol/lib/protocol.sematicTokens.proposed';

@dbaeumer
Member

Is Proposed.SemanticTokens not working for you? See https://github.com/microsoft/vscode-languageserver-node/blob/master/protocol/src/main.ts#L77

@rcjsuen
Contributor

rcjsuen commented Feb 18, 2020

@dbaeumer Yeah, I saw that. But no, it doesn't.

(screenshot attached in the original issue)

I mean, based on the syntax highlighting I'm not sure we should be doing Proposed.SemanticTokens in the first place. :P

[10:55:08 AM] File change detected. Starting incremental compilation...

src/main.ts:10:18 - error TS1005: ',' expected.

10 import { Proposed.SemanticTokens } from 'vscode-languageserver-protocol';
                    ~

[10:55:09 AM] Found 1 error. Watching for file changes.

I don't really understand TypeScript namespaces and modules well though so... 🤷‍♂

@mattacosta
Contributor

mattacosta commented Feb 18, 2020

@dbaeumer No, not the builder. I needed the interface exposing the handler methods.

export interface SemanticTokens {
    semanticTokens: {
        on(handler: ServerRequestHandler<Proposed.SemanticTokensParams, Proposed.SemanticTokens, Proposed.SemanticTokensPartialResult, void>): void;
        onEdits(handler: ServerRequestHandler<Proposed.SemanticTokensEditsParams, Proposed.SemanticTokensEdits | Proposed.SemanticTokens, Proposed.SemanticTokensEditsPartialResult | Proposed.SemanticTokensEditsPartialResult, void>): void;
        onRange(handler: ServerRequestHandler<Proposed.SemanticTokensRangeParams, Proposed.SemanticTokens, Proposed.SemanticTokensPartialResult, void>): void;
    };
}

The Proposed.SemanticTokens type does work for me, but it resolves to:

export interface SemanticTokens {
    resultId?: string;
    data: number[];
}

The former is required because the connection only has the semanticTokens property while its type is still being inferred. As soon as the connection is passed elsewhere or saved in a property, that inference is lost.

let connection = createConnection(ProposedFeatures.all, ...);
connection.language.semanticTokens.on(...);  // Still inferred as `Language & SemanticTokens`

let provider = new SemanticProvider(connection);

class SemanticProvider {
  // TypeScript needs an explicit intersection type for `connection.language` either here in
  // the parameter list for the generic version or later in a type cast.
  constructor(connection: Connection) {
    // connection.language.semanticTokens.on(...);
    //                     ~~~~~~~~~~~~~~   does not exist

    (connection.language as Language & SemanticTokens).semanticTokens.on(...);
  }
}

@mattacosta
Contributor

@rcjsuen So close! You're supposed to just import Proposed. 😄

import { Proposed } from 'vscode-languageserver';

function handler(...): Proposed.SemanticTokens {
  // do stuff
}

@dbaeumer
Member

dbaeumer commented Feb 19, 2020

@mattacosta I see the problem, but I would like to propose a different approach (which I use in these cases). You can always capture an inferred type in TypeScript using the typeof operator. So you can do the following:

type ConnectionType = typeof connection;
class SemanticProvider {
  constructor(connection: ConnectionType) {
  }
}

or if you don't want to capture it you can inline it as well

class SemanticProvider {
  constructor(conn: typeof connection) {
  }
}

Would that work for you?

@mattacosta
Contributor

I don't believe so. My connection isn't accessible like that in reality. It's more like:

abstract class Server {
  protected readonly connection: typeof ???;  // no value to use here
  constructor() {
    this.connection = this.createConnection();
    // setup logging, init handlers, etc
  }
  protected abstract createConnection(): ???;
  // ...other methods that use the connection property...
}

class SemanticServer extends Server {
  constructor() {
    super();
    this.semanticTokenProvider = new SemanticTokenProvider(this.connection);
  }
}

class SyntaxServer extends Server {
  // took a page from typescript; folding, selectionrange, etc
}
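
(A sketch of one way around the missing value, assuming vscode-languageserver's createConnection and ProposedFeatures; the makeConnection helper is illustrative. The idea is to capture the inferred connection type from a small non-generic factory function with ReturnType:)

import { createConnection, ProposedFeatures } from 'vscode-languageserver';

// A non-generic factory gives us a concrete inferred return type that we can
// name with ReturnType, even before any connection value exists.
function makeConnection() {
  return createConnection(ProposedFeatures.all);
}
type FeaturedConnection = ReturnType<typeof makeConnection>;

abstract class Server {
  protected readonly connection: FeaturedConnection;
  constructor() {
    this.connection = this.createConnection();
    // setup logging, init handlers, etc.
  }
  protected abstract createConnection(): FeaturedConnection;
}

(Note this pins every subclass to a single connection type, so it may not fit a setup where SemanticServer and SyntaxServer need differently-featured connections.)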

@simolus3

simolus3 commented Mar 7, 2020

I'm trying to integrate semantic tokens in the Dart extension. Overall it works pretty well, but one thing I noticed is that nested regions don't appear to work. In Dart, this comes up quite often with interpolation in string literals or in documentation comments. Here the entire string literal would be a string token, with other tokens appearing in between.
For now, we're splitting tokens to remove overlapping regions. But since I couldn't find anything on nested regions in the documentation, I'm wondering if this is intentional or a restriction that could be loosened.

@kjeremy

kjeremy commented Mar 7, 2020

@simolus3 we have the same issue in rust-analyzer here: rust-lang/rust-analyzer#3447

@mattacosta
Contributor

@simolus3 This was alexdima's response to my comment a few weeks ago on this:

the choice to have tokens not overlap is intentional. Overlapping tokens would have been a pain to work with: binary search would have been difficult, adjusting them after an edit would have been more difficult, and in general the delta encoding strategy might not have worked too well. Today, the editor ends up storing the tokens in almost the same format they are provided in (it just reduces token type + token modifiers to a single 32-bit integer which has the styling applied). The continuous non-overlapping ranges are therefore kept in a simple array and can be adjusted relatively straightforwardly. We have explored semantic tokens for HTML, and at least there we didn't hit any problem with overlapping ones, because HTML stops where JS or CSS begin, etc.

Also, for both interpolated strings and documentation comments, those can (and really should) be done on the TM grammar side of things. I see no reason to change the semantic highlighting API for them unless there's something really weird about Dart/Rust.

@ryanbrandenburg

@alexdima, I see that this issue is in a "March 2020" milestone. Should we take that to mean that it's targeted at being available outside of the proposedAPI system by the end of March? We're very interested in using this feature in our Razor extension, but understand that

They are subject to change, only available in Insiders distribution and cannot be used in published extensions

@JoeRobich
Member

JoeRobich commented Mar 16, 2020

@alexdima, I experimented with adding semantic token support to the C# extension and am very happy with the results. One thing I would like to bring up is that in order to get the full Visual Studio experience, the Dark+ and Light+ themes need some tweaks. I have added the tweaks I've made to my PR - dotnet/vscode-csharp#3667. Would the team be open to incorporating these changes into the released themes?

I also noticed that the keyword token is using the 'keyword.control' scope. Perhaps it would be worth having both in the default tokens since the distinction is such a big part of the VS Code aesthetic.

@dbaeumer
Member

@aeschli can you comment on #86415 (comment)

@aeschli
Contributor

aeschli commented Mar 16, 2020

@JoeRobich Please file a separate issue with your theme change suggestions so we can go through them together. We are very careful not to make changes to the default themes, but let's discuss this in the new issue.

@JoeRobich
Member

@aeschli I opened #92965 to track the theme change request

@alexdima
Member Author

A heads-up: if anyone was using SemanticTokensBuilder, there was a breaking change in the return type of build():

	/**
	 * A semantic tokens builder can help with creating a `SemanticTokens` instance
	 * which contains delta encoded semantic tokens.
	 */
	export class SemanticTokensBuilder {

		constructor(legend?: SemanticTokensLegend);

		/**
		 * Add another token.
		 *
		 * @param line The token start line number (absolute value).
		 * @param char The token start character (absolute value).
		 * @param length The token length in characters.
		 * @param tokenType The encoded token type.
		 * @param tokenModifiers The encoded token modifiers.
		 */
		push(line: number, char: number, length: number, tokenType: number, tokenModifiers: number): void;

		/**
		 * Add another token. Use only when providing a legend.
		 *
		 * @param range The range of the token. Must be single-line.
		 * @param tokenType The token type.
		 * @param tokenModifiers The token modifiers.
		 */
		push(range: Range, tokenType: string, tokenModifiers?: string[]): void;

		/**
		 * Finish and create a `SemanticTokens` instance.
		 */
		build(resultId?: string): SemanticTokens;
	}
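
(For illustration, here is a minimal provider using the builder's legend-based overload. This is a sketch against the API surface above; the hard-coded token, the legend contents, and the 'plaintext' selector are just examples:)

import * as vscode from 'vscode';

const legend = new vscode.SemanticTokensLegend(
  ['property', 'type', 'class'],
  ['private', 'static']
);

const provider: vscode.DocumentSemanticTokensProvider = {
  provideDocumentSemanticTokens(document) {
    const builder = new vscode.SemanticTokensBuilder(legend);
    // The first token of the earlier example: line 2, chars 5-8,
    // type 'property', modifiers ['private', 'static'].
    builder.push(new vscode.Range(2, 5, 2, 8), 'property', ['private', 'static']);
    return builder.build();
  },
};

vscode.languages.registerDocumentSemanticTokensProvider(
  { language: 'plaintext' },
  provider,
  legend
);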

@ghost

ghost commented Apr 25, 2020

Looking at https://github.com/microsoft/vscode-languageserver-node/blob/e5d7ad881b1a51b486e3f0e4aa0fbc25dad2be58/protocol/src/protocol.semanticTokens.proposed.ts#L18, I wonder: doesn't this lack an import token type? Most languages have a way to import, require, or include a module, and I think many editors would want to color these import statements differently from any arbitrary unknown keyword. Also, what is a language server supposed to return for modifiers that aren't known to the client? Shouldn't there be some sort of misc modifier type, akin to the generic keyword token? That seems like a common scenario to run into.

@alexdima
Member Author

@etc0de I think it's best to open a new issue.

@ghost

ghost commented Apr 27, 2020

@alexdima OK, I made one here: microsoft/language-server-protocol#968
