
Semantic Tokens API #86415

Closed
alexdima opened this issue Dec 5, 2019 · 43 comments
Labels: api-finalization, feature-request (Request for new features or functionality), on-testplan, semantic-tokens (Semantic tokens issues)

@alexdima
Member

alexdima commented Dec 5, 2019

This issue tracks the API proposal for semantic tokens.


Sample: semantic-tokens-sample


API: vscode.proposed.d.ts


Tokens representation

A file can contain many tokens, perhaps even hundreds of thousands. To keep the memory cost of describing semantic tokens low, we avoid allocating an object per token and instead represent all the tokens of a file as a single array of integers. Furthermore, the position of each token is expressed relative to the token before it, because most tokens remain stable relative to each other when edits are made in a file.


In short, each token takes 5 integers to represent, so a specific token i in the file consists of the following array indices:

  • at index 5*i - deltaLine: token line number, relative to the previous token
  • at index 5*i+1 - deltaStart: token start character; relative to the previous token's start character if they are on the same line, otherwise relative to 0
  • at index 5*i+2 - length: the length of the token. A token cannot be multiline.
  • at index 5*i+3 - tokenType: will be looked up in SemanticTokensLegend.tokenTypes. We currently ask that tokenType < 65536.
  • at index 5*i+4 - tokenModifiers: each set bit will be looked up in SemanticTokensLegend.tokenModifiers

How to encode tokens

Here is an example for encoding a file with 3 tokens in a uint32 array:

   { line: 2, startChar:  5, length: 3, tokenType: "property",  tokenModifiers: ["private", "static"] },
   { line: 2, startChar: 10, length: 4, tokenType: "type",      tokenModifiers: [] },
   { line: 5, startChar:  2, length: 7, tokenType: "class",     tokenModifiers: [] }
  1. First of all, a legend must be devised. This legend must be provided up-front and capture all possible token types.
    For this example, we will choose the following legend which must be passed in when registering the provider:
   tokenTypes: ['property', 'type', 'class'],
   tokenModifiers: ['private', 'static']
  2. The first transformation step is to encode tokenType and tokenModifiers as integers using the legend. Token types are looked up by index, so a tokenType value of 1 means tokenTypes[1]. Multiple token modifiers can be set by using bit flags, so a tokenModifier value of 3 is first viewed as binary 0b00000011, which means [tokenModifiers[0], tokenModifiers[1]] because bits 0 and 1 are set. Using this legend, the tokens now are:
   { line: 2, startChar:  5, length: 3, tokenType: 0, tokenModifiers: 3 },
   { line: 2, startChar: 10, length: 4, tokenType: 1, tokenModifiers: 0 },
   { line: 5, startChar:  2, length: 7, tokenType: 2, tokenModifiers: 0 }
  3. The next step is to represent each token relative to the previous token in the file. In this case, the second token is on the same line as the first token, so its startChar is made relative to the startChar of the first token: 10 - 5 = 5. The third token is on a different line than the second token, so its startChar is not altered:
   { deltaLine: 2, deltaStartChar: 5, length: 3, tokenType: 0, tokenModifiers: 3 },
   { deltaLine: 0, deltaStartChar: 5, length: 4, tokenType: 1, tokenModifiers: 0 },
   { deltaLine: 3, deltaStartChar: 2, length: 7, tokenType: 2, tokenModifiers: 0 }
  4. Finally, the last step is to inline each of the 5 fields for a token into a single array, which is a memory-friendly representation:
   // 1st token,  2nd token,  3rd token
   [  2,5,3,0,3,  0,5,4,1,0,  3,2,7,2,0 ]
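
Putting the steps together, here is a minimal sketch of this transformation in TypeScript (the RawToken shape and encodeTokens helper are illustrative, not part of the API; tokens are assumed to be sorted by position and all type/modifier names are assumed to be present in the legend):

   interface RawToken {
     line: number;
     startChar: number;
     length: number;
     tokenType: string;
     tokenModifiers: string[];
   }

   // Encode absolute tokens into the delta-encoded 5-integers-per-token format.
   function encodeTokens(
     tokens: RawToken[],
     legend: { tokenTypes: string[]; tokenModifiers: string[] }
   ): Uint32Array {
     const data = new Uint32Array(tokens.length * 5);
     let prevLine = 0;
     let prevChar = 0;
     for (let i = 0; i < tokens.length; i++) {
       const t = tokens[i];
       const deltaLine = t.line - prevLine;
       // deltaStart is relative to the previous token only when both are on the same line
       const deltaStart = deltaLine === 0 ? t.startChar - prevChar : t.startChar;
       // each modifier sets one bit, indexed by its position in the legend
       let modifiers = 0;
       for (const m of t.tokenModifiers) {
         modifiers |= 1 << legend.tokenModifiers.indexOf(m);
       }
       data[5 * i] = deltaLine;
       data[5 * i + 1] = deltaStart;
       data[5 * i + 2] = t.length;
       data[5 * i + 3] = legend.tokenTypes.indexOf(t.tokenType);
       data[5 * i + 4] = modifiers;
       prevLine = t.line;
       prevChar = t.startChar;
     }
     return data;
   }

Running this over the three tokens above with the example legend yields exactly [ 2,5,3,0,3, 0,5,4,1,0, 3,2,7,2,0 ].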
Updating tokens

Instead of always returning all the tokens in a file, a DocumentSemanticTokensProvider can implement updateSemanticTokens and return incremental updates to the previously provided semantic tokens.


How tokens change when the document changes

Let's look at how tokens might change.

Continuing with the above example, suppose a new line was inserted at the top of the file.
That would make all the tokens move down by one line (notice how the line has changed for each one):

   { line: 3, startChar:  5, length: 3, tokenType: "property", tokenModifiers: ["private", "static"] },
   { line: 3, startChar: 10, length: 4, tokenType: "type",     tokenModifiers: [] },
   { line: 6, startChar:  2, length: 7, tokenType: "class",    tokenModifiers: [] }

Thanks to the delta encoding of positions, only the first integer of the encoding changes:

   // 1st token,  2nd token,  3rd token
   [  3,5,3,0,3,  0,5,4,1,0,  3,2,7,2,0 ]

It is possible to express these new tokens in terms of an edit applied to the previous tokens:

   [  2,5,3,0,3,  0,5,4,1,0,  3,2,7,2,0 ] // old tokens
   [  3,5,3,0,3,  0,5,4,1,0,  3,2,7,2,0 ] // new tokens

   edit: { start:  0, deleteCount: 1, data: [3] } // replace integer at offset 0 with 3

Furthermore, let's assume that a new token has appeared on line 4:

   { line: 3, startChar:  5, length: 3, tokenType: "property", tokenModifiers: ["private", "static"] },
   { line: 3, startChar: 10, length: 4, tokenType: "type",      tokenModifiers: [] },
   { line: 4, startChar:  3, length: 5, tokenType: "property", tokenModifiers: ["static"] },
   { line: 6, startChar:  2, length: 7, tokenType: "class",    tokenModifiers: [] }

The integer encoding of the tokens is:

   // 1st token,  2nd token,  3rd token,  4th token
   [  3,5,3,0,3,  0,5,4,1,0,  1,3,5,0,2,  2,2,7,2,0 ]

Again, it is possible to express these new tokens in terms of an edit applied to the previous tokens:

   [  3,5,3,0,3,  0,5,4,1,0,  3,2,7,2,0 ]               // old tokens
   [  3,5,3,0,3,  0,5,4,1,0,  1,3,5,0,2,  2,2,7,2,0 ]  // new tokens

   edit: { start: 10, deleteCount: 1, data: [1,3,5,0,2,2] } // replace integer at offset 10 with [1,3,5,0,2,2]

NOTE: Multiple document edits may accumulate before VS Code decides to invoke the semantic tokens provider again.
NOTE: If the provider cannot compute SemanticTokensEdits, it can "give up" and return all the tokens in the document again.
NOTE: All edits in SemanticTokensEdits contain indices in the old integers array, so they all refer to the previous result state.
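
One simple way a provider might compute such an edit (a sketch of the general idea, not the algorithm VS Code or any particular provider uses) is to trim the common prefix and suffix of the old and new integer arrays and emit a single edit for the middle:

   interface TokensEdit {
     start: number;       // offset into the old integer array
     deleteCount: number; // number of integers to delete
     data: number[];      // integers to insert
   }

   // Compute one edit that turns oldData into newData, by trimming the
   // common prefix and common suffix. Returns null when nothing changed.
   function computeEdit(oldData: number[], newData: number[]): TokensEdit | null {
     const maxCommon = Math.min(oldData.length, newData.length);
     let prefix = 0;
     while (prefix < maxCommon && oldData[prefix] === newData[prefix]) {
       prefix++;
     }
     let suffix = 0;
     while (
       suffix < maxCommon - prefix &&
       oldData[oldData.length - 1 - suffix] === newData[newData.length - 1 - suffix]
     ) {
       suffix++;
     }
     if (prefix === oldData.length && prefix === newData.length) {
       return null; // arrays are identical
     }
     return {
       start: prefix,
       deleteCount: oldData.length - prefix - suffix,
       data: newData.slice(prefix, newData.length - suffix),
     };
   }

Applied to the two examples above, this produces exactly { start: 0, deleteCount: 1, data: [3] } and { start: 10, deleteCount: 1, data: [1,3,5,0,2,2] }.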

@Vigilans
Member

Since microsoft/vscode-languageserver-node#367 proposed that the protocol let the server push notifications to the client, would it be fine for SemanticTokensProvider to provide an event that can be registered, just like TreeDataProvider does? e.g.:

treeDataProvider.onDidChangeTreeDataEvent.fire(item)

vs

semanticTokenProvider.onDidChangeSemanticTokensEvent.fire(document)

That way, the extension could request that VS Code re-render the semantic highlighting once a notification from the server is received.
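
(The proposal did gain such a hook: a provider can expose an optional onDidChangeSemanticTokens event. A minimal sketch of wiring it up; the notifyTokensChanged helper and the empty token computation are illustrative:)

import * as vscode from 'vscode';

class MyTokensProvider implements vscode.DocumentSemanticTokensProvider {
  private readonly didChange = new vscode.EventEmitter<void>();
  // VS Code subscribes to this event and re-requests tokens when it fires.
  readonly onDidChangeSemanticTokens = this.didChange.event;

  provideDocumentSemanticTokens(
    document: vscode.TextDocument
  ): vscode.ProviderResult<vscode.SemanticTokens> {
    // ...compute and return the tokens for the document...
    return new vscode.SemanticTokensBuilder().build();
  }

  // Called by the extension, e.g. when the language server pushes a notification.
  notifyTokensChanged(): void {
    this.didChange.fire();
  }
}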

@Vigilans
Member

Vigilans commented Jan 13, 2020

Hi, I've implemented a provider prototype combining the Java language server and the semantic tokens API in vscode-java. Just clone it and run the extension in the Insiders version, and the provider is ready for testing.

Demo: (screen recordings attached in the original issue)

@matklad

matklad commented Jan 14, 2020

Am I correct that SemanticTokensRequestOptions.ranges allows the client to specify the "viewport" / subset of the document, for which highlights are requested?

So, for example, if I "goto definition" to the new file, the editor will first ask to only color the small visible part of the document (for faster response), and then it'll ask the second query to color the whole file (so that scrolling doesn't show boring colorless text)?

This is a super-important optimization, 👍

Also I'd like to suggest that maybe, if we have ranges, we don't need SemanticTokensEdits at all? I imagine this can work like this:

  • an editor always asks to highlight the visible range of the document (which is O(1) worth of data)
  • when a file is opened for the first time, the editor asks to highlight the whole file, in background, such that scrolling shows colored code
  • on the editor side, we maintain this global highlighting by adjusting the ranges of the highlights. I.e., if the user types at the start of the document, we only re-ask about the first screenful of lines and just shift the ranges for the rest of the document. If the user jumps to the end of the document, we immediately show them the cached, shifted highlighting (which might be slightly off), and also ask the server to re-highlight the end region.
  • if an edit happens which invalidates the whole cached highlighting map, we re-ask the server to highlight the whole file.

I don't like the presence of SemanticTokensEdits for two reasons:

  • this is a stateful bit, which would be annoying to synchronize between the client and the server,
  • I fear that folks would focus on the "incrementally highlight the whole file" use case, which I believe is fundamentally less efficient than "from-scratch highlight of the visible range". It is less efficient because, although you can send the final highlighting as edits, you typically still do O(file length) processing to compute those edits, even in many happy cases (for example, if you declare a new global variable, you need to re-highlight all function bodies). In some pathological cases, even the diff itself would be O(file length) (adding or removing a single " in a language with multiline strings). In contrast, the viewport approach is always roughly O(1) processing, because you can only fit so many letters on the screen.
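
(For reference, viewport-style requests are served by a separate range provider in the API. A minimal registration sketch; the 'plaintext' selector, the legend contents, and the elided token scan are placeholders:)

import * as vscode from 'vscode';

const legend = new vscode.SemanticTokensLegend(
  ['property', 'type', 'class'],
  ['private', 'static']
);

const rangeProvider: vscode.DocumentRangeSemanticTokensProvider = {
  provideDocumentRangeSemanticTokens(document, range, token) {
    const builder = new vscode.SemanticTokensBuilder(legend);
    // Tokenize only the requested (roughly viewport-sized) range, keeping the
    // work O(visible text) rather than O(file length).
    // ...scan document.getText(range) and builder.push(...) for each token...
    return builder.build();
  },
};

vscode.languages.registerDocumentRangeSemanticTokensProvider(
  { language: 'plaintext' },
  rangeProvider,
  legend
);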

@rcjsuen
Contributor

rcjsuen commented Jan 31, 2020

How do I flag escape characters and/or escaped characters? I do not see it in the current list of token types or token modifiers.

registerTokenType('comment', nls.localize('comment', "Style for comments."), [['comment']]);
registerTokenType('string', nls.localize('string', "Style for strings."), [['string']]);
registerTokenType('keyword', nls.localize('keyword', "Style for keywords."), [['keyword.control']]);
registerTokenType('number', nls.localize('number', "Style for numbers."), [['constant.numeric']]);
registerTokenType('regexp', nls.localize('regexp', "Style for expressions."), [['constant.regexp']]);
registerTokenType('operator', nls.localize('operator', "Style for operators."), [['keyword.operator']]);
registerTokenType('namespace', nls.localize('namespace', "Style for namespaces."), [['entity.name.namespace']]);
registerTokenType('type', nls.localize('type', "Style for types."), [['entity.name.type'], ['support.type'], ['support.class']]);
registerTokenType('struct', nls.localize('struct', "Style for structs."), [['storage.type.struct']], 'type');
registerTokenType('class', nls.localize('class', "Style for classes."), [['entity.name.type.class']], 'type');
registerTokenType('interface', nls.localize('interface', "Style for interfaces."), [['entity.name.type.interface']], 'type');
registerTokenType('enum', nls.localize('enum', "Style for enums."), [['entity.name.type.enum']], 'type');
registerTokenType('typeParameter', nls.localize('typeParameter', "Style for type parameters."), [['entity.name.type', 'meta.type.parameters']], 'type');
registerTokenType('function', nls.localize('function', "Style for functions"), [['entity.name.function'], ['support.function']]);
registerTokenType('member', nls.localize('member', "Style for member"), [['entity.name.function.member'], ['support.function']]);
registerTokenType('macro', nls.localize('macro', "Style for macros."), [['entity.name.other.preprocessor.macro']], 'function');
registerTokenType('variable', nls.localize('variable', "Style for variables."), [['variable'], ['entity.name.variable']]);
registerTokenType('constant', nls.localize('constant', "Style for constants."), [['variable.other.constant']], 'variable');
registerTokenType('parameter', nls.localize('parameter', "Style for parameters."), [['variable.parameter']], 'variable');
registerTokenType('property', nls.localize('property', "Style for properties."), [['variable.other.property']], 'variable');
registerTokenType('label', nls.localize('labels', "Style for labels. "), undefined);
// default token modifiers
tokenClassificationRegistry.registerTokenModifier('declaration', nls.localize('declaration', "Style for all symbol declarations."), undefined);
tokenClassificationRegistry.registerTokenModifier('documentation', nls.localize('documentation', "Style to use for references in documentation."), undefined);
tokenClassificationRegistry.registerTokenModifier('static', nls.localize('static', "Style to use for symbols that are static."), undefined);
tokenClassificationRegistry.registerTokenModifier('abstract', nls.localize('abstract', "Style to use for symbols that are abstract."), undefined);
tokenClassificationRegistry.registerTokenModifier('deprecated', nls.localize('deprecated', "Style to use for symbols that are deprecated."), undefined);
tokenClassificationRegistry.registerTokenModifier('modification', nls.localize('modification', "Style to use for write accesses."), undefined);
tokenClassificationRegistry.registerTokenModifier('async', nls.localize('async', "Style to use for symbols that are async."), undefined);
tokenClassificationRegistry.registerTokenModifier('readonly', nls.localize('readonly', "Style to use for symbols that are readonly."), undefined);

@mattacosta
Contributor

mattacosta commented Feb 17, 2020

(screenshot of the experiment attached in the original issue)

Note: This is 1-2 weeks old. Some issues may no longer be relevant.

General feedback:

After trying out the little experiment above, here are some issues I ran across:

  1. Right off the bat, vscode-languageserver wasn't exporting some types I needed from the main package. I had to import directly from vscode-languageserver/lib/sematicTokens.proposed (and vscode-languageserver/lib/progress if I had implemented that part). Not sure if that was intentional or not, but either way there's going to be a conflict between the two interfaces named SemanticTokens once the other moves out of the Proposed namespace.
  2. The semantic-tokens-sample extension could've been more helpful. Specifically it could have:
    • Mentioned that editor.semanticHighlighting.enable was disabled by default. That was a nice facepalm moment.
    • Used package.json styling, which I only accidentally came across. It's only linked to as a "background information" link there. Perhaps the first post here could include a link to the wiki page as well?
  3. The semanticTokenModifiers contribution has unexpected behavior. For a while, my server was returning tokens with several modifiers that were not defined in that contribution. Rather than ignoring the modifier and possibly reverting to the TextMate style, it used the style of the last modifier listed there. Basically, in the image above, the first three parameters were the same color as $p7 and it was annoying trying to figure out why it was being matched even though the inspector showed the correct type and modifier.
  4. For the semanticTokenStyleDefaults contribution:
    • The description for the fontStyle property should probably mention that multiple styles should be delimited by spaces.
    • The description for the selector property could use clarification (what to put there, how are they "selected"). Since I was guessing at the time, I initially thought it was for TextMate selectors.

API thoughts:

  • I don't get why we're using [line, char, length, ...] that then conditionally switches to [unused, delta, length, ...] (at least for tokens on the same line) instead of a [delta, length, ...] representation everywhere (where the first token is relative to the start of the document). That should save memory and reduce the amount of data to transfer right?
  • Speaking of deltas, since they are relative to the end of the previous token, there can't be any overlapping ranges. I don't see this being an issue with everyday languages, but are we sure that it won't bite host languages that also embed other languages (especially languages that provide a foreign function interface and put said code into strings)?
  • "A token cannot be multiline." This presents two issues:
    1. For the vast majority of languages, lines are irrelevant; that is, they do not have weird one-line-only tokenization strategies like TM grammars. It just makes sense to let each one determine the kind and length of each token rather than force every language to create workarounds.
    2. Should this be a requirement however, you will have to provide a better definition of a line. Lines are ultimately a UI-level construct, so not all languages will have the same definition. For example, should the language use the Unicode definition of a line terminator (VT,FF,CR,LF,CRLF,LS,PS), or a smaller subset (CR,LF,CRLF,LS,PS), or how about an even smaller subset (LF,CRLF).

@alexdima
Member Author

@mattacosta Thank you for your feedback. Ping @dbaeumer for the language server and @aeschli for the token types / styling specific feedback.

Regarding the API specific feedback:

  • this does not switch to [unused, delta, length]. The format is: [deltaLine, deltaChar, length, tokenType, tokenModifiers]. So if a semantic token starts on row 10, character 5, and the next one starts on row 15, character 3, the second one will use [5, 3, ...]. But I agree the current format is not 100% optimal, given that 32 bits are reserved for deltaLine and another 32 bits are reserved for deltaChar, and those bits will mostly contain 0s. I considered using a variable-byte encoding strategy, kind of like UTF-8 but for integers with a continuation bit, but I thought that might be too complex for the extra gains. For a file with 100,000 tokens, the cost today is 500,000 32-bit integers, i.e. 2 MB. This could perhaps be reduced by an additional 40-50% using variable-byte encoding of numbers, but I believe it is good enough.
  • the choice to have tokens not overlap is intentional. Overlapping tokens would have been a pain to work with: binary search would have been difficult, adjusting them after an edit would have been more difficult, and in general the delta encoding strategy might not have worked too well. Today, the editor ends up storing the tokens in almost the same format they are provided in (it just reduces token type + token modifiers to a single 32-bit integer which has the styling applied). The continuous non-overlapping ranges are therefore kept in a simple array and can be adjusted relatively straightforwardly. We have explored semantic tokens for HTML, and at least there we didn't hit any problem with overlapping ones, because HTML stops where JS or CSS begin, etc.
  • the multiline decision has some backstory and is VS Code-specific. For semantic tokens, an (offset;length) kind of representation would have been more natural, because then (deltaOffset;length) for subsequent lines can be more easily explained as the gap between the previous token and the current one in character count. But we decided to be consistent with the existing VS Code APIs, which always use a (line;char) coordinate system. We then would have had 4 integers to represent a token position, like (startLine,startChar,endLine,endChar). We felt that this is wasteful and considered that in the vast majority of languages, semantic tokens (not syntactic tokens) are very likely to be single-line. This also worked well with the fact that the editor already has quite a bit of code that can deal with single-line tokens and adjust them after edits, etc. (the tokens coming in from the TM grammar). Also, it's not that multiline tokens cannot be expressed; they just cost one token per line, so a multiline token spanning 3 lines needs 3 tokens to be expressed. So we added the extra limitation that a semantic token can only be on a single line.
  • the line endings problem is a general problem for any VS Code API, i.e. for all the existing ones as well. All the APIs talk about (line;col) positions. VS Code uses \r\n, \n and \r as line endings.

Thanks again for the great feedback!

@dbaeumer
Member

@mattacosta

Regarding:

Right off the bat, vscode-languageserver wasn't exporting some types I needed from the main package. I had to import directly from vscode-languageserver/lib/sematicTokens.proposed (and vscode-languageserver/lib/progress if I had implemented that part). Not sure if that was intentional or not, but either way there's going to be a conflict between the two interfaces named SemanticTokens once the other moves out of the Proposed namespace.

Which type did you miss from semantic tokens? If it is the builder, it is exposed via ProposedFeatures.SemanticTokensBuilder.

I will export WorkDoneProgress so that you don't need to get it from the progress file.

@rcjsuen
Contributor

rcjsuen commented Feb 18, 2020

@dbaeumer Perhaps @mattacosta is referring to this.

(screenshot attached in the original issue)

I ended up doing this instead to move on.

import { SemanticTokens } from 'vscode-languageserver-protocol/lib/protocol.sematicTokens.proposed';

@dbaeumer
Member

Is Proposed.SemanticTokens not working for you? See https://github.com/microsoft/vscode-languageserver-node/blob/master/protocol/src/main.ts#L77

@rcjsuen
Contributor

rcjsuen commented Feb 18, 2020

@dbaeumer Yeah, I saw that. But no, it doesn't.

(screenshot attached in the original issue)

I mean, based on the syntax highlighting I'm not sure we should be doing Proposed.SemanticTokens in the first place. :P

[10:55:08 AM] File change detected. Starting incremental compilation...

src/main.ts:10:18 - error TS1005: ',' expected.

10 import { Proposed.SemanticTokens } from 'vscode-languageserver-protocol';
                    ~

[10:55:09 AM] Found 1 error. Watching for file changes.

I don't really understand TypeScript namespaces and modules well though so... 🤷‍♂

@mattacosta
Contributor

mattacosta commented Feb 18, 2020

@dbaeumer No, not the builder. I needed the interface exposing the handler methods.

export interface SemanticTokens {
    semanticTokens: {
        on(handler: ServerRequestHandler<Proposed.SemanticTokensParams, Proposed.SemanticTokens, Proposed.SemanticTokensPartialResult, void>): void;
        onEdits(handler: ServerRequestHandler<Proposed.SemanticTokensEditsParams, Proposed.SemanticTokensEdits | Proposed.SemanticTokens, Proposed.SemanticTokensEditsPartialResult | Proposed.SemanticTokensEditsPartialResult, void>): void;
        onRange(handler: ServerRequestHandler<Proposed.SemanticTokensRangeParams, Proposed.SemanticTokens, Proposed.SemanticTokensPartialResult, void>): void;
    };
}

The Proposed.SemanticTokens type does work for me, but it resolves to:

export interface SemanticTokens {
    resultId?: string;
    data: number[];
}

The former is required because the connection only has the semanticTokens property while its type is still being inferred. As soon as the connection is passed elsewhere or saved in a property, that inference is lost.

let connection = createConnection(ProposedFeatures.all, ...);
connection.language.semanticTokens.on(...);  // Still inferred as `Language & SemanticTokens`

let provider = new SemanticProvider(connection);

class SemanticProvider {
  // TypeScript needs an explicit intersection type for `connection.language` either here in
  // the parameter list for the generic version or later in a type cast.
  constructor(connection: Connection) {
    // connection.language.semanticTokens.on(...);
    //                     ~~~~~~~~~~~~~~   does not exist

    (connection.language as Language & SemanticTokens).semanticTokens.on(...);
  }
}

@mattacosta
Contributor

@rcjsuen So close! You're supposed to just import Proposed. 😄

import { Proposed } from 'vscode-languageserver';

function handler(...): Proposed.SemanticTokens {
  // do stuff
}

@dbaeumer
Member

dbaeumer commented Feb 19, 2020

@mattacosta I see the problem, but I would like to propose a different approach (which I use in these cases). You can always capture an inferred type in TypeScript using the typeof operator. So you can do the following:

type ConnectionType = typeof connection;
class SemanticProvider {
  constructor(connection: ConnectionType) {
  }
}

or if you don't want to capture it you can inline it as well

class SemanticProvider {
  constructor(conn: typeof connection) {
  }
}

Would that work for you?

@mattacosta
Contributor

I don't believe so. My connection isn't accessible like that in reality. It's more like:

abstract class Server {
  protected readonly connection: typeof ???;  // no value to use here
  constructor() {
    this.connection = this.createConnection();
    // setup logging, init handlers, etc
  }
  protected abstract createConnection(): ???;
  // ...other methods that use the connection property...
}

class SemanticServer extends Server {
  constructor() {
    super();
    this.semanticTokenProvider = new SemanticTokenProvider(this.connection);
  }
}

class SyntaxServer extends Server {
  // took a page from typescript; folding, selectionrange, etc
}
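
(A sketch of one way around the missing value, assuming vscode-languageserver's createConnection and ProposedFeatures; the makeConnection helper is illustrative. The idea is to capture the inferred connection type from a small non-generic factory function with ReturnType:)

import { createConnection, ProposedFeatures } from 'vscode-languageserver';

// A non-generic factory gives us a concrete inferred return type that we can
// name with ReturnType, even before any connection value exists.
function makeConnection() {
  return createConnection(ProposedFeatures.all);
}
type FeaturedConnection = ReturnType<typeof makeConnection>;

abstract class Server {
  protected readonly connection: FeaturedConnection;
  constructor() {
    this.connection = this.createConnection();
    // setup logging, init handlers, etc.
  }
  protected abstract createConnection(): FeaturedConnection;
}

(Note this pins every subclass to a single connection type, so it may not fit a setup where SemanticServer and SyntaxServer need differently-featured connections.)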

@simolus3

simolus3 commented Mar 7, 2020

I'm trying to integrate semantic tokens in the Dart extension. Overall it works pretty well, but one thing I noticed is that nested regions don't appear to work. In Dart, this comes up quite often with interpolation in string literals or in documentation comments. Here the entire string literal would be a string token, with other tokens appearing in between.
For now, we're splitting tokens to remove overlapping regions. But since I couldn't find anything on nested regions in the documentation, I'm wondering if this is intentional or a restriction that could be loosened.

@kjeremy

kjeremy commented Mar 7, 2020

@simolus3 we have the same issue in rust-analyzer here: rust-lang/rust-analyzer#3447

@mattacosta
Contributor

@simolus3 This was alexdima's response to my comment a few weeks ago on this:

the choice to have tokens not overlap is intentional. Overlapping tokens would have been a pain to work with: binary search would have been difficult, adjusting them after an edit would have been more difficult, and in general the delta encoding strategy might not have worked too well. Today, the editor ends up storing the tokens in almost the same format they are provided in (it just reduces token type + token modifiers to a single 32-bit integer which has the styling applied). The continuous non-overlapping ranges are therefore kept in a simple array and can be adjusted relatively straightforwardly. We have explored semantic tokens for HTML, and at least there we didn't hit any problem with overlapping ones, because HTML stops where JS or CSS begin, etc.

Also, for both interpolated strings and documentation comments, those can (and really should) be done on the TM grammar side of things. I see no reason to change the semantic highlighting API for them unless there's something really weird about Dart/Rust.

@ryanbrandenburg

@alexdima, I see that this issue is in a "March 2020" milestone. Should we take that to mean that it's targeted at being available outside of the proposedAPI system by the end of March? We're very interested in using this feature in our Razor extension, but understand that

They are subject to change, only available in Insiders distribution and cannot be used in published extensions

@JoeRobich
Member

JoeRobich commented Mar 16, 2020

@alexdima, I experimented with adding semantic token support to the C# extension and am very happy with the results. One thing I would like to bring up is that in order to get the full Visual Studio experience, the Dark+ and Light+ themes need some tweaks. I have added the tweaks I've made to my PR - dotnet/vscode-csharp#3667. Would the team be open to incorporating these changes into the released themes?

I also noticed that the keyword token is using the 'keyword.control' scope. Perhaps it would be worth having both in the default tokens since the distinction is such a big part of the VS Code aesthetic.

@dbaeumer
Member

@aeschli can you comment on #86415 (comment)

@aeschli
Contributor

aeschli commented Mar 16, 2020

@JoeRobich Please file a separate issue with your theme change suggestions so we can go through them together. We are very careful not to make changes to the default themes, but let's discuss this in the new issue.

@JoeRobich
Member

@aeschli I opened #92965 to track the theme change request

@alexdima
Member Author

A heads-up: if anyone was using SemanticTokensBuilder, there was a breaking change in the return type of build():

	/**
	 * A semantic tokens builder can help with creating a `SemanticTokens` instance
	 * which contains delta encoded semantic tokens.
	 */
	export class SemanticTokensBuilder {

		constructor(legend?: SemanticTokensLegend);

		/**
		 * Add another token.
		 *
		 * @param line The token start line number (absolute value).
		 * @param char The token start character (absolute value).
		 * @param length The token length in characters.
		 * @param tokenType The encoded token type.
		 * @param tokenModifiers The encoded token modifiers.
		 */
		push(line: number, char: number, length: number, tokenType: number, tokenModifiers: number): void;

		/**
		 * Add another token. Use only when providing a legend.
		 *
		 * @param range The range of the token. Must be single-line.
		 * @param tokenType The token type.
		 * @param tokenModifiers The token modifiers.
		 */
		push(range: Range, tokenType: string, tokenModifiers?: string[]): void;

		/**
		 * Finish and create a `SemanticTokens` instance.
		 */
		build(resultId?: string): SemanticTokens;
	}
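
(For illustration, here is a minimal provider using the builder's legend-based overload. This is a sketch against the API surface above; the hard-coded token, the legend contents, and the 'plaintext' selector are just examples:)

import * as vscode from 'vscode';

const legend = new vscode.SemanticTokensLegend(
  ['property', 'type', 'class'],
  ['private', 'static']
);

const provider: vscode.DocumentSemanticTokensProvider = {
  provideDocumentSemanticTokens(document) {
    const builder = new vscode.SemanticTokensBuilder(legend);
    // The first token of the earlier example: line 2, chars 5-8,
    // type 'property', modifiers ['private', 'static'].
    builder.push(new vscode.Range(2, 5, 2, 8), 'property', ['private', 'static']);
    return builder.build();
  },
};

vscode.languages.registerDocumentSemanticTokensProvider(
  { language: 'plaintext' },
  provider,
  legend
);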

@ghost

ghost commented Apr 25, 2020

Looking at https://github.com/microsoft/vscode-languageserver-node/blob/e5d7ad881b1a51b486e3f0e4aa0fbc25dad2be58/protocol/src/protocol.semanticTokens.proposed.ts#L18, I wonder: doesn't this lack an import token type? Most languages have a way to import, require, or include a module, and I think many editors would want to color these import statements differently from any arbitrary unknown keyword. Also, what is a language server supposed to return for modifiers that aren't known to the client? Shouldn't there be some sort of misc modifier type, akin to the generic keyword token? That seems like a common scenario to run into.

@alexdima
Member Author

@etc0de I think it's best to open a new issue.

@ghost

ghost commented Apr 27, 2020

@alexdima OK, I made one here: microsoft/language-server-protocol#968
