Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use split_parser branch for markdown grammar #3108

Merged

Conversation

MDeiml
Copy link
Contributor

@MDeiml MDeiml commented Jul 19, 2022

See #3072.

Only a draft because markdown_inline grammar is not injected properly, but to me it seems like this is a problem with helix and not the language config? After a few experiments I think the grammar is injected, but highlighting is still not done according to the injected language. I'm not very sure though.

Another weird behavior is with fenced code blocks. They seem to work to some degree, but if you enter something like

```rust
let foo = "bar";
```

then let is highlighted correctly while "bar" is not highlighted at all.

@MDeiml
Copy link
Contributor Author

MDeiml commented Jul 21, 2022

Changing this to ready since I found a solution to the problems in #3129.

@MDeiml MDeiml marked this pull request as ready for review July 21, 2022 06:42
@MDeiml MDeiml force-pushed the use_split_markdown_grammar branch from cb4e6db to b7ee7af Compare July 26, 2022 15:17
@the-mikedavis
Copy link
Member

I tried this branch locally but I don't see any qualitative improvements to performance (we'll need some measurements like from #940 to be definitive). Specifically I see the some slowness with incremental updates while editing (for example typing quickly in insert mode) on a reasonably large and complicated markdown document like the changelog. Is this branch of the grammar(s) more correct than master? This doesn't feel like a regression so I'm certainly not against merging it but I'm curious what the improvement is of splitting the parsers?

@MDeiml
Copy link
Contributor Author

MDeiml commented Jul 28, 2022

I'm no longer maintaining the master branch, I just kept it because some projects seem to depend on it.

It was not really possible to implement a correct enough markdown parser in a single grammar. The specs also recommend to parse markdown in 2 passes. Now after splitting it into 2 grammars they work closer to that recommendation.

They're more correct now and with some optimizations to incremental parsing I think they'll also be faster. (But that'll be topic of future PRs).

@MDeiml
Copy link
Contributor Author

MDeiml commented Jul 28, 2022

Specifically I think that tree-sitter reparses all injected ranges if they contain "conflicts" no matter if they changed or not. That means that parsing is linear with the number of injected ranges. I think we could just not reparse ranges if they did not change at all (parsing should be deterministic with regards to the input anyways).

@the-mikedavis
Copy link
Member

I'm not very familiar with the changes but I think archseer might've covered that when implementing combined injections: 6728e44...7c9ebd0

We don't use the stock tree-sitter-cli highlighting code since it isn't incremental

@MDeiml
Copy link
Contributor Author

MDeiml commented Jul 28, 2022

Just looking at the code that doesn't seem to be the case, but I should do some more investigation.

@MDeiml
Copy link
Contributor Author

MDeiml commented Jul 28, 2022

It's related to this TODO:

// TODO: we should be able to avoid editing & parsing layers with ranges earlier in the document before the edit

Just going one step further and also avoid parsing layers that are after the edits

@the-mikedavis the-mikedavis added the S-waiting-on-pr Status: This is waiting on another PR to be merged first label Jul 30, 2022
@the-mikedavis
Copy link
Member

This works well for me locally but let's await further review/merging in #3129

@the-mikedavis the-mikedavis removed the S-waiting-on-pr Status: This is waiting on another PR to be merged first label Aug 3, 2022
@archseer
Copy link
Member

archseer commented Aug 5, 2022

Can you resolve the conflict? Looks good to merge otherwise 👍🏻

languages.toml Outdated Show resolved Hide resolved
@MDeiml MDeiml force-pushed the use_split_markdown_grammar branch 2 times, most recently from 0802082 to 840e2d7 Compare August 6, 2022 14:42
@MDeiml MDeiml force-pushed the use_split_markdown_grammar branch from 840e2d7 to c750fa3 Compare August 6, 2022 15:03
Copy link
Member

@the-mikedavis the-mikedavis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Works great! Thanks for working on this plus going the extra mile to add the new predicate :)

@the-mikedavis the-mikedavis merged commit ea04220 into helix-editor:master Aug 6, 2022
@MDeiml MDeiml deleted the use_split_markdown_grammar branch August 6, 2022 16:33
thomasskk pushed a commit to thomasskk/helix that referenced this pull request Sep 9, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants