Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Please keep the generated parser files in the repository #120

Closed
Eli-Zaretskii opened this issue Mar 18, 2023 · 21 comments · Fixed by #124
Closed

Please keep the generated parser files in the repository #120

Eli-Zaretskii opened this issue Mar 18, 2023 · 21 comments · Fixed by #124

Comments

@Eli-Zaretskii
Copy link

The upcoming Emacs 29 can be optionally compiled with the tree-sitter library, and comes with a user command that assists users in building and installing the grammar libraries they need. However, that command assumes the C/C++ parser and scanner source files are available in the corresponding repositories, so as to avoid requiring Emacs users to have too many development tools installed.

The recent removal of the generated parser files from the repository will cause that Emacs command to fail; users will be unable to build and install the tree-sitter grammar for SQL using your repository, unless they install quite a few more tools and build the library manually, something that is a significant obstacle for many Emacs users.

So please add those generated files back to the repository. The Emacs user community will thank you.

Thanks in advance.

@matthias-Q
Copy link
Collaborator

We moved the generated files to another branch gh-pages can you configure specify a git commit hash (like helix) or a specific branch (like nvim-treesitter) to fetch the source artifacts?

@Eli-Zaretskii
Copy link
Author

can you configure specify a git commit hash (like helix) or a specific branch (like nvim-treesitter) to fetch the source artifacts?

The Emacs command in question is not pre-configured. It prompts the user for the URL of the repository and uses its default branch. Asking users to specify a special branch, let alone a commit SHA, would be a major inconvenience for users. I suspect most of them will not have the information available to do so.

So this is technically possible, but not really practical, from the Emacs users POV.

@matthias-Q
Copy link
Collaborator

Ok, I understand. I was checking out the emacs code and I saw the option on one subfunction to specify a revision.

The reason we moved the artifacts into another branch was because these are autogenerated files and they caused merge conflicts all the time. Hence, slowing down development.

@DerekStride whats your take on that?

@Eli-Zaretskii
Copy link
Author

The command aI was talking about is treesit-install-language-grammar, you can find it in the file treesit.el in the Emacs repository.

Thank you for reconsidering this issue.

@igniscyan
Copy link

Is this also an issue in neovim? my highlighting looks very off and was going to make an issue, unsure if I need the grammar files for this to work if I'm using TSInstall

@matthias-Q
Copy link
Collaborator

Fetching the source artifacts is not an issue in neovim.

Highlighting is done in nvim-treesitter. May I ask what your issue is @igniscyan ? I have recently commited an overhaul of the highlight queries in nvim-treesitter. So there have been some changes and I definitely looks different now.

@matthias-Q
Copy link
Collaborator

@Eli-Zaretskii this function references a treesit-language-source-alist Looks like a revision can be specified there. I was not able to find this in the repo.

@igniscyan
Copy link

igniscyan commented Mar 19, 2023

Fetching the source artifacts is not an issue in neovim.

Highlighting is done in nvim-treesitter. May I ask what your issue is @igniscyan ? I have recently commited an overhaul of the highlight queries in nvim-treesitter. So there have been some changes and I definitely looks different now.

Sure! I just switched to nvim so not 100% sure if there's something going wrong on my end, but healthcheck makes it look as though everything is fine.

image

The above is a screen of the highlighting in VSCode, which looks correct.

image

And this is the same block being highlighted with nvim-treesitter.

Edit: for context, this is just a large insert statement.

@matthias-Q
Copy link
Collaborator

@igniscyan I have recreated two lines on my machine and the MG on 400MG is parsed as a string and the percentage on '5%' as well.

Maybe you have some other plugin the overrides highlight groups or does injection of units.

Screenshot_20230319_111040

@igniscyan
Copy link

@igniscyan I have recreated two lines on my machine and the MG on 400MG is parsed as a string and the percentage on '5%' as well.

Maybe you have some other plugin the overrides highlight groups or does injection of units.

Screenshot_20230319_111040

Noticing I have a lot of errors towards the beginning of the file as well:

image

image

Any idea what's causing this/how to go about debugging this?

@matthias-Q
Copy link
Collaborator

It is getting off topic now. The issue is that we are currently not parsing the CURRENT_TIMESTAMP function. I will create an issue and see what I can do.

@Eli-Zaretskii
Copy link
Author

@Eli-Zaretskii this function references a treesit-language-source-alist Looks like a revision can be specified there.

The list in treesit-language-source-alist starts empty and is populated as the user installs the grammars he/she needs, as side effect of the installation.

And even if that list was populated OOTB, it is impractical to expect Emacs to track all the changes in the individual repositories. E.g., tomorrow you could decide that the branch which holds the generated parser files will be called differently, and the Emacs users will be non the wiser. By contrast, the default branch always exists and is always kept in good shape.

@matthias-Q
Copy link
Collaborator

This is actually exactly how neovim and helix are tracking different parsers (by commit hash). By just looking at the default branch you are also not safe with regards to breaking changes.

@Eli-Zaretskii
Copy link
Author

This is actually exactly how neovim and helix are tracking different parsers (by commit hash).

I'm not familiar with how and at what regularity is neovim released, but with Emacs release schedule of roughly 2 years between major releases and 6 months between minor bug-fix releases, it is practically impossible to keep the list up-to-date if it must track individual commits. We don't populate the list of the repositories of the grammar libraries with specific URLs for that very reason: we don't want to have to keep the list up-to-date at all times, and with our release schedule we won't be able to anyway.

By just looking at the default branch you are also not safe with regards to breaking changes.

We would ideally try to use official releases, which would be a much more dependable and stable target, but unfortunately many tree-sitter grammar libraries don't make official releases, and that includes tree-sitter-sql, AFAICT.

@matthias-Q
Copy link
Collaborator

matthias-Q commented Mar 19, 2023

The grammars are not part of neovim, just the capabilities of using tree-sitter as a parser. the grammars themself are added by a plugin called nvim-treesitter and this uses a lockfile.
For helix, the lockfile is part of the repository and the grammar version are then used while building the binary.

@Eli-Zaretskii
Copy link
Author

With Emacs, the grammars aren't part of the the package, either. Users are supposed to install the grammar libraries themselves, using the commands, tools, and directories appropriate to their platforms. The difference, AFAIU, is that the Emacs command that downloads, builds, and installs a grammar library is not a separate plugin, and is therefore released and updated together with Emacs.

@matthias-Q
Copy link
Collaborator

Yes, that is exactly how nvim-treesitter and helix are loading the grammars.

@mishazharov
Copy link

My two cents: I switched to this treesitter grammar because https://github.com/m-novikov/tree-sitter-sql removed the autogenerated parser.c from their repo. The gh-pages branch might be sufficient to prevent breakage but I haven't tested it yet

@DerekStride
Copy link
Owner

it is impractical to expect Emacs to track all the changes in the individual repositories. E.g., tomorrow you could decide that the branch which holds the generated parser files will be called differently, and the Emacs users will be non the wiser.

It prompts the user for the URL of the repository and uses its default branch. Asking users to specify a special branch, let alone a commit SHA, would be a major inconvenience for users. I suspect most of them will not have the information available to do so.

It looks like the branch / sha can be set interactively via: treesit--install-language-grammar-build-recipe. Emacs seems to have the ability to ask for the branch to use.

IMO the problem here isn't that emacs can't use a different branch it's that there's no good solution for emac users to know the installation instructions for each grammar they use and know when those change. Helix solves this by controlling the list themselves and neovim solves it via a community maintained plugin.

There could be space for an Emacs community plugin that maintains a list of parser installation instructions. This could be a predefined treesit-language-source-alist that can be downloaded. I'd be happy to help keep the sql parser entry up-to-date, @matthias-Q & I already open PRs to keep the nvim-treesitter lockfile up-to-date.

I'd also be open to including some installation instructions in the README.md or on a project wiki with notes on how to install it for emacs.

However, I don't think enforcing a project layout for tree-sitter grammars is a good solution, there are many different grammars with different maintainers and use cases. For example, this grammar for markdown MDeiml/tree-sitter-markdown defines two complementary parsers that are meant to be used together. I'd also like to note that this subdirectory pattern is also supported by treesit--install-language-grammar-build-recipe.

While I agree that it is inconvenient for parsers to diverge from the "standard" project layout, we do provide an alternative that's compatible with the existing tools provided by emacs.

@Eli-Zaretskii
Copy link
Author

IMO the problem here isn't that emacs can't use a different branch it's that there's no good solution for emac users to know the installation instructions for each grammar they use and know when those change.

Exactly. This Emacs command is supposed to help users who know little about Git repositories, let alone about the branches of each one of them.

@DerekStride
Copy link
Owner

Exactly. This Emacs command is supposed to help users who know little about Git repositories, let alone about the branches of each one of them.

It seems we agree on the problem and I've provided sufficient reasoning that the problem lies outside of the structure of this project and instead is a result of a gap in the emacs ecosystem.

I've opened a PR #124 to update the README with installation instructions on how to compile the parser, and where to find & download the parser files. I'd appreciate feedback on the PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants