Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Formatting of multi-line / end of line / block strings #10806

Open
maxRN opened this issue Aug 6, 2024 · 9 comments
Open

Formatting of multi-line / end of line / block strings #10806

maxRN opened this issue Aug 6, 2024 · 9 comments

Comments

@maxRN
Copy link
Contributor

maxRN commented Aug 6, 2024

Current Behavior

Running dune fmt turns multi-line strings into normal strings.

Example from cram tests:

In multi-line strings, newlines are escaped, but their syntax is not preserved.

  $ dune format-dune-file <<EOF
  > (echo "\> multi
  >       "\> line
  >       "\> string
  >       "\| string
  > )
  > 
  > (echo "\
  > multi
  > line
  > string
  > ")
  > EOF
  (echo "multi\nline\nstring\nstring\n")
  
  (echo "multi\nline\nstring\n")

Desired Behavior

Keep multi-line strings as is and don't turn them into normal strings when formatting.

In multi-line strings, newlines are escaped, but their syntax is not preserved.

  $ dune format-dune-file <<EOF
  > (echo "\> multi
  >       "\> line
  >       "\> string
  >       "\| string
  > )
  > 
  > (echo "\
  > multi
  > line
  > string
  > ")
  > EOF
  (echo
       "\> multi
       "\> line
       "\> string
       "\| string
  )
  
  (echo "multi\nline\nstring\n")

With the current implementation, there is not really any upside to using multi-line strings since when users call dune fmt they are immediately turned into normal strings again. This makes using them unpractical for maintaining longer text, e.g. project descriptions in dune-project files.

I think this change does not need to be versioned, because there is no downside to keeping the formatting of multi-line strings.

Side note about naming

Currently there are 3 different ways in the dune codebase that refer to multi-line strings:

  1. The docs call them "End of line" strings: https://dune.readthedocs.io/en/stable/reference/lexical-conventions.html#end-of-line-strings
  2. In the lexer they are referred to as Block_string
  3. In the cram test they are called multi-line strings

Personally I would prefer the term multi-line string, but most importantly there should be a single common name for them.

Implementation

I have started exploratory work on implementing this feature in #10780 but the above test (among others) fails and I wasn't sure what the right approach was.

@nojb
Copy link
Collaborator

nojb commented Aug 6, 2024

Hello @maxRN; thanks for trying to contribute a patch for this issue. I suggest you open a PR when you feel ready so that your patch can be reviewed by the dev team.

I wasn't sure what the right approach was.

I am not sure I understand your question; but the standard workflow is to "promote" (ie dune promote) any failing tests once the actual output matches the desired output. This will replace the output on disk which you can then commit.

@nojb
Copy link
Collaborator

nojb commented Aug 6, 2024

I suggest you open a PR when you feel ready

I had missed you had already opened the PR, sorry about that.

@maxRN
Copy link
Contributor Author

maxRN commented Aug 6, 2024

Hi! Sorry I should have been more clear. What I was trying to ask was: If I change the way dune fmt formats something, would that change need to be versioned, i.e. running dune version 3.17 on a 3.16 project formats one way and running in a project with version 3.17 formats in a different way. Like how it is also currently done for deciding whether to format dune files or not.

@nojb
Copy link
Collaborator

nojb commented Aug 6, 2024

If I change the way dune fmt formats something, would that change need to be versioned

I think so; people do not generally like having diffs appear after simply upgrading Dune. If memory serves, dune format-dune-file already supports versioning (via the --dune-version argument).

@mbarbin
Copy link
Contributor

mbarbin commented Sep 10, 2024

Hi, I wanted to add a +1 to this issue. For context, I just tried auto-formatting pp's dune-project with dune, but the format change to the description field felt like too much of a regression, so I haven't pursued.

@mbarbin
Copy link
Contributor

mbarbin commented Sep 10, 2024

I wonder if the "\n\" trick could be considered here: if you render newline by the sequence "\n\ + actual new line" I think you end up with something that doesn't look too bad in the source, and is stable across auto-formatting. As in:

"This is a lenghty\n\
description that spans\n\
multiple lines."

This may not look as nice as what's done in the linked PR, but perhaps this is simpler. Just a thought!

@nojb
Copy link
Collaborator

nojb commented Sep 10, 2024

This may not look as nice as what's done in the linked PR, but perhaps this is simpler. Just a thought!

Personally, I think this is a good idea; am not sure using \ to break long string literals is supported in Dune today (but could be added if it is not).

@maxRN
Copy link
Contributor Author

maxRN commented Sep 10, 2024

I think we should rather "fix" how the existing way of writing multi line strings work instead of introducing new syntax for having line breaks. I also wouldn't like to have to remember to type \n\ + actual newline if I could instead use the block string syntax and just type "enter" and have it work as expected.
Implementation wise it might not even be simpler, because now instead of a string like \> Some text we now have a string like Some text\n\ and in both cases we would need to clean the string before it can be used.

Currently the biggest blocker are block strings that are also templates: Internally they are represented as a list of strings or parts and the strings of the list are accessed directly so they need to be cleaned before use. I would like to introduce a new type for them and disallow the direct access to the strings, but that requires a bit of work.

@nojb
Copy link
Collaborator

nojb commented Sep 10, 2024

Implementation wise it might not even be simpler, because now instead of a string like \> Some text we now have a string like Some text\n\ and in both cases we would need to clean the string before it can be used.

I'm not sure I follow "would need to clean the string before it can be used": the conventional meaning of a backslash followed by a newline (as used in C, OCaml, and several other languages) is to remove the backslash and everything that follows until the next non-whitespace character from the input stream. So upon seeing

"foo\n\
bar"

the parser would return "foo\nbar"; no other "cleaning" is necessary. The pretty printer would then do the inverse transformation: whenever it encounter a newline, it could emit \n\ followed by a newline (unlike just \n like today).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants