Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Wrong output for ksc-json-output in case of relative paths under Windows. #507

Closed
ams-tschoening opened this issue Apr 3, 2017 · 11 comments

Comments

@ams-tschoening
Copy link

I upgraded ksv to get support for opaque types and with it came a changed interaction with ksc, the new ksc-json-output is used now. This leads to compile time errors in ksv for me, because your generated JSON doesn't seem to handle paths and their different representation in Java under different OS correctly.

C:/[...]/visualizer_main.rb:68:in block (2 levels) in compile_formats': undefined method []' for nil:NilClass (NoMethodError)
from C:/Program Files (x86)/kaitai_struct/visualizer/lib/kaitai/struct/visualizer/visualizer_main.rb:64:in each' from C:/Program Files (x86)/kaitai_struct/visualizer/lib/kaitai/struct/visualizer/visualizer_main.rb:64:in each_with_index'
from C:/Program Files (x86)/kaitai_struct/visualizer/lib/kaitai/struct/visualizer/visualizer_main.rb:64:in `block in compile_formats'

      fns.each_with_index { |fn, idx|
        puts "... processing #{fn} #{idx}"

        log_fn = log[fn]
        if log_fn['errors']
{"record\fmt_clt_recs.ksy": {"firstSpecName": "fmt_clt_recs","output": {"ruby": {"fmt_clt_recs": {"topLevelName": "FmtCltRecs","files": [{"fileName": "fmt_clt_recs.rb"}]},"fmt_oms_rec": {"topLevelName": "FmtOmsRec","files": [{"fileName": "fmt_oms_rec.rb"}]}}}},"record\fmt_oms_rec.ksy": {"firstSpecName": "fmt_oms_rec","output": {"ruby": {"fmt_oms_rec": {"topLevelName": "FmtOmsRec","files": [{"fileName": "fmt_oms_rec.rb"}]}}}}}

There are two problems here:

  1. / to \

I use a command line similar to the following in Eclipse to start ksv:

[...]ksv "${workspace_loc}/[...]/data.bin" "record/fmt_clt_recs.ksy" "record/fmt_oms_rec.ksy"

With the current ksc this leads to \ instead of / in the generated JSON object, because Java internally uses a platform dependent path separator most of the times. The problem is that ksv sees / itself and can't find its own paths in the JSON anymore. I used / in favour of \ simply because I had trouble using \ in the past and / seems to be more common in Ruby, this would work on non-Windows etc.

  1. Improper escaping of \

\ needs to be escaped in JSON, that's why changing / to \ in my invocation above doesn't fix the problem: JSON contains \f in the end instead of \\f which is what Ruby internally would have as the String key.

In my opinion, the best thing to do is simply to not generate \ in JSON at all, so that only one normalized path separator is used across platforms and languages:

  // FIXME: do proper string handling
  def stringToJson(str: String): String =
    "\"%s\"".format(str.replaceAll("\\\\", "/"))

This is somewhat hacky as well of course, but fixes the escape problem and makes interacting with ksv easier.

@GreyCat
Copy link
Member

GreyCat commented Apr 3, 2017

Thanks for reporting this! I've applied slightly different fix that reuses string output from BaseTranslator, so it should escape quotes, backslashes, etc, properly.

Probably it should be enough for Windows as well. As far as I understand, Ruby on Windows would do fine with backslashes as well. Can't check it, though, my Windows VM are not available at the moment. Could you check if this fix would work for you?

@ams-tschoening
Copy link
Author

If I use \ in my relative paths, your change works, with / it doesn't. The problem is that the mentioned invocation with / works on all platforms and is part of some versioned Eclipse launch configuration. \ will break that configuration for my colleagues using Linux.

Don't you think normalizing paths would be worth another hack? JSON itself is OS-independent, so providing such paths as well seems the right thing to do. We do that in some of our software as well for different runtimes and OS like Perl, which automatically convert / to \ sometimes.

@GreyCat
Copy link
Member

GreyCat commented Apr 4, 2017

Ok, understood, I'll add normalization to / in file names then.

@GreyCat
Copy link
Member

GreyCat commented Apr 7, 2017

Should be fixed now: kaitai-io/kaitai_struct_compiler@2aef0de

@ams-tschoening
Copy link
Author

ams-tschoening commented Apr 10, 2017

Sorry, but I don't think so. Just tested with a current master and the error using / in my relative paths is still the same. Looking at your commit you seemed to have changes some error messages only? What I was talking about was the following line instead.

def stringToJson(str: String): String =
  translator.doStringLiteral(str)

kaitai-io/kaitai_struct_compiler@c94b650#diff-cd926bd86535fd1e7062bc1372c1ec7bR34

That line is providing either / or \ to ksv depending on the current platform.

@GreyCat
Copy link
Member

GreyCat commented Apr 10, 2017

Could you provide an (preferably minimal) example of what is misbehaving? I'll try to find a Windows box to test shortly.

Current stringToJson method seems to do what it is supposed to do: it provides a JSON representation of an arbitrary string. There is absolutely no reason to do any string mangling in such generic method. If we need to replace backslashes in file paths somewhere, then we need to track that particular location and do replacement there.

@ams-tschoening
Copy link
Author

ams-tschoening commented Apr 10, 2017

Could you provide an (preferably minimal) example of what is misbehaving?

It's not necessarily a misbehaviour, but a design decision. Assume the tests-repo as current working dir, which of the following both invocations should work on Windows and why?

ruby "-I[...]/kaitai_struct/runtime/ruby/lib" "[...]/kaitai_struct/visualizer/bin/ksv" "src/bcd_user_type_be.bin" "formats/bcd_user_type_be.ksy"

ruby "-I[...]/kaitai_struct/runtime/ruby/lib" "[...]/kaitai_struct/visualizer/bin/ksv" "src/bcd_user_type_be.bin" "formats\bcd_user_type_be.ksy"

Currently, the first invocation doesn't work because of formats/, only the second one works. But that second invocation doesn't work on non-Windows.

So the question is, is it worth to add some workaround to JSON output to make the first invocation work on Windows as well? That way the first invocation could be shared amongst platforms easily. E.g. because that invocation is part of some Eclipse launch configuration or some script or whatever.

The reason why the first invocation doesn't work is that ksc provides JSON with \ on Windows always, because Java stringifies paths that way internally, while ksv simply uses what is given on the shell as text. So on Windows one needs to provide \ on the shell, because that is what ends up in JSON, which doesn't work on non-Windows anymore, where Java internally stringifies to /.

If you disagree or are in doubt, just leave it as it is. It's not a huge thing and the most important fix was that of the formal JSON syntax.

I have no other easier examples, my dir layout and such is completely different from yours. But things break down to the both provided invocations, assuming that ksc is properly available in the environment and such.

@GreyCat GreyCat transferred this issue from kaitai-io/kaitai_struct_compiler Jan 13, 2019
@generalmimon
Copy link
Member

generalmimon commented Aug 15, 2023

Although the main problem of this issue (invalid string literals like "path\to\format.ksy" in the JSON output) has been resolved in KSC 0.8, there is still the compiler's behavior of converting input paths with / to \ in the --ksc-json-output report on Windows to discuss. As of KSC 0.10, the compiler still converts the path to backslashes even if the input .ksy path uses forward slashes (notice the second test, where the compiler used ".\\hello_world.ksy" as JSON key even though ./hello_world.ksy was given as the input path):

C:\temp\ksc-json-output-tests>kaitai-struct-compiler --ksc-json-output -t ruby -d test-a1 hello_world.ksy
{"hello_world.ksy": {"firstSpecName": "hello_world","output": {"ruby": {"hello_world": {"topLevelName": "HelloWorld","files": [{"fileName": "hello_world.rb"}]}}}}}

C:\temp\ksc-json-output-tests>kaitai-struct-compiler --ksc-json-output -t ruby -d test-a2 ./hello_world.ksy
{".\\hello_world.ksy": {"firstSpecName": "hello_world","output": {"ruby": {"hello_world": {"topLevelName": "HelloWorld","files": [{"fileName": "hello_world.rb"}]}}}}}

C:\temp\ksc-json-output-tests>kaitai-struct-compiler --ksc-json-output -t ruby -d test-a3 .\hello_world.ksy
{".\\hello_world.ksy": {"firstSpecName": "hello_world","output": {"ruby": {"hello_world": {"topLevelName": "HelloWorld","files": [{"fileName": "hello_world.rb"}]}}}}}

Personally, I see this as somewhat unfortunate behavior and don't see any benefit in it. I think reflecting the original .ksy path exactly as passed on the command line to kaitai-struct-compiler (i.e. without any slash conversions or anything else) in the JSON output is the least surprising behavior that would make KSC very easy to use: the naive way of using the input .ksy path directly as a key to the JSON object simply always works regardless of what slashes were used, what platform we're on, etc.

@ams-tschoening:

I use a command line similar to the following in Eclipse to start ksv:

[...]ksv "${workspace_loc}/[...]/data.bin" "record/fmt_clt_recs.ksy" "record/fmt_oms_rec.ksy"

With the current ksc this leads to \ instead of / in the generated JSON object, because Java internally uses a platform dependent path separator most of the times. The problem is that ksv sees / itself and can't find its own paths in the JSON anymore. I used / in favour of \ simply because I had trouble using \ in the past and / seems to be more common in Ruby, this would work on non-Windows etc.

I reported this problem (without knowing about this issue) in kaitai-io/kaitai_struct_visualizer#52 and worked around it in ksv in kaitai-io/kaitai_struct_visualizer@d9a3dbd. Retrying the path with all forward slashes / replaced to backslashes \ seemed to work in simple cases I happened to test, but admittedly, it was a hack without properly investigating the cause of the problem. The simplest proper fix I see would be to just change the compiler to use the input path strings exactly as JSON keys, so that ksv (and possibly other third-party applications making use of --ksc-json-output, if they exist) can reliably just look them up in the JSON object 100% of the time.

@generalmimon
Copy link
Member

generalmimon commented Aug 15, 2023

Apparently, the reason why the compiler converts the input paths is because it initially reads the given command-line .ksy path arguments to java.io.File objects (see JavaMain.scala:38) and later uses their toString() method to convert the File objects back to path strings (JavaMain.scala:279). However, toString() uses the getPath() method:

public String toString()

Returns the pathname string of this abstract pathname. This is just the string returned by the getPath() method.

and getPath() is specified to use the system-dependent separator character:

public String getPath()

Converts this abstract pathname into a pathname string. The resulting string uses the default name-separator character to separate the names in the name sequence.

... which is specified as follows (separatorChar):

public static final char separatorChar

The system-dependent default name-separator character. This field is initialized to contain the first character of the value of the system property file.separator. On UNIX systems the value of this field is '/'; on Microsoft Windows systems it is '\\'.

Also, according to my tests, it collapses multiple consecutive slashes into one (this is in WSL 2, happens on Windows as well):

$ kaitai-struct-compiler -- --ksc-json-output -t ruby -d test-wsl .///hello_world.ksy
{"./hello_world.ksy": {"firstSpecName": "hello_world","output": {"ruby": {"hello_world": {"topLevelName": "HelloWorld","files": [{"fileName": "hello_world.rb"}]}}}}}

Which further proves that merely retrying the path with / replaced by \ as the JSON key that I added in kaitai-io/kaitai_struct_visualizer@d9a3dbd was a band-aid solution, not a proper fix.

@generalmimon
Copy link
Member

generalmimon commented Aug 17, 2023

@ams-tschoening:

Don't you think normalizing paths would be worth another hack? JSON itself is OS-independent, so providing such paths as well seems the right thing to do. We do that in some of our software as well for different runtimes and OS like Perl, which automatically convert / to \ sometimes.

@GreyCat:

Ok, understood, I'll add normalization to / in file names then.

Should be fixed now: kaitai-io/kaitai_struct_compiler@2aef0de

I don't think this "normalization" was the best decision, either. I don't see any compelling reason why it should be done. Also, I'm not sure what it means to provide OS-independent paths - in general, paths are very much OS-dependent. You can achieve some degree of compatibility of relative paths, but not much more.

And the piece of code responsible for this "normalization" is very naive (ProblemCoords.scala:36-38):

object ProblemCoords {
  def formatFileName(file: Option[String]): String = file match {
    case Some(x) => x.replace('\\', '/')

Unix paths use only forward slashes / to separate names, so nothing stops you from creating a file with name hello\world.ksy (not to be confused with a file world.ksy in subdirectory hello - that would be hello/world.ksy). However, this slash normalization code butchers it so that it's indistinguishable from a world.ksy file in a hello subdir:

$ ls -la
total 0
drwxrwxrwx 1 pp pp 4096 Aug 17 11:11  .
drwxrwxrwx 1 pp pp 4096 Aug 17 11:11  ..
-rwxrwxrwx 1 pp pp   68 Aug 17 11:11 'hello\world.ksy'

$ kaitai-struct-compiler -- --ksc-json-output -t ruby -d outdir ./hello\\world.ksy
{"./hello\\world.ksy": {"errors": [{"file": "./hello/world.ksy","path": ["meta","endian"],"message": "unable to parse endianness: `le`, `be` or calculated endianness map is expected"}]}}
                                                    ^

$ kaitai-struct-compiler -- -t ruby -d outdir ./hello\\world.ksy
./hello/world.ksy: /meta/endian:
       ^
	error: unable to parse endianness: `le`, `be` or calculated endianness map is expected

We could spend time tweaking it so it doesn't cause any problems, but as I said, I consider the slash normalization unnecessary. I think it's best when the compiler treats the paths from the user as opaque strings and merely passes them to the underlying platform and all console/JSON outputs unchanged. If the accepted string matches some supported path format on the particular platform, great; if the path is invalid, the platform will report an error, which the compiler just passes along. In fact, there's no need for the compiler to worry about which path formats are supported on which platform.

generalmimon added a commit to kaitai-io/kaitai_struct_compiler that referenced this issue Aug 18, 2023
See kaitai-io/kaitai_struct#507

Before this commit, the input .ksy paths were normalized as soon as they
were parsed from command line arguments, because they were read to
`java.io.File` objects.
[`java.io.File`](https://docs.oracle.com/en/java/javase/17/docs/api/java.base/java/io/File.html)
objects are subjected to path normalization as soon as they're created,
for example on Windows the forward slashes `/` are converted to
backslashes `\`, a series of consecutive slashes are collapsed into one
on both Unix and Windows systems etc. (See
kaitai-io/kaitai_struct#507 (comment)
for more details.) Consequently, the .ksy file path for which the
compile results are reported may not have matched the original input
path. This was particularly noticeable on Windows, where you could only
use backslashes (not forward slashes) in the .ksy path passed to `ksv`
due to this.

As I explain in
kaitai-io/kaitai_struct#507 (comment),
I believe that treating input paths as opaque strings makes the compiler
easier to operate. Now you can pass a bunch of .ksy paths to the
compiler and expect to find each path string used exactly as a JSON key
in `--ksc-json-output`, without any effects of path normalization
mentioned earlier.
generalmimon added a commit to kaitai-io/kaitai_struct_compiler that referenced this issue Aug 18, 2023
As demonstrated in
kaitai-io/kaitai_struct#507 (comment),
this could in fact change the path incorrectly: on a Unix system, a file
name can contain `\` as a special character, for example
`hello\world.ksy`, but the old code would still "normalize" a Unix path
like `./hello\world.ksy` to `./hello/world.ksy`, which is wrong.

But I don't see why forced path normalization should be done on Windows
either, except for purity reasons (which are unimportant if you ask me).
I don't mind any mix of `\` and `/` slashes in the path coming out on
Windows (which can indeed happen from now on, especially for paths to
imported specs), as long as the resulting path points to the correct
file (which it kind of has to, because it is the same path that the
compiler used to access the file).

This commit reverts 2aef0de.
generalmimon added a commit to kaitai-io/kaitai_struct_visualizer that referenced this issue Aug 23, 2023
As explained in
kaitai-io/kaitai_struct#507 (comment),
retrying the path with all forward slashes `/` replaced with backslashes
`\` was a band-aid fix, not a proper solution. It's better to have the
compiler preserve input .ksy paths exactly in JSON keys so that
applications like `ksv` don't have to do any trial-and-error or complex
path resolution. The compiler has already been changed to do that in
kaitai-io/kaitai_struct_compiler@4c5096b,
so this `ksv` hack can be reverted.

This (partially) reverts commit d9a3dbd.
generalmimon added a commit to kaitai-io/kaitai_struct_compiler that referenced this issue Mar 10, 2024
In 2b84e0a, I reverted a previous
change that was replacing all backslashes `\` to forward slashes `/` in
file paths used in error messages (meaning that backslashes are kept
since then if present in input paths). However, until now I didn't
notice that this broke `ErrorMessagesSpec` tests on Windows. This is
because compile error messages actually include the full path to the
`.ksy` spec, for example:

```
../tests/formats_err/attr_bad_size.ksy: /seq/0/size:
	error: invalid type: expected integer, got CalcStrType
```

However, in `formats_err` specs, only the base name is specified (see
https://github.com/kaitai-io/kaitai_struct_tests/blob/73486eef/formats_err/attr_bad_size.ksy
below):

```
# attr_bad_size.ksy: /seq/0/size:
#       error: invalid type: expected integer, got CalcStrType
#
meta:
  id: attr_bad_size
seq: # ...
```

The `../tests/formats_err/` prefix in the actual message is supposed to
be removed by `.replace(FORMATS_ERR_DIR + "/", "")` in
`ErrorMessagesSpec` (line 53). Note that the string to be removed always
uses forward slashes `/`.

The problem is that on Windows, `f.toString` (which returns a file path
with the backslash `\` as name separator - this is explained at
kaitai-io/kaitai_struct#507 (comment))
is passed to the compiler as the .ksy spec to process. Since the path
normalization (i.e. `\` -> `/` conversion) was removed, error messages
will start like `..\tests\formats_err\attr_bad_size.ksy: ...`. This
means the `.replace(FORMATS_ERR_DIR + "/", "")` call won't do anything,
because it looks for ``../tests/formats_err/`, which is not there.

The fix is to pass forward slash `/` paths to the compiler even on
Windows.
@generalmimon
Copy link
Member

I believe the proposed solution of "treating paths as opaque strings" is now fully implemented in KSC by commits kaitai-io/kaitai_struct_compiler@7071a9d...2b84e0a.

The band-aid commit in ksv trying to add support for forward slash paths on Windows (kaitai-io/kaitai_struct_visualizer@d9a3dbd) was reverted, since it's not needed anymore - KSC now preserves the original path "spelling" in JSON keys of --ksc-json-output, so ksv will get a 100% match simply by indexing the JSON object with the same path string passed that it passed to KSC (regardless of whether the user typed ./hello_world.ksy, .\hello_world.ksy or .///hello_world.ksy).

This issue is therefore resolved.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants