-
-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add changes required by Nimble lock file support #12104
Conversation
Seems to do way too much, since these hashes are ignored, why not ignore them in the parsing step and leave the rest of the code as it is? |
What are you intending to use these hashes for? Why do they need to be appended to the package directories? |
@Araq The hashes are used by Nimble itself. If you look at the old Nimble and compiler code before my changes, you will find two nearly identical pieces of code: I just merged them. Nimble already heavily relies on imports of the compiler code. The implementation also supports the old format of the package directories for backward compatibility of the compiler with older Nimble versions. |
@dom96 If you look at the specification of the feature checksum is needed for each package to determine that it is the exactly same version as the required in the lock file in order to achieve reproducible builds. Even for tagged versions Git tags are not enough because they are not immutable. I decided to store the checksum in the package directory, because this way the required by the lock file package can be found very fast without entering in every directory to search for it. Also this way you can have two or more different branch versions
and different projects to depend to different library versions from the same branch. To store the checksum in the directory name is common practice in other package managers. See Nix for example. |
My recommendation to @bobeff was to reduce the reliance on the He has taken the approach here, because the suggested alternative has proven to require more significant changes, but I still think it's worth pursuing. If the question is "Why do we need hashes in the first place?", the answer is that the recent stories of package hijacking in other eco-systems now make this additional security measure a requirement. We must ensure that a dependency in the lockfile can't be silently replaced by a malicious version. The layout of the Nimble cache is just an optimisation that makes it easier to resolve your dependencies more quickly in everyday Nimble operations. |
As a counter-point, I should mention that there are few additional concerns that suggest that the compiler should be more aware of the Lockfiles.
When you have the following diamond dependecies:
The package D must be resolved to a different path when imported from C or from B. A similar concern arises when using different packages sharing the same name. It's possible to solve this problem through adding a slightly more advanced alternative to the |
You'll hit some command line size limits there. Windows command lines seem limited to 2047 characters: https://support.microsoft.com/en-us/help/830473/command-prompt-cmd-exe-command-line-string-limitation |
@stefantalpalaru, that's right. This is usually handled by adding support for command files. In Nim, you can argue that a temp |
If hashes are used in the path, only the first 8 characters should be used from the hash:
|
generally it seems wrong that the compiler knows this much about the package manager - the paths/extended-module-names could easily be passed in (allowing other package mgmt strategies etc) nimsuggest and friends could easily be aware of nimble or lock files instead of burdening the compiler with this knowledge. otherwise, what's the point of splitting package manager and compiler? this split is already a bit half-baked in nim, with the nim compiler doing all kinds of things that are typically left to outside tools (rst template processing to multiple formats, ctags etc etc!!??) a better and more well-defined interface between compiler and package manager would make the compiler much easier to understand - fewer "magic" paths that it looks in, so when something goes wrong, one has a fair chance of understanding what it was. |
Or compiler can ask package manager about modules somehow, e.g. by specific call with arguments. |
asking creates the same kind of two-way information flow between package manager and compiler which represents a similar kind of complexity - each project must be aware of the other which is what one would want to avoid |
|
yes, unfortunately - the compiler code is quite a lot harder to read because of this, with the extra
|
asking creates the same kind of two-way information flow between package
manager and compiler which represents a similar kind of complexity - each
project must be aware of the other which is what one would want to avoid
Not if you have a protocol that other package managers can use
|
indeed, but this is orthogonal - if the |
Well guess what, we figured it out too, see nim-lang/nimble#654 |
The newer Nimble doesn't and nimscriptsupport.nim is all dead code afaik. |
If the Nim compiler has to be touched, I'd rather see #654 implemented as part of this work. Meanwhile, Nimble should continue to be backwards compatible thru this change. As of today, you can run the latest Nimble even with 0.19.6. Perhaps lock file support won't work with an older compiler but existing functionality shouldn't break. |
Whoa, this discussion has gone a bit off topic. Please consider moving such discussions to separate threads/issues. You will likely find that many of these have already been discussed in Nimble's issue tracker (I saw a lot of things mentioned that are already implemented and/or planned). As far as my original question. I understand why hashes are a necessity for lock files support, what I am wondering is why you are tacking on another hash to the pkg file paths when these paths can already include a Git/Hg commit hash: |
Btw just to clear up some misconceptions: Nimble already passes paths explicitly to Nim when calling |
One possible optimization is that we can reuse the top level tree hash of a Git commit and to implement exactly the same algorithm which Git uses, when another download method is being used. Git and Mercurial use different methods for calculating their tree hashes. I don't implement it in this way because:
I still can implement this optimization as a final step before the pull request, if there is a strong interest in it. |
@Araq I also think that this is out of the scope of the lock files feature. After all despite that the code of the procedure has become somehow more clumsy, I didn't add anything new as a dependency between Nim and Nimble. I propose for now to make the directory names to contain only the first few characters of the hash as it was suggested above, if this is the final decision? If so, the question is how many are appropriate:
|
How does this affect us in a practical sense? Can you elaborate on why this is a problem?
Even in this case, you can easily reuse this mechanism, no? You just calculate a hash for your tarballed package and reuse the same |
@dom96, I've had several discussions with @bobeff regarding the hashing algorithms being used. An important property that we need is that the hash should not depend on the download method being used. It should depend only on the contents of the package. In other words, the same package obtained through different methods should resolve to the same hash. It's possible to use a git tree hash to identify the contents (this should not be confused with a git commit hash), but this would require us to implement a fully compliant tree hashing algorithms that works exactly as git (to be used when you download a tarball). In return, we'll be able to skip some of the hashing when working with locally developed git repos. What I recommended to @bobeff in the end is to use a scheme similar to multihash, so that we'll be able to add extra hashing schemes such as git tree hashes or mercurial manifest hashes in the future, but for now we'll launch with a single hashing algorithms defined by Nimble itself. |
Okay, but I don't understand why this is necessary, can you explain? |
I would say that having this property has a long tail of various benefits, but here is one particular way to see why it's necessary:
|
@Araq You are right. I forgot that I even removed this file in nim-lang/nimble@134b799 :D |
Okay, I'm inclined to accept this, however I want to see the lock file implementation in Nimble first in order to understand precisely how these hashes are going to be used (this PR shouldn't block you). It might very well turn out that we can do this differently and no changes in the compiler will be necessary, let's postpone this PR until lock files in Nimble are ready. |
d9a6246
to
1757aad
Compare
6623149
to
9a05edb
Compare
Could you merge this to enable CI for the Nimble lock file support branch? |
@bobeff you can change the Github Actions file to point at this branch until we can merge this. |
9a05edb
to
7902a1f
Compare
7902a1f
to
89907c2
Compare
89907c2
to
d5e4998
Compare
I'm not sure why the CI is failing. It seems to be not related to the pull request. |
There still seem to be several unaddressed comments here: ie locking the algorithms that the package manager must use to introduce compile paths, the use of sha1 vs a multihash scheme that would allow, among other things, to use git hashes directly in the future etc. |
ie what's version selection doing the compiler? why does the compiler have to know about package versioning schemes at all? this is clearly in the domain of a package manager |
Correct but this is kept for backwards compatibility. At least that's my understanding. |
c57bd0f
to
7f92950
Compare
Implemented support for Nimble local cache with package directories with a checksum of the package at the end of their names. Now the compiler supports package paths in the form: * /path_to_nimble_cache_dir/pkgs/package_name-1.2.3- FEBADEAEA2345E777F0F6F8433F7F0A52EDD5D1B * /path_to_nimble_cache_dir/pkgs/package_name-#head- 042D4BE2B90ED0672E717D71850ABDB0A2D19CD2 * /path_to_nimble_cache_dir/pkgs/package_name-#branch-name- DBC1F902CB79946E990E38AF51F0BAD36ACFABD9 Related to nim-lang/nimble#127
7f92950
to
1da1285
Compare
Implemented support for Nimble local cache with package directories with a checksum of the package at the end of their names. Now the compiler supports package paths in the form: * /path_to_nimble_cache_dir/pkgs/package_name-1.2.3- FEBADEAEA2345E777F0F6F8433F7F0A52EDD5D1B * /path_to_nimble_cache_dir/pkgs/package_name-#head- 042D4BE2B90ED0672E717D71850ABDB0A2D19CD2 * /path_to_nimble_cache_dir/pkgs/package_name-#branch-name- DBC1F902CB79946E990E38AF51F0BAD36ACFABD9 Related to nim-lang/nimble#127
Implemented support for Nimble local cache with package directories a checksum of the package at the end of their names. Now the compiler supports package paths in the form:
/path_to_nimble_cache_dir/pkgs/package_name-1.2.3-FEBADEAEA2345E777F0F6F8433F7F0A52EDD5D1B
/path_to_nimble_cache_dir/pkgs/package_name-#head-042D4BE2B90ED0672E717D71850ABDB0A2D19CD2
/path_to_nimble_cache_dir/pkgs/package_name-#branch-name-DBC1F902CB79946E990E38AF51F0BAD36ACFABD9
Related to nim-lang/nimble#127