Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Consider moving sway-lib-std and sway-lib-core back into the sway repo to make synchronising under CI easier #830

Closed
mitchmindtree opened this issue Feb 23, 2022 · 7 comments · Fixed by #1052
Labels
ci lib: std Standard library

Comments

@mitchmindtree
Copy link
Contributor

Currently, we have a cyclic dependency between the sway repo and the sway-lib-core/sway-lib-std repos. std and core are used throughout the examples and tests under sway, and the two libraries implicitly depend on sway via the sway language and forc.

We should consider the possibility of moving the std and core libs back into the sway repo to make keeping everything in sync under CI easier and to avoid the need to open two PRs each time we make a change to either the libs or sway.

forc dependencies in git repo sub-directories

Seeing as Sway doesn't yet have an implicitly included prelude through which we can provide core and std, we'll need to solve how to refer to the sub-directory of a repo in forc's git depedencies to make this possible. This will be necessary to allow downstream projects to depend on std and core via git by pointing at the sway repo.

Cargo generally seems to "solve" this via their [workspace] feature, as crates within subdirectories are almost always a part of some top-level workspace. That said, workspaces are a much broader feature which aims to solve a lot of other problems too, so I'm not sure we should rely on waiting to land a similar feature in forc just to solve this.

An alternative might be to add something like a root attribute to forc's git dependency type, where root is a relative path from the repo to the subdirectory containing the manifest of the dependency. We could potentially re-use the existing path field for this rather than adding a root field, as long as we clarify the behaviour that when both path and git are specified, path is relative to the root of the specified git repo rather than the root of the current project.

In the sway repo examples and tests we could solve this by specifying the core and std dependencies via path, however this wouldn't solve anything for downstream projects.

@mitchmindtree mitchmindtree added ci lib: std Standard library labels Feb 23, 2022
mitchmindtree added a commit that referenced this issue Feb 23, 2022
This is in preparation for addressing #799 and makes landing #825 a
little easier for the reasons described in #829.

We might wish to consider keeping these dependencies pointed at the
master branch, however this would be made easier by #830.
@otrho
Copy link
Contributor

otrho commented Feb 23, 2022

Couldn't it be resolved with pinning? The libs don't necessarily require the latest Sway and could stick with 'stable' (though while it's still early days this isn't as true, but it should settle down).

Keeping the repositories separate and referenced simply via a git repo URL in the TOML is nicer organisationally, as is having separate repo meta, like issues and PRs.

@mitchmindtree
Copy link
Contributor Author

Couldn't it be resolved with pinning?

Kind of, I think this is currently the idea behind pinning the std and core deps to a particular version in the E2E tests.

However, this only solves depending on sway-lib-std and sway-lib-core inside of sway, but doesn't solve the issue where updating sway or forc may break the ability to compile the master branch of sway-lib-core or sway-lib-std (e.g. due to a language change, or forc manifest change).

It also allows for the E2E test core and std deps to fall out of date as is currently the case, but perhaps that doesn't matter so much if those tests are specifically for sway the language, and not the libraries. Ideally it would be nice if we could use them to check for breakage in both the language and the core/std libs.

I think from the perspective of a user, I would assume that I should be able to download the current master of both sway, sway-lib-std and sway-lib-core and know that all three will be compatible, however this is difficult to guarantee while each are split across 3 repositories. If all are synchronised under one repo, I think it's much easier to offer that guarantee.

Keeping the repositories separate and referenced simply via a git repo URL in the TOML is nicer organisationally, as is having separate repo meta, like issues and PRs.

I think we can get the issue and PR organisational benefits via labels, but otherwise I'm not so sure I see much benefit to keeping core and std separate. It does make specifying them via git dependencies nice and easy at the moment, and this is partly what I was aiming to address in the second part of the OP, but I think this will only be an issue until we have a prelude anyway.

For the record, rust-lang keep their core and std libs under the main repo too, but they've been shipping them alongside the compiler for a long time now, and we're not doing that just yet.

mitchmindtree added a commit that referenced this issue Feb 23, 2022
This is in preparation for addressing #799 and makes landing #825 a
little easier for the reasons described in #829.

We might wish to consider keeping these dependencies pointed at the
master branch, however this would be made easier by #830.
mitchmindtree added a commit that referenced this issue Feb 23, 2022
Addresses #829.

Also allows for putting off #831 while we discuss #830.
@nfurfaro
Copy link
Contributor

It also allows for the E2E test core and std deps to fall out of date as is currently the case, but perhaps that doesn't matter so much if those tests are specifically for sway the language, and not the libraries.

Indeed, now that we have a test harness in lib-std, all new tests added for the std lib will be added in that repo.
Additionally, existing std-lib tests in the sway repo could be migrated to the new harness for consistency.

@sezna
Copy link
Contributor

sezna commented Feb 23, 2022

I think perhaps moving them back is a good idea, as long as we are still able to reference them directly from a Forc.toml or introduce a prelude, as is discussed above. It seems like those are the only two options. I'm thinking the former is less work, as a prelude has the whole chicken-and-egg problem of compiling the standard library in order to compile the compiler... etc.

Cargo's [workspace] feature does somewhat rely on having a working package manager, as consumers of crates that exist in a workspace refer to the crate by its name on crates.io.

My 🪙 🪙 : I think the easiest way forward right now is to allow both subdirectories in git and local path dependencies which would take precedence over the git dependency when it is available. This solves all of the pain points I believe?

pain points that would be solved:

  1. Painful iteration across multiple repos
  2. Lack of enforced compatibility among the latest compiler, core, and std
  3. lack of ability to specify a dependency that is not the top-level project in a github repository
  4. unified testing

P.S. re: repo metadata. If we do this, I think the only way to solve the repo-meta issue Toby described is to use labels more.

@Voxelot
Copy link
Member

Voxelot commented Feb 24, 2022

The risk of overriding the path behavior when git is enabled is that local dev workflows could be broken if you have multiple forc projects in the same repo. For example, if I make a change in crate A and consume that in crate B, I'd have to push crate A to a branch before seeing those changes locally in crate B.

Although, it seems like they would use the same path in either case whether a git repo is involved or not?

mitchmindtree added a commit that referenced this issue Feb 24, 2022
Addresses #829.

Also allows for putting off #831 while we discuss #830.
@emilyaherbert
Copy link
Contributor

I think from the perspective of a user, I would assume that I should be able to download the current master of both sway, sway-lib-std and sway-lib-core and know that all three will be compatible, however this is difficult to guarantee while each are split across 3 repositories. If all are synchronised under one repo, I think it's much easier to offer that guarantee.

This hits the nail on the head of what I think our focus should be.

I think perhaps moving them back is a good idea, as long as we are still able to reference them directly from a Forc.toml or introduce a prelude, as is discussed above. It seems like those are the only two options. I'm thinking the former is less work, as a prelude has the whole chicken-and-egg problem of compiling the standard library in order to compile the compiler... etc.

Sure, but I do think that we should plan to support prelude in the future.

My 🪙 🪙 : I think the easiest way forward right now is to allow both subdirectories in git and local path dependencies which would take precedence over the git dependency when it is available.

To clarify, are you saying that the user could have core and std in their [dependencies] both pointing to GitHub, and forc still would still elect to use either the local version (or the git subdir) version instead? I think that this is fine for now while waiting for a solution like prelude. However, what I want to avoid is a situation in which the user thinks that they are indeed using the Github versions (in the event that we just added a new feature or something that hasn't been pushed to crates.io) and the compiler is failing and producing error messages that aren't relevant. I think if forc does decide to prefer a different version, it should tell the user that it has found a local version (or a git subdir) and is compiling with that version, in the compiler messages.

@sezna
Copy link
Contributor

sezna commented Feb 24, 2022

what I want to avoid is a situation in which the user thinks that they are indeed using the Github versions (in the event that we just added a new feature or something that hasn't been pushed to crates.io)

I agree -- that's not ideal. What Cargo does right now is: if there is a local path dependency and a git/crates.io dependency listed, it uses the local one if it exists. If not, use the remote one. That's definitely implicit behavior that could lead to what you're describing. I like your idea of adding the message, something unobtrusive like (using local path) or something.

Sure, but I do think that we should plan to support prelude in the future.

I agree, we should have a prelude eventually. After we are done iterating and building out all of this v1 stuff, that will be essential to the developer experience.

mitchmindtree added a commit that referenced this issue Feb 28, 2022
Addresses #829.

Also allows for putting off #831 while we discuss #830.
mitchmindtree added a commit that referenced this issue Feb 28, 2022
Addresses #829.

Also allows for putting off #831 while we discuss #830.
mitchmindtree added a commit that referenced this issue Feb 28, 2022
Addresses #829.

Also allows for putting off #831 while we discuss #830.
mitchmindtree added a commit that referenced this issue Mar 2, 2022
Addresses #829.

Also allows for putting off #831 while we discuss #830.
mitchmindtree added a commit that referenced this issue Mar 2, 2022
Addresses #829.

Also allows for putting off #831 while we discuss #830.
mitchmindtree added a commit that referenced this issue Mar 3, 2022
Addresses #829.

Also allows for putting off #831 while we discuss #830.
mitchmindtree added a commit that referenced this issue Mar 4, 2022
…rt. Add `Forc.lock` for more deterministic builds. (#825)

* Construct `pkg_graph` at beginning of `forc build`

This is a start at separating the construction of the dependency graph
and fetching of dependency source from the compilation process. This is
done under a new `fetch_pkgs` function where, given the manifest of the
package we're compiling, it produces a graph where each node represents
a package at a known, pinned version/commit along with a `Path` to a
local copy of their source (under `.forc/git/checkouts` for git deps).

The `pkg_graph` also contains our root-level package, allowing us to
treat all packages in the compilation process as equal, avoiding special
case code for the top-level program.

The compilation order of packages is determined by performing a toposort
of the `pkg_graph`. By compiling packages in this order, each node is
always guaranteed to have access to the compiled artifacts of each of
its dependencies.

This PR also introduces the `git2` crate for fetching and working with
git dependencies. This should allow us to remove the current
github-specific dependency handling in favour of more general git
support.

---

I think the next step in this PR will be replacing the existing
`dependency_graph` mutable hash map reference with an immutable
reference to the newly constructed `pkg_graph`. It seems this may also
involve populating the `Namespace` prior to the rest of compilation too?
I'm still familiarising myself with `sway-core`, so will need to dive a
bit deeper to be sure.

* Replace old dependency compilation with new `pkg_graph` approach

This removes the old dependency-specific compilation logic in favour of
flattening the graph into a list of compilation steps and using the same
`compile_pkg` function to compile all packages, including at the
top-level package.

* Move `pkg`-specific code into a new `pkg` submodule

* Address cargo fmt and clippy nits

* Remove implied pkg from idents in pkg submodule

* Add support for specifying git dependency via tag

* Fix dependency path handling to be relative to parent

* Remove github-specific logic in favour of new git pkg handling

See #829.

This also temporarily disables `forc update` and the related
`forc_dep_check` module pending more general `git` support and
committing updates to a `Forc.lock` file.

* Improve support for determinism and cachability with `Forc.lock`

This adds a `Forc.lock` file that for the most part mirrors the
`Cargo.lock` file.

If a `Forc.lock` doesn't exist, a `Lock` structure is created from the
package graph and serialised to TOML before being written to the
`Forc.lock` file path.

If a `Forc.lock` *does* exist, we construct the build graph from the
`Lock` structure, re-using the previously fetched source for the pinned
dependencies.

TODO:

- Make `forc update` update the lock file.
- Add support for updating individual packages.

* Move `BuildPlan` under `pkg` module in anticipation for `forc update`

* Update `forc update` for lock. Print removed and added deps.

A new `lock::Diff` type is added to collect added/removed packages.

* Address doc comment nits

* Validate `Lock` in accordance with `Manifest`

This ensures that if any `Forc.toml` dependencies are added, removed or
modified, the change is detected during `forc build` and the `Forc.lock`
file is updated accordingly.

* Improve formatting of `Forc.lock` related stdout

* Update sway-core petgraph version so that it matches forc

* Update test and examples to pin via git tag rather than version field

Addresses #829.

Also allows for putting off #831 while we discuss #830.

* Update some E2E tests due to unpinned `core` dep in `std`

None of the `std` releases so far pin `core` to a version. As a result,
it pulls the `core` master branch by default, despite the top-level
project also dependending on `core` at a specific version. Most E2E
tests seemt to work regardless, but this subset requires updating.
Following the lock file work landing, we should update versioned `std`
releases to refer to versioned `core` via git tag.

* Only fetch pkgs that are missing, rather than whole pkg graph

Previously if any of the pkgs that exist in `Forc.lock` were not already
available in `~/.forc/git/checkouts`, we would update the entire lock
file and re-fetch all dependencies. Now, if the local copy of a pkg is
missing, we just fetch that individual pkg and continue.

This commit also updates the `fetch_pkgs` code to re-use pre-fetched git
commits. Previously, all packages would be re-fetched when the
`Forc.lock` file was updated. Now, we only fetch if the path for that
commit doesn't already exist.

* Update forc git dep from 0.13 to 0.14

* Fix forc Cargo.toml dependency order

* Update Cargo.lock for the addition of git2, petgraph

* Add Forc.lock files for sway examples

* Ensure `Namespace` only contains a package's dependencies

Previously one namespace was used to collect *all* items. This meant
that when passed to the package that was being compiled, the namespace
contained not only the names for its dependencies, but sometimes other
unrelated names that happened to be earlier in the compilation order.

This commit fixes this issue, creating a fresh namespace purely for each
package in order to collect only its dependencies. This fixes some
shadowing errors that were appearing in the E2E tests.

* Pin tests by adding their `Forc.lock` files for reproducibility

As a follow-up, we should update all of the tests to depend on `master`
and then update the lock files. This will save us from having to change
versions tags to pin all the time.

Note: You can clear all test lock files with

```
rm ./test/src/e2e_vm_tests/test_programs/*/Forc.lock
```

* Temporarily warn about `version` field in dependencies

We no longer use the GitHub REST API and in turn no longer map the
`version` field to GitHub "release"s. Instead, we now support git more
generally. Warn about appearances of the `version` field in dependencies
and suggest using `branch` or `tag` as an alternative.

* Address nits uncaught in anyhow PR review

* Fix formatting following rebase onto forc anyhow PR
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ci lib: std Standard library
Projects
Archived in project
Development

Successfully merging a pull request may close this issue.

6 participants