Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Motivation docs for graph #5741

Merged
merged 7 commits into from
Oct 7, 2020

Conversation

rainersigwald
Copy link
Member

Several folks have asked for better documentation about what static graph is and why we think it's a good thing. That's . . . a very reasonable request!

cc @marcpopMSFT @KathleenDollard

@rainersigwald rainersigwald added Documentation Issues about docs, including errors and areas we should extend (this repo and learn.microsoft.com) Area: Static Graph Issues with -graph, -isolate, and the related APIs. labels Sep 17, 2020
Copy link
Member

@jaredpar jaredpar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really happy to see this doc getting updated! Left some comments on a few parts.


The second build will:

1. Build the library project, skipping all well-authored targets.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Think it would help to have a definition of "well-authored targets" in the doc. Not sure how generally well understood that concept is and it's pretty critical to this feature.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one is critical to old features too--I just mean "defines inputs/outputs". Fixing.


Microsoft has an internal build system, [CloudBuild](https://www.microsoft.com/research/publication/cloudbuild-microsofts-distributed-and-caching-build-service/), that supports this and has proven that it is effective, but is heuristic-based and requires maintenance.

MSBuild static graph features make it easier to implement a system like CloudBuild by building required operations into MSBuild itself.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure what "required operations" means here.

Static graph functionality can be used in three ways:

- On the command line with `-graph` (and equivalent API).
- This gets the scheduling improvements for well-specified projects, but allows underspecified projects to complete without error.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will the -graph command error if it cannot produce an acyclic graph?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Uh, technically yes?

$ dotnet msbuild -graph
Microsoft (R) Build Engine version 16.8.0-preview-20451-02+51a1071f8 for .NET
Copyright (C) Microsoft Corporation. All rights reserved.

Stack overflow.

😰 #3757

- On the command line with `-graph` (and equivalent API).
- This gets the scheduling improvements for well-specified projects, but allows underspecified projects to complete without error.
- On the command line with `-graph -isolate` (and equivalent API).
- This gets the scheduling improvements and also enforces that the graph is correct and complete. In this mode, MSBuild will produce an error if there is an `MSBuild` task invocation that was not known to the graph ahead of time.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

May want to explain a bit what "graph is correct" means.

documentation/specs/static-graph.md Show resolved Hide resolved
documentation/specs/static-graph.md Outdated Show resolved Hide resolved
documentation/specs/static-graph.md Outdated Show resolved Hide resolved

This part of the error is the problem here:

> the reference was called with a target which is not specified in the ProjectReferenceTargets item in project "S:\Referencing\Referencing.csproj"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Think you need to elaborate a bit more on why this is an error in the isolated mode. Particularly because I'm honestly unsure of what the exact problem is here 😄

Copy link
Contributor

@cdmihai cdmihai Sep 18, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TLDR: /isolate ensures the graph build does at least the work that vanilla msbuild does. But it's pretty complicated to use and probably only suitable for sdk writers.

  • it brutally checks the correctness of "what targets should a graph node be called with?". In top-down just-in-time vanilla msbuild a project gets built at the time of discovery. It gets called with whatever arbitrary target values end up in <MSBuild Targets=$(some_targets). During a graph build the runtime graph gets statically inferred and built bottom up, but without doing symbolic execution on the xml we have no idea what targets to call a node with. We solved this by adding a way to statically declare what targets a project calls on its references via the ProjectReferenceTargets protocol, and then we run a data flow over the graph to propagate them. But how do we know the declarations are correct? At minimum the graph build should build at least what vanilla msbuild builds. So one can use /isolate to ensure that the called targets are a superset of the targets run by vanilla msbuild. This is most useful for SDK writers, to ensure that their p2p target calling patterns are inferred by static graph. Not so much useful for end users unless they customize their build.
  • correctness wise, even if /isolate is violated, the build would most likely still end up correct. If A calls B with Foo, and Foo is undeclared, then Foo won't get run when the graph build builds B, but will get run regardless when the graph build A, and then A calls into B with Foo. However, if you have a distributed / cached build, it would be nice to capture and cache Foo's potentially expensive CPU work and IO side effects as part of B's scheduled run and cache entry, not as part of A's run. The higher order build engine might even complain if Foo touches the outputs of B, or the outputs of B's references. The fact that Foo's side effects get captured in A's cache entry rather than B's cache entry may or may not be a problem, depending on what one uses the caches for. Just binplacing cached results is fine, doing analyses to infer file dataflow might not be fine.
  • MSBuild needs a way to build a project "in isolation", without following references. This is useful when another tool is orchestrating the build. Both VS and Quickbuild build their own msbuild graph and then tell msbuild to build each project in isolation, without following references. To do this both of them rely on the sdks respecting the BuildProjectReferences=false convention. /isolate could be used to check that BuildProjectReferences is actually respected.
  • it enables a potentially interesting way of doing a graph build. If we can correctly infer all the targets a project gets called with, we could visit each node exactly once and call it with all the inferred targets. Then we could serialize those target results and feed them to all the referencing projects. This would enable each project only doing its own work, without reaching out and doing any work in other projects.

Copy link
Member Author

@rainersigwald rainersigwald left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @jaredpar!


The second build will:

1. Build the library project, skipping all well-authored targets.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one is critical to old features too--I just mean "defines inputs/outputs". Fixing.

Static graph functionality can be used in three ways:

- On the command line with `-graph` (and equivalent API).
- This gets the scheduling improvements for well-specified projects, but allows underspecified projects to complete without error.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Uh, technically yes?

$ dotnet msbuild -graph
Microsoft (R) Build Engine version 16.8.0-preview-20451-02+51a1071f8 for .NET
Copyright (C) Microsoft Corporation. All rights reserved.

Stack overflow.

😰 #3757


Microsoft has an internal build system, [CloudBuild](https://www.microsoft.com/research/publication/cloudbuild-microsofts-distributed-and-caching-build-service/), that supports this and has proven that it is effective, but is heuristic-based and requires maintenance.

MSBuild static graph features make it easier to implement a system like CloudBuild by building operations like graph construction and output caching into MSBuild itself.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Has anyone outside of CloudBuild done this and would we have guidance for someone who tried to build one?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on issues, I know a couple of folks have played with it but I don't know if they got anywhere. We don't really have guidance--if there's external interest I think we could provide some.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably not worth it as it'd likely only be the rarest of experts who try this or 1ES.


# Static Graph
## What is static graph for?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any visual recreation of the graph available today (like a dgml diagram or something)? Might be an interesting future project as we invest more here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but I believe only internally. @cdmihai added code that can spit out a DOT file--see for example

var dot = projectGraph.ToDot();

Which produces

graph dot

As you can see, this quickly gets complex!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting, we'd probably need to invest on making that more navigable and readable if we were to expose that. Not sure if there are changes customers can do based on the graph that are worth doing.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, if this becomes the default case I think we'll definitely need to figure out a nice visualization for it. We could do that today, and folks have done so in the past (https://www.visualstudiogeeks.com/blog/msbuild/visual%20studio/msbuild-dependency-visualizer for example), but it hasn't seemed super important.

Copy link
Contributor

@cdmihai cdmihai Sep 22, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The binlog viewer also prints out the project dependency graph :D. But it gets pretty hairy beyond 5-10 projects. However, it's super useful to understand the p2p protocol :)

image

src/MSBuild/XMake.cs Outdated Show resolved Hide resolved
@rainersigwald rainersigwald merged commit a710679 into dotnet:master Oct 7, 2020
@rainersigwald rainersigwald deleted the graph-high-level branch October 7, 2020 14:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Area: Static Graph Issues with -graph, -isolate, and the related APIs. Documentation Issues about docs, including errors and areas we should extend (this repo and learn.microsoft.com)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants