-
Notifications
You must be signed in to change notification settings - Fork 47
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Consider adding codeflow.state to capture initial execution state for things like static variables #168
Comments
@lgolding, thinking this over, this data is very specific for sure to dynamic analysis tools and not static. do you agree? |
@michaelcfanning I agree that it is dynamic-specific, and for that reason I would agree to relabel this from At that point, it's moot, but -- I'm not sure that this is "state", if by that, you mean the property
... but it seems like a bit of an abuse. If we're not "really" supporting this, IMO the proper place is the result's property bag -- and then the tool can make use of hierarchical property bag property names:
|
Disagree. Our position is that if we have properties that reasonably can accommodate output from a dynamic analysis tool, then we don't need to require separation for some reason. What I'm arguing for here is that the forms data, web request details, etc. are state that are associated with a result code flow. This is conceptually no different than a static analysis tool that populates state with some assumptions that are driving the analysis. What I have in mind in general is populating an unparsed proper, e.g., 'User Agent'., the value of which is the relevant unparsed string. But producers are free to populate as they see fit. |
Then I'm not sure what you're suggesting. You clearly disagree with using |
How do users specify the 'inaugural' state then? This is a useful question to raise. Consider a code flow that consists of three distinct threads. How should a preliminary value of a shared static variable be specified? Perhaps we should add a state property somewhere else. Probably at the root of the code flow. For anything thread specific, this can be specified at the first thread flow location? |
I agree with you that the root of the code flow is the right place for this information. I still don't agree that the thing you're describing (forms data and HTTP headers) is "state," because it never changes. If we're going to have to add a new property anyway, I'd prefer not to give it that name. It's more like "parameters". But I'm not going to lie down in front of the "state" train. Another option is to use "properties":
|
huh? state that does not change is state. an analysis that starts with an assumption around a variable that is never modified during analysis is a very common scenario. |
Ok, I yield. We add |
:) As part of this discussion, I believe we've uncovered a useful gap in the format from the perspective of a static analysis tool. It is not clear where a code flow would specify initial state for static variables, other data in a multi-thread scenario. So my suggestion is to add codeFlow.state and restrict spec language to the static analysis scenario. It's an interesting idea to consider some non-normative language around how to utilize SARIF for dynamic analysis scenarios where there's sufficient overlap (and to describe perhaps where that gaps are too extreme). But this seems like a low priority compared to other open items. |
Having thought more about this, I suggest |
But even so "state" is problematic for the "headers/forms data" scenario. Here's why: As the user navigates a code flow, the debugger-like "watch window" is going to display the state belonging to the current location. For example, suppose you are on location 1, whose
The watch window displays
Now you navigate to location 2, whose
What should the viewer display? Should it display
... assuming that
... assuming that I claim that it should just display What does that imply for an
To me it implies that That's why I think that persistent information like this, which makes sense throughout the code flow, (and which you might want to view throughout your navigation through the code flow), needs a separate slot in the format. "properties" works for that. |
properties does not work for this, because properties could contain some arbitrary data that's not intended to be viewable and does not allow for general viewer consumption. do me a favor. stop talking about the headers/forms example. you'll note I retitled this issue. let's revisit your example: how do we project that a viewer can most usefully report out on things like static variables that may not be associated with the current selected code location. This is a good question to answer. The answer is not to use the properties bag. Any thoughts? |
"Static" variables can change from location to location, just as local variables can:
The producer should add them to the But if you mean "constant" variables, rather than "static" ones -- there I think is the answer. We could define a new property |
I am focused on this specific question: how does a viewer differentiate between local vs. static variables. In general, when debugging, all viewable static variables do no appear in watch windows. This data can be quite copious. Users can explicitly add a static to a watch window, but this is an explicit question. Now consider your step-through scenario. Does we expect all state to be viewable in a common window? Is it useful to distinguish between variables of varying scope? etc. I believe we need to do some thinking here. You have raised a new concern around constants. Maybe that's valuable, I'm not sure. Some debuggers also have a place for 'automatics', a place to show implied variables that may not be explicit in source code (such as an exception instance that pops up out of nowhere during execution). I appreciate all the good thinking/speculation here. We would do be well-advised to collect some traces from various tools to drive progress. Probably should create a bucket issue around all the new topics that have turned up as this conversation has turned into a more systematic review of this area. |
One way that a SARIF viewer differs from a debugger is that the SARIF producer gets to decide which variables the user will find interesting at each location in the code flow. So it might be less important to segregate variables by type than it is in a debugger. For example, the producer could say:
|
This is a good point, you are noting that one position we could have is that a step-through of a code flow is an extremely directed experience, in that the SARIF data is explicitly designed to show only the relevant information. That's a good value. On the other hand, this implies a certain rigidity, we've limited the ability of viewers to provide alternate/competing views. I actually thought that our spec said that the state shouldn't recapitulate data except for what has changed. This is an optimization that prevents the need to keep repeating state values for data that hasn't changed. Does our spec actually say that? If not, we need yet another issue on code flows. Which continues to be my most urgent concern. With all due respect to our highly intelligent discussion here, I won't really feel like we're making progress until we're driving SARIF v2 examples through multiple tools through multiple viewers. Have you and Chris taken a look at SDV yet in our current implementation of various things? |
The spec does not say that With regard to SDV: The That means that the SDV-to-SARIF converter is already up to date with those changes, and the tests demonstrate that the converter produces valid SARIF in the new format. What more SDV-related validation did you have in mind? |
I would like someone to please assess all existing SARIF v2 samples, whether produced directly or by conversion, in whatever SARIF v2 code flow consuming mechanisms exist today. These include a web control, a VS Code extension and a VS add-in, minimally. I am not attempting to provide a complete test methodology for evaluating SDV. I am hoping that you, or Chris, or another resource, can take this on. Send me mail offline if this isn't clear and we need to work against a more clearly articulated plan. This discussion is getting off topic. |
E-BALLOT PROPOSAL:
|
E-BALLOT PROPOSALEnable the representation of "immutable state": state that remains constant throughout the traversal of a graph or a code flow. This is useful for capturing, for example, HTTP request headers or forms data. Also enable the representation of "initial state": mutable state captured before the start of a code flow or graph traversal. The SCHEMA CHANGES
|
This could be a separate ticket, but it is a change to the description for the state value, so it relates to this ticket. For many of the tools that I have seen that include flows or traversals, when explaining a flow or graph traversal state, they specify the value of an expression as a literal value (expressible in this current design), or as a constraint on the value (currently not not expressible). Such as
This state constraint information that many tools report, could incorporate this without changing the schema by describing that a magic string value such as "{expr}" is used to represent a constraint expression. If "{expr}" exists in the value then it is a constraint (the value may substitute the actual expression for {expr}. If "{expr}" does not exist in the value then this is an equality constraint equivalent to "{expr} == VALUE" where VALUE is the value of the JSON property. The above constraints would then become a state object of
For viewer just displaying the values above would probably be deciphered with the correct meaning by most developers, and advanced viewers could do more to indicate a general constraint instead of just equality. |
This draft contains all the changes through "e-ballot #2," which opened on Friday March 15 and will close on Friday March 22. It contains changes for ballot issues #168, #291, #309, #320, #321, #326, #335, and #341, as well as for previously approved issue #340. It does _not_ contain changes for any issues from "e-ballot #3," which will open on Friday March 22 and close on Friday March 29.
Approved in e-ballot #2. |
@kupsch I will open an issue for your "arbitrary expression" suggestion, and I'll take care of it. Since all the state-ish properties are just string-to-string dictionaries, there's no downside to this, it's non-breaking, it's just words in the spec, and I have the time. @michaelcfanning FYI |
No description provided.
The text was updated successfully, but these errors were encountered: