Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Go: Fix bad join order using late inlined predicate #17437

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

owen-mc
Copy link
Contributor

@owen-mc owen-mc commented Sep 11, 2024

Currently it's evaluating DataFlow::booleanProperty(outcome).checkOn(ret, p.asBoolean(), g) first, which is massive, then joining ret = getUniqueOutputNode(fd, outp) , which makes it a bit smaller, then guardChecks(g, arg.asExpr(), outcome), which makes it empty. So basically, we are looking for code that looks like returns someFunction(arg) and only then checking if someFunction is a function that we care about. I think it would probably fix the problem to force it to evaluate guardChecks(g, arg.asExpr(), outcome) first, because it will generally be small. I tried putting either only_bind pragma on either of the places that outcome was used, but it made no difference. I then tried pulling out the two lines that aren't guardChecks(g, arg.asExpr(), outcome) and putting them in an inline late predicate. I wasn't sure what to use as the bindingset so I used all the variables. That fixed the join problem.

Before:

Evaluated recursive predicate DataFlowUtil::BarrierGuard<TaintTrackingUtil::listOfConstantsComparisonSanitizerGuard>::guardingFunction/5#f413785b@124c4xve in 1ms on iteration 1 (delta size: 0).
Evaluated relational algebra for predicate DataFlowUtil::BarrierGuard<TaintTrackingUtil::listOfConstantsComparisonSanitizerGuard>::guardingFunction/5#f413785b@124c4xve on iteration 1 running pipeline base with tuple counts:
        175394   ~1%    {4} r1 = JOIN `_Properties::Property.checkOn/3#dispred#43ad13f2_num#Properties::IsBoolean#d0c3641f#shared` WITH num#Properties::IsBoolean#d0c3641f ON FIRST 1 OUTPUT Lhs.2, Lhs.1, Lhs.3, Rhs.1
          2385   ~0%    {6}    | JOIN WITH `DataFlowUtil::getUniqueOutputNode/2#05007054_201#join_rhs` ON FIRST 1 OUTPUT Rhs.2, Rhs.1, Lhs.0, Lhs.1, Lhs.2, Lhs.3
          2385   ~0%    {5}    | JOIN WITH `FunctionInputsAndOutputs::FunctionOutput.getEntryNode/1#dispred#e470844c` ON FIRST 3 OUTPUT Lhs.1, Lhs.3, Lhs.4, Lhs.5, Lhs.0
          2385   ~5%    {6}    | JOIN WITH `Scopes::DeclaredFunction.getFuncDecl/0#dispred#657c96f0_10#join_rhs` ON FIRST 1 OUTPUT Lhs.2, Lhs.1, Lhs.3, Lhs.0, Lhs.4, Rhs.1
             0   ~0%    {6}    | JOIN WITH `TaintTrackingUtil::listOfConstantsComparisonSanitizerGuard/3#e445654a_021#join_rhs` ON FIRST 2 OUTPUT Rhs.2, Lhs.0, Lhs.2, Lhs.3, Lhs.4, Lhs.5
                    
             0   ~0%    {6} r2 = JOIN r1 WITH DataFlowUtil::ExprNode#9f54f090_20#join_rhs ON FIRST 1 OUTPUT Lhs.3, Rhs.1, Lhs.1, Lhs.2, Lhs.4, Lhs.5
                    
             0   ~0%    {6} r3 = JOIN r1 WITH DataFlowUtil::ExprNode#9f54f090_20#join_rhs ON FIRST 1 OUTPUT Rhs.1, Lhs.1, Lhs.2, Lhs.3, Lhs.4, Lhs.5
             0   ~0%    {6}    | JOIN WITH `#DataFlowUtil::localFlowStep/2#edbc0f9aPlus#swapped` ON FIRST 1 OUTPUT Lhs.3, Rhs.1, Lhs.1, Lhs.2, Lhs.4, Lhs.5
                    
             0   ~0%    {6} r4 = r2 UNION r3
             0   ~0%    {5}    | JOIN WITH `FunctionInputsAndOutputs::FunctionInput.getExitNode/1#dispred#f3d2cd98_120#join_rhs` ON FIRST 2 OUTPUT Lhs.2, Lhs.5, Rhs.2, Lhs.4, Lhs.3
                        return r4

After:

Evaluated recursive predicate DataFlowUtil::BarrierGuard<TaintTrackingUtil::listOfConstantsComparisonSanitizerGuard>::guardingFunction/5#f413785b@41fc6xau in 0ms on iteration 1 (delta size: 0).
Evaluated relational algebra for predicate DataFlowUtil::BarrierGuard<TaintTrackingUtil::listOfConstantsComparisonSanitizerGuard>::guardingFunction/5#f413785b@41fc6xau on iteration 1 running pipeline base with tuple counts:
          60  ~0%    {6} r1 = JOIN `_#DataFlowUtil::localFlowStep/2#edbc0f9aPlus_DataFlowUtil::ExprNode#9f54f090_FunctionInputsAndOutput__#shared` WITH `TaintTrackingUtil::listOfConstantsComparisonSanitizerGuard/3#e445654a_102#join_rhs` ON FIRST 1 OUTPUT Lhs.3, Lhs.1, Lhs.2, Lhs.4, Rhs.1, Rhs.2
        1132  ~4%    {8}    | JOIN WITH `FunctionInputsAndOutputs::FunctionOutput.getEntryNode/1#dispred#e470844c_102#join_rhs` ON FIRST 1 OUTPUT Lhs.0, Rhs.1, Rhs.2, Lhs.1, Lhs.2, Lhs.3, Lhs.4, Lhs.5
           0  ~0%    {7}    | JOIN WITH `DataFlowUtil::getUniqueOutputNode/2#05007054` ON FIRST 3 OUTPUT Lhs.3, Lhs.4, Lhs.5, Lhs.6, Lhs.7, Lhs.1, Lhs.2
           0  ~0%    {8}    | JOIN WITH num#Properties::IsBoolean#d0c3641f_10#join_rhs ON FIRST 1 OUTPUT Lhs.6, Rhs.1, Lhs.3, Lhs.0, Lhs.1, Lhs.2, Lhs.4, Lhs.5
           0  ~0%    {7}    | JOIN WITH `Properties::Property.checkOn/3#dispred#43ad13f2_1230#join_rhs` ON FIRST 3 OUTPUT Lhs.6, Rhs.3, Lhs.3, Lhs.4, Lhs.5, Lhs.2, Lhs.7
           0  ~0%    {5}    | JOIN WITH num#Properties::IsBoolean#d0c3641f ON FIRST 2 OUTPUT Lhs.5, Lhs.3, Lhs.4, Lhs.6, Lhs.2
                     return r1

@owen-mc owen-mc added the no-change-note-required This PR does not need a change note label Sep 11, 2024
@owen-mc owen-mc requested a review from a team as a code owner September 11, 2024 09:22
@github-actions github-actions bot added the Go label Sep 11, 2024
Copy link
Contributor

@smowton smowton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Per https://codeql.github.com/docs/ql-language-reference/annotations/#pragma-inline-late, the bindingset covering all parameters means that all parameters should be bound (constrained by evaluating something else) before the helper is scheduled. Therefore it will usually go last (though if there was another predicate referring to all vars, say filter(g, outp, p, fd, ret, outcome), it would be ok for the filter and the helper to go in either order).

If it would be a better idea for the helper to go before guardChecks, then it should be given a looser binding set -- so specifically excluding their common variables g and outcome, so that the join orderer could go for "I shall constrain g and outcome by running guardingFunctionHelper, and only then run guardChecks", which would presently be an illegal order. Of course it would still have the choice of the reverse order, since guardChecks has no bindingset constraint, so that might be a hint that a different optimisation strategy, such as giving guardChecks itself some binding-order hints.

// (and we have (this has outcome ==> arg is checked))
// but p.checkOn(ret, outcome, this) gives us (ret has outcome ==> p holds of this),
// so we need to swap outcome and (specifically boolean) p:
DataFlow::booleanProperty(pragma[only_bind_into](outcome)).checkOn(ret, p.asBoolean(), g)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the binding order hint required here now?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doh

We want this predicate to be ordered after
`guardChecks(g, arg.asExpr(), outcome)`, so that the relatively small
number of possible `g` restricts the number of tuples from getting too
large.
@owen-mc
Copy link
Contributor Author

owen-mc commented Sep 11, 2024

I've addressed the review comments. The join order for the standard case is now as below. Because the structure of guardingFunctionguardingFunction is A or B or C, where B is not recursive but A and C are, B is only evaluated once (in the base case) and never in the standard pipeline. So I don't think that join ordering pragmas for a predicate used only in B will affect the join order of the standard pipeline of guardingFunction.

Evaluated recursive predicate DataFlowUtil::BarrierGuard<TaintTrackingUtil::listOfConstantsComparisonSanitizerGuard>::guardingFunction/5#f413785b@6841axn9 in 0ms on iteration 2 (delta size: 2).
Evaluated relational algebra for predicate DataFlowUtil::BarrierGuard<TaintTrackingUtil::listOfConstantsComparisonSanitizerGuard>::guardingFunction/5#f413785b@6841axn9 on iteration 2 running pipeline standard with tuple counts:
          0   ~0%    {5} r1 = SCAN `DataFlowUtil::BarrierGuard<TaintTrackingUtil::listOfConstantsComparisonSanitizerGuard>::guardingFunction/5#f413785b#prev_delta` OUTPUT In.2, In.0, In.1, In.3, In.4
          0   ~0%    {9}    | JOIN WITH `_DataFlowUtil::getUniqueOutputNode/2#05007054_FunctionInputsAndOutputs::FunctionInput.getNode/1#disp__#join_rhs` ON FIRST 1 OUTPUT Lhs.2, Rhs.5, Rhs.1, Rhs.2, Rhs.3, Rhs.4, Lhs.1, Lhs.3, Lhs.4
                 
          0   ~0%    {8} r2 = JOIN r1 WITH `Scopes::Function.getACall/0#ab99e5da` ON FIRST 2 OUTPUT Lhs.7, Lhs.1, Lhs.5, Lhs.2, Lhs.3, Lhs.4, Lhs.6, Lhs.8
          0   ~0%    {5}    | JOIN WITH `FunctionInputsAndOutputs::FunctionOutput.getExitNode/1#dispred#4c5ce77c` ON FIRST 3 OUTPUT Lhs.6, Lhs.3, Lhs.4, Lhs.5, Lhs.7
                 
         35   ~0%    {3} r3 = SCAN `DataFlowUtil::BarrierGuard<TaintTrackingUtil::listOfConstantsComparisonSanitizerGuard>::guards/3#226617c0#prev_delta` OUTPUT In.2, In.0, In.1
        256   ~3%    {7}    | JOIN WITH `_FunctionInputsAndOutputs::FunctionOutput.getEntryNode/1#dispred#e470844c_102#join_rhs__#DataFlowUti__#join_rhs` ON FIRST 1 OUTPUT Lhs.2, Rhs.1, Rhs.2, Rhs.3, Rhs.4, Rhs.5, Lhs.1
        470   ~0%    {7}    | JOIN WITH `ControlFlowGraph::ControlFlow::ConditionGuardNode.dominates/1#115c031e` ON FIRST 1 OUTPUT Lhs.5, Rhs.1, Lhs.1, Lhs.2, Lhs.3, Lhs.4, Lhs.6
         27   ~0%    {6}    | JOIN WITH `DataFlowUtil::Node.getBasicBlock/0#dispred#5a814384` ON FIRST 2 OUTPUT Lhs.3, Lhs.5, Lhs.0, Lhs.2, Lhs.4, Lhs.6
                 
          2   ~0%    {5} r4 = JOIN r3 WITH `DataFlowUtil::onlyPossibleReturnOfNonNil/3#dbc218f1` ON FIRST 3 OUTPUT _, Lhs.3, Lhs.4, Lhs.1, Lhs.5
          2   ~0%    {5}    | REWRITE WITH Out.0 := false
          2   ~0%    {5}    | JOIN WITH num#Properties::IsNil#e058e637 ON FIRST 1 OUTPUT Lhs.4, Lhs.1, Lhs.2, Lhs.3, Rhs.1
                 
          0   ~0%    {5} r5 = JOIN r3 WITH `DataFlowUtil::onlyPossibleReturnOfNil/3#6fe7eb3d` ON FIRST 3 OUTPUT _, Lhs.3, Lhs.4, Lhs.1, Lhs.5
          0   ~0%    {5}    | REWRITE WITH Out.0 := true
          0   ~0%    {5}    | JOIN WITH num#Properties::IsNil#e058e637 ON FIRST 1 OUTPUT Lhs.4, Lhs.1, Lhs.2, Lhs.3, Rhs.1
                 
          0   ~0%    {5} r6 = JOIN r3 WITH `DataFlowUtil::onlyPossibleReturnOfBool/4#f6e4bd83` ON FIRST 3 OUTPUT Rhs.3, Lhs.3, Lhs.4, Lhs.1, Lhs.5
          0   ~0%    {5}    | JOIN WITH num#Properties::IsBoolean#d0c3641f ON FIRST 1 OUTPUT Lhs.4, Lhs.1, Lhs.2, Lhs.3, Rhs.1
                 
          0   ~0%    {8} r7 = JOIN r1 WITH `Scopes::Function.getACall/0#ab99e5da` ON FIRST 2 OUTPUT Lhs.8, Lhs.5, Lhs.2, Lhs.3, Lhs.4, Lhs.1, Lhs.6, Lhs.7
          0   ~0%    {8}    | JOIN WITH `Properties::Property.checkOn/3#dispred#43ad13f2` ON FIRST 2 OUTPUT Lhs.7, Lhs.5, Rhs.3, Lhs.2, Lhs.3, Lhs.4, Lhs.6, Rhs.2
          0   ~0%    {5}    | JOIN WITH `FunctionInputsAndOutputs::FunctionOutput.getExitNode/1#dispred#4c5ce77c` ON FIRST 3 OUTPUT Lhs.7, Lhs.3, Lhs.4, Lhs.5, Lhs.6
          0   ~0%    {5}    | JOIN WITH num#Properties::IsBoolean#d0c3641f ON FIRST 1 OUTPUT Lhs.4, Lhs.1, Lhs.2, Lhs.3, Rhs.1
                 
          2   ~0%    {5} r8 = r2 UNION r4 UNION r5 UNION r6 UNION r7
          2   ~0%    {5}    | AND NOT `DataFlowUtil::BarrierGuard<TaintTrackingUtil::listOfConstantsComparisonSanitizerGuard>::guardingFunction/5#f413785b#prev`(FIRST 5)
                     return r8

Copy link
Contributor

@smowton smowton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM if DCA shows positive results.

@owen-mc
Copy link
Contributor Author

owen-mc commented Sep 11, 2024

DCA shows nothing at all. Is that acceptible? Or should it be removing the bad join order seen in the nightly DCA run?

@smowton
Copy link
Contributor

smowton commented Sep 12, 2024

How did you notice it in the first place? I recall you said it was flagged in DCA? Can you tell if that signal has gone away?

@owen-mc
Copy link
Contributor Author

owen-mc commented Sep 17, 2024

The DCA results didn't show the bad join order before and no bad join order after, as I'd hoped. Maybe I'm just struggling to run DCA in exactly the right way. I will try to pick this up again at some point.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Go no-change-note-required This PR does not need a change note
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants