Add detail about customization, tree sharing, and decision point scope #242

Merged · 7 commits · Jun 27, 2023
61 changes: 59 additions & 2 deletions doc/md_src_files/060_decision-trees.md
@@ -124,7 +124,15 @@ In our case, we are not attempting to fit a tree to data.
Rather, we are interested in producing usable trees that minimize extraneous effort.
To that end, we briefly examine the qualities for which decision tree measurement is suitable.

Decision tree construction methods must address four significant concerns: feature selection, feature type, overfitting, and parsimony.
### Decision Tree Construction Concerns

Decision tree construction methods must address four significant concerns:
- feature selection
- feature type
- overfitting
- parsimony

#### Feature selection

Feature selection is perhaps the most important consideration for SSVC, because it directly affects the information gathering requirements placed on the analyst attempting to use the tree.
Each decision point in SSVC is a feature.
@@ -136,13 +144,18 @@ If nothing else, this means analysts are spending time gathering evidence to mak
The added details also make it harder for the decision process to accurately manage the risks in question.
This difficulty arises because the more variance and complexity there is in the decision, the greater the possibility of errors in the decision process itself.

#### Feature types

Regarding feature types, all of the features included in SSVC version 2 can be considered ordinal data.
That is, while they can be ordered (e.g., for Exploitation, active is greater than poc, which is greater than none), they cannot be compared via subtraction or division (active - poc = nonsense).
The use of ordinal features is a key assumption behind our use of the parsimony analysis that follows.
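A small Python sketch may make the ordinal-data point concrete. The Exploitation values are from the text; the enum encoding itself is an illustrative assumption.

```python
from enum import Enum

class Exploitation(Enum):
    # The integer values encode only the ordinal ranking: none < poc < active.
    NONE = 1
    POC = 2
    ACTIVE = 3

    def __lt__(self, other):
        if isinstance(other, Exploitation):
            return self.value < other.value
        return NotImplemented

# Ordering is meaningful for ordinal data:
assert Exploitation.NONE < Exploitation.POC < Exploitation.ACTIVE

# Arithmetic is not; plain Enum members deliberately support no subtraction:
try:
    Exploitation.ACTIVE - Exploitation.POC
except TypeError:
    pass  # "active - poc" is nonsense, and Python agrees
```

Using `Enum` rather than `IntEnum` is the point of the sketch: it keeps comparison while rejecting arithmetic, matching the semantics of ordinal data.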

#### Overfitting

When decision trees are used in a machine learning context, overfitting increases tree complexity by incorporating the noise in the training data set into the decision points in a tree.
In our case, our “data” is just the set of outcomes as decided by humans, so overfitting is less of a concern, assuming the feature selection has been done with care.

#### Parsimony

Parsimony is, in essence, Occam's Razor applied to tree selection.
Given the choice between two trees that have identical outputs, one should choose the tree with fewer decisions.
One way to evaluate the parsimony of a tree is by applying the concept of feature importance to ensure that each feature is contributing adequately to the result.
While there are a few ways to compute feature importance, the one we found most useful is permutation importance.
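Permutation importance can be sketched without any machine-learning library: score a tree against its own outcome table, then shuffle one feature's column and measure the accuracy drop. The toy tree, decision points, and values below are hypothetical, not a published SSVC tree; "exposure" is deliberately ignored by the tree so that its importance comes out as zero.

```python
import random
from itertools import product

# Hypothetical lookup-table tree over three illustrative decision points.
def toy_tree(exploitation, automatable, exposure):
    if exploitation == "active":
        return "act"
    if exploitation == "poc" and automatable == "yes":
        return "attend"
    return "defer"

values = {
    "exploitation": ["none", "poc", "active"],
    "automatable": ["no", "yes"],
    "exposure": ["small", "open"],
}

# The "data set" is every combination of decision-point values, labeled by
# the tree itself, so baseline accuracy is exactly 1.0.
rows = [dict(zip(values, combo)) for combo in product(*values.values())]
labels = [toy_tree(**row) for row in rows]

def accuracy_with_permuted(feature, rng):
    # Shuffle one feature's column and re-score the tree against the labels.
    shuffled = [row[feature] for row in rows]
    rng.shuffle(shuffled)
    hits = sum(
        toy_tree(**{**row, feature: v}) == label
        for row, label, v in zip(rows, labels, shuffled)
    )
    return hits / len(rows)

rng = random.Random(0)
for feature in values:
    # Importance = drop in accuracy when this feature's column is permuted.
    print(feature, round(1.0 - accuracy_with_permuted(feature, rng), 2))
```

A feature whose permutation never changes the outcome (here, "exposure") contributes nothing to the result and is a candidate for removal under the parsimony argument.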
@@ -174,9 +187,53 @@ Thus, 60 unique combinations of decision values is the point at which a decision
SSVC trees should be identifiable by name and version.
A tree name is simply a short descriptive label for the tree, derived from the stakeholder and/or function the tree is intended for.
Tree versions are expected to share the major and minor version numbers with the SSVC version in which their decision points are defined.
Revisions should increment the patch number.
For example, “Applier Tree v1.1.0” would be the identity of the version of the Applier Tree as published in version 1.1 of SSVC.
“Coordinator Publish Tree v2.0.3” would be the identity of a future revision of the Coordinator Publish Tree as described in this document.
The terms “major”, “minor”, and “patch” with respect to version numbering are intended to be consistent with [Semantic Versioning 2.0.0](https://semver.org/spec/v2.0.0.html).
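The naming convention above is mechanical enough to check in code. A sketch, assuming identities always take the form `<name> v<major>.<minor>.<patch>`:

```python
import re

# Hypothetical parser for the tree-identity convention described above.
IDENTITY = re.compile(r"^(?P<name>.+) v(?P<major>\d+)\.(?P<minor>\d+)\.(?P<patch>\d+)$")

def parse_tree_identity(s):
    """Split a tree identity into its name and a semver (major, minor, patch) tuple."""
    m = IDENTITY.match(s)
    if m is None:
        raise ValueError(f"not a valid tree identity: {s!r}")
    return m["name"], (int(m["major"]), int(m["minor"]), int(m["patch"]))

name, version = parse_tree_identity("Applier Tree v1.1.0")
# name is "Applier Tree"; version is (1, 1, 0), whose first two components
# should match the SSVC version defining the tree's decision points.
```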

### Sharing Trees With Others

Communities of shared interest may desire to share information about decision points or even create custom trees to share within their community.
Examples include:
- An Information Sharing and Analysis Organization (ISAO) within a critical infrastructure sector might want to define a custom decision point relevant to their constituents' regulatory compliance.
- A corporate Computer Security Incident Response Team (CSIRT) might choose to adjust decision priorities for an existing tree for use by its subsidiaries.
- A government department might define a separate tree using existing decision points to address a particular governance process within their constituent agencies.
- A regional coordinator might want to produce decision point information as a product of its threat analysis work and provide this information to its constituency in an advisory.

In these and other scenarios, there are two scopes to consider:
1. Decision Point Scope
2. Decision Tree Scope

#### Decision Point Scope

Each decision point defined in this document has a characteristic scope, either *global* or *local*.

- *Globally scoped decision points* describe the state of the world outside the decision maker's environment.
They form the background context in which the stakeholder is making prioritization decisions.
Nearly all stakeholders should agree on the assignment of specific values to these decision points.
- *Locally scoped decision points* are expected to be contextual to some decision makers.
Mission Impact is one such example.
Information about a locally scoped decision point can still be inherited by others using the same tree.
For example, in the corporate CSIRT scenario above, the System Exposure value might be consistent across all subsidiaries for a centrally managed service.
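One way to picture the distinction is to carry scope as metadata on each decision point. In this minimal sketch, the class fields and the value lists shown are illustrative assumptions, not definitions from the specification:

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class DecisionPoint:
    """Hypothetical decision-point metadata carrying its characteristic scope."""
    name: str
    values: Tuple[str, ...]
    scope: str  # "global" or "local", the two scopes defined in the text

    def __post_init__(self):
        if self.scope not in ("global", "local"):
            raise ValueError(f"unknown scope: {self.scope}")

# Nearly all stakeholders should agree on globally scoped values:
exploitation = DecisionPoint("Exploitation", ("none", "poc", "active"), scope="global")

# Locally scoped values are contextual to some decision makers
# (value list abbreviated for illustration):
mission_impact = DecisionPoint("Mission Impact", ("degraded", "mission failure"), scope="local")
```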

#### Decision Tree Scope

Two kinds of modifications are possible at the decision tree level.

- A *Risk Appetite Shift* retains the structure of an existing tree and all its decision points, and simply adjusts the decision outputs according to the stakeholder's risk appetite.
For example, an organization with sufficient resources to efficiently deploy fixes might choose to defer fewer cases than the default tree would recommend.
- *Tree Customization* can be done in one of three ways:
1. incorporating an already-defined decision point into an existing tree that does not already contain it.
2. defining a new decision point and adding it to an existing tree.
Note that adding or removing an option from an existing decision point should be treated as creating a new decision point.
The new decision point should be given a distinct name as well.
3. defining a new tree entirely from existing or new decision points.

Because tree customization changes the tree structure and implies the addition or removal of leaf nodes, it will be necessary for the organization to review the decision outputs in light of its risk appetite as well.
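Both kinds of modification can be illustrated on a miniature tree represented as a lookup table. The decision points, values, and outcomes below are hypothetical, not a published SSVC tree: a risk appetite shift remaps outputs only, while customization (here, incorporating an additional decision point) changes the set of leaves and so forces a review of their outputs.

```python
# Hypothetical miniature tree: (exploitation, exposure) -> outcome.
base_tree = {
    ("none", "small"): "defer",
    ("none", "open"): "defer",
    ("poc", "small"): "defer",
    ("poc", "open"): "attend",
    ("active", "small"): "attend",
    ("active", "open"): "act",
}

def risk_shift(tree):
    # A Risk Appetite Shift keeps the structure and remaps only the outputs.
    # Here a lower risk tolerance promotes "defer" on open systems to "attend".
    return {
        k: ("attend" if v == "defer" and k[1] == "open" else v)
        for k, v in tree.items()
    }

def add_decision_point(tree, new_values):
    # Tree Customization changes the structure: incorporating another decision
    # point multiplies the leaf count, so every new leaf needs review.
    return {k + (v,): outcome for k, outcome in tree.items() for v in new_values}

shifted = risk_shift(base_tree)
customized = add_decision_point(base_tree, ("no", "yes"))
assert len(customized) == 2 * len(base_tree)
```

Note that `add_decision_point` merely copies the old outcome to each new leaf; per the text, those copied outcomes are exactly what the organization must then revisit against its risk appetite.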

Risk-shifted or customized trees can be shared among a community of interest, of course.
Further customization by each stakeholder remains an option as well, although there is likely a diminishing return on more than a few layers of customization for the same basic decision.
Of course, SSVC users might choose to construct other trees to inform other decisions.

## Guidance for Evidence Gathering

To answer each of these decision points, a stakeholder should, as much as possible, have a repeatable evidence collection and evaluation process. However, we are proposing decisions for humans to make, so evidence collection and evaluation is not totally automatable. That caveat notwithstanding, some automation is possible.

For example, whether exploitation modules are available in ExploitDB, Metasploit, or other sources is straightforward.
We hypothesize that searching Github and Pastebin for exploit code can be captured in a script.
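As a sketch of that hypothesized script, the query-building step is the easily automatable part; actually fetching results needs network access and, for GitHub's code search API, an authentication token, so that step is omitted here. The query terms are illustrative.

```python
from urllib.parse import urlencode

def github_code_search_url(cve_id):
    """Build a GitHub REST code-search URL for possible exploit code for a CVE."""
    # Query terms are an illustrative assumption; real searches would need tuning.
    params = urlencode({"q": f"{cve_id} exploit", "per_page": 10})
    return f"https://api.github.com/search/code?{params}"

url = github_code_search_url("CVE-2021-44228")
# An evidence-gathering script would fetch this URL (authenticated) and record
# whether any hits look like working exploit code.
```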