Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add EnableLogging and LogLabel supports for Node NetworkPolicy #6626

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

hongliangl
Copy link
Contributor

@hongliangl hongliangl commented Aug 23, 2024

For #6525

This commit introduces limited support for traffic logging in Node
NetworkPolicy. The limitations are:

  • Traffic logs are written only to the system log (not managed by Antrea).
    Users can filter logs using syslog filters.
  • The LogLabel for Node NetworkPolicy is restricted to a maximum of
    12 characters.

Node NetworkPolicy's data path is implemented via iptables. An iptables
"non-terminating target" LOG is added before the final matching rule to
log packets to the system kernel log. The logs provide packet match details,
such as:

Sep  2 10:31:07 k8s-node-control-plane kernel: [6657320.789675] Antrea:I:Allow:allow-http:IN=ens224 OUT= MAC=00:50:56:a7:fb:18:00:50:56:a7:23:47:08:00 SRC=10.10.0.10 DST=192.168.240.200 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=52813 DF PROTO=TCP SPT=57658 DPT=80 WINDOW=64240 RES=0x00 SYN URGP=0
Sep  2 10:31:11 k8s-node-control-plane kernel: [6657324.899219] Antrea:I:Drop:default-drop:IN=ens224 OUT= MAC=00:50:56:a7:fb:18:00:50:56:a7:23:47:08:00 SRC=192.168.240.201 DST=192.168.240.200 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=27486 DF PROTO=TCP SPT=33152 DPT=80 WINDOW=64240 RES=0x00 SYN URGP=0

The log prefix (e.g., Antrea:I:Allow:allow-http:) is up to 29 characters long
and includes a user-provided log label (up to 12 characters). The log prefix format:

|---1--| |2| |---3--| |----------4--------|
|Antrea|:|I|:|Reject|:|user-provided label|:|
|6     |1|1|1|4-6   |1|1-12               |1|
  • Part 1: Fixed, "Antrea"
  • Part 2: Direction, "I" (In) or "O" (Out)
  • Part 3: Action, "Allow", "Drop", or "Reject"
  • Part 4: User-provided log label, up to 12 characters

@hongliangl hongliangl added the action/release-note Indicates a PR that should be included in release notes. label Aug 23, 2024
@hongliangl hongliangl added this to the Antrea v2.2 release milestone Aug 23, 2024
@hongliangl hongliangl force-pushed the 20240822-nnp-enable-logging branch 2 times, most recently from 82bc18c to 1589c2a Compare September 2, 2024 11:23
@hongliangl hongliangl marked this pull request as ready for review September 2, 2024 11:23
@hongliangl hongliangl force-pushed the 20240822-nnp-enable-logging branch 2 times, most recently from a829204 to 43efbe4 Compare September 3, 2024 03:02
pkg/controller/networkpolicy/validate.go Outdated Show resolved Hide resolved
docs/antrea-node-network-policy.md Outdated Show resolved Hide resolved
docs/antrea-node-network-policy.md Outdated Show resolved Hide resolved
docs/antrea-node-network-policy.md Outdated Show resolved Hide resolved
docs/antrea-node-network-policy.md Outdated Show resolved Hide resolved
docs/antrea-node-network-policy.md Outdated Show resolved Hide resolved
// used as iptables log prefix. According to https://ipset.netfilter.org/iptables-extensions.man.html, the prefix is
// up to 29 letters long.
if len(logLabel) > 28 {
klog.InfoS("The log label exceeds 28 characters and will be truncated", "logLabel", logLabel)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we log this in the validation webhook (instead or in addition)? cc @tnqn for his opinion

Copy link
Contributor Author

@hongliangl hongliangl Sep 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm fine with validating with a webhook.

One more thing, we need to add some basic information about the rule to the logLabel which will be used as iptables log prefix. The worst case is like the following:


| Antrea |:|Out|:|Reject|:| [user-provided label] |:|
| 6      |1|3  |1|6     |1| n                     |1|

As a result, the user-provided log label (n) is up to 10 characters long. Will that be sufficient?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that having a fixed-limit, regardless of the rule direction or the rule action, may be the right approach.
Maybe we could replace "Out" with "O" and "In" with "I". I would give us 2 extra characters (making the limit 12 for the user-provided label), without really compromising readability too much. I was thinking of doing the same for the action, but I think it would hurt readability too much?

12 characters is not much, but we are heavily constrained by the implementation.
I was initially thinking of failing in the validation webhook if the limit is exceeded, but I now feel like it is better to just log a message warning the user. I'd log it once in the webhook (controller) and once in the agent.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd log it once in the webhook (controller) and once in the agent.

Do you mean that something should be shown when a Node NetworkPolicy with a log label over 12 characters?

I made some changes to the validator. If I understood correctly, what I expected is that: when I use kubectl to apply a Node NetworkPolicy with a log label over 12 characters, a warning message will be shown in the terminal. It seems that there is something wrong, after applying such Node NetworkPolicy, nothing is shown in the terminal.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you will need to use the Warnings field of the AdmissionResponse: https://pkg.go.dev/k8s.io/api/admission/v1#AdmissionResponse

I am not sure we have done it in the past. Is this what you tried?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! That's what I need!

// used as iptables log prefix. According to https://ipset.netfilter.org/iptables-extensions.man.html, the prefix is
// up to 29 letters long.
if len(logLabel) > 28 {
klog.InfoS("The log label exceeds 28 characters and will be truncated", "logLabel", logLabel)
Copy link
Contributor

@luolanzone luolanzone Sep 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This log is a little misleading, probably better to include the logLabel with prefix and the original one in Node NP.

Suggested change
klog.InfoS("The log label exceeds 28 characters and will be truncated", "logLabel", logLabel)
klog.InfoS("The log label exceeds 28 characters and will be truncated", "logLabel", logLabel, "rule.LogLabel", rule.LogLabel)

I am wondering will this kind of truncation has any side-affect? this label seems likely to be duplicated in different Node NPs.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this label seems likely to be duplicated in different Node NPs.

Do you mean that different Node NPs that have that same logLabels?

@hongliangl hongliangl force-pushed the 20240822-nnp-enable-logging branch 3 times, most recently from 3dfb4ee to 5df6efe Compare September 10, 2024 03:32
@hongliangl hongliangl force-pushed the 20240822-nnp-enable-logging branch 3 times, most recently from 503637b to fa40171 Compare September 10, 2024 07:34
Copy link
Contributor

@antoninbas antoninbas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good on my side, only minor comments

docs/antrea-node-network-policy.md Outdated Show resolved Hide resolved
Comment on lines 89 to 90
EnableLogging: true,
LogLabel: "ingress1",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just want to make sure that we still have a policy rule for which EnableLogging is false, for the sake of testing?

Copy link
Contributor Author

@hongliangl hongliangl Sep 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You just reminded me. I left one policy with EnableLogging set to false and another policy with EnableLogging set to true but with LogLabel left empty, for the sake of testing.

pkg/controller/networkpolicy/validate.go Outdated Show resolved Hide resolved
@hongliangl hongliangl force-pushed the 20240822-nnp-enable-logging branch 2 times, most recently from 446c1da to 4fef712 Compare September 11, 2024 06:26
antoninbas
antoninbas previously approved these changes Sep 11, 2024
Copy link
Contributor

@antoninbas antoninbas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good on my side
@luolanzone do you want to do another review?

@luolanzone
Copy link
Contributor

Looks good on my side @luolanzone do you want to do another review?

Probably no time this week, I will check with hongliang offline for the question I posted. @antoninbas you can go ahead. Thanks.

@antoninbas
Copy link
Contributor

/test-all

@antoninbas
Copy link
Contributor

@hongliangl I just realized, shouldn't we have an e2e test for this? Without it we only have unit tests and no guarantee that it actually works

@hongliangl
Copy link
Contributor Author

@hongliangl I just realized, shouldn't we have an e2e test for this? Without it we only have unit tests and no guarantee that it actually works

Make sense, I'll add.

Comment on lines 916 to 932
// Before generating some traffic, there should be no corresponding log on hostNetwork Pod x/a.
stdout, _, err := data.RunCommandFromPod(getNS("x"), podXaName, "c80", []string{"dmesg"})
require.NoError(t, err)
require.NotContains(t, stdout, expectedLogPrefix)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this would fail if I wanted to run the test locally multiple times on the same Kind cluster (e.g., because I am troubleshooting something related to Node NPs).

It should be possible to run the test multiple times on the same cluster. One acceptable approach could be to have a random label, and include a comment in the code explaining why the label is random. There may be better solutions though.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have added a random label.

Besides, this test cannot be run on github Kind cluster. In a Kind cluster, each Kubernetes Node runs as a Docker container. The iptables LOG target doesn't generate kernel logs inside containers unless /proc/sys/net/netfilter/nf_log_all_netns is set to 1. However, this setting cannot be modified in GitHub Actions.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's fine to skip that specific test on Kind

@hongliangl hongliangl force-pushed the 20240822-nnp-enable-logging branch 9 times, most recently from ed07d42 to 8eebffc Compare September 25, 2024 02:10
@hongliangl
Copy link
Contributor Author

/test-all-features-conformance

Comment on lines +899 to +904
// In a Kind cluster, each Kubernetes Node runs as a Docker container. The iptables LOG target doesn't generate
// kernel logs inside containers unless /proc/sys/net/netfilter/nf_log_all_netns is set to 1. However, this setting
// cannot be modified in GitHub Actions.
skipIfProviderIs(t, "kind", "The iptables LOG action doesn't generate log in container")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do you think we should add a note about this in the documentation, in case someone is trying to use the feature on a Kind cluster and cannot figure out why it's not working?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make sense. We should mention it in the documentation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added.

test/e2e/nodenetworkpolicy_test.go Outdated Show resolved Hide resolved
test/e2e/nodenetworkpolicy_test.go Outdated Show resolved Hide resolved
Comment on lines 938 to 939
require.Contains(t, stdout, expectedLogPrefix)
require.NotContains(t, stdout, toBeTruncated)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: assert is more appropriate than require here IMO

IMO we should do something better here. We should locate the actual log entry (maybe with a regex) and validate independent fields.

BTW, does the log always show up immediately or should we use Poll to avoid flakes?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have updated this part:

  • Add a regex to match source/destination IP, random log label
  • Use assert.EventuallyWithT to avoid flakes.


// Generate some traffic that will be dropped by acnp-drop-x-to-y-egress.
_, err = k8sUtils.Probe(getNS("x"), "a", getNS("y"), "a", p80, ProtocolTCP, nil, nil)
require.NoError(t, err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure when to use failOnError or require.NoError directly?

Copy link
Contributor Author

@hongliangl hongliangl Sep 27, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure either. I saw that failOnError will delete the test namespace. In this test, I don't think it is necessary to delete the test namespace when getting an error. Maybe require.NoError is sufficient.

@hongliangl hongliangl force-pushed the 20240822-nnp-enable-logging branch 3 times, most recently from dbe52ad to c2b86b9 Compare September 26, 2024 11:57
This commit introduces limited support for traffic logging in Node
NetworkPolicy. The limitations are:

- Traffic logs are written only to the system log (not managed by Antrea).
  Users can filter logs using syslog filters.
- The `LogLabel` for Node NetworkPolicy is restricted to a maximum of
  12 characters.

Node NetworkPolicy's data path is implemented via iptables. An iptables
"non-terminating target" `LOG` is added before the final matching rule to
log packets to the system kernel log. The logs provide packet match details,
such as:

```
Sep  2 10:31:07 k8s-node-control-plane kernel: [6657320.789675] Antrea:I:Allow:allow-http:IN=ens224 OUT= MAC=00:50:56:a7:fb:18:00:50:56:a7:23:47:08:00 SRC=10.10.0.10 DST=192.168.240.200 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=52813 DF PROTO=TCP SPT=57658 DPT=80 WINDOW=64240 RES=0x00 SYN URGP=0
Sep  2 10:31:11 k8s-node-control-plane kernel: [6657324.899219] Antrea:I:Drop:default-drop:IN=ens224 OUT= MAC=00:50:56:a7:fb:18:00:50:56:a7:23:47:08:00 SRC=192.168.240.201 DST=192.168.240.200 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=27486 DF PROTO=TCP SPT=33152 DPT=80 WINDOW=64240 RES=0x00 SYN URGP=0
```

The log prefix (e.g., `Antrea:I:Allow:allow-http:`) is up to 29 characters long
and includes a user-provided log label (up to 12 characters). The log prefix format:

```
|---1--| |2| |---3--| |----------4--------|
|Antrea|:|I|:|Reject|:|user-provided label|:|
|6     |1|1|1|4-6   |1|1-12               |1|
```

- Part 1: Fixed, "Antrea"
- Part 2: Direction, "I" (In) or "O" (Out)
- Part 3: Action, "Allow", "Drop", or "Reject"
- Part 4: User-provided log label, up to 12 characters

Signed-off-by: Hongliang Liu <hongliang.liu@broadcom.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
action/release-note Indicates a PR that should be included in release notes.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants