Skip to content

Commit

Permalink
Merge branch 'backport/NET-1594/trivially-giving-clam' of ssh://githu…
Browse files Browse the repository at this point in the history
…b.com/hashicorp/consul into backport/NET-1594/trivially-giving-clam
  • Loading branch information
absolutelightning committed Sep 4, 2023
2 parents 36213ab + 7a2746d commit 304cb82
Show file tree
Hide file tree
Showing 10 changed files with 60 additions and 14 deletions.
3 changes: 3 additions & 0 deletions .changelog/18636.txt
Original file line number Diff line number Diff line change
@@ -0,0 +1,3 @@
```release-note:bug
connect: Fix issue where Envoy endpoints would not populate correctly after a snapshot restore.
```
8 changes: 8 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
@@ -1,5 +1,9 @@
## 1.16.1 (August 8, 2023)

KNOWN ISSUES:

* connect: Consul versions 1.16.0 and 1.16.1 may have issues when a snapshot restore is performed and the servers are hosting xDS streams. When this bug triggers, it will cause Envoy to incorrectly populate upstream endpoints. This bug only impacts agent-less service mesh and should be fixed in Consul 1.16.2 by [GH-18636](https://github.com/hashicorp/consul/pull/18636).

SECURITY:

* Update `golang.org/x/net` to v0.13.0 to address [CVE-2023-3978](https://nvd.nist.gov/vuln/detail/CVE-2023-3978). [[GH-18358](https://github.com/hashicorp/consul/issues/18358)]
Expand Down Expand Up @@ -59,6 +63,10 @@ https://github.com/rboyer/safeio/pull/3 [[GH-18302](https://github.com/hashicorp

## 1.16.0 (June 26, 2023)

KNOWN ISSUES:

* connect: Consul versions 1.16.0 and 1.16.1 may have issues when a snapshot restore is performed and the servers are hosting xDS streams. When this bug triggers, it will cause Envoy to incorrectly populate upstream endpoints. This bug only impacts agent-less service mesh and should be fixed in Consul 1.16.2 by [GH-18636](https://github.com/hashicorp/consul/pull/18636).

BREAKING CHANGES:

* api: The `/v1/health/connect/` and `/v1/health/ingress/` endpoints now immediately return 403 "Permission Denied" errors whenever a token with insufficient `service:read` permissions is provided. Prior to this change, the endpoints returned a success code with an empty result list when a token with insufficient permissions was provided. [[GH-17424](https://github.com/hashicorp/consul/issues/17424)]
Expand Down
2 changes: 1 addition & 1 deletion agent/agent.go
Original file line number Diff line number Diff line change
Expand Up @@ -4568,7 +4568,7 @@ func (a *Agent) proxyDataSources() proxycfg.DataSources {
// interact with ACLs and the streaming backend. See comments in `proxycfgglue.ServerHealthBlocking`
// for more details.
// sources.Health = proxycfgglue.ServerHealth(deps, proxycfgglue.ClientHealth(a.rpcClientHealth))
sources.Health = proxycfgglue.ServerHealthBlocking(deps, proxycfgglue.ClientHealth(a.rpcClientHealth), server.FSM().State())
sources.Health = proxycfgglue.ServerHealthBlocking(deps, proxycfgglue.ClientHealth(a.rpcClientHealth))
sources.HTTPChecks = proxycfgglue.ServerHTTPChecks(deps, a.config.NodeName, proxycfgglue.CacheHTTPChecks(a.cache), a.State)
sources.Intentions = proxycfgglue.ServerIntentions(deps)
sources.IntentionUpstreams = proxycfgglue.ServerIntentionUpstreams(deps)
Expand Down
3 changes: 3 additions & 0 deletions agent/proxycfg-glue/glue.go
Original file line number Diff line number Diff line change
Expand Up @@ -52,6 +52,9 @@ type Store interface {
PeeringTrustBundleList(ws memdb.WatchSet, entMeta acl.EnterpriseMeta) (uint64, []*pbpeering.PeeringTrustBundle, error)
TrustBundleListByService(ws memdb.WatchSet, service, dc string, entMeta acl.EnterpriseMeta) (uint64, []*pbpeering.PeeringTrustBundle, error)
VirtualIPsForAllImportedServices(ws memdb.WatchSet, entMeta acl.EnterpriseMeta) (uint64, []state.ServiceVirtualIP, error)
CheckConnectServiceNodes(ws memdb.WatchSet, serviceName string, entMeta *acl.EnterpriseMeta, peerName string) (uint64, structs.CheckServiceNodes, error)
CheckIngressServiceNodes(ws memdb.WatchSet, serviceName string, entMeta *acl.EnterpriseMeta) (uint64, structs.CheckServiceNodes, error)
CheckServiceNodes(ws memdb.WatchSet, serviceName string, entMeta *acl.EnterpriseMeta, peerName string) (uint64, structs.CheckServiceNodes, error)
}

// CacheCARoots satisfies the proxycfg.CARoots interface by sourcing data from
Expand Down
29 changes: 20 additions & 9 deletions agent/proxycfg-glue/health_blocking.go
Original file line number Diff line number Diff line change
Expand Up @@ -12,7 +12,6 @@ import (
"github.com/hashicorp/go-memdb"

"github.com/hashicorp/consul/acl"
"github.com/hashicorp/consul/agent/consul/state"
"github.com/hashicorp/consul/agent/consul/watch"
"github.com/hashicorp/consul/agent/proxycfg"
"github.com/hashicorp/consul/agent/structs"
Expand All @@ -39,14 +38,13 @@ import (
// because proxycfg has a `req.Source.Node != ""` which prevents the `streamingEnabled` check from passing.
// This means that while agents should technically have this same issue, they don't experience it with mesh health
// watches.
func ServerHealthBlocking(deps ServerDataSourceDeps, remoteSource proxycfg.Health, state *state.Store) *serverHealthBlocking {
return &serverHealthBlocking{deps, remoteSource, state, 5 * time.Minute}
func ServerHealthBlocking(deps ServerDataSourceDeps, remoteSource proxycfg.Health) *serverHealthBlocking {
return &serverHealthBlocking{deps, remoteSource, 5 * time.Minute}
}

type serverHealthBlocking struct {
deps ServerDataSourceDeps
remoteSource proxycfg.Health
state *state.Store
watchTimeout time.Duration
}

Expand All @@ -66,7 +64,7 @@ func (h *serverHealthBlocking) Notify(ctx context.Context, args *structs.Service
}

// Determine the function we'll call
var f func(memdb.WatchSet, *state.Store, *structs.ServiceSpecificRequest) (uint64, structs.CheckServiceNodes, error)
var f func(memdb.WatchSet, Store, *structs.ServiceSpecificRequest) (uint64, structs.CheckServiceNodes, error)
switch {
case args.Connect:
f = serviceNodesConnect
Expand Down Expand Up @@ -108,14 +106,20 @@ func (h *serverHealthBlocking) Notify(ctx context.Context, args *structs.Service
// their data, rather than holding onto the last-known list of healthy nodes indefinitely.
if hadResults {
hadResults = false
h.deps.Logger.Debug("serverHealthBlocking emitting zero check-service-nodes due to insufficient ACL privileges",
"serviceName", structs.NewServiceName(args.ServiceName, &args.EnterpriseMeta),
"correlationID", correlationID,
"connect", args.Connect,
"ingress", args.Ingress,
)
return 0, &structs.IndexedCheckServiceNodes{}, watch.ErrorACLResetData
}
return 0, nil, acl.ErrPermissionDenied
}
}

var thisReply structs.IndexedCheckServiceNodes
thisReply.Index, thisReply.Nodes, err = f(ws, h.state, args)
thisReply.Index, thisReply.Nodes, err = f(ws, store, args)
if err != nil {
return 0, nil, err
}
Expand All @@ -134,6 +138,13 @@ func (h *serverHealthBlocking) Notify(ctx context.Context, args *structs.Service
}

hadResults = true
h.deps.Logger.Trace("serverHealthBlocking emitting check-service-nodes",
"serviceName", structs.NewServiceName(args.ServiceName, &args.EnterpriseMeta),
"correlationID", correlationID,
"connect", args.Connect,
"ingress", args.Ingress,
"nodes", len(thisReply.Nodes),
)
return thisReply.Index, &thisReply, nil
},
dispatchBlockingQueryUpdate[*structs.IndexedCheckServiceNodes](ch),
Expand All @@ -151,14 +162,14 @@ func (h *serverHealthBlocking) filterACL(authz *acl.AuthorizerContext, token str
return nil
}

func serviceNodesConnect(ws memdb.WatchSet, s *state.Store, args *structs.ServiceSpecificRequest) (uint64, structs.CheckServiceNodes, error) {
func serviceNodesConnect(ws memdb.WatchSet, s Store, args *structs.ServiceSpecificRequest) (uint64, structs.CheckServiceNodes, error) {
return s.CheckConnectServiceNodes(ws, args.ServiceName, &args.EnterpriseMeta, args.PeerName)
}

func serviceNodesIngress(ws memdb.WatchSet, s *state.Store, args *structs.ServiceSpecificRequest) (uint64, structs.CheckServiceNodes, error) {
func serviceNodesIngress(ws memdb.WatchSet, s Store, args *structs.ServiceSpecificRequest) (uint64, structs.CheckServiceNodes, error) {
return s.CheckIngressServiceNodes(ws, args.ServiceName, &args.EnterpriseMeta)
}

func serviceNodesDefault(ws memdb.WatchSet, s *state.Store, args *structs.ServiceSpecificRequest) (uint64, structs.CheckServiceNodes, error) {
func serviceNodesDefault(ws memdb.WatchSet, s Store, args *structs.ServiceSpecificRequest) (uint64, structs.CheckServiceNodes, error) {
return s.CheckServiceNodes(ws, args.ServiceName, &args.EnterpriseMeta, args.PeerName)
}
5 changes: 2 additions & 3 deletions agent/proxycfg-glue/health_blocking_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -27,8 +27,7 @@ func TestServerHealthBlocking(t *testing.T) {
remoteSource := newMockHealth(t)
remoteSource.On("Notify", ctx, req, correlationID, ch).Return(result)

store := state.NewStateStore(nil)
dataSource := ServerHealthBlocking(ServerDataSourceDeps{Datacenter: "dc1"}, remoteSource, store)
dataSource := ServerHealthBlocking(ServerDataSourceDeps{Datacenter: "dc1"}, remoteSource)
err := dataSource.Notify(ctx, req, correlationID, ch)
require.Equal(t, result, err)
})
Expand All @@ -49,7 +48,7 @@ func TestServerHealthBlocking(t *testing.T) {
Datacenter: datacenter,
ACLResolver: aclResolver,
Logger: testutil.Logger(t),
}, nil, store)
}, nil)
dataSource.watchTimeout = 1 * time.Second

// Watch for all events
Expand Down
4 changes: 4 additions & 0 deletions agent/proxycfg/upstreams.go
Original file line number Diff line number Diff line change
Expand Up @@ -136,6 +136,10 @@ func (s *handlerUpstreams) handleUpdateUpstreams(ctx context.Context, u UpdateEv

uid := UpstreamIDFromString(uidString)

s.logger.Debug("upstream-target watch fired",
"correlationID", correlationID,
"nodes", len(resp.Nodes),
)
if _, ok := upstreamsSnapshot.WatchedUpstreamEndpoints[uid]; !ok {
upstreamsSnapshot.WatchedUpstreamEndpoints[uid] = make(map[string]structs.CheckServiceNodes)
}
Expand Down
4 changes: 3 additions & 1 deletion agent/xds/endpoints.go
Original file line number Diff line number Diff line change
Expand Up @@ -723,14 +723,15 @@ func (s *ResourceGenerator) endpointsFromDiscoveryChain(
}
switch len(groupedTarget.Targets) {
case 0:
s.Logger.Trace("skipping endpoint generation for zero-length target group", "cluster", clusterName)
continue
case 1:
// We expect one target so this passes through to continue setting the load assignment up.
default:
return nil, fmt.Errorf("cannot have more than one target")
}
ti := groupedTarget.Targets[0]
s.Logger.Debug("generating endpoints for", "cluster", clusterName, "targetID", ti.TargetID)
s.Logger.Trace("generating endpoints for", "cluster", clusterName, "targetID", ti.TargetID, "gatewayKey", gatewayKey)
targetUID := proxycfg.NewUpstreamIDFromTargetID(ti.TargetID)
if targetUID.Peer != "" {
loadAssignment, err := s.makeUpstreamLoadAssignmentForPeerService(cfgSnap, clusterName, targetUID, mgwMode)
Expand All @@ -752,6 +753,7 @@ func (s *ResourceGenerator) endpointsFromDiscoveryChain(
forMeshGateway,
)
if !valid {
s.Logger.Trace("skipping endpoint generation for invalid target group", "cluster", clusterName)
continue // skip the cluster if we're still populating the snapshot
}

Expand Down
10 changes: 10 additions & 0 deletions website/content/docs/release-notes/consul/v1_16_x.mdx
Original file line number Diff line number Diff line change
Expand Up @@ -64,6 +64,16 @@ We are pleased to announce the following Consul updates.

For more detailed information, please refer to the [upgrade details page](/consul/docs/upgrading/upgrade-specific) and the changelogs.

## Known Issues

The following issues are known to exist in the v1.16.x releases:

- v1.16.0 - v1.16.1 may have issues when a snapshot restore is performed
and the servers are hosting xDS streams. When this bug triggers, it
will cause Envoy to incorrectly populate upstream endpoints. It is
currently not recommended for service mesh users running agent-less
workloads to upgrade Consul to these versions.

## Changelogs

The changelogs for this major release version and any maintenance versions are listed below.
Expand Down
6 changes: 6 additions & 0 deletions website/content/docs/upgrading/upgrade-specific.mdx
Original file line number Diff line number Diff line change
Expand Up @@ -16,6 +16,12 @@ upgrade flow.

## Consul 1.16.x

#### Known issues

Service mesh in Consul versions 1.16.0 and 1.16.1 may have issues when a snapshot restore is performed and the servers are hosting xDS streams.
When this bug triggers, it will cause Envoy to incorrectly populate upstream endpoints. Due to this issue, it is currently not recommended for
service mesh users running agent-less workloads to upgrade Consul to these versions.

#### API health endpoints return different status code

Consul versions 1.16.0+ now return an error 403 "Permission denied" status
Expand Down

0 comments on commit 304cb82

Please sign in to comment.