ROC API stopped working after node restart #1601

Open
pushpraj527 opened this issue Nov 11, 2022 · 7 comments

Comments

@pushpraj527

We have deployed Aether on a GKE cluster, with a persistent volume enabled for onos-consensus-store.
After the GKE cluster restarted (following updates), we observed the following:

The ROC API no longer seems able to add or update resources.
E.g. the Add/Delete subscriber API returns success, but it does not actually perform the operation.
It looks like an onos/atomix store issue.

We are using the following versions for the components:
Atomix-Controller:
  ChartVersion: 0.6.9
  AppVersion: v0.6.2
Atomix-Raft-Storage:
  ChartVersion: 0.1.25
  AppVersion: v0.9.19
ONOS-Operator:
  ChartVersion: 0.5.1
  AppVersion: v0.5.0
ONOS-Config:
  ChartVersion: 1.6.11
  AppVersion: v0.10.27
ONOS-topo:
  ChartVersion: 1.2.2
  AppVersion: v0.9.3
ONOS-Cli:
  ChartVersion: 1.2.7
  AppVersion: v0.9.10

Logs:
-> One transaction is stuck reporting the following error:

        2022-11-09T12:27:52.616Z    ERROR    controller/configuration    configuration/controller.go:219    Failed sending SetRequest prefix:{target:"connectivity-service-v2"...
(EXTRA *errors.TypedError=error in creating config struct from IETF JSON data: field name SimCard value sim-312440000001123 
(string ptr) schema path /device/enterprises/enterprise/site/device/sim-card has leafref path ../../sim-card/sim-id not equal to any target nodes)
github.com/onosproject/onos-config/pkg/controller/configuration.(*Reconciler).reconcileConfiguration
    /build/pkg/controller/configuration/controller.go:219
github.com/onosproject/onos-config/pkg/controller/configuration.(*Reconciler).Reconcile
    /build/pkg/controller/configuration/controller.go:76
github.com/onosproject/onos-lib-go/pkg/controller.(*Controller).reconcileRequest
    /build/vendor/github.com/onosproject/onos-lib-go/pkg/controller/controller.go:282
github.com/onosproject/onos-lib-go/pkg/controller.(*Controller).reconcileRequests
    /build/vendor/github.com/onosproject/onos-lib-go/pkg/controller/controller.go:275

The controller is still retrying this transaction. We fetched the SIM object for the specified ID, and its structure looks fine. We are not sure whether this is a persistent volume corruption issue. Either way, the retries should stop at some point, yet the same operation is reattempted continuously.
This is blocking any new transactions.
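For readers unfamiliar with leafref errors: the message above says that the device's sim-card leaf must match the sim-id of some sim-card entry reachable through `../../sim-card/sim-id`, i.e. under the same site. A SIM object that looks well-formed on its own can still fail this check if it is not present where the reference points. The sketch below is a toy illustration of that check; the dictionary layout and the key `device-id` are assumptions made for the example, not the actual Aether model or the sdcore-adapter code.

```python
# Toy model of a leafref check; the dict layout is hypothetical, not the Aether schema.

def dangling_sim_refs(site):
    """Return (device_id, sim_id) pairs whose sim-card points at no existing sim-id."""
    known_sim_ids = {sim["sim-id"] for sim in site.get("sim-card", [])}
    return [
        (dev["device-id"], dev["sim-card"])
        for dev in site.get("device", [])
        if dev.get("sim-card") is not None and dev["sim-card"] not in known_sim_ids
    ]


site = {
    "sim-card": [{"sim-id": "sim-1"}],
    "device": [
        {"device-id": "dev-a", "sim-card": "sim-1"},
        {"device-id": "dev-b", "sim-card": "sim-312440000001123"},
    ],
}

for device_id, sim_id in dangling_sim_refs(site):
    print(f"device {device_id} references missing sim-card {sim_id}")
```

Running a check of this kind against the stored configuration might help tell a dangling reference apart from storage corruption.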

onos-consensus-store config override:

global:
  storage:
    controller: "atomix-controller.kube-system:5679"
  store:
    consensus:
      enabled: true
      clusters: 1
      replicas: 1
      partitions: 1
      persistence:
        storageClass: "silver"
        storageSize: 5Gi

onos-config override:

onos-config:
  logging:
    loggers:
      root:
        level: debug
  store:
    consensus:
      enabled: false
@SeanCondon
Contributor

hi @pushpraj527 - the error you're seeing, `error in creating config struct from IETF JSON data`, is I would think coming up from the sdcore-adapter. If it has got this far, onos-config must have validated it. Has something changed in the adapter?

Also, what version of the aether-roc-umbrella chart are you using, and what values are you overriding?

@pushpraj527
Author

pushpraj527 commented Nov 14, 2022

Hi @SeanCondon
We are using aether-roc-umbrella chart version: 2.0.47.
Yes, there are some modeling changes in the sd-core adapter. The system was in a working state. I am not sure whether the exception we are seeing is related, but I do not understand why the API stopped working. The POST API should save the data to onos-consensus-store. Could this be related to the persistent volume?
roc-values-yaml.txt

@adibrastegarnia
Contributor

@pushpraj527
The error you are getting comes from this line: https://github.com/onosproject/onos-config/blob/master/pkg/controller/configuration/controller.go#L219, which sends the Set request for the whole config tree (all of the path values that have been Set so far). As Sean mentioned, the error is coming from the target. The current configuration should already be in the onos-config store; this part of the code is triggered when the target does not persist its configuration and needs the whole current config applied again. To check your current config tree, use onos-cli and run the `onos config get configurations -v` command.
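To make the "whole config tree" replay concrete, here is a toy model in Python. It is not onos-config's actual code: the class `RejectingTarget`, the functions `current_tree` and `reconcile`, the path strings, and the `max_attempts` cap are all made up for illustration. It folds every applied transaction into one path-to-value map and replays that map to a target. Because each attempt sends identical content, a target that rejects it on validation will reject it every time, so retrying alone cannot clear the error.

```python
# Toy model only: names and structure are hypothetical, not onos-config's actual code.

class RejectingTarget:
    """Stands in for a target that validates the full tree and rejects it."""

    def set(self, path_values):
        raise ValueError("leafref not equal to any target nodes")


def current_tree(transactions):
    """Fold every applied transaction into one path -> value map (later writes win)."""
    tree = {}
    for changes in transactions:
        for path, value in changes.items():
            if value is None:
                tree.pop(path, None)  # None models a delete
            else:
                tree[path] = value
    return tree


def reconcile(target, transactions, max_attempts=3):
    """Replay the whole tree; return the number of failed attempts."""
    tree = current_tree(transactions)
    failures = 0
    for _ in range(max_attempts):
        try:
            target.set(tree)
            break
        except ValueError:
            failures += 1  # same input, same rejection
    return failures


transactions = [
    {"/site/sim-card[sim-id=sim-1]": "sim-1"},
    {"/site/device[device-id=dev-a]/sim-card": "sim-2"},
]
print(reconcile(RejectingTarget(), transactions))
```

The cap on attempts exists only so the example terminates; the logs in this issue suggest the real controller keeps retrying.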

@pushpraj527
Author

@adibrastegarnia
Running the given command returns the error below:

onos-cli-5d448ff6c4-fjfwh:~$ onos config get configurations --auth-header "<auth token>" -v
Unable to read configuration: rpc error: code = Internal desc = grpc: failed to unmarshal the received message proto: wrong wireType = 2 for field Termrpc error: code = Internal desc = grpc: failed to unmarshal the received message proto: wrong wireType = 2 for field Term

onos-cli-5d448ff6c4-fjfwh:~$ 

image

@adibrastegarnia
Contributor

@pushpraj527
Could you also list the transactions for me with `onos config get transactions -v`? Please also try getting the configuration without passing `--auth-header`, to check whether you see the same error.

@pushpraj527
Author

Getting configuration without --auth-header
image

Getting onos-config transaction with -v
image

@SeanCondon
Contributor

We found that the message

Unable to read configuration: rpc error: code = Internal desc = grpc: failed to unmarshal the received message proto: wrong wireType = 2 for field Termrpc error: code = Internal desc = grpc: failed to unmarshal the received message proto: wrong wireType = 2 for field Term

was happening in the onos-cli because of an incompatible version v0.9.10 - changing to version v0.9.11 removes this problem. This was on aether-roc-umbrella 2.0.3
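As background on why a version mismatch produces this message: every protobuf field on the wire starts with a tag that packs the field number and a wire type (0 = varint, 2 = length-delimited, and so on). If the client and server were built from different versions of the same message definition, the reader can find a wire type it does not expect for a field and refuse to decode it. The sketch below illustrates this with a single-byte tag (valid for field numbers 1 to 15). The field number 3 and the types chosen for each side are assumptions for the example; the issue does not say how `Term` is actually defined in either version.

```python
# Sketch of protobuf tag decoding; field number 3 for "Term" is a made-up example.

WIRE_TYPES = {0: "varint", 1: "64-bit", 2: "length-delimited", 5: "32-bit"}


def decode_tag(tag_byte):
    """Split a single-byte protobuf tag into (field_number, wire_type)."""
    return tag_byte >> 3, tag_byte & 0x07


def check_field(tag_byte, expected_wire_type):
    """Raise if the wire type on the wire differs from what the reader's schema expects."""
    field_number, wire_type = decode_tag(tag_byte)
    if wire_type != expected_wire_type:
        raise ValueError(f"wrong wireType = {wire_type} for field {field_number}")
    return field_number


# Writer's schema (assumed): field 3 is length-delimited, so tag = (3 << 3) | 2.
tag = (3 << 3) | 2
print(decode_tag(tag), WIRE_TYPES[decode_tag(tag)[1]])

# Reader's schema (assumed): field 3 is an integer (wire type 0), so decoding fails.
try:
    check_field(tag, expected_wire_type=0)
except ValueError as err:
    print(err)
```

This is why aligning the onos-cli version with the server side makes the error disappear without any change to the stored data.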
