Unable to install v18.04 on OpenShift 3.9 #120

Closed

kaparora opened this issue Apr 25, 2018 · 17 comments

@kaparora
kaparora commented Apr 25, 2018

We followed the installation steps described at https://netapp-trident.readthedocs.io/en/stable-v18.04/kubernetes/deploying.html#download-extract-the-installer

We started the installation on the master.

[root@se1-ocpma-e100 trident-installer]# cat setup/backend.json 
{ 
    "version": 1, 
    "storageDriverName": "ontap-nas", 
    "managementLIF": "4.168.16.25", 
    "username": "aaa", 
    "password": "xxx", 
    "defaults": { 
      "spaceReserve": "none", 
      "exportPolicy": "openshift" 
    } 
} 
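As a sanity check before running the installer against a config like this, it is worth confirming that the node can actually reach the NFS data LIF (the address below is the one from the backend.json above; showmount and rpcinfo must be available on the node):

# Verify the ONTAP NFS exports are visible from the OpenShift node
showmount -e 4.168.16.25

# Verify the NFS and mountd services are registered on the LIF
rpcinfo -p 4.168.16.25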


[root@se1-ocpma-e100 trident-installer]# ./tridentctl install -n netapp2 -d 
DEBU Initialized logging.                          logLevel=debug 

DEBU Initialized Kubernetes CLI client.            cli=oc flavor=openshift namespace=netapp2 version=1.9. 
DEBU Validated Trident installation environment.   installationNamespace=netapp2 kubernetesVersion=1.9.1+ 
DEBU Parsed requested volume size.                 quantity=2Gi 
DEBU Namespace exists.                             namespace=netapp2 
DEBU PVC does not exist.                           pvc=trident 
DEBU PV does not exist.                            pv=trident 
INFO Starting storage driver.                      backend=/root/trident-installer/setup/backend.json 
DEBU config: {"defaults":{"exportPolicy":"openshift","spaceReserve":"none"},"managementLIF":"4.168.16.25" 
DEBU Storage prefix is absent, will use default prefix. 
DEBU Parsed commonConfig: {Version:1 StorageDriverName:ontap-nas BackendName: Debug:false DebugTraceFlags 
DEBU Initializing storage driver.                  driver=ontap-nas 
DEBU Addresses found from ManagementLIF lookup.    addresses="[4.168.16.25]" hostname=4.168.16.25 
DEBU Using derived SVM.                            SVM=se1-svm-s01 
DEBU ONTAP API version.                            Ontapi=1.110 
WARN Could not determine controller serial numbers. API status: failed, Reason: Unable to find API: syste 
DEBU Configuration defaults                        Encryption=false ExportPolicy=openshift FileSystemTypene SplitOnClone=false StoragePrefix=trident_ UnixPermissions=---rwxrwxrwx 
DEBU Data LIFs                                     dataLIFs="[4.168.16.25]" 
DEBU Found NAS LIFs.                               dataLIFs="[4.168.16.25]" 
DEBU Configured EMS heartbeat.                     intervalHours=24 
DEBU Read storage pools assigned to SVM.           pools="[sdeb_nas_t001_data01 sdeb_nas_t002_data01]" sv 
DEBU Read aggregate attributes.                    aggregate=sdeb_nas_t001_data01 mediaType=hdd 
DEBU Read aggregate attributes.                    aggregate=sdeb_nas_t002_data01 mediaType=hdd 
DEBU Storage driver initialized.                   driver=ontap-nas 
INFO Storage driver loaded.                        driver=ontap-nas 
INFO Starting Trident installation.                namespace=netapp2 
DEBU Deleted Kubernetes object by YAML. 
DEBU Deleted cluster role binding. 
DEBU Deleted Kubernetes object by YAML. 
DEBU Deleted cluster role. 
DEBU Deleted Kubernetes object by YAML. 
DEBU Deleted service account. 
DEBU Removed Trident user from security context constraint. 
DEBU Created Kubernetes object by YAML. 
INFO Created service account. 
DEBU Created Kubernetes object by YAML. 
INFO Created cluster role. 
DEBU Created Kubernetes object by YAML. 
INFO Created cluster role binding. 
INFO Added Trident user to security context constraint. 
DEBU Created Kubernetes object by YAML. 
INFO Created PVC. 
DEBU Attempting volume create.                     size=2147483648 storagePool=sdeb_nas_t001_data01 volCo 
DEBU Created Kubernetes object by YAML. 
INFO Created PV.                                   pv=trident 
INFO Waiting for PVC to be bound.                  pvc=trident 
DEBU PVC not yet bound, waiting.                   increment=619.512855ms pvc=trident 
DEBU PVC not yet bound, waiting.                   increment=676.793322ms pvc=trident 
DEBU PVC not yet bound, waiting.                   increment=1.225961586s pvc=trident 
DEBU Logged EMS message.                           driver=ontap-nas 
DEBU PVC not yet bound, waiting.                   increment=1.328790335s pvc=trident 
DEBU Created Kubernetes object by YAML. 
INFO Created Trident deployment. 
INFO Waiting for Trident pod to start. 
DEBU Trident pod not yet running, waiting.         increment=619.624506ms 
DEBU Trident pod not yet running, waiting.         increment=870.617544ms 
DEBU Trident pod not yet running, waiting.         increment=844.84827ms 
INFO Trident pod started.                          namespace=netapp2 pod=trident-cdd5fc7b4-ls8h4 
INFO Waiting for Trident REST interface. 
DEBU Invoking tunneled command: oc exec trident-cdd5fc7b4-ls8h4 -n netapp2 -c trident-main -- tridentctl 
DEBU REST interface not yet up, waiting.           increment=360.640418ms 
DEBU Invoking tunneled command: oc exec trident-cdd5fc7b4-ls8h4 -n netapp2 -c trident-main -- tridentctl 
DEBU REST interface not yet up, waiting.           increment=877.614503ms 
DEBU Invoking tunneled command: oc exec trident-cdd5fc7b4-ls8h4 -n netapp2 -c trident-main -- tridentctl 
DEBU REST interface not yet up, waiting.           increment=1.520820412s 
DEBU Invoking tunneled command: oc exec trident-cdd5fc7b4-ls8h4 -n netapp2 -c trident-main -- tridentctl 
DEBU REST interface not yet up, waiting.           increment=1.834092202s 
DEBU Invoking tunneled command: oc exec trident-cdd5fc7b4-ls8h4 -n netapp2 -c trident-main -- tridentctl 
DEBU REST interface not yet up, waiting.           increment=3.152914941s 
DEBU Invoking tunneled command: oc exec trident-cdd5fc7b4-ls8h4 -n netapp2 -c trident-main -- tridentctl 
DEBU REST interface not yet up, waiting.           increment=3.145476382s 
DEBU Invoking tunneled command: oc exec trident-cdd5fc7b4-ls8h4 -n netapp2 -c trident-main -- tridentctl 
DEBU REST interface not yet up, waiting.           increment=6.207780768s 
DEBU Invoking tunneled command: oc exec trident-cdd5fc7b4-ls8h4 -n netapp2 -c trident-main -- tridentctl 
DEBU REST interface not yet up, waiting.           increment=5.170037335s 
DEBU Invoking tunneled command: oc exec trident-cdd5fc7b4-ls8h4 -n netapp2 -c trident-main -- tridentctl 
DEBU REST interface not yet up, waiting.           increment=18.007844228s 
DEBU Invoking tunneled command: oc exec trident-cdd5fc7b4-ls8h4 -n netapp2 -c trident-main -- tridentctl 
DEBU REST interface not yet up, waiting.           increment=16.276606311s 
DEBU Invoking tunneled command: oc exec trident-cdd5fc7b4-ls8h4 -n netapp2 -c trident-main -- tridentctl 
DEBU REST interface not yet up, waiting.           increment=34.967432358s 
DEBU Invoking tunneled command: oc exec trident-cdd5fc7b4-ls8h4 -n netapp2 -c trident-main -- tridentctl 
DEBU REST interface not yet up, waiting.           increment=42.703850717s 
DEBU Invoking tunneled command: oc exec trident-cdd5fc7b4-ls8h4 -n netapp2 -c trident-main -- tridentctl 
ERRO Trident REST interface was not available after 120.00 seconds. 
WARN An error occurred during installation, cleaning up. 
DEBU Deleted Kubernetes object by YAML. 
INFO Deleted cluster role binding. 
DEBU Deleted Kubernetes object by YAML. 
INFO Deleted cluster role. 
DEBU Deleted Kubernetes object by YAML. 
INFO Deleted service account. 
INFO Removed Trident user from security context constraint. 
DEBU Deleted Kubernetes object by name.            pvc=trident 
INFO Deleted PVC.                                  pvc=trident 
DEBU Deleted Kubernetes object by name.            pv=trident 
INFO Deleted PV.                                   pv=trident 
FATA Install failed; exit status 1; Error: could not get version. 500 Internal Server Error 
command terminated with exit code 1; use 'tridentctl logs' to learn more 
[root@se1-ocpma-e100 trident-installer]# 

While the container was running, we got the following output inside it:

[root@se1-ocpma-e100 ~]# oc rsh  trident-cdd5fc7b4-ls8h4 
Defaulting container name to trident-main. 
Use 'oc describe pod/trident-cdd5fc7b4-ls8h4 -n netapp2' to see all of the containers in this pod. 
/ # tridentctl -s 127.0.0.1:8000 version -o json 
Error: could not get version. 500 Internal Server Error 
/ # 
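A 500 from the Trident REST endpoint usually means the server behind it is not healthy yet. One way to dig further (pod and namespace names are the ones from this install; trident-main is confirmed above, while the etcd container name is an assumption based on the pod events below) is to read both containers' logs:

# Inspect the main Trident container for startup errors
oc logs trident-cdd5fc7b4-ls8h4 -n netapp2 -c trident-main

# Inspect the embedded etcd container, which Trident depends on
oc logs trident-cdd5fc7b4-ls8h4 -n netapp2 -c etcd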

The events in the namespace are:

[root@se1-ocpma-e100 ~]# oc get ev 
LAST SEEN   FIRST SEEN   COUNT     NAME                                       KIND                    SUB                              MESSAGE 
1m          1m           1         trident-cdd5fc7b4-ls8h4.15284c6cb57af3b1   Pod                        scheduler                     Successfully assigned trident-cdd5fc7b4-ls8h4 to se1-ocpco-e142.sys.schwarz 
1m          1m           1         trident-cdd5fc7b4-ls8h4.15284c6cc51e36b5   Pod                         se1-ocpco-e142.sys.schwarz   MountVolume.SetUp succeeded for volume "trident-token-zx6zx" 
1m          1m           1         trident-cdd5fc7b4-ls8h4.15284c6cc6422647   Pod                         se1-ocpco-e142.sys.schwarz   MountVolume.SetUp succeeded for volume "trident" 
1m          1m           1         trident-cdd5fc7b4-ls8h4.15284c6d154ff9ec   Pod                     spe se1-ocpco-e142.sys.schwarz   Container image "netapp/trident:18.04.0" already present on machine 
1m          1m           1         trident-cdd5fc7b4-ls8h4.15284c6d17e80af1   Pod                     spe se1-ocpco-e142.sys.schwarz   Created container 
1m          1m           1         trident-cdd5fc7b4-ls8h4.15284c6d1dfb02a7   Pod                     spe se1-ocpco-e142.sys.schwarz   Started container 
1m          1m           1         trident-cdd5fc7b4-ls8h4.15284c6d1e1580f3   Pod                     spe se1-ocpco-e142.sys.schwarz   Container image "quay.io/coreos/etcd:v3.1.5" already present on machine 
1m          1m           1         trident-cdd5fc7b4-ls8h4.15284c6d21c24ec8   Pod                     spe se1-ocpco-e142.sys.schwarz   Created container 
1m          1m           1         trident-cdd5fc7b4-ls8h4.15284c6d27f171b5   Pod                     spe se1-ocpco-e142.sys.schwarz   Started container 
1m          1m           1         trident-cdd5fc7b4.15284c6cb4e7e7c1         ReplicaSet                 et-controller                 Created pod: trident-cdd5fc7b4-ls8h4 
1m          1m           1         trident.15284c6a2984d474                   PersistentVolumeClaim      ntvolume-controller           no persistent volumes available for this claim and no storage class is set 
1m          1m           1         trident.15284c6cb37d0c52                   Deployment                 nt-controller                 Scaled up replica set trident-cdd5fc7b4 to 1 
@kaparora
Author

Attached Trident logs: trident-logs-all.log

@jkonline

Having the same issue. Was this ever resolved?

@kaparora
Author

kaparora commented May 18, 2018

Today we got Trident running with the iSCSI (ontap-san) driver.
Everything works fine, from installation to provisioning to mounting and consuming storage.

We then added NFS as a backend to Trident and used it for a MySQL deployment.
Like etcd, MySQL doesn't work with the NFS backend.
Here are the logs:

=> sourcing 20-validate-variables.sh ... 
=> sourcing 25-validate-replication-variables.sh ... 
=> sourcing 30-base-config.sh ... 
---> 08:41:11     Processing basic MySQL configuration files ... 
=> sourcing 60-replication-config.sh ... 
=> sourcing 70-s2i-config.sh ... 
---> 08:41:11     Processing additional arbitrary  MySQL configuration provided by s2i ... 
=> sourcing 40-paas.cnf ... 
=> sourcing 50-my-tuning.cnf ... 
---> 08:41:11     Initializing database ... 
---> 08:41:11     Running mysqld --initialize-insecure ... 
2018-05-18T08:41:11.628403Z 0 [Warning] TIMESTAMP with implicit DEFAULT value is deprecated. Please use --explicit_defaults_for_timestamp server option (see documentation for more details). 
2018-05-18T08:41:11.629989Z 0 [Warning] Duplicate ignore-db-dir directory name 'lost+found' found in the config file(s). Ignoring the duplicate. 
2018-05-18T08:41:11.630674Z 0 [ERROR] --initialize specified but the data directory has files in it. Aborting. 
2018-05-18T08:41:11.630700Z 0 [ERROR] Aborting 

NFS provisioning and mounting is fine.

I also tried mounting the NFS volume on a host (worker node) and writing to it, and that works.

This may have something to do with OpenShift user permissions inside the pod, but I have no clue.
Any input is appreciated.
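One way to test that theory (the MySQL pod name is a placeholder; the mount path is assumed from the default template, and the openshift.io/scc annotation is standard in OpenShift 3.x): check which SCC the pod was admitted under and whether its UID/GID can actually write to the mount.

# See which security context constraint and UID/GID the pod runs with
oc get pod <mysql-pod> -o yaml | grep -E 'openshift.io/scc|runAsUser|fsGroup'

# From inside the pod, confirm the identity and try a write on the NFS mount
oc rsh <mysql-pod> id
oc rsh <mysql-pod> touch /var/lib/mysql/data/.write-test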

@innergy
Contributor

innergy commented May 18, 2018

We're definitely not seeing this in general; our CI tests this exact combination, with the same versions. In cases like these there is usually a configuration issue, either on the host or on the storage backend, getting in the way. Troubleshooting this over GitHub would likely require a great deal of back and forth, so my suggestion would be to open a support case so that we can work through it live.

@kaparora
Author

Thanks @innergy! A support case is already open.

@jkonline

jkonline commented May 19, 2018

@kaparora
How did you resolve this error on the initial install?

DEBU Invoking tunneled command: oc exec trident-cdd5fc7b4-ls8h4 -n netapp2 -c trident-main -- tridentctl
DEBU REST interface not yet up, waiting.

@jacobjohnanda

Having an issue with the Trident 18.04 install on OpenShift 3.7 as well.

@acsulli

acsulli commented May 30, 2018

@kaparora,

Any chance you can check the latency between your OpenShift nodes and the data LIF(s)? We just encountered a situation where extreme latency (>200 ms) was causing etcd to (apparently) falsely believe there were locks. Changing to a storage device that is dramatically closer fixed things.

I have no idea at what point the latency might become an issue for etcd, but it would be worth knowing if this could be an issue for you. All of the CI testing is with systems which are a couple ms apart at most, so it's not something we've encountered before.
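A quick way to sample that latency from each node (the LIF address is the one from the backend.json above):

# Round-trip time from an OpenShift node to the ONTAP data LIF
ping -c 20 4.168.16.25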

Andrew

@kaparora
Author

kaparora commented Jun 6, 2018

Trident is able to serve both backends, NFS and iSCSI.
PostgreSQL runs fine with NFS.
We are still having issues with MySQL.
This is a configuration issue, but I don't think we can solve it at the Trident level, so I am closing this issue for now.
I am also not able to recreate it in my lab.
The customer support case has also been closed.

@kaparora kaparora closed this as completed Jun 6, 2018
@kaparora
Author

Today, after some troubleshooting, we figured out that the default OpenShift template uses mountPath /var/lib/mysql/data.
We changed it to /var/lib/mysql after looking at this issue: docker-library/mysql#69

With that change, MySQL is now running in the OpenShift cluster with ONTAP NFS; a sketch of the change is below.
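For reference, one way to make that change with oc (the dc, volume, and claim names are assumptions based on the default mysql-persistent template):

# Re-point the data volume at /var/lib/mysql instead of /var/lib/mysql/data
oc set volume dc/mysql --add --overwrite --name=mysql-data \
    --type=persistentVolumeClaim --claim-name=mysql \
    --mount-path=/var/lib/mysql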

@japplewhite

I'm seeing this too with OpenShift Origin 3.9 and iSCSI, with an ONTAP simulator that worked on earlier deployments.

@japplewhite

@kaparora I have an environment to reproduce it.

@rushins

rushins commented Jul 25, 2018

I hit the same error today with OpenShift 3.9 using NFS on ONTAP (cDOT) 9.1:

FATA Install failed; PVC trident was not bound after 120000000000 seconds

Any ideas?

@japplewhite

@rushins I had success with the newest Trident beta release on Origin 3.9. I was using iSCSI though, so YMMV. What I have found is that it works best on the first go. If you have an existing install that failed, you must clean up on the FAS by deleting the volume and LUN (for iSCSI) before trying again; a sketch of that cleanup is below.
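In the ONTAP CLI that cleanup looks roughly like this (the SVM name is the one from the install log above; the volume and LUN paths are examples, so check what the failed install actually left behind):

# Remove a leftover Trident LUN (iSCSI case) and its volume before reinstalling
lun offline -vserver se1-svm-s01 -path /vol/trident_trident/lun0
lun delete -vserver se1-svm-s01 -path /vol/trident_trident/lun0
volume offline -vserver se1-svm-s01 -volume trident_trident
volume delete -vserver se1-svm-s01 -volume trident_trident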

@acsulli

acsulli commented Jul 25, 2018

@rushins, @japplewhite Make sure that nothing else has a pending PVC when you create Trident. The original 18.04 had a bug where a piece of metadata that should have prevented other PVCs from binding the trident PV was missing, so another pending claim could grab it. This is/was particularly an issue with OpenShift Enterprise, which deploys the Ansible Service Broker (a.k.a. ASB) by default.

If, after starting the Trident install, you run oc get pvc --all-namespaces and see a PVC from another namespace bound to the trident PV, that is a good indicator; the exact commands are below.
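A quick version of that check (the PV name is the one from the install log above):

# List every claim in the cluster and which volume it is bound to
oc get pvc --all-namespaces

# Show exactly which claim bound the trident PV
oc get pv trident -o jsonpath='{.spec.claimRef.namespace}/{.spec.claimRef.name}'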

This was fixed in 18.07 beta 1.

Andrew

@rushins

rushins commented Jul 27, 2018

Thanks Andrew. Yes, you are right, 18.04 seems to have a bug with PVC binding. I followed your suggestion and used 18.07 beta 1; it worked without any major issues on OpenShift Container Platform as a storage class, and I was able to create a PV and bind it to a PVC.

Thanks.

thanks.

@rushins

rushins commented Jul 27, 2018

Hi John,
I tried with iSCSI and it didn't work; I hit the bug Andrew described in build 18.04. 18.07 beta 1 worked for all NAS and SAN traffic (NFS, iSCSI).

Anyway, thanks for your suggestion.
