Empty thin pool LV seems to be taken into account for capacity monitoring #210

Open
javiku opened this issue Oct 25, 2022 · 11 comments

@javiku

javiku commented Oct 25, 2022

What steps did you take and what happened:

  • I have a 10GB PV and an initially empty VG (no LVs)
  • I configure a StorageClass to use Thin Provisioning (a sketch of the StorageClass is included at the end of this section)
  • I start one Pod with two volumes of 8GB each
  • a Thin Pool LV is created (size 8GB), with two LVs inside it, one for each Pod volume
$ sudo lvs
  LV                                       VG                Attr       LSize Pool                                    Origin Data%  Meta%  Move Log Cpy%Sync Convert
  vg-workersworkers_thinpool               vg-workersworkers twi-aotz-- 8,00g                                                3,86   12,70
  pvc-45c18a20-e055-4264-afa5-f128816ea309 vg-workersworkers Vwi-aotz-- 8,00g vg-workersworkers_thinpool        1,97
  pvc-712624e8-c1c2-4026-9773-172f296359a9 vg-workersworkers Vwi-aotz-- 8,00g vg-workersworkers_thinpool        1,89
  • after the Pod is deleted, the two volume LVs are removed, but the Thin Pool LV remains, which looks OK.
$ sudo lvs
  LV                         VG                Attr       LSize Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  vg-workersworkers_thinpool vg-workersworkers twi-aotz-- 8,00g             0,00   10,79
  • when I try to start another Pod with the same requirements, Kubernetes complains that there is not enough free space in the node.
  status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2022-10-25T17:30:26Z"
    message: '0/2 nodes are available: 1 node(s) did not have enough free storage, 1 node(s) had taint {node-role.kubernetes.io/master: }, that the pod didn''t tolerate.'
    reason: Unschedulable
    status: "False"
    type: PodScheduled
  phase: Pending
  qosClass: Burstable
  • Indeed, the Node storage capacity is reported as only ~2GB free. It looks like the Thin Pool LV is being taken into account as used space, which I believe is wrong.
$ kubectl -n kube-system logs openebs-lvm-controller-0

[...]
I1025 16:59:50.516742       1 grpc.go:81] GRPC response: {"available_capacity":2126512128}
[...]


$ kubectl -n openebs get lvmnodes worker-1 -oyaml

apiVersion: local.openebs.io/v1alpha1
kind: LVMNode
metadata:
  creationTimestamp: "2022-10-24T14:13:26Z"
  generation: 12
  name: worker-1
  namespace: openebs
  ownerReferences:
  - apiVersion: v1
    controller: true
    kind: Node
    name: worker-1
    uid: 7e8c75dc-5491-4884-a709-c693e7835490
  resourceVersion: "390350"
  uid: 4f545eaa-b2c4-4a82-bba9-244acc37cba3
volumeGroups:
- allocationPolicy: 0
  free: 2028Mi
  lvCount: 1
  maxLv: 0
  maxPv: 0
  metadataCount: 1
  metadataFree: 507Ki
  metadataSize: 1020Ki
  metadataUsedCount: 1
  missingPvCount: 0
  name: lvm-volumegroup-kumori-workers
  permissions: 0
  pvCount: 1
  size: 10236Mi
  snapCount: 0
  uuid: 6gd0JS-RoSk-TYN2-YptT-kCmo-22pg-RRGIlR


$ kubectl -n kube-system get csistoragecapacities  csisc-nbcxt -oyaml

apiVersion: storage.k8s.io/v1beta1
kind: CSIStorageCapacity
metadata:
  creationTimestamp: "2022-10-24T14:13:44Z"
  generateName: csisc-
  labels:
    csi.storage.k8s.io/drivername: local.csi.openebs.io
    csi.storage.k8s.io/managed-by: external-provisioner
  name: csisc-nbcxt
  namespace: kube-system
  ownerReferences:
  - apiVersion: apps/v1
    controller: true
    kind: StatefulSet
    name: openebs-lvm-controller
    uid: 43a9720c-801f-46b7-b6ae-4f819e5a18a2
  resourceVersion: "390055"
  uid: 40236777-92e9-4460-a735-b205ade6ffe9
capacity: 2028Mi
nodeTopology:
  matchLabels:
    kubernetes.io/hostname: worker-1
    openebs.io/nodename: worker-1
storageClassName: openebs-locallvm
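For completeness, the StorageClass looks roughly like this (a sketch rather than the exact manifest; the volgroup value is a placeholder, only the thinProvision parameter matters for this report):

  apiVersion: storage.k8s.io/v1
  kind: StorageClass
  metadata:
    name: openebs-locallvm
  provisioner: local.csi.openebs.io
  parameters:
    storage: "lvm"
    volgroup: "vg-workersworkers"   # placeholder; the actual VG name as shown by lvs
    thinProvision: "yes"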


What did you expect to happen:
I expected the node to be reported as having 10GB of free space, since no "real" volumes exist, only the Thin Pool LV.
Otherwise, I can't deploy the same Pod again, even though the disk space is actually free.
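A quick sanity check of the reported number, assuming the driver simply reports the VG free space: 10236Mi (VG size) - 8192Mi (the 8GB thinpool data LV) - ~16Mi (pool metadata, spare and rounding) ≈ 2028Mi, which matches both the LVMNode "free" field and the GRPC available_capacity of 2126512128 bytes (2028 × 1024 × 1024). In other words, the whole thinpool is counted as used even though its Data% is 0.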

Environment:

  • LVM Driver version: tested with 0.8 and 1.0.0
  • Kubernetes version: 1.21.10
  • Kubernetes installer & version: kubeadm
  • Cloud provider or hardware configuration: baremetal
  • OS: Ubuntu 20.04.3
@graphenn

graphenn commented Nov 2, 2023

Still the same: if the remaining VG capacity is not enough, Kubernetes gets stuck reporting insufficient space and can't allocate the thin volume.

@dsharma-dc
Contributor

I tried to reproduce this but was unable to. Below are the details; the VG setup itself is sketched at the end of this comment.

The VG is this:

VG   #PV #LV #SN Attr   VSize    VFree  
dsvg   1   3   0 wz--n- 1020.00m 512.00m

Created two PVCs bound to thin LVs on a thin pool (claim manifest sketched below). The thinpool is 512MiB.

NAME             STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS    AGE
csi-lvmpv-ds-1   Bound    pvc-deba93f4-51bd-484b-84e5-188ba295ca8b   500Mi      RWO            openebs-lvmpv   4s
csi-lvmpv-ds-2   Bound    pvc-9aca14e3-8a91-48f6-9709-ca5a54fb6b93   500Mi      RWO            openebs-lvmpv   4s
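The claims came from a manifest along these lines (a sketch; name, size and StorageClass are taken from the output above):

  apiVersion: v1
  kind: PersistentVolumeClaim
  metadata:
    name: csi-lvmpv-ds-1
  spec:
    accessModes:
      - ReadWriteOnce
    storageClassName: openebs-lvmpv
    resources:
      requests:
        storage: 500Mi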

The thin LVs got created.

LV                                       VG   Attr       LSize   Pool          Origin Data%  Meta%  Move Log Cpy%Sync Convert
dsvg_thinpool                            dsvg twi-aotz-- 500.00m                      16.23  11.62                           
pvc-9aca14e3-8a91-48f6-9709-ca5a54fb6b93 dsvg Vwi-a-tz-- 500.00m dsvg_thinpool        8.11                                   
pvc-deba93f4-51bd-484b-84e5-188ba295ca8b dsvg Vwi-a-tz-- 500.00m dsvg_thinpool        8.11          

Now I deleted the claims, so the thin LVs are deleted.

persistentvolumeclaim "csi-lvmpv-ds-1" deleted
persistentvolumeclaim "csi-lvmpv-ds-2" deleted

LVs got deleted

LV                    VG   Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert Devices               
dsvg_thinpool         dsvg twi-aotz-- 500.00m             0.00   10.84                            dsvg_thinpool_tdata(0)
[dsvg_thinpool_tdata] dsvg Twi-ao---- 500.00m                                                     /dev/loop0(1)         
[dsvg_thinpool_tmeta] dsvg ewi-ao----   4.00m                                                     /dev/loop0(126)       
[lvol0_pmspare]       dsvg ewi-------   4.00m                                                     /dev/loop0(0)  

Created a new claim using the same PVC YAML. The thin LV is created successfully.

LV                                       VG   Attr       LSize   Pool          Origin Data%  Meta%  Move Log Cpy%Sync Convert Devices               
dsvg_thinpool                            dsvg twi-aotz-- 500.00m                      0.00   10.94                            dsvg_thinpool_tdata(0)
[dsvg_thinpool_tdata]                    dsvg Twi-ao---- 500.00m                                                              /dev/loop0(1)         
[dsvg_thinpool_tmeta]                    dsvg ewi-ao----   4.00m                                                              /dev/loop0(126)       
[lvol0_pmspare]                          dsvg ewi-------   4.00m                                                              /dev/loop0(0)         
pvc-0eb249c8-256a-4958-b479-9502c9b15755 dsvg Vwi-a-tz-- 500.00m dsvg_thinpool        0.00                                       
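For reference, the VG in this test sits on a loop device (the /dev/loop0 entries in the Devices column). A sketch of how such a VG can be prepared, assuming a 1GiB backing file at a hypothetical path; the thinpool itself is created by the provisioner because the StorageClass has thinProvision enabled:

  truncate -s 1G /tmp/dsvg.img        # create a 1GiB backing file (hypothetical path)
  losetup -f --show /tmp/dsvg.img     # attach it as a loop device, prints e.g. /dev/loop0
  pvcreate /dev/loop0                 # initialize the loop device as an LVM PV
  vgcreate dsvg /dev/loop0            # create the volume group used above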

@dsharma-dc
Contributor

Please update if this is still an issue, and if there are any other details that can help reproduce this locally.

@dsharma-dc dsharma-dc added the question Further information is requested label May 23, 2024
@graphenn

For example: a 20GB VG that already has a 15GB thinpool, leaving 5GB free in the VG.
If I claim a new 8GB LV from the thinpool, Kubernetes gets stuck reporting not enough space.

@dsharma-dc
Contributor

For example: a 20GB VG that already has a 15GB thinpool, leaving 5GB free in the VG. If I claim a new 8GB LV from the thinpool, Kubernetes gets stuck reporting not enough space.

I don't think that's a problem here. Please refer to the steps below.

  1. I have a 1GiB VG
  2. Created a 500MiB thinpool on it
  3. Created a thin LV PVC of 400MiB and attached it to a pod fio1. Filled it to about 80% with data.
  4. Created another thin LV PVC of 400MiB and attached it to another pod fio2. Works OK.
  5. Now if I try to create a thick 600MiB PVC, it fails, which is expected because the VG doesn't have that much free space.
  VG   #PV #LV #SN Attr   VSize    VFree  
  dsvg   1   1   0 wz--n- 1020.00m 512.00m


  LV            VG   Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  dsvg_thinpool dsvg twi-aotz-- 500.00m             0.00   10.84     


  LV                                       VG   Attr       LSize   Pool          Origin Data%  Meta%  Move Log Cpy%Sync Convert
  dsvg_thinpool                            dsvg twi-aotz-- 500.00m                      70.95  14.06                           
  pvc-80055086-0521-4a18-b64d-3c237dc9d65f dsvg Vwi-aotz-- 400.00m dsvg_thinpool        80.11                                  
  pvc-ba7e6ce5-d932-49af-a579-0bf0fd512f3d dsvg Vwi-aotz-- 400.00m dsvg_thinpool        8.58           

Now try creating a thick LVM LV. This is expected to fail because the VG is 1GiB, of which 500MiB is taken by the thinpool, leaving only about 500MiB.

  Normal   Provisioning          10s (x6 over 38s)  local.csi.openebs.io_openebs-lvm-localpv-controller-7df8f57fb7-6vgvq_4e35f19e-3a9c-486c-8f60-66907ab5faab  External provisioner is provisioning volume for claim "default/csi-lvmpv-ds-3"
  Warning  ProvisioningFailed    7s (x6 over 36s)   local.csi.openebs.io_openebs-lvm-localpv-controller-7df8f57fb7-6vgvq_4e35f19e-3a9c-486c-8f60-66907ab5faab  failed to provision volume with StorageClass "openebs-lvmpv-thick": rpc error: code = ResourceExhausted desc = no vg available to serve volume request having regex="^dsvg$" & capacity="629145600"
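The numbers line up: capacity="629145600" bytes is exactly 600MiB, while the VG only has 1020MiB minus the 500MiB thinpool (and ~8MiB of metadata) ≈ 512MiB unallocated, so ResourceExhausted is the correct outcome for a thick request.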

@graphenn

  1. I have a 20GB volume group (VG).
  2. Create two or more 8GB thin LV PVCs.
  3. After several days, the thinpool's LV size has grown to more than 12GB, for example 13GB. Now the VG has 7GB of free disk space, but only about 40% of the thinpool's data space is actually in use, so the pool itself still has plenty of room.
  4. Now, I want to create a new 8GB thin LV PVC, but Kubernetes will be stuck due to insufficient space. After purchasing a new disk and adding it to the VG, increasing the free disk space to more than 8GB, I can then create and attach the new thin LV PVC to the existing thinpool.
  5. The newly purchased disk space is essentially wasted in this scenario, as it is not being utilized; the original thinpool's space is more than sufficient for my needs. Ideally, the capacity monitoring should look at the remaining space within the thinpool itself (letting it auto-expand when necessary), rather than preemptively checking whether the remaining VG space exceeds the full provisioned size of the thin PVC to be allocated.
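Putting numbers on this: the VG is 20GB and the thinpool has grown to 13GB, so only about 7GB of the VG is unallocated. The driver compares the requested 8GB against those 7GB and refuses, even though only ~40% of the 13GB pool (about 5GB) is actually written, leaving roughly 8GB still usable inside the pool.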

@dsharma-dc
Contributor


  1. I have a 20GB volume group (VG).
  2. Create two or more 8GB thin LV PVCs.
  3. After several days, the thinpool's LV size has grown to more than 12GB, for example 13GB. Now the VG has 7GB of free disk space, but only about 40% of the thinpool's data space is actually in use, so the pool itself still has plenty of room.
  4. Now, I want to create a new 8GB thin LV PVC, but Kubernetes will be stuck due to insufficient space. After purchasing a new disk and adding it to the VG, increasing the free disk space to more than 8GB, I can then create and attach the new thin LV PVC to the existing thinpool.
  5. The newly purchased disk space is essentially wasted in this scenario, as it is not being utilized; the original thinpool's space is more than sufficient for my needs. Ideally, the capacity monitoring should look at the remaining space within the thinpool itself (letting it auto-expand when necessary), rather than preemptively checking whether the remaining VG space exceeds the full provisioned size of the thin PVC to be allocated.

@graphenn Thank you. Two questions:

  1. Are you creating the thinpool manually on the host, or letting the provisioner create it? If manually, what size is the thinpool created with?
  2. Are you doing thinpool expansion? I ask because I'm not sure how thin LVs can collectively have 13GB allocated on an 8GB thinpool.

It'd be helpful if you could share the real outputs that show this behaviour.

@graphenn

  1. It is created automatically.
  2. In fact, I create two or three 8GB LVs on the same thinpool at the beginning. My real scenario uses auto thinpool expansion, as suggested.

@dsharma-dc
Contributor

dsharma-dc commented May 28, 2024

It'll depend on the settings thin_pool_autoextend_threshold and thin_pool_autoextend_percent. With auto thinpool expansion config values thin_pool_autoextend_threshold=50 and thin_pool_autoextend_percent=20, I can successfully create two 500MiB thin LVs, write 400MiB of data, the thinpool expands to ~850MiB, and I can still provision one more 800MiB thin LV.
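For reference, these are plain LVM settings in the activation section of /etc/lvm/lvm.conf; a sketch with the values used in this test:

  # /etc/lvm/lvm.conf
  activation {
      # start auto-extending the pool once it is 50% full, growing it by 20% each time
      thin_pool_autoextend_threshold = 50
      thin_pool_autoextend_percent = 20
  }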

However, I'll check whether there is some delay or race in how the plugin picks up the free space details.

@dsharma-dc
Contributor

Summarising the issue reported:

  1. The thinpool, even if empty, is counted as reserved space, so if a thick volume needs more space than the VG capacity minus the thinpool size, it will remain Pending. This is expected LVM behaviour.
  2. Auto-deletion of a thinpool that was auto-created by the LVM CSI provisioner, once no LVs remain on it, is being considered as an enhancement in a separate issue.
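To restate point 1 as a rough formula: space available for a thick volume ≈ VG size - total thinpool LSize (data + metadata) - other allocated LVs, independent of how full the thinpool actually is (its Data%).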

@avishnu
Member

avishnu commented Sep 17, 2024

Thanks for the issue and the findings posted above. Further analysis reveals that:

  • In case of "WaitForFirstConsumer", the K8s scheduler does the pod placement based on node capacity metrics, in which case the "free" capacity is considered and this doesn't consider thin vs thick usage. So, in the current design/architecture, it's not feasible unless we create a custom scheduler, to override this behavior.
  • In case of "Immediate" Binding, the LVM CSI Controller makes the decision of identifying the appropriate node candidate based on the available capacity stats. Potentially, it should be possible to address the issue in this case.

Pending further investigation in v4.3

@avishnu avishnu added this to the v4.3 milestone Sep 17, 2024