Merge pull request #5601 from qiuming-best/release-1.10
[release-1.10] Add Velero File System Backup Performance Guide
blackpiglet authored Nov 17, 2022
2 parents 5dc606b + 0f55a4d commit 2d71b7c
Showing 4 changed files with 330 additions and 4 deletions.
site/content/docs/main/file-system-backup.md (2 additions, 2 deletions)
@@ -347,9 +347,9 @@ to be defined by its pod.
- Even though the backup data could be incrementally preserved, for a single file, FSB leverages deduplication
to find the difference to be saved. This means that large files (such as ones storing a database) will take a long time
to scan for data deduplication, even if the actual difference is small.
- You may need to [customize the resource limits](/docs/main/customize-installation/#customize-resource-requests-and-limits)
- You may need to [customize the resource limits](customize-installation/#customize-resource-requests-and-limits)
to make sure backups complete successfully for massive small files or large backup size cases, for more details refer to
[Velero File System Backup Performance Guide](https://empty-to-be-created).
[Velero File System Backup Performance Guide](/docs/main/performance-guidance).
- Velero's File System Backup reads/writes data from volumes by accessing the node's filesystem, on which the pod is running.
For this reason, FSB can only backup volumes that are mounted by a pod and not directly from the PVC. For orphan PVC/PV pairs
(without running pods), some Velero users overcame this limitation running a staging pod (i.e. a busybox or alpine container
site/content/docs/main/performance-guidance.md (163 additions, 0 deletions)
@@ -0,0 +1,163 @@
---
title: "Velero File System Backup Performance Guide"
layout: docs
---

When using Velero to do file system backup & restore, both the Restic uploader and the Kopia uploader are supported now, but there is a big difference between them in resource usage and time consumption.

We've run several rounds of tests against the Restic uploader and the Kopia uploader through Velero, which may give you some guidance. But test results will vary across infrastructures, and our tests are limited and cannot cover the full variety of data scenarios, so **the test results and analysis are for reference only**.

## Infrastructure

MinIO is used as the Velero backend storage, and Network File System (NFS) is used to create the persistent volumes (PVs) and persistent volume claims (PVCs) on top of it. The MinIO and NFS servers are deployed independently in different virtual machines (VMs), with 300 MB/s write throughput and 175 MB/s read throughput respectively.
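
For reference, throughput figures like these can be sanity-checked with a direct-I/O `dd` run against the NFS mount (the mount point below is hypothetical):

```
# Write throughput: stream 1 GiB to the NFS mount, bypassing the page cache
dd if=/dev/zero of=/mnt/nfs/throughput-test bs=1M count=1024 oflag=direct
# Read throughput: read the file back, again bypassing the cache
dd if=/mnt/nfs/throughput-test of=/dev/null bs=1M iflag=direct
```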

The details of the environment are as below:

```
### KUBERNETES VERSION
root@velero-host-01:~# kubectl version
Client Version: version.Info{Major:"1", Minor:"22", GitVersion:"v1.22.4"
Server Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.14"
### DOCKER VERSION
root@velero-host-01:~# docker version
Client:
Version: 20.10.12
API version: 1.41
Server:
Engine:
Version: 20.10.12
API version: 1.41 (minimum version 1.12)
Go version: go1.16.2
containerd:
Version: 1.5.9-0ubuntu1~20.04.4
runc:
Version: 1.1.0-0ubuntu1~20.04.1
docker-init:
Version: 0.19.0
### NODES
root@velero-host-01:~# kubectl get nodes |wc -l
6 // one master with 6 work nodes
### DISK INFO
root@velero-host-01:~# smartctl -a /dev/sda
smartctl 7.1 2019-12-30 r5022 [x86_64-linux-5.4.0-126-generic] (local build)
Copyright (C) 2002-19, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Vendor: VMware
Product: Virtual disk
Revision: 1.0
Logical block size: 512 bytes
Rotation Rate: Solid State Device
Device type: disk
### MEMORY INFO
root@velero-host-01:~# free -h
total used free shared buff/cache available
Mem: 3.8Gi 328Mi 3.1Gi 1.0Mi 469Mi 3.3Gi
Swap: 0B 0B 0B
### CPU INFO
root@velero-host-01:~# cat /proc/cpuinfo | grep name | cut -f2 -d: | uniq -c
4 Intel(R) Xeon(R) Gold 6230R CPU @ 2.10GHz
### SYSTEM INFO
root@velero-host-01:~# cat /proc/version
Linux version 5.4.0-126-generic (build@lcy02-amd64-072) (gcc version 9.4.0 (Ubuntu 9.4.0-1ubuntu1~20.04.1)) #142-Ubuntu SMP Fri Aug 26 12:12:57 UTC 2022
### VELERO VERSION
root@velero-host-01:~# velero version
Client:
Version: main ###v1.10 pre-release version
Git commit: 9b22ca6100646523876b18a491d881561b4dbcf3-dirty
Server:
Version: main ###v1.10 pre-release version
```

## Test

Below we've run several groups of tests. For each group, we used limited resources (1-core CPU with 2 GB memory, or 4-core CPU with 4 GB memory) to run Velero file system backup under the Restic path and the Kopia path, and then compared the results.

We recorded the time consumption, maximum CPU usage, maximum memory usage, and MinIO storage usage for the node-agent daemonset. Metrics for the Velero deployment are not included, since the differences there are not obvious between the Restic uploader and the Kopia uploader.
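
A simple way to sample these metrics during a run is to poll `kubectl top` against the node-agent pods, as in the sketch below (assuming metrics-server is installed and the default `name=node-agent` pod label):

```
# Sample node-agent CPU/memory every 5 seconds while a backup runs
while true; do
  kubectl top pod -n velero -l name=node-agent --no-headers
  sleep 5
done
```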

Compression is either disabled or unavailable for both uploaders.
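
The uploader for each run is chosen at install time. A minimal install sketch using the flags introduced with Velero v1.10 (bucket, URL, and credential values are placeholders; verify flags against `velero install --help`):

```
# Kopia run; swap in --uploader-type=restic for the Restic runs
velero install \
    --use-node-agent \
    --uploader-type=kopia \
    --provider aws \
    --plugins velero/velero-plugin-for-aws:v1.6.0 \
    --bucket velero \
    --secret-file ./minio-credentials \
    --use-volume-snapshots=false \
    --backup-location-config region=minio,s3ForcePathStyle="true",s3Url=http://minio:9000
```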

### Case 1: 4194304 (4M) files, 2396745 (2M) directories, 0B per file, 0B total content
#### result:
|Uploader| Resources|Times |Max CPU|Max Memory|Repo Usage|
|--------|----------|:----:|------:|:--------:|:--------:|
| Kopia | 1c2g |24m54s| 65% |1530 MB |80 MB |
| Restic | 1c2g |52m31s| 55% |1708 MB |3.3 GB |
| Kopia | 4c4g |24m52s| 63% |2216 MB |80 MB |
| Restic | 4c4g |52m28s| 54% |2329 MB |3.3 GB |
#### conclusion:
- The memory usage exceeds Velero's default memory limit (1GB) for both Kopia and Restic when backing up massive numbers of empty files; see the patch sketch after this list for one way to raise the limit.
- For both the Kopia uploader and the Restic uploader, there is no significant time reduction from increasing resources from 1c2g to 4c4g.
- The Restic uploader takes roughly twice as long as the Kopia uploader with the same resource specification.
- Restic has an **irrationally** large repository size (3.3GB) for 0B of actual content.
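
One way to raise that limit is a strategic-merge patch on the daemonset; a sketch assuming a default v1.10 install (daemonset and container both named `node-agent` in the `velero` namespace), with example values:

```
kubectl -n velero patch daemonset node-agent --patch '
spec:
  template:
    spec:
      containers:
      - name: node-agent          # matched by name; other fields are merged in
        resources:
          requests:
            memory: 1Gi
          limits:
            memory: 2Gi
'
```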

### Case 2: Using the same file size (100B) and Velero's default resource configuration, these groups of cases test behavior as the number of files grows from hundreds of thousands to 2 million.

### Case 2.1: 235298 (~235K) files, 137257 (~137K) directories, 100B per file, 22.440MB total content
#### result:
| Uploader | Resources|Times |Max CPU|Max Memory|Repo Usage|
|-------|----------|:----:|------:|:--------:|:--------:|
| Kopia | 1c1g |2m34s | 70% |692 MB |108 MB |
| Restic| 1c1g |3m9s | 54% |714 MB |275 MB |

### Case 2.2: 470596 (~470K) files, 137257 (~137K) directories, 100B per file, 44.880MB total content
#### result:
| Uploader | Resources|Times |Max CPU|Max Memory|Repo Usage|
|-------|----------|:----:|------:|:--------:|:--------:|
| Kopia | 1c1g |3m45s | 68% |831 MB |108 MB |
| Restic| 1c1g |4m53s | 57% |788 MB |275 MB |

### Case 2.3: 705894 (~706K) files, 137257 (~137K) directories, 100B per file, 67.319MB total content
#### result:
|Uploader| Resources|Times |Max CPU|Max Memory|Repo Usage|
|--------|----------|:----:|------:|:--------:|:--------:|
| Kopia | 1c1g |5m06s | 71% |861 MB |108 MB |
| Restic | 1c1g |6m23s | 56% |810 MB |275 MB |

### Case 2.4: 2097152 (2M) files, 2396745 (2M) directories, 100B per file, 200.000MB total content
#### result:
|Uploader| Resources|Times |Max CPU|Max Memory|Repo Usage|
|--------|----------|:----:|------:|:--------:|:--------:|
| Kopia | 1c1g |OOM | 74% |N/A |N/A |
| Restic | 1c1g |41m47s| 52% |904 MB |3.2 GB |
#### conclusion:
- As the number of files increases, there is no abnormal memory surge; memory usage for both the Kopia uploader and the Restic uploader grows linearly, until it exceeds the 1GB limit in Case 2.4 and the Kopia uploader hit OOM.
- The Kopia uploader gets comparatively faster (in per-file terms) as the number of files increases.
- The Restic uploader's repository size is still much larger than the Kopia uploader's.
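
To reproduce this file-count scaling, a Case 2.1-style dataset can be generated along these lines (path and counts are illustrative, and generating this many files takes a while):

```
# Create ~235K files of 100B each, spread across 1000 directories
base=/mnt/nfs/case2-dataset
for d in $(seq 1 1000); do
  mkdir -p "$base/dir-$d"
  for f in $(seq 1 235); do
    head -c 100 /dev/urandom > "$base/dir-$d/file-$f"
  done
done
```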

### Case 3: 10625 (~10K) files, 781 directories, 1.000MB per file, 10.376GB total content
#### result:
|Uploader| Resources|Times |Max CPU|Max Memory|Repo Usage|
|--------|----------|:----:|------:|:--------:|:--------:|
| Kopia | 1c2g |1m37s | 75% |251 MB |10 GB |
| Restic | 1c2g |5m25s | 100% |153 MB |10 GB |
| Kopia | 4c4g |1m35s | 75% |248 MB |10 GB |
| Restic | 4c4g |3m17s | 171% |126 MB |10 GB |
#### conclusion:
- This case involves a relatively large backup size. There is no significant time reduction from increasing resources from 1c2g to 4c4g for the Kopia uploader, but for the Restic uploader, increasing CPU from 1 core to 4 shortened the backup time by about one third, which means you should allocate more CPU to the Restic uploader in this scenario (see the one-liner after this list).
- For the large-backup-size case, the Restic uploader's repository size comes back to normal.
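
A quick way to grant that extra CPU is `kubectl set resources` (daemonset name assumes a default v1.10 install; values are examples only):

```
# Raise the node-agent CPU request/limit; Restic benefits most from extra CPU here
kubectl -n velero set resources daemonset/node-agent \
    --requests=cpu=2 --limits=cpu=4
```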

### Case 4: 900 files, 1 directory, 1.000GB per file, 900.000GB total content
#### result:
|Uploader| Resources|Times |Max CPU|Max Memory|Repo Usage|
|--------|----------|:-----:|------:|:--------:|:--------:|
| Kopia | 1c2g |2h30m | 100% |714 MB |900 GB |
| Restic | 1c2g |Timeout| 100% |416 MB |N/A |
| Kopia | 4c4g |1h42m | 138% |786 MB |900 GB |
| Restic | 4c4g |2h15m | 351% |606 MB |900 GB |
#### conclusion:
- When the target backup data is relatively large, the Restic uploader starts to time out under 1c2g, so it's better to allocate more memory to the Restic uploader when backing up large amounts of data (or extend the timeout; see the sketch after this list).
- When backing up large amounts of data, the Kopia uploader is both less time-consuming and uses fewer resources.
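
If allocating more resources isn't an option, the file system backup timeout itself can be extended. A sketch that appends the server flag via a JSON patch (`--fs-backup-timeout` follows the v1.10 rename of the old `--restic-timeout`; verify with `velero server --help`):

```
# Append a longer fs-backup timeout to the Velero server arguments
kubectl -n velero patch deployment velero --type=json -p='[
  {"op": "add", "path": "/spec/template/spec/containers/0/args/-",
   "value": "--fs-backup-timeout=6h"}
]'
```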

## Summary
- With the same resource specification, the Kopia uploader is less time-consuming during backup.
- Performance is better with the Kopia uploader when backing up large amounts of data or massive numbers of small files.
- It's better to set a reasonable resource configuration for your scenario instead of using the default. With the default resource configuration, the Restic uploader easily times out when backing up large amounts of data, and both the Kopia uploader and the Restic uploader easily hit OOM when backing up massive numbers of small files.
site/content/docs/v1.10.0-rc.1/file-system-backup.md (2 additions, 2 deletions)
@@ -347,9 +347,9 @@ to be defined by its pod.
- Even though the backup data could be incrementally preserved, for a single file, FSB leverages deduplication
to find the difference to be saved. This means that large files (such as ones storing a database) will take a long time
to scan for data deduplication, even if the actual difference is small.
- You may need to [customize the resource limits](/docs/main/customize-installation/#customize-resource-requests-and-limits)
- You may need to [customize the resource limits](customize-installation/#customize-resource-requests-and-limits)
to make sure backups complete successfully for massive small files or large backup size cases, for more details refer to
[Velero File System Backup Performance Guide](https://empty-to-be-created).
[Velero File System Backup Performance Guide](/docs/v1.10/performance-guidance).
- Velero's File System Backup reads/writes data from volumes by accessing the node's filesystem, on which the pod is running.
For this reason, FSB can only backup volumes that are mounted by a pod and not directly from the PVC. For orphan PVC/PV pairs
(without running pods), some Velero users overcame this limitation running a staging pod (i.e. a busybox or alpine container