support scale tikv/tiflash simultaneously
Signed-off-by: Zheming Li <nkdudu@126.com>
lizhemingi committed Aug 10, 2022
1 parent 00002ed commit a8d771c
Showing 25 changed files with 10,629 additions and 5,189 deletions.
70 changes: 70 additions & 0 deletions docs/api-references/docs.md
@@ -13199,6 +13199,48 @@ bool
</tr>
</tbody>
</table>
<h3 id="scalepolicy">ScalePolicy</h3>
<p>
(<em>Appears on:</em>
<a href="#tiflashspec">TiFlashSpec</a>,
<a href="#tikvspec">TiKVSpec</a>)
</p>
<p>
</p>
<table>
<thead>
<tr>
<th>Field</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>
<code>scaleInParallelism</code></br>
<em>
int32
</em>
</td>
<td>
<em>(Optional)</em>
<p>ScaleInParallelism configures the maximum number of stores that can be scaled in within one sync loop.</p>
</td>
</tr>
<tr>
<td>
<code>scaleOutParallelism</code></br>
<em>
int32
</em>
</td>
<td>
<em>(Optional)</em>
<p>ScaleOutParallelism configures the maximum number of stores that can be scaled out within one sync loop.</p>
</td>
</tr>
</tbody>
</table>
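
For reference, a minimal sketch of how a Go type matching the two fields above might be declared. The field names and JSON tags follow the table; the pointer types, comments, and markers are assumptions and may differ from the actual tidb-operator declaration.

```go
// ScalePolicy is a sketch of a Go type matching the fields documented above;
// the exact declaration in tidb-operator may differ.
type ScalePolicy struct {
	// ScaleInParallelism is the maximum number of stores that may be
	// scaled in within one sync loop. Assumed to default to 1 when unset.
	// +optional
	ScaleInParallelism *int32 `json:"scaleInParallelism,omitempty"`
	// ScaleOutParallelism is the maximum number of stores that may be
	// scaled out within one sync loop. Assumed to default to 1 when unset.
	// +optional
	ScaleOutParallelism *int32 `json:"scaleOutParallelism,omitempty"`
}
```
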
<h3 id="secretorconfigmap">SecretOrConfigMap</h3>
<p>
(<em>Appears on:</em>
@@ -16459,6 +16501,20 @@ Failover
<p>Failover is the configurations of failover</p>
</td>
</tr>
<tr>
<td>
<code>scalePolicy</code></br>
<em>
<a href="#scalepolicy">
ScalePolicy
</a>
</em>
</td>
<td>
<em>(Optional)</em>
<p>ScalePolicy is the scale configuration for TiFlash</p>
</td>
</tr>
</tbody>
</table>
<h3 id="tikvbackupconfig">TiKVBackupConfig</h3>
@@ -20534,6 +20590,20 @@ bool
If you set it to <code>true</code> for an existing cluster, the TiKV cluster will be rolling updated.</p>
</td>
</tr>
<tr>
<td>
<code>scalePolicy</code></br>
<em>
<a href="#scalepolicy">
ScalePolicy
</a>
</em>
</td>
<td>
<em>(Optional)</em>
<p>ScalePolicy is the scale configuration for TiKV</p>
</td>
</tr>
</tbody>
</table>
<h3 id="tikvstatus">TiKVStatus</h3>
@@ -0,0 +1,66 @@
# Scale in/out multiple TiKV/TiFlash instances simultaneously

## Summary

This document presents a design to scale in/out multiple TiKV/TiFlash instances simultaneously in one sync loop.

## Motivation

Scaling multiple TiKV/TiFlash instances simultaneously speeds up the overall scale in/out process.

### Goals

* Support scaling multiple TiKV/TiFlash instances in one sync loop.

### Non-Goals

* Scaling PD or other components simultaneously.

## Proposal

### Scale in

* Add `scalePolicy` in `spec.tikv` and `spec.tiflash` to define scale-related arguments.
* Add `scaleInParallelism` in `scalePolicy` to specify the maximum number of instances that can be scaled in within one sync loop.
* The default value of `scaleInParallelism` is 1 when it is not set, for backward compatibility.
* Extend `scaleOne` in `scaler.go` to `scaleMulti`; the function signature would be as follows (a simplified sketch of its logic is shown after this list):
```go
// scaleMulti calculates desired replicas and delete slots from actual/desired
// StatefulSets by allowing multiple pods to be deleted or created
// It returns the following values:
//   - scaling:
//     - 0: no scaling required
//     - 1: scaling out
//     - -1: scaling in
//   - ordinals: pod ordinals to create or delete
//   - replicas/deleteSlots: desired replicas and delete slots, allowing no more than maxCount pods to be deleted or created
func scaleMulti(actual *apps.StatefulSet, desired *apps.StatefulSet, maxCount int) (scaling int, ordinals []int32, replicas int32, deleteSlots sets.Int32)
```
* Call `scaleMulti` to get the ordinals to be scaled in, recorded as A.
* Call the PD API to get store info, which will be used throughout this sync loop.
* For each ordinal to be scaled in in this loop:
  * Check that the number of stores in the `up` state (excluding stores already deleted in this round) is greater than `Replication.MaxReplicas` in the PD config.
  * Call the PD API to delete the store and wait until its state changes to `offline`.
  * When the store becomes `tombstone`, add the defer-deleting annotation to the PVCs of the corresponding pod to be deleted.
  * If the current store is `tombstone` and the defer-deleting annotation has been added to the PVCs, mark the corresponding ordinal as finished, otherwise as ongoing.
* Call `setReplicasAndDeleteSlotsByFinished` to delete pods:
  * Since a native StatefulSet always scales in the Pod with the largest ordinal, we must ensure that the finished ordinals are __strictly__ contiguous from the largest downward.
  * Count the __continuous__ finished ordinals beginning from the largest in A, recorded as c; the final replicas will then be `replicas - c` (see the counting sketch after the Scale out section).
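
The following is a minimal sketch of the selection logic behind `scaleMulti`, assuming delete slots are not in use (the returned `deleteSlots` set is left empty) and that `Spec.Replicas` is always set. The package name is hypothetical; the real implementation in `scaler.go` would also have to merge the advanced-StatefulSet delete-slots annotation into the result.

```go
// Package scalersketch is a hypothetical package holding a simplified sketch
// of scaleMulti. It ignores the advanced-StatefulSet delete-slots annotation
// and assumes Spec.Replicas is always set, so it only illustrates how the
// ordinals for one sync loop could be selected.
package scalersketch

import (
	apps "k8s.io/api/apps/v1"
	"k8s.io/apimachinery/pkg/util/sets"
)

// scaleMulti compares the actual and desired StatefulSets and returns:
//   - scaling: 0 (no change), 1 (scale out) or -1 (scale in)
//   - ordinals: at most maxCount pod ordinals to create or delete in this loop
//   - replicas: the intermediate replica count after handling those ordinals
//   - deleteSlots: always empty in this sketch (delete slots are not modeled)
func scaleMulti(actual, desired *apps.StatefulSet, maxCount int) (scaling int, ordinals []int32, replicas int32, deleteSlots sets.Int32) {
	deleteSlots = sets.NewInt32()
	actualReplicas := *actual.Spec.Replicas
	desiredReplicas := *desired.Spec.Replicas

	switch {
	case desiredReplicas < actualReplicas: // scale in: remove the largest ordinals first
		scaling = -1
		target := desiredReplicas
		if lowest := actualReplicas - int32(maxCount); lowest > target {
			target = lowest
		}
		for ord := actualReplicas - 1; ord >= target; ord-- {
			ordinals = append(ordinals, ord)
		}
		replicas = target
	case desiredReplicas > actualReplicas: // scale out: create the smallest missing ordinals first
		scaling = 1
		target := desiredReplicas
		if highest := actualReplicas + int32(maxCount); highest < target {
			target = highest
		}
		for ord := actualReplicas; ord < target; ord++ {
			ordinals = append(ordinals, ord)
		}
		replicas = target
	default: // already at the desired size
		replicas = actualReplicas
	}
	return scaling, ordinals, replicas, deleteSlots
}
```
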

### Scale out

* Add `scaleOutParallelism` in `scalePolicy` to specify the maximum number of instances that can be scaled out within one sync loop.
* The default value of `scaleOutParallelism` is 1 when it is not set, for backward compatibility.
* Call `scaleMulti` to get the ordinals to be scaled out, recorded as A.
* For each ordinal to be scaled out in this loop:
  * Call `deleteDeferDeletingPVC` to clean up all PVCs with the defer-deleting annotation; mark the corresponding ordinal as finished if this succeeds, otherwise as ongoing.
* Call `setReplicasAndDeleteSlotsByFinished` to create pods:
  * Count the __continuous__ finished ordinals beginning from the smallest in A, recorded as c; the final replicas will then be `replicas + c` (as shown in the sketch below).
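
A small sketch of the counting rule that `setReplicasAndDeleteSlotsByFinished` would rely on, under the same simplifying assumptions as the `scaleMulti` sketch above (the function name and signature here are illustrative, not the actual helper): scale in may only apply a contiguous run of finished ordinals starting from the largest one, and scale out a contiguous run starting from the smallest.

```go
package scalersketch // hypothetical package, same assumptions as the scaleMulti sketch above

import "sort"

// countContiguousFinished counts how many of the ordinals chosen by scaleMulti
// can actually be applied in this sync loop. Only a contiguous run of finished
// ordinals counts: from the largest ordinal downward for scale in (a native
// StatefulSet always removes the pod with the largest ordinal first), and from
// the smallest ordinal upward for scale out.
func countContiguousFinished(ordinals []int32, finished map[int32]bool, scaleIn bool) int32 {
	sorted := append([]int32(nil), ordinals...)
	sort.Slice(sorted, func(i, j int) bool {
		if scaleIn {
			return sorted[i] > sorted[j] // largest first for scale in
		}
		return sorted[i] < sorted[j] // smallest first for scale out
	})
	var c int32
	for _, ord := range sorted {
		if !finished[ord] {
			break // the run must be strictly contiguous
		}
		c++
	}
	return c
}
```

The controller would then set the StatefulSet replicas to `replicas - c` for scale in and `replicas + c` for scale out, leaving the remaining ordinals for later sync loops.
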

### Test Plan

* [Test Plan](https://docs.google.com/document/d/1XgreMvP6Sx7KrwMwVJn4ZldYWhs5s6oXj4Bcn9FvajI/edit).

## Drawbacks

* Scaling in/out multiple instances simultaneously will cause more region replicas to be rescheduled at the same time.