Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

docs: Add proposal for instance scoped variables #30558

Merged
merged 19 commits into from
Jan 19, 2022
Merged
Changes from 4 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
187 changes: 187 additions & 0 deletions docs/design/2021-12-08-instance-scope.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,187 @@
# TiDB Design Documents

- Author(s): [morgo](http://github.com/morgo)
- Discussion PR: https://github.com/pingcap/tidb/pull/30558
- Tracking Issue: https://github.com/pingcap/tidb/issues/30366

## Table of Contents

* [Introduction](#introduction)
* [Motivation or Background](#motivation-or-background)
* [Detailed Design](#detailed-design)
* [Test Design](#test-design)
* [Impacts & Risks](#impacts--risks)
* [Investigation & Alternatives](#investigation--alternatives)
* [Unresolved Questions](#unresolved-questions)

## Introduction

Currently, TiDB has two primary methods of configuration:

- The configuration file (in `.toml` format) ([Docs](https://docs.pingcap.com/tidb/stable/tidb-configuration-file))
morgo marked this conversation as resolved.
Show resolved Hide resolved
- Using MySQL compatible system variables (`SET GLOBAL sysvar=x`, `SET SESSION sysvar=x`) ([Docs](https://docs.pingcap.com/tidb/stable/system-variables))

Both the **semantics** and **naming conventions** differ between these two methods:
Copy link
Member

@breezewish breezewish Dec 10, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you supply a real user scenario that would require this non-persisted instance scoped variables? From my perspective, setting a variable in instance level but not persisting it will lead to confusion in many cases. For example, if I really would like to enable a feature for only one instance, I would be very surprised that the change is reverted at some point, cause TiDB is designed to be stateless and frequent restart is allowed to happen.

More over, considering that in many cases user is using TiDB behind a LB, and there may be even no way to access individual instances (for example, in TiDB Cloud), would it be better to allow remotely setting the instance variables?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is almost more a question of what should be a candidate for instance scope. Eason's opinion is that we should always default to global scope for management simplicity, and actually few things should be instance. I think CRDB gets this right in that the node specific options almost always refer to local resources (max memory, tmpdir path, log file path, port, socket). We probably get a few things wrong currently, but are mostly on the right path.

If we look from these examples: you might need to shuffle around the tmpdir on a particular TiDB server instance to keep it up and running (and might not want to restart it). So you would have to log into the TiDB instance to make this change, but it would not affect users running through a LB.

As to the semantic of "reverted", if we were to persist the change, it is not really any less confusing because there is still a precedence problem of config file vs. saved in TiDB (i.e. "I set this option in the config file but it's not working!"). The semantic is also how MySQL works (although there is a newer feature called SET PERSIST), TiDB is special in that it persists all global config.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have added a section on "Documentation Changes", PTAL. I think this is easier to explain

Copy link
Member

@breezewish breezewish Dec 11, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the supply! Does this mean that, we will have a clear rule that for these settings, an INSTANCE variable can be used to temporarily updating the config (and taking effect immediately), while user should use config file for persistently updating the config (but taking effect after a reboot)?

I do like CRDB's config mechanism a lot. There aren't a lot of ways like TiDB to configure. They only have two, which is very easy for user to learn. User will not be confused about persistence or things like that:

a. CLI Args - Instance level, restart to take effect, persisted
b. Variable - Cluster level, take effect immediately, persisted

Notice that under CRDB's model it is not possible to adjust some instance level things and take effect without a restart.

Copy link
Contributor Author

@morgo morgo Dec 15, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Notice that under CRDB's model it is not possible to adjust some instance level things and take effect without a restart.

Yes exactly. The other limitation is explaining why 2 nodes are behaving differently: in MySQL SHOW GLOBAL VARIABLES captures all configuration, but since memory in CRDB is a node-level option, it wouldn't be captured, but it is in this proposal since everything has a sysvar mapping (step 2).

Edit: even if TiDB only has 2 sources currently, the split between them is not intuitive to users. We will benefit from a source of truth.

Does this mean that, we will have a clear rule that for these settings, an INSTANCE variable can be used to temporarily updating the config (and taking effect immediately), while user should use config file for persistently updating the config (but taking effect after a reboot)?

Yes, I propose in documentation we describe the semantic as "Persists to Cluster: Bool" (instead of INSTANCE vs. GLOBAL). Persists to Cluster: No is identical to how MySQL global variables, so it is easy to explain. We do not need to use the term Instance scope vs. global scope for users - since they are both set with SET GLOBAL.


1. Configuration file settings only apply to a single TiDB server instance. Making changes to settings managed by the configuration file requires restarting TiDB server.
2. System variables _natively_ manage settings that have _session_ (connection) or _global_ (cluster-wide) scope. However, some _session_ scoped variables are used to allow instance configuration. This is not a feature that is natively supported, and the usage is often confusing.
3. The configuration file uses a heirachy of sections such as `[performance]` or `[experimental]`.
4. System variables use are flat naming convention (but often use a prefix of `tidb_feature_XXX`). Thus mapping between the two is not straight forward.

This proposal introduces _native_ support for `INSTANCE` scoped variables, such that system variables offer the superset of functionality of configuration files. It does this using a much simplified implementation from earlier proposals: an individual system variable **must not** permit both `GLOBAL` and `INSTANCE` scope. Implementing with this restriction is key to this proposal; since it avoids a confusing set of precedence rules which are hard to understand. For alternative proposals see "investigation and alternatives".

## Motivation or Background

The motivation for this proposal is ease of use and maintainability:

1. It's not clear if each setting should be a configuration file setting or a system variable (we usually chose system variable, but there is no clear rule). After this proposal is implemented, we can make every setting available as a system variable (even if read-only).
2. We are being asked to make additional configuration file settings dynamically configuration (via a system variable). When we do this, the naming is inconsistent. After this proposal is implemented, we can allow the `.toml` file to support an `[instance]` section where the flat named for system variables can be used. This makes it identical to how system variables are configured in MySQL.
3. The current system is hard for users. It's hard to explain the two systems, and hard to explain the differences between them. Often "INSTANCE" scoped system variables are incorrectly documented as `SESSION`, since it's not a native behavior. Explaining how `SESSION` scope is hijacked is difficult for MySQL users to understand, because usually changes in a different session should not affect your session.

## Detailed Design

### Stage 1: Initial Implementation

From a user-oriented point of view, setting an instance scoped sysvar is the same as setting a `GLOBAL` variable:

```sql
SET GLOBAL max_connections=1234;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will this confuse the customer that SET GLOBAL is used by instance and global sysvars? How could they know if a var is instance or global level?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, you will need to read the docs to see if a variable is INSTANCE or GLOBAL scoped. This confusion is existing with the session scope, but actually worse since it's not handled by the framework the docs for many INSTANCE variables say they are SESSION scoped (we will finally be able to auto generate docs and say something is "INSTANCE" scoped versus a manual workaround: https://github.com/pingcap/docs/blob/8600643f2fcd20d70d8ad8f637009995948c48c6/scripts/generate-system-variables.go#L142-L152 ).

The semantics for INSTANCE are also much closer to global than session, with only two exceptions:

  • Changes don't propagate to other tidb servers.
  • Changes don't persist.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have added a section on "Documentation Changes", PTAL

```

Because `INSTANCE` and `GLOBAL` are mutually exclusive, the only semantic difference is that changes to `INSTANCE` scoped variables are not persisted and are not propagated to other TiDB servers.

From a sysvar framework perspective the changes required are quite minimal. A new scope is added, and validating if setting permits `GLOBAL` scope also permits `INSTANCE` scope:

```
+++ b/sessionctx/variable/sysvar.go
@@ -56,6 +56,8 @@ const (
ScopeGlobal ScopeFlag = 1 << 0
// ScopeSession means the system variable can only be changed in current session.
ScopeSession ScopeFlag = 1 << 1
+ // ScopeInstance means it is similar to global but doesn't propagate to other TiDB servers.
+ ScopeInstance ScopeFlag = 1 << 2

// TypeStr is the default
TypeStr TypeFlag = 0
@@ -257,6 +259,11 @@ func (sv *SysVar) HasGlobalScope() bool {
return sv.Scope&ScopeGlobal != 0
}

+// HasInstanceScope returns true if the scope for the sysVar includes global or instance
+func (sv *SysVar) HasInstanceScope() bool {
+ return sv.Scope&ScopeInstance != 0
+}
+
// Validate checks if system variable satisfies specific restriction.
func (sv *SysVar) Validate(vars *SessionVars, value string, scope ScopeFlag) (string, error) {
// Check that the scope is correct first.
@@ -313,7 +320,7 @@ func (sv *SysVar) validateScope(scope ScopeFlag) error {
if sv.ReadOnly || sv.Scope == ScopeNone {
return ErrIncorrectScope.FastGenByArgs(sv.Name, "read only")
}
- if scope == ScopeGlobal && !sv.HasGlobalScope() {
+ if scope == ScopeGlobal && !(sv.HasGlobalScope() || sv.HasInstanceScope()) {
```

The session package handles loading/saving GLOBAL variable values. It will need minor changes to make Instance Scope a noop operation (since persistence is not supported, and in the initial implementation a `GetGlobal()` function will be used to retrieve the instance value:

```
+++ b/session/session.go
@@ -1103,6 +1103,10 @@ func (s *session) GetGlobalSysVar(name string) (string, error) {
return "", variable.ErrUnknownSystemVar.GenWithStackByArgs(name)
}

+ if sv.HasInstanceScope() { // has INSTANCE scope only, not pure global
+ return "", errors.New("variable has only instance scope and no GetGlobal func. Not sure how to handle yet.")
+ }
+
sysVar, err := domain.GetDomain(s).GetGlobalVar(name)
if err != nil {
// The sysvar exists, but there is no cache entry yet.
@@ -1121,6 +1125,7 @@ func (s *session) GetGlobalSysVar(name string) (string, error) {
}

// SetGlobalSysVar implements GlobalVarAccessor.SetGlobalSysVar interface.
+// it is used for setting instance scope as well.
func (s *session) SetGlobalSysVar(name, value string) (err error) {
sv := variable.GetSysVar(name)
if sv == nil {
@@ -1132,6 +1137,9 @@ func (s *session) SetGlobalSysVar(name, value string) (err error) {
if err = sv.SetGlobalFromHook(s.sessionVars, value, false); err != nil {
return err
}
+ if sv.HasInstanceScope() { // skip for INSTANCE scope
+ return nil
+ }
if sv.GlobalConfigName != "" {
domain.GetDomain(s).NotifyGlobalConfigChange(sv.GlobalConfigName, variable.OnOffToTrueFalse(value))
}
@@ -1148,6 +1156,9 @@ func (s *session) SetGlobalSysVarOnly(name, value string) (err error) {
if err = sv.SetGlobalFromHook(s.sessionVars, value, true); err != nil {
return err
}
+ if !sv.HasInstanceScope() { // skip for INSTANCE scope
+ return nil
+ }
return s.replaceGlobalVariablesTableValue(context.TODO(), sv.Name, value)
}
```

A "non-native" `INSTANCE` variable can be changed to a native one as follows:

```
+ {Scope: ScopeInstance, Name: TiDBGeneralLog, Value: BoolToOnOff(DefTiDBGeneralLog), Type: TypeBool, skipInit: true, SetGlobal: func(s *SessionVars, val string) error {
ProcessGeneralLog.Store(TiDBOptOn(val))
return nil
- }, GetSession: func(s *SessionVars) (string, error) {
+ }, GetGlobal: func(s *SessionVars) (string, error) {
return BoolToOnOff(ProcessGeneralLog.Load()), nil
}},
```

This introduces a compatibility issue, since users who have previously configured instance scope with `SET [SESSION] tidb_general_log = x` will receive an error because the scope is wrong. This can be fixed by adding a legacy mode to the sysvar framework which allows `INSTANCE` scoped variables to be set with either `SET GLOBAL` or `SET SESSION`:

```sql
SET GLOBAL tidb_enable_legacy_instance_scope = 1;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thinking together with instance semantic: https://github.com/pingcap/tidb/pull/30558/files#diff-af084f8d970428eecaaeb42d9b43dcd9e0155602726dec822bda47e0071a0896R51

I guess when tidb_enable_legacy_instance_scope is enabled, the user may face instance-level var with both 'SESSION' and 'GLOBAL' keywords:

SET SESSION tidb_general_log = x

SET GLOBAL max_connections = x

Both of the two SETs are actually working in 'instance' scope. Am I right? if so I think this can be confusing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's right. We should add a warning for the SET SESSION version. I agree it is confusing, which is why I suggest we consider later changing the default.

Unfortunately it is required to keep compatibility. But it is important to transition INSTANCE to using the GLOBAL keyword, because SESSION semantics are different.

```

We can default to `TRUE` for now, but should revisit this in the future.

### Stage 2: Mapping All Configuration File Settings to System Variables
morgo marked this conversation as resolved.
Show resolved Hide resolved

The second stage is to map all configuration file settings to system variable names. In MySQL any configuration file setting is also available as a system variable (even if it is read only), which makes it possible to get the configuration of the cluster with `SHOW GLOBAL VARIABLES`. Once this step is complete, it should also be possible to offer an `[instance]` section of the configuration file which accepts the system variable names (and finally consistent naming between the two systems). We can then modify the example configuration files to be based on the flat `[instance]` configuration:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this mean that if I change the value of an system variable, the value in the configuration file will be changed as well ? or those var from config file is readonly?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The default will be that we map variables as read-only, so we don't need to think about the specific cases (i.e. you can't easily change the socket or port). This helps us achieve completeness first, and then we can then evaluate which ones can be made dynamic.

When you modify an instance scoped system variable, it modifies in memory only (similar to SET GLOBAL in MySQL). This does not propose modifying or rewriting the config file (that would make it lose formatting, and it gets complex quickly).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added this clarification to the proposal, so I'm going to resolve this conversation.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Why do we offer system variables in the configuration file? To allow users to set the initial values through configuration files? Or to be compatible with MySQL?
  • Initial values of some system variables are already configured in the configuration file, so they need to be deduplicated from showing global variables or the configuration file.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we offer system variables in the configuration file? To allow users to set the initial values through configuration files? Or to be compatible with MySQL?

The goal is to (1) have system variables show the superset of all configuration and (2) have unified names between configuration. These do technically improve MySQL compatibility, but I would say these are also good practices.

In the case of instance variables, they do not persist on restart so a configuration file is required to handle this case. In cockroachdb they don't have this though: node (their term for instance) is only supported as startup parameters, everything else is a cluster wide variable. But I think this is also an incompatible change.. by making every config setting available through systerm variables and settable in the config file using the system variable name (under [instance] heading) gives us a better transition state to unification.

Initial values of some system variables are already configured in the configuration file, so they need to be deduplicated from showing global variables or the configuration file.

Because instance values do not persist, this behavior is straight forward. The configuration file values will be taken, and if not specified, compiled server defaults. This is also technically a MySQL compatible way for GLOBAL variables.


```
# tidb.toml
[instance]
max_connections = 1234
tidb_general_log = "/path/to/file"
morgo marked this conversation as resolved.
Show resolved Hide resolved
```

### Stage 3: Refactoring

The use of a `GetGlobal()` and `SetGlobal()` func for each instance scoped system variable is not ideal. It is possible to refactor the system variable framework so that instance scope is stored in a map, and the values are updated automatically by `SET GLOBAL` on an instance scoped variable. On startup, as the configuration file is parsed it will update the values in the map. This seems like a better approach than the current use of Setters/Getters, and because there is a prescribed way of doing it we can correctly handle the data races that are common with our current incorrect usage of calling `config.GetGlobalConfig()`.

Thus, the source of truth for instance scoped variables moves from the `config` package to another part of the server (likely `domain`). See [issue #30366](https://github.com/pingcap/tidb/issues/30366).

## Impacts & Risks

Biggest risk is that we can not agree on a reduced scope of implementation and the project extends to cover increased scope. Configuration management is a classic example of a [bikeshed problem](https://en.wikipedia.org/wiki/Law_of_triviality), and we will likely need to make some tradeoffs to get anywhere.

Many of the alternatives are difficult to implement because they break compatibility (we need to live with `GLOBAL` means cluster `GLOBAL`, and there is no `SET CLUSTER`) or they break rolling upgrade scenarios because we currently have no specific rules of the minimum version of TiDB which can be upgraded from.

## Investigation & Alternatives

- There is no equiverlant functionality to reference in MySQL since it does not have the native concept of a cluster.
- An [earlier proposal](https://docs.google.com/document/d/1RuajYAFVsjJCwCBpIYF--9jtgxDZZcz3cbP-iVuLxJI/edit) for `INSTANCE` scope, with rules for precedence (same author)
- CockroachDB has cluster level settings and node-level settings. Most settings are cluster level, and [node-level](https://www.cockroachlabs.com/docs/v21.2/cockroach-start) needs to be parsed as arguments when starting the server.

## Unresolved Questions

There are **very few** known scenarios where a system variable is *currently both* instance and global scoped, but these individually would need to be handled:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still don't get it: why do we replace global with instance instead of replacing session with instance? To users, they are both hard to understand and you need to explain it in the user docs anyway.
Are there any cases where a system variable is both session and instance scoped? I don't think so.

Copy link
Contributor Author

@morgo morgo Dec 10, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's correct, there are no cases where a variable is both session and instance.

Session by semantics is supposed to be thread-local-storage. When the initial connection is created to MySQL it will copy all the values of global variables that also have session scope, and any further changes to global will not affect those session changes. It should only be changes that you make in your connection that change the value.

What I am calling INSTANCE is actually the identical semantics to MySQL GLOBAL. They are much closer in behavior: the only difference is you will need to check the docs to see if it is an INSTANCE-GLOBAL or a GLOBAL-GLOBAL if you want to know if your change (1) propagates to the cluster and (2) persists.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is listed as motivation #3 in "Motivation and Background".

Copy link
Contributor

@djshow832 djshow832 Dec 10, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know the semantics between session and instance is different, but the semantics between instance and global is also different. From users' view:

  • If you set a global system variable and then reboot TiDB, the setting should still take effect, but it actually does not.
  • If you set a global system variable and then connect to TiDB from another session, the setting should take effect, but it may not.

So I don't think instance is closer to session.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you set a global system variable and then connect to TiDB from another session, the setting should take effect, but it may not.

This is not the case since ~TiDB 5.2. On a single-instance basis if you set a GLOBAL system variable and then connect from another session it will take effect unless the scope is SESSION + GLOBAL, and the other session pre-originated the setting. On a multi-instance basis there is an etcd notification sent immediately, but there is a race where you could beat it.

So I don't think instance is closer to session.

I was arguing it is closer to MySQL GLOBAL, or rather it is identical. TiDB GLOBAL does not exist in MySQL.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have added a section on "Documentation Changes", PTAL


- `tidb_store_limit`: Suggestion is to convert to global only ([issue #30515](https://github.com/pingcap/tidb/issues/30515))
- `tidb_stmt_summary_XXX`: Suggestion is to convert to global only.
- `tidb_enable_stmt_summary`: Convert to instance scope only.
morgo marked this conversation as resolved.
Show resolved Hide resolved

Thus, turning on statement summary is a per-server decision, but the configuration for statement summary is a per-cluster decision.