[#2742] YSQL: Enable src/test/isolation Pg tests using java framework and fix rare SKIP LOCKED bugs

Summary:
Part 1
-------
Add java class org.yb.pgsql.TestPgIsolationRegress to support running tests in
src/test/isolation (similar to src/test/regress).

Enable tests skip-locked and skip-locked-2. Without the fix explained in
Part 2 below, the skip-locked test would rarely fail due to a bug (also
explained in Part 2).

Two extra tests are added -

(1) yb-modification-followed-by-lock to test that a modification doesn't wait on
a lock that was held by a recently committed transaction. This test would have
failed rarely without the fix in Part 2, due to the same bug that affects the
skip-locked test.

(2) yb-skip-locked-after-update to assert that a statement with SKIP LOCKED
actually skips locking a tuple that was concurrently modified by another
transaction. The test asserts the skip locked behaviour for three different
states of the concurrent transaction that modified the tuple: pending,
committed but not applied, committed and applied.

Right now, the assertions fail in the latter two cases, where the transaction
has committed but not applied or has committed and applied. Part 3 explains the
reason for the bug along with the fix.

Part 2
-------

(1) In skip-locked test, the step s2b rarely results in a serialization error
in permutation [s1a s1b s2a s1c s2b s2c]. This happens if the intents of s1
haven't yet moved to regular db before conflict resolution of s2b starts. The
following steps in conflict resolution of s2b lead to a serialization error in
this scenario -
  (a) s2b sees a conflicting row lock intent in intents db that corresponds to
      transaction in s1 (in ReadIntentConflicts())
  (b) s2b finds s1's transaction to be committed and hence aborts itself (as
      part of CheckLocalCommits())

In step (b), it is incorrect to abort if the conflicting intents of committed
transaction 1 corresponded only to an explicit lock and not to any modification
of the regular db (which is true for statements s1a and s1b). Instead, s2b
should ignore such intents of transaction 1, since they are only explicit row
lock intents and locks are released post commit. The fix adds logic to ignore
conflicting intents of committed transactions that correspond only to explicit
row lock intents (i.e., those which can only block but not change any data).
This is achieved via a new field "all_lock_only_conflicts" in
conflict_resolution.cc
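The core of the fix can be sketched as a simplified, self-contained model (the struct and function names below are illustrative, not the actual conflict_resolution.cc types): a committed transaction's conflicts are ignored only if every conflicting intent is an explicit row lock.

```cpp
#include <cassert>
#include <vector>

// Simplified model: an intent either only locks a row (e.g. from
// SELECT ... FOR UPDATE) or also modifies data. Names here are
// illustrative, not the real YugabyteDB types.
struct ConflictingIntent {
  bool lock_only;  // true for explicit row-lock intents
};

// A committed transaction's conflicts can be ignored iff all of its
// conflicting intents are explicit row locks: locks are released at
// commit and never change data in the regular db, so they cannot
// invalidate the waiting transaction.
bool ShouldIgnoreCommittedConflicts(
    const std::vector<ConflictingIntent>& conflicts) {
  bool all_lock_only_conflicts = true;  // the field the fix introduces
  for (const auto& intent : conflicts) {
    all_lock_only_conflicts = all_lock_only_conflicts && intent.lock_only;
  }
  return all_lock_only_conflicts;
}
```

With a check of this shape, s2b in permutation [s1a s1b s2a s1c s2b s2c] sees that s1's committed intents are lock-only and proceeds instead of raising a serialization error.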

(2) In cases other than (1), i.e., if the transaction in s1 has already been
applied before conflict resolution for s2b is done, there is no serialization
error. This is because no intents are moved to the regular db as part of the
"apply" step of s1 (which in turn is because the intents are lock-only intents,
not modifications). So s2b doesn't see any conflicts in either the intents db
or the regular db.

This fix of ignoring explicit lock-only intents of committed transactions
solves the rare issue in the skip-locked test mentioned above, which occurs
when the steps described in (1) happen. Moreover, the fix also ensures that
there is no serialization error in the newly added test
yb-modification-followed-by-lock (the error would have occurred in this test
as well without the fix).

Part 3
------
As mentioned earlier, SKIP LOCKED functionality fails in
yb-skip-locked-after-update if the concurrently modified tuple is part of a
transaction that has committed but not applied or has committed and applied. The
reason for this is:

(1) "committed and applied" case: currently, an explicit locking rpc with SKIP
LOCKED will abort if it finds a modified row in regular db with commit time
after read time. We should instead skip locking this row exactly as done in
CheckPriorityInternal() in D13352 i.e., by throwing
TransactionErrorCode::kSkipLocking if the policy is WAIT_SKIP. We check for
conflicts with regular db in StrongConflictChecker::Check() and the fix is to
add the extra handling here.

(2) "committed but not applied" case: even if we don't find conflicting
modifications in regular db that would lead to aborting or skipping the tuple
(i.e., case (1)), we might still find committed conflicting modifications in
intents db. This might happen if a transaction that modified the tuple has
committed but hasn't been applied yet (i.e., intents haven't moved to regular db
yet). Such conflicts are handled in CheckConflictWithCommitted(). Right now we
just abort here or ignore the conflict if all_lock_only_conflicts=true after fix
in Part 1. An extra check similar to (1) is needed here.

These two bugs arise because we failed to throw the
TransactionErrorCode::kSkipLocking error when the policy is WAIT_SKIP at the
two places mentioned above, in addition to CheckPriorityInternal() in D13352
(the diff that added SKIP LOCKED functionality).
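Both fixes follow the same pattern. A minimal sketch of the missing check (the enum values, timestamps, and function name are illustrative assumptions, not the real YugabyteDB API) that both StrongConflictChecker::Check() and CheckConflictWithCommitted() need:

```cpp
#include <cassert>
#include <cstdint>

// Illustrative sketch of the wait-policy handling that was missing in
// the two code paths above; names are assumptions, not the actual API.
enum class WaitPolicy { WAIT_BLOCK, WAIT_SKIP };
enum class Outcome { kProceed, kConflict, kSkipLocking };

Outcome CheckCommittedConflict(uint64_t commit_time, uint64_t read_time,
                               WaitPolicy policy) {
  if (commit_time <= read_time) {
    // The conflicting change is already visible to our snapshot.
    return Outcome::kProceed;
  }
  // Committed after our read time. Previously this always produced a
  // serialization error; with SKIP LOCKED the row must instead be
  // skipped (TransactionErrorCode::kSkipLocking in the real code).
  return policy == WaitPolicy::WAIT_SKIP ? Outcome::kSkipLocking
                                         : Outcome::kConflict;
}
```

The same branch applies whether the conflicting committed write is found in the regular db (case 1) or still in the intents db (case 2); only the call site differs.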

Note that case (2) is very hard to simulate in a test: small transactions spend
so little time moving intents to the regular db in the apply stage that the
window is almost never hit. To test the case reliably,
testIsolationRegressWithDelayedTxnApply() has been added to
TestPgIsolationRegress. It sets TEST_inject_sleep_before_applying_intents_ms so
that a transaction sleeps in the apply stage and case (2) is hit with
certainty.

Test Plan: ./yb_build.sh --java-test org.yb.pgsql.TestPgIsolationRegress#testIsolationRegress

Reviewers: mihnea, sergei, jason, dmitry

Reviewed By: dmitry

Subscribers: alex, rsami, jason, neil, yql

Differential Revision: https://phabricator.dev.yugabyte.com/D14362
pkj415 committed Feb 5, 2022
1 parent 323f2ca commit 85531fb
Showing 18 changed files with 916 additions and 114 deletions.
33 changes: 9 additions & 24 deletions java/yb-pgsql/src/test/java/org/yb/pgsql/BasePgSQLTest.java
@@ -109,11 +109,11 @@ public class BasePgSQLTest extends BaseMiniClusterTest {

protected static boolean pgInitialized = false;

public void runPgRegressTest(File inputDir, String schedule, long maxRuntimeMillis)
throws Exception {
public void runPgRegressTest(
File inputDir, String schedule, long maxRuntimeMillis, File executable) throws Exception {
final int tserverIndex = 0;
PgRegressRunner pgRegress = new PgRegressRunner(inputDir, schedule, maxRuntimeMillis);
ProcessBuilder procBuilder = new PgRegressBuilder()
ProcessBuilder procBuilder = new PgRegressBuilder(executable)
.setDirs(inputDir, pgRegress.outputDir())
.setSchedule(schedule)
.setHost(getPgHost(tserverIndex))
@@ -127,36 +127,21 @@ public void runPgRegressTest(File inputDir, String schedule, long maxRuntimeMill
}

public void runPgRegressTest(File inputDir, String schedule) throws Exception {
runPgRegressTest(inputDir, schedule, 0 /* maxRuntimeMillis */);
runPgRegressTest(
inputDir, schedule, 0 /* maxRuntimeMillis */,
PgRegressBuilder.PG_REGRESS_EXECUTABLE);
}

public void runPgRegressTest(String schedule, long maxRuntimeMillis) throws Exception {
File inputDir = PgRegressBuilder.getPgRegressDir();
runPgRegressTest(inputDir, schedule, maxRuntimeMillis);
runPgRegressTest(
PgRegressBuilder.PG_REGRESS_DIR /* inputDir */, schedule, maxRuntimeMillis,
PgRegressBuilder.PG_REGRESS_EXECUTABLE);
}

public void runPgRegressTest(String schedule) throws Exception {
runPgRegressTest(schedule, 0 /* maxRuntimeMillis */);
}

private static int getRetryableRpcSingleCallTimeoutMs() {
if (TestUtils.isReleaseBuild()) {
return 10000;
} else if (TestUtils.IS_LINUX) {
if (BuildTypeUtil.isASAN()) {
return 20000;
} else if (BuildTypeUtil.isTSAN()) {
return 45000;
} else {
// Linux debug builds.
return 15000;
}
} else {
// We get a lot of timeouts in macOS debug builds.
return 45000;
}
}

public static void perfAssertLessThan(double time1, double time2) {
if (TestUtils.isReleaseBuild()) {
assertLessThan(time1, time2);
117 changes: 67 additions & 50 deletions java/yb-pgsql/src/test/java/org/yb/pgsql/PgRegressBuilder.java
@@ -25,23 +25,29 @@
import java.io.PrintWriter;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
* Build a ProcessBuilder for pg_regress. Also, set up the output directory.
*/
public class PgRegressBuilder {
protected static final Logger LOG = LoggerFactory.getLogger(PgRegressBuilder.class);

protected static File pgBinDir = new File(TestUtils.getBuildRootDir(), "postgres/bin");
protected static File pgRegressDir = new File(TestUtils.getBuildRootDir(),
"postgres_build/src/test/regress");
protected static File pgRegressExecutable = new File(pgRegressDir, "pg_regress");
protected static final File PG_REGRESS_DIR = new File(TestUtils.getBuildRootDir(),
"postgres_build/src/test/regress");
public static final File PG_REGRESS_EXECUTABLE = new File(PG_REGRESS_DIR, "pg_regress");

protected static final File PG_ISOLATION_REGRESS_DIR =
new File(TestUtils.getBuildRootDir(), "postgres_build/src/test/isolation");
public static final File PG_ISOLATION_REGRESS_EXECUTABLE = new File(PG_ISOLATION_REGRESS_DIR,
"pg_isolation_regress");

protected static final File pgBinDir = new File(TestUtils.getBuildRootDir(), "postgres/bin");
protected File executable;
protected List<String> args;
protected Map<String, String> extraEnvVars;
protected Map<String, String> extraEnvVars = new TreeMap<>();

protected File inputDir;
protected File outputDir;
@@ -50,16 +56,20 @@ public static File getPgBinDir() {
return pgBinDir;
}

public static File getPgRegressDir() {
return pgRegressDir;
private boolean isRegressExecutable() {
return executable.compareTo(PG_REGRESS_EXECUTABLE) == 0;
}

protected PgRegressBuilder() {
protected PgRegressBuilder(File executable) {
this.executable = executable;
this.args = new ArrayList<>(Arrays.asList(
pgRegressExecutable.toString(),
executable.toString(),
"--bindir=" + pgBinDir,
"--dlpath=" + pgRegressDir,
"--use-existing"));

if (isRegressExecutable()) {
args.add("--dlpath=" + PG_REGRESS_DIR);
}
}

public PgRegressBuilder setDirs(File inputDir, File outputDir) throws RuntimeException {
@@ -71,32 +81,34 @@ public PgRegressBuilder setDirs(File inputDir, File outputDir) throws RuntimeExc
throw new RuntimeException("Failed to create directory " + outputDir);
}

// Copy files needed by pg_regress. "input" and "output" don't need to be copied since they can
// be read from inputDir. Their purpose is to generate files into "expected" and "sql" in
// outputDir (implying "expected" and "sql" should be copied). "data" doesn't need to be copied
// since it's only read from (by convention), which can be done from inputDir.
try {
for (String name : new String[]{"expected", "sql"}) {
FileUtils.copyDirectory(new File(inputDir, name), new File(outputDir, name));
if (isRegressExecutable()) {
// Copy files needed by pg_regress. "input" and "output" don't need to be copied since they
// can be read from inputDir. Their purpose is to generate files into "expected" and "sql" in
// outputDir (implying "expected" and "sql" should be copied). "data" doesn't need to be
// copied since it's only read from (by convention), which can be done from inputDir.
try {
for (String name : new String[]{"expected", "sql"}) {
FileUtils.copyDirectory(new File(inputDir, name), new File(outputDir, name));
}
} catch (IOException ex) {
LOG.error("Failed to copy a directory from " + inputDir + " to " + outputDir);
throw new RuntimeException(ex);
}
} catch (IOException ex) {
LOG.error("Failed to copy a directory from " + inputDir + " to " + outputDir);
throw new RuntimeException(ex);
}

// TODO(dmitry): Workaround for #1721, remove after fix.
try {
for (File f : (new File(outputDir, "sql")).listFiles()) {
if (!f.setWritable(true)) {
throw new IOException("Couldn't set write permissions for " + f.getAbsolutePath());
}
try (FileWriter fr = new FileWriter(f, true)) {
fr.write("\n-- YB_DATA_END\nROLLBACK;DISCARD TEMP;");
// TODO(dmitry): Workaround for #1721, remove after fix.
try {
for (File f : (new File(outputDir, "sql")).listFiles()) {
if (!f.setWritable(true)) {
throw new IOException("Couldn't set write permissions for " + f.getAbsolutePath());
}
try (FileWriter fr = new FileWriter(f, true)) {
fr.write("\n-- YB_DATA_END\nROLLBACK;DISCARD TEMP;");
}
}
} catch (IOException ex) {
LOG.error("Failed to write YB_DATA_END footer to sql file");
throw new RuntimeException(ex);
}
} catch (IOException ex) {
LOG.error("Failed to write YB_DATA_END footer to sql file");
throw new RuntimeException(ex);
}

args.add("--inputdir=" + inputDir);
@@ -113,27 +125,32 @@ public PgRegressBuilder setSchedule(String schedule) {
}

File scheduleInputFile = new File(inputDir, schedule);
File scheduleOutputFile = new File(outputDir, schedule);

// Copy the schedule file, replacing some lines based on the operating system.
try (BufferedReader scheduleReader = new BufferedReader(new FileReader(scheduleInputFile));
PrintWriter scheduleWriter = new PrintWriter(new FileWriter(scheduleOutputFile))) {
String line;
while ((line = scheduleReader.readLine()) != null) {
line = line.trim();
if (line.equals("test: yb_pg_inet") && !TestUtils.IS_LINUX) {
// We only support IPv6-specific tests in yb_pg_inet.sql on Linux, not on macOS.
line = "test: yb_pg_inet_ipv4only";

if (isRegressExecutable()) {
File scheduleOutputFile = new File(outputDir, schedule);

// Copy the schedule file, replacing some lines based on the operating system.
try (BufferedReader scheduleReader = new BufferedReader(new FileReader(scheduleInputFile));
PrintWriter scheduleWriter = new PrintWriter(new FileWriter(scheduleOutputFile))) {
String line;
while ((line = scheduleReader.readLine()) != null) {
line = line.trim();
if (line.equals("test: yb_pg_inet") && !TestUtils.IS_LINUX) {
// We only support IPv6-specific tests in yb_pg_inet.sql on Linux, not on macOS.
line = "test: yb_pg_inet_ipv4only";
}
LOG.info("Schedule output line: " + line);
scheduleWriter.println(line);
}
LOG.info("Schedule output line: " + line);
scheduleWriter.println(line);
} catch (IOException ex) {
LOG.error("Failed to write schedule to " + outputDir);
throw new RuntimeException(ex);
}
} catch (IOException ex) {
LOG.error("Failed to write schedule to " + outputDir);
throw new RuntimeException(ex);
args.add("--schedule=" + new File(outputDir, schedule));
} else {
args.add("--schedule=" + scheduleInputFile);
}

args.add("--schedule=" + new File(outputDir, schedule));
return this;
}

@@ -0,0 +1,50 @@
// Copyright (c) YugaByte, Inc.
//
// Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except
// in compliance with the License. You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software distributed under the License
// is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express
// or implied. See the License for the specific language governing permissions and limitations
// under the License.
//

package org.yb.pgsql;

import java.util.Collections;

import org.junit.Test;
import org.junit.runner.RunWith;
import org.yb.util.YBTestRunnerNonTsanOnly;

@RunWith(value=YBTestRunnerNonTsanOnly.class)
public class TestPgIsolationRegress extends BasePgSQLTest {

private void runIsolationRegressTest() throws Exception {
runPgRegressTest(
PgRegressBuilder.PG_ISOLATION_REGRESS_DIR /* inputDir */, "yb_pg_isolation_schedule",
0 /* maxRuntimeMillis */, PgRegressBuilder.PG_ISOLATION_REGRESS_EXECUTABLE);
}

@Test
public void isolationRegress() throws Exception {
runIsolationRegressTest();
}

@Test
public void withDelayedTxnApply() throws Exception {
// The reason for running all tests in the schedule again with
// TEST_inject_sleep_before_applying_intents_ms is the following: our tests usually have very
// small transactions such that the step of applying intents to regular db after commit is
// instantaneous enough that the code path which checks and resolves conflicts with committed
// transactions (that are not yet applied), is not exercised. To ensure we exercise this rare
// code path of performing conflict resolution with committed but not yet applied transactions,
// we inject a sleep before applying intents.
restartClusterWithFlags(Collections.emptyMap(),
Collections.singletonMap("TEST_inject_sleep_before_applying_intents_ms",
"100"));
runIsolationRegressTest();
}
}
4 changes: 2 additions & 2 deletions java/yb-pgsql/src/test/java/org/yb/pgsql/TestYsqlDump.java
@@ -114,7 +114,7 @@ public void testPgDumpVerifyOutput() throws Exception {
restartCluster(); // create a new cluster for this test
markClusterNeedsRecreation(); // create a new cluster for the next test

File pgRegressDir = PgRegressBuilder.getPgRegressDir();
File pgRegressDir = PgRegressBuilder.PG_REGRESS_DIR;

final int tserverIndex = 0;
File pgBinDir = PgRegressBuilder.getPgBinDir();
@@ -173,7 +173,7 @@ void testPgDumpHelper(final String binaryName,
final String expectedFileRelativePath,
final boolean includeYbMetadata) throws Exception {
// Location of Postgres regression tests
File pgRegressDir = PgRegressBuilder.getPgRegressDir();
File pgRegressDir = PgRegressBuilder.PG_REGRESS_DIR;

// Create the data
try (BufferedReader inputIn = createFileReader(new File(pgRegressDir,
1 change: 1 addition & 0 deletions python/yb/yb_dist_tests.py
@@ -321,6 +321,7 @@ def set_global_conf_from_dict(global_conf_dict: Dict[str, str]) -> GlobalTestCon
'thirdparty_url.txt',
'postgres_build/contrib',
'postgres_build/src/test/regress',
'postgres_build/src/test/isolation',
]

ARCHIVED_PATHS_IN_SRC_DIR = [
@@ -0,0 +1,20 @@
Parsed test spec with 2 sessions

starting permutation: s1a s2a s1b s1c s2b s2c
step s1a: SELECT * FROM queue; -- this is just to ensure we have picked the read point
id data status

1 foo NEW
2 bar NEW
step s2a: SELECT * FROM queue; -- this is just to ensure we have picked the read point
id data status

1 foo NEW
2 bar NEW
step s1b: SELECT * FROM queue ORDER BY id FOR UPDATE LIMIT 1;
id data status

1 foo NEW
step s1c: COMMIT;
step s2b: UPDATE queue set status='OLD' WHERE id=1;
step s2c: COMMIT;
@@ -0,0 +1,62 @@
Parsed test spec with 2 sessions

starting permutation: s1a s2a s1b s2b s1c s2c
step s1a: SELECT * FROM queue; -- this is just to ensure we have picked the read point
id data status

1 foo NEW
2 bar NEW
step s2a: SELECT * FROM queue; -- this is just to ensure we have picked the read point
id data status

1 foo NEW
2 bar NEW
step s1b: UPDATE queue set status='OLD' WHERE id=1;
step s2b: SELECT * FROM queue ORDER BY id FOR UPDATE SKIP LOCKED LIMIT 1;
id data status

2 bar NEW
step s1c: COMMIT;
step s2c: COMMIT;

starting permutation: s1a s2a s1b s1c s2b s2c
step s1a: SELECT * FROM queue; -- this is just to ensure we have picked the read point
id data status

1 foo NEW
2 bar NEW
step s2a: SELECT * FROM queue; -- this is just to ensure we have picked the read point
id data status

1 foo NEW
2 bar NEW
step s1b: UPDATE queue set status='OLD' WHERE id=1;
step s1c: COMMIT;
step s2b: SELECT * FROM queue ORDER BY id FOR UPDATE SKIP LOCKED LIMIT 1;
id data status

2 bar NEW
step s2c: COMMIT;

starting permutation: s1a s2a s1b s1c s2_sleep s2b s2c
step s1a: SELECT * FROM queue; -- this is just to ensure we have picked the read point
id data status

1 foo NEW
2 bar NEW
step s2a: SELECT * FROM queue; -- this is just to ensure we have picked the read point
id data status

1 foo NEW
2 bar NEW
step s1b: UPDATE queue set status='OLD' WHERE id=1;
step s1c: COMMIT;
step s2_sleep: SELECT pg_sleep(5);
pg_sleep


step s2b: SELECT * FROM queue ORDER BY id FOR UPDATE SKIP LOCKED LIMIT 1;
id data status

2 bar NEW
step s2c: COMMIT;