Skip to content

Commit

Permalink
[#1127] YSQL: Use attribute collation for SPLIT AT clause
Browse files Browse the repository at this point in the history
Summary:
This bug was found during collation test plan review. Consider this example

```
   CREATE TABLE t(id TEXT COLLATE "en_US.utf8",
                  PRIMARY KEY(id ASC)) SPLIT AT VALUES (('abc'));
   INSERT INTO t VALUES ('ab'), ('abcd');
```

Table `t` has two tablets with 'abc' as the split point. One would expect 'ab' in one tablet and
'abcd' in another tablet.  However, currently we use `value->constcollid` when processing the split
point 'abc' which has database default collation. At present YSQL restricts database can only have
"C" collation.  Therefore 'abc' will not be collation-encoded. On the other hand, when 'ab' and
'abcd' are inserted into table `t`, they will be encoded using column collation "en_US.utf8". As a
result both collation encoded values are compared larger than non-collation-encoded 'abc' and
therefore are inserted into the same tablet.

I made a change to use the column collation `attr->attcollation` when processing the split point
'abc'. In this way when comparing 'ab' or 'abcd' against 'abc', all of them will be collation-encded
with "en_US.utf8". We assume collation-encoding will likely to retain the similar key distribution
as the original text values.

New test added.

Test Plan:
./yb_build.sh release --cxx-test pgwrapper_pg_libpq-test --gtest_filter PgLibPqTest.CollationRangePresplit
./yb_build.sh release --java-test 'org.yb.pgsql.TestPgRegressTypesString'
./yb_build.sh release --java-test 'org.yb.pgsql.TestPgRegressExtension'
./yb_build.sh release --java-test 'org.yb.pgsql.TestPgRegressPartitions'
./yb_build.sh release --java-test 'org.yb.pgsql.TestPgRegressDml'
./yb_build.sh release --java-test 'org.yb.pgsql.TestPgRegressPlpgsql'
./yb_build.sh release --java-test 'org.yb.pgsql.TestPgRegressFeature'
./yb_build.sh release --java-test 'org.yb.pgsql.TestYsqlUpgrade'

Reviewers: mihnea, dmitry

Reviewed By: dmitry

Subscribers: yql

Differential Revision: https://phabricator.dev.yugabyte.com/D13933
  • Loading branch information
myang2021 committed Nov 16, 2021
1 parent acb2b95 commit 6ae2f23
Show file tree
Hide file tree
Showing 2 changed files with 39 additions and 1 deletion.
9 changes: 8 additions & 1 deletion src/postgres/src/backend/commands/ybccmds.c
Original file line number Diff line number Diff line change
Expand Up @@ -323,7 +323,14 @@ YBTransformPartitionSplitPoints(YBCPgStatement yb_stmt,
{
/* Given value is not null. Convert it to YugaByte format. */
Const *value = castNode(Const, datums[idx]->value);
exprs[idx] = YBCNewConstant(yb_stmt, value->consttype, value->constcollid,
/*
* Use attr->attcollation because the split value will be compared against
* collation-encoded strings that are encoded using the column collation.
* We assume collation-encoding will likely to retain the similar key
* distribution as the original text values.
*/
Form_pg_attribute attr = attrs[idx];
exprs[idx] = YBCNewConstant(yb_stmt, value->consttype, attr->attcollation,
value->constvalue, false /* is_null */);
break;
}
Expand Down
31 changes: 31 additions & 0 deletions src/yb/yql/pgwrapper/pg_libpq-test.cc
Original file line number Diff line number Diff line change
Expand Up @@ -2318,5 +2318,36 @@ TEST_F(PgLibPqTest, YB_DISABLE_TEST_IN_TSAN(PagingReadRestart)) {
ASSERT_FALSE(runner.HasError());
}

TEST_F(PgLibPqTest, YB_DISABLE_TEST_IN_TSAN(CollationRangePresplit)) {
const string kDatabaseName ="yugabyte";
auto client = ASSERT_RESULT(cluster_->CreateClient());

auto conn = ASSERT_RESULT(ConnectToDB(kDatabaseName));
ASSERT_OK(conn.Execute("CREATE TABLE collrange(a text COLLATE \"en-US-x-icu\", "
"PRIMARY KEY(a ASC)) SPLIT AT VALUES (('100'), ('200'))"));

google::protobuf::RepeatedPtrField<master::TabletLocationsPB> tablets;
auto table_id = ASSERT_RESULT(GetTableIdByTableName(client.get(), kDatabaseName, "collrange"));

// Validate that number of tablets created is 3.
ASSERT_OK(client->GetTabletsFromTableId(table_id, 0, &tablets));
ASSERT_EQ(tablets.size(), 3);
// Partition key length of plain encoded '100' or '200'.
const size_t partition_key_length = 7;
// When a text value is collation encoded, we need at least 3 extra bytes.
const size_t min_collation_extra_bytes = 3;
for (const auto& tablet : tablets) {
ASSERT_TRUE(tablet.has_partition());
auto partition_start = tablet.partition().partition_key_start();
auto partition_end = tablet.partition().partition_key_end();
LOG(INFO) << "partition_start: " << b2a_hex(partition_start)
<< ", partition_end: " << b2a_hex(partition_end);
ASSERT_TRUE(partition_start.empty() ||
partition_start.size() >= partition_key_length + min_collation_extra_bytes);
ASSERT_TRUE(partition_end.empty() ||
partition_end.size() >= partition_key_length + min_collation_extra_bytes);
}
}

} // namespace pgwrapper
} // namespace yb

0 comments on commit 6ae2f23

Please sign in to comment.