executor: add batch copy to inner join, left and right outer join. #7493

Merged
crazycs520 merged 23 commits into pingcap:master from the only-batch-copy branch
Sep 5, 2018

Contributor

@crazycs520 crazycs520 commented Aug 27, 2018

What problem does this PR solve?

Use a vectorized filter and batch copy for inner join, left outer join, and right outer join.

Here is a benchmark of batch copy vs. copying rows one by one:

ᐅ go test -bench="BenchmarkChunk.*Row"  -run=xx -benchtime="3s" -count=3
goos: linux
goarch: amd64
pkg: github.com/pingcap/tidb/util/chunk
BenchmarkCopySelectedJoinRows     100000             24923 ns/op               0 B/op          0 allocs/op
BenchmarkCopySelectedJoinRows     100000             24639 ns/op               0 B/op          0 allocs/op
BenchmarkAppendSelectedRow         50000             57097 ns/op               0 B/op          0 allocs/op
BenchmarkAppendSelectedRow         50000             57146 ns/op               0 B/op          0 allocs/op
#1
sql1: select count(*) from tid1 inner join tid2 where tid1.id!=tid2.id;
sql2: select count(*) from tid1 inner join tid2 where tid1.id>tid2.id;

create table t1 (id int);
create table t2 (id int);
# t1 data as below:
+-------+
| id    |
+-------+
| 0     |
| 1     |
| 2     |
| ...   |
| ...   |
| ...   |
| 10000 |
+-------+

# t2 data as below:
+-------+
| id    |
+-------+
| 10000 |
| 10001 |
| 10002 |
| ...   |
| ...   |
| ...   |
| 20000 |
+-------+

#2
sql3: select count(*) from t1,t2 where t1.id != t2.id and t1.name != t2.name and t1.addr != t2.addr and t1.course != t2.course;
sql4: select count(*) from t1,t2 where t1.id != t2.id and t1.name != t2.name and t1.addr != t2.addr and t1.course != t2.course and t1.addr > t2.course;

create table t1 (id int, name varchar(30),addr varchar(30), course varchar(30))
create table t2 (id int, name varchar(30),addr varchar(30), course varchar(30))

# both t1, t2 data as below:
+------+----------------+-------------------+------------------+
| id   | name           | addr              | course           |
+------+----------------+-------------------+------------------+
| 0    | name_abcd_0    | address_abcd_0    | course_abcd_0    |
| 1    | name_abcd_1    | address_abcd_1    | course_abcd_1    |
| 2    | name_abcd_2    | address_abcd_2    | course_abcd_2    |
| 3    | name_abcd_3    | address_abcd_3    | course_abcd_3    |
...
10000 rows

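// on my Mac, only TiDB, chunk_size=1024

|                     | Sql1 | Sql2 | Sql3 | Sql4 |
|---------------------|------|------|------|------|
| master              | 5.4  | 3.6  | 13.6 | 11.2 |
| master + batch copy | 4.5  | 3.6  | 12.1 | 11   |
| Shadow              | 6    | 3.2  | 14.1 | 11   |
| Shadow + batch      | 4.6  | 3.7  | 12.1 | 11   |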

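// on server, TiDB + PD + TiKV, chunk_size=1024

|                     | Sql1 | Sql2 | Sql3 | Sql4 |
|---------------------|------|------|------|------|
| master              | 7.5  | 6.1  | 18.5 | 15.6 |
| master + batch copy | 6    | 4.7  | 15.6 | 14   |
| Shadow              | 11   | 5.8  | 22.7 | 11.5 |
| Shadow + batch      | 6.2  | 6.78 | 14   | 16   |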

@crazycs520
Contributor Author

/run-all-tests

@crazycs520 crazycs520 added the sig/execution and type/enhancement labels Aug 27, 2018
@crazycs520
Contributor Author

/run-all-tests

@crazycs520
Contributor Author

/run-common-test

@crazycs520
Contributor Author

/run-common-test

@crazycs520
Contributor Author

/run-all-tests

@crazycs520
Contributor Author

@zz-jason @XuHuaiyu PTAL

@@ -158,19 +158,12 @@ func (j *baseJoiner) makeShallowJoinRow(isRightJoin bool, inner, outer chunk.Row
j.shallowRow.ShallowCopyPartialRow(inner.Len(), outer)
}

-func (j *baseJoiner) filter(input, output *chunk.Chunk) (matched bool, err error) {
+func (j *baseJoiner) filter(input, output *chunk.Chunk) (err error) {
Member

Can we make j.selected a return value of this function? The caller can then use the returned []bool directly, with no need to dig into this function to learn that the result is stored in j.selected.
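
Editor's note: the suggestion above can be sketched as follows. This is a minimal illustration, not the code in this PR: the `row` type, the `vectorizedFilter` name, and the predicate signature are all hypothetical stand-ins. It shows a filter that evaluates a predicate over a whole batch and hands the `[]bool` back to the caller, reusing the caller's buffer instead of storing the result in a hidden field.

```go
package main

import "fmt"

// row is a hypothetical stand-in for one joined (outer, inner) row pair.
type row struct{ outerID, innerID int64 }

// vectorizedFilter evaluates pred over every row of input in one pass and
// returns the per-row results. It reuses the backing array of selected so
// that repeated calls do not allocate once the buffer is large enough.
func vectorizedFilter(input []row, pred func(row) bool, selected []bool) []bool {
	selected = selected[:0]
	for _, r := range input {
		selected = append(selected, pred(r))
	}
	return selected
}

func main() {
	input := []row{{1, 1}, {1, 2}, {1, 3}}
	notEqual := func(r row) bool { return r.outerID != r.innerID }
	// The caller sees the result directly in the return value.
	selected := vectorizedFilter(input, notEqual, nil)
	fmt.Println(selected) // [false true true]
}
```

Returning the slice keeps the data flow visible at the call site, which is the readability point the comment raises.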

@@ -232,6 +232,77 @@ func (c *Chunk) AppendPartialRow(colIdx int, row Row) {
}
}

// BatchCopyJoinRowToChunk uses for join to batch copy inner rows and outer row to chunk.
func (c *Chunk) BatchCopyJoinRowToChunk(isRight bool, chkForJoin *Chunk, outer Row, selected []bool) bool {
Member

Can this function be simplified to

func CopySelectedRows(src *Chunk, selected []bool, dst *Chunk)

Contributor Author

Maybe we can't...
There is a special optimization for copying the outer row.

Member

No, chkForJoin shares the same schema with c

@crazycs520 crazycs520 force-pushed the only-batch-copy branch 3 times, most recently from 029afe3 to 0cfa549 September 3, 2018 09:22
}

// appendInnerRows appends different inner rows to the chunk.
func appendInnerRows(innerColOffset, outerColOffset int, src *Chunk, selected []bool, dst *Chunk) int {
Member

how about:

s/appendInnerRows/copySelectedInnerRows/
s/columns/srcCols/
s/rowCol/srcCol/
s/chkCol/dstCol/

return false
}

selectedRowNum := appendInnerRows(innerColOffset, outerColOffset, src, selected, dst)
Member

how about s/selectedRowNum/numSelected/?

chkCol.appendNullBitmap(!rowCol.isNull(i))
chkCol.length++

if rowCol.isFixed() {
Member

It's better to move this branch predicate outside the for loop.
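
Editor's note: a minimal sketch of what hoisting the branch looks like, under stated assumptions. The `column` type below is a simplified, hypothetical stand-in (no null bitmap, no real chunk API), and `copySelected` is an illustrative name, not a function from this PR. The point is only that `isFixed()` is loop-invariant per column, so it can be tested once per column instead of once per row.

```go
package main

import "fmt"

// column is a simplified, hypothetical stand-in for a chunk column.
// Fixed-width columns store elemSize bytes per row; variable-width columns
// store row i in data[offsets[i]:offsets[i+1]].
type column struct {
	elemSize int     // > 0 for fixed-width columns, 0 otherwise
	offsets  []int32 // used only by variable-width columns; starts as [0]
	data     []byte
	length   int
}

func (c *column) isFixed() bool { return c.elemSize > 0 }

// copySelected appends the rows of src whose selected flag is true to dst.
// The isFixed check runs once per column rather than once per row.
func copySelected(src, dst *column, selected []bool) {
	if src.isFixed() {
		for i, ok := range selected {
			if !ok {
				continue
			}
			start := i * src.elemSize
			dst.data = append(dst.data, src.data[start:start+src.elemSize]...)
			dst.length++
		}
		return
	}
	for i, ok := range selected {
		if !ok {
			continue
		}
		start, end := src.offsets[i], src.offsets[i+1]
		dst.data = append(dst.data, src.data[start:end]...)
		dst.offsets = append(dst.offsets, int32(len(dst.data)))
		dst.length++
	}
}

func main() {
	src := &column{offsets: []int32{0, 1, 3, 6}, data: []byte("abbccc"), length: 3}
	dst := &column{offsets: []int32{0}}
	copySelected(src, dst, []bool{true, false, true})
	fmt.Println(string(dst.data), dst.offsets) // accc [0 1 4]
}
```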

start, end := srcCol.offsets[row.idx], srcCol.offsets[row.idx+numRows]
dstCol.data = append(dstCol.data, srcCol.data[start:end]...)
offsets := dstCol.offsets
l := srcCol.offsets[row.idx+1] - srcCol.offsets[row.idx]
Member

how about s/l/elemLen/?

return dst.columns[innerColOffset].length - oldLen
}

// copyOuterRows appends same outer row to the chunk with `numRows` times.
Member

how about:

// copyOuterRows copies the continuous 'numRows' outer rows in the source Chunk
// to the destination Chunk.

return numSelected > 0
}

// copySelectedInnerRows appends different inner rows to the chunk.
Member

how about:

// copySelectedInnerRows copies the selected inner rows from the source Chunk
// to the destination Chunk.


package chunk

// CopySelectedJoinRows uses for join to batch copy inner rows and outer row to chunk.
Member

how about:

// CopySelectedJoinRows copies the selected joined rows from the source Chunk
// to the destination Chunk.
//
// NOTE: All the outer rows in the source Chunk should be the same.

As the file and function names already contain the join keyword, I don't think it's necessary to reiterate that this function is only used by the join operator.

return srcChk, dstChk, selected
}

func TestBatchCopyJoinRowToChunk(t *testing.T) {
Member

s/TestBatchCopyJoinRowToChunk/TestCopySelectedJoinRows/

}
}

func BenchmarkChunkBatchCopyJoinRow(b *testing.B) {
Member

the benchmark name needs to be updated.

}
}

func BenchmarkChunkAppendRow(b *testing.B) {
Member

how about BenchmarkAppendSelectedRow?

@@ -85,6 +85,34 @@ func (c *column) appendNullBitmap(on bool) {
}
}

func (c *column) appendMultiSameNullBitmap(on bool, num int) {
l := ((c.length + num - 1) >> 3) - len(c.nullBitmap)
Member

it should be:

l := ((c.length + num + 7) >> 3) - len(c.nullBitmap)

how about:

s/l/numNewBytes/
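
Editor's note: the suggested expression is a ceiling division by 8. The bitmap needs one bit per row, so after appending `num` bits it must hold `ceil((length+num)/8)` bytes. The standalone helper below (a hypothetical free function, written only to make the arithmetic checkable) works through a few cases.

```go
package main

import "fmt"

// numNewBytes returns how many bytes must be appended to a null bitmap that
// currently holds bitmapLen bytes so that it can store length+num bits,
// one bit per row. (x + 7) >> 3 is ceil(x / 8) for non-negative x.
func numNewBytes(length, num, bitmapLen int) int {
	return ((length + num + 7) >> 3) - bitmapLen
}

func main() {
	fmt.Println(numNewBytes(5, 3, 1))  // 0: 8 bits still fit in one byte
	fmt.Println(numNewBytes(5, 4, 1))  // 1: 9 bits need a second byte
	fmt.Println(numNewBytes(0, 16, 0)) // 2: 16 bits need two bytes
}
```

With this form the loop can run with a plain `i < numNewBytes` bound.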

func (c *column) appendMultiSameNullBitmap(on bool, num int) {
l := ((c.length + num - 1) >> 3) - len(c.nullBitmap)
for i := 0; i <= l; i++ {
c.nullBitmap = append(c.nullBitmap, 0)
Member

Maybe it would be much clearer and easier to understand if we changed the copy strategy to:

  1. set all the higher x bits of c.nullBitmap[len(c.nullBitmap)-1] to 0 or 1 according to the value of on.
  2. memset the new bytes to 0xFF or 0x00 according to the value of on.
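
Editor's note: the two steps above can be sketched roughly as below. This is an illustration under assumptions, not the merged implementation: the `column` type is reduced to three fields, the method is named `appendSameBits` to make clear it is hypothetical, it updates `length` itself so the example is self-contained, and it first trims the bitmap to the bytes actually in use so that stale bytes left behind by a truncated chunk cannot leak through. It assumes the bitmap always holds at least `ceil(length/8)` bytes.

```go
package main

import "fmt"

// column keeps only the fields the null bitmap needs (hypothetical type).
// Bit i is 1 when row i is NOT null.
type column struct {
	length     int
	nullCount  int
	nullBitmap []byte
}

// appendSameBits appends num identical bits; notNull is the value of each.
func (c *column) appendSameBits(notNull bool, num int) {
	// Drop stale bytes beyond the rows currently stored.
	c.nullBitmap = c.nullBitmap[:(c.length+7)>>3]
	numNewBytes := ((c.length + num + 7) >> 3) - len(c.nullBitmap)
	fill := byte(0x00)
	if notNull {
		fill = 0xFF
	}
	// Fill the new bytes with all ones or all zeros.
	for i := 0; i < numNewBytes; i++ {
		c.nullBitmap = append(c.nullBitmap, fill)
	}
	// Overwrite the unused high bits of the last old byte.
	if numOldBits := uint(c.length) & 7; numOldBits != 0 {
		idx := c.length >> 3
		lowMask := byte(1<<numOldBits) - 1
		c.nullBitmap[idx] &= lowMask // clear any stale high bits
		if notNull {
			c.nullBitmap[idx] |= ^lowMask
		}
	}
	c.length += num
	if !notNull {
		c.nullCount += num
		return
	}
	// Clear the bits past the new length in the last byte.
	if extra := uint(len(c.nullBitmap)*8 - c.length); extra > 0 {
		c.nullBitmap[len(c.nullBitmap)-1] &= byte(0xFF) >> extra
	}
}

func main() {
	c := &column{}
	c.appendSameBits(true, 3)  // rows 0..2 are not null
	c.appendSameBits(false, 7) // rows 3..9 are null
	c.appendSameBits(true, 2)  // rows 10..11 are not null
	fmt.Println(c.nullBitmap, c.length, c.nullCount) // [7 12] 12 7
}
```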

}
}
} else {
c.nullCount += num
Member

@zz-jason zz-jason Sep 3, 2018

We also need to set all the existing bits to zero even if on is set to false, because this Chunk may have been truncated, and the null bitmap is not reset in that scenario.

@zz-jason
Member

zz-jason commented Sep 4, 2018

@XuHuaiyu PTAL

@zz-jason zz-jason added the status/LGT1 label Sep 4, 2018
@zz-jason zz-jason added this to the 2.1 milestone Sep 4, 2018
}
matched = true
output.AppendRow(input.GetRow(i))
// batch copy selected row to output chunk
Contributor

  1. Batch copies selected rows to output chunk.
  2. Add a . at the end of this comment.


// copySelectedInnerRows copies the selected inner rows from the source Chunk
// to the destination Chunk.
func copySelectedInnerRows(innerColOffset, outerColOffset int, src *Chunk, selected []bool, dst *Chunk) int {
Contributor

add a comment for the return value.

start, end := srcCol.offsets[row.idx], srcCol.offsets[row.idx+numRows]
dstCol.data = append(dstCol.data, srcCol.data[start:end]...)
offsets := dstCol.offsets
elemLen := srcCol.offsets[row.idx+1] - srcCol.offsets[row.idx]
Contributor

Is it possible for row.idx+1 to be out of range?

Contributor Author

In the normal case, it will not be out of range.

Contributor

Is it possible for src.NumRows to equal 1? If so, it seems this would be out of range. @crazycs520

Contributor Author

If src.NumRows equals 1, the length of offsets should be 2.

Contributor

ok
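
Editor's note: the invariant behind the answer in this thread is that a variable-length column keeps one more offset than it has rows, so `offsets[idx+1]` is valid for every existing row. The sketch below uses a hypothetical `varColumn` type, not the real chunk column, to make that concrete.

```go
package main

import "fmt"

// varColumn is a hypothetical variable-length column: row i occupies
// data[offsets[i]:offsets[i+1]], so offsets always has one more entry
// than there are rows.
type varColumn struct {
	offsets []int32
	data    []byte
}

// newVarColumn starts with the single sentinel offset 0.
func newVarColumn() *varColumn { return &varColumn{offsets: []int32{0}} }

// appendBytes adds one row and records where it ends.
func (c *varColumn) appendBytes(b []byte) {
	c.data = append(c.data, b...)
	c.offsets = append(c.offsets, int32(len(c.data)))
}

// elemLen returns the byte length of row idx.
func (c *varColumn) elemLen(idx int) int32 {
	return c.offsets[idx+1] - c.offsets[idx]
}

func main() {
	c := newVarColumn()
	c.appendBytes([]byte("name_abcd_0"))
	// One row, two offsets: idx+1 is in range for idx == 0.
	fmt.Println(len(c.offsets), c.elemLen(0)) // 2 11
}
```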

@@ -85,6 +85,29 @@ func (c *column) appendNullBitmap(on bool) {
}
}

func (c *column) appendMultiSameNullBitmap(on bool, num int) {
Contributor

s/ on/ notNull ?

Contributor Author

I see appendNullBitmap also uses on, so should both use on, or both use notNull?

Member

I'd prefer that both use on.

@@ -85,6 +85,29 @@ func (c *column) appendNullBitmap(on bool) {
}
}

func (c *column) appendMultiSameNullBitmap(on bool, num int) {
Contributor

add a comment for this function and its parameters.

for i := 0; i < numNewBytes; i++ {
c.nullBitmap = append(c.nullBitmap, b)
}
if on {
Contributor

if !on {
  c.nullCount += num
  return
}
xxxx

This helps avoid the else.

@crazycs520 crazycs520 force-pushed the only-batch-copy branch 2 times, most recently from eea2182 to 301a0db September 4, 2018 08:43
@crazycs520
Contributor Author

/run-all-tests

pos := uint(c.length) & 7
c.nullBitmap[idx] |= byte(1 << pos)
} else {
c.nullCount++
}
}

func (c *column) appendMultiSameNullBitmap(on bool, num int) {
// appendMultiSameNullBitmap appends multiple same bit value to `nullBitMap`.
// notNull mean not not null.
Contributor

s/ mean/ means
remove redundant not

func (c *column) appendMultiSameNullBitmap(on bool, num int) {
// appendMultiSameNullBitmap appends multiple same bit value to `nullBitMap`.
// notNull mean not not null.
// num mean appends `num` bit value to `nullBitMap`.
Contributor

num means the number of bits that should be appended.

}
// 1. Set all the higher 8-'numOldBits' bits in the last old byte to 1.
Contributor

  1. Set all the remained bits in the last slot of old c.numBitMap to 1.
  2. Set all the redundant bits in the last slot of new c.numBitMap to 0.

Contributor Author

done.

Contributor Author

Wait, I'll ask @CaitinChen.

Contributor

@crazycs520 I prefer "the remaining bits"

Member

@zz-jason zz-jason left a comment

LGTM

@crazycs520
Contributor Author

/run-all-tests

@crazycs520
Contributor Author

@XuHuaiyu PTAL

Contributor

@XuHuaiyu XuHuaiyu left a comment

LGTM

@crazycs520 crazycs520 merged commit 92e6a5a into pingcap:master Sep 5, 2018
Labels
sig/execution, status/LGT1, type/enhancement
4 participants