executor: add batch copy to inner join, left and right outer join. #7493

Merged: 23 commits, Sep 5, 2018
Changes from 11 commits
Commits
0cc4ab9
batch copy init
crazycs520 Aug 25, 2018
2592034
add comment
crazycs520 Aug 28, 2018
265dd58
add batch copy test and benchmark
crazycs520 Aug 28, 2018
39cde9c
refine code
crazycs520 Aug 28, 2018
7b19c1c
fix bugs
crazycs520 Aug 28, 2018
e312fa4
Merge branch 'master' of https://github.com/pingcap/tidb into only-ba…
crazycs520 Aug 28, 2018
4022038
Merge branch 'master' of https://github.com/pingcap/tidb into only-ba…
crazycs520 Aug 29, 2018
9b6ce0f
Merge branch 'master' of https://github.com/pingcap/tidb into only-ba…
crazycs520 Sep 2, 2018
315c1fa
refactor code and comment
crazycs520 Sep 3, 2018
37902a2
refine code
crazycs520 Sep 3, 2018
c4556b4
address comment
crazycs520 Sep 3, 2018
cefae06
refactor code and comment
crazycs520 Sep 3, 2018
9fe55b3
refactor code and comment
crazycs520 Sep 4, 2018
91366cd
address comment
crazycs520 Sep 4, 2018
d3f8e85
address comment
crazycs520 Sep 4, 2018
026be7f
Merge branch 'master' of https://github.com/pingcap/tidb into only-ba…
crazycs520 Sep 4, 2018
50a698d
address comment
crazycs520 Sep 4, 2018
28018f9
address comment
crazycs520 Sep 4, 2018
2d04676
Merge branch 'master' into only-batch-copy
zz-jason Sep 4, 2018
e47fb7c
Merge branch 'master' into only-batch-copy
zz-jason Sep 5, 2018
3208738
refine comments
crazycs520 Sep 5, 2018
cd6eda1
Merge branch 'master' of https://github.com/pingcap/tidb into only-ba…
crazycs520 Sep 5, 2018
cedcd0e
Merge branch 'only-batch-copy' of https://github.com/crazycs520/tidb …
crazycs520 Sep 5, 2018
38 changes: 23 additions & 15 deletions executor/joiner.go
@@ -158,19 +158,18 @@ func (j *baseJoiner) makeShallowJoinRow(isRightJoin bool, inner, outer chunk.Row
j.shallowRow.ShallowCopyPartialRow(inner.Len(), outer)
}

-func (j *baseJoiner) filter(input, output *chunk.Chunk) (matched bool, err error) {
+func (j *baseJoiner) filter(input, output *chunk.Chunk, outerColsLen int) (bool, error) {
+var err error
j.selected, err = expression.VectorizedFilter(j.ctx, j.conditions, chunk.NewIterator4Chunk(input), j.selected)
if err != nil {
return false, errors.Trace(err)
}
-for i := 0; i < len(j.selected); i++ {
-if !j.selected[i] {
-continue
-}
-matched = true
-output.AppendRow(input.GetRow(i))
+// batch copy selected row to output chunk

Review comment (Contributor):

  1. Batch copies selected rows to output chunk.
  2. Add a . at the end of this comment.

+innerColOffset, outerColOffset := 0, input.NumCols()-outerColsLen
+if !j.outerIsRight {
+innerColOffset, outerColOffset = outerColsLen, 0
}
-return matched, nil
+return chunk.CopySelectedJoinRows(input, innerColOffset, outerColOffset, j.selected, output), nil
}

type semiJoiner struct {
@@ -350,8 +349,11 @@ func (j *leftOuterJoiner) tryToMatch(outer chunk.Row, inners chunk.Iterator, chk
}

// reach here, chkForJoin is j.chk
-matched, err := j.filter(chkForJoin, chk)
-return matched, errors.Trace(err)
+matched, err := j.filter(chkForJoin, chk, outer.Len())
+if err != nil {
+return false, errors.Trace(err)
+}
+return matched, nil
}

func (j *leftOuterJoiner) onMissMatch(outer chunk.Row, chk *chunk.Chunk) {
@@ -384,9 +386,11 @@ func (j *rightOuterJoiner) tryToMatch(outer chunk.Row, inners chunk.Iterator, ch
return true, nil
}

-// reach here, chkForJoin is j.chk
-matched, err := j.filter(chkForJoin, chk)
-return matched, errors.Trace(err)
+matched, err := j.filter(chkForJoin, chk, outer.Len())
+if err != nil {
+return false, errors.Trace(err)
+}
+return matched, nil
}

func (j *rightOuterJoiner) onMissMatch(outer chunk.Row, chk *chunk.Chunk) {
@@ -421,8 +425,12 @@ func (j *innerJoiner) tryToMatch(outer chunk.Row, inners chunk.Iterator, chk *ch
}

// reach here, chkForJoin is j.chk
-matched, err := j.filter(chkForJoin, chk)
-return matched, errors.Trace(err)
+matched, err := j.filter(chkForJoin, chk, outer.Len())
+if err != nil {
+return false, errors.Trace(err)
+}
+return matched, nil
+
}

func (j *innerJoiner) onMissMatch(outer chunk.Row, chk *chunk.Chunk) {
99 changes: 99 additions & 0 deletions util/chunk/chunk_util.go
@@ -0,0 +1,99 @@
// Copyright 2018 PingCAP, Inc.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// See the License for the specific language governing permissions and
// limitations under the License.

package chunk

// CopySelectedJoinRows uses for join to batch copy inner rows and outer row to chunk.
Review comment (Member):

how about:

// CopySelectedJoinRows copies the selected joined rows from the source Chunk
// to the destination Chunk.
//
// NOTE: All the outer rows in the source Chunk should be the same.

As the file and function names already contain the join keyword, I think it's not necessary to reiterate that this function is only used for the join operator.

// This function optimize for join. To be exact, `copyOuterRows` optimizes copy outer row to `dst` chunk.
// Because the outer row in join is always same. so we can use batch copy for outer row data.
func CopySelectedJoinRows(src *Chunk, innerColOffset, outerColOffset int, selected []bool, dst *Chunk) bool {
if src.NumRows() == 0 {
return false
}

numSelected := copySelectedInnerRows(innerColOffset, outerColOffset, src, selected, dst)
copyOuterRows(innerColOffset, outerColOffset, src, numSelected, dst)
dst.numVirtualRows += numSelected
return numSelected > 0
}

// copySelectedInnerRows appends different inner rows to the chunk.
Review comment (Member):

how about:

// copySelectedInnerRows copies the selected inner rows from the source Chunk
// to the destination Chunk.

func copySelectedInnerRows(innerColOffset, outerColOffset int, src *Chunk, selected []bool, dst *Chunk) int {
Review comment (Contributor):

add a comment for the return value.

oldLen := dst.columns[innerColOffset].length
var srcCols []*column
if innerColOffset == 0 {
srcCols = src.columns[:outerColOffset]
} else {
srcCols = src.columns[innerColOffset:]
}
for j, srcCol := range srcCols {
dstCol := dst.columns[innerColOffset+j]
if srcCol.isFixed() {
for i := 0; i < len(selected); i++ {
if !selected[i] {
continue
}
dstCol.appendNullBitmap(!srcCol.isNull(i))
dstCol.length++

elemLen := len(srcCol.elemBuf)
offset := i * elemLen
dstCol.data = append(dstCol.data, srcCol.data[offset:offset+elemLen]...)
}
} else {
for i := 0; i < len(selected); i++ {
if !selected[i] {
continue
}
dstCol.appendNullBitmap(!srcCol.isNull(i))
dstCol.length++

start, end := srcCol.offsets[i], srcCol.offsets[i+1]
dstCol.data = append(dstCol.data, srcCol.data[start:end]...)
dstCol.offsets = append(dstCol.offsets, int32(len(dstCol.data)))
}
}
}
return dst.columns[innerColOffset].length - oldLen
}

// copyOuterRows appends same outer row to the chunk with `numRows` times.
Review comment (Member):

how about:

// copyOuterRows copies the continuous 'numRows' outer rows in the source Chunk
// to the destination Chunk.

func copyOuterRows(innerColOffset, outerColOffset int, src *Chunk, numRows int, dst *Chunk) {
row := src.GetRow(0)
var srcCols []*column
if innerColOffset == 0 {
srcCols = src.columns[outerColOffset:]
} else {
srcCols = src.columns[:innerColOffset]
}
for i, srcCol := range srcCols {
dstCol := dst.columns[outerColOffset+i]
dstCol.appendMultiSameNullBitmap(!srcCol.isNull(row.idx), numRows)
dstCol.length += numRows
if srcCol.isFixed() {
elemLen := len(srcCol.elemBuf)
start := row.idx * elemLen
end := start + numRows*elemLen
dstCol.data = append(dstCol.data, srcCol.data[start:end]...)
} else {
start, end := srcCol.offsets[row.idx], srcCol.offsets[row.idx+numRows]
dstCol.data = append(dstCol.data, srcCol.data[start:end]...)
offsets := dstCol.offsets
l := srcCol.offsets[row.idx+1] - srcCol.offsets[row.idx]
Review comment (Member):

how about `s/l/elemLen/`?

for j := 0; j < numRows; j++ {
offsets = append(offsets, int32(offsets[len(offsets)-1]+l))
}
dstCol.offsets = offsets
}
}
}
83 changes: 83 additions & 0 deletions util/chunk/chunk_util_test.go
@@ -0,0 +1,83 @@
// Copyright 2018 PingCAP, Inc.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// See the License for the specific language governing permissions and
// limitations under the License.

package chunk

import (
"reflect"
"testing"

"github.com/pingcap/tidb/types"
)

func getChk() (*Chunk, *Chunk, []bool) {
numRows := 1024
srcChk := newChunkWithInitCap(numRows, 0, 0, 8, 8, 16, 0)
selected := make([]bool, numRows)
var row Row
for j := 0; j < numRows; j++ {
if j%7 == 0 {
row = MutRowFromValues("abc", "abcdefg", nil, 123, types.ZeroDatetime, "abcdefg").ToRow()
} else {
row = MutRowFromValues("abc", "abcdefg", j, 123, types.ZeroDatetime, "abcdefg").ToRow()
selected[j] = true
}
srcChk.AppendPartialRow(0, row)
}
dstChk := newChunkWithInitCap(numRows, 0, 0, 8, 8, 16, 0)
return srcChk, dstChk, selected
}

func TestBatchCopyJoinRowToChunk(t *testing.T) {
Review comment (Member):

s/TestBatchCopyJoinRowToChunk/TestCopySelectedJoinRows/

srcChk, dstChk, selected := getChk()
numRows := srcChk.NumRows()
for i := 0; i < numRows; i++ {
if !selected[i] {
continue
}
dstChk.AppendRow(srcChk.GetRow(i))
}
// batch copy
dstChk2 := newChunkWithInitCap(numRows, 0, 0, 8, 8, 16, 0)
CopySelectedJoinRows(srcChk, 0, 3, selected, dstChk2)

if !reflect.DeepEqual(dstChk, dstChk2) {
t.Fatal()
}
}

func BenchmarkChunkBatchCopyJoinRow(b *testing.B) {
Review comment (Member):

the benchmark name needs to be updated.

b.ReportAllocs()
srcChk, dstChk, selected := getChk()
b.ResetTimer()
for i := 0; i < b.N; i++ {
dstChk.Reset()
CopySelectedJoinRows(srcChk, 0, 3, selected, dstChk)
}
}

func BenchmarkChunkAppendRow(b *testing.B) {
Review comment (Member):

how about BenchmarkAppendSelectedRow?

b.ReportAllocs()
srcChk, dstChk, selected := getChk()
numRows := srcChk.NumRows()
b.ResetTimer()
for i := 0; i < b.N; i++ {
dstChk.Reset()
for j := 0; j < numRows; j++ {
if !selected[j] {
continue
}
dstChk.AppendRow(srcChk.GetRow(j))
}
}
}
28 changes: 28 additions & 0 deletions util/chunk/column.go
@@ -85,6 +85,34 @@ func (c *column) appendNullBitmap(on bool) {
}
}

func (c *column) appendMultiSameNullBitmap(on bool, num int) {
Review comment (Contributor):

s/ on/ notNull ?

Review comment (Contributor, Author):

I see appendNullBitmap also uses `on`. So should both use `on`, or both use `notNull`?

Review comment (Member):

I prefer that both use `on`.

Review comment (Contributor):

add a comment for this function and its parameters.

l := ((c.length + num - 1) >> 3) - len(c.nullBitmap)
Review comment (Member):

it should be:

l := ((c.length + num + 7) >> 3) - len(c.nullBitmap)

how about:

s/l/numNewBytes/

for i := 0; i <= l; i++ {
c.nullBitmap = append(c.nullBitmap, 0)
Review comment (Member):

Maybe it would be much clearer and easier to understand if we change the copy strategy to:

  1. set all the higher x bits of c.nullBitmap[len(c.nullBitmap)-1] to 0 or 1 according to the value of on.
  2. memset the new bytes to 0xFF or 0x00 according to the value of on.

}
if on {
idx := c.length >> 3
pos := uint(c.length) & 7
for num > 0 {
if pos == 0 && num > 8 {
c.nullBitmap[idx] = 0xff
idx++
num = num - 8
} else {
c.nullBitmap[idx] |= byte(1 << pos)
pos++
num--
if pos == 8 {
pos = 0
idx++
}
}
}
} else {
c.nullCount += num
Review comment (Member, zz-jason, Sep 3, 2018):

We also need to set all the existing bits to zero even if `on` is false, because this Chunk may have been truncated, and the null bitmap is not reset in that scenario.
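A toy reproduction of this concern, assuming that truncation only shrinks the length and drops whole unused bytes without clearing the remaining bits of the last byte. All names here (`toyCol`, `truncateTo`, `appendNullsNaive`, `appendNullsCleared`) are hypothetical and simplified; the real chunk code may differ in detail:

```go
package main

import "fmt"

// toyCol models only the null bitmap. Bit i == 1 means row i is NOT NULL.
type toyCol struct {
	length     int
	nullBitmap []byte
}

func (c *toyCol) isNull(i int) bool {
	return c.nullBitmap[i>>3]&(1<<(uint(i)&7)) == 0
}

// truncateTo (assumed behaviour) keeps stale bits in the last byte.
func (c *toyCol) truncateTo(n int) {
	c.length = n
	c.nullBitmap = c.nullBitmap[:(n+7)>>3]
}

// grow extends the bitmap with zero bytes and bumps the length.
func (c *toyCol) grow(num int) {
	for len(c.nullBitmap) < (c.length+num+7)>>3 {
		c.nullBitmap = append(c.nullBitmap, 0)
	}
	c.length += num
}

// appendNullsNaive trusts that the unused bits are already zero.
func (c *toyCol) appendNullsNaive(num int) {
	c.grow(num)
}

// appendNullsCleared first zeroes the unused high bits of the last byte.
func (c *toyCol) appendNullsCleared(num int) {
	if used := uint(c.length) & 7; used != 0 {
		c.nullBitmap[c.length>>3] &= byte(1<<used) - 1
	}
	c.grow(num)
}

func main() {
	naive := &toyCol{length: 8, nullBitmap: []byte{0xFF}}
	naive.truncateTo(3)
	naive.appendNullsNaive(2)

	cleared := &toyCol{length: 8, nullBitmap: []byte{0xFF}}
	cleared.truncateTo(3)
	cleared.appendNullsCleared(2)

	fmt.Println("naive: row 3 is null:", naive.isNull(3))
	fmt.Println("cleared: row 3 is null:", cleared.isNull(3))
}
```

In the naive variant the appended rows inherit the stale bits and read back as not null, which is the failure mode the comment describes.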

}
}

func (c *column) appendNull() {
c.appendNullBitmap(false)
if c.isFixed() {