From 86d97a40e0d51caa8856701bfe3a47ae7f6536a1 Mon Sep 17 00:00:00 2001 From: Harshil Goel <54325286+harshil-goel@users.noreply.github.com> Date: Sat, 13 May 2023 00:45:32 +0530 Subject: [PATCH] chore(core): Update comment about how drop op behaves with rollups (#8808) ## Problem Dgraph.drop.op doesn't conflict out other transactions, unlike what we have written in code. ## Solution We have to update the explanation about how dgraph.drop.op would behave under rollups. --- posting/list.go | 30 ++++++++++++++++++++++++++++-- 1 file changed, 28 insertions(+), 2 deletions(-) diff --git a/posting/list.go b/posting/list.go index fdf9e8b0e8a..a3afa552cd1 100644 --- a/posting/list.go +++ b/posting/list.go @@ -831,8 +831,34 @@ func (l *List) Length(readTs, afterUid uint64) int { // if someone is reading data between two timestamps. // // - Drop operation can cause issues if they are rolled up. Since we are storing results at ts + 1, -// in dgraph.drop.op. If some operations were done then, they would be overwriten. -// This won't happen as the transactions should conflict out. +// in dgraph.drop.op. When we do drop op, we delete the relevant data first using a mutation. +// Then we write a record into the dgraph.drop.op. We use this record to figure out if a +// drop was performed. This helps us during backup, when we want to know if we need to +// read a given backup or not. A backup, which has a drop record, would render older backups +// unnecessary. +// +// If we rollup the dgraph.drop.op, and store result on ts + 1, it effectively copies the +// original record into a new location. We want to see if there can be any issues in +// backup/restore due to this. To ensure that there is no issue in writing on ts + 1, +// we do the following analysis. +// +// Analysis is done for drop op, but it would be the same for drop predicate and namespace. +// Assume that there were two backups, at b1 and b2. We move rollup ts around to see if it +// can cause any issues. There can be 3 cases: +// +// 1. b1 < ts < b2. In this case, we would have a drop record in b2. This is the same behaviour +// as we would have writen on ts. +// +// 2. b1 = ts < b2. In this case, we would have a drop record in b1, and in b2. Originally, only +// b1 would have a drop record. With this new approach, b2 would also have a drop record. This +// is okay because last entry in b1 is drop, so it wouldn't have any data to be applied. +// +// 3. b1 < ts < ts + 1 = b2. In this case, we would have both drop drop records in b2. No issues +// in this case. +// +// This proves that writing rollups at ts + 1 would not cause any issues with dgraph.drop.op. +// The only issue would come if a rollup happens at ts + k. If a backup happens in between +// ts and ts + k, it could lead to some data being dropped during restore. func (l *List) Rollup(alloc *z.Allocator, readTs uint64) ([]*bpb.KV, error) { l.RLock() defer l.RUnlock()