Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RLL under high contention TPCC #132

Closed
wangtzh opened this issue Sep 20, 2016 · 3 comments
Closed

RLL under high contention TPCC #132

wangtzh opened this issue Sep 20, 2016 · 3 comments
Assignees
Labels

Comments

@wangtzh
Copy link
Collaborator

wangtzh commented Sep 20, 2016

Infinite lock spinning when running with RLL and the number of whs < number of threads. Hyeontaek Lim (https://www.cs.cmu.edu/~hl/) found this problem. To reproduce:

./tpcc -fork_workers=false -take_snapshot=false -nvm_folder=/dev/shm -volatile_pool_size=10 -snapshot_pool_size=1 -reducer_buffer_size=1 -loggers_per_node=1 -neworder_remote_percent=1 -payment_remote_percent=15 -thread_per_node=10 -numa_nodes=2 -log_buffer_mb=1024 -null_log_device=true -high_priority=false -warehouses=4 -duration_micro=10000000 -hcc_policy=0

I got infinite spinning under relwithdebinfo, and an assertion failure under debug mode, see stack traces in the end.

Looks like the problem is in last_locked_entry_: I got different results returned by calculate_last_locked_entry() and calculate_last_locked_entry_from(pos-1), the former is larger.
retrospective_lock_list.hpp:591 updates last_locked_entry_ by scanning backward only from [pos], but beyond [pos] there might be locked entries. One possibility I guess is in a previous round the transaction followed RLL and made sure all are locked, then in another round it tries to upgrade, at retrospective_lock_list.hpp:591 last_locked_entry_ will be smaller than it should be.

I did the following, it seems to be working:

  1. Replacing calculate_last_locked_entry_from() with calculate_last_locked_entry() at retrospective_lock_list.hpp:591, and reset lock_entry->mcs_block_ to 0 (so it won't hit assertion failure later in non-canonical mode acquire);
  2. Return immediately if the RLL will actually be empty (ie when last_active_entry_ == kLockListPositionInvalid, before the assertion at line 300) in construct() (retrospective_lock_list.cpp:296, after the write-set loop); for read-only transactions, in this example I hit abort in stock-level.

Always using calculate_last_locked_entry() might be inefficient, maybe we should do:
last_locked_entry_ = std::max(last_locked_entry_, calculate_last_locked_entry(pos-1))?

@hkimura it's been a while since I touched the code last time, could you see if these changes make sense? thanks!

============== The stack traces===========
==== Infinite spinning under relwithdebinfo ====
#0 atomic_load_acquire (target=0x2aafd30d0009 "\202") at /home/wantianz/foedus_code/foedus-core/include/foedus/assorted/atomic_fences.hpp:115
#1 read_state (this=) at /home/wantianz/foedus_code/foedus-core/include/foedus/xct/xct_id.hpp:439
#2 is_blocked (this=) at /home/wantianz/foedus_code/foedus-core/include/foedus/xct/xct_id.hpp:448
#3 is_granted (this=) at /home/wantianz/foedus_code/foedus-core/include/foedus/xct/xct_id.hpp:451
#4 operator() (__closure=) at /home/wantianz/foedus_code/foedus-core/src/foedus/xct/xct_mcs_impl.cpp:513
#5 spin_until<foedus::xct::McsImpl<ADAPTOR, foedus::xct::McsRwSimpleBlock>::acquire_unconditional_rw_writer(foedus::xct::McsRwLock*) [with ADAPTOR = foedus::thread::ThreadPimplMcsAdaptorfoedus::xct::McsRwSimpleBlock; foedus::xct::McsBlockIndex = unsigned int]::__lambda7> (spin_until_cond=...) at /home/wantianz/foedus_code/foedus-core/include/foedus/assorted/spin_until_impl.hpp:65
#6 spin_until<foedus::xct::McsImpl<ADAPTOR, foedus::xct::McsRwSimpleBlock>::acquire_unconditional_rw_writer(foedus::xct::McsRwLock*) [with ADAPTOR = foedus::thread::ThreadPimplMcsAdaptorfoedus::xct::McsRwSimpleBlock; foedus::xct::McsBlockIndex = unsigned int]::__lambda7> (spin_until_cond=...) at /home/wantianz/foedus_code/foedus-core/src/foedus/xct/xct_mcs_impl.cpp:42
#7 foedus::xct::McsImplfoedus::thread::ThreadPimplMcsAdaptor<foedus::xct::McsRwSimpleBlock, foedus::xct::McsRwSimpleBlock>::acquire_unconditional_rw_writer (this=,

mcs_rw_lock=<optimized out>) at /home/wantianz/foedus_code/foedus-core/src/foedus/xct/xct_mcs_impl.cpp:513

#8 0x00007ffff7b2494d in try_or_acquire_single_lockfoedus::xct::McsImpl<foedus::thread::ThreadPimplMcsAdaptor<foedus::xct::McsRwSimpleBlock, foedus::xct::McsRwSimpleBlock> > (mcs_rw_impl=0x7fffaa7fae30,

pos=1, this=0x7fffe802d340) at /home/wantianz/foedus_code/foedus-core/include/foedus/xct/retrospective_lock_list.hpp:603

#9 try_or_acquire_multiple_locksfoedus::xct::McsImpl<foedus::thread::ThreadPimplMcsAdaptor<foedus::xct::McsRwSimpleBlock, foedus::xct::McsRwSimpleBlock> > (mcs_rw_impl=0x7fffaa7fae30, upto_pos=122,

this=0x7fffe802d340) at /home/wantianz/foedus_code/foedus-core/include/foedus/xct/retrospective_lock_list.hpp:643

#10 foedus::thread::ThreadPimpl::cll_try_or_acquire_multiple_locks (this=0x7fffe802c3c0, upto_pos=122) at /home/wantianz/foedus_code/foedus-core/src/foedus/thread/thread_pimpl.cpp:913
#11 0x00007ffff7b24979 in foedus::thread::Thread::cll_try_or_acquire_multiple_locks (this=, upto_pos=)

at /home/wantianz/foedus_code/foedus-core/src/foedus/thread/thread_pimpl.cpp:854

#12 0x00007ffff7b2c649 in foedus::xct::Xct::on_record_read_take_locks_if_needed (this=0x7fffe802d298, intended_for_write=, page_address=page_address@entry=0x2aafe86cf000,

lock_id=lock_id@entry=85378080, tid_address=tid_address@entry=0x2aafe86cf420) at /home/wantianz/foedus_code/foedus-core/src/foedus/xct/xct.cpp:380

#13 0x00007ffff7b2c7cf in foedus::xct::Xct::on_record_read (this=0x7fffe802d298, intended_for_write=, tid_address=0x2aafe86cf420, observed_xid=observed_xid@entry=0x7fffaa7fb100,

read_set_address=read_set_address@entry=0x7fffaa7fb108, no_readset_if_moved=no_readset_if_moved@entry=true, no_readset_if_next_layer=no_readset_if_next_layer@entry=true)
at /home/wantianz/foedus_code/foedus-core/src/foedus/xct/xct.cpp:307

#14 0x00007ffff7adba9b in foedus::storage::masstree::RecordLocation::populate_logical (this=this@entry=0x7fffaa7fb0f0, cur_xct=, page=page@entry=0x2aafe86cf000, index=,

intended_for_write=<optimized out>) at /home/wantianz/foedus_code/foedus-core/src/foedus/storage/masstree/masstree_record_location.cpp:38

#15 0x00007ffff7acbe69 in foedus::storage::masstree::MasstreeCursor::fetch_cur_record_logical (this=this@entry=0x7fffaa7fb0a0, page=page@entry=0x2aafe86cf000, record=)

at /home/wantianz/foedus_code/foedus-core/src/foedus/storage/masstree/masstree_cursor.cpp:547

#16 0x00007ffff7accaf4 in foedus::storage::masstree::MasstreeCursor::locate_border (this=this@entry=0x7fffaa7fb0a0, slice=slice@entry=25769803776)

at /home/wantianz/foedus_code/foedus-core/src/foedus/storage/masstree/masstree_cursor.cpp:1118

#17 0x00007ffff7ad1828 in locate_layer (layer=0 '\000', this=0x7fffaa7fb0a0) at /home/wantianz/foedus_code/foedus-core/src/foedus/storage/masstree/masstree_cursor.cpp:1040
#18 foedus::storage::masstree::MasstreeCursor::open (this=this@entry=0x7fffaa7fb0a0, begin_key=begin_key@entry=0x7fffaa7fb078 "", begin_key_length=begin_key_length@entry=8,

end_key=end_key@entry=0x7fffaa7fb080 "", end_key_length=end_key_length@entry=8, forward_cursor=forward_cursor@entry=false, for_writes=for_writes@entry=true, begin_inclusive=begin_inclusive@entry=false, 
end_inclusive=end_inclusive@entry=true) at /home/wantianz/foedus_code/foedus-core/src/foedus/storage/masstree/masstree_cursor.cpp:974

#19 0x0000000000412152 in open_normalized (end_inclusive=true, begin_inclusive=false, for_writes=true, forward_cursor=false, end_key=23622320128, begin_key=25769803776, this=0x7fffaa7fb0a0)

at /home/wantianz/foedus_code/foedus-core/include/foedus/storage/masstree/masstree_cursor.hpp:220

#20 foedus::tpcc::TpccClientTask::pop_neworder (this=this@entry=0x7fffaa7fb3b0, wid=wid@entry=1, did=did@entry=1 '\001', oid=oid@entry=0x7fffaa7fb1c4)

at /home/wantianz/foedus_code/experiments-core/src/foedus/tpcc/tpcc_delivery.cpp:104

#21 0x0000000000412474 in foedus::tpcc::TpccClientTask::do_delivery (this=this@entry=0x7fffaa7fb3b0, wid=wid@entry=1) at /home/wantianz/foedus_code/experiments-core/src/foedus/tpcc/tpcc_delivery.cpp:38

==== Assertion failure under debug mode ====


**** Assertion failed! "correct == last_locked_entry_" did not hold in assert_last_locked_entry(): /home/wantianz/foedus_code/foedus-core/include/foedus/xct/retrospective_lock_list.hpp:408


#4 0x00007ffff7a1ea1f in foedus::xct::CurrentLockList::assert_last_locked_entry (this=0x7ffff0008ed0) at /home/wantianz/foedus_code/foedus-core/include/foedus/xct/retrospective_lock_list.hpp:408
#5 0x00007ffff7a206ee in foedus::xct::CurrentLockList::try_or_acquire_single_lockfoedus::xct::McsImpl<foedus::thread::ThreadPimplMcsAdaptor<foedus::xct::McsRwSimpleBlock, foedus::xct::McsRwSimpleBlock> >

(this=0x7ffff0008ed0, pos=1, mcs_rw_impl=0x7fffe29ce400) at /home/wantianz/foedus_code/foedus-core/include/foedus/xct/retrospective_lock_list.hpp:591

#6 0x00007ffff7a2112b in foedus::xct::CurrentLockList::try_or_acquire_multiple_locksfoedus::xct::McsImpl<foedus::thread::ThreadPimplMcsAdaptor<foedus::xct::McsRwSimpleBlock, foedus::xct::McsRwSimpleBlock> > (this=0x7ffff0008ed0, upto_pos=1, mcs_rw_impl=0x7fffe29ce400) at /home/wantianz/foedus_code/foedus-core/include/foedus/xct/retrospective_lock_list.hpp:643
#7 0x00007ffff7a1dafe in foedus::thread::ThreadPimpl::cll_try_or_acquire_multiple_locks (this=0x7ffff0007f50, upto_pos=1) at /home/wantianz/foedus_code/foedus-core/src/foedus/thread/thread_pimpl.cpp:913
#8 0x00007ffff7a1d7f6 in foedus::thread::Thread::cll_try_or_acquire_multiple_locks (this=0x7ffff0005d00, upto_pos=1) at /home/wantianz/foedus_code/foedus-core/src/foedus/thread/thread_pimpl.cpp:854
#9 0x00007ffff7a31647 in foedus::xct::Xct::on_record_read_take_locks_if_needed (this=0x7ffff0008e28, intended_for_write=true, page_address=0x2aafe46ce000, lock_id=18266496, tid_address=0x2aafe46ce980)

at /home/wantianz/foedus_code/foedus-core/src/foedus/xct/xct.cpp:380

#10 0x00007ffff7a31000 in foedus::xct::Xct::on_record_read (this=0x7ffff0008e28, intended_for_write=true, tid_address=0x2aafe46ce980, observed_xid=0x7fffe29cee70, read_set_address=0x7fffe29cee78,

no_readset_if_moved=true, no_readset_if_next_layer=true) at /home/wantianz/foedus_code/foedus-core/src/foedus/xct/xct.cpp:307

#11 0x00007ffff792ecf4 in foedus::storage::masstree::RecordLocation::populate_logical (this=0x7fffe29cee60, cur_xct=0x7ffff0008e28, page=0x2aafe46ce000, index=51, intended_for_write=true)

at /home/wantianz/foedus_code/foedus-core/src/foedus/storage/masstree/masstree_record_location.cpp:38

#12 0x00007ffff78fa55c in foedus::storage::masstree::MasstreeCursor::fetch_cur_record_logical (this=0x7fffe29cee10, page=0x2aafe46ce000, record=51)

at /home/wantianz/foedus_code/foedus-core/src/foedus/storage/masstree/masstree_cursor.cpp:547

#13 0x00007ffff78fcf78 in foedus::storage::masstree::MasstreeCursor::locate_border (this=0x7fffe29cee10, slice=12884901888)

at /home/wantianz/foedus_code/foedus-core/src/foedus/storage/masstree/masstree_cursor.cpp:1118

#14 0x00007ffff7905b88 in foedus::storage::masstree::MasstreeCursor::locate_layer (this=0x7fffe29cee10, layer=0 '\000')

at /home/wantianz/foedus_code/foedus-core/src/foedus/storage/masstree/masstree_cursor.cpp:1040

#15 0x00007ffff78fb6aa in foedus::storage::masstree::MasstreeCursor::open (this=0x7fffe29cee10, begin_key=0x7fffe29cedd8 "", begin_key_length=8, end_key=0x7fffe29cedd0 "", end_key_length=8,

forward_cursor=false, for_writes=true, begin_inclusive=false, end_inclusive=true) at /home/wantianz/foedus_code/foedus-core/src/foedus/storage/masstree/masstree_cursor.cpp:974

#16 0x00000000004357a8 in foedus::storage::masstree::MasstreeCursor::open_normalized (this=0x7fffe29cee10, begin_key=12884901888, end_key=10737418240, forward_cursor=false, for_writes=true,

begin_inclusive=false, end_inclusive=true) at /home/wantianz/foedus_code/foedus-core/include/foedus/storage/masstree/masstree_cursor.hpp:220

#17 0x00000000004363b1 in foedus::tpcc::TpccClientTask::pop_neworder (this=0x7fffe29cf2b0, wid=0, did=5 '\005', oid=0x7fffe29cf01c)

at /home/wantianz/foedus_code/experiments-core/src/foedus/tpcc/tpcc_delivery.cpp:104

#18 0x0000000000435de0 in foedus::tpcc::TpccClientTask::do_delivery (this=0x7fffe29cf2b0, wid=0) at /home/wantianz/foedus_code/experiments-core/src/foedus/tpcc/tpcc_delivery.cpp:38

@hkimura
Copy link
Collaborator

hkimura commented Sep 20, 2016

Good catch. The fix is probably ok as the RLL-traversal itself never seems a major cost. To see the actual cause.. Is it possible to make a handy unittest to debug this easily? I know it won't be easy to make such a test case.

@wangtzh
Copy link
Collaborator Author

wangtzh commented Sep 21, 2016

yeah, a unittest would be useful, let me try later this week.

@hkimura
Copy link
Collaborator

hkimura commented Sep 21, 2016

BTW, could you make a pull request, against develop, of the change?
Not sure about the exact cause yet, but I can at least test performance effect of calculating last_locked_entry every time on our machine. I bet negligible.

hkimura added a commit that referenced this issue Sep 29, 2016
Recalculate last_locked_entry by scanning all entries in RLL during lock upgrade. This fixes Issue #132.
@wangtzh wangtzh closed this as completed Sep 15, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants