You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Infinite lock spinning when running with RLL and the number of whs < number of threads. Hyeontaek Lim (https://www.cs.cmu.edu/~hl/) found this problem. To reproduce:
I got infinite spinning under relwithdebinfo, and an assertion failure under debug mode, see stack traces in the end.
Looks like the problem is in last_locked_entry_: I got different results returned by calculate_last_locked_entry() and calculate_last_locked_entry_from(pos-1), the former is larger.
retrospective_lock_list.hpp:591 updates last_locked_entry_ by scanning backward only from [pos], but beyond [pos] there might be locked entries. One possibility I guess is in a previous round the transaction followed RLL and made sure all are locked, then in another round it tries to upgrade, at retrospective_lock_list.hpp:591 last_locked_entry_ will be smaller than it should be.
I did the following, it seems to be working:
Replacing calculate_last_locked_entry_from() with calculate_last_locked_entry() at retrospective_lock_list.hpp:591, and reset lock_entry->mcs_block_ to 0 (so it won't hit assertion failure later in non-canonical mode acquire);
Return immediately if the RLL will actually be empty (ie when last_active_entry_ == kLockListPositionInvalid, before the assertion at line 300) in construct() (retrospective_lock_list.cpp:296, after the write-set loop); for read-only transactions, in this example I hit abort in stock-level.
Always using calculate_last_locked_entry() might be inefficient, maybe we should do:
last_locked_entry_ = std::max(last_locked_entry_, calculate_last_locked_entry(pos-1))?
@hkimura it's been a while since I touched the code last time, could you see if these changes make sense? thanks!
============== The stack traces===========
==== Infinite spinning under relwithdebinfo ====
#0 atomic_load_acquire (target=0x2aafd30d0009 "\202") at /home/wantianz/foedus_code/foedus-core/include/foedus/assorted/atomic_fences.hpp:115 #1 read_state (this=) at /home/wantianz/foedus_code/foedus-core/include/foedus/xct/xct_id.hpp:439 #2 is_blocked (this=) at /home/wantianz/foedus_code/foedus-core/include/foedus/xct/xct_id.hpp:448 #3 is_granted (this=) at /home/wantianz/foedus_code/foedus-core/include/foedus/xct/xct_id.hpp:451 #4 operator() (__closure=) at /home/wantianz/foedus_code/foedus-core/src/foedus/xct/xct_mcs_impl.cpp:513 #5 spin_until<foedus::xct::McsImpl<ADAPTOR, foedus::xct::McsRwSimpleBlock>::acquire_unconditional_rw_writer(foedus::xct::McsRwLock*) [with ADAPTOR = foedus::thread::ThreadPimplMcsAdaptorfoedus::xct::McsRwSimpleBlock; foedus::xct::McsBlockIndex = unsigned int]::__lambda7> (spin_until_cond=...) at /home/wantianz/foedus_code/foedus-core/include/foedus/assorted/spin_until_impl.hpp:65 #6 spin_until<foedus::xct::McsImpl<ADAPTOR, foedus::xct::McsRwSimpleBlock>::acquire_unconditional_rw_writer(foedus::xct::McsRwLock*) [with ADAPTOR = foedus::thread::ThreadPimplMcsAdaptorfoedus::xct::McsRwSimpleBlock; foedus::xct::McsBlockIndex = unsigned int]::__lambda7> (spin_until_cond=...) at /home/wantianz/foedus_code/foedus-core/src/foedus/xct/xct_mcs_impl.cpp:42 #7 foedus::xct::McsImplfoedus::thread::ThreadPimplMcsAdaptor<foedus::xct::McsRwSimpleBlock, foedus::xct::McsRwSimpleBlock>::acquire_unconditional_rw_writer (this=,
mcs_rw_lock=<optimized out>) at /home/wantianz/foedus_code/foedus-core/src/foedus/xct/xct_mcs_impl.cpp:513
#8 0x00007ffff7b2494d in try_or_acquire_single_lockfoedus::xct::McsImpl<foedus::thread::ThreadPimplMcsAdaptor<foedus::xct::McsRwSimpleBlock, foedus::xct::McsRwSimpleBlock> > (mcs_rw_impl=0x7fffaa7fae30,
pos=1, this=0x7fffe802d340) at /home/wantianz/foedus_code/foedus-core/include/foedus/xct/retrospective_lock_list.hpp:603
this=0x7fffe802d340) at /home/wantianz/foedus_code/foedus-core/include/foedus/xct/retrospective_lock_list.hpp:643
#10 foedus::thread::ThreadPimpl::cll_try_or_acquire_multiple_locks (this=0x7fffe802c3c0, upto_pos=122) at /home/wantianz/foedus_code/foedus-core/src/foedus/thread/thread_pimpl.cpp:913 #11 0x00007ffff7b24979 in foedus::thread::Thread::cll_try_or_acquire_multiple_locks (this=, upto_pos=)
at /home/wantianz/foedus_code/foedus-core/src/foedus/thread/thread_pimpl.cpp:854
#12 0x00007ffff7b2c649 in foedus::xct::Xct::on_record_read_take_locks_if_needed (this=0x7fffe802d298, intended_for_write=, page_address=page_address@entry=0x2aafe86cf000,
lock_id=lock_id@entry=85378080, tid_address=tid_address@entry=0x2aafe86cf420) at /home/wantianz/foedus_code/foedus-core/src/foedus/xct/xct.cpp:380
#13 0x00007ffff7b2c7cf in foedus::xct::Xct::on_record_read (this=0x7fffe802d298, intended_for_write=, tid_address=0x2aafe86cf420, observed_xid=observed_xid@entry=0x7fffaa7fb100,
read_set_address=read_set_address@entry=0x7fffaa7fb108, no_readset_if_moved=no_readset_if_moved@entry=true, no_readset_if_next_layer=no_readset_if_next_layer@entry=true)
at /home/wantianz/foedus_code/foedus-core/src/foedus/xct/xct.cpp:307
#14 0x00007ffff7adba9b in foedus::storage::masstree::RecordLocation::populate_logical (this=this@entry=0x7fffaa7fb0f0, cur_xct=, page=page@entry=0x2aafe86cf000, index=,
intended_for_write=<optimized out>) at /home/wantianz/foedus_code/foedus-core/src/foedus/storage/masstree/masstree_record_location.cpp:38
#15 0x00007ffff7acbe69 in foedus::storage::masstree::MasstreeCursor::fetch_cur_record_logical (this=this@entry=0x7fffaa7fb0a0, page=page@entry=0x2aafe86cf000, record=)
at /home/wantianz/foedus_code/foedus-core/src/foedus/storage/masstree/masstree_cursor.cpp:547
#16 0x00007ffff7accaf4 in foedus::storage::masstree::MasstreeCursor::locate_border (this=this@entry=0x7fffaa7fb0a0, slice=slice@entry=25769803776)
at /home/wantianz/foedus_code/foedus-core/src/foedus/storage/masstree/masstree_cursor.cpp:1118
#17 0x00007ffff7ad1828 in locate_layer (layer=0 '\000', this=0x7fffaa7fb0a0) at /home/wantianz/foedus_code/foedus-core/src/foedus/storage/masstree/masstree_cursor.cpp:1040 #18 foedus::storage::masstree::MasstreeCursor::open (this=this@entry=0x7fffaa7fb0a0, begin_key=begin_key@entry=0x7fffaa7fb078 "", begin_key_length=begin_key_length@entry=8,
end_key=end_key@entry=0x7fffaa7fb080 "", end_key_length=end_key_length@entry=8, forward_cursor=forward_cursor@entry=false, for_writes=for_writes@entry=true, begin_inclusive=begin_inclusive@entry=false,
end_inclusive=end_inclusive@entry=true) at /home/wantianz/foedus_code/foedus-core/src/foedus/storage/masstree/masstree_cursor.cpp:974
at /home/wantianz/foedus_code/experiments-core/src/foedus/tpcc/tpcc_delivery.cpp:104
#21 0x0000000000412474 in foedus::tpcc::TpccClientTask::do_delivery (this=this@entry=0x7fffaa7fb3b0, wid=wid@entry=1) at /home/wantianz/foedus_code/experiments-core/src/foedus/tpcc/tpcc_delivery.cpp:38
==== Assertion failure under debug mode ====
**** Assertion failed! "correct == last_locked_entry_" did not hold in assert_last_locked_entry(): /home/wantianz/foedus_code/foedus-core/include/foedus/xct/retrospective_lock_list.hpp:408
#4 0x00007ffff7a1ea1f in foedus::xct::CurrentLockList::assert_last_locked_entry (this=0x7ffff0008ed0) at /home/wantianz/foedus_code/foedus-core/include/foedus/xct/retrospective_lock_list.hpp:408 #5 0x00007ffff7a206ee in foedus::xct::CurrentLockList::try_or_acquire_single_lockfoedus::xct::McsImpl<foedus::thread::ThreadPimplMcsAdaptor<foedus::xct::McsRwSimpleBlock, foedus::xct::McsRwSimpleBlock> >
(this=0x7ffff0008ed0, pos=1, mcs_rw_impl=0x7fffe29ce400) at /home/wantianz/foedus_code/foedus-core/include/foedus/xct/retrospective_lock_list.hpp:591
#6 0x00007ffff7a2112b in foedus::xct::CurrentLockList::try_or_acquire_multiple_locksfoedus::xct::McsImpl<foedus::thread::ThreadPimplMcsAdaptor<foedus::xct::McsRwSimpleBlock, foedus::xct::McsRwSimpleBlock> > (this=0x7ffff0008ed0, upto_pos=1, mcs_rw_impl=0x7fffe29ce400) at /home/wantianz/foedus_code/foedus-core/include/foedus/xct/retrospective_lock_list.hpp:643 #7 0x00007ffff7a1dafe in foedus::thread::ThreadPimpl::cll_try_or_acquire_multiple_locks (this=0x7ffff0007f50, upto_pos=1) at /home/wantianz/foedus_code/foedus-core/src/foedus/thread/thread_pimpl.cpp:913 #8 0x00007ffff7a1d7f6 in foedus::thread::Thread::cll_try_or_acquire_multiple_locks (this=0x7ffff0005d00, upto_pos=1) at /home/wantianz/foedus_code/foedus-core/src/foedus/thread/thread_pimpl.cpp:854 #9 0x00007ffff7a31647 in foedus::xct::Xct::on_record_read_take_locks_if_needed (this=0x7ffff0008e28, intended_for_write=true, page_address=0x2aafe46ce000, lock_id=18266496, tid_address=0x2aafe46ce980)
at /home/wantianz/foedus_code/foedus-core/src/foedus/xct/xct.cpp:380
#10 0x00007ffff7a31000 in foedus::xct::Xct::on_record_read (this=0x7ffff0008e28, intended_for_write=true, tid_address=0x2aafe46ce980, observed_xid=0x7fffe29cee70, read_set_address=0x7fffe29cee78,
no_readset_if_moved=true, no_readset_if_next_layer=true) at /home/wantianz/foedus_code/foedus-core/src/foedus/xct/xct.cpp:307
#11 0x00007ffff792ecf4 in foedus::storage::masstree::RecordLocation::populate_logical (this=0x7fffe29cee60, cur_xct=0x7ffff0008e28, page=0x2aafe46ce000, index=51, intended_for_write=true)
at /home/wantianz/foedus_code/foedus-core/src/foedus/storage/masstree/masstree_record_location.cpp:38
#12 0x00007ffff78fa55c in foedus::storage::masstree::MasstreeCursor::fetch_cur_record_logical (this=0x7fffe29cee10, page=0x2aafe46ce000, record=51)
at /home/wantianz/foedus_code/foedus-core/src/foedus/storage/masstree/masstree_cursor.cpp:547
#13 0x00007ffff78fcf78 in foedus::storage::masstree::MasstreeCursor::locate_border (this=0x7fffe29cee10, slice=12884901888)
at /home/wantianz/foedus_code/foedus-core/src/foedus/storage/masstree/masstree_cursor.cpp:1118
#14 0x00007ffff7905b88 in foedus::storage::masstree::MasstreeCursor::locate_layer (this=0x7fffe29cee10, layer=0 '\000')
at /home/wantianz/foedus_code/foedus-core/src/foedus/storage/masstree/masstree_cursor.cpp:1040
forward_cursor=false, for_writes=true, begin_inclusive=false, end_inclusive=true) at /home/wantianz/foedus_code/foedus-core/src/foedus/storage/masstree/masstree_cursor.cpp:974
#16 0x00000000004357a8 in foedus::storage::masstree::MasstreeCursor::open_normalized (this=0x7fffe29cee10, begin_key=12884901888, end_key=10737418240, forward_cursor=false, for_writes=true,
begin_inclusive=false, end_inclusive=true) at /home/wantianz/foedus_code/foedus-core/include/foedus/storage/masstree/masstree_cursor.hpp:220
#17 0x00000000004363b1 in foedus::tpcc::TpccClientTask::pop_neworder (this=0x7fffe29cf2b0, wid=0, did=5 '\005', oid=0x7fffe29cf01c)
at /home/wantianz/foedus_code/experiments-core/src/foedus/tpcc/tpcc_delivery.cpp:104
#18 0x0000000000435de0 in foedus::tpcc::TpccClientTask::do_delivery (this=0x7fffe29cf2b0, wid=0) at /home/wantianz/foedus_code/experiments-core/src/foedus/tpcc/tpcc_delivery.cpp:38
The text was updated successfully, but these errors were encountered:
Good catch. The fix is probably ok as the RLL-traversal itself never seems a major cost. To see the actual cause.. Is it possible to make a handy unittest to debug this easily? I know it won't be easy to make such a test case.
BTW, could you make a pull request, against develop, of the change?
Not sure about the exact cause yet, but I can at least test performance effect of calculating last_locked_entry every time on our machine. I bet negligible.
Infinite lock spinning when running with RLL and the number of whs < number of threads. Hyeontaek Lim (https://www.cs.cmu.edu/~hl/) found this problem. To reproduce:
./tpcc -fork_workers=false -take_snapshot=false -nvm_folder=/dev/shm -volatile_pool_size=10 -snapshot_pool_size=1 -reducer_buffer_size=1 -loggers_per_node=1 -neworder_remote_percent=1 -payment_remote_percent=15 -thread_per_node=10 -numa_nodes=2 -log_buffer_mb=1024 -null_log_device=true -high_priority=false -warehouses=4 -duration_micro=10000000 -hcc_policy=0
I got infinite spinning under relwithdebinfo, and an assertion failure under debug mode, see stack traces in the end.
Looks like the problem is in last_locked_entry_: I got different results returned by calculate_last_locked_entry() and calculate_last_locked_entry_from(pos-1), the former is larger.
retrospective_lock_list.hpp:591 updates last_locked_entry_ by scanning backward only from [pos], but beyond [pos] there might be locked entries. One possibility I guess is in a previous round the transaction followed RLL and made sure all are locked, then in another round it tries to upgrade, at retrospective_lock_list.hpp:591 last_locked_entry_ will be smaller than it should be.
I did the following, it seems to be working:
Always using calculate_last_locked_entry() might be inefficient, maybe we should do:
last_locked_entry_ = std::max(last_locked_entry_, calculate_last_locked_entry(pos-1))?
@hkimura it's been a while since I touched the code last time, could you see if these changes make sense? thanks!
============== The stack traces===========
==== Infinite spinning under relwithdebinfo ====
#0 atomic_load_acquire (target=0x2aafd30d0009 "\202") at /home/wantianz/foedus_code/foedus-core/include/foedus/assorted/atomic_fences.hpp:115
#1 read_state (this=) at /home/wantianz/foedus_code/foedus-core/include/foedus/xct/xct_id.hpp:439
#2 is_blocked (this=) at /home/wantianz/foedus_code/foedus-core/include/foedus/xct/xct_id.hpp:448
#3 is_granted (this=) at /home/wantianz/foedus_code/foedus-core/include/foedus/xct/xct_id.hpp:451
#4 operator() (__closure=) at /home/wantianz/foedus_code/foedus-core/src/foedus/xct/xct_mcs_impl.cpp:513
#5 spin_until<foedus::xct::McsImpl<ADAPTOR, foedus::xct::McsRwSimpleBlock>::acquire_unconditional_rw_writer(foedus::xct::McsRwLock*) [with ADAPTOR = foedus::thread::ThreadPimplMcsAdaptorfoedus::xct::McsRwSimpleBlock; foedus::xct::McsBlockIndex = unsigned int]::__lambda7> (spin_until_cond=...) at /home/wantianz/foedus_code/foedus-core/include/foedus/assorted/spin_until_impl.hpp:65
#6 spin_until<foedus::xct::McsImpl<ADAPTOR, foedus::xct::McsRwSimpleBlock>::acquire_unconditional_rw_writer(foedus::xct::McsRwLock*) [with ADAPTOR = foedus::thread::ThreadPimplMcsAdaptorfoedus::xct::McsRwSimpleBlock; foedus::xct::McsBlockIndex = unsigned int]::__lambda7> (spin_until_cond=...) at /home/wantianz/foedus_code/foedus-core/src/foedus/xct/xct_mcs_impl.cpp:42
#7 foedus::xct::McsImplfoedus::thread::ThreadPimplMcsAdaptor<foedus::xct::McsRwSimpleBlock, foedus::xct::McsRwSimpleBlock>::acquire_unconditional_rw_writer (this=,
#8 0x00007ffff7b2494d in try_or_acquire_single_lockfoedus::xct::McsImpl<foedus::thread::ThreadPimplMcsAdaptor<foedus::xct::McsRwSimpleBlock, foedus::xct::McsRwSimpleBlock> > (mcs_rw_impl=0x7fffaa7fae30,
#9 try_or_acquire_multiple_locksfoedus::xct::McsImpl<foedus::thread::ThreadPimplMcsAdaptor<foedus::xct::McsRwSimpleBlock, foedus::xct::McsRwSimpleBlock> > (mcs_rw_impl=0x7fffaa7fae30, upto_pos=122,
#10 foedus::thread::ThreadPimpl::cll_try_or_acquire_multiple_locks (this=0x7fffe802c3c0, upto_pos=122) at /home/wantianz/foedus_code/foedus-core/src/foedus/thread/thread_pimpl.cpp:913
#11 0x00007ffff7b24979 in foedus::thread::Thread::cll_try_or_acquire_multiple_locks (this=, upto_pos=)
#12 0x00007ffff7b2c649 in foedus::xct::Xct::on_record_read_take_locks_if_needed (this=0x7fffe802d298, intended_for_write=, page_address=page_address@entry=0x2aafe86cf000,
#13 0x00007ffff7b2c7cf in foedus::xct::Xct::on_record_read (this=0x7fffe802d298, intended_for_write=, tid_address=0x2aafe86cf420, observed_xid=observed_xid@entry=0x7fffaa7fb100,
#14 0x00007ffff7adba9b in foedus::storage::masstree::RecordLocation::populate_logical (this=this@entry=0x7fffaa7fb0f0, cur_xct=, page=page@entry=0x2aafe86cf000, index=,
#15 0x00007ffff7acbe69 in foedus::storage::masstree::MasstreeCursor::fetch_cur_record_logical (this=this@entry=0x7fffaa7fb0a0, page=page@entry=0x2aafe86cf000, record=)
#16 0x00007ffff7accaf4 in foedus::storage::masstree::MasstreeCursor::locate_border (this=this@entry=0x7fffaa7fb0a0, slice=slice@entry=25769803776)
#17 0x00007ffff7ad1828 in locate_layer (layer=0 '\000', this=0x7fffaa7fb0a0) at /home/wantianz/foedus_code/foedus-core/src/foedus/storage/masstree/masstree_cursor.cpp:1040
#18 foedus::storage::masstree::MasstreeCursor::open (this=this@entry=0x7fffaa7fb0a0, begin_key=begin_key@entry=0x7fffaa7fb078 "", begin_key_length=begin_key_length@entry=8,
#19 0x0000000000412152 in open_normalized (end_inclusive=true, begin_inclusive=false, for_writes=true, forward_cursor=false, end_key=23622320128, begin_key=25769803776, this=0x7fffaa7fb0a0)
#20 foedus::tpcc::TpccClientTask::pop_neworder (this=this@entry=0x7fffaa7fb3b0, wid=wid@entry=1, did=did@entry=1 '\001', oid=oid@entry=0x7fffaa7fb1c4)
#21 0x0000000000412474 in foedus::tpcc::TpccClientTask::do_delivery (this=this@entry=0x7fffaa7fb3b0, wid=wid@entry=1) at /home/wantianz/foedus_code/experiments-core/src/foedus/tpcc/tpcc_delivery.cpp:38
==== Assertion failure under debug mode ====
**** Assertion failed! "correct == last_locked_entry_" did not hold in assert_last_locked_entry(): /home/wantianz/foedus_code/foedus-core/include/foedus/xct/retrospective_lock_list.hpp:408
#4 0x00007ffff7a1ea1f in foedus::xct::CurrentLockList::assert_last_locked_entry (this=0x7ffff0008ed0) at /home/wantianz/foedus_code/foedus-core/include/foedus/xct/retrospective_lock_list.hpp:408
#5 0x00007ffff7a206ee in foedus::xct::CurrentLockList::try_or_acquire_single_lockfoedus::xct::McsImpl<foedus::thread::ThreadPimplMcsAdaptor<foedus::xct::McsRwSimpleBlock, foedus::xct::McsRwSimpleBlock> >
#6 0x00007ffff7a2112b in foedus::xct::CurrentLockList::try_or_acquire_multiple_locksfoedus::xct::McsImpl<foedus::thread::ThreadPimplMcsAdaptor<foedus::xct::McsRwSimpleBlock, foedus::xct::McsRwSimpleBlock> > (this=0x7ffff0008ed0, upto_pos=1, mcs_rw_impl=0x7fffe29ce400) at /home/wantianz/foedus_code/foedus-core/include/foedus/xct/retrospective_lock_list.hpp:643
#7 0x00007ffff7a1dafe in foedus::thread::ThreadPimpl::cll_try_or_acquire_multiple_locks (this=0x7ffff0007f50, upto_pos=1) at /home/wantianz/foedus_code/foedus-core/src/foedus/thread/thread_pimpl.cpp:913
#8 0x00007ffff7a1d7f6 in foedus::thread::Thread::cll_try_or_acquire_multiple_locks (this=0x7ffff0005d00, upto_pos=1) at /home/wantianz/foedus_code/foedus-core/src/foedus/thread/thread_pimpl.cpp:854
#9 0x00007ffff7a31647 in foedus::xct::Xct::on_record_read_take_locks_if_needed (this=0x7ffff0008e28, intended_for_write=true, page_address=0x2aafe46ce000, lock_id=18266496, tid_address=0x2aafe46ce980)
#10 0x00007ffff7a31000 in foedus::xct::Xct::on_record_read (this=0x7ffff0008e28, intended_for_write=true, tid_address=0x2aafe46ce980, observed_xid=0x7fffe29cee70, read_set_address=0x7fffe29cee78,
#11 0x00007ffff792ecf4 in foedus::storage::masstree::RecordLocation::populate_logical (this=0x7fffe29cee60, cur_xct=0x7ffff0008e28, page=0x2aafe46ce000, index=51, intended_for_write=true)
#12 0x00007ffff78fa55c in foedus::storage::masstree::MasstreeCursor::fetch_cur_record_logical (this=0x7fffe29cee10, page=0x2aafe46ce000, record=51)
#13 0x00007ffff78fcf78 in foedus::storage::masstree::MasstreeCursor::locate_border (this=0x7fffe29cee10, slice=12884901888)
#14 0x00007ffff7905b88 in foedus::storage::masstree::MasstreeCursor::locate_layer (this=0x7fffe29cee10, layer=0 '\000')
#15 0x00007ffff78fb6aa in foedus::storage::masstree::MasstreeCursor::open (this=0x7fffe29cee10, begin_key=0x7fffe29cedd8 "", begin_key_length=8, end_key=0x7fffe29cedd0 "", end_key_length=8,
#16 0x00000000004357a8 in foedus::storage::masstree::MasstreeCursor::open_normalized (this=0x7fffe29cee10, begin_key=12884901888, end_key=10737418240, forward_cursor=false, for_writes=true,
#17 0x00000000004363b1 in foedus::tpcc::TpccClientTask::pop_neworder (this=0x7fffe29cf2b0, wid=0, did=5 '\005', oid=0x7fffe29cf01c)
#18 0x0000000000435de0 in foedus::tpcc::TpccClientTask::do_delivery (this=0x7fffe29cf2b0, wid=0) at /home/wantianz/foedus_code/experiments-core/src/foedus/tpcc/tpcc_delivery.cpp:38
The text was updated successfully, but these errors were encountered: