
SipHasher takes up lots of time in incremental builds #51054

Closed
nnethercote opened this issue May 25, 2018 · 8 comments
Labels
A-incr-comp - Area: Incremental compilation
C-enhancement - Category: An issue proposing an enhancement or a PR with one.
I-compiletime - Issue: Problems and improvements with respect to compile times.
T-compiler - Relevant to the compiler team, which will review and decide on the PR/issue.
WG-compiler-performance - Working group: Compiler Performance

Comments

@nnethercote
Contributor

Here's Cachegrind data for clap-rs from rustc-benchmarks on a "base incremental" "check" build (i.e. the first incremental build):

--------------------------------------------------------------------------------
            Ir
--------------------------------------------------------------------------------
39,792,587,805  PROGRAM TOTALS

6,008,282,218  librustc_data_structures/sip128.rs:rustc_data_structures::sip128::SipHasher128::short_write
1,655,965,059  libcore/num/mod.rs:rustc_data_structures::sip128::SipHasher128::short_write
  319,188,771  libcore/cmp.rs:rustc_data_structures::sip128::SipHasher128::short_write

That's over 20% of the instructions under short_write.

Here are the annotations for the hot pieces of code in librustc_data_structures/sip128.rs.

          .  macro_rules! compress {
          .      ($state:expr) => ({
967,427,173          compress!($state.v0, $state.v1, $state.v2, $state.v3)
          .      });
          .      ($v0:expr, $v1:expr, $v2:expr, $v3:expr) =>
          .      ({
          .          $v0 = $v0.wrapping_add($v1); $v1 = $v1.rotate_left(13); $v1 ^= $v0;
          .          $v0 = $v0.rotate_left(32);
          .          $v2 = $v2.wrapping_add($v3); $v3 = $v3.rotate_left(16); $v3 ^= $v2;
          .          $v0 = $v0.wrapping_add($v3); $v3 = $v3.rotate_left(21); $v3 ^= $v0;
          .          $v2 = $v2.wrapping_add($v1); $v1 = $v1.rotate_left(17); $v1 ^= $v2;
 79,051,764          $v2 = $v2.rotate_left(32);
          .      });
          .  }
          .
          .  /// Load an integer of the desired type from a byte stream, in LE order. Uses
          .  /// `copy_nonoverlapping` to let the compiler generate the most efficient way
          .  /// to load it from a possibly unaligned address.
          .  ///
          .  /// Unsafe because: unchecked indexing at i..i+size_of(int_ty)
          .  macro_rules! load_int_le {
          .      ($buf:expr, $i:expr, $int_ty:ident) =>
          .      ({
          .         debug_assert!($i + mem::size_of::<$int_ty>() <= $buf.len());
          .         let mut data = 0 as $int_ty;
 13,166,995         ptr::copy_nonoverlapping($buf.get_unchecked($i),
          .                                  &mut data as *mut _ as *mut u8,
          .                                  mem::size_of::<$int_ty>());
          .         data.to_le()
          .      });
          .  }
          .
          .  /// Load an u64 using up to 7 bytes of a byte slice.
          .  ///
          .  /// Unsafe because: unchecked indexing at start..start+len
          .  #[inline]
          .  unsafe fn u8to64_le(buf: &[u8], start: usize, len: usize) -> u64 {
          .      debug_assert!(len < 8);
          .      let mut i = 0; // current byte index (from LSB) in the output u64
          .      let mut out = 0;
345,951,546      if i + 3 < len {
 80,328,075          out = load_int_le!(buf, start + i, u32) as u64;
          .          i += 4;
          .      }
747,722,073      if i + 1 < len {
480,400,389          out |= (load_int_le!(buf, start + i, u16) as u64) << (i * 8);
 87,344,805          i += 2
          .      }
345,951,546      if i < len {
211,374,738          out |= (*buf.get_unchecked(start + i) as u64) << (i * 8);
          .          i += 1;
          .      }
          .      debug_assert_eq!(i, len);
          .      out
          .  }

and

          .      #[inline]
          .      fn short_write(&mut self, msg: &[u8]) {
          .          debug_assert!(msg.len() <= 8);
          .          let length = msg.len();
212,792,515          self.length += length;
          .
319,188,774          let needed = 8 - self.ntail;
          .          let fill = cmp::min(length, needed);
212,792,515          if fill == 8 {
 38,954,670              self.tail = unsafe { load_int_le!(msg, 0, u64) };
          .          } else {
560,468,202              self.tail |= unsafe { u8to64_le(msg, 0, fill) } << (8 * self.ntail);
186,822,734              if length < needed {
 55,081,556                  self.ntail += length;
          .                  return;
          .              }
          .          }
 78,855,480          self.state.v3 ^= self.tail;
          .          Sip24Rounds::c_rounds(&mut self.state);
157,710,960          self.state.v0 ^= self.tail;
          .
          .          // Buffered tail is now flushed, process new input.
157,710,958          self.ntail = length - needed;
 78,855,480          self.tail = unsafe { u8to64_le(msg, needed, self.ntail) };
212,792,514      }

And from libcore/num/mod.rs:

            .          #[inline]
            .          pub fn rotate_left(self, n: u32) -> Self {
            .              // Protect against undefined behaviour for over-long bit shifts
            .              let n = n % $BITS;
1,187,178,937              (self << n) | (self >> (($BITS - n) % $BITS))
            .          }

I stared at this for a while but wasn't able to come up with any notable improvements.

Hashing is the hottest part of most incremental check builds. If we can't speed up this code, perhaps we could use a different hasher, or find a way to hash less data.

CC @michaelwoerister

@nnethercote
Contributor Author

or find a way to hash less data

A small example: the impls of HashStable for [T], str, and IndexVec all hash the length and then the elements. Is the length needed? I guess it's necessary because you might have a structure with two Vecs in a row, and (vec![1,2,3], vec![4,5]) must hash differently to (vec![1,2], vec![3,4,5])?
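
(As an aside, here is a minimal sketch of the collision that the length prefix prevents, using std's DefaultHasher rather than rustc's StableHasher:)

    use std::collections::hash_map::DefaultHasher;
    use std::hash::{Hash, Hasher};

    // Hash two slices back to back *without* hashing their lengths first:
    // the hasher only sees the concatenated element stream.
    fn hash_without_len(a: &[u8], b: &[u8]) -> u64 {
        let mut h = DefaultHasher::new();
        for x in a.iter().chain(b.iter()) {
            x.hash(&mut h);
        }
        h.finish()
    }

    fn main() {
        // Both pairs feed the hasher the same elements 1, 2, 3, 4, 5,
        // so they collide; hashing each slice's length first disambiguates.
        assert_eq!(
            hash_without_len(&[1, 2, 3], &[4, 5]),
            hash_without_len(&[1, 2], &[3, 4, 5])
        );
    }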

I guess I could retry the trick from #37427 of leb128-encoding integers. I did some measurements while compiling the compiler and 66% of the values hashed were u64, and 27% were u8, so there's potential there.
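
(For reference, a minimal sketch of unsigned LEB128 encoding, not rustc's actual implementation: values that fit in 7 bits take one byte instead of eight, so the hasher would see far fewer bytes for typical lengths and small u64s.)

    // Unsigned LEB128: emit 7 bits per byte, low bits first; the high bit
    // of each byte says whether more bytes follow.
    fn write_u64_leb128(buf: &mut Vec<u8>, mut value: u64) {
        loop {
            let byte = (value & 0x7f) as u8;
            value >>= 7;
            if value == 0 {
                buf.push(byte);
                return;
            }
            buf.push(byte | 0x80);
        }
    }

    fn main() {
        let mut buf = Vec::new();
        write_u64_leb128(&mut buf, 3);   // one byte instead of eight
        write_u64_leb128(&mut buf, 300); // two bytes instead of eight
        assert_eq!(buf, vec![0x03, 0xac, 0x02]);
    }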

@sfackler
Member

I guess it's necessary because you might have a structure with two Vecs in a row, and (vec![1,2,3], vec![4,5]) must hash differently to (vec![1,2], vec![3,4,5])?

Yeah, that's the reason.

@michaelwoerister
Member

You could give leb128 encoding a try, especially for usize, which usually holds small length values. Leb128 encoding has been optimized a bit since it was last used during hashing (#46919).

The most effective optimization we've done here is caching of hashes, as in 45bd091.

Using a different hasher is an option if it supports 128-bit output and we can be sure that its collision probability is as low as that of SipHash 2-4. However, I'd only like to change it if the results are compelling, since we've iterated on this quite a bit already (see #41215).

@Mark-Simulacrum added the I-compiletime, A-incr-comp, and WG-compiler-performance labels on May 28, 2018
@michaelwoerister
Member

The highwayhash authors claim to have a faster implementation of SipHash which still produces the same outputs: https://github.com/google/highwayhash.

@nnethercote
Contributor Author

One thing that's not clear to me is what needs to be hashed and what doesn't. For example, ScopeTrees are hashed, and for clap-rs that hashing takes up 12% of the time of a "baseline incremental-check" build, i.e. a lot. But why are ScopeTrees hashed? It seems redundant, because they are derived from HIR, and HIR is hashed.

Here is a lightly edited transcript of an IRC discussion about this, between me and @Zoxc.

<njn> so I am wondering why `parent_map` (and the entire scope tree) is hashed at all,
      because it seems to me to be "derived data", in the sense that it'll only change if
      the HIR above it also changes
<Zoxc> I'm not sure why it's hashed, it seems redundant to me if it is treated as an input
       along with the HIR
<njn> Right. So I'm wondering if there's a clear notion of what should be hashed and what
      shouldn't
<Zoxc> We probably want to introduce a query kind which does not recheck the result for
       freshness for these cases. That would avoid the hashing
<njn> what is the definition of "these cases" -- derived data?
<Zoxc> No. It would be queries D, where query D uses queries A, and the users of D also
       use A.
<Zoxc> So if such a query D is executed, comparing its result to the previous session and
       marking it up to date will have no effect on its users, since they use the same
       queries A and will execute anyway.
<njn> how does one identify these cases? Does it require domain expertise on a
      case-by-case basis?
<Zoxc> It probably does
<njn> :(
<njn> I guess if a handful of structures account for most of the hashing cost, that'll make
      it easier
<Zoxc> You could also just change one query at a time and benchmark things, starting
       with those with the largest query results

@michaelwoerister: what do you think?

@michaelwoerister
Member

Copying my comments from IRC:

njn, Zoxc: every query result needs to be hashed, otherwise incremental compilation could not recover from false positives. We started out with just hashing the HIR, but re-use was very poor
njn, Zoxc: I think there's a bug report for everything that we forgot to hash properly :)  
njn, Zoxc: https://github.com/rust-lang/rust/issues/47389 contains some background on why spans are hashed and how things could be improved a little.

It might be possible to optimize specific cases like this:

      A <---+
      ^     |
      |     |
      |     B
      |     ^
      |     |
      +--C--+

That is, if you have a query C that depends on A and B, and B is derived from A, then hashing B is indeed redundant. However, if there were one more query D that depends only on B, then not hashing B might lead to redundantly executing D if A changed but B was not affected by the change.

      A <---+
      ^     |
      |     |
      |     B <---- D
      |     ^
      |     |
      +--C--+
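
To make the trade-off concrete, here is a toy model (plain Rust, not rustc's actual query machinery): a query must be re-executed unless every dependency is known to be unchanged, and a dependency whose result was never hashed has to be treated as changed.

    #[derive(Clone, Copy, PartialEq)]
    enum DepState {
        Unchanged, // result hash matches the previous session
        Changed,   // result hash differs from the previous session
        NotHashed, // no hash stored, so we must assume it changed
    }

    // A query is re-executed unless all of its dependencies are known to be unchanged.
    fn must_reexecute(deps: &[DepState]) -> bool {
        deps.iter().any(|&d| d != DepState::Unchanged)
    }

    fn main() {
        // A changed, but B (derived from A) happens to be unaffected.
        // If B's result is hashed, D sees an unchanged dependency and can be skipped.
        assert!(!must_reexecute(&[DepState::Unchanged]));
        // If B is never hashed, D has nothing to compare against and must re-run.
        assert!(must_reexecute(&[DepState::NotHashed]));
    }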

Also note that we don't re-hash query results where we already know the hash from a previous compilation session.

@XAMPPRocky added the C-enhancement and T-compiler labels on Oct 2, 2018
@Kobzol
Contributor

Kobzol commented Oct 4, 2021

FWIW, I tried replacing SipHasher128 with HighwayHash. With the out-of-the-box implementation, it resulted in a ~10% instruction count regression and mixed cycle count results across the board locally on my PC.

I also tried experimenting with different buffer sizes for SipHash. It seems that a larger buffer size (16, 32) has a positive impact on cycle counts and wall times on my PC, but this will probably differ a lot across architectures.

@nnethercote
Contributor Author

I tried seahash and got instruction count increases of up to 29%, even though its docs claim it is more than 10x faster than SipHash :(

It may at least partly be explained by the fact that I managed to speed up SipHasher128 (the one used in incremental compilation) quite a lot last year in #68914. In fact, based on that, this issue doesn't need to be open any more.
