perf(filecoin-proofs): speed up Fr32Reader #1341

dignifiedquire · 2020-11-04T23:23:00Z

Switch to processing bit padding in blocks of 127bytes using u128s.

This also improves correctness, as now the reader will emit a full Fr32 at the end, not just a single byte with the additional bits, which is why some of the test code was updated.

Speed differences can be seen here, with the diffs being the same benchmarks run against master. https://gist.github.com/dignifiedquire/1c21680f237a8f3aa5ad43f0008ffa41

cryptonemo

I reviewed this earlier and it looks good, thanks for rebasing.

Switch to processing bit padding in blocks of 127bytes using u128s

porcuquine

This generally looks good, and I can't see any definite problems. However, the naming throughout is confusing enough that I find it hard to follow the logic. In some cases, the naming actively confuses me because words seem to be used ambiguously. If you can clarify such cases by using less ambiguous naming and especially avoid cases where the same word is being used to mean entirely different things (if any such actually exist), that would be valuable. Although it seems like a small thing, once such naming gets locked in, it can be very expensive for future readers of the code to swap in the context required to understand it.

porcuquine · 2020-11-06T18:19:29Z

filecoin-proofs/src/fr32_reader.rs

    /// Are we done reading?
    done: bool,
 }

+const NUM_U128S_PER_BLOCK: usize = 8;
+const MASK_SKIP_HIGH_2: u128 = 0b0011_1111_1111_1111_1111_1111_1111_1111_1111_1111_1111_1111_1111_1111_1111_1111_1111_1111_1111_1111_1111_1111_1111_1111_1111_1111_1111_1111_1111_1111_1111_1111;
+


I would rather see this written in a way that can be seen correct by inspection, like:

u128::MAX - (0b11 << 126);

I find that much harder to inspect to be honest

Really? It's easy for you to see at a glance that this is wrong?

0b0011_1111_1111_1111_1111_1111_1111_1111_1111_1111_1111_1111_1111_1111_1111_1111_1111_1111_1111_1111_1111_1111_1111_1111_1111_1111_1111_1111_1111_1111_1111

neither of them is "easy" for me, but in thase case I simply count the 1s and know if it is good, but your notation requires me to think through bitshfiting rules, and how the subtraction operator is used on a bit level.

My main point was to avoid counting the 1s — not the specific construction used to avoid it. Especially with an appropriate comment, I'm not averse to asking readers to think through bitshifting rules.

A simpler version which requires less thinking and still no counting would be:

const MASK_SKIP_HIGH_2: u128 = u128::MAX >> 2;

Maybe that is the best of both worlds?

porcuquine · 2020-11-06T18:20:41Z

filecoin-proofs/src/fr32_reader.rs

 impl<R: io::Read> Fr32Reader<R> {
    pub fn new(source: R) -> Self {
        Fr32Reader {
            source,
-            target_offset: 0,
-            buffer: Default::default(),
+            in_buffer: [0; NUM_U128S_PER_BLOCK * 16],


Can we move the 16 into a named constant?

porcuquine · 2020-11-06T18:56:42Z

filecoin-proofs/src/fr32_reader.rs

+        // 0..254
+        {
+            out_buffer[0] = in_buffer[0];
+            out_buffer[1] = in_buffer[1] & MASK_SKIP_HIGH_2;


Somewhere, there should be a comment noting that u128 is 16 bytes, so Fr32 output is arranged in groups of 2 u128s — and here we we handle groups of 4 Fr32s because that is the period at which alignment repeats.

porcuquine · 2020-11-06T19:14:48Z

filecoin-proofs/src/fr32_reader.rs

+            out_buffer[6] |= in_buffer[6] << 6; // low 122 bits
+            out_buffer[7] = in_buffer[6] >> 122; // top 6 bits
+            out_buffer[7] |= in_buffer[7] << 6; // low 120 bits
+            out_buffer[7] &= MASK_SKIP_HIGH_2; // zero high 2 bits


I think you could implement this with a macro which would make the logic entirely clear while still being efficient. The above is fine, just requires more effort to verify that what is written our here really does correspond to that notional 'macroexpansion'.

porcuquine · 2020-11-06T19:18:37Z

filecoin-proofs/src/fr32_reader.rs

-                    self.target_offset += 8;
-                    return Ok(1);
+                Ok(n) => {
+                    let tmp = buf;


Is tmp needed? Might be clearer to just assign buf = &mut buf[n..] if that works.

porcuquine · 2020-11-06T20:15:36Z

filecoin-proofs/src/fr32_reader.rs

-    pub fn read_u32(&mut self) -> u32 {
-        debug_assert!(self.available() >= 32);
+                self.process_block();
+                self.to_process += div_ceil(bytes_read * 8, 254);


This to_process quantity is very confusing. Maybe it's just the name. We just finished processing (process_block()), so I would not expect the amount 'to process' to increase. Can you rework the naming so what is actually being tracked is clearer?

Is the word 'process' actually referring to the same thing in these two names? If not, one should really be changed.

porcuquine · 2020-11-06T20:35:31Z

filecoin-proofs/src/fr32_reader.rs

-        target[2] = value[2];
-        target[3] = value[3];
-    }
+            let start = read;


Amplifying my comment when read was introduced, I think this is a confusing name. I think I would be less confused if it was something like target_read. As a reader, it's very hard to keep track of the context and semantics of these variables. If there's not a way to make it clearer by the structure, at least maybe more explicit names will help.

I think part of the confusion, too, is that in this context, you're writing to target but referring to the completed result as having read. I know that's what this method purports to do, just explaining how it creates confusion, giving that you also read from self.source into a buffer, then read from that buffer before 'reading' into target.

Maybe target_offset would be clearest?

Well it is very unfortunate that the past tense and the verb happen to be the same word here, as this is explicitly the past tense, maybe "bytes_read" would be better?

porcuquine · 2020-11-06T21:05:06Z

filecoin-proofs/src/fr32_reader.rs

+        while read < num_bytes {
+            // load new block
+            if self.to_process == 0 {
+                let bytes_read = self.fill_buffer()?;


This fills in_buffer.

porcuquine · 2020-11-06T21:13:46Z

filecoin-proofs/src/fr32_reader.rs

-    /// Currently read byte.
-    buffer: Buffer,
+    /// Currently read block.
+    in_buffer: [u8; NUM_U128S_PER_BLOCK * 16],


Instead of 16, you could explicitly calculate with size_of to show that the in and out buffers hold the same number of bytes.

porcuquine · 2020-11-06T21:20:04Z

filecoin-proofs/src/fr32_reader.rs

+                .copy_from_slice(&self.out_buffer.as_byte_slice()[out_start..out_end]);
+            read += len;
+            self.out_offset += len;
+            self.to_process -= div_ceil(len * 8, 256);


div_ceil(len, 32) might be clearer. But this raises a question, if to_process is the number of blocks, and one block is 4 Fr, shouldn't this be div_ceil(len, 128)?

I can believe the actual algorithm is doing the right thing here, but I'm confused trying to line up all the names and descriptions throughout this whole changeset.

I think maybe 'block' in this context is being used to mean 'one Frs worth of bytes' — but it seems like in the definition of NUM_U128S_PER_BLOCK (where the value is 8), 'block' is being used to refer to 128 bytes = 4 Fr32s.

Am I missing something, or can some rectification of names help?

to process is in number of Frs sorry, which is 254 bits in, 256bits out

dignifiedquire requested review from cryptonemo and porcuquine as code owners November 4, 2020 23:23

dignifiedquire force-pushed the perf-fr32 branch from a4cee66 to 8b9920a Compare November 5, 2020 16:40

cryptonemo previously approved these changes Nov 5, 2020

View reviewed changes

dignifiedquire dismissed cryptonemo’s stale review via 80198e6 November 6, 2020 14:14

dignifiedquire force-pushed the perf-fr32 branch from f8340e2 to 80198e6 Compare November 6, 2020 14:14

dignifiedquire added 2 commits November 6, 2020 20:16

perf(filecoin-proofs): speed up Fr32Reader

7ea9b61

Switch to processing bit padding in blocks of 127bytes using u128s

fixup: remove testing test

d4a99cd

dignifiedquire force-pushed the perf-fr32 branch from 80198e6 to d4a99cd Compare November 6, 2020 19:16

porcuquine reviewed Nov 6, 2020

View reviewed changes

dignifiedquire added 2 commits November 9, 2020 17:29

apply CR

a75f77d

happy clippy

a660f10

dignifiedquire merged commit 4e0594c into master Nov 9, 2020

dignifiedquire deleted the perf-fr32 branch November 9, 2020 17:16

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

perf(filecoin-proofs): speed up Fr32Reader #1341

perf(filecoin-proofs): speed up Fr32Reader #1341

dignifiedquire commented Nov 4, 2020

cryptonemo left a comment

porcuquine left a comment

porcuquine Nov 6, 2020

dignifiedquire Nov 6, 2020

porcuquine Nov 6, 2020

dignifiedquire Nov 9, 2020

porcuquine Nov 9, 2020

porcuquine Nov 6, 2020

porcuquine Nov 6, 2020

porcuquine Nov 6, 2020

porcuquine Nov 6, 2020

porcuquine Nov 6, 2020

porcuquine Nov 6, 2020

dignifiedquire Nov 6, 2020

porcuquine Nov 6, 2020

porcuquine Nov 6, 2020

porcuquine Nov 6, 2020

dignifiedquire Nov 6, 2020

perf(filecoin-proofs): speed up Fr32Reader #1341

perf(filecoin-proofs): speed up Fr32Reader #1341

Conversation

dignifiedquire commented Nov 4, 2020

cryptonemo left a comment

Choose a reason for hiding this comment

porcuquine left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment