Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Empty sector update tree r gpu #1528

Merged
merged 4 commits into from
Nov 12, 2021

Conversation

cryptonemo
Copy link
Collaborator

No description provided.

@cryptonemo
Copy link
Collaborator Author

This may be slower in the general case, so it needs to be perf tested before it can be considered. Working on that ...

@cryptonemo
Copy link
Collaborator Author

cryptonemo commented Oct 15, 2021

These are the PC2 performance numbers that I got (showing no measurable regression):

master

Run 1:
seal-pre-commit-phase2-cpu-time-ms: 3,171,382
seal-pre-commit-phase2-wall-time-ms: 548,635

Run 2:
seal-pre-commit-phase2-cpu-time-ms: 3,139,479
seal-pre-commit-phase2-wall-time-ms: 548,908

This PR branch

Run 1:
seal-pre-commit-phase2-cpu-time-ms: 3,141,659
seal-pre-commit-phase2-wall-time-ms: 548,493

Run 2:
seal-pre-commit-phase2-cpu-time-ms: 3,135,765
seal-pre-commit-phase2-wall-time-ms: 548,141

EDIT: These numbers are using GPU and CUDA. We also need CPU numbers to make sure there are no measurable regressions.

@cryptonemo
Copy link
Collaborator Author

I will probably replace the closures with real methods (to help avoid duplication) and re-test to be sure.

@cryptonemo
Copy link
Collaborator Author

Hrm, with CPU Tree building enabled, a regression is noted:

master

Run 1:
seal-pre-commit-phase2-cpu-time-ms: 15,249,411
seal-pre-commit-phase2-wall-time-ms: 918,187

Run 2:
seal-pre-commit-phase2-cpu-time-ms: 15,190,269
seal-pre-commit-phase2-wall-time-ms: 916,123

This PR branch

Run 1:
seal-pre-commit-phase2-cpu-time-ms: 15,479,586
seal-pre-commit-phase2-wall-time-ms: 931,320

Run 2:
seal-pre-commit-phase2-cpu-time-ms: 15,516,990
seal-pre-commit-phase2-wall-time-ms: 930,456

@cryptonemo
Copy link
Collaborator Author

The latest commit is not pretty, but reduces the added 'double conversion' when CPU tree building. Times are:

Run 1:
seal-pre-commit-phase2-cpu-time-ms:15,408,242
seal-pre-commit-phase2-wall-time-ms: 918,553

Run 2:
seal-pre-commit-phase2-cpu-time-ms: 15,495,886
seal-pre-commit-phase2-wall-time-ms: 920,236

Which is comparable to the measured master performance. Need to re-test GPU also.

@cryptonemo cryptonemo force-pushed the empty-sector-update-tree-r-gpu branch from b936815 to 4a8cf99 Compare October 19, 2021 16:17
@cryptonemo
Copy link
Collaborator Author

Ooh nice, the latest GPU times improved:

seal-pre-commit-phase2-cpu-time-ms: 2,935,005
seal-pre-commit-phase2-wall-time-ms: 536,858

@cryptonemo cryptonemo force-pushed the empty-sector-update-tree-r-gpu branch 3 times, most recently from f4f247a to 875f063 Compare October 19, 2021 18:23
@cryptonemo cryptonemo force-pushed the empty-sector-update-tree-r-gpu branch 4 times, most recently from 0888200 to 2ef0f1f Compare October 20, 2021 18:53
@cryptonemo
Copy link
Collaborator Author

Odd, the failing cuda tests all pass locally on my setup :-(

@cryptonemo
Copy link
Collaborator Author

Odd, the failing cuda tests all pass locally on my setup :-(

Nevermind, I can reproduce now

@cryptonemo
Copy link
Collaborator Author

Odd, the failing cuda tests all pass locally on my setup :-(

Nevermind, I can reproduce now

I'm confused. Reproduced once, but can't again. Something is going on.

@cryptonemo
Copy link
Collaborator Author

Hrm, this rebased is hosed :-( Will see about getting it going better

@cryptonemo cryptonemo force-pushed the empty-sector-update-tree-r-gpu branch from ede86b4 to f8b6533 Compare October 26, 2021 17:02
@cryptonemo
Copy link
Collaborator Author

Ok still hosed, but getting there 😓

@cryptonemo cryptonemo force-pushed the empty-sector-update-tree-r-gpu branch from f8b6533 to 1f83fca Compare October 26, 2021 18:33
@DrPeterVanNostrand DrPeterVanNostrand force-pushed the empty-sector-update branch 2 times, most recently from a6dc94c to 9b21f76 Compare October 29, 2021 20:29
reduce conversions in encode (@dignifiedquire)
feat: reduce conversions for CPU tree building differently
feat: use an enum return type to avoid transmute (unsafe)
fix: properly use GPU if possible
@cryptonemo
Copy link
Collaborator Author

Final testing showed comparablly equal performance for CPU tree building and improved building for GPU.

@cryptonemo cryptonemo merged commit d8ecb48 into empty-sector-update Nov 12, 2021
@cryptonemo cryptonemo deleted the empty-sector-update-tree-r-gpu branch November 12, 2021 20:12
cryptonemo added a commit that referenced this pull request Nov 16, 2021
* feat: re-factor tree_r_last building for external re-use
* reduce conversions in encode (@dignifiedquire)
* feat: use an enum return type to avoid transmute (unsafe)
* fix: keep new crate version consistent with latest release
* fix: add doc and additional type check
* feat: wrap settings for tree builders in a method
cryptonemo added a commit that referenced this pull request Dec 2, 2021
* feat: re-factor tree_r_last building for external re-use
* reduce conversions in encode (@dignifiedquire)
* feat: use an enum return type to avoid transmute (unsafe)
* fix: keep new crate version consistent with latest release
* fix: add doc and additional type check
* feat: wrap settings for tree builders in a method
cryptonemo added a commit that referenced this pull request Dec 9, 2021
* feat: re-factor tree_r_last building for external re-use
* reduce conversions in encode (@dignifiedquire)
* feat: use an enum return type to avoid transmute (unsafe)
* fix: keep new crate version consistent with latest release
* fix: add doc and additional type check
* feat: wrap settings for tree builders in a method
cryptonemo added a commit that referenced this pull request Dec 13, 2021
* feat: encode/decode/remove_data API with tests

* feat: add Jake's latest circuit code
* feat: added some filecoin-proofs types for updated porep_config
* feat: add a generate update proof stub API call
* feat: start hooking up vanilla merkle proof generation for challenges
* feat: add gathering of apex leaves
* fix: updates required after rebasing against master
* feat: formalize the vanilla update and vanilla proof types
* feat: empty-sector-update ProofScheme and CompoundProof
* feat: update filecoin-proofs with latest changes
* fix: properly calculate phi using comm_r instead of comm_r_last
* refactor: remove lifetime parameter from EmptySectorUpdate struct
* feat: complete vanilla proving and verify through tests
* feat: remove t_aux from being required in vanilla encode
* feat: remove t_aux from vanilla PrivateInputs
* feat EmptySectorUpdateCompound tests
* fix: CompoundProof::circuit partition-index
* feat: expose some required data through filecoin-proofs
* feat: add prove and verify API interfaces
* feat: update benches with new porep_config fields
* feat: generate empty sector update parameters for testing
* feat: update paramcache for generating empty sector update params
* feat: bump parameter cache version to generate new params
* feat: properly wire in prove/verify for empty sector updates
* feat: add apex-por gadget tests
* fix: high bits should include partition-index
* style: rename 'blank' to 'empty'
* style: move por_no_challenge_input gadget to storage_proofs_core
* feat: precompute rhos when encoding/decoding
* feat: pack k and h_select into one circuit pub-input
* fix: apply more feedback
* feat: bump CI parameter cache version
* feat: clean-up p_aux/t_aux usage
* feat: output converted fr bytes directly
* docs: update incorrect comment
* refactor: add structs for EmptySectorUpdateCircuit's public and private inputs
* fix: address review comments
* fix: bump CI parameter cache version
* feat: test challenges against hardcoded vectors
* feat: update parameter cache version for empty sector updates
* feat: isolate testing of empty circuit tests to reduce RAM usage
* fix: update clippy settings due to isolated testing
* fix: updates required after rebase to master
* fix: move comm_c from being a public input and into the proof
* feat: use poseidon dst when generating randomness
* fix: update challenge test vectors
* feat: bump ci param-cache version
* feat: parallelize vanilla challenge-proof validation
* refactor: improve naming and comments
* Empty sector update tree r gpu (#1528)
* feat: re-factor tree_r_last building for external re-use
* reduce conversions in encode (@dignifiedquire)
* feat: use an enum return type to avoid transmute (unsafe)
* fix: keep new crate version consistent with latest release
* fix: add doc and additional type check
* feat: wrap settings for tree builders in a method
* fix: ensure data file is >= sector key file

This check was loosened since lotus may pad the data file

* feat: add larger commented ignored tests for sector updates
* fix: add partition arg in non-exposed single vanilla partition proof API
* feat: add and expose an API to prove sector updates with vanilla proofs
* fix: apply missed doc and error handling feedback
* feat: expose sector update partition proof type
* fix: add opencl/cuda feature flags to more storage update spots
* feat: consolidate CPU tree building method
* fix: use different method to build binary tree

From the lotus side, additional data may be stored after the nodes we
want in the tree, so using a different builder method will allow that
to happen without a panic

* feat: add optional data mapping of a specified length
* fix: use proper node count
* feat: persist p_aux and t_aux into cache dir after encoding/remove
* fix: ensure data is >= replica in remove data call
* Change H to 1024

Signed-off-by: Jakub Sztandera <kubuxu@protocol.ai>

* feat: try to split lifecycle upgrade tests from others on OpenCL GPU
* fix: remove typo
* docs: add official external audit result to the repo

Co-authored-by: DrPeterVanNostrand <jnz@riseup.net>
Co-authored-by: Jakub Sztandera <kubuxu@protocol.ai>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants