Window PoSt sanity check failure is not caught and no faulty sector is returned #1308

Closed
metagates-dev opened this issue Oct 7, 2020 · 6 comments

Comments

@metagates-dev

Description

A Window PoSt deadline failed because the sanity check failed. The code that returns faulty sectors may have missed this case.

The failed sector was expected to be returned, but it was not.

Acceptance criteria

The faulty sector should be returned; currently it is not.

Risks + pitfalls

Unknown

Where to begin

tag: filecoin-proofs-v5.2.0

Possibly around this line?

error!("comm_c != comm_r_last: {:?}", sector_id);

Logs from the failing run:

2020-10-04T16:16:31.441-0400    INFO main    lotus-hlm-worker/sub.go:108     New task: s-t03176-1601794562237240821_1000, sector s-t03176-1601794562237240821, action: 1000
2020-10-04T16:16:31.441-0400    INFO main    lotus-hlm-worker/rpc_server.go:93       GenerateWindowPoSt RPC in:3176
2020-10-04T16:16:31.452-0400    INFO ffiwrapper      ffiwrapper/remote.go:443        DEBUG:GenerateWindowPoSt in(remote:false),3176-1601794813851402319
2020-10-04T16:16:31.452-0400    INFO ffiwrapper      ffiwrapper/faults.go:36 Manager.CheckProvable in, len:6389
2020-10-04T16:16:33.155-0400    INFO ffiwrapper      ffiwrapper/faults.go:128        Manager.CheckProvable out, len:6389
2020-10-04T16:16:33.204 INFO filcrypto::proofs::api > generate_window_post: start
2020-10-04T16:16:35.068 INFO filecoin_proofs::api::post > generate_window_post:start
2020-10-04T16:16:35.068 INFO filecoin_proofs::caches > trying parameters memory cache for: Window_POST[34359738368]
2020-10-04T16:16:35.068 INFO filecoin_proofs::caches > found params in memory cache for Window_POST[34359738368]
2020-10-04T16:16:35.068 INFO filecoin_proofs::api::post > generate mekle_tree start
2020-10-04T16:16:35.856 INFO filecoin_proofs::api::post > generate mekle_tree end
2020-10-04T16:16:35.856 INFO filecoin_proofs::api::post > prepare sectors start
2020-10-04T16:16:35.858 INFO filecoin_proofs::api::post > prepare sectors end
2020-10-04T16:16:35.858 INFO filecoin_proofs::api::post > FallbackPoStCompound prove start
2020-10-04T16:16:35.858 INFO storage_proofs_core::compound_proof > vanilla_proofs:start
2020-10-04T16:16:35.858 INFO storage_proofs_post::fallback::vanilla > proving partition 0
2020-10-04T16:16:38.223 INFO storage_proofs_post::fallback::vanilla > proving partition 1
2020-10-04T16:16:53.738 INFO storage_proofs_post::fallback::vanilla > proving partition 2
2020-10-04T16:17:03.249 INFO storage_proofs_core::compound_proof > vanilla_proofs:finish
2020-10-04T16:17:03.249 INFO storage_proofs_core::compound_proof > sanity_check:start
2020-10-04T16:17:03.249 INFO storage_proofs_post::fallback::vanilla > verify_all_partitions start
2020-10-04T16:17:42.275 ERROR storage_proofs_post::fallback::vanilla > comm_c != comm_r_last: SectorId(270169) ### the run fails on this sector, but no faulty sector is returned ###
2020-10-04T16:17:42.275 INFO storage_proofs_core::compound_proof > sanity_check:finish
2020-10-04T16:17:42.522 INFO filcrypto::proofs::api > generate_window_post: finish
2020-10-04T16:17:42.524-0400    INFO ffiwrapper      ffiwrapper/remote.go:447        DEBUG:GenerateWindowPoSt out,3176-1601794813851402319
2020-10-04T16:17:42.524-0400    WARN main    lotus-hlm-worker/rpc_server.go:106      ignore len:0
2020-10-04T16:17:42.524-0400    INFO main    lotus-hlm-worker/rpc_server.go:108      GenerateWindowPoSt RPC out:3176
2020-10-04T16:17:42.524-0400    INFO main    lotus-hlm-worker/sub.go:423     Get Node Api
2020-10-04T16:17:42.524-0400    INFO main    lotus-hlm-worker/sub.go:429     Do WorkerDone
2020-10-04T16:17:42.525-0400    INFO main    lotus-hlm-worker/sub.go:132     Task s-t03176-1601794562237240821_1000 done, err: sanity check failed
github.com/filecoin-project/filecoin-ffi.GenerateWindowPoSt
        /root/go/src/github.com/filecoin-project/lotus/extern/filecoin-ffi/proofs.go:586
github.com/filecoin-project/lotus/extern/sector-storage/ffiwrapper.(*Sealer).generateWindowPoSt
        /root/go/src/github.com/filecoin-project/lotus/extern/sector-storage/ffiwrapper/verifier_cgo.go:50
github.com/filecoin-project/lotus/extern/sector-storage/ffiwrapper.(*Sealer).GenerateWindowPoSt
        /root/go/src/github.com/filecoin-project/lotus/extern/sector-storage/ffiwrapper/remote.go:447
main.(*rpcServer).GenerateWindowPoSt
        /root/go/src/github.com/filecoin-project/lotus/cmd/lotus-hlm-worker/rpc_server.go:104
main.(*worker).processTask
        /root/go/src/github.com/filecoin-project/lotus/cmd/lotus-hlm-worker/sub.go:464
main.acceptJobs.func1
        /root/go/src/github.com/filecoin-project/lotus/cmd/lotus-hlm-worker/sub.go:129
runtime.goexit
        /usr/local/go/src/runtime/asm_amd64.s:1373

Logs from a run that behaves as expected:

2020-10-04T16:18:17.807-0400    INFO main    lotus-hlm-worker/sub.go:108     New task: s-t03176-1601794562237240822_1000, sector s-t03176-1601794562237240822, action: 1000
2020-10-04T16:18:17.807-0400    INFO main    lotus-hlm-worker/rpc_server.go:93       GenerateWindowPoSt RPC in:3176
2020-10-04T16:18:17.819-0400    INFO ffiwrapper      ffiwrapper/remote.go:443        DEBUG:GenerateWindowPoSt in(remote:false),3176-1601794813851402319
2020-10-04T16:18:17.823-0400    INFO ffiwrapper      ffiwrapper/faults.go:36 Manager.CheckProvable in, len:6389
2020-10-04T16:18:19.452-0400    INFO ffiwrapper      ffiwrapper/faults.go:128        Manager.CheckProvable out, len:6389
2020-10-04T16:18:19.497 INFO filcrypto::proofs::api > generate_window_post: start
2020-10-04T16:18:21.276 INFO filecoin_proofs::api::post > generate_window_post:start
2020-10-04T16:18:21.277 INFO filecoin_proofs::caches > trying parameters memory cache for: Window_POST[34359738368]
2020-10-04T16:18:21.277 INFO filecoin_proofs::caches > found params in memory cache for Window_POST[34359738368]
2020-10-04T16:18:21.277 INFO filecoin_proofs::api::post > generate mekle_tree start
2020-10-04T16:18:22.082 INFO filecoin_proofs::api::post > generate mekle_tree end
2020-10-04T16:18:22.082 INFO filecoin_proofs::api::post > prepare sectors start
2020-10-04T16:18:22.084 INFO filecoin_proofs::api::post > prepare sectors end
2020-10-04T16:18:22.084 INFO filecoin_proofs::api::post > FallbackPoStCompound prove start
2020-10-04T16:18:22.084 INFO storage_proofs_core::compound_proof > vanilla_proofs:start
2020-10-04T16:18:22.084 INFO storage_proofs_post::fallback::vanilla > proving partition 0
2020-10-04T16:18:24.466 INFO storage_proofs_post::fallback::vanilla > proving partition 1
2020-10-04T16:18:24.570 ERROR storage_proofs_post::fallback::vanilla > faulty sector: SectorId(141812) ### this faulty sector is returned, as expected ###
2020-10-04T16:18:24.570 ERROR storage_proofs_post::fallback::vanilla > faulty sector: SectorId(141812)
2020-10-04T16:18:26.866 INFO storage_proofs_post::fallback::vanilla > proving partition 2
2020-10-04T16:18:28.894 INFO filcrypto::proofs::api > generate_window_post: finish
2020-10-04T16:18:28.896-0400    INFO ffiwrapper      ffiwrapper/remote.go:447        DEBUG:GenerateWindowPoSt out,3176-1601794813851402319
2020-10-04T16:18:28.896-0400    WARN main    lotus-hlm-worker/rpc_server.go:106      ignore len:1
2020-10-04T16:18:28.896-0400    INFO main    lotus-hlm-worker/rpc_server.go:108      GenerateWindowPoSt RPC out:3176
2020-10-04T16:18:28.896-0400    INFO main    lotus-hlm-worker/sub.go:423     Get Node Api
2020-10-04T16:18:28.897-0400    INFO main    lotus-hlm-worker/sub.go:132     Task s-t03176-1601794562237240822_1000 done, err: faulty sectors [SectorId(141812)]
github.com/filecoin-project/filecoin-ffi.GenerateWindowPoSt
        /root/go/src/github.com/filecoin-project/lotus/extern/filecoin-ffi/proofs.go:586
github.com/filecoin-project/lotus/extern/sector-storage/ffiwrapper.(*Sealer).generateWindowPoSt
        /root/go/src/github.com/filecoin-project/lotus/extern/sector-storage/ffiwrapper/verifier_cgo.go:50
github.com/filecoin-project/lotus/extern/sector-storage/ffiwrapper.(*Sealer).GenerateWindowPoSt
        /root/go/src/github.com/filecoin-project/lotus/extern/sector-storage/ffiwrapper/remote.go:447
main.(*rpcServer).GenerateWindowPoSt
        /root/go/src/github.com/filecoin-project/lotus/cmd/lotus-hlm-worker/rpc_server.go:104
main.(*worker).processTask
        /root/go/src/github.com/filecoin-project/lotus/cmd/lotus-hlm-worker/sub.go:464
main.acceptJobs.func1
        /root/go/src/github.com/filecoin-project/lotus/cmd/lotus-hlm-worker/sub.go:129
runtime.goexit
        /usr/local/go/src/runtime/asm_amd64.s:1373
@porcuquine
Collaborator

porcuquine commented Oct 12, 2020

See more detail at #1311 (which may be the same issue as this). I proposed a solution there, which should resolve this issue as well. I think at one point @vmx was going to address this or something related by adding more logging. I think the proposed solution of ensuring these errors result in faulty sectors is preferable.

UPDATE: @vmx, I see now that you did add the logging. See my following comment for a refinement of its content.

@porcuquine
Collaborator

As an aside, I see that this error is being logged: comm_c != comm_r_last:. This is an inaccurate error message. comm_c and comm_r_last are not expected to be equal, and their inequality is not in fact the condition which triggers the message. We are actually checking that H(comm_c || comm_r_last) == comm_r, and the message on failure should reflect that.
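
The check described above can be sketched roughly as follows. This is only an illustration, not the code in storage-proofs: `Commitment`, `stand_in_hash`, and `check_comm_r` are hypothetical names, and `stand_in_hash` uses the standard library's `DefaultHasher` purely as a placeholder for H, which is an assumption and not the hash the proofs library uses. The point is that the failure message names the condition that was actually tested.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::Hasher;

/// Hypothetical 32-byte commitment type, for illustration only.
type Commitment = [u8; 32];

/// Placeholder for H(a || b). This is NOT the hash used by the proofs
/// library; it only gives the sketch something deterministic to compare.
fn stand_in_hash(a: &Commitment, b: &Commitment) -> Commitment {
    let mut out = [0u8; 32];
    for (i, chunk) in out.chunks_mut(8).enumerate() {
        let mut h = DefaultHasher::new();
        h.write_u8(i as u8); // vary each 8-byte lane of the output
        h.write(a);
        h.write(b);
        chunk.copy_from_slice(&h.finish().to_le_bytes());
    }
    out
}

/// Checks H(comm_c || comm_r_last) == comm_r and, on failure, returns a
/// message that describes that condition rather than "comm_c != comm_r_last".
fn check_comm_r(
    sector_id: u64,
    comm_c: &Commitment,
    comm_r_last: &Commitment,
    comm_r: &Commitment,
) -> Result<(), String> {
    if stand_in_hash(comm_c, comm_r_last) == *comm_r {
        Ok(())
    } else {
        Err(format!(
            "hash(comm_c || comm_r_last) != comm_r: SectorId({})",
            sector_id
        ))
    }
}
```

Calling `check_comm_r` with a `comm_r` that was not derived from the other two commitments returns the `Err` variant, and the message text identifies both the failed condition and the sector.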

@porcuquine
Collaborator

Since #1311 may not be the same issue, I will note here that the fix is to put the comm_r check in with the inclusion-proof check when screening for faulty sectors.
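
A minimal sketch of that idea, under stated assumptions: `SectorCheck`, `ProveError`, and `screen_sectors` are hypothetical names that do not come from the proofs library, and the two boolean fields stand in for the results of the real comm_r and inclusion-proof checks. It shows a sector that fails either check being collected and reported as faulty, instead of surfacing later as a generic "sanity check failed".

```rust
/// Hypothetical per-sector screening results; field names are illustrative.
struct SectorCheck {
    id: u64,
    /// Outcome of the H(comm_c || comm_r_last) == comm_r check.
    comm_r_ok: bool,
    /// Outcome of the inclusion-proof check.
    inclusion_ok: bool,
}

/// Hypothetical error type carrying the IDs of the sectors that failed.
#[derive(Debug, PartialEq)]
enum ProveError {
    FaultySectors(Vec<u64>),
}

/// Treats a failure of either check as a fault and reports every faulty
/// sector in one error, preserving the input order.
fn screen_sectors(sectors: &[SectorCheck]) -> Result<(), ProveError> {
    let faulty: Vec<u64> = sectors
        .iter()
        .filter(|s| !(s.comm_r_ok && s.inclusion_ok))
        .map(|s| s.id)
        .collect();

    if faulty.is_empty() {
        Ok(())
    } else {
        Err(ProveError::FaultySectors(faulty))
    }
}
```

With this shape, a sector whose comm_r check fails ends up in the returned list of faulty sectors, so the caller can see which sector to handle.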

vmx added a commit that referenced this issue Oct 12, 2020
The log message was wrong. It's about the hash of the concatenated
`comm_c` and `comm_r_last`.

This was brought up at
#1308 (comment)
vmx added a commit that referenced this issue Oct 13, 2020
The log message was wrong. It's about the hash of the concatenated
`comm_c` and `comm_r_last`.

This was brought up at
#1308 (comment)
@porcuquine
Collaborator

I'm working on #1284, so I'll probably just add this check — since I'm in that code anyway.

@porcuquine
Collaborator

Since it seems like #1284 is still going to be delayed a while, it's probably worth moving just the fix to catch bad p_aux/comm_c/comm_r_last and report as faulty sector into the next release.

@cryptonemo
Collaborator

Resolved.
