Execution failed block=17687644 nonce too high #7892

Closed
gwtp opened this issue Jul 14, 2023 · 5 comments

@gwtp

gwtp commented Jul 14, 2023

System information

Erigon version:
$ ./erigon --version
erigon version 2.48.0-stable-084acc1a

OS & Version: Linux

$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 20.04.6 LTS
Release: 20.04
Codename: focal

Commit hash:

Erigon Command (with flags/config):
/usr/local/bin/erigon --datadir /var/lib/erigon --chain mainnet --metrics --pprof --prune htc --prune.r.before=11052984 --authrpc.jwtsecret=/secrets/jwtsecret
Consensus Layer:
teku --version
teku/v23.6.1/linux-x86_64/-privatebuild-openjdk64bitservervm-java-17

Consensus Layer Command (with flags/config):

/usr/local/bin/teku/bin/teku -c /etc/teku/teku.yaml

network: "mainnet"
initial-state: "https://2ELeBcG9SHjE7mAOryNPxYGiGAj:c672ee768e02dbec29343bb088e315c3@eth2-beacon-mainnet.infura.io"

# validators
validator-keys: "/var/lib/teku/validator_keys:/var/lib/teku/validator_keys"
#validators-graffiti: "<MY_GRAFFITI>"

# execution engine
ee-endpoint: http://localhost:8551
ee-jwt-secret-file: "/secrets/jwtsecret"

# fee recipient
validators-proposer-default-fee-recipient: <REMOVED>

# metrics
metrics-enabled: true
metrics-port: 8008

# database
data-path: "/var/lib/teku"
data-storage-mode: "prune"

# mev-boost
validators-builder-registration-default-enabled: true
builder-endpoint: http://127.0.0.1:18550

# get_validator_duties.py
rest-api-enabled: true

Chain/Network: Mainnet

Actual behaviour

Jul 14 12:51:02 erigon[5088]: [INFO] [07-14|12:51:02.401] [5/15 Bodies] Processing bodies...       from=17687643 to=17688849
Jul 14 12:51:02 erigon[5088]: [INFO] [07-14|12:51:02.789] [5/15 Bodies] Processed                  highest=17688849
Jul 14 12:51:02 erigon[5088]: [INFO] [07-14|12:51:02.789] [6/15 Senders] Started                   from=17687643 to=17688849
Jul 14 12:51:02 erigon[5088]: [INFO] [07-14|12:51:02.798] [7/15 Execution] Blocks execution        from=17687643 to=17688849
Jul 14 12:51:02 erigon[5088]: [WARN] [07-14|12:51:02.799] [7/15 Execution] Execution failed        block=17687644 hash=0x8cc219f272ce8609d4644acbbcc2a859cba4f565d4e37301b76875d92fcb976a err="could not apply tx 3 from block 17687644 [0x4f43f0f0dbf7653f2521641f7b54f9d213ebf6ab0a4d5e9e473076eb0766880c]: nonce too high: address 0xE634668Ed6e4bFEeC5E9eA850eA795d675CAE0C4, tx: 150 state: 0"
Jul 14 12:51:02 erigon[5088]: [INFO] [07-14|12:51:02.799] UnwindTo                                 block=17687643 bad_block_hash=0x8cc219f272ce8609d4644acbbcc2a859cba4f565d4e37301b76875d92fcb976a
Jul 14 12:51:02 erigon[5088]: [INFO] [07-14|12:51:02.799] [7/15 Execution] Completed on            block=17687643
Jul 14 12:51:02 erigon[5088]: [INFO] [07-14|12:51:02.817] Timings (slower than 50ms)               Headers=82ms Bodies=387ms
Jul 14 12:51:02 erigon[5088]: [INFO] [07-14|12:51:02.817] Tables                                   PlainState=97.0GB AccountChangeSet=1.4GB StorageChangeSet=2.5GB BlockTransaction=11.9GB TransactionLog=438.2GB FreeList=91.6MB ReclaimableSpace=91.6GB
Jul 14 12:51:02 erigon[5088]: [INFO] [07-14|12:51:02.817] [2/15 Headers] Waiting for Consensus Layer...
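
For readers unfamiliar with the error text: the `nonce too high` message in the log above compares the nonce carried by the transaction (`tx: 150`) with the nonce that the local state database reports for the sender (`state: 0`). The following is a minimal Python sketch of that comparison. It is a simplified illustration, not Erigon's code; the `Tx` class and `check_nonce` function are hypothetical names, and treating a missing account as nonce 0 is an assumption made for the example.

```python
from dataclasses import dataclass


@dataclass
class Tx:
    sender: str
    nonce: int


def check_nonce(tx: Tx, state_nonces: dict[str, int]) -> str:
    """Compare a transaction nonce with the sender's nonce in local state."""
    # Assumption: an address absent from state is treated as having nonce 0.
    state_nonce = state_nonces.get(tx.sender, 0)
    if tx.nonce > state_nonce:
        return f"nonce too high: address {tx.sender}, tx: {tx.nonce} state: {state_nonce}"
    if tx.nonce < state_nonce:
        return f"nonce too low: address {tx.sender}, tx: {tx.nonce} state: {state_nonce}"
    return "ok"


# Values taken from the log line above. The empty dict stands in for a state
# database in which the sender's account record cannot be found.
tx = Tx(sender="0xE634668Ed6e4bFEeC5E9eA850eA795d675CAE0C4", nonce=150)
result = check_nonce(tx, {})
print(result)
```

One possible reading of `state: 0` for a sender whose transaction carries nonce 150 is that the account record is missing or unreadable locally, which would be consistent with the database corruption reported later in this thread. That is an interpretation, not something the log line proves on its own.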
@AskAlexSharov
Collaborator

Try:
integration state_stages --unwind=100
integration stage_headers --unwind=100
start erigon
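
The two unwind steps above can be spelled out with explicit flags. The Python sketch below only builds and prints the command lines (a dry run) and does not execute anything. The `--datadir` and `--chain` flags follow the `state_stages` invocation shown later in this thread; that `stage_headers` accepts the same two flags is an assumption, and `unwind_commands` is a hypothetical helper name.

```python
import shlex


def unwind_commands(blocks: int, datadir: str, chain: str) -> list[list[str]]:
    """Build argv lists for the two unwind steps, in the order suggested above."""
    # Assumption: both subcommands accept the same --datadir and --chain flags.
    common = [f"--unwind={blocks}", f"--datadir={datadir}", f"--chain={chain}"]
    return [
        ["./integration", "state_stages", *common],
        ["./integration", "stage_headers", *common],
    ]


# Data directory and chain are taken from the Erigon command in the report.
commands = unwind_commands(100, "/var/lib/erigon", "mainnet")
for argv in commands:
    print(shlex.join(argv))
```

Printing rather than running keeps the sketch safe to try; the commands themselves modify the database and are meant to be run by hand.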

@gwtp
Author

gwtp commented Jul 14, 2023

I tried this, but it still seems to be stuck on the same block. Any other suggestions?

@AskAlexSharov
Collaborator

try 1000 (but don't go for more)

@gwtp
Author

gwtp commented Jul 15, 2023

I unwound by 1000 and then hit this error:

$ ./integration state_stages --unwind=1000 --datadir=/var/lib/erigon --chain=mainnet
INFO[07-15|09:03:01.891] logging to file system                   log dir=/var/lib/erigon/logs file prefix=integration log level=info json=false
INFO[07-15|09:03:03.649] [snapshots] Blocks Stat                  blocks=17597k indices=17597k alloc=2.1GB sys=2.4GB
INFO[07-15|09:03:03.777] Disk storage enabled for ethash DAGs     dir=/root/.local/share/erigon-ethash count=2
INFO[07-15|09:03:04.156] UnwindTo                                 block=17686643 bad_block_hash=0x0000000000000000000000000000000000000000000000000000000000000000
INFO[07-15|09:05:38.954] [10/15 CallTraces] Unwind                from=17687643 to=17686643


INFO[07-15|09:06:29.247] [8/15 HashState] Unwinding started       from=17687643 to=17686643 storage=false codes=true
INFO[07-15|09:06:36.589] [8/15 HashState] Unwinding started       from=17687643 to=17686643 storage=false codes=false
INFO[07-15|09:07:07.002] [8/15 HashState] ETL [2/2] Loading       into=HashedAccount progress=92
INFO[07-15|09:07:13.208] [8/15 HashState] Unwinding started       from=17687643 to=17686643 storage=true codes=false
INFO[07-15|09:07:43.770] [8/15 HashState] ETL [2/2] Loading       into=HashedStorage progress=79
INFO[07-15|09:08:01.609] [8/15 HashState] Unwind done             in=1m32.362539943s
INFO[07-15|09:08:01.610] [9/15 IntermediateHashes] Unwinding      from=17687643 to=17686643 csbucket=AccountChangeSet
EROR[07-15|09:08:23.130] [9/15 IntermediateHashes] mdbx_cursor_get: MDBX_CORRUPTED: Maybe free space is over on disk. Otherwise it's hardware failure. Before creating issue please use tools like https://www.memtest86.com to test RAM and tools like https://www.smartmontools.org to test Disk. To handle hardware risks: use ECC RAM, use RAID of disks, run multiple application instances (or do backups). If hardware checks passed - check FS settings - 'fsync' and 'flock' must be enabled.  Otherwise - please create issue in Application repo. On default DURABLE mode, power outage can't cause this error. On other modes - power outage may break last transaction and mdbx_chk can recover db in this case, see '-t' and '-0|1|2' options. 

Currently running:

$ sudo ./mdbx_chk -vv -w /var/lib/erigon/chaindata/

If only a certain section is corrupted, is it possible to delete just that section and resync it, or do I need to resync from scratch? I suspect the corruption is caused by a bad NVMe drive; a new one is on the way. I would like to know whether I can recover without a full resync. I will provide the output once the check completes.

@gwtp
Author

gwtp commented Jul 15, 2023

Here is an abbreviated dump of the messages I'm seeing from mdbx_chk:

$ ./mdbx_chk -vv -w /var/lib/erigon/chaindata/
mdbx_chk v0.12.0-71-g1cac6536 (2022-07-28T09:57:31+07:00, T-9a6d7e5b917e5fbd14dc51835fa749d092aa1d72)
Running for /var/lib/erigon/chaindata/ in 'read-write' mode...
   open-MADV_DONTNEED 221941554..222298112
   readahead OFF 0..221941554
 - monopolistic mode
 - current boot-id baeaf43830f25049-993ef9fb7b2a4e4d
 - pagesize 4096 (4096 system), max keysize 1980..2022, max readers 116
 - mapsize 3298534883328 (3.00 Tb)
 - dynamic datafile: 24576 (24.00 Kb) .. 3298534883328 (3.00 Tb), +2147483648 (2.00 Gb), -4294967296 (4.00 Gb)
 - current datafile: 910533066752 (848.00 Gb), 222298112 pages
 - meta-0: weak-intact (same boot-id) txn#2780767, tail
 - meta-1: steady txn#2780768, head
 - meta-2: steady txn#2772422, stay
 - performs check for meta-pages clashes
 - performs full check recent-txn-id with meta-pages
 - transactions: recent 2780768, latter reader 2780768, lag 0
Traversal b-tree by txn#2780768...
 - found 'AccountChangeSet' area
 - found 'AccountHistory' area
 - found 'BadHeaderNumber' area
 - found 'BlockBody' area
 - found 'BlockTransaction' area
 - found 'BlockTransactionLookup' area
 - found 'CallFromIndex' area
 - found 'CallToIndex' area
 - found 'CallTraceSet' area
 - found 'CanonicalHeader' area
 - found 'Code' area
 - found 'Config' area
 - found 'CumulativeGasIndex' area
 - found 'DbInfo' area
 - found 'HashedAccount' area
 - found 'HashedCodeHash' area
 - found 'HashedStorage' area
 - found 'Header' area
 - found 'HeaderNumber' area
 - found 'HeadersTotalDifficulty' area
 - found 'IncarnationMap' area
 - found 'Issuance' area
 - found 'LastBlock' area
 - found 'LastForkchoice' area
 - found 'LastHeader' area
 - found 'LogAddressIndex' area
 - found 'LogTopicIndex' area
 - found 'Migration' area
 - found 'PlainCodeHash' area
 - found 'PlainState' area
 ! corrupted leaf-page #176686839, mod-txnid 1890650
 ! node-data size (3) <> min/max value-length (0/0)
 ! node-data size (3) <> min/max value-length (0/0)
 ! node-data size (10) <> min/max value-length (0/0)
 ! node-data size (38) <> min/max value-length (0/0)

! node-data size (3) <> min/max value-length (0/0)
     page #168516760: invalid/corrupted (leaf-page)
 ! corrupted branch-page #160022920, mod-txnid 2766537
 ! invalid page' txnid (2766537) for parent-page' txnid (2690581)
     page #160022920: invalid/corrupted (branch-page)
 ! corrupted leaf-page #178622440, mod-txnid 2477976
 ! node-data size (3) <> min/max value-length (0/0)

 ! node-data size (70) <> min/max value-length (0/0)
     page #158976703: invalid/corrupted (leaf-page)
 ! corrupted leaf-page #152782564, mod-txnid 2738533
 ! invalid page' txnid (2738533) for parent-page' txnid (2691555)
     page #152782564: already used (leaf-page: by CanonicalHeader, deep 9)
 ! corrupted large-page #133493736, mod-txnid 2764248
 ! unexpected large/overlow instead of branch/leaf/leaf2 (4)
     page #133493736: invalid/corrupted (large-page)
 ! corrupted leaf-page #133454652, mod-txnid 2586785
...

 - found 'Receipt' area
 - found 'Sequence' area
 - found 'StorageChangeSet' area
 - found 'StorageHistory' area
 - found 'SyncStage' area
 - found 'TransactionLog' area
 - found 'TrieAccount' area
     page #143098620: already used (leaf-page: by PlainState, deep 5)
     page #132547261: already used (leaf-page: by PlainState, deep 5)
     page #143359677: already used (leaf-page: by PlainState, deep 5)
     page #202613017: already used (leaf-page: by PlainState, deep 5)
...

 - found 'TrieStorage' area
 - found 'TxSender' area
 - problems: loop (1), already used (510), invalid/corrupted (369)
 - pages: walked 196966602, left/unused 24975463
     @GC: subtotal 25161, branch 5, large 24461, leaf 693
     @MAIN: subtotal 3, branch 1, leaf 2
     @META: subtotal 3
     AccountChangeSet: subtotal 377255, branch 88993, leaf 288262
     AccountHistory: subtotal 302274, branch 4730, leaf 297544
     BadHeaderNumber: subtotal 131, branch 3, leaf 128
     BlockBody: subtotal 16513, branch 206, leaf 16307
     BlockTransaction: subtotal 3341913, branch 5284, large 348138, leaf 1183207
     BlockTransactionLookup: subtotal 296665, branch 4617, leaf 292048
     CallFromIndex: subtotal 750996, branch 12962, leaf 738034
     CallToIndex: subtotal 683640, branch 11789, leaf 671851
     CallTraceSet: subtotal 374821, branch 88745, leaf 286076
     CanonicalHeader: subtotal 220449, branch 991, leaf 219458
     Code: subtotal 2047854, branch 1456, large 730616, leaf 91291
     Config: subtotal 1, leaf 1
     CumulativeGasIndex: subtotal 3777, branch 18, leaf 3759
     DbInfo: subtotal 3, large 1, leaf 1
     HashedAccount: subtotal 5474052, branch 66339, leaf 5407713
     HashedCodeHash: subtotal 2163295, branch 34080, leaf 2129215
     HashedStorage: subtotal 17675175, branch 524487, leaf 17150688
     Header: subtotal 16520, branch 206, leaf 16314
     HeaderNumber: subtotal 418820, branch 4442, leaf 414378
     HeadersTotalDifficulty: subtotal 272906, branch 3371, leaf 269535
     IncarnationMap: subtotal 306086, branch 3266, leaf 302820
     Issuance: subtotal 1, leaf 1
     LastBlock: subtotal 1, leaf 1
     LastForkchoice: subtotal 1, leaf 1
     LastHeader: subtotal 1, leaf 1
     LogAddressIndex: subtotal 482085, branch 6259, leaf 475826
     LogTopicIndex: subtotal 5368868, branch 73268, leaf 5295600
     Migration: subtotal 1, leaf 1
     PlainCodeHash: subtotal 1448534, branch 18867, leaf 1429667
     PlainState: subtotal 25431961, branch 669071, large 6, leaf 24763006
     Receipt: subtotal 3503277, branch 7929, large 1610538, leaf 1775683
     Sequence: subtotal 1, leaf 1
     StorageChangeSet: subtotal 655218, branch 19249, leaf 635969
     StorageHistory: subtotal 5049139, branch 155599, leaf 4893540
     SyncStage: subtotal 1, leaf 1
     TransactionLog: subtotal 114873531, branch 453732, large 21037659, leaf 83029741
     TrieAccount: subtotal 792169, branch 5295, leaf 787263
     TrieStorage: subtotal 4492872, branch 82140, leaf 4410732
     TxSender: subtotal 100117, branch 96, large 84948, leaf 7531
 - usage: total 806775201792 bytes, payload 634573566206 (78.7%), unused 172201635586 (21.3%)
 - summary: average fill 78.7%, 880 problems
Skip processing @MAIN since tree is corrupted (880 problems)
 ! abort processing '@GC' due to a previous error
 - space: 805306368 total pages, backed 222298112 (27.6%), allocated 221941554 (27.6%), remained 583364814 (72.4%), used 196966602 (24.5%), gc 0 (0.0%), detained 0 (0.0%), reclaimable 0 (0.0%), available 583364814 (72.4%)
Total 7571 errors are detected, elapsed 26816.522 seconds.
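
As a quick sanity check on the output above, the per-kind counts on the `problems:` line can be summed and compared with the `880 problems` figure on the `summary:` line. The Python sketch below does this for the line as pasted. `parse_problems` is a hypothetical helper, and the regular expression assumes the `kind (count)` layout seen in this particular dump.

```python
import re

# Copied verbatim from the mdbx_chk output above.
SUMMARY_LINE = " - problems: loop (1), already used (510), invalid/corrupted (369)"


def parse_problems(line: str) -> dict[str, int]:
    """Return {problem kind: count} from an mdbx_chk '- problems:' line."""
    _, _, tail = line.partition("problems:")
    pairs = re.findall(r"([^,()]+)\((\d+)\)", tail)
    return {kind.strip(): int(count) for kind, count in pairs}


counts = parse_problems(SUMMARY_LINE)
total = sum(counts.values())
print(counts, total)
```

The three counts add up to 880, matching the summary line.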


AskAlexSharov pushed a commit that referenced this issue Jul 28, 2023
Adds the `clear_bad_blocks` command to the integration tool. This command allows
re-processing of blocks that were erroneously marked as bad.

The command only clears the `BadHeaderNumber` table. In some cases it can be
safer than
```
./integration state_stages --unwind=<some_number>
./integration stage_headers --unwind=<some_number>
```
and can be used in cases like this one:
#7892

Command syntax:
```
./integration clear_bad_blocks --datadir=<datadir>
```
gwtp closed this as completed Jul 29, 2023
AskAlexSharov pushed a commit that referenced this issue Sep 6, 2023