Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use p2p to fetch blocks and headers #1516

Closed
wants to merge 3 commits into from

Conversation

andrewtoth
Copy link
Contributor

@andrewtoth andrewtoth commented Feb 6, 2023

Uses the p2p interface to fetch block hashes and headers much more efficiently. Fetches headers 2k at a time, and fetches blocks in chunks of 10.

Syncs the first 200k or so blocks much faster, but haven't benchmarked the whole chain. I don't suspect it will be much more gain for the whole chain since most of the time will be spent indexing to the db, and blocks are already being fetched in the background. However, this is a much more efficient way of fetching this data.

Still have to fix tests because they don't support connecting via p2p yet.

Closes https://github.com/casey/ord/issues/1502.

@andrewtoth
Copy link
Contributor Author

andrewtoth commented Feb 6, 2023

Might fix issues related to https://github.com/casey/ord/issues/1377, since those look like the RPC threads are falling over from too many requests.

I haven't fully groked the indexing logic for fetching raw transactions, but if we are doing a lot of getrawtransaction RPC calls it might be worth using REST interface for that. Benchmarking a getrawtransaction RPC vs a REST binary request for a large tx shows REST request is about 2x faster. But, using REST interface does require setting rest=1 in the config.

@casey
Copy link
Collaborator

casey commented Feb 6, 2023

This is awesome! Keep in mind that https://github.com/casey/ord/issues/1364 will wind up making sync faster.

Some thoughts!

  1. I don't want to make things much more complicated, so keeping an initial PR as simple and focused as possible would be great.
  2. We need to make sure that they must use a trusted BTC node, since ord doesn't enforce consensus.
  3. All our tests run against our own mocked bitcoind, which is in test-bitcoincore-rpc. This is required to keep tests fast. Spinning up a bitcoind instances is slow, and tests are in our edit-test-debug cycle, so they must stay fast. As long as they stay fast and comprehensive, the method we use for testing doesn't matter, but one way to keep them fast would be to add P2P protocol support to test-bitcoincore-rpc. It only needs the minimum for the tests to work, so doesn't have to have comprehensive support for the full protocol.
  4. I didn't even know that bitcoind had a rest interface. Is it documented anywhere? How is it different from the JSON RPC interface? I love rest and prefer it over JSON RPC now that I've had experience with both.

@andrewtoth
Copy link
Contributor Author

andrewtoth commented Feb 6, 2023

Thanks @casey. This will make syncing faster even when skipping txs, since it fetches headers 2k at a time instead of incrementally.

  1. Sorry for the REST tangent, that can be a follow-up.
  2. Right. I suppose we can just use the same address as rpc and just specify the p2p port as an option.
  3. Ok, will give that a shot.
  4. REST is documented here. I have an implementation that we can pull in as a cargo crate here. But, that uses async/await so not sure if it will be useful in this project. REST is unauthenticated and doesn't have to go through the json parsing, so is much faster. It can also reply with raw binary data which is faster to serialize and smaller to send over the wire. But it also only shares public data, so we can't use it for things like the wallet. If rpc,rest,zmq: faster getblock, NotifyBlock and rest_block by reading raw block bitcoin/bitcoin#26415 was merged then I think that would be an even better candidate to fetch blocks than p2p, but the requirement to deserialize the block before serializing back into binary makes it slower than using the p2p endpoint right now. We can query it for utxo set data as well, but not sure what your ideas are for Re-enable skipping transactions below first inscription height #1364.

.first()
.unwrap()
.to_string(),
};
Copy link
Contributor Author

@andrewtoth andrewtoth Feb 7, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure what the best way to parse the url is. By default it is 127.0.0.1:8332/wallet/ord, which doesn't parse as a url, and in tests it is http://127.0.0.1:... which won't parse with a split(':'). Maybe regex?
Maybe pass --rpc-port as an option instead of --rpc-url? But that would be a breaking change.

@andrewtoth
Copy link
Contributor Author

@casey I think this is ready for a closer look.

@casey
Copy link
Collaborator

casey commented Feb 8, 2023

@andrewtoth Awesome, thank you! Putting it on the project backlog so it doesn't fall off our radar.

@jpistone
Copy link

jpistone commented Feb 9, 2023

WOW thank you! I was ticking <1 block per second. Found this and pulled your changes, now at ~50-100 per second. This should definitely be a top priority as most questions on the discord are about this right now (3 days - 2 weeks to index is rough).

Quick note: commit seems to happen every 5k blocks & takes about 3 minutes on my machine (I believe that's what was causing the json-rpc errors before this patch). This may confuse users when they see the block count hang for some time - I killed & restarted indexing myself a few times thinking that it had frozen.

image

@dude513
Copy link

dude513 commented Feb 12, 2023

After 16 hours the index is at block 420001 and moving very slowly, no errors/crashes yet.

I am on p2p-indexing branch git cloned from andrewtoth/ord. @andrewtoth

@andrewtoth
Copy link
Contributor Author

This has got a lot of successful testing from users in #1648.

@dude513
Copy link

dude513 commented Feb 12, 2023

This has got a lot of successful testing from users in #1648.

With rest=1 and on a speedup-improvements branch I got 99% of all blocks in a few minutes.
I guess last 9000 blocks will be the most satisfying. :)

@andrewtoth
Copy link
Contributor Author

One user had set maxuploadtarget to something too low in their bitcoin.conf, and so this disallowed the p2p interface from serving everything it needs to. We should advise users to add whitebind=127.0.0.1:8333 to their config files.

@veryordinally
Copy link
Collaborator

@casey We should definitely consider this, did an initial review and test.

Copy link
Collaborator

@veryordinally veryordinally left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed conceptually and high-level flow and logic - looks good to me and ready for more detailed review.

@casey
Copy link
Collaborator

casey commented Feb 15, 2023

I'm sorry for taking so long to collect my thoughts on this! Bitcoin's RPC and P2P interfaces are very different. The P2P interfaces is an untrusted interface used between full nodes on the bitcoin network. The RPC interfaces is an authenticated interface between a client and a trusted node. Using the P2P interface for applications where the node must be trusted is dicey. You can use it to directly connect to a trusted node, but you can also use it to connect to a totally untrusted node. I think that expressing this subtle difference to our users would be hard, and that lots of people, because running a full node is challenging, would try to connect to some random node, and have their security degraded as a result.

Also #1364 fixes the same issue as this would, namely, speeding up initial sync, and is a very clean and low-impact chain that doesn't introduce another connection to manage. So, I would like to see that implemented first, before deciding if also using P2P or REST is a good idea.

Also, keeping ord simple to configure is important (The degens are dying out there 😆) and every new outgoing connection introduces code, configuration, and maintainence complexity.

Sorry for not figuring this out sooner, it's only been in the last few days that I've been able to sit down and review PRs in detail, and previously I didn't give it enough attention.

Let's make sure we nail down the following in an issue before anyone writes a big PR:

  • What is the problem we're trying to solve or the benefit we're trying to eliminate?
  • What is the best design?
  • What is the testing strategy?

Sorry to close! We're still figuring out this whole massive open source project thing ourselves!

@casey casey closed this Feb 15, 2023
@andrewtoth andrewtoth deleted the p2p-indexing branch February 15, 2023 19:55
@andrewtoth
Copy link
Contributor Author

@casey disappointing conclusion. I disagree with your reasoning.

This requires no other configuration options. It works out of the box. It's been tested by many users already without problems. The only configuration issue was someone who had maxuploadtarget already set, which is obviously an advanced configuration option.
This patch would require using the same node as the RPC connection. It cannot be used to connect to a different node because only the port is configurable, not the host.
It is much more efficient for all the reasons stated in the initial issue #1502 . Blocks are fetched directly from disk without serialization steps, and can be requested in batches. It puts less stress on bitcoind and is faster.
Without this change, #1364 would be making RPC requests on multiple threads. This can overload the RPC workqueue and cause disconnections.
You can see the issues with fetching blocks via RPC in the updater code. There is a lot of code that has retries, and users are complaining that it aborts after retrying forever.

@andrewtoth andrewtoth restored the p2p-indexing branch February 16, 2023 01:39
@kalibox
Copy link

kalibox commented Feb 18, 2023

Does anyone know how much space ord index files take and where they're located?

This was referenced Oct 30, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Use p2p for fetching blocks from bitcoind instead of rpc
6 participants