Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Incremental solvable orders cache update #2923

Merged
merged 110 commits into from
Sep 5, 2024
Merged

Conversation

squadgazzz
Copy link
Contributor

@squadgazzz squadgazzz commented Aug 23, 2024

Description

Updating solvable orders (ie creating a new auction) currently takes >2s with some pretty heavy outliers (logs)

This makes it hard to bring CoW protocol's auction rate down to one batch per block as simply creating up to date state would take >15% of the total time we have at hand. We should at least be able to half this time (if not getting it down even more)

In order to relieve the situation, it was proposed to introduce incremental solvable orders cache update, which selects all the solvable orders using the old heavy query only at startup, stores the latest received order's creation timestamp in memory, and then makes much faster incremental bounded queries to the orders and additional tables that select fewer data and executes faster.

Changes

Since incremental fetching retrieves orders created/cancelled after the specific timestamps, it is also required now to fetch orders that have any onchain update based on the last fetched block number. Having said that, the data needs to be fetched within a single TX, so there is no way to run all the queries in parallel.

  1. If the current solvable orders cache is empty, execute the original heavy SQL query to fetch all current solvable orders and store them in memory.
  2. Otherwise, fetch full orders that created or cancelled after the last stored timestamp and also find UIDs of the order that have any onchain data updated after the latest observed block number. This includes fetching data from the following tables: trades, ethflow_data, order_execution, invalidations, onchain_order_invalidations, onchain_placed_orders, presignature_events.
  3. Fetch quotes for all the collected orders.
  4. Add all the newly received orders to the cache.
  5. Filter out all the orders that are one of: contain on-chain errors, expired, fulfilled, invalidated.
  6. Calculate the latest observed order creation timestamp.
  7. Continue with the regular auction creation process.

As a result, we now have 3 SQL queries where each executes in ~50ms instead of a single one taking ~2s.

How to test

New DB tests. Existing e2e tests.

Related Issues

Fixes #2831

Copy link

github-actions bot commented Aug 23, 2024

Reminder: Please update the DB Readme.


Caused by:

crates/autopilot/src/solvable_orders.rs Outdated Show resolved Hide resolved
crates/autopilot/src/infra/persistence/mod.rs Outdated Show resolved Hide resolved
crates/autopilot/src/solvable_orders.rs Outdated Show resolved Hide resolved
crates/autopilot/src/solvable_orders.rs Outdated Show resolved Hide resolved
Comment on lines +444 to +445
// Fetch quotes for new orders and also update them for the cached ones since
// they could also be updated.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is also needed because ethflow orders could theoretically be reorged?
We should really find a way to implement ethflow orders in a nicer way to remove all those ugly edge cases. :/

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is also needed because ethflow orders could theoretically be reorged?

Because of this code:

// We only need to insert quotes for orders that will be included in an
// auction (they are needed to compute solver rewards). If placement
// failed, then the quote is not needed.
insert_quotes(
transaction,
quotes
.clone()
.into_iter()
.flatten()
.collect::<Vec<_>>()
.as_slice(),
)
.await
.context("appending quotes for onchain orders failed")?;

@@ -168,31 +172,84 @@ impl SolvableOrdersCache {
/// the case in unit tests, then concurrent calls might overwrite each
/// other's results.
pub async fn update(&self, block: u64) -> Result<()> {
const INITIAL_ORDER_CREATION_TIMESTAMP: DateTime<Utc> = DateTime::<Utc>::MIN_UTC;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After thinking more about it do we even need this sentinel value?
The cached order can already be optional and I would expect once we populate the cache there also has to be a non-zero timestamp in there so we can determine by the cache being None that we need to fetch ALL open orders instead of the incremental update.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With None some of the e2e tests fail. It's probably worth a separate PR to fix it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's please understand at least what causes the tests to fail. I don't want us to check in code without understanding why it's actually needed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That requires refactoring. Some of the code depends on auction always present in the DB, which is used across many of the e2e tests.

pub async fn get_auction(&self) -> dto::AuctionWithId {
let response = self
.http
.get(format!("{API_HOST}{AUCTION_ENDPOINT}"))
.send()
.await
.unwrap();
let status = response.status();
let body = response.text().await.unwrap();
assert_eq!(status, StatusCode::OK, "{body}");
serde_json::from_str(&body).unwrap()
}

Also:
async fn wait_until_autopilot_ready(&self) {
let is_up = || async {
let mut db = self.db.acquire().await.unwrap();
const QUERY: &str = "SELECT COUNT(*) FROM auctions";
let count: i64 = sqlx::query_scalar(QUERY)
.fetch_one(db.deref_mut())
.await
.unwrap();
count > 0
};
wait_for_condition(TIMEOUT, is_up)
.await
.expect("waiting for autopilot timed out");
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be great to refactor since the sentinel value really doesn't seem necessary but IMO that can happen after this gets merged so we can test this a bit on staging before merging to prod on Tuesday.

crates/autopilot/src/solvable_orders.rs Show resolved Hide resolved
# Conflicts:
#	crates/autopilot/src/infra/persistence/mod.rs
@@ -337,6 +340,71 @@ impl SolvableOrdersCache {
.collect()
}

/// Returns current solvable orders along with the latest order creation
/// timestamp.
async fn get_solvable_orders(&self) -> Result<(SolvableOrders, DateTime<Utc>)> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Getting the latest order creation timestamp can be done with the SolvableOrders. I believe it shouldn't be the responsibility of this function.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function has to return either previous_creation_timestamp or latest_creation_timestamp. Since the former is not used anywhere else, I decided to keep it in this function.

Copy link
Contributor

@MartinquaXD MartinquaXD left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would like to simplify the code inside SolvableOrdersCache further if possible but if my suggestion doesn't work I'm fine with merging now and refactoring later to already get confidence in the change by running it in staging.

@@ -337,6 +340,72 @@ impl SolvableOrdersCache {
.collect()
}

/// Returns current solvable orders along with the latest order creation
/// timestamp.
async fn get_solvable_orders(&self) -> Result<(SolvableOrders, DateTime<Utc>)> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to simplify this query a lot by always using solvable_orders_after()?
All you need for that is the list of known orders, latest timestamp and latest block.
If the cache is populated you just take those values and if it's not you should be able to just use defaults for everything, no?

Copy link
Contributor Author

@squadgazzz squadgazzz Sep 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to simplify this query a lot by always using solvable_orders_after()?

solvable_orders_after doesn't filter orders in the SQL query, so it would return the whole table at the first start if the creation timestamp/block number contains default values. I tried to explain this in the comment:

// A new auction should be created regardless of whether new solvable orders are
// found. The incremental solvable orders cache updater should only be
// enabled after the initial full SQL query
// (`persistence::all_solvable_orders`) returned some orders. Until then,
// `MIN_UTC` is used to indicate that no orders have been found yet by
// (`persistence::all_solvable_orders`). This prevents situations where
// starting the service with a large existing DB would cause
// the incremental query to load all unfiltered orders into memory, potentially
// leading to OOM issues because incremental query doesn't filter out
// expired/invalid orders in the SQL query and basically can return the whole
// table when filters with default values are used.

Copy link
Contributor

@MartinquaXD MartinquaXD left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright, let's get this merged to make strides on the roadmap but let's try to keep simplifying the code further.

@squadgazzz squadgazzz enabled auto-merge (squash) September 5, 2024 14:17
@squadgazzz squadgazzz merged commit ed09b91 into main Sep 5, 2024
10 checks passed
@squadgazzz squadgazzz deleted the 2831/improve-auction-update branch September 5, 2024 14:23
@github-actions github-actions bot locked and limited conversation to collaborators Sep 5, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

chore: Improve time to create Auction
3 participants