Skip to content
This repository has been archived by the owner on Aug 8, 2023. It is now read-only.

[core] Evict cached resources and tiles equally by access time #7770

Merged
merged 1 commit into from
Jan 19, 2017
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
33 changes: 29 additions & 4 deletions platform/default/mbgl/storage/offline_database.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -792,6 +792,31 @@ bool OfflineDatabase::evict(uint64_t neededFreeSize) {
// The addition of pageSize is a fudge factor to account for non `data` column
// size, and because pages can get fragmented on the database.
while (usedSize() + neededFreeSize + pageSize > maximumCacheSize) {
// clang-format off
Statement accessedStmt = getStatement(
"SELECT max(accessed) "
Copy link
Contributor

@jfirebaugh jfirebaugh Jan 18, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Query plan for this:

2|0|0|SCAN TABLE resources USING COVERING INDEX resources_accessed
2|1|1|SEARCH TABLE region_resources USING COVERING INDEX region_resources_resource_id (resource_id=?)
3|0|0|SCAN TABLE tiles USING COVERING INDEX tiles_accessed
3|1|1|SEARCH TABLE region_tiles USING COVERING INDEX region_tiles_tile_id (tile_id=?)
1|0|0|COMPOUND SUBQUERIES 2 AND 3 (UNION ALL)
0|0|0|SEARCH SUBQUERY 1

This concerns me a bit: it's table scanning both resources and tiles. The scan is done on indexes that are ordered appropriately for the ORDER BY ... LIMIT clause, and the hope is that these scans will be run in parallel, and both aborted when the limit is reached. But the query plan doesn't really make clear whether that's the case or not. It would be bad if the tables both had to be scanned in full.

Can you do some benchmarking over databases of a few different sizes (empty, just ambient use, large offline region), and see how query time scales empirically?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had a similar concern, since as part of the union it seem it would need to load the entirety of each table before sorting.

The below query adds complexity but leverages the indexes within subqueries before the union:

SELECT max(accessed) 
FROM ( 
  SELECT *
  FROM (
    SELECT accessed 
    FROM resources 
    LEFT JOIN region_resources 
    ON resource_id = resources.id 
    WHERE resource_id IS NULL 
    ORDER BY accessed ASC LIMIT 50 
  )
  UNION ALL
  SELECT * 
  FROM ( 
    SELECT accessed 
    FROM tiles 
    LEFT JOIN region_tiles 
    ON tile_id = tiles.id 
    WHERE tile_id IS NULL 
    ORDER BY accessed ASC LIMIT 50 
  )
  ORDER BY accessed ASC LIMIT 50
);

And has the following query plan:

3|0|0|SCAN TABLE resources USING COVERING INDEX resources_accessed
3|1|1|SEARCH TABLE region_resources USING COVERING INDEX region_resources_resource_id (resource_id=?)
2|0|0|SCAN SUBQUERY 3
2|0|0|USE TEMP B-TREE FOR ORDER BY
5|0|0|SCAN TABLE tiles USING COVERING INDEX tiles_accessed
5|1|1|SEARCH TABLE region_tiles USING COVERING INDEX region_tiles_tile_id (tile_id=?)
4|0|0|SCAN SUBQUERY 5
4|0|0|USE TEMP B-TREE FOR ORDER BY
1|0|0|COMPOUND SUBQUERIES 2 AND 4 (UNION ALL)
0|0|0|SEARCH SUBQUERY 1

I've used sqlite's built in .timer ON to attempt to profile the queries, and the differences seem negligible for a full cache (50mb database, 1010 tiles, 9 resources) but no offline regions stored.
Query in commit: Run Time: real 0.000 user 0.000155 sys 0.000076
Query in comment: Run Time: real 0.000 user 0.000207 sys 0.000120

I'm not sure of a better way to profile sqlite here, and not sure if those query results are cached. Let me know if you have any ideas, couldn't find much.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That seems promising. It's pretty important that we don't regress eviction performance for larger databases too, so can you test both approaches with a large offline region as well?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Testing on a database with SF downloaded for offline: 153mb, 3300 tiles, 1284 resources

Query in commit, run 3x:

Run Time: real 0.014 user 0.002650 sys 0.002637
Run Time: real 0.002 user 0.001782 sys 0.000058
Run Time: real 0.002 user 0.001967 sys 0.000073

Query in comment:

Run Time: real 0.028 user 0.003179 sys 0.003182
Run Time: real 0.002 user 0.001882 sys 0.000116
Run Time: real 0.002 user 0.001802 sys 0.000082

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Let's go with the simpler query then.

"FROM ( "
" SELECT accessed "
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you sure the inner query is parsed as suggested by the indentation, and not "(select from resources [unordered and unlimited]) UNION ALL (select from tiles ... ordery by ... limit 50)"?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

" FROM resources "
" LEFT JOIN region_resources "
" ON resource_id = resources.id "
" WHERE resource_id IS NULL "
" UNION ALL "
" SELECT accessed "
" FROM tiles "
" LEFT JOIN region_tiles "
" ON tile_id = tiles.id "
" WHERE tile_id IS NULL "
" ORDER BY accessed ASC LIMIT ?1 "
") "
);
accessedStmt->bind(1, 50);
// clang-format on
if (!accessedStmt->run()) {
return false;
}
Timestamp accessed = accessedStmt->get<Timestamp>(0);

// clang-format off
Statement stmt1 = getStatement(
"DELETE FROM resources "
Expand All @@ -800,10 +825,10 @@ bool OfflineDatabase::evict(uint64_t neededFreeSize) {
" LEFT JOIN region_resources "
" ON resource_id = resources.id "
" WHERE resource_id IS NULL "
" ORDER BY accessed ASC LIMIT ?1 "
" AND accessed <= ?1 "
") ");
// clang-format on
stmt1->bind(1, 50);
stmt1->bind(1, accessed);
stmt1->run();
uint64_t changes1 = db->changes();

Expand All @@ -815,10 +840,10 @@ bool OfflineDatabase::evict(uint64_t neededFreeSize) {
" LEFT JOIN region_tiles "
" ON tile_id = tiles.id "
" WHERE tile_id IS NULL "
" ORDER BY accessed ASC LIMIT ?1 "
" AND accessed <= ?1 "
") ");
// clang-format on
stmt2->bind(1, 50);
stmt2->bind(1, accessed);
stmt2->run();
uint64_t changes2 = db->changes();

Expand Down