indexeddb: Shrink values in inbound_group_sessions store, improving performance all around #3073

Merged
merged 3 commits into main from andybalaam/shrink-inbound_group_sessions on Jan 31, 2024

Conversation

@andybalaam (Contributor) commented Jan 30, 2024

Fixes #3057.

Change the way we store values in Indexed DB for the crypto store, specifically in the inbound_group_sessions store. Reduce the size of the stored values and replace arrays of numbers with base 64 strings. The rationale and some implementation ideas are documented in #3057. (TL;DR: small values are faster, base64 is faster.)

This change:

  • Introduces a MaybeEncrypted type in matrix_sdk_indexeddb::crypto_store::indexeddb_serializer that encapsulates the fact that we serialize differently if the session is encrypted with a store cipher. (Previously, we just serialized an array of numbers in both cases. This was actually quite confusing, since these arrays of numbers were holding encoded versions of very different values.) This enum is untagged in Serde to avoid increasing the size of Indexed DB records. A rough sketch of the new types follows this list.
  • Adds matrix_sdk_store_encryption::EncryptedValueBase64, which works the same way as EncryptedValue, but stores arrays of numbers as base 64.
  • Adds a migration to the new schema to matrix_sdk_indexeddb::crypto_store::migrations, including adding the backed_up_to property, so we can support a fix for #26892 (see this comment) without doing another migration.
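For illustration, here is a minimal sketch of the shape of these two types. The field names and exact layout are assumptions made for this example, not the definitive matrix-sdk-indexeddb definitions.

```rust
use serde::{Deserialize, Serialize};

/// Sketch of the base64-backed encrypted value: the same idea as
/// `EncryptedValue`, but with the binary parts stored as base64 strings
/// instead of arrays of numbers. (Field names here are assumptions.)
#[derive(Serialize, Deserialize)]
struct EncryptedValueBase64 {
    version: u8,
    nonce: String,      // base64-encoded nonce
    ciphertext: String, // base64-encoded ciphertext
}

/// Sketch of the untagged enum: with `#[serde(untagged)]` no tag field is
/// written, so the stored record does not grow.
#[derive(Serialize, Deserialize)]
#[serde(untagged)]
enum MaybeEncrypted {
    /// The value was encrypted with the store cipher.
    Encrypted(EncryptedValueBase64),
    /// No store cipher: the serialized value as a base64 string.
    Unencrypted(String),
}

fn main() {
    let value = MaybeEncrypted::Unencrypted("aGVsbG8".to_owned());
    // Untagged means only the inner value is written: "aGVsbG8"
    println!("{}", serde_json::to_string(&value).unwrap());
}
```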

I also wrote a performance test that runs in a web browser. In the checked-in version, it runs for 2K records, but when I manually increased the number on my machine to 200K, I got these results:

Inserting 200000 records with v8 schema took 216,745ms.
Inserting 200000 records with v10 schema took 34,875ms.

Counting 200000 records with v8 schema took 11,786ms.
Counting 200000 records with v10 schema took 469ms.

So the speedup is impressive.

@andybalaam andybalaam marked this pull request as ready for review January 30, 2024 13:47
@andybalaam andybalaam requested a review from a team as a code owner January 30, 2024 13:48
@andybalaam andybalaam requested review from bnjbvr and removed request for a team January 30, 2024 13:48
codecov bot commented Jan 30, 2024

Codecov Report

Attention: 10 lines in your changes are missing coverage. Please review.

Comparison is base (18eefcf) 83.72% compared to head (e3da325) 83.68%.
Report is 20 commits behind head on main.

Files | Patch % | Lines
crates/matrix-sdk-store-encryption/src/lib.rs | 68.75% | 10 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3073      +/-   ##
==========================================
- Coverage   83.72%   83.68%   -0.05%     
==========================================
  Files         222      222              
  Lines       23357    23389      +32     
==========================================
+ Hits        19556    19573      +17     
- Misses       3801     3816      +15     


@andybalaam andybalaam requested review from BillCarsonFr and Hywan and removed request for bnjbvr January 30, 2024 17:09
@BillCarsonFr (Member) left a comment


Great work, thx!
LGTM. I agree that it's worth including the future backed_up_to field at this point.

@Hywan (Member) left a comment


It looks excellent to me. I have a couple of really tiny bits of feedback, but overall, great work!

@andybalaam andybalaam force-pushed the andybalaam/shrink-inbound_group_sessions branch from 64fb436 to 56ac6df Compare January 31, 2024 12:00
@andybalaam andybalaam force-pushed the andybalaam/shrink-inbound_group_sessions branch from 56ac6df to e3da325 Compare January 31, 2024 12:16
@andybalaam andybalaam merged commit f64af12 into main Jan 31, 2024
34 checks passed
@andybalaam andybalaam deleted the andybalaam/shrink-inbound_group_sessions branch January 31, 2024 12:29
@poljar (Contributor) commented Feb 8, 2024

Oh no: #718.

@dkasak (Member) commented Feb 8, 2024

Great work with the speed-up!

I'm a bit late to the party, but what role does base64-encoding the payload serve here? It seems the trick is to replace an array with lots of values by a single value (a string), which could just as well have been a JSON string. That would mean the base64 step only adds overhead on top of this?

The only way the base64 encoding would make sense is if we opted for a more compact binary representation of the array of numbers (which I think was the ultimate goal of #718, which poljar linked).

@andybalaam (Contributor, PR author)

> Oh no: #718.

@poljar hmm. I think we could move the JSON-specific stuff into the indexeddb crate if we wanted to?

@andybalaam (Contributor, PR author) commented Feb 8, 2024

> Great work with the speed-up!
>
> I'm a bit late to the party, but what role does base64-encoding the payload serve here? It seems the trick is to replace an array with lots of values by a single value (a string), which could just as well have been a JSON string. That would mean the base64 step only adds overhead on top of this?
>
> The only way the base64 encoding would make sense is if we opted for a more compact binary representation of the array of numbers (which I think was the ultimate goal of #718, which poljar linked).

@dkasak the base64 encoding is there to transform an array of bytes into a string, because I found that strings are much faster than arrays of numbers to store inside Indexed DB. When you say a JSON string, do you mean treating the bytes as characters directly? That would result in invalid UTF-8, which is not allowed in JSON/JavaScript, right?
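To make that concrete, here is a small standalone sketch (using the serde_json and base64 crates; not the actual crypto-store code, and the byte values are made up) of how the same bytes end up in the stored JSON before and after the change:

```rust
use base64::{engine::general_purpose::STANDARD_NO_PAD, Engine as _};

fn main() {
    // Made-up stand-in for a pickled session.
    let pickle: Vec<u8> = vec![0x84, 0x01, 0xff, 0x7f];

    // Before: serde_json writes a Vec<u8> as an array of numbers.
    let as_numbers = serde_json::to_string(&pickle).unwrap();
    assert_eq!(as_numbers, "[132,1,255,127]");

    // After: base64-encode the bytes first, so the stored value is one
    // compact JSON string instead of an array.
    let as_base64 = serde_json::to_string(&STANDARD_NO_PAD.encode(&pickle)).unwrap();
    assert_eq!(as_base64, r#""hAH/fw""#);

    println!("{as_numbers} vs {as_base64}");
}
```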

@poljar (Contributor) commented Feb 8, 2024

> > Oh no: #718.
>
> @poljar hmm. I think we could move the JSON-specific stuff into the indexeddb crate if we wanted to?

Yeah, maybe, will think about it.

@dkasak (Member) commented Feb 8, 2024

> When you say a JSON string, do you mean treating the bytes as characters directly? That would result in invalid UTF-8, which is not allowed in JSON/JavaScript, right?

What I mean is using serde_json::to_string (which produces a UTF-8 encoded string representation of the JSON) instead of serde_json::to_vec (which produces a vector of bytes of the same representation), and then skipping the base64 encoding step altogether.

@andybalaam (Contributor, PR author)

We are already encoding to JSON using to_string; the question is what we are encoding. Previously it was a Vec<u8>, and now it's a String, which contains base64.

@andybalaam (Contributor, PR author)

If I remember correctly, the Vec<u8> we have is a pickled version of an InboundGroupSession.

@dkasak (Member) commented Feb 8, 2024

Just to write down the conclusion: after a quick chat with @andybalaam, only the unencrypted case unnecessarily base64-encodes the payload before storing. The encrypted case a couple of lines above does what is expected, so that's why we were talking past each other.
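For anyone reading later, a rough sketch of that distinction (function names and shapes are purely illustrative, not the real crypto-store code): in the unencrypted case the value is already a JSON string, so the base64 step only adds size and work, while in the encrypted case the cipher output is raw bytes, so base64 is the natural way to turn it into a string.

```rust
use base64::{engine::general_purpose::STANDARD_NO_PAD, Engine as _};

// Unencrypted path: the serialized session is already a JSON string, so
// base64-encoding it here is the unnecessary extra step discussed above.
fn store_unencrypted(session_json: &str) -> String {
    STANDARD_NO_PAD.encode(session_json.as_bytes())
}

// Encrypted path: the cipher output is raw bytes, so base64 is how it
// becomes a string suitable for Indexed DB.
fn store_encrypted(nonce: &[u8], ciphertext: &[u8]) -> (String, String) {
    (STANDARD_NO_PAD.encode(nonce), STANDARD_NO_PAD.encode(ciphertext))
}

fn main() {
    println!("{}", store_unencrypted(r#"{"session_id":"abc"}"#));
    let (n, c) = store_encrypted(&[0u8; 12], &[0x01, 0xff, 0x7f]);
    println!("{n} {c}");
}
```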

Successfully merging this pull request may close these issues: Indexed DB performance is slow because of large values (#3057).