
Cache message history in local storage to speed up launch and avoid initialSyncs #121

Closed
ara4n opened this issue Aug 30, 2015 · 9 comments

@ara4n
Member

ara4n commented Aug 30, 2015

No description provided.

@kegsay
Contributor

kegsay commented Jan 12, 2017

Local storage probably isn't suitable for some people's /sync data: there will be too much. IndexedDB is probably the way to go. IIRC Rob was prodding around in this area; I'll ask him what he's done, if anything.

@lukebarnard1
Contributor

lukebarnard1 commented Jan 12, 2017

What about putting the pagination token in local storage, and then starting to sync based on that token?

@kegsay
Contributor

kegsay commented Jan 12, 2017

That doesn't help. We still need to know all the previous room state.

@kegsay
Contributor

kegsay commented Jan 17, 2017

So I was musing this over lunch, and I'm going to go for a quick and dirty implementation first to see where it falls down. The plan is:

  • Serialise Room objects as-is (works because of the algorithm used).
  • "Batch" writes to the database by periodically (timer or # reqs) flushing rooms which have been touched by /sync. In order for this to work, we need to NOT update the sync token for a while until the flush happens, so if we die before we flush we just start up with a slightly older sync token. I'm thinking of basing the period on time, e.g every 5 minutes. By only writing "touched" rooms and by batching like this, there should be little performance implications. This technique also allows for easy storage of additional fields in Room objects later down the line since we're just serialising the lot. We will probably need to work out a way to "upgrade" if there are new mandatory fields, although in that case we might as well start from scratch.
  • Service reads as they are currently: via store.getRoom(roomId).
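
As a very rough sketch of this batching (hypothetical names throughout - persistRoom(), persistSyncToken() and store are stand-ins, not SDK APIs):

    // Collect the IDs of rooms touched by /sync, but hold back the sync token
    // until the periodic flush so a crash just restarts from an older token.
    const FLUSH_INTERVAL_MS = 5 * 60 * 1000; // e.g. every 5 minutes
    const touchedRoomIds = new Set();
    let pendingSyncToken = null;

    function onSyncProcessed(syncToken, roomIdsInResponse) {
        roomIdsInResponse.forEach((roomId) => touchedRoomIds.add(roomId));
        pendingSyncToken = syncToken;
    }

    setInterval(() => {
        if (touchedRoomIds.size === 0) return;
        // Only serialise the rooms that actually changed since the last flush,
        // then persist the token so the next startup resumes from here.
        touchedRoomIds.forEach((roomId) => persistRoom(store.getRoom(roomId)));
        persistSyncToken(pendingSyncToken);
        touchedRoomIds.clear();
    }, FLUSH_INTERVAL_MS);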

This solution is completely transparent to the end-user (Riot-web). When the SDK starts, it will do its SYNCING state change as normal using an incremental /sync, and the rooms will be magically there ready for getRooms() or whatever.

An interesting implementation question is whether or not to use the InMemoryStore in the persistent implementation. I quite like the idea of still relying on the in-memory store (and hence still keeping it up-to-date), with the persistent bit only caring about writing those member fields to IndexedDB. Of course, this relies on the read performance not being completely terrible; but even if it is, we can use the in-memory layer as a form of cache, so this still feels like a decent solution.
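
A rough sketch of that shape, with hypothetical names (IndexedDBBackedStore, markRoomTouched()) rather than the real store interface:

    // The persistent store keeps the in-memory store authoritative for reads,
    // and only adds a write path that marks rooms for the next IndexedDB flush.
    class IndexedDBBackedStore {
        constructor(inMemoryStore, backend) {
            this.inMem = inMemoryStore; // still kept fully up-to-date
            this.backend = backend;     // only (de)serialises to IndexedDB
        }
        getRoom(roomId) {
            // Reads are served from memory, so read performance is unchanged.
            return this.inMem.getRoom(roomId);
        }
        storeRoom(room) {
            this.inMem.storeRoom(room);
            this.backend.markRoomTouched(room.roomId); // flushed in a batch later
        }
    }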

@kegsay
Contributor

kegsay commented Jan 18, 2017

Spoke with @richvdh and @dbkr about this today:

  • We need to store account data and users as well (separately from rooms, in different document stores).
  • We need to reap parts of the timeline for each room eventually. Best to do that on startup, and we should just be able to remove complete Timeline segments as they have fwd/back pagination tokens already.
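
A rough illustration of reaping whole segments (hypothetical shapes - storedSegments and lastEventTs are stand-ins, not the js-sdk timeline API):

    // Each stored timeline segment carries its own forward/back pagination tokens,
    // so dropping complete old segments still leaves the remaining history paginatable.
    function reapTimelineSegments(storedSegments, keepCount) {
        return [...storedSegments]
            .sort((a, b) => b.lastEventTs - a.lastEventTs) // newest first
            .slice(0, keepCount);
    }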

@ara4n
Member Author

ara4n commented Jan 20, 2017

Are you sure we ever want to reap parts of the timeline automatically? Especially for E2E rooms, I'd have thought you'd want to keep as much content as possible around forever - especially when we get around to doing client-side search - unless the user explicitly purges it for privacy or disk-space reasons. Just as you wouldn't write an IRC client whose logging randomly deletes logs from under you.

@kegsay
Contributor

kegsay commented Jan 26, 2017

Sure, we don't have to do this automatically, but we do need a way to reap events beyond the nuclear option. It could be an option in RoomSettings on Riot or whatever.

@kegsay
Contributor

kegsay commented Feb 2, 2017

So I was musing this over lunch, and I'm going to go for a quick and dirty implementation first to see where it falls down.

It fell over for these reasons:

  • Circular dependencies on require(). I managed to serialize things as JSON objects okay, but when you want to convert the objects back into instances of classes you need to require() the files to expose the class, so you get horrible loops where things don't exist for a while.

  • Most classes ended up needing custom serialize/deserialize routines, because serialisation was also copying the underlying EventEmitter properties (and failing if there was an attached listener). This made most classes much more brittle: if you add a new property to a class you need to remember to serialize/deserialize it too, and actually know how to (non-primitive types needed their own serialize() routine). See the toy example after this list.

  • Circular dependencies in the object graph. There were subtle situations where it was important that the same instance of an object was referenced between events X and Y in order to pick up changes (the RoomMember sentinel/room state stuff), which is impossible to work out from a fully JSONified tree. We would either need to add our own special circular-ref notation, or something to that effect.
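
To illustrate the EventEmitter point with a toy example (not SDK code): naive JSON serialisation drags in the emitter's internal bookkeeping, listener functions don't survive the round trip, and the prototype is lost on parse - hence the hand-written serialize()/deserialize() routines.

    const EventEmitter = require("events");

    class Room extends EventEmitter {
        constructor(roomId) {
            super();
            this.roomId = roomId;
        }
    }

    const room = new Room("!abc:example.org");
    room.on("Room.timeline", () => {});
    JSON.stringify(room);
    // -> roughly {"_events":{},"_eventsCount":1,"roomId":"!abc:example.org"}
    //    The attached listener is silently dropped, the class prototype (methods)
    //    is lost when parsing back, and structured-clone-based stores refuse to
    //    clone functions at all.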

All these reasons together have pushed me over the edge in terms of working out whether this is the Right Way, and I don't think it is any more. Instead, I'm going to have a stab "lower down the stack" at the raw /sync responses: in other words, replaying history. This is fraught with peril because unless you "compact" in some way, load times will get progressively worse as you accumulate more and more /sync responses. The main concerns here are no-op state clobbering state, and endless message accumulation.

We need to make sure that if we do compact, people can still roll back history as desired (that is, we have suitable /sync tokens to use with the right dataset at the edges (old/current)). This may be as simple as dropping accumulated "non-live" EventTimelines, as they are the things which hold the message events AND RoomState edges AND pagination tokens.

On the bright side, we can reasonably store account data and User objects as we currently are, by dumping the raw event they represent into the object store and clobbering based on well-defined keys (event type and user ID respectively), as there is no history problem.

The plan of attack now is:

  • Keep account data and User objects working with the current system. These had no real problems and we have clear guidelines on when it is safe to clobber.

  • Add a new class(?) which is simply there to build up a representative /sync response for a room by accumulating multiple /sync responses. This class can ditch any /sync data not related to rooms (presence/account data) and just needs to know the right clobbering semantics. We keep a separate sync token which represents where this sync response is up to, and periodically splat that entire response to IndexedDB, ready for injecting at startup. This would look something like:

    const syncAccumulator = new SyncAccumulator();
    syncAccumulator.process(initialSyncResponseJson);
    syncAccumulator.process(syncResponseJson);
    syncAccumulator.process(anotherSyncResponseJson);
    syncAccumulator.process(moreSyncResponseJson);
    const syncJson = syncAccumulator.getJSON();
    indexedDB.store(syncJson);
    
    // startup
    const syncJson = indexedDB.getSyncJson();
    // ... pass it to SyncApi
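
For context, the clobbering semantics such an accumulator needs might look roughly like this (a hypothetical helper, not the eventual SyncAccumulator code): room state clobbers on (event type, state_key), while timeline events simply accumulate.

    // Fold one joined-room section of a /sync response into a running summary.
    function accumulateJoinedRoom(acc, roomJson) {
        const stateEvents = (roomJson.state || {}).events || [];
        const timelineEvents = (roomJson.timeline || {}).events || [];
        stateEvents.forEach((ev) => {
            // current state clobbers on (event type, state_key)
            acc.stateByKey[ev.type + "$" + ev.state_key] = ev;
        });
        timelineEvents.forEach((ev) => {
            acc.timelineEvents.push(ev); // message history just accumulates
            if (ev.state_key !== undefined) {
                // state events arriving down the timeline also update current state
                acc.stateByKey[ev.type + "$" + ev.state_key] = ev;
            }
        });
        return acc;
    }

    // usage (hypothetical inputs): fold successive /sync responses for a room
    const acc = { stateByKey: {}, timelineEvents: [] };
    accumulateJoinedRoom(acc, firstSyncJoinedRoomJson);
    accumulateJoinedRoom(acc, nextSyncJoinedRoomJson);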

@kegsay
Contributor

kegsay commented Feb 21, 2017

This is now done, using the SyncAccumulator approach.

@kegsay kegsay closed this as completed Feb 21, 2017