-
Notifications
You must be signed in to change notification settings - Fork 99
Use msgpack instead of json for serializing events #17
Conversation
Indeed, it's a lot of change :). |
Here's my two cents:
|
Maybe we can also use something like pickle? |
I'm not quite sure that |
Seem that's clearly documented |
Unlike JSON, MessagePack is able to preserve binary data. Using MessagePack for event serialization means that native binary data can be transmitted without modification and Event.encode_bytes is no longer necessary. Preserving the bytestring/unicode distinction requires some extra steps to preserve Python 2/3 interop. Specifically, literal keys are marked as unicode explicitly (i.e. using u'...'). Branch and repo names are also encoded with UTF-8. The database schema is modified to use BLOB instead of TEXT to record events. A migration script is provided to update a database in place.
Metaclass type names need to be made using "str(...)" to maintain python 2/3 compat
All IDA platforms use UTF-8 internally. Decoding strings should only be necessary if they're displayed to the user.
@NeatMonster I agree with all of those issues:
u-msg-pack is a reasonably efficient, pure-python implementation of msgpack distributed in a single file. It would be possible to include a copy of it and use it if the msgpack library isn't available. Would you be ok with that?
Now that it's clear that all IDA platforms use UTF-8 internally, it should be possible to completely remove string encoding/decoding for events. Strings only need to be decoded if they're directly shown to the user. I made this change in d78643e.
Using explicit unicode literals really is annoying and error prone. By using |
|
It still bothers me a little. I've scrolled through the
Are you 100% sure about that? I was able to verify it in most cases, but I would love to have an official source on this. Still, it makes our lives much easier to not have to encode strings/binary strings.
That's a lot better in my opinion, thanks! The one question that remains for me is: can we keep the same level of detail shown to the user and inter-operability if we switch to |
Yeah I can give you the source of that. |
Thanks @NyaMisty. We may not need to worry about the encoding of string anymore if we get rid of JSON then. But to display the packets to the user, I'm afraid that will still be necessary. |
Maybe we should close the this PR and move it into a separate issue? |
First: this is a fairly intrusive patch, and I don't really expect it to be merged as is.
Several bugs that I ran into in early IDArling use were caused because binary data wasn't encoded for transit via json (e.g. 2bd7be7). The mismatch between IDA strings and JSON also results in a fair amount of boilerplate spread throughout
events.py
.This patch addresses these issues by using msgpack for serialization. The msgpack data model is very similar to JSON, but it allows unicode strings to be distinguished from binary data.
The patch addresses python 2/3 interoperability by explicitly using unicode for event keys.
A small script is included for migrating databases from JSON to msgpack.