[KATC] Add documentation for troubleshooting deserialization (#1813)
RebeccaMahany authored Aug 1, 2024
1 parent 5a6f49f commit 3ef4750
Showing 1 changed file with 53 additions and 0 deletions.
53 changes: 53 additions & 0 deletions ee/katc/test_data/README.md
@@ -27,6 +27,59 @@ If you are iteratively making changes to the database, Chrome will complain abou
the database. You can delete the database via Dev Tools and then reload the page to re-create
the database successfully.

## Troubleshooting parsing errors

First, it is helpful to understand how deserialization works. The general process is documented below,
but it is also helpful to read the deserialization code itself: both launcher's code, where we have
documented caveats and limitations as thoroughly as possible, and the Chrome and Firefox code linked below,
so that you can compare launcher's deserialization with the browsers' actual serialization and deserialization steps.

### General object deserialization process

The exact deserialization process differs a bit between the two browsers, but in general, we read byte-by-byte:

1. First, we will parse the header if present, continuing until we get a byte indicating the start of an object
2. We will read the next byte as a "token"/"tag" that indicates an upcoming string, which is the next object property name (e.g. "uuid")
3. We will read the next byte as the length of the upcoming string
4. We will read the next n bytes as the string (e.g. "uuid"), where n is the length read in the previous step
5. We will read the next byte as a "token"/"tag", indicating the upcoming data type (for example, an int, an array, a nested object)
6. Depending on the upcoming data type, we may read the next byte as the number of expected bytes holding this data
7. We process the upcoming data according to its type; we read until either we hit an end token or have reached the number of expected bytes
8. We may have to read and discard metadata or padding at the end of the value
9. We set the object property name equal to its value, and continue
10. We repeat until we reach a token indicating we've reached the end of the object

For Chrome, review the code [here](https://github.com/v8/v8/blob/master/src/objects/value-serializer.cc).
For Firefox, review the code [here](https://searchfox.org/mozilla-central/source/js/src/vm/StructuredClone.cpp).
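
The general steps above can be sketched in Go as follows. This is a minimal illustration of the read loop for a toy format, not launcher's actual implementation: the tag values, the one-byte integer encoding, and the names `readString` and `deserializeObject` are all assumptions made for this example, and padding and metadata handling (step 8) are omitted.

```go
package main

import (
	"bufio"
	"bytes"
	"fmt"
	"io"
)

// Hypothetical tags for a toy format. They are chosen for illustration and
// should not be assumed to match the real Chrome or Firefox tag values.
const (
	tagObjectStart byte = 'o'
	tagObjectEnd   byte = '{'
	tagString      byte = '"'
	tagInt         byte = 'I'
)

// readString reads a one-byte length, then that many bytes (steps 3 and 4).
func readString(r *bufio.Reader) (string, error) {
	n, err := r.ReadByte()
	if err != nil {
		return "", fmt.Errorf("reading string length: %w", err)
	}
	buf := make([]byte, n)
	if _, err := io.ReadFull(r, buf); err != nil {
		return "", fmt.Errorf("reading %d-byte string: %w", n, err)
	}
	return string(buf), nil
}

// deserializeObject walks the toy format byte by byte.
func deserializeObject(data []byte) (map[string]any, error) {
	r := bufio.NewReader(bytes.NewReader(data))

	// Step 1: skip header bytes until the object start tag.
	for {
		b, err := r.ReadByte()
		if err != nil {
			return nil, fmt.Errorf("looking for object start: %w", err)
		}
		if b == tagObjectStart {
			break
		}
	}

	obj := make(map[string]any)
	for {
		// Steps 2 and 10: the next tag is a property name or the object end.
		tag, err := r.ReadByte()
		if err != nil {
			return nil, fmt.Errorf("reading property tag: %w", err)
		}
		if tag == tagObjectEnd {
			return obj, nil
		}
		if tag != tagString {
			return nil, fmt.Errorf("unexpected tag 0x%02x for property name", tag)
		}
		key, err := readString(r)
		if err != nil {
			return nil, err
		}

		// Step 5: the value's tag tells us which data type follows.
		valTag, err := r.ReadByte()
		if err != nil {
			return nil, fmt.Errorf("reading value tag for %q: %w", key, err)
		}

		// Steps 6, 7, and 9: read the value and store it under the key.
		switch valTag {
		case tagString:
			v, err := readString(r)
			if err != nil {
				return nil, err
			}
			obj[key] = v
		case tagInt:
			v, err := r.ReadByte() // toy encoding: a single byte
			if err != nil {
				return nil, fmt.Errorf("reading int for %q: %w", key, err)
			}
			obj[key] = int(v)
		default:
			return nil, fmt.Errorf("unsupported tag type 0x%02x for %q", valTag, key)
		}
	}
}

func main() {
	data := []byte{0xff, 0x0f, 'o', '"', 4, 'u', 'u', 'i', 'd', '"', 3, 'a', 'b', 'c', '"', 2, 'i', 'd', 'I', 7, '{'}
	fmt.Println(deserializeObject(data))
}
```

The real formats differ in the details (multi-byte lengths, several string encodings, nested objects and arrays), so treat this only as a map for reading the linked source.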

### Most likely parsing issues

In initial rollout and testing, we have come across parsing errors that mostly fall into two categories:

1. Data type not yet implemented
2. Not correctly discarding padding or metadata

If you are seeing the first category of error, the error message should make that clear. For example, the
error message might say "unsupported tag type" or "unimplemented array item type". In this case,
fixing the error entails reading the Chrome or Firefox source code and implementing deserialization for
the new data type. Note that if you have to implement deserialization of a new data type for one browser,
you will probably want to do it for the other browser too.
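
To illustrate what "implementing deserialization for the new data type" can look like, here is a hedged sketch. It assumes a hypothetical tag-to-decoder table; launcher's real code is organized differently, and the names `decoders`, `decodeBytes`, `decodeInt`, and `decodeBool`, the tag bytes, and the one-byte encodings are all invented for this example.

```go
package main

import (
	"bufio"
	"bytes"
	"fmt"
)

// decoder reads one value of a particular type from the stream.
type decoder func(r *bufio.Reader) (any, error)

// decodeInt reads a toy one-byte integer.
func decodeInt(r *bufio.Reader) (any, error) {
	b, err := r.ReadByte()
	if err != nil {
		return nil, fmt.Errorf("reading int: %w", err)
	}
	return int(b), nil
}

// decodeBool reads a toy one-byte boolean (nonzero means true).
func decodeBool(r *bufio.Reader) (any, error) {
	b, err := r.ReadByte()
	if err != nil {
		return nil, fmt.Errorf("reading bool: %w", err)
	}
	return b != 0, nil
}

// decoders maps hypothetical tag bytes to the decoders implemented so far.
var decoders = map[byte]decoder{
	'I': decodeInt,
}

// decodeBytes reads a tag and dispatches to the matching decoder, or
// reports the tag as unsupported.
func decodeBytes(data []byte) (any, error) {
	r := bufio.NewReader(bytes.NewReader(data))
	tag, err := r.ReadByte()
	if err != nil {
		return nil, fmt.Errorf("reading tag: %w", err)
	}
	dec, ok := decoders[tag]
	if !ok {
		return nil, fmt.Errorf("unsupported tag type 0x%02x", tag)
	}
	return dec(r)
}

func main() {
	input := []byte{'T', 1}

	// Before the new type is implemented, the tag is rejected.
	_, err := decodeBytes(input)
	fmt.Println("before:", err)

	// After registering a decoder for the new tag, the same bytes parse.
	decoders['T'] = decodeBool
	v, err := decodeBytes(input)
	fmt.Println("after:", v, err)
}
```

The point of the sketch is the shape of the fix: the failing tag byte in the error message tells you which branch is missing, and the browser source tells you how many bytes that type occupies and how to interpret them.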

The second category is more difficult to diagnose. The symptoms often look like misaligned data, where an
object property name might look like `id"1"name"jane"` -- i.e., several keys/values concatenated together
because the deserialization process read the string length from the wrong byte. In this case, the most useful
troubleshooting step is to determine the object property that was processed immediately prior to the malformed
data or the error, and then compare how that property is deserialized in launcher versus in the Chrome or
Firefox source code.
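
When comparing against the browser source, it can help to look at the raw bytes that follow the last property that parsed correctly. The sketch below is one way to do that with the standard library's `encoding/hex`; the helpers `sliceAround` and `dumpAround` are hypothetical and not part of launcher, and the sample bytes are a made-up toy encoding with three padding bytes after an integer value.

```go
package main

import (
	"encoding/hex"
	"fmt"
)

// sliceAround returns up to window bytes on either side of offset,
// clamped to the bounds of data.
func sliceAround(data []byte, offset, window int) []byte {
	start := offset - window
	if start < 0 {
		start = 0
	}
	if start > len(data) {
		start = len(data)
	}
	end := offset + window
	if end > len(data) {
		end = len(data)
	}
	if end < start {
		end = start
	}
	return data[start:end]
}

// dumpAround formats the bytes near offset as a hex dump. The offsets
// printed by hex.Dump are relative to the start of the returned window,
// not to the start of data.
func dumpAround(data []byte, offset, window int) string {
	return hex.Dump(sliceAround(data, offset, window))
}

func main() {
	// Toy data: property "id" with a one-byte int value, then three
	// padding bytes, then the start of the next property name.
	data := []byte{'o', '"', 2, 'i', 'd', 'I', 7, 0, 0, 0, '"', 4, 'n', 'a', 'm', 'e'}

	// Inspect the bytes around the end of the "id" value (index 7).
	fmt.Print(dumpAround(data, 7, 6))
}
```

If the bytes immediately after the last good value are not a tag the deserializer expects (for example, a run of zeros), that suggests padding or metadata that needs to be discarded before the next property is read.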

If you have access to the database exhibiting this issue, you can add it to the [indexeddbs](./indexeddbs/)
directory and run the tests in [table_test.go](../table_test.go) against this database. (However, unless
you handcrafted this database, you should not commit it to launcher, in case it contains sensitive data.)
Otherwise, for new data types, you can add the new data type to [main.js](./main.js) and update the existing
databases in order to test your fixes.

## References

* [Helpful tutorial for working with the IndexedDB API](https://developer.mozilla.org/en-US/docs/Web/API/IndexedDB_API/Using_IndexedDB)
* [Chrome serialization code](https://github.com/v8/v8/blob/master/src/objects/value-serializer.cc)
* [Firefox serialization code](https://searchfox.org/mozilla-central/source/js/src/vm/StructuredClone.cpp)
