Add support for archiving DB to SQLite #1104
Draft
This adds a rake command that exports the contents of the DB into a SQLite file for public archiving. For the most part it's a straightforward copy of every table and row, but it skips tables that are irrelevant to a public data set (administrative things like GoodJob tables, users, imports, etc.), drops columns that contain user data, and does some basic type conversions.
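A rough sketch of the per-row filtering and conversion described above, in plain Ruby with only the standard library. The skip list, the `DROPPED_COLUMNS` map (including `author_id`), and the conversion rules are all hypothetical illustrations, not the code in this PR. The conversions assume that values with no native SQLite type (hashes/arrays from JSON columns, timestamps, booleans) get stored as text or integers.

```ruby
require 'json'
require 'time'

# Hypothetical skip list: the PR mentions GoodJob tables, users, and imports,
# but these exact table names are assumptions.
SKIPPED_TABLES = %w[good_jobs good_job_processes good_job_settings users imports].freeze

# Hypothetical map of table => columns holding user data to drop.
DROPPED_COLUMNS = {
  'annotations' => %w[author_id]
}.freeze

# Should this table be copied into the archive at all?
def export_table?(name)
  !SKIPPED_TABLES.include?(name)
end

# Column list for a table, minus any user-data columns.
def export_columns(table, columns)
  columns - DROPPED_COLUMNS.fetch(table, [])
end

# SQLite has no dedicated JSON, array, timestamp, or boolean column types,
# so convert those values into text/integers (assumed conversion rules).
def to_sqlite_value(value)
  case value
  when Hash, Array then JSON.generate(value)
  when Time then value.utc.iso8601
  when true then 1
  when false then 0
  else value
  end
end

# Convert one row (a Hash of column => value) into its archived form.
def export_row(table, row)
  export_columns(table, row.keys).to_h { |col| [col, to_sqlite_value(row[col])] }
end
```

The actual insert into SQLite (e.g. via the `sqlite3` gem) is left out so the sketch stays dependency-free.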
Part of edgi-govdata-archiving/web-monitoring#170
For changes and annotations, we probably want to select only the relevant annotations, such as those marking important changes (make sure they are all in the DB first; see https://github.com/edgi-govdata-archiving/web-monitoring-processing/blob/main/web_monitoring/cli/annotations_import.py), and export just those plus the changes they apply to.
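One possible shape for that selection, as a stdlib-only Ruby sketch. The `Annotation`/`Change` structs and the `data['important']` flag are hypothetical stand-ins; how "important" is actually recorded in the annotation data is not specified here.

```ruby
require 'set'

# Hypothetical stand-ins for the real models.
Annotation = Struct.new(:id, :change_id, :data)
Change = Struct.new(:id)

# Hypothetical importance test; the real criterion may differ.
IMPORTANT = ->(annotation) { annotation.data['important'] == true }

# Keep only important annotations, then keep only the changes they point to.
def select_for_archive(annotations, changes, important)
  kept = annotations.select { |a| important.call(a) }
  change_ids = kept.map(&:change_id).to_set
  [kept, changes.select { |c| change_ids.include?(c.id) }]
end
```

Selecting annotations first and deriving the change set from them keeps the archive free of changes that nobody flagged.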