Feature atuin import from file #2170

Draft: wants to merge 2 commits into main

@mijoharas mijoharas commented Jun 19, 2024

Hi, I wanted a one-off offline sync for my atuin history.db. I've left this in a rough-and-ready state to see whether there's any interest in it, and to get feedback on whether the feature is wanted.

(Would address #816.)

Reasons I'd personally like this feature:

  • People aren't always in a situation where they have access to the internet, and a one-off import to copy history across can be useful.
  • The syncing server seems non-trivial to self-host, and there is a lot of code to review if you want to be sure you can trust the external syncing server.

Either way, figured it was easy enough to write some code to start a discussion. Let me know your thoughts.

Checks

  • I am happy for maintainers to push small adjustments to this PR, to speed up the review cycle
  • I have checked that there are no existing pull requests for the same thing

@mijoharas mijoharas marked this pull request as draft June 19, 2024 21:02
@ellie
Member

ellie commented Jun 20, 2024

Hey! Thanks for bringing this up.

I think it's worth addressing what the full set of requirements is here. Are users OK with duplicate data if they run it twice? Do they expect sync to work after this has been done?

I'd rather not use the database as the transport mechanism here either. Really, something like atuin history dump should dump a format like jsonlines, and then the importer can read and import that. I'd be concerned that changes to the db schema could break imports in the future.
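To make the jsonlines suggestion concrete, here is a minimal Python sketch of a dump and a matching import that round-trip records as one JSON object per line. It is only an illustration: the function names and the record fields (`id`, `timestamp`, `command`) are assumptions chosen for the example, not atuin's actual schema or CLI.

```python
import io
import json


def dump_history(records, out):
    """Write each record as a single JSON object on its own line."""
    for record in records:
        out.write(json.dumps(record, sort_keys=True) + "\n")


def load_history(src):
    """Yield one record per non-empty line of a JSON Lines stream."""
    for line in src:
        line = line.strip()
        if line:
            yield json.loads(line)


# Hypothetical records; the field names are illustrative only.
records = [
    {"id": "a1", "timestamp": 1718800000, "command": "ls -la"},
    {"id": "b2", "timestamp": 1718800060, "command": "git status"},
]

buffer = io.StringIO()  # stands in for a file on disk
dump_history(records, buffer)
buffer.seek(0)
restored = list(load_history(buffer))
```

Because each line is an independent JSON value, an importer can stream the file and skip or report a single bad line without having to understand the exporter's database layout.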

@mijoharas
Author

mijoharas commented Jun 21, 2024

Hey, schema changes and versioning are definitely among the reasons I wanted to raise this PR early for feedback! And thanks for all the good points:

> Are users OK with duplicate data if they run it twice?

I should have mentioned this (I only tested it after I had raised the PR): while we don't do anything in the code to remove duplicates, running it twice doesn't actually cause duplicates. I assume that's because we include the id of the record, so the duplicate insert fails (I didn't dig into this).
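The assumed mechanism (a repeated id makes the second insert a no-op) can be sketched with SQLite from Python. This is not atuin's code: the table layout and the use of `INSERT OR IGNORE` are assumptions for illustration, and whether atuin's history table actually enforces uniqueness on the id is exactly the part that was not checked.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: a PRIMARY KEY on id is what makes re-imports harmless.
conn.execute("CREATE TABLE history (id TEXT PRIMARY KEY, command TEXT NOT NULL)")


def row_count(conn):
    """Return the number of rows currently in the history table."""
    return conn.execute("SELECT COUNT(*) FROM history").fetchone()[0]


def import_rows(conn, rows):
    """Insert rows, silently skipping ids that already exist.

    Returns how many rows were actually added.
    """
    before = row_count(conn)
    conn.executemany(
        "INSERT OR IGNORE INTO history (id, command) VALUES (?, ?)", rows
    )
    conn.commit()
    return row_count(conn) - before


rows = [("a1", "ls -la"), ("b2", "git status")]
first_run = import_rows(conn, rows)   # both rows are new
second_run = import_rows(conn, rows)  # same ids again, nothing is added
```

With a plain `INSERT` instead of `INSERT OR IGNORE`, the second run would raise `sqlite3.IntegrityError` rather than skip the rows, so the exact statement used determines whether a re-import fails loudly or quietly.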

> Do they expect sync to work after this has been done?

It's a good question, and probably something that I'd approach via documentation: sync is obviously the go-to feature for ongoing syncing, this new thing could be presented as an option for when you have no internet, and we could note that it won't set up sync, and that setting up sync is probably what most users would want. But I've obviously got a lot less insight into your users than you do, so let me know your thoughts!

> I'd rather not use the database as the transport mechanism here either - really something like atuin history dump should dump a format like jsonlines, and then the importer can read and import that.

This seems like a reasonable approach to me, and should also be fairly straightforward (I'm happy to update the PR to do that if that's what we want to do). The final question is:

> I'd be concerned that changes to the db schema could break imports in the future

This feels like it would be an issue whether we go for jsonlines or the history dump anyway, so it's something we should consider. The easiest way to solve it is to ensure that both atuin installs have exactly the same schema version, or something similar. I'm not super familiar with the data format; is there a schema number or something? If so (and if we wanted to proceed with a json-lines version of this), how should it be enforced? Would the json-lines file have a first line like {"atuin_history_dump_schema_version": "1"}, with the import process bailing out if dump-schema-version != current_schema_version? (And if so, where do we keep the history schema version? Is there something like that?)

Let me know your thoughts, and thanks for the response!
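The version-header idea floated above could look roughly like the following Python sketch. The header key is the one proposed in the comment; `CURRENT_SCHEMA_VERSION`, `SchemaMismatch`, and `read_dump` are hypothetical names, and where atuin would actually store a schema version is still the open question.

```python
import io
import json

HEADER_KEY = "atuin_history_dump_schema_version"  # key proposed in the thread
CURRENT_SCHEMA_VERSION = "1"  # hypothetical: the version this importer understands


class SchemaMismatch(Exception):
    """Raised when a dump's header is missing or has a different version."""


def read_dump(src):
    """Check the header line, then return the remaining lines as records."""
    header_line = src.readline()
    if not header_line.strip():
        raise SchemaMismatch("dump is empty or has no header line")
    header = json.loads(header_line)
    version = header.get(HEADER_KEY) if isinstance(header, dict) else None
    if version != CURRENT_SCHEMA_VERSION:
        raise SchemaMismatch(
            f"dump has schema version {version!r}, "
            f"expected {CURRENT_SCHEMA_VERSION!r}"
        )
    # Everything after the header is one history record per line.
    return [json.loads(line) for line in src if line.strip()]


good_dump = io.StringIO(
    '{"atuin_history_dump_schema_version": "1"}\n'
    '{"id": "a1", "command": "ls -la"}\n'
)
imported = read_dump(good_dump)
```

Bailing out on any mismatch is the strictest policy; a more lenient importer could accept older versions and migrate records on the way in, at the cost of maintaining that migration code.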
