How can we explain the approx 550MB additional backup size on each incremental backup with only 1-10MB of new generated data #1888

Open
bjce opened this issue Sep 27, 2024 · 4 comments

Comments

@bjce

bjce commented Sep 27, 2024

Hello

Many thanks to the maintainers of BIT, which is a great piece of software that I use heavily.

This is most probably not a bug but rather a humble question about the mechanism behind the following phenomenon:

I am using BIT with an hourly schedule. Most of my work consists of generating text for coding, documentation and emails. Using the "last log" window in BIT, I ran du -sh on all the files that were backed up, for several backups, and found that I generate approximately 1-10MB each hour ("space01"). However, if I cd into the directory where all the backups are stored and list the directories with their sizes, I get this:

51M bckp_from_lenovo_to_external_HDD
700GB 20240927-130001-322
550MB 20240927-140001-580
560MB 20240927-150001-703
557MB 20240927-160001-600
551MB 20240927-170001-161
0 last_snapshot

I observe the following pattern: the first backup is obviously much larger because it has to back up everything, and the following incremental backups are much smaller, in the range 550-560MB ("space02").

Here is my question: how can I explain that space01 is approximately 55 to 550 times smaller than space02?

Again, this is just a question about understanding the inner workings, but I don't know where else I could find the answer.

Many thanks!

@buhtz
Member

buhtz commented Sep 27, 2024

I am sure there is a good reason.

Check your "last log". You will find lines like this

[C] <f..t...... home/user/goodCloud/.sync_ceefb93c0308.db-shm
[C] <f.st...... home/user/goodCloud/.sync_ceefb93c0308.db-wal
[C] cf...p..... home/user/goodCloud/link_to_home.sh
[C] <f+++++++++ home/user/goodCloud/_transfer/25-09-24 54083-985515-4-CE_FINAL.docx

The characters after the [C] (11 of them in these examples) tell you why rsync decided to transfer that file. See this FAQ entry for more details.
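To make the flag string easier to read, here is a minimal Python sketch that decodes one such log line. It is not part of BIT; the function name `explain` and the returned dictionary layout are made up for illustration. It assumes the flags follow rsync's `--itemize-changes` layout `YXcstpoguax` and that the flag field itself contains no spaces.

```python
# Attribute slots of rsync's itemized output "YXcstpoguax", in order.
ATTRIBUTES = [
    "checksum", "size", "modification time", "permissions", "owner",
    "group", "access/creation time", "ACL", "extended attributes",
]
UPDATE_TYPES = {
    "<": "sent", ">": "received", "c": "local change/creation",
    "h": "hard link", ".": "not updated", "*": "message",
}
FILE_TYPES = {
    "f": "file", "d": "directory", "L": "symlink",
    "D": "device", "S": "special file",
}


def explain(log_line):
    """Decode one '[C] <flags> <path>' line (hypothetical helper)."""
    # Split only twice so that spaces inside the path are preserved.
    _tag, flags, path = log_line.rstrip("\n").split(None, 2)
    attr_flags = flags[2:].ljust(len(ATTRIBUTES), ".")
    return {
        "update": UPDATE_TYPES.get(flags[0], "unknown"),
        "kind": FILE_TYPES.get(flags[1], "unknown"),
        # A run of '+' means the item is newly created in the snapshot.
        "new": set(flags[2:]) == {"+"},
        # Any other non-dot character marks an attribute that differs.
        "changed": [name for name, ch in zip(ATTRIBUTES, attr_flags)
                    if ch not in ".+ "],
        "path": path,
    }


if __name__ == "__main__":
    print(explain("[C] <f.st...... home/user/goodCloud/.sync_ceefb93c0308.db-wal"))
```

For the `.db-wal` line this reports a sent regular file whose size and modification time differ.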

@bjce
Author

bjce commented Sep 28, 2024

Many thanks for your answer and the link to the explanation about rsync logs. @buhtz
Taking your example again:

[C] <f..t...... home/user/goodCloud/.sync_ceefb93c0308.db-shm
[C] <f.st...... home/user/goodCloud/.sync_ceefb93c0308.db-wal
[C] cf...p..... home/user/goodCloud/link_to_home.sh
[C] <f+++++++++ home/user/goodCloud/_transfer/25-09-24 54083-985515-4-CE_FINAL.docx

For example

[C] <f.st...... home/user/goodCloud/.sync_ceefb93c0308.db-wal

both the file size and the time stamp are different: the file already existed and has been transferred to the remote host (sent to the external hard drive, for example).

[C] <f+++++++++ home/user/goodCloud/_transfer/25-09-24 54083-985515-4-CE_FINAL.docx

is a file that has been newly created and sent to the remote host (sent to the external hard drive, for example).

But what I did then was to run the following script:

du -sh /home/user/goodCloud/.sync_ceefb93c0308.db-shm
du -sh /home/user/goodCloud/.sync_ceefb93c0308.db-wal
du -sh /home/user/goodCloud/link_to_home.sh
du -sh "/home/user/goodCloud/_transfer/25-09-24 54083-985515-4-CE_FINAL.docx"

And this amounts to 10MB max. This does not take into account whether a file was replaced because of a time stamp or file size change, or whether it was newly created. But the scenario in which all files were newly created and transferred should be the one that uses the most space: 10MB max. If some files were just updated, it should take less space (or there is something I am missing?). Unfortunately, I still do not understand the discrepancy between the 10MB total of the files and the size of the snapshot (approx. 550MB). Do you have any other ideas about how to explain this difference? Many thanks again for your time.
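The manual du check can be automated. The following is a rough Python sketch, not a BIT feature: `transferred_bytes` is a hypothetical helper that reads "[C]" lines from a saved copy of the log and sums the current sizes of the regular files that rsync reported as transferred. It assumes the log paths are relative to the source root (`/` by default) and that the files have not changed since the backup ran.

```python
import os


def transferred_bytes(log_lines, root="/"):
    """Sum the sizes of files marked as transferred in '[C]' log lines."""
    total = 0
    for line in log_lines:
        # Split only twice so that spaces inside the path are preserved.
        parts = line.rstrip("\n").split(None, 2)
        if len(parts) != 3 or parts[0] != "[C]":
            continue  # not a change line
        flags, rel_path = parts[1], parts[2]
        # Only regular files whose data was sent or received count here;
        # e.g. 'cf...p.....' is an attribute-only change.
        if flags[:2] not in ("<f", ">f"):
            continue
        path = os.path.join(root, rel_path)
        if os.path.isfile(path):
            total += os.path.getsize(path)
    return total


if __name__ == "__main__":
    import sys
    # Usage (assumed): python sum_log.py saved_last_log.txt
    with open(sys.argv[1], encoding="utf-8", errors="replace") as log:
        print(transferred_bytes(log), "bytes")
```

Unlike the unquoted du call, this handles file names that contain spaces.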

@daveTheOldCoder

I use a command like this to show the actual size of the snapshots:
du -chd0 /media/mount_point/backintime/host/user/profile/*
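When du is given several snapshot directories in one call, it counts a hardlinked file only the first time it meets it. The same idea can be sketched in Python by remembering inode numbers while walking the snapshots in order. This is only an illustration: `snapshot_sizes` and its `apparent_size` switch are hypothetical names, and the numbers will not match du exactly.

```python
import os


def snapshot_sizes(snapshot_dirs, apparent_size=False):
    """Return {snapshot: {"unique": bytes, "shared": bytes}}.

    "unique" counts inodes seen for the first time in this snapshot
    (directories included); "shared" counts entries whose inode was
    already counted earlier, i.e. hardlinks to existing data.
    """
    seen = set()
    result = {}
    for snap in snapshot_dirs:
        unique = shared = 0
        for root, dirs, files in os.walk(snap):
            paths = [root] + [os.path.join(root, name) for name in files]
            # os.walk does not descend into symlinked directories,
            # so count those links themselves here.
            paths += [os.path.join(root, d) for d in dirs
                      if os.path.islink(os.path.join(root, d))]
            for path in paths:
                st = os.lstat(path)
                # st_blocks is in 512-byte units (disk usage, like du).
                size = st.st_size if apparent_size else st.st_blocks * 512
                key = (st.st_dev, st.st_ino)
                if key in seen:
                    shared += size
                else:
                    seen.add(key)
                    unique += size
        result[snap] = {"unique": unique, "shared": shared}
    return result


if __name__ == "__main__":
    import sys
    for name, sizes in snapshot_sizes(sorted(sys.argv[1:])).items():
        print(name, sizes)
```

Directories cannot be hardlinked, so every snapshot gets its own copy of the directory tree, and that is counted under "unique" even when no file changed.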

@buhtz
Member

buhtz commented Sep 29, 2024

Hello bjce,

Please try to avoid using @ mentions unless absolutely necessary.
@ mentions trigger notification emails, which create unnecessary
noise and distract from the issue or pull request itself. As one of
the maintainers, I am notified of all activity in the repository
without the need for mentions. Thank you for understanding.

I am sorry, but I still don't have enough information to answer your question. I also don't understand all your sentences.

If some files were just updated, it should take less space (or there is something I am missing?).

No, rsync either transfers the whole file or just creates a hardlink. There is nothing in between. Even if you modify just a single byte in a 200 MB file, that file is treated as "modified" and will be transferred in full to the snapshot you take.
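The "whole file or hardlink" rule can be illustrated with a small Python sketch. This is not BIT's or rsync's actual code; `unchanged` and `snapshot_file` are hypothetical helpers that imitate the decision for a single file, assuming a quick check on size and modification time (whole seconds) against the previous snapshot.

```python
import os
import shutil


def unchanged(src, prev):
    """Quick check: same size and same mtime (to the second)."""
    if not os.path.exists(prev):
        return False
    a, b = os.stat(src), os.stat(prev)
    return a.st_size == b.st_size and int(a.st_mtime) == int(b.st_mtime)


def snapshot_file(src, prev, dest):
    """Store src at dest and return the bytes of new file data written.

    prev is the same file in the previous snapshot (it may not exist).
    """
    if unchanged(src, prev):
        # No new data: dest becomes another name for the old inode.
        os.link(prev, dest)
        return 0
    # Any difference, even a single byte, costs a full copy.
    shutil.copy2(src, dest)  # copy2 also preserves the mtime
    return os.path.getsize(dest)
```

With this logic, changing one byte in a 200 MB file stores another 200 MB in the next snapshot, while an untouched file stores no new file data at all.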

Best,
Christian
