Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix for #1491 (error user callback not called sometimes) and #489 (killing rsync not handled gracefully / rsync exit code not evaluated) #1502

Merged
merged 11 commits into from
Aug 17, 2023
Merged
9 changes: 9 additions & 0 deletions CHANGES
Original file line number Diff line number Diff line change
Expand Up @@ -10,6 +10,15 @@ Version 1.3.4-dev (development of upcoming release)
* Fix bug: Unit test fails on some machines due to warning "Ignoring XDG_SESSION_TYPE=wayland on Gnome..." (#1429)
* Fix bug: Generation of config-manpage caused an error with Debian's Lintian (#1398).
* Fix bug: Return empty list in smartRemove (#1392, Debian Bug Report 973760)
* Fix bug: Taking a snapshot reports `rsync` errors now even if no snapshot was taken (#1491)
* Fix bug: takeSnapshot() recognizes errors now by also evaluating the rsync exit code (#489)
Fixes related problem: Killing `rsync` was not handled gracefully (by ignoring the rsync exit code)
* Fix bug: The error user-callback is now always called if an error happened during taking a snapshot (#1491)
* Improvement: Introduce new error codes for the "error" user callback (as part of #1491):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on changelog standards I would name this a "Feature".

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you still found the link to the changelog standards (I have lost oversight here).

I would like to add it at the top of the changelog file keep it "in sight" whenever I change the changelog...

Copy link
Member

@buhtz buhtz Aug 17, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my own projects where I also decided to write markdown changelog I do use

I didn't want to bother you folks with that details. And myself I'm also quit new to handling things like this. So I'm still in experimenting period and try out what fits best to me.

While getting in touch with that standards I also read about auto-generating changelogs from git commit history. But I don't like that approach because the target readers for commit messages and changelog messages are IMHO different. The first is more technical based and the latter should focus on behavior of the application.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

5: Error while taking a snapshot.
6: New snapshot taken but with errors.
* Improvement: The `rsync` exit code is now contained in the snapshot log (part of #489). Example:
[E] Error: 'rsync' ended with exit code -9 (negative values are signal numbers, see 'kill -l')
* Breaking change: Minimal Python version 3.8 required (#1358).
* Removed: Handling and checking of user group "fuse" (#1472).
* Feature: Exclude /swapfile by default (#1053)
Expand Down
13 changes: 9 additions & 4 deletions common/pluginmanager.py
Original file line number Diff line number Diff line change
Expand Up @@ -91,14 +91,19 @@ def error(self, code, message):

Known error codes:

1. No or no valid configuration
1: No or no valid configuration
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume that new codes should be mentioned in the user-callback repo also.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, I forgot to update the README.md there!

I will update this too today

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

(check the configuration file)
2. A backup process is already running.
2: A backup process is already running.
Make sure that automatic and manual backups
do not run at once.
3. Snapshots directory not found
3: Snapshots directory not found
(eg. when a removable drive is not mounted)
4. The requested snapshot for "now" already exists
4: The requested snapshot for "now" already exists.
``message`` contains the SID (snapshot ID) then.
5: Error while taking a snapshot.
``message`` contains more information (as string).
6: New snapshot taken but with errors.
``message`` contains the SID (snapshot ID) then.

message: The error message for the code
(mostly an empty string by default)
Expand Down
103 changes: 79 additions & 24 deletions common/snapshots.py
Original file line number Diff line number Diff line change
Expand Up @@ -129,6 +129,24 @@ def takeSnapshotMessage(self):

#TODO: make own class for takeSnapshotMessage
def setTakeSnapshotMessage(self, type_id, message, timeout = -1):
"""Update the status message of the active snapshot creation job
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome. Never really get what this function does! 🤣


Write the status message into a message file to allow async
processing for the GUI, plug-ins (like user-callback)
and desktop notifications.

Args:
type_id: Simplified severity level of the status message:
1: ERROR
other values: INFO
message: status message string
timeout: Requested maximum processing duration in plug-ins.
Default: -1 (no limit)
Called plug-ins must try to keep the timeout itself
(it is not enforced by the ``PluginManager``.
In fact currently all known plug-ins do
ignore the timeout value!
"""
data = str(type_id) + '\n' + message

try:
Expand Down Expand Up @@ -680,7 +698,7 @@ def backup(self, force = False):

if not self.config.isConfigured():
logger.warning('Not configured', self)
self.config.PLUGIN_MANAGER.error(1) #not configured
self.config.PLUGIN_MANAGER.error(1) # not configured
elif not force and self.config.noSnapshotOnBattery() and tools.onBattery():
self.setTakeSnapshotMessage(0, _('Deferring backup while on battery'))
logger.info('Deferring backup while on battery', self)
Expand All @@ -696,7 +714,7 @@ def backup(self, force = False):
if not instance.check():
logger.warning('A backup is already running. The pid of the \
already running backup is in file %s. Maybe delete it' % instance.pidFile , self )
self.config.PLUGIN_MANAGER.error(2) #a backup is already running
self.config.PLUGIN_MANAGER.error(2) # a backup is already running
elif not restore_instance.check():
logger.warning('Restore is still running. Stop backup until \
restore is done. The pid of the already running restore is in %s. Maybe delete it'\
Expand All @@ -711,10 +729,10 @@ def backup(self, force = False):

now = datetime.datetime.today()

#inhibit suspend/hibernate during snapshot is running
# inhibit suspend/hibernate during snapshot is running
self.config.inhibitCookie = tools.inhibitSuspend(toplevel_xid = self.config.xWindowId)

#mount
# mount
try:
hash_id = mount.Mount(cfg = self.config).mount()
except MountException as ex:
Expand All @@ -733,7 +751,7 @@ def backup(self, force = False):
elif not self.config.PLUGIN_MANAGER.processBegin():
logger.info('A plugin prevented the backup', self)
else:
#take snapshot process begin
# take snapshot process begin
self.setTakeSnapshotMessage(0, '...')
self.snapshotLog.new(now)
profile_id = self.config.currentProfile()
Expand All @@ -757,19 +775,19 @@ def backup(self, force = False):
if counter != 0:
logger.info("Waited %d seconds for target directory to be available", counter)


if not self.config.canBackup(profile_id):
logger.warning('Can\'t find snapshots folder!', self)
self.config.PLUGIN_MANAGER.error(3) #Can't find snapshots directory (is it on a removable drive ?)
self.config.PLUGIN_MANAGER.error(3) # Can't find snapshots directory (is it on a removable drive ?)
else:
ret_error = False
sid = SID(now, self.config)

if sid.exists():
logger.warning("Snapshot path \"%s\" already exists" %sid.path(), self)
self.config.PLUGIN_MANAGER.error(4, sid) #This snapshots already exists
self.config.PLUGIN_MANAGER.error(4, sid) # This snapshot already exists
else:
try:
# TODO rename ret_val to new_snapshot_created and ret_error to has_error for clearer code
ret_val, ret_error = self.takeSnapshot(sid, now, include_folders)
except:
new = NewSnapshot(self.config)
Expand All @@ -783,12 +801,20 @@ def backup(self, force = False):

if ret_error:
logger.error('Failed to take snapshot.', self)
self.setTakeSnapshotMessage(1, _('Failed to take snapshot {snapshot_id}.').format(snapshot_id=sid.displayID))
msg = _('Failed to take snapshot {snapshot_id}.').format(snapshot_id=sid.displayID)
self.setTakeSnapshotMessage(1, msg)
self.config.PLUGIN_MANAGER.error(5, msg) # Fixes #1491

time.sleep(2)
else:
logger.warning("No new snapshot", self)
else:
ret_error = False
else: # new snapshot taken...
if ret_error:
logger.error('New snapshot taken but errors detected', self)
self.config.PLUGIN_MANAGER.error(6, sid.displayID) # Fixes #1491
ret_error = False # Why ignore errors now?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure about this but your arguments sounds fine to me.

# Probably because a new snapshot has been created (= changes transfered)
# and "continue on errors" is enabled

if not ret_error:
self.freeSpace(now)
Expand All @@ -809,7 +835,7 @@ def backup(self, force = False):
if not ret_error:
self.clearTakeSnapshotMessage()

#unmount
# unmount
try:
mount.Mount(cfg = self.config).umount(self.config.current_hash_id)
except MountException as ex:
Expand Down Expand Up @@ -866,9 +892,11 @@ def rsyncCallback(self, line, params):

Args:
line (str): stdout line from rsync
params (list): list of two bool '[error, changes]'. Using siteefect
on changing list items will change original
list, too. If rsync reported an error ``params[0]``
params (list): list of two bool '[error, changes]'.
Uses a side effect by changing list items here
to change the original list of the caller, too
(lists are passed as reference in Python).
If rsync reported an error ``params[0]``
will be set to ``True``. If rsync reported a changed
file ``params[1]`` will be set to ``True``
"""
Expand All @@ -878,13 +906,17 @@ def rsyncCallback(self, line, params):
self.setTakeSnapshotMessage(
0, _('Take snapshot') + " (rsync: %s)" % line)

# Did rsync report an error?
if line.endswith(')'):
if line.startswith('rsync:'):
if not line.startswith('rsync: chgrp ') and not line.startswith('rsync: chown '):
# matches rsync error lines like:
# rsync: [generator] link [...] failed: Invalid cross-device link (18)
params[0] = True
self.setTakeSnapshotMessage(1, 'Error: ' + line)

if len(line) >= 13:
# The prefix is created by rsync via the argument "--out-format=BACKINTIME: %i %n%L"
if line.startswith('BACKINTIME: '):
if line[12] != '.' and line[12:14] != 'cd':
params[1] = True
Expand Down Expand Up @@ -1099,7 +1131,10 @@ def takeSnapshot(self, sid, now, include_folders):

new_snapshot = NewSnapshot(self.config)
encode = self.config.ENCODE
params = [False, False] # [error, changes]
params = [False, False] # [error, changes] # "return" values set during async rsync execution (as user data "by ref")
# TODO docstring of return value for this function swaps the meaning of the elements,
# this is confusing (``ret_val``, ``ret_error``) and error-prone.
# Use a mutable data structure with named elements instead, e.g. a DataClass

if new_snapshot.exists() and new_snapshot.saveToContinue:
logger.info("Found leftover '%s' which can be continued." %new_snapshot.displayID, self)
Expand Down Expand Up @@ -1158,6 +1193,11 @@ def takeSnapshot(self, sid, now, include_folders):
# It should delete the excluded folders then
rsync_prefix.extend(('--delete', '--delete-excluded'))
rsync_prefix.append('-v')
# Use a fixed logging format for the rsync "changed files" list to make it parsable e.g. in rsyncCallback()
# %i = itemized list (11 characters) of what is being updated (see "--itemize-changes" in "man rsync")
# %n = the filename (short form; trailing "/" on dir)
# %L = the string " -> SYMLINK", " => HARDLINK", or "" (where SYMLINK or HARDLINK is a filename)
# (see log format section in "man rsyncd.conf")
rsync_prefix.extend(('-i', '--out-format=BACKINTIME: %i %n%L'))
if prev_sid:
link_dest = encode.path(os.path.join(prev_sid.sid, 'backup'))
Expand All @@ -1176,12 +1216,18 @@ def takeSnapshot(self, sid, now, include_folders):

# run rsync
proc = tools.Execute(cmd,
callback = self.rsyncCallback,
callback = self.rsyncCallback, # TODO interprets the user_data in params as: list of two bool [error, changes] but params is reused as return value of this function with [changes, error]. Use a separate variable to avoid confusion!
user_data = params,
filters = (self.filterRsyncProgress,),
parent = self)
self.snapshotLog.append('[I] ' + proc.printable_cmd, 3)
proc.run()
self.snapshotLog.append('[I] ' + proc.printable_cmd, 3) # TODO introduce centralized log msg builder to avoid spread severity level indicators like "[I]" here?
# TODO Process return value with rsync exit code to recognize errors that cannot be recognized by parsing the rsync output currently
rsync_exit_code = proc.run()
# Fix for #1491 and #489
# Note that the return value (containing the exit code) of the rsync child process
# is not the only way to detect errors (and sometimes not reliably delivers <> 0 in case of an error):
# Errors are also indicated via the pass-by-ref argument user_data="params" list
# (updated by the callback function that parses the rsync output for error message patterns).

# cleanup
try:
Expand All @@ -1192,9 +1238,16 @@ def takeSnapshot(self, sid, now, include_folders):
self)

# handle errors
has_errors = False
has_errors = False # TODO Fix inconsistent usage: Collects return value,
# but errors are also checked via params[0]

if rsync_exit_code != 0: # indicates an error
params[0] = True # HACK to fix #489 (params[0] and has_errors should be merged)
self.setTakeSnapshotMessage(1,
_("Error: 'rsync' ended with exit code {exit_code} (negative values are signal numbers, see 'kill -l')".format(
exit_code=rsync_exit_code)))

# params[0] -> error
# params[0] -> error?
if params[0]:
if not self.config.continueOnErrors():
self.remove(new_snapshot)
Expand All @@ -1203,7 +1256,7 @@ def takeSnapshot(self, sid, now, include_folders):
has_errors = True
new_snapshot.failed = True

# params[1] -> changes
# params[1] -> changes?
if not params[1] and not self.config.takeSnapshotRegardlessOfChanges():
self.remove(new_snapshot)
logger.info("Nothing changed, no new snapshot necessary", self)
Expand All @@ -1212,7 +1265,8 @@ def takeSnapshot(self, sid, now, include_folders):
prev_sid.setLastChecked()
if not has_errors and not list(self.config.anacrontabFiles()):
tools.writeTimeStamp(self.config.anacronSpoolFile())
return [False, False]
return [False, has_errors] # Part of fix for #1491: Returns "has_errors" instead of False now
# to signal rsync errors (which may have prevented processing any changes)

self.backupConfig(new_snapshot)
self.backupPermissions(new_snapshot)
Expand All @@ -1226,6 +1280,7 @@ def takeSnapshot(self, sid, now, include_folders):
logger.debug('Failed to write takeSnapshot log %s into compressed file %s: %s'
%(self.config.takeSnapshotLogFile(), new_snapshot.path(SID.LOG), str(e)),
self)
# TODO How is this error handled? Currently it looks like it is ignored (just logged)!

new_snapshot.saveToContinue = False

Expand All @@ -1248,7 +1303,7 @@ def takeSnapshot(self, sid, now, include_folders):
if not has_errors and not list(self.config.anacrontabFiles()):
tools.writeTimeStamp(self.config.anacronSpoolFile())

#create last_snapshot symlink
# create last_snapshot symlink
self.createLastSnapshotSymlink(sid)

return [True, has_errors]
Expand Down
Loading