The SQLite3 library locks all database operations to stop concurrent processes from corrupting the database. See http://www.sqlite.org/lockingv3.html. The scheme allows either many concurrent readers or a single writer at a time.
The current architecture of Rhizome as implemented in servald allows more than one process to access the Rhizome database directly, which can produce lock conflicts. These conflicts cause SQLite queries to fail immediately: the sqlite3_step() function returns SQLITE_BUSY.
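To make the failure mode concrete, here is a deliberately simplified model of the rule above (many readers or one writer), in which a conflicting request fails at once instead of waiting. It is not SQLite's implementation: real SQLite moves through SHARED, RESERVED, PENDING and EXCLUSIVE lock states, as the locking page linked above explains. The names db_lock, acquire_read, acquire_write and LOCK_BUSY are hypothetical, with LOCK_BUSY playing the role of SQLITE_BUSY.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical result codes; LOCK_BUSY plays the role of SQLITE_BUSY. */
enum lock_rc { LOCK_OK = 0, LOCK_BUSY = 1 };

/* Simplified shared state: any number of readers, or exactly one writer. */
struct db_lock {
    int readers;  /* processes currently holding a read lock */
    bool writer;  /* true while one process holds the write lock */
};

/* A read is refused immediately (no waiting) while a writer holds the lock. */
enum lock_rc acquire_read(struct db_lock *l)
{
    assert(l != NULL);
    if (l->writer)
        return LOCK_BUSY;
    l->readers++;
    return LOCK_OK;
}

/* A write is refused immediately if anyone else holds any lock. */
enum lock_rc acquire_write(struct db_lock *l)
{
    assert(l != NULL);
    if (l->writer || l->readers > 0)
        return LOCK_BUSY;
    l->writer = true;
    return LOCK_OK;
}

void release_read(struct db_lock *l)
{
    assert(l != NULL && l->readers > 0);
    l->readers--;
}

void release_write(struct db_lock *l)
{
    assert(l != NULL && l->writer);
    l->writer = false;
}
```

In this model, a Batphone thread writing via rhizome add file while another process is reading gets LOCK_BUSY straight away, which mirrors the SQLITE_BUSY failures described above when no busy handler is installed.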
These lock errors immediately affect the sending and receiving of MeshMS messages. An incoming MeshMS message log (Rhizome bundle) causes the Batphone app to start a thread that accesses the Rhizome database directly by invoking the servald command-line operations rhizome list, rhizome extract manifest, rhizome extract file and rhizome add file. Sometimes these operations fail because of a database lock error.
As a side issue (issue #3), these errors are not always reported back to the Java code, which carries on as if the operation had succeeded, so the Batphone app cannot implement a retry scheme.
The main issue is that MeshMS reception (acknowledgement) and sending should simply not be allowed to fail because of database lock errors. The architecture must be changed to deal with database concurrency completely and correctly.
This can be dealt with in three stages:
First, a partial fix reduces the impact of database lock errors and keeps MeshMS reliable enough for demo purposes: issue #2 (Recover from Rhizome database lock errors using sleep-retry) introduces a low-level sleep-retry mechanism into all database accesses, which should avoid the majority of lock errors.
Second, issue #3 (Report Rhizome database errors to command line caller) fixes the reporting of database lock errors back to the callers of the Rhizome command-line operations, so that the Batphone Java code can detect and handle the failure.
Third, and the substance of this issue, the existing servald architecture is changed to fix the problem properly, as described below.
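The first two stages can be sketched together: retry an operation while it reports busy, sleeping between attempts, and if it is still busy after the last attempt, turn the result into a non-zero exit status that the Java caller can check. This illustrates the idea under stated assumptions and is not the code from issues #2 and #3: db_op_fn, DB_BUSY, retry_on_busy, exit_status_for and the exit status values are all hypothetical, and busy_sim_step merely simulates an operation that is locked for its first few calls. In servald the wrapped operation would be a call such as sqlite3_step(), and DB_BUSY would correspond to SQLITE_BUSY.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical result codes standing in for SQLITE_OK, other errors, SQLITE_BUSY. */
enum { DB_OK = 0, DB_ERROR = 1, DB_BUSY = 2 };

typedef int (*db_op_fn)(void *ctx);

/* Sleep for ms milliseconds, resuming the remaining time if a signal interrupts. */
static void sleep_ms(long ms)
{
    struct timespec ts = { .tv_sec = ms / 1000, .tv_nsec = (ms % 1000) * 1000000L };
    while (nanosleep(&ts, &ts) == -1 && errno == EINTR) {
        /* ts now holds the unslept remainder */
    }
}

/* Stage one: call op until it stops reporting DB_BUSY or max_tries is used up.
   Returns the last result, which may still be DB_BUSY. */
int retry_on_busy(db_op_fn op, void *ctx, int max_tries, long delay_ms)
{
    assert(op != NULL && max_tries > 0);
    int rc = DB_BUSY;
    for (int i = 0; i < max_tries; i++) {
        rc = op(ctx);
        if (rc != DB_BUSY)
            break;
        if (i + 1 < max_tries)
            sleep_ms(delay_ms);
    }
    return rc;
}

/* Stage two: never hide a failure from the command-line caller. */
int exit_status_for(int rc)
{
    switch (rc) {
    case DB_OK:   return 0;
    case DB_BUSY: return 2;  /* hypothetical "database locked, try again" status */
    default:      return 1;
    }
}

/* Simulated operation: reports DB_BUSY for its first busy_left calls. */
struct busy_sim { int busy_left; };

int busy_sim_step(void *ctx)
{
    struct busy_sim *s = ctx;
    if (s->busy_left > 0) {
        s->busy_left--;
        return DB_BUSY;
    }
    return DB_OK;
}
```

SQLite also provides a built-in equivalent of the first stage, sqlite3_busy_timeout(), which installs a busy handler that sleeps and retries for up to a given number of milliseconds. Either way, retrying only reduces lock errors and cannot rule them out, which is why the third stage is needed.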
All Rhizome database operations should be performed by a single Rhizome server process, created as a fork(2) of the servald process.
The Rhizome server will present a simple request-response interface to all other components of the Serval Mesh product, and all Rhizome database operations will be performed exclusively by that process, thus eliminating the risk of database lock errors under normal circumstances.
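A minimal sketch of this shape, assuming a Unix-domain socketpair(2) as the channel and a fixed-size binary message format (both are assumptions, since no transport or encoding is specified here). The parent stands in for servald, the forked child is the Rhizome server, and a counter stands in for the database. Because only the child ever touches that state, and it handles requests one at a time, there is nothing for a second process to lock against. All names (rhizome_request, rhizome_server_start, rhizome_call, the RHIZOME_OP_* codes) are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical operation codes and fixed-size messages. */
enum rhizome_op { RHIZOME_OP_ADD = 1, RHIZOME_OP_LIST = 2, RHIZOME_OP_QUIT = 3 };
struct rhizome_request  { int32_t op; char arg[64]; };  /* arg: e.g. a path; unused here */
struct rhizome_response { int32_t status; char result[64]; };

/* Transfer exactly len bytes, retrying short transfers and EINTR. */
static int write_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0 && errno == EINTR) continue;
        if (n <= 0) return -1;
        p += n; len -= (size_t)n;
    }
    return 0;
}

static int read_all(int fd, void *buf, size_t len)
{
    char *p = buf;
    while (len > 0) {
        ssize_t n = read(fd, p, len);
        if (n < 0 && errno == EINTR) continue;
        if (n <= 0) return -1;  /* error, or EOF before a full message */
        p += n; len -= (size_t)n;
    }
    return 0;
}

/* Runs only in the forked child: the sole owner of the (simulated) database. */
static void rhizome_server_loop(int fd)
{
    int bundles = 0;
    struct rhizome_request req;
    while (read_all(fd, &req, sizeof req) == 0) {
        struct rhizome_response resp = { 0, "" };
        switch (req.op) {
        case RHIZOME_OP_ADD:
            bundles++;
            snprintf(resp.result, sizeof resp.result, "added %d", bundles);
            break;
        case RHIZOME_OP_LIST:
            snprintf(resp.result, sizeof resp.result, "count %d", bundles);
            break;
        case RHIZOME_OP_QUIT:
            write_all(fd, &resp, sizeof resp);
            return;
        default:
            resp.status = -1;
            break;
        }
        if (write_all(fd, &resp, sizeof resp) != 0) return;
    }
}

/* fork(2) the server; the parent keeps one end of the socketpair. */
pid_t rhizome_server_start(int *client_fd)
{
    int sv[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0) return -1;
    pid_t pid = fork();
    if (pid < 0) { close(sv[0]); close(sv[1]); return -1; }
    if (pid == 0) {             /* child: Rhizome server */
        close(sv[0]);
        rhizome_server_loop(sv[1]);
        close(sv[1]);
        _exit(0);
    }
    close(sv[1]);               /* parent: servald */
    *client_fd = sv[0];
    return pid;
}

/* Blocking request-response round trip. */
int rhizome_call(int fd, const struct rhizome_request *req, struct rhizome_response *resp)
{
    assert(req != NULL && resp != NULL);
    if (write_all(fd, req, sizeof *req) != 0) return -1;
    return read_all(fd, resp, sizeof *resp);
}
```

rhizome_call() blocks until the reply arrives, which would suit a command-line rhizome operation; servald itself would instead wait through its asynchronous I/O loop, so that a slow server never delays its low-latency services.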
The Rhizome server can safely have very high latency if needed, and this will not affect the low-latency services offered by servald. If servald wishes to perform a Rhizome store operation, for example storing a payload that was just received via HTTP, it puts the data into request parameters (and optionally into files in external storage) and sends a request to the Rhizome server, using its asynchronous I/O mechanism to wait for the server to accept and then complete the request. All command-line rhizome operations will do the same.
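The non-blocking wait described above might look like the following one-pass event-loop step, which uses poll(2) to watch a descriptor carrying mesh traffic and the descriptor connected to the Rhizome server at the same time, so a slow Rhizome reply never stalls packet handling. The function names and the two-descriptor setup are assumptions for illustration; servald's own I/O scheduler is not modelled here.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <stddef.h>
#include <unistd.h>

/*
 * One pass of a hypothetical servald event loop: wait at most timeout_ms
 * for either mesh input (mesh_fd) or a reply from the Rhizome server
 * (rhizome_fd). Sets each *_ready flag to 1 or 0 and returns the number
 * of descriptors with events, or -1 on error (flags left untouched).
 */
int poll_step(int mesh_fd, int rhizome_fd, int timeout_ms,
              int *mesh_ready, int *rhizome_ready)
{
    assert(mesh_ready != NULL && rhizome_ready != NULL);
    struct pollfd fds[2] = {
        { .fd = mesh_fd,    .events = POLLIN },
        { .fd = rhizome_fd, .events = POLLIN },
    };
    int n;
    do {
        n = poll(fds, 2, timeout_ms);
    } while (n < 0 && errno == EINTR);
    if (n < 0)
        return -1;
    /* POLLHUP also counts as ready: a read() will then return 0 (EOF). */
    *mesh_ready    = (fds[0].revents & (POLLIN | POLLHUP)) != 0;
    *rhizome_ready = (fds[1].revents & (POLLIN | POLLHUP)) != 0;
    return n;
}

/* Read the available part of a reply; call only after poll_step() reported
   rhizome_fd ready, so this read() does not block. */
ssize_t read_reply(int rhizome_fd, char *buf, size_t len)
{
    ssize_t n;
    do {
        n = read(rhizome_fd, buf, len);
    } while (n < 0 && errno == EINTR);
    return n;
}
```

With a timeout of 0 the step returns immediately, so servald could interleave it with its other work; with a positive timeout it sleeps until either kind of event arrives.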