Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve management of iterator returned by collect (database table is locked error) #377

Open
3 tasks
shcheklein opened this issue Aug 31, 2024 · 2 comments
Open
3 tasks
Labels
enhancement New feature or request

Comments

@shcheklein
Copy link
Member

shcheklein commented Aug 31, 2024

Action Items

  • We should check why read-only iterator locks the data for other data chains (it fails on creating tables for UDF for example, or removing tables on cleanup - so, on updating system tables on SQLite). Can we make these read-only stuff a non-blocker in the first place
  • We need to make it a context manager (so that it close itself and release database underneath)
  • We can see if we can improve the error message (database is locked -> "iterator is open" or something)

Description

Related to iterative/datachain-examples#15

Simple way to reproduce:

from datachain import DataChain

ds = DataChain.from_values(fib=[1, 1, 2, 3, 5, 8])
itr = ds.collect("fib")
print(next(itr))

ds = DataChain.from_values(fib=[1, 1, 2, 3, 5, 8])
ds.show()

Causes something like (when you run it within a test suite at least, since it is causing DB cleanup, but the same can be imitated by let's say making the second part a bit more complicated - with UDF):

>           result = self.db.execute(*self.compile_to_args(query))
E           sqlite3.OperationalError: database table is locked: sqlite_master

/Users/ivan/Projects/dvcx/src/datachain/data_storage/sqlite.py:199: OperationalError

It's caused (I think) by us using:

with super().select(*db_signals).as_iterable() as rows:
     ...
     yield

It is an open connection / cursor staying underneath. It affects probably also other DataChains since we are reusing the same DB.

Workaround

Obvious workaround - use limit (+ offset if needed) to make sure that it's a single (or whatever number of elements is needed) and / or list(...) or iterate through them.

@shcheklein shcheklein changed the title database is locked if collect is used and not consumed Improve management of iterator returned by collect Aug 31, 2024
@shcheklein shcheklein changed the title Improve management of iterator returned by collect Improve management of iterator returned by collect (database table is locked error) Aug 31, 2024
@shcheklein shcheklein added the enhancement New feature or request label Aug 31, 2024
@mattseddon
Copy link
Member

We should check why read-only iterator locks the data for other data chains (it fails on creating tables for UDF for example, or removing tables on cleanup - so, on updating system tables on SQLite). Can we make these read-only stuff a non-blocker in the first place

It doesn't seem like any SQLite connections are opened in read-only mode. They all take both the read and write locks.

I had a look at introducing a read method in the SQLiteDatabaseEngine that would look something like this:

    @retry_sqlite_locks
    def read(self, query):
        with sqlite3.connect(
            "file::memory:?mode=ro&nolock=1&cache=shared",
            uri=True,
            detect_types=DETECT_TYPES,
        ) as connection:
            return connection.execute(*self.compile_to_args(query))

Unfortunately, that doesn't seem to work (at least running as a test), perhaps because you can't open the in-memory database in read-only mode.

We need to make it a context manager (so that it close itself and release database underneath)

Doesn't this go against the decision that we made earlier to avoid using a session as a context manager?

@shcheklein
Copy link
Member Author

Doesn't this go against the decision that we made earlier to avoid using a session as a context manager?

it's not about session, it's only about this Iterator returned by collect - it should be closable if we can't make DB work in a mode where there is no lock.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

2 participants