Describe the Feature
Many tests rely on test tables (e.g. fct_bookings) in a source schema in the target SQL engine to verify that the generated SQL can run and produce valid results. Currently, a source schema with a unique name is created and populated before the tests run, and dropped when they finish. However, this incurs significant overhead when running a single test with engines other than DuckDB. During development, a single test is often run repeatedly to resolve a bug. In addition, this overhead is present in the test suites that are run in CI.
Since the test tables in the source schema change infrequently, this overhead can be reduced by creating a persistent schema that is reused between testing sessions. By using a hash of the data in the name of the schema, issues with stale data in the schema can be avoided. This also enables more automatic updates when the test data changes without requiring the user to manually drop / update tables.
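As a rough illustration of the naming idea above, here is a minimal sketch that derives a schema name from a hash of the test data. The function name `persistent_schema_name`, the `mf_test_src` prefix, and the in-memory table representation are all assumptions made for the example, not existing project code.

```python
import hashlib
from typing import Mapping, Sequence, Tuple

# Hypothetical representation: table name -> rows, each row a tuple of values.
Row = Tuple[object, ...]


def persistent_schema_name(
    tables: Mapping[str, Sequence[Row]],
    prefix: str = "mf_test_src",  # assumed prefix, chosen for illustration
) -> str:
    """Return a schema name that changes whenever the test data changes."""
    hasher = hashlib.sha256()
    # Sort table names so the result does not depend on dict ordering.
    for table_name in sorted(tables):
        hasher.update(table_name.encode("utf-8"))
        hasher.update(b"\x00")
        for row in tables[table_name]:
            hasher.update(repr(row).encode("utf-8"))
            hasher.update(b"\x00")
    # Truncate the digest to keep the name within typical identifier limits.
    return f"{prefix}_{hasher.hexdigest()[:10]}"


tables_v1 = {"fct_bookings": [(1, "2020-01-01"), (2, "2020-01-02")]}
tables_v2 = {"fct_bookings": [(1, "2020-01-01"), (3, "2020-01-02")]}
name_v1 = persistent_schema_name(tables_v1)
name_v2 = persistent_schema_name(tables_v2)
```

With this scheme, a session that finds a schema named `name_v1` can reuse it as-is, while any edit to the test data yields a different name, so the stale schema is simply never looked up.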
When using a persistent schema, race conditions may occur when multiple concurrent testing sessions create tables in the schema for the first time, since the name of the schema depends only on the hash. After the schema and the test tables are created, concurrency will not be an issue since the tables do not change. There may be other edge cases as well, so the persistent schema will be opt-in behind a flag, and the current create-and-drop behavior, which is robust, will remain the default. An ideal solution to the concurrency problem during initial schema creation / table population needs more investigation.
Would you like to contribute?
Yes.
Anything Else?
N/A