Reduce time to run tests by using a persistent source schema #482

Closed
plypaul opened this issue Apr 28, 2023 · 0 comments
Labels: backlog, enhancement

Comments

@plypaul
Contributor

plypaul commented Apr 28, 2023

Describe the Feature
Many tests rely on test tables (e.g. fct_bookings) in a source schema in the target SQL engine to verify that the generated SQL runs and produces valid results. Currently, a source schema with a unique name is created and populated before the tests run and dropped once they finish. This setup adds significant overhead when running even a single test against engines other than DuckDB. During development, a single test is often run repeatedly while fixing a bug. The same overhead also applies to the test suites that run in CI.

Since the test tables in the source schema change infrequently, this overhead can be reduced by creating a persistent schema that is reused across test sessions. Including a hash of the test data in the schema name avoids problems with stale data: when the test data changes, the hash (and therefore the schema name) changes too, so a fresh schema is created automatically without the user having to manually drop or update tables.
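
A minimal sketch of deriving such a name, assuming the test data is available in memory as a mapping of table name to rows. The function name `persistent_schema_name`, the `mf_test_` prefix, and the 12-character hash length are hypothetical choices for illustration, not existing project code.

```python
import hashlib
import json
from typing import Dict, List, Sequence

# Hypothetical shape of the test data: table name -> list of rows.
TestData = Dict[str, List[Sequence[object]]]


def persistent_schema_name(test_data: TestData, prefix: str = "mf_test_", hash_length: int = 12) -> str:
    """Derive a schema name that changes whenever the test data changes (illustrative only)."""
    # Serialize deterministically: sort_keys orders the table names, rows keep their given order.
    canonical = json.dumps(
        {table: [list(row) for row in rows] for table, rows in test_data.items()},
        sort_keys=True,
        default=str,  # values json can't encode natively (dates, decimals) become strings
    )
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    # Truncate the hex digest so the name stays within typical identifier length limits.
    return prefix + digest[:hash_length]
```

Identical data always maps to the same name, so later sessions can find and reuse the schema, while any edit to a row yields a different name and leaves the old schema untouched.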

When using a persistent schema, race conditions may occur if multiple concurrent test sessions try to create and populate the schema for the first time, since the schema name depends only on the hash. Once the schema and its test tables exist, concurrency is no longer an issue because the tables do not change. There may be other edge cases as well, so the persistent schema will be opt-in behind a flag, keeping the current, robust behavior as the default. An ideal solution for concurrent initial schema creation and table population needs more investigation.
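
A rough sketch of how the flag could switch between the two behaviors, assuming a hash-derived schema name is computed elsewhere as described above. `FakeEngine`, `source_schema`, `populate`, and `use_persistent_schema` are all hypothetical names; `FakeEngine` only records schema operations instead of talking to a real SQL engine. The check-then-create step deliberately shows the unresolved race rather than solving it.

```python
import uuid
from contextlib import contextmanager
from typing import Callable, Iterator, List, Set


class FakeEngine:
    """Hypothetical stand-in for a SQL engine client; tracks schemas in memory."""

    def __init__(self) -> None:
        self.schemas: Set[str] = set()
        self.statements: List[str] = []

    def schema_exists(self, name: str) -> bool:
        return name in self.schemas

    def create_schema(self, name: str) -> None:
        self.statements.append(f"CREATE SCHEMA {name}")
        self.schemas.add(name)

    def drop_schema(self, name: str) -> None:
        self.statements.append(f"DROP SCHEMA {name} CASCADE")
        self.schemas.discard(name)


@contextmanager
def source_schema(
    engine: FakeEngine,
    hashed_name: str,
    populate: Callable[[str], None],
    use_persistent_schema: bool = False,
) -> Iterator[str]:
    if not use_persistent_schema:
        # Default behavior: a unique schema per session, created, populated, then dropped.
        name = f"mf_test_{uuid.uuid4().hex[:12]}"
        engine.create_schema(name)
        try:
            populate(name)
            yield name
        finally:
            engine.drop_schema(name)
        return
    # Opt-in behavior: reuse the hash-named schema if an earlier session created it.
    # NOTE: check-then-create is not atomic. Two sessions starting at the same time can
    # both see the schema as missing and race to populate it (the open question above).
    if not engine.schema_exists(hashed_name):
        engine.create_schema(hashed_name)
        populate(hashed_name)
    yield hashed_name  # never dropped, so the next session can reuse it
```

With the flag enabled, only the first session pays the population cost; subsequent sessions see the existing schema and skip straight to running tests.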

Would you like to contribute?
Yes.

Anything Else?
N/A

@plypaul added the enhancement and triage labels on Apr 28, 2023
@Jstein77 added the backlog label and removed the triage label on Aug 30, 2023
@tlento closed this as completed on Sep 8, 2023