Schema Viewer Drawer #3291

Merged
merged 21 commits on Mar 13, 2019

Conversation

@emtwo emtwo commented Jan 16, 2019

This is a fresh PR with the code from #2990 rebased and linted.

It is ready for review now. This PR is the first of a series of PRs for schema enhancements. I will link the subsequent PRs here as they become available.

[1] Schema viewer drawer #3291 (this one)
[2] Schema admin configuration #3292
[3] Schema query samples #3293
[4] Data source descriptions #3401


emtwo commented Jan 16, 2019

Note: schema updates now happen through a periodic celery task that runs the queries to get column names, types, etc. The results are stored in the new schema tables, and whenever the schema is fetched from the UI it just queries the data in these tables directly.

Since the schema is set to refresh only every 30 minutes (https://github.com/getredash/redash/blob/master/redash/settings/__init__.py#L48), that is likely why the percy/redash visual error shows up.

We can either increase the frequency of schema updates (the quicker option, but not as good) or do a one-off schema refresh on init so that the schema is available. I'll look into the latter.
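
For illustration, the refresh flow described above looks roughly like the sketch below. This is not the PR's actual code: the task routing, the 30-minute schedule, and every helper name (list_data_sources, describe_tables, upsert_table_metadata) are assumptions made for the example.

# Illustrative sketch only; helper names are placeholders, not Redash's API.
from celery import Celery
from celery.schedules import crontab

app = Celery("worker", broker="redis://redis:6379/0")

# Route the task to a dedicated "schemas" queue; the worker's QUEUES
# environment variable has to include it (which is what the docker-compose
# changes discussed further down are about).
app.conf.task_routes = {"refresh_schemas": {"queue": "schemas"}}


@app.on_after_configure.connect
def setup_periodic_tasks(sender, **kwargs):
    # Re-run the refresh every 30 minutes, matching the interval mentioned above.
    sender.add_periodic_task(crontab(minute="*/30"), refresh_schemas.s())


@app.task(name="refresh_schemas")
def refresh_schemas():
    # For each data source, run the introspection queries and upsert the results
    # into the table/column metadata tables; the UI then reads those rows instead
    # of hitting the data source on every schema fetch.
    for data_source in list_data_sources():              # placeholder helper
        for table in data_source.describe_tables():      # placeholder helper
            upsert_table_metadata(data_source.id, table)  # placeholder helper


def list_data_sources():
    return []  # placeholder: would load rows from the data_sources table


def upsert_table_metadata(data_source_id, table):
    pass  # placeholder: would insert/update TableMetadata and ColumnMetadata rows

Whether the task ever runs also depends on a worker actually consuming the schemas queue, which is what the docker-compose QUEUES discussion below is about.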

@emtwo emtwo requested a review from arikfr January 16, 2019 17:46
cy.login();
cy.request({
Member

@gabrieldutra gabrieldutra Jan 21, 2019


I've created db-seed.js with this purpose in mind 🤔, so this kind of dependency could be set up by running npm run cypress db-seed prior to all tests, and this block wouldn't need to be repeated among them:

// create_query_spec.js - a few upper lines that were not shown
const pg = {
  name: 'test',
  options: {
    dbname: 'postgres',
    host: 'postgres',
    password: 'postgres',
    user: 'postgres',
  },
  type: 'pg',
};

LMK what you think haha

PS: if you are just testing, ignore this 😅

Author

@emtwo emtwo Jan 21, 2019


@gabrieldutra I was, in fact, just testing. Though I could use some help: I cannot reproduce locally the Percy issue that shows up here. When I run the create_query_spec.js test on master locally, the DOM snapshots seem to be missing the schema data (screenshot included below), while the snapshot for this PR seems to show the schema locally (screenshot also included below).

Any idea what might be going on here or how I can reproduce this?

Screenshot from master
screen shot 2019-01-21 at 5 18 28 pm

Screenshot from this PR:
screen shot 2019-01-21 at 5 25 07 pm

Member

@gabrieldutra gabrieldutra Jan 21, 2019


I have faced some issues when making frontend changes and running Cypress in development mode; the frontend code doesn't seem to be shared with the Docker container. I'll look into this further, but a quick fix to make it work properly is: after starting the Cypress server just like you did, run npm run start for the webpack development server and open Cypress with CYPRESS_baseUrl=http://localhost:8080 npm run cypress open

Member

@gabrieldutra gabrieldutra Jan 21, 2019


Also, I don't know if it's related, but I noticed the Chinook data source is not showing schema info in the preview.

I'll try to reproduce this locally and give you some help with Percy anyway

Edit: the Chinook issue is probably related to the missing schema queue in one of the files (docker-compose.production.yml perhaps)

Member


I have faced some issues when making frontend changes and running Cypress in development mode; the frontend code doesn't seem to be shared with the Docker container.

Does the Cypress Docker Compose configuration use VOLUMEs?

Member

@gabrieldutra gabrieldutra Jan 22, 2019


Cypress uses docker-compose.cypress.yml when in CI and the development docker-compose.yml when not.
Edit: Forgot to mention the volumes haha: the first one doesn't use them and the second one does.

However, it uses http://localhost:5000, which I guess doesn't use webpack to watch files, so the frontend in this case only updates after a rebuild. The two options I see to make it friendlier to the developer would be either adding npm run start to a frontend container in docker-compose.yml or adding it outside Docker in the cypress scripts.

Member


I think we should add an instruction to run npm run build before running the Cypress tests. Running npm in the container is not possible, because the container will not have Node (currently it does, but that's temporary).

@@ -29,7 +29,7 @@ services:
       REDASH_LOG_LEVEL: "INFO"
       REDASH_REDIS_URL: "redis://redis:6379/0"
       REDASH_DATABASE_URL: "postgresql://postgres@postgres/postgres"
-      QUEUES: "queries,scheduled_queries,celery"
+      QUEUES: "queries,scheduled_queries,celery,schemas"
Member

@gabrieldutra gabrieldutra Jan 21, 2019


Got it haha, there are other docker-compose.yml files inside .circleci, just add this to them and Percy should do fine 🚀

Edit: Only docker-compose.cypress.yml affects the Percy screenshots
Edit2: There are probably other files where this may be necessary:
queues

Author


Awesome! Updating docker-compose.cypress.yml did the trick! I didn't realize Cypress had its own yml file. Thank you for your help @gabrieldutra!

Member


You're welcome @emtwo! 🙂

@emtwo emtwo force-pushed the emtwo/schema1_info_drawer branch 2 times, most recently from e6be093 to 3c4e8c8 on January 22, 2019 15:30

emtwo commented Jan 22, 2019

I've rebased the PR again and the original Percy issue is fixed. Note that Percy is currently failing for an expected reason: two new tables, column_metadata and table_metadata, now show up in the schema viewer.

@gabrieldutra
Member

I've rebased the PR again and the original Percy issue is fixed. Note that Percy is currently failing for an expected reason: two new tables, column_metadata and table_metadata, now show up in the schema viewer.

Don't forget to add the schema queue in the other files (such as docker-compose.production.yml) as this could cause some bugs in the future 😁


emtwo commented Jan 22, 2019

I've rebased the PR again and the original Percy issue is fixed. Note that Percy is currently failing for an expected reason: two new tables, column_metadata and table_metadata, now show up in the schema viewer.

Don't forget to add the schema queue in the other files (such as docker-compose.production.yml) as this could cause some bugs in the future 😁

I've added the schemas queue in a couple of other spots as you suggested. I was hesitant at first, since the schemas queue was already used in the code prior to this PR but did not appear in any of the Docker files, and without knowing the use cases of all the Docker files it's hard to tell where it's required. Still, I'm sure it doesn't hurt to add it. Perhaps @arikfr has more insight into this.


arikfr commented Jan 23, 2019

I will do a review of all the Docker Compose files and add schemas where needed.

I do realize now that everyone who is using the AMIs we build uses a Docker Compose setup without this queue. Which means that: 1) this queue is growing in size, but nothing is processing it; 2) they don't get schema refreshes. 🤦‍♂️


arikfr commented Jan 23, 2019

Change of plans: #3325.

@@ -198,21 +198,25 @@ def delete(self):
return res

def get_schema(self, refresh=False):
key = "data_source:schema:{}".format(self.id)

Is removing this redis caching of schema information intentional? Is there a performance impact?

Author


Thanks for pointing this out @washort!

It was intentional: from what I recall of the Berlin work week, @arikfr felt that using Redis to store the schema was a bit of a hack and he would prefer it stored in a table. Of course, we could store the data in tables and keep additional caching on top for performance, but I felt the added complexity of maintaining both a cache and tables for the same data was perhaps not worth the performance gain.

I did a quick test on my machine: over 5 runs of the old vs. the new get_schema() function, the Redis one averages 7.2ms per call and the one from this PR averages 44ms per call. It's a big relative difference, but 44ms isn't so bad, though it could of course be worse in other scenarios, e.g. a slower network/machine or more data. I suppose I will defer this decision to @arikfr.


Great, just curious.
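
As a rough illustration of the quick comparison emtwo describes above, here is a minimal timing sketch. The two get_schema variants are placeholders standing in for the old Redis-backed and the new table-backed implementations; none of this is the code that was actually benchmarked.

# Average wall-clock time of a handful of get_schema() calls.
import time


def average_call_time_ms(get_schema, runs=5):
    """Return the average duration (ms) of `runs` calls to get_schema()."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        get_schema()
        durations.append((time.perf_counter() - start) * 1000)
    return sum(durations) / len(durations)


def get_schema_from_redis_cache():
    return []  # placeholder for the old implementation (cached JSON blob in Redis)


def get_schema_from_metadata_tables():
    return []  # placeholder for the new implementation (reads the metadata tables)


if __name__ == "__main__":
    print("redis cache:     %.1f ms" % average_call_time_ms(get_schema_from_redis_cache))
    print("metadata tables: %.1f ms" % average_call_time_ms(get_schema_from_metadata_tables))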

@@ -67,6 +67,56 @@ def get(self, query_id):
scheduled_queries_executions = ScheduledQueriesExecutions()


@python_2_unicode_compatible
class TableMetadata(db.Model):

you'll need a migration to create these tables

Author


Ah! I had missed this, thank you!
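
For readers following along, a hedged sketch of the kind of Alembic migration the reviewer is asking for. The revision identifiers, column names, and foreign keys below are assumptions made for illustration (loosely based on the model names visible in the diff), not the migration that was actually added to the PR.

"""Add table_metadata and column_metadata tables (illustrative sketch only)."""
import sqlalchemy as sa
from alembic import op

# Revision identifiers: placeholders, not the real migration's IDs.
revision = "000000000000"
down_revision = None
branch_labels = None
depends_on = None


def upgrade():
    op.create_table(
        "table_metadata",
        sa.Column("id", sa.Integer, primary_key=True),
        sa.Column(
            "data_source_id",
            sa.Integer,
            sa.ForeignKey("data_sources.id", ondelete="CASCADE"),
            nullable=False,
        ),
        sa.Column("name", sa.String(length=255), nullable=False),
        sa.Column("exists", sa.Boolean, nullable=False, default=True),
        sa.Column("updated_at", sa.DateTime(timezone=True)),
    )
    op.create_table(
        "column_metadata",
        sa.Column("id", sa.Integer, primary_key=True),
        sa.Column(
            "table_id",
            sa.Integer,
            sa.ForeignKey("table_metadata.id", ondelete="CASCADE"),
            nullable=False,
        ),
        sa.Column("name", sa.String(length=255), nullable=False),
        sa.Column("type", sa.String(length=255)),
        sa.Column("example", sa.String(length=4000)),
    )


def downgrade():
    op.drop_table("column_metadata")
    op.drop_table("table_metadata")

The ondelete="CASCADE" settings here echo the "Cascade delete" item in the squashed commit message further down.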

@ghost ghost added the in progress label Jan 24, 2019

-    for row in results['rows']:
+    for i, row in enumerate(results['rows']):

Similar blocks of code found in 2 locations. Consider refactoring.

-    for row in results['rows']:
+    table_samples = {}
+
+    for i, row in enumerate(results['rows']):

Similar blocks of code found in 2 locations. Consider refactoring.

persisted_table = models.db.session.query(
TableMetadata).filter(
TableMetadata.table_name==table_name).filter(
TableMetadata.data_source_id==ds.id).first()

Missing whitespace around operator

redash/models/__init__.py (resolved)
jezdez pushed a commit to mozilla/redash that referenced this pull request Aug 19, 2019
* Process extra column metadata for a few sql-based data sources.

* Add Table and Column metadata tables.

* Periodically update table and column schema tables in a celery task.

* Fetching schema returns data from table and column metadata tables.

* Add tests for backend changes.

* Front-end shows extra table metadata and uses new schema response.

* Delete datasource schema data when deleting a data source.

* Process and store data source schema when a data source is first created or after a migration.

* Tables should have a unique name per datasource.

* Addressing review comments.

* Update migration file for mixins.

* Appease PEP8

* Upgrade migration file for rebase.

* Cascade delete.

* Adding org_id

* Remove redundant column and table prefixes.

* Non-existing tables and columns should be filtered out on the server side not client side.

* Fetching table samples should be optional and should happen in a separate task per table.

* Allow users to force a schema refresh.

* Use updated_at to help prune old schema metadata periodically.

* Using settings.SCHEMAS_REFRESH_QUEUE

* fix for getredash#2426 test

* more stable test_interactive_new

* Closes #927, #928: Schema refresh improvements.

* Closes #934, #935: Remove type from schema browser and don't show empty example column in schema drawer (#936)

* Speed up schema fetch requests with fewer postgres queries.

* Add column metadata to Athena glue processing.

* Fix bug assuming 'metadata' exists for every table.

* Closes #939: Persisted, existing table metadata should be updated.

* Sample processing should be rate-limited.

* Add cli command for refreshing data samples.

* Schema refreshes should not overwrite column 'example' field.

* refresh_samples() should filter tables_to_sample on the datasource's id being sampled

* Correctly wrap long text in schema drawer.

Co-authored-by: Alison <github@bankofknowledge.net>

Schema Improvements Part 2: Add data source config options.

Adding BigQuery schema drawer with data types and samples.
washort pushed a commit to mozilla/redash that referenced this pull request Sep 16, 2019
washort pushed a commit to mozilla/redash that referenced this pull request Sep 17, 2019
emtwo pushed a commit to mozilla/redash that referenced this pull request Nov 5, 2019
jezdez pushed a commit to mozilla/redash that referenced this pull request Jan 22, 2020
jezdez pushed a commit to mozilla/redash that referenced this pull request Feb 5, 2020
jezdez added a commit to mozilla/redash that referenced this pull request May 4, 2020
jezdez added a commit to mozilla/redash that referenced this pull request May 14, 2020
robhudson pushed a commit to mozilla/redash that referenced this pull request Jun 11, 2020
emtwo pushed a commit to mozilla/redash that referenced this pull request Jul 15, 2020
jezdez added a commit to mozilla/redash that referenced this pull request Oct 15, 2020