Schema Viewer Drawer #3291

Merged
merged 21 commits on Mar 13, 2019

Conversation

@emtwo emtwo commented Jan 16, 2019

This is a fresh PR with the code from #2990 rebased and linted.

It is ready for review now. This PR is the first of a series of PRs for schema enhancements. I will link the subsequent PRs here as they become available.

[1] Schema viewer drawer #3291 (this one)
[2] Schema admin configuration #3292
[3] Schema query samples #3293
[4] Data source descriptions #3401


emtwo commented Jan 16, 2019

Note: schema updates now happen through a periodic celery task that runs the queries to get column names, types, etc. The results are stored in the new schema tables, and whenever the schema is fetched from the UI it just queries the data in these tables directly.

Since the schema is set to refresh only every 30 minutes (https://github.com/getredash/redash/blob/master/redash/settings/__init__.py#L48), that is likely why the percy/redash visual error shows up.

We can either increase the frequency of schema updates (the quicker option, but not as good) or do a one-off schema refresh on init so that the schema is available. I'll look into the latter.
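
For illustration, the refresh flow described above looks roughly like the sketch below. This is not the PR's actual code: the task routing, the 30-minute schedule, and every helper name (list_data_sources, describe_tables, upsert_table_metadata) are assumptions made for the example.

# Illustrative sketch only; helper names are placeholders, not Redash's API.
from celery import Celery
from celery.schedules import crontab

app = Celery("worker", broker="redis://redis:6379/0")

# Route the task to a dedicated "schemas" queue; the worker's QUEUES
# environment variable has to include it (which is what the docker-compose
# changes discussed further down are about).
app.conf.task_routes = {"refresh_schemas": {"queue": "schemas"}}


@app.on_after_configure.connect
def setup_periodic_tasks(sender, **kwargs):
    # Re-run the refresh every 30 minutes, matching the interval mentioned above.
    sender.add_periodic_task(crontab(minute="*/30"), refresh_schemas.s())


@app.task(name="refresh_schemas")
def refresh_schemas():
    # For each data source, run the introspection queries and upsert the results
    # into the table/column metadata tables; the UI then reads those rows instead
    # of hitting the data source on every schema fetch.
    for data_source in list_data_sources():              # placeholder helper
        for table in data_source.describe_tables():      # placeholder helper
            upsert_table_metadata(data_source.id, table)  # placeholder helper


def list_data_sources():
    return []  # placeholder: would load rows from the data_sources table


def upsert_table_metadata(data_source_id, table):
    pass  # placeholder: would insert/update TableMetadata and ColumnMetadata rows

Whether the task ever runs also depends on a worker actually consuming the schemas queue, which is what the docker-compose QUEUES discussion below is about.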

@emtwo emtwo requested a review from arikfr January 16, 2019 17:46
cy.login();
cy.request({
Member

@gabrieldutra gabrieldutra Jan 21, 2019


I've created db-seed.js with this purpose in mind 🤔, so this kind of dependency could be set up by running npm run cypress db-seed prior to all tests, and this block wouldn't need to be repeated among them:

// create_query_spec.js - a few upper lines that were not shown
const pg = {
  name: 'test',
  options: {
    dbname: 'postgres',
    host: 'postgres',
    password: 'postgres',
    user: 'postgres',
  },
  type: 'pg',
};

LMK what you think haha

PS: if you are just testing, ignore this 😅

Author

@emtwo emtwo Jan 21, 2019


@gabrieldutra I was, in fact, just testing. Though I could use some help: I cannot reproduce locally the Percy issue that shows up here. When I run the create_query_spec.js test on master locally, the DOM snapshots seem to be missing the schema data (screenshot included below), while the snapshot for this PR seems to show the schema locally (screenshot also included below).

Any idea what might be going on here or how I can reproduce this?

Screenshot from master
screen shot 2019-01-21 at 5 18 28 pm

Screenshot from this PR:
screen shot 2019-01-21 at 5 25 07 pm

Member

@gabrieldutra gabrieldutra Jan 21, 2019


I have faced some issues when making frontend changes and running Cypress in development mode; the frontend code doesn't seem to be shared with the Docker container. I'll look into this further, but a quick fix to make it work properly is: after starting the Cypress server just like you did, run npm run start for the webpack development server and open Cypress with CYPRESS_baseUrl=http://localhost:8080 npm run cypress open

Member

@gabrieldutra gabrieldutra Jan 21, 2019


Also, I don't know if it's related, but I noticed the Chinook data source is not showing schema info in the preview.

I'll try to reproduce this locally and give you some help with Percy anyway

Edit: the Chinook issue is probably related to the missing schema queue in one of the files (docker-compose.production.yml perhaps)

Member


I have faced some issues when making frontend changes and running Cypress in development mode; the frontend code doesn't seem to be shared with the Docker container.

Does the Cypress Docker Compose configuration use VOLUMEs?

Member

@gabrieldutra gabrieldutra Jan 22, 2019


Cypress uses docker-compose.cypress.yml when in CI and the development docker-compose.yml when not.
Edit: Forgot to mention the volumes haha: the first one doesn't use them and the second one does.

However, it uses http://localhost:5000, which I guess doesn't use webpack to watch files, so the frontend in this case only updates after a rebuild. The two options I see to make it friendlier to the developer would be either adding npm run start to a frontend container in docker-compose.yml or adding it outside Docker in the cypress scripts.

Member


I think we should add an instruction to run npm run build before running the Cypress tests. Running npm in the container is not possible, because the container will not have Node (currently it does, but that's temporary).

@@ -29,7 +29,7 @@ services:
       REDASH_LOG_LEVEL: "INFO"
       REDASH_REDIS_URL: "redis://redis:6379/0"
       REDASH_DATABASE_URL: "postgresql://postgres@postgres/postgres"
-      QUEUES: "queries,scheduled_queries,celery"
+      QUEUES: "queries,scheduled_queries,celery,schemas"
Member

@gabrieldutra gabrieldutra Jan 21, 2019


Got it haha, there are other docker-compose.yml files inside .circleci, just add this to them and Percy should do fine 🚀

Edit: Only docker-compose.cypress.yml affects the Percy screenshots
Edit2: There are probably other files where this may be necessary:
queues

Author


Awesome! Updating docker-compose.cypress.yml did the trick! I didn't realize Cypress had its own yml file. Thank you for your help @gabrieldutra!

Member


You're welcome @emtwo! 🙂

@emtwo emtwo force-pushed the emtwo/schema1_info_drawer branch 2 times, most recently from e6be093 to 3c4e8c8 on January 22, 2019 15:30

emtwo commented Jan 22, 2019

I've rebased the PR again and the original Percy issue is fixed. Note that Percy is currently failing for an expected reason: two new tables, column_metadata and table_metadata, now show up in the schema viewer.

@gabrieldutra
Member

I've rebased the PR again and the original Percy issue is fixed. Note that Percy is currently failing for an expected reason: two new tables, column_metadata and table_metadata, now show up in the schema viewer.

Don't forget to add the schema queue in the other files (such as docker-compose.production.yml) as this could cause some bugs in the future 😁


emtwo commented Jan 22, 2019

I've rebased the PR again and the original Percy issue is fixed. Note that Percy is currently failing for an expected reason: two new tables, column_metadata and table_metadata, now show up in the schema viewer.

Don't forget to add the schema queue in the other files (such as docker-compose.production.yml) as this could cause some bugs in the future 😁

I've added the schemas queue in a couple of other spots as you suggested. I was hesitant at first, since the schemas queue was already used in the code prior to this PR but did not appear in any of the Docker files, and without knowing the use cases of all the Docker files it's hard to tell where it's required. Still, I'm sure it doesn't hurt to add it. Perhaps @arikfr has more insight into this.


arikfr commented Jan 23, 2019

I will do a review of all the Docker Compose files and add schemas where needed.

I do realize now that everyone who is using the AMIs we build uses a Docker Compose setup without this queue. Which means that: 1) this queue is growing in size, but nothing is processing it; 2) they don't get schema refreshes. 🤦‍♂️


arikfr commented Jan 23, 2019

Change of plans: #3325.

@@ -198,21 +198,25 @@ def delete(self):
return res

def get_schema(self, refresh=False):
key = "data_source:schema:{}".format(self.id)

Is removing this redis caching of schema information intentional? Is there a performance impact?

Author


Thanks for pointing this out @washort!

It was intentional: from what I recall of the Berlin work week, @arikfr felt that using Redis to store the schema was a bit of a hack and he would prefer it stored in a table. Of course, we could store the data in tables and keep additional caching on top for performance, but I felt the added complexity of maintaining both a cache and tables for the same data was perhaps not worth the performance gain.

I did a quick test on my machine: over 5 runs of the old vs. the new get_schema() function, the Redis one averages 7.2ms per call and the one from this PR averages 44ms per call. It's a big relative difference, but 44ms isn't so bad, though it could of course be worse in other scenarios, e.g. a slower network/machine or more data. I suppose I will defer this decision to @arikfr.


Great, just curious.
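
As a rough illustration of the quick comparison emtwo describes above, here is a minimal timing sketch. The two get_schema variants are placeholders standing in for the old Redis-backed and the new table-backed implementations; none of this is the code that was actually benchmarked.

# Average wall-clock time of a handful of get_schema() calls.
import time


def average_call_time_ms(get_schema, runs=5):
    """Return the average duration (ms) of `runs` calls to get_schema()."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        get_schema()
        durations.append((time.perf_counter() - start) * 1000)
    return sum(durations) / len(durations)


def get_schema_from_redis_cache():
    return []  # placeholder for the old implementation (cached JSON blob in Redis)


def get_schema_from_metadata_tables():
    return []  # placeholder for the new implementation (reads the metadata tables)


if __name__ == "__main__":
    print("redis cache:     %.1f ms" % average_call_time_ms(get_schema_from_redis_cache))
    print("metadata tables: %.1f ms" % average_call_time_ms(get_schema_from_metadata_tables))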

@@ -67,6 +67,56 @@ def get(self, query_id):
scheduled_queries_executions = ScheduledQueriesExecutions()


@python_2_unicode_compatible
class TableMetadata(db.Model):

you'll need a migration to create these tables

Author


Ah! I had missed this, thank you!
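
For readers following along, a hedged sketch of the kind of Alembic migration the reviewer is asking for. The revision identifiers, column names, and foreign keys below are assumptions made for illustration (loosely based on the model names visible in the diff), not the migration that was actually added to the PR.

"""Add table_metadata and column_metadata tables (illustrative sketch only)."""
import sqlalchemy as sa
from alembic import op

# Revision identifiers: placeholders, not the real migration's IDs.
revision = "000000000000"
down_revision = None
branch_labels = None
depends_on = None


def upgrade():
    op.create_table(
        "table_metadata",
        sa.Column("id", sa.Integer, primary_key=True),
        sa.Column(
            "data_source_id",
            sa.Integer,
            sa.ForeignKey("data_sources.id", ondelete="CASCADE"),
            nullable=False,
        ),
        sa.Column("name", sa.String(length=255), nullable=False),
        sa.Column("exists", sa.Boolean, nullable=False, default=True),
        sa.Column("updated_at", sa.DateTime(timezone=True)),
    )
    op.create_table(
        "column_metadata",
        sa.Column("id", sa.Integer, primary_key=True),
        sa.Column(
            "table_id",
            sa.Integer,
            sa.ForeignKey("table_metadata.id", ondelete="CASCADE"),
            nullable=False,
        ),
        sa.Column("name", sa.String(length=255), nullable=False),
        sa.Column("type", sa.String(length=255)),
        sa.Column("example", sa.String(length=4000)),
    )


def downgrade():
    op.drop_table("column_metadata")
    op.drop_table("table_metadata")

The ondelete="CASCADE" settings here echo the "Cascade delete" item in the squashed commit message further down.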

@ghost ghost added the in progress label Jan 24, 2019

-    for row in results['rows']:
+    for i, row in enumerate(results['rows']):

Similar blocks of code found in 2 locations. Consider refactoring.

-    for row in results['rows']:
+    table_samples = {}
+
+    for i, row in enumerate(results['rows']):

Similar blocks of code found in 2 locations. Consider refactoring.

persisted_table = models.db.session.query(
TableMetadata).filter(
TableMetadata.table_name==table_name).filter(
TableMetadata.data_source_id==ds.id).first()

Missing whitespace around operator

redash/models/__init__.py (resolved)
jezdez pushed a commit to mozilla/redash that referenced this pull request Aug 19, 2019
* Process extra column metadata for a few sql-based data sources.

* Add Table and Column metadata tables.

* Periodically update table and column schema tables in a celery task.

* Fetching schema returns data from table and column metadata tables.

* Add tests for backend changes.

* Front-end shows extra table metadata and uses new schema response.

* Delete datasource schema data when deleting a data source.

* Process and store data source schema when a data source is first created or after a migration.

* Tables should have a unique name per datasource.

* Addressing review comments.

* Update migration file for mixins.

* Appease PEP8

* Upgrade migration file for rebase.

* Cascade delete.

* Adding org_id

* Remove redundant column and table prefixes.

* Non-existing tables and columns should be filtered out on the server side not client side.

* Fetching table samples should be optional and should happen in a separate task per table.

* Allow users to force a schema refresh.

* Use updated_at to help prune old schema metadata periodically.

* Using settings.SCHEMAS_REFRESH_QUEUE

* fix for getredash#2426 test

* more stable test_interactive_new

* Closes #927, #928: Schema refresh improvements.

* Closes #934, #935: Remove type from schema browser and don't show empty example column in schema drawer (#936)

* Speed up schema fetch requests with fewer postgres queries.

* Add column metadata to Athena glue processing.

* Fix bug assuming 'metadata' exists for every table.

* Closes #939: Persisted, existing table metadata should be updated.

* Sample processing should be rate-limited.

* Add cli command for refreshing data samples.

* Schema refreshes should not overwrite column 'example' field.

* refresh_samples() should filter tables_to_sample on the datasource's id being sampled

* Correctly wrap long text in schema drawer.

Co-authored-by: Alison <github@bankofknowledge.net>

Schema Improvements Part 2: Add data source config options.

Adding BigQuery schema drawer with data types and samples.
washort pushed a commit to mozilla/redash that referenced this pull request Sep 16, 2019
washort pushed a commit to mozilla/redash that referenced this pull request Sep 17, 2019
emtwo pushed a commit to mozilla/redash that referenced this pull request Nov 5, 2019
jezdez pushed a commit to mozilla/redash that referenced this pull request Jan 22, 2020
jezdez pushed a commit to mozilla/redash that referenced this pull request Feb 5, 2020
jezdez added a commit to mozilla/redash that referenced this pull request May 4, 2020
jezdez added a commit to mozilla/redash that referenced this pull request May 14, 2020
robhudson pushed a commit to mozilla/redash that referenced this pull request Jun 11, 2020
emtwo pushed a commit to mozilla/redash that referenced this pull request Jul 15, 2020
jezdez added a commit to mozilla/redash that referenced this pull request Oct 15, 2020