Athena schema browser not showing all tables #927

rafrombrc · 2019-04-02T17:23:49Z

The schema browser for our Athena data source shows most, but not all, of the tables that exist. This seems to be related to a timeout when requesting the schema data for insertion into Redash's PostgreSQL database.

* Process extra column metadata for a few sql-based data sources.
* Add Table and Column metadata tables.
* Periodically update table and column schema tables in a celery task.
* Fetching schema returns data from table and column metadata tables.
* Add tests for backend changes.
* Front-end shows extra table metadata and uses new schema response.
* Delete datasource schema data when deleting a data source.
* Process and store data source schema when a data source is first created or after a migration.
* Tables should have a unique name per datasource.
* Addressing review comments.
* Update migration file for mixins.
* Appease PEP8
* Upgrade migration file for rebase.
* Cascade delete.
* Adding org_id
* Remove redundant column and table prefixes.
* Non-existing tables and columns should be filtered out on the server side not client side.
* Fetching table samples should be optional and should happen in a separate task per table.
* Allow users to force a schema refresh.
* Use updated_at to help prune old schema metadata periodically.
* Using settings.SCHEMAS_REFRESH_QUEUE
* fix for getredash#2426 test
* more stable test_interactive_new
* Closes #927, #928: Schema refresh improvements.
* Closes #934, #935: Remove type from schema browser and don't show empty example column in schema drawer (#936)
* Speed up schema fetch requests with fewer postgres queries.
* Add column metadata to Athena glue processing.
* Fix bug assuming 'metadata' exists for every table.
* Closes #939: Persisted, existing table metadata should be updated.
* Sample processing should be rate-limited.
* Add cli command for refreshing data samples.
* Schema refreshes should not overwrite column 'example' field.
* refresh_samples() should filter tables_to_sample on the datasource's id being sampled
* Correctly wrap long text in schema drawer.

Schema Improvements Part 2: Add data source config options.
Adding BigQuery schema drawer with data types and samples.
Add empty migration to replace the removed schedule_until migration.
Add merge migration.
Fix spacing issue with data scanned value in query execution metadata.
Increase schema refresh timeout.
Remove old migrations.

Co-authored-by: Alison <github@bankofknowledge.net>
Co-authored-by: Jannis Leidel <jannis@leidel.info>
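
For readers following the commit notes above, here is a rough sketch of the general idea behind persisting table metadata, updating existing rows on refresh, and using `updated_at` to prune tables that no longer exist. This is not the code from these commits: the `table_metadata` table, its columns, and the functions `make_db`, `refresh_schema`, and `visible_tables` are hypothetical names chosen for illustration, and SQLite stands in for the PostgreSQL database so the example is self-contained. Pruning immediately after each refresh is also an assumption; the commit notes only say that pruning happens periodically.

```python
import sqlite3


def make_db():
    """Create an in-memory database with a hypothetical table_metadata table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE table_metadata (
            id INTEGER PRIMARY KEY,
            data_source_id INTEGER NOT NULL,
            name TEXT NOT NULL,
            updated_at INTEGER NOT NULL,
            UNIQUE (data_source_id, name)  -- one row per table name per data source
        )
        """
    )
    return conn


def refresh_schema(conn, data_source_id, fetched_tables, now):
    """Touch or insert every fetched table, then prune rows this refresh did not see."""
    for name in fetched_tables:
        # Update the existing row if there is one, so persisted metadata is kept.
        cur = conn.execute(
            "UPDATE table_metadata SET updated_at = ? "
            "WHERE data_source_id = ? AND name = ?",
            (now, data_source_id, name),
        )
        if cur.rowcount == 0:
            conn.execute(
                "INSERT INTO table_metadata (data_source_id, name, updated_at) "
                "VALUES (?, ?, ?)",
                (data_source_id, name, now),
            )
    # Rows with an older updated_at were not returned by the data source this
    # time. Assumption: they are deleted right away, and only for this source.
    conn.execute(
        "DELETE FROM table_metadata WHERE data_source_id = ? AND updated_at < ?",
        (data_source_id, now),
    )
    conn.commit()


def visible_tables(conn, data_source_id):
    """Return the table names the schema browser would list, filtered server side."""
    rows = conn.execute(
        "SELECT name FROM table_metadata WHERE data_source_id = ? ORDER BY name",
        (data_source_id,),
    )
    return [row[0] for row in rows]
```

With this shape, a schema request only reads from the metadata table, so a slow fetch from the data source delays the next refresh rather than the request itself.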

rafrombrc added this to the 20 milestone Apr 2, 2019

rafrombrc assigned emtwo Apr 2, 2019

rafrombrc added the in progress label Apr 2, 2019

emtwo pushed a commit that referenced this issue Apr 3, 2019

Closes #927, #928: Schema refresh improvements.

f7c12a2

emtwo closed this as completed in 1662f47 Apr 3, 2019

emtwo pushed a commit that referenced this issue Apr 4, 2019

Closes #927, #928: Schema refresh improvements.

fd87610

jezdez mentioned this issue May 13, 2019

M21 rebase #952

Closed

2 tasks
