Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve examples & related tests #7773

Merged
merged 7 commits into from
Jul 17, 2019

Conversation

mistercrunch
Copy link
Member

@mistercrunch mistercrunch commented Jun 25, 2019

CATEGORY

Choose one

  • Bug Fix
  • Enhancement (new features, refinement)
  • Refactor
  • Add tests
  • Build / Development Environment
  • Documentation

SUMMARY

related to #7478
This enables setting up an alternative URI for loading/storing/serving the examples data. New URI is defined with config key SQLALCHEMY_EXAMPLES_URI. If nothing not set explicitly, will default to point to the metadata database (SQLALCHEMY_DATABASE_URI). This in turn adds a new default entry to the Database model named examples

As a result, examples data tables can now be reused across Superset environments, and the tables don't get mixed in with the metadata database, leaving it cleaner.

Approach

  • allowing specifying an alternate database connection for examples
  • allowing a --only-metadata flag to load_examples to load only
    dashboard and chart definitions, no actual data is loaded
  • fixed a bunch of tests that are in some cases not directly related to this, but were not idempotent and prevented me from making progress.

Longer term we will generate the examples by exporting
them into tarball as in #7472.

@mistercrunch mistercrunch force-pushed the improve_load_examples branch 14 times, most recently from c11a5f4 to 1b249b5 Compare July 6, 2019 20:33
@codecov-io
Copy link

codecov-io commented Jul 6, 2019

Codecov Report

Merging #7773 into master will decrease coverage by 0.04%.
The diff coverage is 45.78%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #7773      +/-   ##
==========================================
- Coverage   65.82%   65.77%   -0.05%     
==========================================
  Files         461      461              
  Lines       22105    22167      +62     
  Branches     2425     2425              
==========================================
+ Hits        14551    14581      +30     
- Misses       7433     7465      +32     
  Partials      121      121
Impacted Files Coverage Δ
superset/examples/misc_dashboard.py 27.27% <ø> (ø)
superset/examples/__init__.py 100% <ø> (ø)
superset/examples/css_templates.py 100% <ø> (ø)
superset/examples/countries.py 100% <ø> (ø)
superset/examples/deck.py 11.11% <0%> (ø)
superset/examples/helpers.py 100% <100%> (ø)
superset/examples/unicode_test_data.py 100% <100%> (ø)
superset/examples/tabbed_dashboard.py 22.22% <100%> (ø)
superset/examples/birth_names.py 100% <100%> (ø)
superset/examples/energy.py 100% <100%> (ø)
... and 24 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 86fdceb...29068a7. Read the comment docs.

@mistercrunch mistercrunch changed the title [WiP] improve load_examples Improve examples & related tests Jul 6, 2019
Copy link
Member

@villebro villebro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A big PR, so difficult to review thoroughly, but many important fixes/improvements and can be further refined later. I suggest merging this and dealing with possible regressions later.

"path_json": Text,
},
index=False,
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, one more thing; when I added csv import functionality for BigQuery, I refactored db_engine_specs so that one can call db_engine_spec.df_to_sql(df, *kwargs) in place of df.to_sql(*kwargs) for engines that don't support df.to_sql(). So to make this work universally here, one would write

        database.db_engine_spec.df_to_sql(
            tbl_name,
            database.get_sqla_engine(),
            if_exists="replace",
            chunksize=500,
            dtype={
                "color": String(255),
                "name": String(255),
                "polyline": Text,
                "path_json": Text,
            },
            index=False,
        )

This doesn't necessarily have to be addressed in this PR; I can do that later, too, as I have a good test rig for that.

related to apache#7472, longer term we will generate the examples by exporting
them into tarball as in apache#7472. In the meantime, we need this subset of
the features:

* allowing specifying an alternate database connection for examples
* allowing a --only-metadata flag to `load_examples` to load only
  dashboard and chart definitions, no actual data is loaded
@mistercrunch mistercrunch mentioned this pull request Jul 17, 2019
6 tasks
@mistercrunch mistercrunch merged commit d65b039 into apache:master Jul 17, 2019
@mistercrunch mistercrunch deleted the improve_load_examples branch July 17, 2019 04:37
smacker added a commit to smacker/incubator-superset that referenced this pull request Jul 18, 2019
The bug was introduced in apache#7773

It uses filter by `cls.table_name == datasource_name`:
https://github.com/apache/incubator-superset/pull/7773/files#diff-a8dd5ec8d8decda2e3c5571d1ec0cdb6R740

But export puts `slc.datasource.name` into exported json:
https://github.com/apache/incubator-superset/pull/7773/files#diff-ceeb7eee8d573333109e0037299c9711L673

`slc.datasource.name` in case of `SqlaTable` is `"{}.{}".format(self.schema, self.table_name)`
mistercrunch pushed a commit that referenced this pull request Jul 23, 2019
The bug was introduced in #7773

It uses filter by `cls.table_name == datasource_name`:
https://github.com/apache/incubator-superset/pull/7773/files#diff-a8dd5ec8d8decda2e3c5571d1ec0cdb6R740

But export puts `slc.datasource.name` into exported json:
https://github.com/apache/incubator-superset/pull/7773/files#diff-ceeb7eee8d573333109e0037299c9711L673

`slc.datasource.name` in case of `SqlaTable` is `"{}.{}".format(self.schema, self.table_name)`
alex-mark pushed a commit to alex-mark/incubator-superset that referenced this pull request Jul 29, 2019
* [WiP] improve load_examples

related to apache#7472, longer term we will generate the examples by exporting
them into tarball as in apache#7472. In the meantime, we need this subset of
the features:

* allowing specifying an alternate database connection for examples
* allowing a --only-metadata flag to `load_examples` to load only
  dashboard and chart definitions, no actual data is loaded

* Improve logging

* Rename data->examples

* Load only if not exist

* By default do not load, add a force flag

* fix build

* set published to true
alex-mark pushed a commit to alex-mark/incubator-superset that referenced this pull request Jul 29, 2019
The bug was introduced in apache#7773

It uses filter by `cls.table_name == datasource_name`:
https://github.com/apache/incubator-superset/pull/7773/files#diff-a8dd5ec8d8decda2e3c5571d1ec0cdb6R740

But export puts `slc.datasource.name` into exported json:
https://github.com/apache/incubator-superset/pull/7773/files#diff-ceeb7eee8d573333109e0037299c9711L673

`slc.datasource.name` in case of `SqlaTable` is `"{}.{}".format(self.schema, self.table_name)`
@mistercrunch mistercrunch added 🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels 🚢 0.34.0 labels Feb 28, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels size/XL 🚢 0.34.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants