Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor: Improve performance regression introduced in #20473 #20810

Merged

Conversation

john-bodley
Copy link
Member

@john-bodley john-bodley commented Jul 21, 2022

SUMMARY

This PR resolves a performance regression introduced in #20473. Previously only the new columns/metrics where returned via the update_columns and update_metrics methods respectively—which were defined as properties of the dataset and then bound to the table via the DAO update. I also re-introduced the commit to both methods as one likely would expect these methods to commit rather than waiting for the update commit. Note the previous optimization exists, i.e., the commit occurs at the end rather than for each create, update, or delete.

In #20473 both new and existing columns were returned which added unnecessary overhead. Furthermore I'm not sure how complex a SQLAlchemy ORM model update with relationships is, i.e., where it tries to re-update all the columns and metrics, but given we already have dedicated methods to "update" columns/metrics—actually create, update, and delete—it seems simpler/clearer to bind new columns/metrics to the parent table on create and exclude all columns/metrics when updating the dataset model, i.e., previously it wasn't overly clear from the code—without digging—why the update_columns and update_metrics were returning a subset of entities.

BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF

TESTING INSTRUCTIONS

CI.

ADDITIONAL INFORMATION

  • Has associated issue:
  • Required feature flags:
  • Changes UI
  • Includes DB Migration (follow approval process in SIP-59)
    • Migration is atomic, supports rollback & is backwards-compatible
    • Confirm DB migration upgrade and downgrade tested
    • Runtime estimates and downtime expectations provided
  • Introduces new feature or API
  • Removes existing feature or API


for properties in property_columns:
if "id" in properties:
columns.append(
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was the root of the problem, i.e., existing columns shouldn't be added to the set of columns which need to be bound to the dataset when updated.

@john-bodley john-bodley force-pushed the john-bodley--optimize-dataset-update branch from 9bab0fe to ef3dda9 Compare July 21, 2022 17:28
@john-bodley john-bodley force-pushed the john-bodley--optimize-dataset-update branch from ef3dda9 to f1916a0 Compare July 21, 2022 17:34
@john-bodley john-bodley requested a review from ktmud July 21, 2022 17:44
@codecov
Copy link

codecov bot commented Jul 21, 2022

Codecov Report

Merging #20810 (db10275) into master (1101922) will decrease coverage by 11.40%.
The diff coverage is 46.81%.

❗ Current head db10275 differs from pull request most recent head 22498db. Consider uploading reports for the commit 22498db to get more accurate results

@@             Coverage Diff             @@
##           master   #20810       +/-   ##
===========================================
- Coverage   66.20%   54.79%   -11.41%     
===========================================
  Files        1754     1757        +3     
  Lines       66678    66864      +186     
  Branches     7049     7077       +28     
===========================================
- Hits        44143    36640     -7503     
- Misses      20738    28415     +7677     
- Partials     1797     1809       +12     
Flag Coverage Δ
hive 53.25% <37.98%> (?)
mysql ?
postgres ?
presto 53.11% <38.31%> (?)
python 57.82% <40.58%> (-23.56%) ⬇️
sqlite ?
unit 50.25% <26.29%> (-0.19%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
...d/packages/superset-ui-chart-controls/src/types.ts 100.00% <ø> (ø)
...ackages/superset-ui-core/src/query/types/Column.ts 100.00% <ø> (ø)
...ages/superset-ui-core/src/query/types/Dashboard.ts 100.00% <ø> (ø)
...ges/superset-ui-core/src/query/types/Datasource.ts 100.00% <ø> (+33.33%) ⬆️
...s/legacy-plugin-chart-country-map/src/countries.ts 100.00% <ø> (ø)
...s/legacy-plugin-chart-heatmap/src/controlPanel.tsx 57.14% <ø> (ø)
...ns/legacy-preset-chart-deckgl/src/utils/explore.js 0.00% <0.00%> (ø)
...set-chart-nvd3/src/vendor/superset/exploreUtils.js 0.00% <0.00%> (ø)
.../plugin-chart-echarts/src/MixedTimeseries/index.ts 25.00% <ø> (ø)
...ins/plugin-chart-echarts/src/components/Echart.tsx 0.00% <0.00%> (ø)
... and 409 more

Help us with your feedback. Take ten seconds to tell us how you rate us.

@john-bodley john-bodley force-pushed the john-bodley--optimize-dataset-update branch from f1916a0 to 24690ef Compare July 21, 2022 21:39
@@ -154,10 +154,10 @@ def update(
"""

if "columns" in properties:
properties["columns"] = cls.update_columns(model, properties["columns"])
cls.update_columns(model, properties.pop("columns"), commit=commit)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why pop instead of direct access?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ktmud these properties need to be popped so they're not re-included in the base update method which updates the underlying dataset. The new columns and metrics are already bound to the model in the respective update_columns and update_metrics method.

@john-bodley john-bodley merged commit d327437 into apache:master Jul 27, 2022
@john-bodley john-bodley deleted the john-bodley--optimize-dataset-update branch July 27, 2022 00:49
@mistercrunch mistercrunch added 🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels 🚢 2.1.0 labels Mar 13, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels size/M 🚢 2.1.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants