Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

chore: use shillelagh instead of gsheetsdb #13185

Merged
merged 5 commits into from
Feb 18, 2021

Conversation

betodealmeida
Copy link
Member

@betodealmeida betodealmeida commented Feb 17, 2021

SUMMARY

shillelagh is a Python module that provides a SQLAlchemy dialect and DB API 2.0 implementation to a variety of non-SQL resources. For example, it can be used to query CSV files or fetch weather data from https://www.weatherapi.com/ via SQL. The library has 100% unit test coverage and uses SQLite virtual tables to allow efficiently accessing the resources as if they were tables in a database.

The current module that we use to access Google Spreadsheets is gsheetsdb, which depends on the unmaintained moz-sql-parser. In addition to the dependency, gsheetsdb implements a code transpiler, and uses pre and post-processors to fetch data from the spreadsheet, making the code somewhat brittle.

This week I implemented a Google Spreadsheets adapter in shillelagh, aiming to deprecate the gsheetsdb module. The implementation is a drop-in replacement, and users should be able to uninstall the gsheetsdb module and install the shillelagh module to have existing queries, datasets and charts continue working.

This works because shillelagh registers a SQLAlchemy dialect called "gsheets". This way SQLAlchemy will use shillelagh for all engines with the URL "gsheets://". This is an in-memory only backend, without access to the filesystem, so it's safe to use.

On the other hand, shillelagh also registers its own dialect (aptly named "shillelagh", with an alias called "shillelagh+apsw"). That dialect is unsafe, as a malicious user could use it to read data from the filesystem by enabling the CSV adapter. For security reasons, I disabled the "shillelagh" dialect unless PREVENT_UNSAFE_DB_CONNECTIONS is on.

BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF

N/A

TEST PLAN

Tested SQL Editor and charts reading from Google spreadsheets, everything works.

ADDITIONAL INFORMATION

  • Has associated issue:
  • Changes UI
  • Requires DB Migration.
  • Confirm DB Migration upgrade and downgrade tested.
  • Introduces new feature or API
  • Removes existing feature or API

@codecov-io
Copy link

codecov-io commented Feb 17, 2021

Codecov Report

Merging #13185 (45550d5) into master (2ce7982) will increase coverage by 13.79%.
The diff coverage is 54.59%.

Impacted file tree graph

@@             Coverage Diff             @@
##           master   #13185       +/-   ##
===========================================
+ Coverage   53.06%   66.85%   +13.79%     
===========================================
  Files         489      492        +3     
  Lines       17314    29051    +11737     
  Branches     4482        0     -4482     
===========================================
+ Hits         9187    19421    +10234     
- Misses       8127     9630     +1503     
Flag Coverage Δ
cypress ?
python 66.85% <54.59%> (?)

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
superset/constants.py 100.00% <ø> (ø)
superset/examples/birth_names.py 73.19% <ø> (ø)
...43f2fdb_add_granularity_to_charts_where_missing.py 0.00% <0.00%> (ø)
...s/260bf0649a77_migrate_x_dateunit_in_time_range.py 0.00% <0.00%> (ø)
...ons/versions/41ce8799acc3_rename_pie_label_type.py 0.00% <0.00%> (ø)
superset/connectors/sqla/views.py 62.43% <4.34%> (ø)
superset/views/datasource.py 88.70% <16.66%> (ø)
superset/viz.py 58.61% <27.27%> (ø)
superset/databases/commands/test_connection.py 84.78% <50.00%> (ø)
superset/charts/commands/exceptions.py 92.85% <77.77%> (ø)
... and 928 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 6e09156...45550d5. Read the comment docs.

@pull-request-size pull-request-size bot added size/L and removed size/M labels Feb 18, 2021
@betodealmeida
Copy link
Member Author

While fixing unit tests I did some clean up and removed duplicated code and duplicated exceptions.

Copy link
Member

@villebro villebro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One minor nit - this is a great step forward for as the gsheets connector seems to be fairly widely used. Also big thanks for cleaning up the related code 👍

setup.py Outdated
@@ -129,7 +129,7 @@ def get_git_sha():
"elasticsearch": ["elasticsearch-dbapi>=0.2.0, <0.3.0"],
"exasol": ["sqlalchemy-exasol>=2.1.0, <2.2"],
"excel": ["xlrd>=1.2.0, <1.3"],
"gsheets": ["gsheetsdb>=0.1.9"],
"gsheets": ["shillelagh>=0.2"],
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would add , <0.3 here, just in case.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

@@ -1234,7 +1231,7 @@ def testconn( # pylint: disable=too-many-return-statements,no-self-use
uri = request.json.get("uri")
try:
if app.config["PREVENT_UNSAFE_DB_CONNECTIONS"]:
check_sqlalchemy_uri(uri)
check_sqlalchemy_uri(make_url(uri))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm surprised we weren't getting mypy errors here before

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Me too.

@betodealmeida betodealmeida merged commit 3d23ade into apache:master Feb 18, 2021
@mistercrunch mistercrunch added 🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels 🚢 1.2.0 labels Mar 12, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels size/L 🚢 1.2.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants