Merge pull request #46 from TidierOrg/pkgexts

Moves to use of package extensions.
TidierOrg · drizk1 · Jul 25, 2024 · 8c98381
2 parents c292ba5 + ba209a2
commit 8c98381
Showing 27 changed files with 707 additions and 517 deletions.
diff --git a/NEWS.md b/NEWS.md
@@ -1,5 +1,11 @@
 # TidierDB.jl updates
 
+## v0.3.0 - 2024-07-25
+- Introduces package extensions for:
+    - Postgres, ClickHouse, MySQL, MSSQL, SQLite, Oracle, Athena, and Google BigQuery
+    - [Documentation](https://tidierorg.github.io/TidierDB.jl/latest/examples/generated/UserGuide/getting_started/) updated for using these backends.
+- Changes `set_sql_mode()` to use types, not symbols (i.e., `set_sql_mode(snowflake())`, not `set_sql_mode(:snowflake)`)
+
 ## v0.2.4 - 2024-07-12
 - Switches DuckDB to version 1.0
 - Adds support for `iceberg` tables via DuckDB to read iceberg paths in `db_table` when `iceberg = true` 
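
Note on the `set_sql_mode()` change: passing a type instance instead of a symbol lets Julia pick backend-specific code by multiple dispatch, and a stray symbol fails early with a `MethodError`. The sketch below only illustrates that idea. `SQLBackend`, `current_sql_mode`, and `backend_label` are hypothetical names, and the two structs are local stand-ins that reuse the names from the changelog; none of this is taken from TidierDB's source.

```julia
# Hypothetical stand-ins for illustration; not TidierDB's actual definitions.
abstract type SQLBackend end
struct duckdb <: SQLBackend end
struct snowflake <: SQLBackend end

# Holds the active backend; DuckDB is the documented default.
const current_sql_mode = Ref{SQLBackend}(duckdb())

# Accepts only backend instances, so a Symbol is rejected with a MethodError.
set_sql_mode(backend::SQLBackend) = (current_sql_mode[] = backend)

# Backend-specific behavior is chosen by dispatch, not by comparing symbols.
backend_label(::duckdb) = "DuckDB"
backend_label(::snowflake) = "Snowflake"

set_sql_mode(snowflake())
println(backend_label(current_sql_mode[]))  # prints "Snowflake"
```

With this shape, supporting a new backend means adding a struct and its methods rather than extending a chain of symbol comparisons.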

diff --git a/Project.toml b/Project.toml
@@ -1,25 +1,36 @@
 name = "TidierDB"
 uuid = "86993f9b-bbba-4084-97c5-ee15961ad48b"
 authors = ["Daniel Rizk <rizk.daniel.12@gmail.com> and contributors"]
-version = "0.2.4"
+version = "0.3.0"
 
 [deps]
-AWS = "fbe9abb3-538b-5e4e-ba9e-bc94f4f92ebc"
 Arrow = "69666777-d1a9-59fb-9406-91d4454c9d45"
 Chain = "8be319e6-bccf-4806-a6f7-6fae938471bc"
-ClickHouse = "82f2e89e-b495-11e9-1d9d-fb40d7cf2130"
 DataFrames = "a93c6f00-e57d-5684-b7b6-d8193f3e46c0"
 DuckDB = "d2f5444f-75bc-4fdf-ac35-56f514c445e1"
 GZip = "92fee26a-97fe-5a0c-ad85-20a5f3185b63"
-GoogleCloud = "55e21f81-8b0a-565e-b5ad-6816892a5ee7"
 HTTP = "cd3eb016-35fb-5094-929b-558a96fad6f3"
 JSON3 = "0f8b85d8-7281-11e9-16c2-39a750bddbf1"
-LibPQ = "194296ae-ab2e-5f79-8cd4-7183a0a5a0d1"
 MacroTools = "1914dd2f-81c6-5fcd-8719-6d5c9610ff09"
-MySQL = "39abe10b-433b-5dbd-92d4-e302a9df00cd"
 ODBC = "be6f12e9-ca4f-5eb2-a339-a4f995cc0291"
 Reexport = "189a3867-3050-52da-a836-e630ba90ab69"
+
+[weakdeps]
 SQLite = "0aa819cd-b072-5ff4-a722-6bc24af294d9"
+LibPQ = "194296ae-ab2e-5f79-8cd4-7183a0a5a0d1"
+GoogleCloud = "55e21f81-8b0a-565e-b5ad-6816892a5ee7"
+AWS = "fbe9abb3-538b-5e4e-ba9e-bc94f4f92ebc"
+MySQL = "39abe10b-433b-5dbd-92d4-e302a9df00cd"
+ClickHouse = "82f2e89e-b495-11e9-1d9d-fb40d7cf2130"
+
+
+[extensions]
+SQLiteExt = "SQLite"
+LibPQExt = "LibPQ"
+GBQExt = "GoogleCloud"
+AWSExt = "AWS"
+MySQLExt = "MySQL"
+CHExt = "ClickHouse"
 
 [compat]
 AWS = "1.9"
@@ -44,6 +55,13 @@ julia = "1.9"
 [extras]
 Documenter = "e30172f5-a6a5-5a46-863b-614d45cd2de4"
 Test = "8dfed614-e22c-5e08-85e1-65c5234f0b40"
+SQLite = "0aa819cd-b072-5ff4-a722-6bc24af294d9"
+LibPQ = "194296ae-ab2e-5f79-8cd4-7183a0a5a0d1"
+GoogleCloud = "55e21f81-8b0a-565e-b5ad-6816892a5ee7"
+AWS = "fbe9abb3-538b-5e4e-ba9e-bc94f4f92ebc"
+MySQL = "39abe10b-433b-5dbd-92d4-e302a9df00cd"
+ClickHouse = "82f2e89e-b495-11e9-1d9d-fb40d7cf2130"
+
 
 [targets]
 test = ["Documenter", "Test"]
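
Note on `[weakdeps]` and `[extensions]`: on Julia 1.9 and later, each entry under `[extensions]` names a module that Julia loads automatically once the listed weak dependency is loaded in the same session, conventionally from a file under `ext/`. The single-file sketch below imitates that split. `ParentPkg` and `FakeSQLiteExt` are hypothetical modules, and the `connect` method body is a placeholder, not TidierDB's code.

```julia
# ParentPkg plays the role of the main package (hypothetical name).
module ParentPkg
    struct sqlite end     # backend type, always available
    function connect end  # generic function with no methods yet
end

# FakeSQLiteExt plays the role of an extension module. In a real package
# this code would live under ext/ and load only after `using SQLite`.
module FakeSQLiteExt
    import ..ParentPkg
    # Add the backend-specific method to the parent's function.
    ParentPkg.connect(::ParentPkg.sqlite) = "placeholder SQLite connection"
end

println(ParentPkg.connect(ParentPkg.sqlite()))
```

The parent package only declares the function; the heavy dependency is needed solely by the module that adds the method, which is why the drivers can move out of `[deps]`.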
diff --git a/README.md b/README.md
@@ -14,19 +14,19 @@ The main goal of TidierDB.jl is to bring the syntax of Tidier.jl to multiple SQL
 
 ## Currently supported backends include:
 
-- DuckDB (the default) `set_sql_mode(:duckdb)`
-- ClickHouse `set_sql_mode(:clickhouse)`
-- SQLite `set_sql_mode(:lite)`
-- MySQL and MariaDB `set_sql_mode(:mysql)`
-- MSSQL `set_sql_mode(:mssql)`
-- Postgres `set_sql_mode(:postgres)`
-- Athena `set_sql_mode(:athena)`
-- Snowflake `set_sql_mode(:snowflake)`
-- Google Big Query `set_sql_mode(:gbq)`
-- Oracle `set_sql_mode(:oracle)`
-- Databricks
-
-The style of SQL that is generated can be modified using `set_sql_mode()`.
+- DuckDB (the default) `duckdb()`
+- ClickHouse `clickhouse()`
+- SQLite `sqlite()`
+- MySQL and MariaDB `mysql()`
+- MSSQL `mssql()`
+- Postgres `postgres()`
+- Athena `athena()`
+- Snowflake `snowflake()`
+- Google Big Query `gbq()`
+- Oracle `oracle()`
+- Databricks `databricks()`
+
+Change the backend using `set_sql_mode()`, for example `set_sql_mode(databricks())`.
 
 ## Installation
 
@@ -95,10 +95,10 @@ Even though the code reads similarly to TidierData, note that no computational w
 using TidierData
 import TidierDB as DB
 
-db = DB.connect(:duckdb);
-path = "https://gist.githubusercontent.com/seankross/a412dfbd88b3db70b74b/raw/5f23f993cd87c283ce766e7ac6b329ee7cc2e1d1/mtcars.csv"
+db = DB.connect(duckdb());
+path_or_name = "https://gist.githubusercontent.com/seankross/a412dfbd88b3db70b74b/raw/5f23f993cd87c283ce766e7ac6b329ee7cc2e1d1/mtcars.csv"
 
-@chain DB.db_table(db, path) begin
+@chain DB.db_table(db, path_or_name) begin
     DB.@filter(!starts_with(model, "M"))
     DB.@group_by(cyl)
     DB.@summarize(mpg = mean(mpg))
@@ -128,7 +128,7 @@ end
 We cannot do this using TidierDB. However, we can call `@pivot_longer()` from TidierData *after* the result of the query has been instantiated as a DataFrame, like this: 
 
 ```julia
-@chain DB.db_table(db, :mtcars) begin
+@chain DB.db_table(db, path_or_name) begin
     DB.@filter(!starts_with(model, "M"))
     DB.@group_by(cyl)
     DB.@summarize(mpg = mean(mpg))
@@ -167,7 +167,7 @@ end
 We can replace `DB.@collect` with `DB.@show_query` to reveal the underlying SQL query being generated by TidierDB. To handle complex queries, TidierDB makes heavy use of Common Table Expressions (CTEs), which are a useful tool to organize long queries.
 
 ```julia
-@chain DB.db_table(db, :mtcars) begin
+@chain DB.db_table(db, path_or_name) begin
     DB.@filter(!starts_with(model, "M"))
     DB.@group_by(cyl)
     DB.@summarize(mpg = mean(mpg))
@@ -207,7 +207,7 @@ SELECT *
 ## TidierDB is already quite fully-featured, supporting advanced TidierData functions like `across()` for multi-column selection.
 
 ```julia
-@chain DB.db_table(db, :mtcars) begin
+@chain DB.db_table(db, path_or_name) begin
     DB.@group_by(cyl)
     DB.@summarize(across((starts_with("a"), ends_with("s")), (mean, sum)))
     DB.@collect

diff --git a/docs/examples/UserGuide/Snowflake.jl b/docs/examples/UserGuide/Snowflake.jl
@@ -15,6 +15,7 @@
 #   - Allow you to build a SQL query and run `@show_query` even if the OAuth token has expired. To `@collect`, you will have to reconnect and rerun `db_table` if your OAuth token has expired.
 
 # ```julia
+# set_sql_mode(snowflake())
 # ac_id = "string_id"
 # token = "OAuth_token_string" 
 # con = connect(:snowflake, ac_id, token, "DEMODB", "PUBLIC", "COMPUTE_WH")

diff --git a/docs/examples/UserGuide/athena.jl b/docs/examples/UserGuide/athena.jl
@@ -5,7 +5,7 @@
 
 # ```julia
 # using TidierDB, AWS
-# set_sql_mode(:athena)
+# set_sql_mode(athena())
 # # Replace your credentials as needed below
 # aws_access_key_id = get(ENV,"AWS_ACCESS_KEY_ID","key")
 # aws_secret_access_key = get(ENV, "AWS_SECRET_ACCESS_KEY","secret_key")

diff --git a/docs/examples/UserGuide/databricks.jl b/docs/examples/UserGuide/databricks.jl
@@ -12,6 +12,7 @@
 # Since `db_table` runs a query to pull the metadata each time it is called, you may choose to run `db_table` once, save the result, and reuse it with `from_query()`. This reduces the number of queries to your database and is illustrated below.
 
 # ```julia
+# set_sql_mode(databricks())
 # instance_id = "string_id"
 # token = "string_token"
 # warehouse_id = "e673cd4f387f964a"
@@ -26,18 +27,18 @@
 # end
 # ```
 # ```
-#  32×2 DataFrame
+# 32×2 DataFrame
 #  Row │ wt       test    
-# │ Float64  Float64 
+#      │ Float64  Float64 
 # ─────┼──────────────────
-# 1 │   2.62     5.24
-# 2 │   2.875    5.75
-# 3 │   2.32     4.64
-# 4 │   3.215    6.43
-# ⋮  │    ⋮        ⋮
-# 29 │   3.17     6.34
-# 30 │   2.77     5.54
-# 31 │   3.57     7.14
-# 32 │   2.78     5.56
-#      24 rows omitted
+#    1 │   2.62     5.24
+#    2 │   2.875    5.75
+#    3 │   2.32     4.64
+#    4 │   3.215    6.43
+#   ⋮  │    ⋮        ⋮
+#   29 │   3.17     6.34
+#   30 │   2.77     5.54
+#   31 │   3.57     7.14
+#   32 │   2.78     5.56
+#          24 rows omitted
 # ```
diff --git a/docs/examples/UserGuide/from_queryex.jl b/docs/examples/UserGuide/from_queryex.jl
@@ -2,13 +2,13 @@
 
 # ```julia
 # import TidierDB as DB
-# con = DB.connect(:duckdb)
-# DB.copy_to(con, "https://gist.githubusercontent.com/seankross/a412dfbd88b3db70b74b/raw/5f23f993cd87c283ce766e7ac6b329ee7cc2e1d1/mtcars.csv", "mtcars2")
+# con = DB.connect(duckdb())
+# mtcars_path = "https://gist.githubusercontent.com/seankross/a412dfbd88b3db70b74b/raw/5f23f993cd87c283ce766e7ac6b329ee7cc2e1d1/mtcars.csv"
 # ```
 
 # Start a query to analyze fuel efficiency by number of cylinders. However, to further build on this query later, end the chain without using `@show_query` or `@collect`
 # ```julia
-# query = DB.@chain DB.db_table(con, :mtcars2) begin
+# query = DB.@chain DB.db_table(con, mtcars_path) begin
 #     DB.@group_by cyl
 #     DB.@summarize begin
 #         across(mpg, (mean, minimum, maximum))

diff --git a/docs/examples/UserGuide/getting_started.jl b/docs/examples/UserGuide/getting_started.jl
@@ -9,26 +9,27 @@
 
 # Alternatively, `using Tidier` will import TidierDB in the above manner for you, where TidierDB functions and macros will be available as `DB.@mutate()` and so on, and the TidierData equivalent would be `@mutate()`.
 
-# There are two ways to connect to the database. You can use `connect` without any need to load any additional packages. However, Oracle and Athena do not support this method yet and will require you to load in ODBC.jl or AWS.jl respectively.
+# To connect to a database, you can use the `connect` function as shown below, or establish your own connection through the respective libraries.
 
 # For example
 # Connecting to MySQL
 # ```julia
-# conn = connect(:mysql; host="localhost", user="root", password="password", db="mydb")
+# conn = connect(mysql(); host="localhost", user="root", password="password", db="mydb")
 # ```
 # versus connecting to DuckDB
 # ```julia
-# conn = connect(:duckdb)
+# conn = connect(duckdb())
 # ```
 
-# Alternatively, you can use the packages outlined below to establish a connection through their respective methods.
+# ## Package Extensions 
+# The following backends use package extensions. To use one of the backends listed below, first load its library with `using Library`:
 
-# - ClickHouse: ClickHouse.jl
-# - MySQL and MariaDB: MySQL.jl
-# - MSSQL: ODBC.jl 
-# - Postgres: LibPQ.jl
-# - SQLite: SQLite.jl
-# - Athena: AWS.jl
-# - Oracle: ODBC.jl 
+# - ClickHouse: `using ClickHouse`
+# - MySQL and MariaDB: `using MySQL`
+# - MSSQL: `using ODBC` 
+# - Postgres: `using LibPQ`
+# - SQLite: `using SQLite`
+# - Athena: `using AWS`
+# - Oracle: `using ODBC` 
+# - Google BigQuery: `using GoogleCloud`
 
-# For DuckDB, SQLite, and MySQL, `copy_to()` lets you copy data to the database and query there. ClickHouse, MSSQL, and Postgres support for `copy_to()` has not been added yet.
diff --git a/docs/examples/UserGuide/key_differences.jl b/docs/examples/UserGuide/key_differences.jl
@@ -11,7 +11,7 @@ df = DataFrame(id = [string('A' + i ÷ 26, 'A' + i % 26) for i in 0:9],
                         value = repeat(1:5, 2), 
                         percent = 0.1:0.1:1.0);
 
-db = connect(:duckdb);
+db = connect(duckdb());
 
 copy_to(db, df, "df_mem"); # copying over the data frame to an in-memory database
 

diff --git a/docs/examples/UserGuide/s3viaduckdb.jl b/docs/examples/UserGuide/s3viaduckdb.jl
@@ -8,10 +8,10 @@
 # Using TidierDB
 # 
 # #Connect to Google Cloud via DuckDB
-# #google_db = connect(:duckdb, :gbq, access_key="string", secret_key="string")
+# #google_db = connect(duckdb(), :gbq, access_key="string", secret_key="string")
 
 # #Connect to AWS via DuckDB
-# aws_db = connect(:duckdb, :aws, aws_access_key_id= "string", 
+# aws_db = connect(duckdb(), :aws, aws_access_key_id= "string", 
 #                                 aws_secret_access_key= "string", 
 #                                 aws_region="us-east-1")
 # s3_csv_path = "s3://path/to_data.csv"

diff --git a/docs/src/index.md b/docs/src/index.md
@@ -8,19 +8,19 @@ The main goal of TidierDB.jl is to bring the syntax of Tidier.jl to multiple SQL
 
 ## Currently supported backends include:
 
-- DuckDB (the default) `set_sql_mode(:duckdb)`
-- ClickHouse `set_sql_mode(:clickhouse)`
-- SQLite `set_sql_mode(:lite)`
-- MySQL and MariaDB `set_sql_mode(:mysql)`
-- MSSQL `set_sql_mode(:mssql)`
-- Postgres `set_sql_mode(:postgres)`
-- Athena `set_sql_mode(:athena)`
-- Snowflake `set_sql_mode(:snowflake)`
-- Google Big Query `set_sql_mode(:gbq)`
-- Oracle `set_sql_mode(:oracle)`
-- Databricks 
-
-The style of SQL that is generated can be modified using `set_sql_mode()`.
+- DuckDB (the default) `duckdb()`
+- ClickHouse `clickhouse()`
+- SQLite `sqlite()`
+- MySQL and MariaDB `mysql()`
+- MSSQL `mssql()`
+- Postgres `postgres()`
+- Athena `athena()`
+- Snowflake `snowflake()`
+- Google Big Query `gbq()`
+- Oracle `oracle()`
+- Databricks `databricks()`
+
+Change the backend using `set_sql_mode()`, for example `set_sql_mode(databricks())`.
 
 ## Installation
 
@@ -89,10 +89,10 @@ Even though the code reads similarly to TidierData, note that no computational w
 using TidierData
 import TidierDB as DB
 
-db = DB.connect(:duckdb);
-path = "https://gist.githubusercontent.com/seankross/a412dfbd88b3db70b74b/raw/5f23f993cd87c283ce766e7ac6b329ee7cc2e1d1/mtcars.csv"
+db = DB.connect(duckdb());
+path_or_name = "https://gist.githubusercontent.com/seankross/a412dfbd88b3db70b74b/raw/5f23f993cd87c283ce766e7ac6b329ee7cc2e1d1/mtcars.csv"
 
-@chain DB.db_table(db, path) begin
+@chain DB.db_table(db, path_or_name) begin
     DB.@filter(!starts_with(model, "M"))
     DB.@group_by(cyl)
     DB.@summarize(mpg = mean(mpg))
@@ -122,7 +122,7 @@ end
 We cannot do this using TidierDB. However, we can call `@pivot_longer()` from TidierData *after* the result of the query has been instantiated as a DataFrame, like this: 
 
 ```julia
-@chain DB.db_table(db, :mtcars) begin
+@chain DB.db_table(db, path_or_name) begin
     DB.@filter(!starts_with(model, "M"))
     DB.@group_by(cyl)
     DB.@summarize(mpg = mean(mpg))
@@ -161,7 +161,7 @@ end
 We can replace `DB.@collect` with `DB.@show_query` to reveal the underlying SQL query being generated by TidierDB. To handle complex queries, TidierDB makes heavy use of Common Table Expressions (CTEs), which are a useful tool to organize long queries.
 
 ```julia
-@chain DB.db_table(db, :mtcars) begin
+@chain DB.db_table(db, path_or_name) begin
     DB.@filter(!starts_with(model, "M"))
     DB.@group_by(cyl)
     DB.@summarize(mpg = mean(mpg))
@@ -201,7 +201,7 @@ SELECT *
 ## TidierDB is already quite fully-featured, supporting advanced TidierData functions like `across()` for multi-column selection.
 
 ```julia
-@chain DB.db_table(db, :mtcars) begin
+@chain DB.db_table(db, path_or_name) begin
     DB.@group_by(cyl)
     DB.@summarize(across((starts_with("a"), ends_with("s")), (mean, sum)))
     DB.@collect