Merge pull request #46 from TidierOrg/pkgexts
Moves to use of package extensions.
drizk1 committed Jul 25, 2024
2 parents c292ba5 + ba209a2 commit 8c98381
Showing 27 changed files with 707 additions and 517 deletions.
6 changes: 6 additions & 0 deletions NEWS.md
@@ -1,5 +1,11 @@
# TidierDB.jl updates

## v0.3.0 - 2024-07-25
- Introduces package extensions for:
  - Postgres, ClickHouse, MySQL, MSSQL, SQLite, Oracle, Athena, and Google BigQuery
  - [Documentation](https://tidierorg.github.io/TidierDB.jl/latest/examples/generated/UserGuide/getting_started/) updated for using these backends.
- Changes `set_sql_mode()` to use types, not symbols (i.e., `set_sql_mode(snowflake())`, not `set_sql_mode(:snowflake)`)

## v0.2.4 - 2024-07-12
- Switches DuckDB to version 1.0
- Adds support for `iceberg` tables via DuckDB to read iceberg paths in `db_table` when `iceberg = true`
30 changes: 24 additions & 6 deletions Project.toml
@@ -1,25 +1,36 @@
name = "TidierDB"
uuid = "86993f9b-bbba-4084-97c5-ee15961ad48b"
authors = ["Daniel Rizk <rizk.daniel.12@gmail.com> and contributors"]
version = "0.2.4"
version = "0.3.0"

[deps]
AWS = "fbe9abb3-538b-5e4e-ba9e-bc94f4f92ebc"
Arrow = "69666777-d1a9-59fb-9406-91d4454c9d45"
Chain = "8be319e6-bccf-4806-a6f7-6fae938471bc"
ClickHouse = "82f2e89e-b495-11e9-1d9d-fb40d7cf2130"
DataFrames = "a93c6f00-e57d-5684-b7b6-d8193f3e46c0"
DuckDB = "d2f5444f-75bc-4fdf-ac35-56f514c445e1"
GZip = "92fee26a-97fe-5a0c-ad85-20a5f3185b63"
GoogleCloud = "55e21f81-8b0a-565e-b5ad-6816892a5ee7"
HTTP = "cd3eb016-35fb-5094-929b-558a96fad6f3"
JSON3 = "0f8b85d8-7281-11e9-16c2-39a750bddbf1"
LibPQ = "194296ae-ab2e-5f79-8cd4-7183a0a5a0d1"
MacroTools = "1914dd2f-81c6-5fcd-8719-6d5c9610ff09"
MySQL = "39abe10b-433b-5dbd-92d4-e302a9df00cd"
ODBC = "be6f12e9-ca4f-5eb2-a339-a4f995cc0291"
Reexport = "189a3867-3050-52da-a836-e630ba90ab69"

[weakdeps]
SQLite = "0aa819cd-b072-5ff4-a722-6bc24af294d9"
LibPQ = "194296ae-ab2e-5f79-8cd4-7183a0a5a0d1"
GoogleCloud = "55e21f81-8b0a-565e-b5ad-6816892a5ee7"
AWS = "fbe9abb3-538b-5e4e-ba9e-bc94f4f92ebc"
MySQL = "39abe10b-433b-5dbd-92d4-e302a9df00cd"
ClickHouse = "82f2e89e-b495-11e9-1d9d-fb40d7cf2130"


[extensions]
SQLiteExt = "SQLite"
LibPQExt = "LibPQ"
GBQExt = "GoogleCloud"
AWSExt = "AWS"
MySQLExt = "MySQL"
CHExt = "ClickHouse"

[compat]
AWS = "1.9"
@@ -44,6 +55,13 @@ julia = "1.9"
[extras]
Documenter = "e30172f5-a6a5-5a46-863b-614d45cd2de4"
Test = "8dfed614-e22c-5e08-85e1-65c5234f0b40"
SQLite = "0aa819cd-b072-5ff4-a722-6bc24af294d9"
LibPQ = "194296ae-ab2e-5f79-8cd4-7183a0a5a0d1"
GoogleCloud = "55e21f81-8b0a-565e-b5ad-6816892a5ee7"
AWS = "fbe9abb3-538b-5e4e-ba9e-bc94f4f92ebc"
MySQL = "39abe10b-433b-5dbd-92d4-e302a9df00cd"
ClickHouse = "82f2e89e-b495-11e9-1d9d-fb40d7cf2130"


[targets]
test = ["Documenter", "Test"]
38 changes: 19 additions & 19 deletions README.md
@@ -14,19 +14,19 @@ The main goal of TidierDB.jl is to bring the syntax of Tidier.jl to multiple SQL

## Currently supported backends include:

- DuckDB (the default) `set_sql_mode(:duckdb)`
- ClickHouse `set_sql_mode(:clickhouse)`
- SQLite `set_sql_mode(:lite)`
- MySQL and MariaDB `set_sql_mode(:mysql)`
- MSSQL `set_sql_mode(:mssql)`
- Postgres `set_sql_mode(:postgres)`
- Athena `set_sql_mode(:athena)`
- Snowflake `set_sql_mode(:snowflake)`
- Google Big Query `set_sql_mode(:gbq)`
- Oracle `set_sql_mode(:oracle)`
- Databricks

The style of SQL that is generated can be modified using `set_sql_mode()`.
- DuckDB (the default) `duckdb()`
- ClickHouse `clickhouse()`
- SQLite `sqlite()`
- MySQL and MariaDB `mysql()`
- MSSQL `mssql()`
- Postgres `postgres()`
- Athena `athena()`
- Snowflake `snowflake()`
- Google Big Query `gbq()`
- Oracle `oracle()`
- Databricks `databricks()`

Change the backend using `set_sql_mode()`, for example `set_sql_mode(databricks())`.
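
Because backends are now passed as types rather than symbols, dialect-specific behavior can be selected by multiple dispatch. The following is a rough, self-contained sketch of that pattern, not TidierDB's implementation: the names `SQLBackend`, `PostgresMode`, `MySQLMode`, `set_mode!`, and `quote_identifier` are hypothetical and exist only for illustration.

```julia
# Hypothetical backend types for illustration only; TidierDB's own
# definitions (duckdb(), postgres(), ...) may be implemented differently.
abstract type SQLBackend end
struct PostgresMode <: SQLBackend end
struct MySQLMode <: SQLBackend end

# Holds the active backend, playing the role that `set_sql_mode()` plays above.
const CURRENT_BACKEND = Ref{SQLBackend}(PostgresMode())

# Only SQLBackend values are accepted; a Symbol such as :mysql raises a MethodError.
set_mode!(backend::SQLBackend) = (CURRENT_BACKEND[] = backend)

# One method per dialect replaces `if mode == :mysql ... elseif ...` branches.
quote_identifier(::PostgresMode, name::AbstractString) = "\"" * name * "\""
quote_identifier(::MySQLMode, name::AbstractString) = "`" * name * "`"

set_mode!(MySQLMode())
println(quote_identifier(CURRENT_BACKEND[], "cyl"))  # prints `cyl`
```

With this design, a misspelled backend name fails immediately as an undefined function or a `MethodError`, whereas a misspelled symbol would only be caught if the code validated it explicitly.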

## Installation

@@ -95,10 +95,10 @@ Even though the code reads similarly to TidierData, note that no computational w
using TidierData
import TidierDB as DB

db = DB.connect(:duckdb);
path = "https://gist.githubusercontent.com/seankross/a412dfbd88b3db70b74b/raw/5f23f993cd87c283ce766e7ac6b329ee7cc2e1d1/mtcars.csv"
db = DB.connect(duckdb());
path_or_name = "https://gist.githubusercontent.com/seankross/a412dfbd88b3db70b74b/raw/5f23f993cd87c283ce766e7ac6b329ee7cc2e1d1/mtcars.csv"

@chain DB.db_table(db, path) begin
@chain DB.db_table(db, path_or_name) begin
DB.@filter(!starts_with(model, "M"))
DB.@group_by(cyl)
DB.@summarize(mpg = mean(mpg))
@@ -128,7 +128,7 @@ end
We cannot do this using TidierDB. However, we can call `@pivot_longer()` from TidierData *after* the result of the query has been instantiated as a DataFrame, like this:

```julia
@chain DB.db_table(db, :mtcars) begin
@chain DB.db_table(db, path_or_name) begin
DB.@filter(!starts_with(model, "M"))
DB.@group_by(cyl)
DB.@summarize(mpg = mean(mpg))
@@ -167,7 +167,7 @@ end
We can replace `DB.@collect` with `DB.@show_query` to reveal the underlying SQL query being generated by TidierDB. To handle complex queries, TidierDB makes heavy use of Common Table Expressions (CTEs), which are a useful tool to organize long queries.

```julia
@chain DB.db_table(db, :mtcars) begin
@chain DB.db_table(db, path_or_name) begin
DB.@filter(!starts_with(model, "M"))
DB.@group_by(cyl)
DB.@summarize(mpg = mean(mpg))
@@ -207,7 +207,7 @@ SELECT *
## TidierDB is already quite fully-featured, supporting advanced TidierData functions like `across()` for multi-column selection.

```julia
@chain DB.db_table(db, :mtcars) begin
@chain DB.db_table(db, path_or_name) begin
DB.@group_by(cyl)
DB.@summarize(across((starts_with("a"), ends_with("s")), (mean, sum)))
DB.@collect
1 change: 1 addition & 0 deletions docs/examples/UserGuide/Snowflake.jl
@@ -15,6 +15,7 @@
# - Allow you to build a SQL query and `@show_query` even if the OAuth_token has expired. To `@collect`, you will have to reconnect and rerun `db_table` if your OAuth token has expired

# ```julia
# set_sql_mode(snowflake())
# ac_id = "string_id"
# token = "OAuth_token_string"
# con = connect(:snowflake, ac_id, token, "DEMODB", "PUBLIC", "COMPUTE_WH")
2 changes: 1 addition & 1 deletion docs/examples/UserGuide/athena.jl
@@ -5,7 +5,7 @@

# ```julia
# using TidierDB, AWS
# set_sql_mode(:athena)
# set_sql_mode(athena())
# # Replace your credentials as needed below
# aws_access_key_id = get(ENV,"AWS_ACCESS_KEY_ID","key")
# aws_secret_access_key = get(ENV, "AWS_SECRET_ACCESS_KEY","secret_key")
25 changes: 13 additions & 12 deletions docs/examples/UserGuide/databricks.jl
@@ -12,6 +12,7 @@
# Since `db_table` runs a query to pull the metadata each time it is called, you may choose to run `db_table` once, save the results, and use them with `from_query()`. This will reduce the number of queries to your database and is illustrated below.

# ```julia
# set_sql_mode(databricks())
# instance_id = "string_id"
# token = "string_token"
# warehouse_id = "e673cd4f387f964a"
@@ -26,18 +27,18 @@
# end
# ```
# ```
# 32×2 DataFrame
#  Row │ wt       test
#      │ Float64  Float64
# ─────┼──────────────────
#    1 │   2.62     5.24
#    2 │   2.875    5.75
#    3 │   2.32     4.64
#    4 │   3.215    6.43
#   ⋮  │    ⋮        ⋮
#   29 │   3.17     6.34
#   30 │   2.77     5.54
#   31 │   3.57     7.14
#   32 │   2.78     5.56
#          24 rows omitted
# ```
6 changes: 3 additions & 3 deletions docs/examples/UserGuide/from_queryex.jl
@@ -2,13 +2,13 @@

# ```julia
# import TidierDB as DB
# con = DB.connect(:duckdb)
# DB.copy_to(con, "https://gist.githubusercontent.com/seankross/a412dfbd88b3db70b74b/raw/5f23f993cd87c283ce766e7ac6b329ee7cc2e1d1/mtcars.csv", "mtcars2")
# con = DB.connect(duckdb())
# mtcars_path = "https://gist.githubusercontent.com/seankross/a412dfbd88b3db70b74b/raw/5f23f993cd87c283ce766e7ac6b329ee7cc2e1d1/mtcars.csv"
# ```

# Start a query to analyze fuel efficiency by number of cylinders. However, to further build on this query later, end the chain without using `@show_query` or `@collect`
# ```julia
# query = DB.@chain DB.db_table(con, :mtcars2) begin
# query = DB.@chain DB.db_table(con, mtcars_path) begin
# DB.@group_by cyl
# DB.@summarize begin
# across(mpg, (mean, minimum, maximum))
25 changes: 13 additions & 12 deletions docs/examples/UserGuide/getting_started.jl
@@ -9,26 +9,27 @@

# Alternatively, `using Tidier` will import TidierDB in the above manner for you, where TidierDB functions and macros will be available as `DB.@mutate()` and so on, and the TidierData equivalent would be `@mutate()`.

# There are two ways to connect to the database. You can use `connect` without any need to load any additional packages. However, Oracle and Athena do not support this method yet and will require you to load in ODBC.jl or AWS.jl respectively.
# To connect to a database, you can use the `connect` function as shown below, or establish your own connection through the respective libraries.

# For example
# Connecting to MySQL
# ```julia
# conn = connect(:mysql; host="localhost", user="root", password="password", db="mydb")
# conn = connect(mysql(); host="localhost", user="root", password="password", db="mydb")
# ```
# versus connecting to DuckDB
# ```julia
# conn = connect(:duckdb)
# conn = connect(duckdb())
# ```

# Alternatively, you can use the packages outlined below to establish a connection through their respective methods.
# ## Package Extensions
# The following backends utilize package extensions. To use one of the backends listed below, you will need to load its library with `using Library`

# - ClickHouse: ClickHouse.jl
# - MySQL and MariaDB: MySQL.jl
# - MSSQL: ODBC.jl
# - Postgres: LibPQ.jl
# - SQLite: SQLite.jl
# - Athena: AWS.jl
# - Oracle: ODBC.jl
# - ClickHouse: `using ClickHouse`
# - MySQL and MariaDB: `using MySQL`
# - MSSQL: `using ODBC`
# - Postgres: `using LibPQ`
# - SQLite: `using SQLite`
# - Athena: `using AWS`
# - Oracle: `using ODBC`
# - Google BigQuery: `using GoogleCloud`
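
# As a rough way to confirm that an extension has loaded, the sketch below uses `Base.get_extension` (available in Julia 1.9 and later) with the `SQLiteExt` name declared in `Project.toml`. It assumes TidierDB and SQLite.jl are both installed and that nothing else in the session has already loaded SQLite.
# ```julia
# using TidierDB
# # The extension is not loaded until its trigger package is loaded.
# isnothing(Base.get_extension(TidierDB, :SQLiteExt))
# using SQLite
# # Once SQLite is loaded, Julia loads the extension module automatically.
# Base.get_extension(TidierDB, :SQLiteExt) isa Module
# ```
# The same check should apply to the other extension names listed under `[extensions]`, such as `LibPQExt` or `MySQLExt`, after loading their trigger packages.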

# For DuckDB, SQLite, and MySQL, `copy_to()` lets you copy data to the database and query there. ClickHouse, MSSQL, and Postgres support for `copy_to()` has not been added yet.
2 changes: 1 addition & 1 deletion docs/examples/UserGuide/key_differences.jl
@@ -11,7 +11,7 @@ df = DataFrame(id = [string('A' + i ÷ 26, 'A' + i % 26) for i in 0:9],
value = repeat(1:5, 2),
percent = 0.1:0.1:1.0);

db = connect(:duckdb);
db = connect(duckdb());

copy_to(db, df, "df_mem"); # copying over the data frame to an in-memory database

4 changes: 2 additions & 2 deletions docs/examples/UserGuide/s3viaduckdb.jl
@@ -8,10 +8,10 @@
# Using TidierDB
#
# #Connect to Google Cloud via DuckDB
# #google_db = connect(:duckdb, :gbq, access_key="string", secret_key="string")
# #google_db = connect(duckdb(), :gbq, access_key="string", secret_key="string")

# #Connect to AWS via DuckDB
# aws_db = connect(:duckdb, :aws, aws_access_key_id= "string",
# aws_db = connect(duckdb(), :aws, aws_access_key_id= "string",
# aws_secret_access_key= "string",
# aws_region="us-east-1")
# s3_csv_path = "s3://path/to_data.csv"
38 changes: 19 additions & 19 deletions docs/src/index.md
@@ -8,19 +8,19 @@ The main goal of TidierDB.jl is to bring the syntax of Tidier.jl to multiple SQL

## Currently supported backends include:

- DuckDB (the default) `set_sql_mode(:duckdb)`
- ClickHouse `set_sql_mode(:clickhouse)`
- SQLite `set_sql_mode(:lite)`
- MySQL and MariaDB `set_sql_mode(:mysql)`
- MSSQL `set_sql_mode(:mssql)`
- Postgres `set_sql_mode(:postgres)`
- Athena `set_sql_mode(:athena)`
- Snowflake `set_sql_mode(:snowflake)`
- Google Big Query `set_sql_mode(:gbq)`
- Oracle `set_sql_mode(:oracle)`
- Databricks

The style of SQL that is generated can be modified using `set_sql_mode()`.
- DuckDB (the default) `duckdb()`
- ClickHouse `clickhouse()`
- SQLite `sqlite()`
- MySQL and MariaDB `mysql()`
- MSSQL `mssql()`
- Postgres `postgres()`
- Athena `athena()`
- Snowflake `snowflake()`
- Google Big Query `gbq()`
- Oracle `oracle()`
- Databricks `databricks()`

Change the backend using `set_sql_mode()`, for example `set_sql_mode(databricks())`.

## Installation

@@ -89,10 +89,10 @@ Even though the code reads similarly to TidierData, note that no computational w
using TidierData
import TidierDB as DB

db = DB.connect(:duckdb);
path = "https://gist.githubusercontent.com/seankross/a412dfbd88b3db70b74b/raw/5f23f993cd87c283ce766e7ac6b329ee7cc2e1d1/mtcars.csv"
db = DB.connect(duckdb());
path_or_name = "https://gist.githubusercontent.com/seankross/a412dfbd88b3db70b74b/raw/5f23f993cd87c283ce766e7ac6b329ee7cc2e1d1/mtcars.csv"

@chain DB.db_table(db, path) begin
@chain DB.db_table(db, path_or_name) begin
DB.@filter(!starts_with(model, "M"))
DB.@group_by(cyl)
DB.@summarize(mpg = mean(mpg))
@@ -122,7 +122,7 @@ end
We cannot do this using TidierDB. However, we can call `@pivot_longer()` from TidierData *after* the result of the query has been instantiated as a DataFrame, like this:

```julia
@chain DB.db_table(db, :mtcars) begin
@chain DB.db_table(db, path_or_name) begin
DB.@filter(!starts_with(model, "M"))
DB.@group_by(cyl)
DB.@summarize(mpg = mean(mpg))
@@ -161,7 +161,7 @@ end
We can replace `DB.@collect` with `DB.@show_query` to reveal the underlying SQL query being generated by TidierDB. To handle complex queries, TidierDB makes heavy use of Common Table Expressions (CTEs), which are a useful tool to organize long queries.

```julia
@chain DB.db_table(db, :mtcars) begin
@chain DB.db_table(db, path_or_name) begin
DB.@filter(!starts_with(model, "M"))
DB.@group_by(cyl)
DB.@summarize(mpg = mean(mpg))
@@ -201,7 +201,7 @@ SELECT *
## TidierDB is already quite fully-featured, supporting advanced TidierData functions like `across()` for multi-column selection.

```julia
@chain DB.db_table(db, :mtcars) begin
@chain DB.db_table(db, path_or_name) begin
DB.@group_by(cyl)
DB.@summarize(across((starts_with("a"), ends_with("s")), (mean, sum)))
DB.@collect

2 comments on commit 8c98381

Member Author

@drizk1 drizk1 commented on 8c98381 Jul 25, 2024


@JuliaRegistrator register

Release notes:

  • Introduces package extensions for:
    • Postgres, ClickHouse, MySQL, MsSQL, SQLite, Oracle, Athena, and Google BigQuery
    • Documentation updated for using these backends.
  • Change set_sql_mode() to use types not symbols (ie set_sql_mode(snowflake()) not set_sql_mode(:snowflake))

@JuliaRegistrator

Registration pull request created: JuliaRegistries/General/111769

Tagging

After the above pull request is merged, it is recommended that a tag is created on this repository for the registered package version.

This will be done automatically if the Julia TagBot GitHub Action is installed, or can be done manually through the github interface, or via:

git tag -a v0.3.0 -m "<description of version>" 8c9838114c6bcbe11988a582fcd326f42d876e44
git push origin v0.3.0
