Skip to content

Commit

Permalink
Replace NonAggregate with something less broken
Browse files Browse the repository at this point in the history
This commit implements the proposal laid out at
https://discourse.diesel.rs/t/proposed-change-replace-nonaggregate-with-something-less-broken-that-can-support-group-by/18.

The changes made in this commit now correctly enforce semantics around
the mixing of aggregate/non-aggregate expressions, and lays the
foundations required for full `GROUP BY` support in the future. This
commit does not implement `GROUP BY` as a feature in public API, though
I have added some minimal support to prove the soundness of the change.

Since this will likely be the largest breaking change in 2.0, I am going
to include as much detail as I can about the problem, the reasoning
behind the solution, and the likely impact of the change.

Recap of the problem
----

`NonAggregate` was originally introduced in
c66d96f. The goal was to prevent the
following error at compile time:

   [local] sean@sean=# select count(*), name from users;
   ERROR:  42803: column "users.name" must appear in the GROUP BY clause or be used in an aggregate function

I didn't think about this nearly as much as I should have at the time,
which resulted in a hilariously broken implementation that has prevented
the addition of `group_by` support, and left bugs in the codebase for
more than 4 years.

At the time I was focused on "mixing aggregate and non-aggregate
expressions in a select statement". Since select uses tuples to
represent multiple values, I added a trait to represent non-aggregate
values, added it as a bound for `impl Expression for AnyTuple`, and
called it a day. This had a number of problems.

The most obvious was that it prevented valid queries involving multiple
aggregate expressions. At the time I figured we'd have a separate
trait for aggregate expressions, and then `impl Expression for AnyTuple
where AllFields: NonAggregate | AllFields: Aggregate`. This eventually
led to [RFC #1268](rust-lang/rfcs#1268), which
doesn't seem likely to be stabilized soon, and even if it were we still
have the other issues with this solution.

The next issue is that you cannot say whether a given expression is
aggregate by looking at it on its own. The answer to "Is `users`.`name`
aggregate?" depends on whether or not that column appears in the group
by clause. So any trait which correctly handles this must be generic
over the group by clause, or it cannot be correctly implemented for
columns.

However, the most egregious issue with that commit is that it does not
handle *any* composite expressions besides tuples. Even today,
`.select((count_star(), id))` fails, but `.select(count_star() + id)`
will compile (and either error at runtime or give non-deterministic
results depending on your backend).

User Impact
----

This change will unfortunately have more breakage than I had
anticipated. That said, the breakage only affects *extremely* generic
code, and I do not believe that the majority of our users will notice or
care about this change.

There are three main ways in which the breakage will affect users:

- The additional bound on `SelectStatement<...>: SelectDsl` and
  `SelectStatement<...>: Query`.
  - This one broke our test suite in a few places, mainly where we have
    *really* generic code to say "I can select T.eq(sql_string)". I do
    not believe this is representative of real code.
  - I did a GitHub-wide search of all occurances of
    `SelectableExpression` (which is *not* the bound on the public API,
    but is the bound on `SelectStatement`'s impl, and would point to
    broken code). I found one repo which is relying on internals that
    will break, and a lot of results from Wundergraph. I didnt' look at
    those. This probably breaks Wundergraph. Sorry, Georg. It should be
    easy to fix I promise.
- `NonAggregate` can no longer be directly implemented
  - I did a GitHub-wide search for `NonAggregate`. With one exception,
    every repo implementing this trait directly was on pre-1.0. The only
    affected repo is manually implementing `Expression` instead of using
    `#[derive(AsExpression)]`. With that change they will be
    future-proofed.
- `T: NonAggregate` no longer implies `(OtherType, T): NonAggregate`
  - This broke `infer_schema`, since it was relying on `AssocType:
    NonAggregate` to know `(known_column, AssocType, known_column):
    Expression`. Ironically, that's no longer important, it did still
    break due to the first item on this list. I could not find any code
    in the wild that fell into this category.

Originally I thought that the only code which would be affected by this
is code that wrote `impl NonAggregate`, since that code would need to
change to `impl ValidGrouping`. However, I missed a few things when I
made that assessment.

The first is that... Well the feature exists. The whole point of this
was to prevent `aggregate + non_aggregate` from compiling when passed to
select, which implies a new trait bound *somewhere*. While
`SelectStatement` and its impls are private API, it's really easy to
couple yourself ot the bounds on those impls. It doesn't help that
`rustc` literally recommends that folks do that when errors occur. Any
code that is written as `Foo: SelectDsl<Selection>` will be fine, but
any code that's gone down the rabbit hole and has copied the bounds from
`impl SelectDsl for SelectStatement<...>` will break. I didn't find many
cases in the wild, but I do think it's relatively common.

The second thing I missed is that "is this aggregate" is not a binary
question. Originally I expected we would have two answers to the
question, and compound expressions would enforce that their
sub-expressions all had the same answer. The issue is that `1` doesn't
care. You can have `count(*) + 1`, and you can also have `non_aggregate
+ 1`. `1` is always valid, regardless of aggregation. So we need three
answers. The problem is that this means we can't have `T: NonAggregate`
mean everything it used to.

On stable `T: NonAggregate` will mean `T: ValidGrouping<()>`, and that
`T` can be passed to everything with a `NonAggregate` bound (`filter`,
`order`, etc). On nightly, it also implies `T::IsAggregate:
MixedAggregates<is_aggregate::No, Output = is_aggregate::No>`. In
English, it implies that this is valid with no group by clause, and its
output can appear with an expression which is not aggregate (but might
be with a different group by clause). The outcome of this is that `T:
NonAggregate` implies `(T, Other): NonAggregate`. However the missing
link from both stable and unstable is `is_aggregate::No:
MixedAggregates<T::IsAggregate>` being implied, which would mean
`(Other, T): NonAggregate` would be implied.

Ultimately this is a really long-winded way of saying that `T:
NonAggregate` used to imply interactions with other types. That is no
longer consistently true. It's unlikely there are many affected users,
but any that are affected will need to directly have a `ValidGrouping`
bound.

Implementation Notes
---

Because this broke diesel_infer_schema, I had to do a version bump to
get that crate to rely on the released 1.4.

There's a note on the unsoundness of the `NonAggregate` impl of
`Subselect`. I haven't changed the bounds on that impl, but we almost
certainly should address it before 2.0 is released.

I opted to validate the new bound in `SelectDsl`, so folks get errors on
`.select` instead of `.load`. We can't validate this on calls to both
`.select` *and* `.group_by`, since a select statement valid with a group
by is often invalid without one, and a group by clause usually makes the
default select clause invalid.

While this commit does not add proper group by support, I fleshed it out
a bit to test the type constraints. Right now a column only considers
itself valid if there is no group by clause, or if the group by clause
is exactly that column.

I had more to say but I went on a call and lost my train of thought. I
need to come back and finish writing this later.

Notable Limitations
---

`SELECT lower(name) FROM users GROUP BY lower(name)` probably can't be
represented.

Unanswered Questions
---

`BoxableExpression` has been limited to only work when there's no
group by clause, and only work with types which are `is_aggregate::No`.
This is probably not what we want to do, but was the smallest change to
start.
  • Loading branch information
sgrif committed Dec 18, 2019
1 parent 6e2d467 commit a0b428b
Show file tree
Hide file tree
Showing 51 changed files with 460 additions and 143 deletions.
6 changes: 3 additions & 3 deletions .travis.yml
Original file line number Diff line number Diff line change
Expand Up @@ -53,7 +53,7 @@ matrix:
env: RUSTFLAGS="--cap-lints=warn"
script:
- (cd diesel_compile_tests && cargo +$TRAVIS_RUST_VERSION test)
- rust: 1.36.0
- rust: 1.37.0
name: "Rustfmt && Clippy"
script:
- rustup component add rustfmt clippy
Expand All @@ -65,8 +65,8 @@ matrix:
- SQLITE_DATABASE_URL=/tmp/test.db
script:
- (cd diesel_cli && cargo +$TRAVIS_RUST_VERSION test --no-default-features --features "sqlite-bundled")
- rust: 1.36.0
name: "Minimal supported rust version == 1.36.0"
- rust: 1.37.0
name: "Minimal supported rust version == 1.37.0"
script:
- cargo +$TRAVIS_RUST_VERSION check --all

Expand Down
60 changes: 57 additions & 3 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -9,6 +9,7 @@ for Rust libraries in [RFC #1105](https://github.com/rust-lang/rfcs/blob/master/
### Added

* `NonAggregate` can now be derived for simple cases.

* `Connection` and `SimpleConnection` traits are implemented for a broader range
of `r2d2::PooledConnection<M>` types when the `r2d2` feature is enabled.

Expand All @@ -18,19 +19,27 @@ for Rust libraries in [RFC #1105](https://github.com/rust-lang/rfcs/blob/master/
* All expression methods can now be called on expressions of nullable types.

* Added `BoxedSqlQuery`. This allows users to do a variable amount of `.sql` or

`.bind` calls without changing the underlying type.

* Added `.sql` to `SqlQuery` and `UncheckedBind` to allow appending SQL code to

an existing query.

* Multiple aggregate expressions can now appear together in the same select
clause. See [the upgrade notes](#2-0-0-upgrade-non-aggregate) for details.

* `ValidGrouping` has been added to represent whether an expression is valid for
a given group by clause, and whether or not it's aggregate. It replaces the
functionality of `NonAggregate`. See [the upgrade
notes](#2-0-0-upgrade-non-aggregate) for details.

### Removed

* All previously deprecated items have been removed.

* `#[derive(NonAggregate)]` has been deprecated in favor of
`#[derive(ValidGrouping)]`. The `NonAggregate` trait is now a trait alias, and
cannot be directly implemented. Both derives generate identical code.

### Changed

* The way [the `Backend` trait][backend-2-0-0] handles its `RawValue` type has
Expand All @@ -55,6 +64,20 @@ for Rust libraries in [RFC #1105](https://github.com/rust-lang/rfcs/blob/master/
For the postgres backend additionally type information where added to the `RawValue`
type. This allows to dynamically deserialize `RawValues` in container types.

* The handling of mixed aggregate values is more robust. Invalid queries such as
`.select(max(id) + other_column)` are now correctly rejected, and valid
queries such as `.select((count_star(), max(other_column)))` are now correctly
accepted. For more details, see [the upgrade notes](#2-0-0-upgrade-non-aggregate).

* `NonAggregate` is now a trait alias for `ValidGrouping<()>` for expressions
that are not aggregate. On stable this is a normal trait with a blanket impl,
but it should never be implemented directly. With the `unstable` feature, it
will use trait aliases which prevent manual implementations.

Due to language limitations, we cannot make the new trait alias by itself
represent everything it used to, so in some rare cases code changes may be
required. See [the upgrade notes](#2-0-0-upgrade-non-aggregate) for details.

### Fixed

* Many types were incorrectly considered non-aggregate when they should not
Expand All @@ -67,7 +90,38 @@ for Rust libraries in [RFC #1105](https://github.com/rust-lang/rfcs/blob/master/
are now available without the `diesel_` prefix. With Rust 2018 they can be
invoked as `diesel::infix_operator!` instead.


### Upgrade Notes

#### Replacement of `NonAggregate` with `ValidGrouping`
<a name="2-0-0-upgrade-non-aggregate"></a>

FIXME: This should probably be on the website, but I wanted to document it in
the PR adding the changes.

Key points:

- Rules for aggregation are now correctly enforced. They match the semantics of
PG or MySQL with `ONLY_FULL_GROUP_BY` enabled.
- As before, `sql` is the escape hatch if needed.
- MySQL users can use `ANY_VALUE`, PG users can use `DISTINCT ON`. Also
consider using max/min/etc to get deterministic values.
- Any `impl NonAggregate` must be replaced with `impl ValidGrouping`
- If you were deriving before you can still derive.
- For most code, `T: NonAggregate` should continue to work. Unless you're
getting a compiler error, you most likely don't need to change it.
- The full equivalent of what `T: NonAggregate` used to mean is:

where
T: ValidGrouping<()>,
T::IsAggregate: MixedGrouping<is_aggregate::No, Output = is_aggregate::No>,
is_aggreagte::No: MixedGrouping<T::IsAggregate, Output = is_aggreagte::No>,

- With `feature = "unstable"`, `T: NonAggregate` implies the first two bounds,
but not the third. On stable only the first bound is implied. This is a
language limitation.
- `T: NonAggregate` can still be passed everywhere it could before, but `T:
NonAggregate` no longer implies `(OtherType, T): NonAggregate`.
- With `feature = "unstable"`, `(T, OtherType): NonAggregate` is still implied.

[2-0-migration]: FIXME write a migration guide

Expand Down
2 changes: 0 additions & 2 deletions Cargo.toml
Original file line number Diff line number Diff line change
Expand Up @@ -26,8 +26,6 @@ members = [
]

[replace]
"diesel:1.4.3" = { path = "diesel" }
"diesel_derives:1.4.1" = { path = "diesel_derives" }
"diesel_migrations:1.4.0" = { path = "diesel_migrations" }
"migrations_internals:1.4.0" = { path = "diesel_migrations/migrations_internals" }
"migrations_macros:1.4.1" = { path = "diesel_migrations/migrations_macros" }
4 changes: 2 additions & 2 deletions diesel/Cargo.toml
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
[package]
name = "diesel"
version = "1.4.3"
version = "2.0.0"
authors = ["Sean Griffin <sean@seantheprogrammer.com>"]
license = "MIT OR Apache-2.0"
description = "A safe, extensible ORM and Query Builder for PostgreSQL, SQLite, and MySQL"
Expand All @@ -13,7 +13,7 @@ categories = ["database"]

[dependencies]
byteorder = "1.0"
diesel_derives = "~1.4.0"
diesel_derives = { version = "~2.0.0", path = "../diesel_derives" }
chrono = { version = "0.4", optional = true }
libc = { version = "0.2.0", optional = true }
libsqlite3-sys = { version = ">=0.8.0, <0.17.0", optional = true, features = ["min_sqlite_version_3_7_16"] }
Expand Down
6 changes: 3 additions & 3 deletions diesel/src/expression/array_comparison.rs
Original file line number Diff line number Diff line change
Expand Up @@ -5,13 +5,13 @@ use query_builder::*;
use result::QueryResult;
use sql_types::Bool;

#[derive(Debug, Copy, Clone, QueryId, NonAggregate)]
#[derive(Debug, Copy, Clone, QueryId, ValidGrouping)]
pub struct In<T, U> {
left: T,
values: U,
}

#[derive(Debug, Copy, Clone, QueryId, NonAggregate)]
#[derive(Debug, Copy, Clone, QueryId, ValidGrouping)]
pub struct NotIn<T, U> {
left: T,
values: U,
Expand Down Expand Up @@ -140,7 +140,7 @@ where
}
}

#[derive(Debug, Clone, NonAggregate)]
#[derive(Debug, Clone, ValidGrouping)]
pub struct Many<T>(Vec<T>);

impl<T: Expression> Expression for Many<T> {
Expand Down
4 changes: 3 additions & 1 deletion diesel/src/expression/bound.rs
Original file line number Diff line number Diff line change
Expand Up @@ -47,4 +47,6 @@ impl<T, U, QS> SelectableExpression<QS> for Bound<T, U> where Bound<T, U>: Appea

impl<T, U, QS> AppearsOnTable<QS> for Bound<T, U> where Bound<T, U>: Expression {}

impl<T, U> NonAggregate for Bound<T, U> {}
impl<T, U, GB> ValidGrouping<GB> for Bound<T, U> {
type IsAggregate = is_aggregate::Never;
}
6 changes: 5 additions & 1 deletion diesel/src/expression/coerce.rs
Original file line number Diff line number Diff line change
Expand Up @@ -53,4 +53,8 @@ where
}
}

impl<T, ST> NonAggregate for Coerce<T, ST> where T: NonAggregate {}
impl<T, ST, GB> ValidGrouping<GB> for Coerce<T, ST>
where T: ValidGrouping<GB>,
{
type IsAggregate = T::IsAggregate;
}
6 changes: 5 additions & 1 deletion diesel/src/expression/count.rs
Original file line number Diff line number Diff line change
@@ -1,4 +1,4 @@
use super::Expression;
use super::{Expression, ValidGrouping, is_aggregate};
use backend::Backend;
use query_builder::*;
use result::QueryResult;
Expand Down Expand Up @@ -63,6 +63,10 @@ impl Expression for CountStar {
type SqlType = BigInt;
}

impl<GB> ValidGrouping<GB> for CountStar {
type IsAggregate = is_aggregate::Yes;
}

impl<DB: Backend> QueryFragment<DB> for CountStar {
fn walk_ast(&self, mut out: AstPass<DB>) -> QueryResult<()> {
out.push_sql("COUNT(*)");
Expand Down
9 changes: 7 additions & 2 deletions diesel/src/expression/exists.rs
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
use backend::Backend;
use expression::subselect::Subselect;
use expression::{AppearsOnTable, Expression, NonAggregate, SelectableExpression};
use expression::{AppearsOnTable, Expression, ValidGrouping, SelectableExpression};
use query_builder::*;
use result::QueryResult;
use sql_types::Bool;
Expand Down Expand Up @@ -43,7 +43,12 @@ where
type SqlType = Bool;
}

impl<T> NonAggregate for Exists<T> where Subselect<T, ()>: NonAggregate {}
impl<T, GB> ValidGrouping<GB> for Exists<T>
where
Subselect<T, ()>: ValidGrouping<GB>,
{
type IsAggregate = <Subselect<T, ()> as ValidGrouping<GB>>::IsAggregate;
}

#[cfg(not(feature = "unstable"))]
impl<T, DB> QueryFragment<DB> for Exists<T>
Expand Down
2 changes: 1 addition & 1 deletion diesel/src/expression/functions/date_and_time.rs
Original file line number Diff line number Diff line change
Expand Up @@ -8,7 +8,7 @@ use sql_types::*;
/// Represents the SQL `CURRENT_TIMESTAMP` constant. This is equivalent to the
/// `NOW()` function on backends that support it.
#[allow(non_camel_case_types)]
#[derive(Debug, Copy, Clone, QueryId, NonAggregate)]
#[derive(Debug, Copy, Clone, QueryId, ValidGrouping)]
pub struct now;

impl Expression for now {
Expand Down
4 changes: 2 additions & 2 deletions diesel/src/expression/functions/mod.rs
Original file line number Diff line number Diff line change
Expand Up @@ -90,7 +90,7 @@
/// are:
///
/// - `#[aggregate]`
/// - Indicates that this is an aggregate function, and that `NonAggregate`
/// - Indicates that this is an aggregate function, and that `ValidGrouping`
/// should not be implemented.
/// - `#[sql_name="name"]`
/// - The SQL to be generated is different than the Rust name of the function.
Expand Down Expand Up @@ -174,7 +174,7 @@ macro_rules! no_arg_sql_function_body_except_to_sql {
($type_name:ident, $return_type:ty, $docs:expr) => {
#[allow(non_camel_case_types)]
#[doc=$docs]
#[derive(Debug, Clone, Copy, QueryId, NonAggregate)]
#[derive(Debug, Clone, Copy, QueryId, ValidGrouping)]
pub struct $type_name;

impl $crate::expression::Expression for $type_name {
Expand Down
2 changes: 1 addition & 1 deletion diesel/src/expression/grouped.rs
Original file line number Diff line number Diff line change
Expand Up @@ -3,7 +3,7 @@ use expression::Expression;
use query_builder::*;
use result::QueryResult;

#[derive(Debug, Copy, Clone, QueryId, Default, DieselNumericOps, NonAggregate)]
#[derive(Debug, Copy, Clone, QueryId, Default, DieselNumericOps, ValidGrouping)]
pub struct Grouped<T>(pub T);

impl<T: Expression> Expression for Grouped<T> {
Expand Down
Loading

0 comments on commit a0b428b

Please sign in to comment.