SQL: Implement FIRST/LAST aggregate functions #37936

Merged
merged 11 commits into from
Jan 31, 2019
198 changes: 198 additions & 0 deletions docs/reference/sql/functions/aggs.asciidoc
Original file line number Diff line number Diff line change
@@ -113,6 +113,196 @@ Returns the total number of _distinct non-null_ values in input values.
include-tagged::{sql-specs}/docs.csv-spec[aggCountDistinct]
--------------------------------------------------

[[sql-functions-aggs-first]]
===== `FIRST/FIRST_VALUE`
Member

Additionally please add an example for FIRST_VALUE.


.Synopsis:
[source, sql]
----------------------------------------------
FIRST(field_name<1>[, ordering_field_name]<2>)
----------------------------------------------

*Input*:

<1> target field for the aggregation
Contributor

"aggregation" refers to technical implementation in the background. My personal approach would be not to expose this, but try to explain what the field is used for from the end-user perspective. Again, personal preference.

Member

Not necessarily - for the aggregate function (aggregation is ES terminology, aggregate is SQL).

<2> optional field used for ordering

*Output*: same type as the input

.Description:

Returns the first **non-NULL** value (if one exists) of the `field_name` input column, sorted by
the `ordering_field_name` column. If `ordering_field_name` is not provided, only the `field_name`
column is used for sorting. For example, given the following table:

[cols="<,<"]
|===
s| a | b

| 100 | 1
| 200 | 1
| 1 | 2
| 2 | 2
| 10 | null
| 20 | null
| null | null
|===

[source, sql]
----------------------
SELECT FIRST(a) FROM t
----------------------

will result in:
[cols="<"]
|===
s| FIRST(a)
| 1
|===

and

[source, sql]
-------------------------
SELECT FIRST(a, b) FROM t
-------------------------

will result in:
[cols="<"]
|===
s| FIRST(a, b)
| 100
|===
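
The two results above can be reproduced with a small model in plain Python. This is only a sketch of the documented semantics, not how Elasticsearch SQL evaluates the function. The `first` helper and the `ROWS` list are hypothetical names, and two rules are assumptions chosen so that the output matches the tables: rows with a NULL ordering value sort last, and ties on `b` are broken by `a`.

```python
from typing import List, Optional, Tuple

# Each row is (a, b); None stands in for SQL NULL.
Row = Tuple[Optional[int], Optional[int]]

# Same rows as the example table for FIRST.
ROWS: List[Row] = [
    (100, 1), (200, 1), (1, 2), (2, 2), (10, None), (20, None), (None, None),
]


def first(rows: List[Row], order_by_b: bool = False) -> Optional[int]:
    """Model of FIRST(a) and FIRST(a, b) over rows of (a, b)."""
    # NULL target values are never returned, so drop them up front.
    candidates = [(a, b) for a, b in rows if a is not None]
    if not candidates:
        return None  # no non-NULL value exists

    if order_by_b:
        # Assumption: NULL ordering values sort last; ties on b fall back to a.
        def key(row: Row):
            a, b = row
            return (b is None, b if b is not None else 0, a)
    else:
        # Without an ordering column, the target column itself is the sort key.
        def key(row: Row):
            return row[0]

    return min(candidates, key=key)[0]


print(first(ROWS))                   # models FIRST(a)    -> 1
print(first(ROWS, order_by_b=True))  # models FIRST(a, b) -> 100
```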


["source","sql",subs="attributes,macros"]
-----------------------------------------------------------
include-tagged::{sql-specs}/docs.csv-spec[firstWithOneArg]
-----------------------------------------------------------

["source","sql",subs="attributes,macros"]
--------------------------------------------------------------------
include-tagged::{sql-specs}/docs.csv-spec[firstWithOneArgAndGroupBy]
--------------------------------------------------------------------

["source","sql",subs="attributes,macros"]
-----------------------------------------------------------
include-tagged::{sql-specs}/docs.csv-spec[firstWithTwoArgs]
-----------------------------------------------------------

["source","sql",subs="attributes,macros"]
---------------------------------------------------------------------
include-tagged::{sql-specs}/docs.csv-spec[firstWithTwoArgsAndGroupBy]
---------------------------------------------------------------------

`FIRST_VALUE` is a name alias and can be used instead of `FIRST`, e.g.:

["source","sql",subs="attributes,macros"]
--------------------------------------------------------------------------
include-tagged::{sql-specs}/docs.csv-spec[firstValueWithTwoArgsAndGroupBy]
--------------------------------------------------------------------------

[NOTE]
`FIRST` cannot be used in a `HAVING` clause.

[NOTE]
`FIRST` cannot be used with columns of type <<text, `text`>> unless
the field is also <<before-enabling-fielddata,saved as a keyword>>.

[[sql-functions-aggs-last]]
===== `LAST/LAST_VALUE`

.Synopsis:
[source, sql]
--------------------------------------------------
LAST(field_name<1>[, ordering_field_name]<2>)
--------------------------------------------------

*Input*:

<1> target field for the aggregation
<2> optional field used for ordering

*Output*: same type as the input

.Description:

It's the inverse of <<sql-functions-aggs-first>>. Returns the last **non-NULL** value (if one exists) of the
`field_name` input column, sorted descending by the `ordering_field_name` column. If `ordering_field_name` is not
provided, only the `field_name` column is used for sorting. For example, given the following table:

[cols="<,<"]
|===
s| a | b

| 10 | 1
| 20 | 1
| 1 | 2
| 2 | 2
| 100 | null
| 200 | null
| null | null
|===

[source, sql]
------------------------
SELECT LAST(a) FROM t
------------------------

will result in:
[cols="<"]
|===
s| LAST(a)
| 200
|===

and

[source, sql]
------------------------
SELECT LAST(a, b) FROM t
------------------------

will result in:
[cols="<"]
|===
s| LAST(a, b)
| 2
|===
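
As a cross-check of the two `LAST` results above, the same kind of plain-Python model can be used. It is a sketch under stated assumptions rather than the actual implementation. The `last` helper and `ROWS` are hypothetical names, and the treatment of NULL ordering values (ranked lowest, so they are picked only when no other row qualifies) and of ties on `b` (largest `a` wins) is assumed so that the output agrees with the tables.

```python
from typing import List, Optional, Tuple

# Each row is (a, b); None stands in for SQL NULL.
Row = Tuple[Optional[int], Optional[int]]

# Same rows as the example table for LAST.
ROWS: List[Row] = [
    (10, 1), (20, 1), (1, 2), (2, 2), (100, None), (200, None), (None, None),
]


def last(rows: List[Row], order_by_b: bool = False) -> Optional[int]:
    """Model of LAST(a) and LAST(a, b) over rows of (a, b)."""
    # NULL target values are never returned, so drop them up front.
    candidates = [(a, b) for a, b in rows if a is not None]
    if not candidates:
        return None  # no non-NULL value exists

    if order_by_b:
        # Assumption: rows with a NULL b rank lowest; ties on b fall back to a.
        def key(row: Row):
            a, b = row
            return (b is not None, b if b is not None else 0, a)
    else:
        # Without an ordering column, the target column itself is the sort key.
        def key(row: Row):
            return row[0]

    # Taking the maximum is equivalent to sorting descending and taking the top row.
    return max(candidates, key=key)[0]


print(last(ROWS))                   # models LAST(a)    -> 200
print(last(ROWS, order_by_b=True))  # models LAST(a, b) -> 2
```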


["source","sql",subs="attributes,macros"]
-----------------------------------------------------------
include-tagged::{sql-specs}/docs.csv-spec[lastWithOneArg]
-----------------------------------------------------------

["source","sql",subs="attributes,macros"]
-------------------------------------------------------------------
include-tagged::{sql-specs}/docs.csv-spec[lastWithOneArgAndGroupBy]
-------------------------------------------------------------------

["source","sql",subs="attributes,macros"]
-----------------------------------------------------------
include-tagged::{sql-specs}/docs.csv-spec[lastWithTwoArgs]
-----------------------------------------------------------

["source","sql",subs="attributes,macros"]
--------------------------------------------------------------------
include-tagged::{sql-specs}/docs.csv-spec[lastWithTwoArgsAndGroupBy]
--------------------------------------------------------------------

`LAST_VALUE` is a name alias and can be used instead of `LAST`, e.g.:

["source","sql",subs="attributes,macros"]
-------------------------------------------------------------------------
include-tagged::{sql-specs}/docs.csv-spec[lastValueWithTwoArgsAndGroupBy]
-------------------------------------------------------------------------

[NOTE]
`LAST` cannot be used in a `HAVING` clause.

[NOTE]
`LAST` cannot be used with columns of type <<text, `text`>> unless
the field is also <<before-enabling-fielddata,saved as a keyword>>.

[[sql-functions-aggs-max]]
===== `MAX`

@@ -137,6 +327,10 @@ Returns the maximum value across input values in the field `field_name`.
include-tagged::{sql-specs}/docs.csv-spec[aggMax]
--------------------------------------------------

[NOTE]
`MAX` on a field of type <<text, `text`>> or <<keyword, `keyword`>> is translated into
<<sql-functions-aggs-last>> and therefore cannot be used in a `HAVING` clause.

[[sql-functions-aggs-min]]
===== `MIN`

@@ -161,6 +355,10 @@ Returns the minimum value across input values in the field `field_name`.
include-tagged::{sql-specs}/docs.csv-spec[aggMin]
--------------------------------------------------

[NOTE]
`MIN` on a field of type <<text, `text`>> or <<keyword, `keyword`>> is translated into
<<sql-functions-aggs-first>> and therefore cannot be used in a `HAVING` clause.

[[sql-functions-aggs-sum]]
===== `SUM`

7 changes: 7 additions & 0 deletions docs/reference/sql/limitations.asciidoc
@@ -90,3 +90,10 @@ include-tagged::{sql-specs}/docs.csv-spec[limitationSubSelectRewritten]

But, if the sub-select would include a `GROUP BY` or `HAVING` or the enclosing `SELECT` would be more complex than `SELECT X
FROM (SELECT ...) WHERE [simple_condition]`, this is currently **un-supported**.

[float]
=== Using <<sql-functions-aggs-first, `FIRST`>>/<<sql-functions-aggs-last,`LAST`>> aggregate functions in a `HAVING` clause

Using `FIRST` and `LAST` in the `HAVING` clause is not supported. The same applies to
<<sql-functions-aggs-min,`MIN`>> and <<sql-functions-aggs-max,`MAX`>> when their target column
is of type <<keyword, `keyword`>> as they are internally translated to `FIRST` and `LAST`.
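
The rule in this limitation can be restated as a small lookup. The sketch below is illustrative only and does not reflect the query planner's code: `effective_function`, `allowed_in_having`, and the two sets are hypothetical names. Treating `text` columns the same way as `keyword` follows the `MIN`/`MAX` notes in `aggs.asciidoc` and is an assumption here.

```python
# Hypothetical restatement of the documented rule; not Elasticsearch code.

# Column types for which MIN/MAX are translated (text is an assumption, see above).
STRING_TYPES = {"keyword", "text"}

# Functions that are not supported inside a HAVING clause.
NOT_ALLOWED_IN_HAVING = {"FIRST", "LAST"}

# FIRST_VALUE and LAST_VALUE are name aliases.
ALIASES = {"FIRST_VALUE": "FIRST", "LAST_VALUE": "LAST"}


def effective_function(name: str, column_type: str) -> str:
    """Return the function a call is treated as after alias and MIN/MAX translation."""
    upper = ALIASES.get(name.upper(), name.upper())
    if column_type in STRING_TYPES:
        if upper == "MIN":
            return "FIRST"
        if upper == "MAX":
            return "LAST"
    return upper


def allowed_in_having(name: str, column_type: str) -> bool:
    """True if the call may appear in HAVING according to the rule above."""
    return effective_function(name, column_type) not in NOT_ALLOWED_IN_HAVING


for call in [("MAX", "integer"), ("MAX", "keyword"), ("first_value", "integer")]:
    print(call, allowed_in_having(*call))
```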
Original file line number Diff line number Diff line change
@@ -31,6 +31,10 @@ public void testShowFunctions() throws IOException {
assertThat(readLine(), containsString(HEADER_SEPARATOR));
assertThat(readLine(), RegexMatcher.matches("\\s*AVG\\s*\\|\\s*AGGREGATE\\s*"));
assertThat(readLine(), RegexMatcher.matches("\\s*COUNT\\s*\\|\\s*AGGREGATE\\s*"));
assertThat(readLine(), RegexMatcher.matches("\\s*FIRST\\s*\\|\\s*AGGREGATE\\s*"));
assertThat(readLine(), RegexMatcher.matches("\\s*FIRST_VALUE\\s*\\|\\s*AGGREGATE\\s*"));
assertThat(readLine(), RegexMatcher.matches("\\s*LAST\\s*\\|\\s*AGGREGATE\\s*"));
assertThat(readLine(), RegexMatcher.matches("\\s*LAST_VALUE\\s*\\|\\s*AGGREGATE\\s*"));
assertThat(readLine(), RegexMatcher.matches("\\s*MAX\\s*\\|\\s*AGGREGATE\\s*"));
assertThat(readLine(), RegexMatcher.matches("\\s*MIN\\s*\\|\\s*AGGREGATE\\s*"));
String line = readLine();
@@ -58,6 +62,8 @@ public void testShowFunctions() throws IOException {
public void testShowFunctionsLikePrefix() throws IOException {
assertThat(command("SHOW FUNCTIONS LIKE 'L%'"), RegexMatcher.matches("\\s*name\\s*\\|\\s*type\\s*"));
assertThat(readLine(), containsString(HEADER_SEPARATOR));
assertThat(readLine(), RegexMatcher.matches("\\s*LAST\\s*\\|\\s*AGGREGATE\\s*"));
assertThat(readLine(), RegexMatcher.matches("\\s*LAST_VALUE\\s*\\|\\s*AGGREGATE\\s*"));
assertThat(readLine(), RegexMatcher.matches("\\s*LEAST\\s*\\|\\s*CONDITIONAL\\s*"));
assertThat(readLine(), RegexMatcher.matches("\\s*LOG\\s*\\|\\s*SCALAR\\s*"));
assertThat(readLine(), RegexMatcher.matches("\\s*LOG10\\s*\\|\\s*SCALAR\\s*"));
73 changes: 73 additions & 0 deletions x-pack/plugin/sql/qa/src/main/resources/agg.csv-spec
@@ -373,3 +373,76 @@ SELECT COUNT(ALL last_name)=COUNT(ALL first_name) AS areEqual, COUNT(ALL first_n
---------------+---------------+---------------
false |90 |100
;

topHitsWithOneArgAndGroupBy
schema::gender:s|first:s|last:s
SELECT gender, FIRST(first_name) as first, LAST(first_name) as last FROM test_emp GROUP BY gender ORDER BY gender;

gender | first | last
---------------+---------------+---------------
null | Berni | Patricio
F | Alejandro | Xinglin
M | Amabile | Zvonko
;

topHitsWithTwoArgsAndGroupBy
schema::gender:s|first:s|last:s
SELECT gender, FIRST(first_name, birth_date) as first, LAST(first_name, birth_date) as last FROM test_emp GROUP BY gender ORDER BY gender;

gender | first | last
---------------+---------------+---------------
null | Lillian | Eberhardt
F | Sumant | Valdiodio
M | Remzi | Hilari
;

topHitsWithTwoArgsAndGroupByWithNullsOnTargetField
schema::gender:s|first:s|last:s
SELECT gender, FIRST(first_name, birth_date) AS first, LAST(first_name, birth_date) AS last FROM test_emp WHERE emp_no BETWEEN 10025 AND 10035 GROUP BY gender ORDER BY gender;

gender | first | last
---------------+---------------+---------------
F | null | Divier
M | null | Domenick
;

topHitsWithTwoArgsAndGroupByWithNullsOnSortingField
schema::gender:s|first:s|last:s
SELECT gender, FIRST(first_name, birth_date) AS first, LAST(first_name, birth_date) AS last FROM test_emp WHERE emp_no BETWEEN 10047 AND 10052 GROUP BY gender ORDER BY gender;

gender | first | last
---------------+---------------+---------------
F | Basil | Basil
M | Hidefumi | Heping
;

topHitsWithTwoArgsAndGroupByWithNullsOnTargetAndSortingField
schema::gender:s|first:s|last:s
SELECT gender, FIRST(first_name, birth_date) AS first, LAST(first_name, birth_date) AS last FROM test_emp WHERE emp_no BETWEEN 10037 AND 10052 GROUP BY gender ORDER BY gender;

gender | first | last
---------------+-------------+-----------------
F | Basil | Weiyi
M | Hidefumi | null
;

topHitsWithTwoArgsAndGroupByWithAllNullsOnTargetField
schema::gender:s|first:s|last:s
SELECT gender, FIRST(first_name, birth_date) AS first, LAST(first_name, birth_date) AS last FROM test_emp WHERE emp_no BETWEEN 10030 AND 10037 GROUP BY gender ORDER BY gender;

gender | first | last
---------------+---------------+---------------
F | null | null
M | null | null
;

topHitsOnDatetime
schema::gender:s|first:i|last:i
SELECT gender, month(first(birth_date, languages)) first, month(last(birth_date, languages)) last FROM test_emp GROUP BY gender ORDER BY gender;

gender | first | last
---------------+---------------+---------------
null | 1 | 10
F | 4 | 6
M | 1 | 4
;
8 changes: 6 additions & 2 deletions x-pack/plugin/sql/qa/src/main/resources/command.csv-spec
@@ -8,8 +8,12 @@ SHOW FUNCTIONS;

name:s | type:s
AVG |AGGREGATE
COUNT |AGGREGATE
MAX |AGGREGATE
COUNT |AGGREGATE
FIRST |AGGREGATE
FIRST_VALUE |AGGREGATE
LAST |AGGREGATE
LAST_VALUE |AGGREGATE
MAX |AGGREGATE
MIN |AGGREGATE
SUM |AGGREGATE
KURTOSIS |AGGREGATE