Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SQL: Fix esType for DATETIME/DATE and INTERVALS #38179

Merged
merged 6 commits into from
Feb 5, 2019

Conversation

matriv
Copy link
Contributor

@matriv matriv commented Feb 1, 2019

Since introduction of data types that don't have a corresponding type
in ES the esType is error-prone when used for unmappedType() calls.
Moreover since the renaming of DATE to DATETIME and the introduction
of an actual date-only DATE the esType would return datetime which
is not a valid type for ES mapping.

Fixes: #38051

Since introduction of data types that don't have a corresponding type
in ES the `esType` is error-prone when used for `unmappedType()` calls.
Moreover since the renaming of `DATE` to `DATETIME` and the introduction
of an actual date-only `DATE` the `esType` would return `datetime` which
is not a valid type for ES mapping.

Fixes: elastic#38051
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-search

@@ -661,17 +661,17 @@ public void testTopHitsAggregationWithTwoArgs() {

}
{
PhysicalPlan p = optimizeAndPlan("SELECT LAST(keyword, int) FROM test");
PhysicalPlan p = optimizeAndPlan("SELECT LAST(date, int) FROM test");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was this change required by the types change in this PR or you just wanted some diversity in FIRST/LAST tests?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

*/
public final String esType;
public final String esSQLType;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I may be wrong, but why not esSqlType name?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure

@@ -126,9 +127,9 @@


/**
* Elasticsearch type name
* Lowercase type name
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure "Lowercase" is the right description for this field, I mean it is a true statement, but doesn't help much with the purpose of the field.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1.
The purpose of a comment is to explain the non-obvious:

/** an integer */ vs /** array index */
public int i;

Copy link
Contributor

@astefan astefan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Left just few minor comments.

Copy link
Member

@costin costin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see why a new property was introduced (esSQLType - by the way the naming is incorrect since we use Sql everywhere else) or what does it even mean - the type of esSQL?
If it's the Elasticsearch type then that's what esType is for. if it's the sql type it's sqlType.

If it's the type name itself why not use dt.name? If it's a lowercase vs uppercase (again arguable since it's es vs sql) a typeName() method might be more appropriate.

Further more the style of the code changes is confusing - the field access (.esType) was moved to method invocation (esType()) but not in all places.

@@ -44,13 +44,13 @@ AggregationBuilder toBuilder() {
new FieldSortBuilder(sortField)
.order(sortOrder)
.missing(LAST.position())
.unmappedType(sortFieldDataType.esType));
.unmappedType(sortFieldDataType.esType()));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is there still a sortFieldDataType? I recall discussing about removing it in the date PR.

Copy link
Contributor Author

@matriv matriv Feb 1, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the date PR? You mean in the FIRST/LAST PR? I checked again, but don't get, and sorry maybe I missed something, but we need it for this the unmappedType call.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#37936 (comment)

There's no need for sortFieldDataType - you know it already null (hence the FieldSortBuilder), simply do sortField.dataType()

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I forgot why I didn't do it after all:
We need the field name to be extracted from the field, which has some small logic and is something that we do in the QueryTranslator for other cases as well, so I didn't want to pass the field and do the name extraction inside the TopHitsAgg.

@matriv
Copy link
Contributor Author

matriv commented Feb 1, 2019

I don't see why a new property was introduced (esSQLType - by the way the naming is incorrect since we use Sql everywhere else) or what does it even mean - the type of esSQL?
If it's the Elasticsearch type then that's what esType is for. if it's the sql type it's sqlType.

If it's the type name itself why not use dt.name? If it's a lowercase vs uppercase (again arguable since it's es vs sql) a typeName() method might be more appropriate.

I wasn't sure about this name, so I was waiting for suggestions. I chose esSqlType (shouldn't be all uppercase ) just from the Idea we have in the docs with the table: es type/es-sql type/SQL type.
I like the typeName().

Further more the style of the code changes is confusing - the field access (.esType) was moved to method invocation (esType()) but not in all places.

So the idea is to use the lowercase name to everywhere it has to do with error messages or returning the column data type to the client as we used to, and use the new method .esType() that throws the exception and treats the DATETIME->date correctly wherever we use it for calls to ES APIs, like in the FieldHitExtractor or the TopHitsAgg.

@costin
Copy link
Member

costin commented Feb 2, 2019

So the idea is to use the lowercase name to everywhere it has to do with error messages or returning the column data type to the client as we used to, and use the new method .esType() that throws the exception and treats the DATETIME->date correctly wherever we use it for calls to ES APIs, like in the FieldHitExtractor or the TopHitsAgg.

Being a enum, name() already exists for the upper case variant and assuming we use it a lot, can define typeName for error messages.
My point still stands that the style should be consistent (use either the field or the method not both).
I also find it weird to throw an exception when calling esType instead of returning null.
It's defensive programming that I'm not a fan of plus the caller is already aware (otherwise one would get an exception).

Let's discuss this more next week.

@matriv
Copy link
Contributor Author

matriv commented Feb 2, 2019

So, for error messages, return column types, etc.. we need the name of the enum in lowercase.
And we want to have there date, datetime, and the interval types.

BUT for calls to unmappedType that we use when we do field extraction we need something else, since our DATETIME maps to ES date, so if we use the same method/field whatever as above that just lowercase we will call unmapedType with datetime for example which is not a valid type for ES.

Moreover if we try to make a call to unmappedType, which happens during field extraction with a DataType that doesn't exist in ES (like DATE or INTERVALxxx then returning null is not safe, imho, as null is a valid value for unmappedType method. We should never try to ask from ES to extract a field with type INTERVALxxx, therefore I chose the exception.

@matriv
Copy link
Contributor Author

matriv commented Feb 5, 2019

@costin Please check again.
Removed completely the check for ES not mapped types (DATE & Intervals) as in the places we require the esType we have a FieldAttribute with the proper EsField using the fromTypeName. My idea for the exception was to be bug-proof for the future, but I totally understand your point about "defensive" programming.

Copy link
Member

@costin costin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good however there's still the esType initialization that could be sorted out.

@@ -176,7 +181,8 @@

DataType(SQLType sqlType, int size, int defaultPrecision, int displaySize, boolean isInteger,
boolean isRational, boolean defaultDocValues) {
this.esType = name().toLowerCase(Locale.ROOT);
this.typeName = name().toLowerCase(Locale.ROOT);
this.esType = name().equals("DATETIME") ? "date" : typeName;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think a better way would be to ask this parameter for the constructor and add it for each declaration.
So DATETIME would be "date", INTERVAL_... null while NULL would be "null". Potentially add an overloaded constructor to specify esType name which by default would be the enum name lowercase but with the possibility of being overwritten.

Copy link
Member

@costin costin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

OBJECT( "object", JDBCType.STRUCT, -1, 0, 0, false, false, false),
NESTED( "nested", JDBCType.STRUCT, -1, 0, 0, false, false, false),
BINARY( "binary", JDBCType.VARBINARY, -1, Integer.MAX_VALUE, 0, false, false, false),
DATE( JDBCType.DATE, Long.BYTES, 10, 10, false, false, true),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this be "date" as well?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But DATE doesn't have a corresponding esType, same as all INTERVALS.

INTEGER( JDBCType.INTEGER, Integer.BYTES, 10, 11, true, false, true),
LONG( JDBCType.BIGINT, Long.BYTES, 19, 20, true, false, true),
// esType jdbc type, size, defPrecision,dispSize, int, rat, docvals
NULL( "null", JDBCType.NULL, 0, 0, 0, false, false, false),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would have gone the other way - use the lower-case enum way as a default type and just override this convention with null or "date". Just a matter of preference really.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's more elegant yes, but I choose the more "explicit" way to show clearly what the corresponding esType is.

@matriv matriv merged commit 2c30501 into elastic:master Feb 5, 2019
@matriv matriv deleted the mt/fix-38051 branch February 5, 2019 21:12
matriv added a commit that referenced this pull request Feb 5, 2019
Since introduction of data types that don't have a corresponding type
in ES the `esType` is error-prone when used for `unmappedType()` calls.
Moreover since the renaming of `DATE` to `DATETIME` and the introduction
of an actual date-only `DATE` the `esType` would return `datetime` which
is not a valid type for ES mapping.

Fixes: #38051
@matriv
Copy link
Contributor Author

matriv commented Feb 5, 2019

Backported to 6.x with 46cca7f

jasontedor added a commit to jasontedor/elasticsearch that referenced this pull request Feb 6, 2019
* 6.x: (25 commits)
  Backport of types removal for Put/Get index templates (elastic#38465)
  Add support for API keys to access Elasticsearch (elastic#38291) (elastic#38399)
  Deprecate support for internal versioning for concurrency control (elastic#38451)
  Deprecate types in rollover index API (elastic#38389) (elastic#38458)
  Add typless client side GetIndexRequest calls and response class (elastic#38422)
  [ML] Report index unavailable instead of waiting for lazy node (elastic#38444)
  await fix CurtIT#testIndex until elastic#38451 is merged (elastic#38466)
  Update ilm-api.asciidoc, point to REMOVE policy (elastic#38235) (elastic#38464)
  SQL: Fix esType for DATETIME/DATE and INTERVALS (elastic#38179)
  Clean up duplicate follow config parameter code (elastic#37688) (elastic#38443)
  Deprecation check for No Master Block setting (elastic#38383)
  Bubble-up exceptions from scheduler (elastic#38441)
  Lift retention lease expiration to index shard (elastic#38391)
  Deprecate maxRetryTimeout in RestClient and increase default value (elastic#38425)
  Update Rollup Caps to allow unknown fields (elastic#38446)
  Backport of elastic#38411: `if_seq_no` and `if_primary_term` parameters aren't wired correctly in REST Client's CRUD API
  Support unknown fields in ingest pipeline map configuration (elastic#38429)
  SQL: Implement CURRENT_DATE (elastic#38175)
  Backport changes to the release notes script. (elastic#38346)
  Fix ILM explain response to allow unknown fields (elastic#38363)
  ...
jasontedor added a commit to jasontedor/elasticsearch that referenced this pull request Feb 11, 2019
* master:
  Add an authentication cache for API keys (elastic#38469)
  Fix exit code in certutil packaging test (elastic#38393)
  Enable logs for intermittent test failure (elastic#38426)
  Disable BWC to backport recovering retention leases (elastic#38477)
  Enable bwc tests now that elastic#38443 is backported. (elastic#38462)
  Fix Master Failover and DataNode Leave Blocking Snapshot (elastic#38460)
  Recover retention leases during peer recovery (elastic#38435)
  Set update mappings mater node timeout to 30 min (elastic#38439)
  Assert job is not null in FullClusterRestartIT (elastic#38218)
  Update ilm-api.asciidoc, point to REMOVE policy (elastic#38235) (elastic#38463)
  SQL: Fix esType for DATETIME/DATE and INTERVALS (elastic#38179)
  Handle deprecation header-AbstractUpgradeTestCase (elastic#38396)
  XPack: core/ccr/Security-cli migration to java-time (elastic#38415)
  Disable bwc tests for elastic#38443 (elastic#38456)
  Bubble-up exceptions from scheduler (elastic#38317)
  Re-enable TasksClientDocumentationIT.testCancelTasks (elastic#38234)
  Allow custom authorization with an authorization engine  (elastic#38358)
  CRUDDocumentationIT fix documentation references
  Remove support for internal versioning for concurrency control (elastic#38254)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants