Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement fields fetch for runtime fields #61995

Merged
merged 27 commits into from
Sep 15, 2020

Conversation

nik9000
Copy link
Member

@nik9000 nik9000 commented Sep 4, 2020

This implements the fields API in _search for runtime fields using
doc values. Most of that implementation is stolen from the
docvalue_fields fetch sub-phase, just moved into the same API that the
fields API uses. At this point the docvalue_fields fetch phase looks
like a special case of the fields API.

While I was at it I moved the "which doc values sub-implementation
should I use for fetching?" question from a bunch of instanceofs to a
method on LeafFieldData so we can be much more flexible with what is
returned and we're not forced to extend certain classes just to make the
fetch phase happy.

Relates to #59332

This implements the `fields` API in `_search` for runtime fields using
doc values. Most of that implementation is stolen from the
`docvalue_fields` fetch sub-phase, just moved into the same API that the
`fields` API uses. At this point the `docvalue_fields` fetch phase looks
like a special case of the `fields` API.

While I was at it I moved the "which doc values sub-implementation
should I use for fetching?" question from a bunch of `instanceof`s to a
method on `LeafFieldData` so we can be much more flexible with what is
returned and we're not forced to extend certain classes just to make the
fetch phase happy.

Relates to elastic#59332
@@ -299,6 +299,18 @@ public SearchLookup lookup() {
return this.lookup;
}

private SearchLookup fetchLookup = null;

public SearchLookup fetchLookup() {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is the most controversial part of this change. Previously we were building the SourceLookup in the fetch phase and letting things that needed doc values at fetch time use the standard search lookup. This starts us down a road to removing that. I think we like this because we don't want things like the explain sub-phase to modify the rest of the fetch phases.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we just create a SearchLookup within FetchPhase? I think we could store it on HitContext instead of storing SourceLookup directly. That would make its scope and lifetime clearer. It also feels weird to me to put this on QueryShardContext, since it's not meant to be used during the query phase.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmmmm. Let me dig into this again! I did it this way when I was building things really early and needed something in a much more "global" place to do the building.

I was trying not to modify HitContext so much - maybe it can be something we pass in to the phases when we build them. I'll take another look.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can create SearchLookup just for the FetchPhase. I'll push a change that does that. Sadly, we still have to delegate to QueryShardContext to create it because it is the thing with all of the right dependencies.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've pushed the change.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess I need to read this discussion because I have been asking questions around this as part of my review :)

}
}
};
}

private abstract static class DocValueField {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All of the "brains" of this stuff has moved into the LeafFieldData implementations.

public ValueFetcher valueFetcher(MapperService mapperService, String format) {
throw new UnsupportedOperationException();
public ValueFetcher valueFetcher(MapperService mapperService, SearchLookup lookup, String format) {
return new DocValueFetcher(fieldType().docValueFormat(format, null), () -> lookup.doc().getForField(fieldType()));
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that everything else is in the LeafFieldData implemenations, this is all we need in runtime fields!

@nik9000 nik9000 marked this pull request as ready for review September 4, 2020 14:13
@nik9000 nik9000 added :Search/Search Search-related issues that do not fall into other categories v7.10.0 v8.0.0 labels Sep 4, 2020
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-search (:Search/Search)

@elasticmachine elasticmachine added the Team:Search Meta label for search team label Sep 4, 2020
@nik9000 nik9000 added >enhancement and removed Team:Search Meta label for search team labels Sep 4, 2020
@elasticmachine elasticmachine added the Team:Search Meta label for search team label Sep 4, 2020
@nik9000
Copy link
Member Author

nik9000 commented Sep 4, 2020

ooooh! This is broken for inner hits. I'll see what I can find!

@nik9000
Copy link
Member Author

nik9000 commented Sep 4, 2020

Nested works now! New error having to do with date_nanos. Looks like I screwed something up there. I'll track it down!

@nik9000
Copy link
Member Author

nik9000 commented Sep 4, 2020

And now data nanos!

Copy link
Contributor

@jtibshirani jtibshirani left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left a first round of comments. (I'm still thinking through the details around the SearchLookup changes.)

Copy link
Contributor

@jtibshirani jtibshirani left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've now read the full change and left some comments !

public FieldValueRetriever fieldValueRetriever() {
return fieldValueRetriever;
public FieldValueRetriever fieldValueRetriever(String indexName, MapperService mapperService, SearchLookup searchLookup) {
if (mapperService.documentMapper().sourceMapper().enabled() == false) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Small comment, it feels more natural to do this in FetchFieldsContext rather than this factory method ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I inherited its location from where it was before - I dunno where it feels better. Would you prefer I move it into the factory method?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure if you don't mind I think it's nicer to create it directly in FetchFieldsContext.

@@ -299,6 +299,18 @@ public SearchLookup lookup() {
return this.lookup;
}

private SearchLookup fetchLookup = null;

public SearchLookup fetchLookup() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we just create a SearchLookup within FetchPhase? I think we could store it on HitContext instead of storing SourceLookup directly. That would make its scope and lifetime clearer. It also feels weird to me to put this on QueryShardContext, since it's not meant to be used during the query phase.

@@ -42,5 +44,10 @@
* @param lookup a lookup structure over the document's source.
* @return a list a standardized field values.
*/
List<Object> fetchValues(SourceLookup lookup);
List<Object> fetchValues(SourceLookup lookup) throws IOException;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure about this idea, but it feels natural to only pass the doc ID here:

List<Object> fetchValues(SourceLookup lookup) throws IOException;

Then SourceValueFetcher could hang onto the SearchLookup and use that to access SourceLookup ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking about that too! I didn't want to do it as part if this PR because it'd make it huge, but I think its a good idea.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm - this doesn't look like it'll make the PR much bigger. I'll do that.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've pushed a refactor for this too.

@nik9000
Copy link
Member Author

nik9000 commented Sep 10, 2020

@jtibshirani and I talked about this some over video today and settled on two things:

  1. Revert the change from SourceLookup to docId being passed to the DocValueFetcher. We're not 100% sure it makes sense yet.
  2. Rename newSearchLookup on QueryShardContext to newFetchLookup to make it clear that it is just for the fetch phase. We think we should build a more fetch-phase-specific version of SearchLookup, but that that effort should wait.

@nik9000
Copy link
Member Author

nik9000 commented Sep 14, 2020

@jtibshirani and I talked about this some over video today and settled on two things:

And I pushed changes to do those two things.

Copy link
Contributor

@jtibshirani jtibshirani left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me, thanks again for the many iterations!

For context, @nik9000 and I decided that this approach is best for the initial PR, but the SearchLookup sharing logic is still a bit confusing. We're thinking through some follow-up refactors to help solidify + clarify it.

@jtibshirani
Copy link
Contributor

I just realized that we should also remove the REST test exclusions in runtime-fields/qa/rest/build.gradle:

'search/330_fetch_fields/*', // The 'fields' option is not yet supported
'search/110_field_collapsing/field collapsing, inner_hits, and fields', // Also fails because of the 'fields' option

@nik9000
Copy link
Member Author

nik9000 commented Sep 14, 2020

I just realized that we should also remove the REST test exclusions in runtime-fields/qa/rest/build.gradle:

Oh! I thought I had. I'll do it before merging. Thanks!

Copy link
Member

@javanna javanna left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@Override
public List<Object> fetchValues(int docId) throws IOException {
if (false == leaf.advanceExact(docId)) {
return List.of();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ping?

}
return this.lookup;
}

/**
* Build a lookup customized for the fetch phase. Use {@link #lookup()}
* in other phases.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shall we open an issue to track finding an alternative for this?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like @romseygeek is working on this now.

@javanna javanna mentioned this pull request Sep 15, 2020
@nik9000 nik9000 merged commit 9a127ad into elastic:master Sep 15, 2020
nik9000 added a commit to nik9000/elasticsearch that referenced this pull request Sep 15, 2020
This implements the `fields` API in `_search` for runtime fields using
doc values. Most of that implementation is stolen from the
`docvalue_fields` fetch sub-phase, just moved into the same API that the
`fields` API uses. At this point the `docvalue_fields` fetch phase looks
like a special case of the `fields` API.

While I was at it I moved the "which doc values sub-implementation
should I use for fetching?" question from a bunch of `instanceof`s to a
method on `LeafFieldData` so we can be much more flexible with what is
returned and we're not forced to extend certain classes just to make the
fetch phase happy.

Relates to elastic#59332
nik9000 added a commit to nik9000/elasticsearch that referenced this pull request Sep 15, 2020
In elastic#61995 I moved the `docvalue_field` fetch code into a place where I
could share it with the fancy new `fields` fetch API. Specifically,
runtime fields can use it all that doc values code now. But I broke
`scaled_floats` by switching them how they are fetched from `double` to
`string`. This adds the override you need to switch them back.
nik9000 added a commit to nik9000/elasticsearch that referenced this pull request Sep 15, 2020
In elastic#61995 I moved the `docvalue_field` fetch code into a place where I
could share it with the fancy new `fields` fetch API. Specifically,
runtime fields can use it all that doc values code now. But I broke
`scaled_floats` by switching them how they are fetched from `double` to
`string`. This adds the override you need to switch them back.
nik9000 added a commit that referenced this pull request Sep 16, 2020
This implements the `fields` API in `_search` for runtime fields using
doc values. Most of that implementation is stolen from the
`docvalue_fields` fetch sub-phase, just moved into the same API that the
`fields` API uses. At this point the `docvalue_fields` fetch phase looks
like a special case of the `fields` API.

While I was at it I moved the "which doc values sub-implementation
should I use for fetching?" question from a bunch of `instanceof`s to a
method on `LeafFieldData` so we can be much more flexible with what is
returned and we're not forced to extend certain classes just to make the
fetch phase happy.

Relates to #59332
nik9000 added a commit that referenced this pull request Sep 16, 2020
In #61995 I moved the `docvalue_field` fetch code into a place where I
could share it with the fancy new `fields` fetch API. Specifically,
runtime fields can use it all that doc values code now. But I broke
`scaled_floats` by switching them how they are fetched from `double` to
`string`. This adds the override you need to switch them back.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>non-issue :Search/Search Search-related issues that do not fall into other categories Team:Search Meta label for search team v7.10.0 v8.0.0-alpha1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants