REST high-level client: add reindex API #32679

Merged: 8 commits, Aug 28, 2018

Conversation

sohaibiftikhar
Contributor

Relates to #27205

);
}
{
TimeUnit.SECONDS.sleep(1);
Contributor Author

@nik9000 I know this is incorrect. But I couldn't make the test pass without it. Will firing a refresh request help?

Contributor Author

It worked. Sorry for the noise.

Member

nik9000 commented Aug 7, 2018

@elasticmachine, white list this please

Member

nik9000 commented Aug 7, 2018

whitelist this please
open sesame

Member

javanna commented Aug 7, 2018

add to whitelist

LOL

@elasticmachine
Collaborator

Pinging @elastic/es-core-infra

Member

nik9000 left a comment

I left a few questions, the biggest being, maybe a mutable builder object that knows how to build a status or the error might make this less terrible to parse. Maybe. Could you check?

request.setScript(
new Script(
ScriptType.INLINE, "painless",
"if (ctx._source.foo == 'bar') {ctx._version++; ctx._source.remove('foo')}",
Member

I think messing with the _version is a fairly rare thing. I think it'd be more normal to, say, split a field on a regex and stick it into two fields. Or something like that.

Contributor Author

Ah, I copied this from the other doc. It is to show (I believe) that you can mess with it during reindex but not during update_by_query. I can change it.

Member

I think it is worth changing. Probably in both places to be honest, but just in this place in this PR.
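
For illustration only, here is a plain-Java sketch of the kind of transformation suggested in this thread: splitting one field on a regex and storing the pieces in two fields. In the docs themselves this would presumably be written as a Painless script passed to `setScript`; the class name and the field names `full_name`, `first_name`, and `last_name` are hypothetical and do not come from the PR.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: mimics on a plain Map what a reindex script could do
// to a document's source. Not taken from the PR.
class SplitFieldSketch {
    /** Splits the hypothetical "full_name" field on whitespace into two fields. */
    public static void splitName(Map<String, Object> source) {
        Object raw = source.get("full_name");
        if (!(raw instanceof String)) {
            return; // nothing to split; leave the document untouched
        }
        source.remove("full_name");
        // Limit of 2: everything after the first whitespace run is the last name.
        String[] parts = ((String) raw).trim().split("\\s+", 2);
        source.put("first_name", parts[0]);
        if (parts.length > 1) {
            source.put("last_name", parts[1]);
        }
    }

    public static void main(String[] args) {
        Map<String, Object> source = new HashMap<>();
        source.put("full_name", "Ada Lovelace");
        splitName(source);
        System.out.println(source);
    }
}
```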

--------------------------------------------------
<1> Set the versionType to `EXTERNAL`

Settings `opType` to `create` will cause `_reindex` to only create missing documents in the target index. All existing
Member

/Settings/Setting/

<1> `setScript` to bump the version of the source document

`ReindexRequest` supports reindexing from a remote Elasticsearch cluster. When using a remote cluster the query should be
specified inside the `RemoteInfo` object and not using `setSourceQuery`.
Member

I'd explicitly say that the query set by setSourceQuery is ignored. It might also be nice to explain why: the remote Elasticsearch may not understand queries built by the modern query builders. This works all the way back to Elasticsearch 0.90 and the query language has drifted a bit since then. When you reach back to old versions, it is better to write the query by hand in JSON.

List<Failure> bulkFailures = new ArrayList<>();
List<SearchFailure> searchFailures = new ArrayList<>();
for (Object object: failures) {
if (object instanceof Failure) {
Member

I regret building this in this way. It is indeed messy.

case SearchFailure.SHARD_FIELD:
shardId = parser.intValue();
break;
default:
Member

Wow!

Float requestsPerSecond = (Float) a[startIndex + 10];
requestsPerSecond = requestsPerSecond == -1 ? Float.POSITIVE_INFINITY : requestsPerSecond;
String reasonCancelled = (String) a[startIndex + 11];
TimeValue throttledUntil =
Member

I'd probably extract them all from the array first, casting them, and then manipulating them.

Actually, I wonder if a builder object would be better than this object array stuff. It might be cleaner. Not nice, but the way I laid out the json makes nice kind of impossible.
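
The first suggestion in this comment (extract and cast everything from the array first, then manipulate) could look roughly like the sketch below. It is a simplified, self-contained illustration, not the PR's code: the class `ExtractThenNormalize` and the holder `Parsed` are hypothetical, and only the two slots visible in the snippet under review (`startIndex + 10` and `startIndex + 11`) are handled.

```java
// Hypothetical sketch of "extract and cast first, manipulate afterwards".
class ExtractThenNormalize {
    /** Hypothetical holder for the two values used in this sketch. */
    public static final class Parsed {
        public final float requestsPerSecond;
        public final String reasonCancelled;

        Parsed(float requestsPerSecond, String reasonCancelled) {
            this.requestsPerSecond = requestsPerSecond;
            this.reasonCancelled = reasonCancelled;
        }
    }

    public static Parsed fromArray(Object[] a, int startIndex) {
        // Step 1: extract and cast every value up front.
        Float rawRequestsPerSecond = (Float) a[startIndex + 10];
        String reasonCancelled = (String) a[startIndex + 11];

        // Step 2: only now manipulate the extracted values.
        // -1 is mapped to positive infinity, as in the snippet under review.
        // Assumption: the requests-per-second slot is never null.
        float requestsPerSecond = rawRequestsPerSecond == -1
            ? Float.POSITIVE_INFINITY
            : rawRequestsPerSecond;

        return new Parsed(requestsPerSecond, reasonCancelled);
    }

    public static void main(String[] args) {
        Object[] a = new Object[12];
        a[10] = -1f;
        a[11] = "by user request";
        Parsed parsed = fromArray(a, 0);
        System.out.println(parsed.requestsPerSecond + " / " + parsed.reasonCancelled);
    }
}
```

Keeping the casts together makes it easier to check each slot against the parser's field declarations before any value is transformed.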

Contributor Author

I see what you mean. But it might not be super helpful. There are two constructors. The way it works right now is you call one or the other. So you calculate stuff from sliceStatuses or you specify everything and sliceStatuses is an empty list. Not sure how that would translate here. Also, personally I like immutable objects. Validation is consolidated in the constructor. For example, fields that must be set. Plus it wouldn't change much in terms of lines of code I think.
But if you think it's important I can change it.

Member

By mutable builder I mean making a mutable object that you just use as the target for the parsing logic and then throw away afterwards. It just has setters and then a method to build status or error depending on which fields are set. The mutable object never escapes parsing. It might not be less code but I think it'd be easier to understand.

Contributor Author

Ah, I see. Okay, thanks for explaining! I can make that change.
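
As a rough illustration of the throwaway mutable builder discussed in this thread, the sketch below uses simplified stand-in types. None of the names (`StatusOrErrorSketch`, `Status`, `StatusOrError`, `Builder`, or the fields) are the real Elasticsearch classes; they are assumptions made for the example. The parsing code would call the setters as it encounters fields, then call `build()` once; the immutable result keeps its validation in the constructor.

```java
import java.util.Objects;

// Hypothetical, simplified stand-ins; not the real Elasticsearch classes.
class StatusOrErrorSketch {
    /** Immutable status; required-field validation stays in the constructor. */
    public static final class Status {
        public final int sliceId;
        public final long total;

        Status(Integer sliceId, Long total) {
            this.sliceId = Objects.requireNonNull(sliceId, "sliceId must be set");
            this.total = Objects.requireNonNull(total, "total must be set");
        }
    }

    /** Holds exactly one of a status or an error. */
    public static final class StatusOrError {
        public final Status status;
        public final Exception error;

        StatusOrError(Status status, Exception error) {
            this.status = status;
            this.error = error;
        }
    }

    /** Mutable parse target: only setters plus build(); discarded after parsing. */
    public static final class Builder {
        private Integer sliceId;
        private Long total;
        private String errorReason;

        public void setSliceId(int sliceId) { this.sliceId = sliceId; }
        public void setTotal(long total) { this.total = total; }
        public void setErrorReason(String errorReason) { this.errorReason = errorReason; }

        /** Builds an error if an error field was seen, otherwise a status. */
        public StatusOrError build() {
            if (errorReason != null) {
                return new StatusOrError(null, new RuntimeException(errorReason));
            }
            return new StatusOrError(new Status(sliceId, total), null);
        }
    }

    public static void main(String[] args) {
        Builder builder = new Builder();
        builder.setSliceId(0);
        builder.setTotal(42L);
        StatusOrError result = builder.build();
        System.out.println("total=" + result.status.total);
    }
}
```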

}

public static Status innerFromXContent(XContentParser parser) throws IOException {
Token token = parser.currentToken();
Member

I wonder if you can avoid all of this with a builder object that knows how to build either status or error. Then you can use regular ObjectParser with it.

Contributor Author

sohaibiftikhar Aug 7, 2018

I believe I cannot parse the error myself (as we already discussed)? I need to peek at a field name to realise what I need to parse, which doesn't work well with an ObjectParser. I could declare all fields in the ObjectParser and then try to build an object using that, but for that I would need to write the parser for the error as well. Which I really do not want to do... Or are you suggesting something else?

Member

Could you add a comment here explaining why you need to parse this manually? I'm 100% sure when I read this again in 6 months I'll waste an hour figuring out why you did it this way. A comment will save future me some time.

Member

nik9000 left a comment

This looks good to me! I left about 7 requests for more comments, mostly because I know that I'll read this again in six months and won't know why you made the choices that you did. Comments explaining those choices would be super nice. And some Javadoc on the new public methods would be wonderful!

@@ -189,6 +202,115 @@ public boolean shouldCancelChildrenOnCancellation() {
return true;
}

public static class StatusBuilder {
Member

Could you add javadoc for this?

import java.util.List;
import java.util.concurrent.TimeUnit;

public class BulkByScrollResponseBuilder extends StatusBuilder {
Member

Could you add javadoc for this too?

Also, should it be public? What about package scope?

@@ -610,6 +961,41 @@ public XContentBuilder toXContent(XContentBuilder builder, Params params) throws
return builder;
}

public static StatusOrException fromXContent(XContentParser parser) throws IOException {
Member

Can you leave a comment about why you do it this way? The reasoning is fairly involved and a comment will totally help future-me when I reread this six months from now.

@@ -39,7 +44,8 @@
* of reasons, not least of which that scripts are allowed to change the destination request in drastic ways, including changing the index
* to which documents are written.
*/
-public class ReindexRequest extends AbstractBulkIndexByScrollRequest<ReindexRequest> implements CompositeIndicesRequest {
+public class ReindexRequest extends AbstractBulkIndexByScrollRequest<ReindexRequest>
+    implements CompositeIndicesRequest, ToXContentObject {
Member

Could you indent this one more level? I think this kind of indenting makes it look like implements is part of the class body.

return this;
}

public ReindexRequest setDestIndex(String destIndex) {
Member

Could you add javadoc for these? They'd be nice because these are part of the public API.

Member

I know we weren't good about this in the past but we're trying to be better lately.

Member

nik9000 left a comment

I merged master and pushed a tiny cleanup. I'm going to let CI chew on it.

dnhatn added a commit that referenced this pull request Aug 29, 2018
* master:
  Painless: Add Bindings (#33042)
  Update version after client credentials backport
  Fix forbidden apis on FIPS (#33202)
  Remote 6.x transport BWC Layer for `_shrink` (#33236)
  Test fix - Graph HLRC tests needed another field adding to randomisation exception list
  HLRC: Add ML Get Records API (#33085)
  [ML] Fix character set finder bug with unencodable charsets (#33234)
  TESTS: Fix overly long lines (#33240)
  Test fix - Graph HLRC test was missing field name to be excluded from randomisation logic
  Remove unsupported group_shard_failures parameter (#33208)
  Update BucketUtils#suggestShardSideQueueSize signature (#33210)
  Parse PEM Key files leniantly (#33173)
  INGEST: Add Pipeline Processor (#32473)
  Core: Add java time xcontent serializers (#33120)
  Consider multi release jars when running third party audit (#33206)
  Update MSI documentation (#31950)
  HLRC: create base timed request class (#33216)
  [DOCS] Fixes command page titles
  HLRC: Move ML protocol classes into client ml package (#33203)
  Scroll queries asking for rescore are considered invalid (#32918)
  Painless: Fix Semicolon Regression (#33212)
  ingest: minor - update test to include dissect (#33211)
  Switch remaining LLREST usage to new style Requests (#33171)
  HLREST: add reindex API (#32679)
nik9000 pushed a commit that referenced this pull request Sep 1, 2018
Adds the reindex API to the high level REST client.
dnhatn added a commit that referenced this pull request Sep 2, 2018
* 6.x:
  HLRC: ML Flush job (#33187)
  Switch more LLREST usage to new style Requests (#33171)
  HLRC: Adding ML Job stats (#33183)
  HLREST: add reindex API (#32679)
  Mute testSyncerOnClosingShard
  [DOCS] Moves machine learning APIs to docs folder (#31118)
@sohaibiftikhar sohaibiftikhar deleted the reindex branch September 5, 2018 08:33
@jimczi jimczi added v7.0.0-beta1 and removed v7.0.0 labels Feb 7, 2019

6 participants