-
Notifications
You must be signed in to change notification settings - Fork 24.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HLRC: ML Post Data #33443
HLRC: ML Post Data #33443
Conversation
Pinging @elastic/es-core-infra |
Pinging @elastic/ml-core |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a great start. I left a few notes and ideas. I'll also ask the rest of the team to think about some of these design decisions.
/** | ||
* Specifies the end of the bucket resetting range | ||
* | ||
* @param resetEnd String representation of a timestamp; may be an epoch seconds, epoch millis or an ISO string |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might be best to say ISO 8601 instead of just ISO.
Also, this can be set to the string "now".
for(int i = 0; i < 10; i++) { | ||
Map<String, Object> hashMap = new HashMap<>(); | ||
hashMap.put("total", randomInt(1000)); | ||
postDataRequest.addDoc(hashMap); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be better if these docs contained the job's time field, with the loop generating some ascending values for it. Then the server could validate that they were received in ascending order.
To make this change an overload of buildJob()
will have to be added that sets the time field and time format to specific values rather than random values, so that the loop knows what to add.
} | ||
for (Map<String, Object> objectMap : objectMaps) { | ||
builder.map(objectMap); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's a problem here because if data has been added as a mixture of bytes references and maps then all the maps will be sent after all the bytes references, and that could mean the data is not sent in ascending time order.
Two possible solutions I can think of are:
- Document that all data must be supplied in the same format - either bytes references or maps - and enforce this in the
addDoc()
methods. - Convert the maps to bytes references in
addDoc()
so that there are only bytes references being stored.
This also shows that we should say somewhere in the Javadocs that docs will be processed by the job in the order they're added to the request and that therefore they should be added to the request in ascending time order.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking of doing option 2 and simply calling the bytesReferences
overload of addDoc
from the object map one.
The downside of serializing down to simply use the BytesReference
overload is performance. Though, I may be preoptimizing here.
* @param bytesReference document to add to bulk request, format must match the set XContentType | ||
*/ | ||
public void addDoc(BytesReference bytesReference) { | ||
this.bytesReferences.add(Objects.requireNonNull(bytesReference, "bytesReferences must not be null")); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we throw an exception if content != null
? Or otherwise return
early to ignore as the Javadoc says.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good thought, Since they set the the whole bulk content earlier, there is no need to continue to collect the individual docs.
* @param objectMap document object to add to bulk request | ||
*/ | ||
public void addDoc(Map<String, Object> objectMap) { | ||
this.objectMaps.add(Objects.requireNonNull(objectMap, "objectMap must not be null")); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we throw an exception if content != null
? Or otherwise return
early to ignore as the Javadoc says.
} | ||
|
||
/** | ||
* Set the total content to post. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe add again that this takes precedence over any individual docs already added.
It could also .clear()
the lists of individual docs.
An alternative to consider would be to throw an exception if individual docs have already been added. It seems like client code is badly designed if it adds individual docs then wipes them out by overriding with externally formatted content.
I've had a thought about this. Here is what I propose:
All these constructors end up building the Then the
The above keeps means the users cannot possibly create an invalid What do you think? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The new way of structuring the request object is much better. I just left a few more minor comments.
public PostDataRequest(String jobId, XContentType xContentType, BytesReference content) { | ||
this.jobId = Objects.requireNonNull(jobId, "job_id must not be null"); | ||
this.xContentType = Objects.requireNonNull(xContentType, "content_type must not be null"); | ||
this.content = content; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we also requireNonNull
for content
?
this.xContentType = Objects.requireNonNull(xContentType, "content_type must not be null"); | ||
ByteBuffer buffer = ByteBuffer.wrap(content); | ||
ByteBuffer[] buffers = new ByteBuffer[]{ buffer }; | ||
this.content = BytesReference.fromByteBuffers(buffers); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you avoid the ByteBuffer[]
by using this.content = new ByteArray(content)
? IIRC ByteArray
extends BytesReference
.
|
||
@Override | ||
public int hashCode() { | ||
return Objects.hash(jobId, resetStart, resetEnd, xContentType); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be good to add a comment that content
is deliberately left out as it is on the server side too. (I'm not convinced that was a good decision as it means two radically different posts can be the equal, but we are where we are.) But at least by having a comment it avoids someone adding it in in one place but not the other in the future.
Same for the comparison in equals()
below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
* master: (30 commits) Include fallback settings when checking dependencies (elastic#33522) [DOCS] Fixing formatting issues in breaking changes CRUD: Disable wait for refresh tests with delete Test: Fix test name (elastic#33510) HLRC: split ingest request converters (elastic#33435) Logging: Configure the node name when we have it (elastic#32983) HLRC: split xpack request converters (elastic#33444) HLRC: split watcher request converters (elastic#33442) HLRC: add enable and disable user API support (elastic#33481) [DOCS] Fixes formatting error TEST: Ensure merge triggered in _source retention test (elastic#33487) [ML] Add a file structure determination endpoint (elastic#33471) HLRC: ML Forecast Job (elastic#33506) HLRC: split migration request converters (elastic#33436) HLRC: split snapshot request converters (elastic#33439) Make Watcher validation message copy/pasteable Removes redundant test method in SQL tests (elastic#33498) HLRC: ML Post Data (elastic#33443) Pass Directory instead of DirectoryService to Store (elastic#33466) Collapse package structure for metrics aggs (elastic#33463) ...
Adds the ability to Post Data to an ML job in the HLRC.
This relates to (#29827)