Unique TaskIDs #106
Merged
Conversation
Continuation of #94: I had problems merging and rebasing and gave up. :) Just cherry-picked my changes into this new branch. @dsKarthick & @JessicaLHartog: please 👀 at your convenience.
erikdw force-pushed the unique-task-ids-take-2 branch 2 times, most recently from 3ff2816 to a772308 on March 24, 2016 at 01:39
This is needed to avoid a problem with mesos-slave recovery resulting in LOST tasks. Specifically, we discovered that if you relaunch a topology's task onto the same worker slot (so two different instances with the same "task ID" have run), then while the mesos-slave process is recovering, it terminates the task upon finding a "terminal" update in the task's recorded state. That terminal state was recorded the first time the task with that task ID stopped. To solve this we ensure all task IDs are unique, by appending a millisecond-granularity timestamp to the task IDs.
erikdw force-pushed the unique-task-ids-take-2 branch from 9d738c4 to f4b5061 on March 24, 2016 at 02:04
erikdw added a commit to erikdw/storm-mesos that referenced this pull request on Apr 2, 2016
This is intended to fix issue mesos#119. With the introduction of the TaskAssignments refactor for creating unique task IDs (mesos#106), I introduced a couple of bugs in the implementation of MesosSupervisor.getMetadata:

1. The slot counts in the Storm UI were broken -- the return from getMetadata was always a single-element vector, because we now pass a Set where we previously passed a Java array. This caused the PersistentVector.create(Object... object) method to be matched, which just puts the passed objects into a vector without iterating over their constituent elements. Since we were passing a single Set object, we got a single element in the resultant vector. The fix is to create a List and pass that to PersistentVector.create().

2. The returned Object must be serializable. Depending on the build and runtime environment, the serialization done by the storm supervisor during initialization can fail, crashing the supervisor. That was happening because we were passing back the ConcurrentHashMap$KeySetView object, which is not serializable. Here too, the fix is to create a List and pass that to PersistentVector.create(). NOTE: unfortunately, I haven't been able to reproduce problem 2.
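The varargs pitfall behind bug 1 can be illustrated without Clojure's classes. This is a minimal sketch: the `create` overloads below are hypothetical stand-ins that only mirror the signatures of `PersistentVector.create(Object...)` and `PersistentVector.create(List)`, not their implementations.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class VarargsPitfall {
    // Stand-in for PersistentVector.create(Object... items): the passed
    // objects become the vector's elements, with no unpacking.
    static List<Object> create(Object... items) {
        return Arrays.asList(items);
    }

    // Stand-in for PersistentVector.create(List items): the list's own
    // elements become the vector's elements.
    static List<Object> create(List<Object> items) {
        return new ArrayList<>(items);
    }

    public static void main(String[] args) {
        Set<String> workerSlots = new HashSet<>(Arrays.asList("31000", "31001", "31002"));

        // Bug: a Set only matches the varargs overload, so the whole Set
        // lands in the result as a single element.
        System.out.println(create(workerSlots).size());  // 1

        // Fix from the commit: copy into a List first, so the fixed-arity
        // List overload is chosen and the elements are unpacked.
        System.out.println(create(new ArrayList<Object>(workerSlots)).size());  // 3
    }
}
```

Java prefers the fixed-arity `List` overload when both are applicable, which is why wrapping the Set in a List fixes the slot counts.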
create unique task ID per task launch
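The timestamp-suffix approach described in the commit message can be sketched as follows. The class and method names here are illustrative, not the project's actual identifiers; only the idea (append a millisecond timestamp so a relaunch into the same worker slot never reuses an old task ID) comes from the commit.

```java
public class UniqueTaskIds {
    // Build a task ID from the worker slot, then append a millisecond
    // timestamp so two launches into the same slot get distinct IDs.
    static String uniqueTaskId(String topologyId, String nodeId, int port) {
        String slotId = topologyId + "|" + nodeId + "|" + port;
        return slotId + "|" + System.currentTimeMillis();
    }

    public static void main(String[] args) throws InterruptedException {
        String first = uniqueTaskId("topology-1", "node-a", 31000);
        Thread.sleep(2);  // guarantee the clock advances between launches
        String second = uniqueTaskId("topology-1", "node-a", 31000);
        // Same worker slot, but the task IDs differ, so mesos-slave
        // recovery cannot match a relaunch against an old terminal update.
        System.out.println(!first.equals(second));  // true
    }
}
```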