Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix grouping of contact constraints #6

Merged
merged 1 commit into from
Jun 10, 2020

Conversation

azeey
Copy link
Collaborator

@azeey azeey commented Jun 10, 2020

DART crashes when simulating a modest number of objects that come into contact with each other.

To test, run the following file in ign-gazebo (ign gazebo -v 4 cube5.sdf.txt)
cube5.sdf.txt (courtesy of @mjcarroll)

Before the PR, ign-gazebo crashes with

ODE INTERNAL ERROR 1: assertion "bNormalizationResult" failed in _dNormalize4() [odemath.h:42]

Peek 2020-06-09 19-22-crash

With the PR:
Peek 2020-06-09 19-23

For a given set of contact constraints, the constraint solver first creates a grouping of related constraints, and then solves each group independently. There seems to be a problem with this grouping and some constraints end up being in two groups.

The ContactConstraint::uniteSkeleton function unites two skeletons associated with a given contact constraint. It does that by setting the mUnionRootSkeleton member of one of the skeletons to point to the current root skeleton of the second skeleton. To decide which of the two skeletons' root node becomes the new union's root node, the size of the skeletons' current unions are compared.

As an example, let A,B,C, and D be skeletons associated with 3 contact constraints. When ContactConstraint::uniteSkeleton is called for constraints 1 and 2, skeleton B is chosen to be the root node.

1. A, B = A -> B
2. C, A = C -> B // Since the root of A is B, C's mUnionRootSkeleton now points to B
3. D, B = B -> D // Let's assume D is chosen because D's union size is larger.

When D is chosen for constraint 3, A and C are not updated. Thus, later on when ContactConstraing::getRootSkeleton is called on constraint 1 or 2, B is returned instead of D. This results in error in the grouping of constraints.

It's not clear to me why this error causes a crash, but the hint that led me to a solution is that I tried disabling the grouping altogether and letting all the constraints be solved in a single call to ConstraingSolver::solveConstrainedGroup and that didn't crash. My guess is that when a constraint ends up in two groups, a solution for that constraint is computed twice and the resulting impulses are erroneously accumulated.

I'm not sure how to properly/reliably test this as it seems to occur randomly.

/cc @scpeters, @mxgrey

Signed-off-by: Addisu Z. Taddese <addisu@openrobotics.org>
@@ -169,7 +169,7 @@ void BoxedLcpConstraintSolver::solveConstrainedGroup(ConstrainedGroup& group)
mFIndex.setConstant(n, -1); // set findex to -1

// Compute offset indices
mOffset.resize(n);
mOffset.setZero(numConstraints);
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI, this change is not necessarily part of the fix, but I'm backporting from an upstream PR since it's related.

@nkoenig
Copy link

nkoenig commented Jun 10, 2020

This also worked for me. My approval is just based on testing the solution.

@scpeters
Copy link
Collaborator

I tried disabling the grouping altogether and letting all the constraints be solved in a single call

so is that what this PR is doing? I'm not familiar with the rest of the code, so it's not clear to me

@nkoenig
Copy link

nkoenig commented Jun 10, 2020

I tried disabling the grouping altogether and letting all the constraints be solved in a single call

so is that what this PR is doing? I'm not familiar with the rest of the code, so it's not clear to me

@azeey

@scpeters
Copy link
Collaborator

he explained it to me on slack

Copy link
Collaborator

@scpeters scpeters left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's complicated, but I think this change makes sense

@nkoenig
Copy link

nkoenig commented Jun 10, 2020

Cool, safe to merge @azeey ?

@azeey
Copy link
Collaborator Author

azeey commented Jun 10, 2020

Yes, I think it's safe to merge. For additional clarity, I instrumented the code to output a dot file and generated graphs of the constraints and their grouping. For the attached SDF file, here is an example grouping for frame 413

constraints413

The ellipses represent the skeletons (cubes and ground_plane) and the edges represent a contact constraint between two skeletons. The light grey edges are constraints where one of the skeletons is a static body. The rectangles represent the constraint groupings. With proper grouping, each node in the graph should only have an edge to another node inside its group (rectangle). The exception is for static/immobile skeletons which are allowed to have constraints across multiple groups because impulses are not applied on them. But notice how model_3_0_3 has a constraint with a model_3_0_4, a skeleton outside its group. I believe in this example, the group labeled model_2_3_4 and model_3_0_0 should have been one group.

In contrast, here is frame 413 after the fix. Granted, the collisions in this frame are going to be different from the one run without the fix, but I wanted to show that this PR does not disable grouping:

constraints413

@azeey azeey merged commit da24cf3 into azeey/friction_per_shape_more_params Jun 10, 2020
@azeey azeey deleted the fix_constraint_grouping branch June 10, 2020 19:13
@diegoferigo
Copy link

Great work @azeey and amazing analysis in #6 (comment)! We are struggling a lot for a while now with this error and I can't wait to see this change in action. I'm just checking, do the released packages 6.10.0~osrf6~2020-06-10~da7758ae4b7a38 already contain this fix?

I also tag @jrivero here that might know as well.

@azeey
Copy link
Collaborator Author

azeey commented Jul 15, 2020

Thanks @diegoferigo. Yes, the latest debs (6.10.0~osrf6...) should have the fix.

@diegoferigo
Copy link

Ok got it, thanks! I just checked the machine where this error occurs more often and I realized that there was the older 6.10.0~osrf1~20190718~fdde7e7894ebc36b version. Thanks for confirming.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants