Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Print statements cause changes in optimization behavior. #1341

Closed
DanMcGann opened this issue Dec 13, 2022 · 5 comments
Closed

Print statements cause changes in optimization behavior. #1341

DanMcGann opened this issue Dec 13, 2022 · 5 comments

Comments

@DanMcGann
Copy link
Contributor

Description

Printing involved variables of a factor during calls to evaluateError causes changes in optimization behavior.

Steps to reproduce

  1. Recreate minimal project structure (see below in additional info)
  2. mkdir build && cd build
  3. cmake ..
  4. make
  5. ./run-test

Expected behavior

Prints are const and output to stdout should not have any affect on the behavior of optimization. However, we observe that adding these prints causes unexpected optimization results.

Ex.
Running the following script (see additional info for all code to run this script)

  NonlinearFactorGraph fg;
  Values vals;
  fg.push_back(MyPrior<Pose2>(0, Pose2(0, 0, 0), noiseModel::Unit::Create(3)));
  vals.insert(0, Pose2(1, 1, 1));

  NonlinearFactorGraph fg_print;
  Values vals_print;
  fg_print.push_back(MyPriorPrint<Pose2>(0, Pose2(0, 0, 0), noiseModel::Unit::Create(3)));
  vals_print.insert(0, Pose2(1, 1, 1));

  std::cout << "Without Prints:" << std::endl;
  GaussNewtonOptimizer optimizer(fg, vals);
  optimizer.optimize().print();

  std::cout << "With Prints:" << std::endl;
  GaussNewtonOptimizer optimizer_print(fg_print, vals_print);
  optimizer_print.optimize().print();

If we use a prior factor without print statements we get the following output

Values with 1 values:
Value 0: (gtsam::Pose2)
(0, 0, 0)

If we use a prior factor that prints both the value and the prior on each call to evaluateError we get ...

(1, 1, 1)
(0, 0, 0)
(1, 1, 1)
(0, 0, 0)
(1, 1, 0)
(0, 0, 0)
Values with 1 values:
Value 0: (gtsam::Pose2)
(1, 1, 0)

Where the optimizer output is not at the minimum of the cost function.

Environment

This bug was observed under 3 different environments.

Environmens

  • GTSAM 4.2a7 with TBB on ubuntu 20.04
  • GTSAM 4.2a7 without TBB on ubuntu 20.04
  • GTSAM 4.1.1, without TBB on ubuntu 20.04

Additional information

Minimal Working Example

  • Project Structure
    • CMakeLists.txt
    • factors.h
    • run-test.cpp
`CMakeLists.txt`

cmake_minimum_required(VERSION 3.0)
project(test_gtsam_lie)

# Extra Boost Libraries needed by experiments
FIND_PACKAGE(GTSAM)
INCLUDE_DIRECTORIES(${GTSAM_INCLUDE_DIRS})

add_executable(run-test "run-test.cpp" "factors.h")
target_link_libraries(run-test gtsam)

`factors.h`

#include <gtsam/nonlinear/NonlinearFactor.h>

template <class VALUE>
class MyPrior : public gtsam::NoiseModelFactor1<VALUE>
{
private:
    VALUE prior_;

public:
    MyPrior(gtsam::Key key, const VALUE &prior, const gtsam::SharedNoiseModel &model)
        : gtsam::NoiseModelFactor1<VALUE>(model, key), prior_(prior) {}

    gtsam::Vector evaluateError(const VALUE &val, boost::optional<gtsam::Matrix &> H = boost::none) const
    {
        if (H)
            (*H) = gtsam::Matrix::Identity(gtsam::traits<VALUE>::GetDimension(val), gtsam::traits<VALUE>::GetDimension(val));
        // manifold equivalent of z-x -> Local(x,z)
        return -gtsam::traits<VALUE>::Local(val, prior_);
    }

    virtual gtsam::NonlinearFactor::shared_ptr clone() const
    {
        return boost::static_pointer_cast<gtsam::NonlinearFactor>(
            gtsam::NonlinearFactor::shared_ptr(new MyPrior<VALUE>(*this)));
    }
};

template <class VALUE>
class MyPriorPrint : public gtsam::NoiseModelFactor1<VALUE>
{
private:
    VALUE prior_;

public:
    MyPriorPrint(gtsam::Key key, const VALUE &prior, const gtsam::SharedNoiseModel &model)
        : gtsam::NoiseModelFactor1<VALUE>(model, key), prior_(prior) {}

    gtsam::Vector evaluateError(const VALUE &val, boost::optional<gtsam::Matrix &> H = boost::none) const
    {
        if (H)
            (*H) = gtsam::Matrix::Identity(gtsam::traits<VALUE>::GetDimension(val), gtsam::traits<VALUE>::GetDimension(val));
        // manifold equivalent of z-x -> Local(x,z)
        auto error = -gtsam::traits<VALUE>::Local(val, prior_);
        val.print();
        prior_.print();
        return error;
    }

    virtual gtsam::NonlinearFactor::shared_ptr clone() const
    {
        return boost::static_pointer_cast<gtsam::NonlinearFactor>(
            gtsam::NonlinearFactor::shared_ptr(new MyPriorPrint<VALUE>(*this)));
    }
};

`run-test.cpp`

#include <gtsam/geometry/Pose2.h>
#include <gtsam/nonlinear/NonlinearFactorGraph.h>
#include <gtsam/nonlinear/GaussNewtonOptimizer.h>

#include "factors.h"
using namespace gtsam;

int main()
{
    NonlinearFactorGraph fg;
    Values vals;
    fg.push_back(MyPrior<Pose2>(0, Pose2(0, 0, 0), noiseModel::Unit::Create(3)));
    vals.insert(0, Pose2(1, 1, 1));

    NonlinearFactorGraph fg_print;
    Values vals_print;
    fg_print.push_back(MyPriorPrint<Pose2>(0, Pose2(0, 0, 0), noiseModel::Unit::Create(3)));
    vals_print.insert(0, Pose2(1, 1, 1));

    std::cout << "Without Prints:" << std::endl;
    GaussNewtonOptimizer optimizer(fg, vals);
    optimizer.optimize().print();

    std::cout << "With Prints:" << std::endl;
    GaussNewtonOptimizer optimizer_print(fg_print, vals_print);
    optimizer_print.optimize().print();
};

@dellaert
Copy link
Member

Wow, crazy! We pride ourselves on const correctness, but this could be a subtle memory bug. Unfortunately, it seems you are currently in the best position to uncover the bug. Maybe run with memory checking software, or do some print debugging? Printing a bunch of stuff and then doing a diff on both output files could uncover where the issue is introduced.

@dellaert
Copy link
Member

dellaert commented Jan 3, 2023

@DanMcGann I was able to reproduce this. On My MAc I can't run valgrind, though. Could you try this?

@dellaert
Copy link
Member

dellaert commented Jan 3, 2023

Interestingly, only occurs in Release mode, not in debug mode.

@dellaert
Copy link
Member

dellaert commented Jan 3, 2023

@DanMcGann, I tracked down the issue with Xcode. Turns out auto and Eigen do not play well together. Changing this line:

    auto error = -gtsam::traits<VALUE>::Local(val, prior_);

to

    const Vector error = -gtsam::traits<VALUE>::Local(val, prior_);

will fix the issue. FYI, address sanitizer said:

AddressSanitizer: stack-use-after-scope on address 0x00016fdfce20 at pc 0x0001000512fc bp 0x00016fdfc6a0 sp 0x00016fdfc698

Closing as not a GTSAM issue.

@dellaert dellaert closed this as completed Jan 3, 2023
dellaert added a commit that referenced this issue Jan 3, 2023
@DanMcGann
Copy link
Contributor Author

DanMcGann commented Jan 3, 2023

Interesting! Glad its not a deeper problem.

Also looks like this is well documented by Eigen which I will link here for posterity (https://eigen.tuxfamily.org/dox/TopicPitfalls.html).

Although, given this documentation I'm surprised to see the issue crop up here as the definition of traits<VALUE>::Local will have an explicit type, so I would not expect it to return an intermediate Eigen expression type. Something to be careful of I suppose.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants