Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor Cbc #374

Merged
merged 137 commits into from
Jan 2, 2022
Merged

Refactor Cbc #374

merged 137 commits into from
Jan 2, 2022

Conversation

tkralphs
Copy link
Member

@tkralphs tkralphs commented Mar 15, 2021

Although the task list is incomplete, we plan to merge what has been done so far on this long-standing PR on 1/1/2022 in order to encourage broader review and comment from the community (see #465). Work will continue following the merge, but in a separate PR. More details will be provided in an accompanying announcement, which will be linked here once it's made.

Cbc is is currently undergoing a major refactoring. Importantly, this refactoring is not expected to affect the performance of the code in any way. The goals only regard improving readability, maintainability, and usability.

This pull request is mainly for the purposes of documenting the changes being made so that other maintainers can review and follow along. It is also a task list that anyone who wants to contribute can consult to see what needs to be done.

See also: coin-or/Clp#166 and coin-or/CoinUtils#139

Here are some of the main goals of this refactoring.

  • Make code thread-safe and eliminate global variables.
  • Make structure of code easier to understand and maintain.
  • Simplify and unify the parameters setting mechanism and separate it completely from Clp.
  • Convert to C++ strings/streams everywhere to unify handling of input.
  • Unify all messaging through message handlers and eliminate back-door methods of printing and controlling print levels.
  • Unify error handling
  • Combine separate libraries into a single library.
  • Further develop the CbcSolver class to contain what is currently ClpMain0() and ClpMain1().
  • Modularize the implementation of CbcMain1(), splitting the while loop into separate functions.

Here is the task list, including what has already been done.

  • Build out the already existing parameter mechanism in CoinUtils.
  • Build out the already existing utilities in CoinUtils for reading input from command-line, file, or environment.
  • Create a CbcParam derived from CoinParam.
  • Create a CbcParameters class with methods to set up parameters and contain the vector of parameters.
  • Convert Cbc to use the new parameter and input mechanisms (thus separating it completely from Clp).
  • Convert to using string keywords rather than integer modes to the extent possible for code readability and also eliminate other "magic numbers" where possible.
  • Create new parameter types for file names and directories so that these can be parsed and validated correctly to ensure they represent valid paths, addressing existing bugs.
  • Create constructors in the CbcParameters class representing different possible parameters settings (e.g., emphasize optimality versus feasibility).
  • Ensure that default parameter values make sense.
  • Create enums for the keyword parameters.
  • Eliminate global variables and ensure thread safety
  • Use modern C++ string manipulation
  • Eliminate direct printing to std::cout and make sure all output (at least in CbcSolver) is controlled by the log parameter through the proper message handler.
  • Eliminate the separate libCbcSolver, which didn't seem to serve a purpose.
  • Change all integers mode values to their corresponding enums to make code more readable and maintainable.
  • Unify parameters setting in CbcModel with main parameter-setting mechanism (see CbcModelParameters.hpp).
  • Try to convert parameters set in obscure ways (special options) to a more transparent mechanism.
  • Build out the CbcSolver class.
  • Modularize CbcMain1, moving local variables into the CbcSolver class.
  • Use existing helper functions and create new ones to avoid replication of code blocks within Cbc and between Clp and Cbc.
  • Implement better error handling.
  • Eliminate irrelevant/old #ifdefs or change them to parameters instead.
  • Test extensively against current master branch to identify any regressions.

Things to consider

  • Use parameter "push" functions to do the work currently being done in much of CbcMain1 (this is up in the air).
  • Use readline to enable automatic command completion and further simplify parameter setting.
  • Create base class in CoinUtils to unify ClpParameters and CbcParameters

Some details about particular aspects of what has already been done.

Parameter mechanism

There are two main classes and some utilities.

  • The CbcParam class is derived from CoinParam and hold information for a single parameter (value, type, upper and lower limits, help strings, etc.).
  • The CbcParameters contains a vector of CbcParam objects representing the current settings.

CbcParam

Each parameter has a unique code, which is also its index in the parameter vector stored in the CbcParameters class (described below). The codes are specified in the CbcParamCode enum within the CbcParam class.

Parameters also have a type, which can be queried and is one of

  • Double: Parameters whose value is a double
  • Integer: Parameters whose value is an integer
  • String: Parameters whose value is a string
  • Directory: For storing directory names, such as default locations for file types
  • File: For storing the locations/names of files
  • Keyword: Parameters that have one of several enumerated values identified by either strings or corresponding integer modes.
  • Action: Not parameters in the traditional sense, but used to trigger actions, such as solving an instance.

There are different constructors for each type, as well as setup functions for populating an existing parameter object with

  • Name
  • Short help string
  • Long help string
  • Limits on values (for error checking)
  • Defaults values
  • Display priority (to limit which parameters get displayed when users ask for help)

For keyword parameters, there is also a mechanism for creating a mapping between keyword value strings and the so-called "mode" value, which is an integer. The value of a keyword parameter can be set or retrieved either as a keyword string or as an integer mode. The modes may also be specified in enums.

There are separate get/set methods for each parameter type, but for convenience, there is also a single setVal() and getVal() method that is overloaded and can set the value of any parameter (the input/output is interpreted based on the known parameter type), e.g.,

param.setVal(0);

Each parameter object also has optional "push" (and "pull") function that can perform actions or populate related data structures upon the setting of a parameter value. The push/pull functions are defined with the CbcParamUtils name space, described below.

None of the methods in the class produce output directly. The get/set methods can optionally populate a string object that is passed in as an optional second argument. This is to allow the calling function to control output by piping the string to a message handler as appropriate or simply ignoring it if printing is not desired. This obviates the need to control printing internal to the functions themselves, which would require a separate parameter.

std::string message;
if (param->setKwdVal(value, &message)){
   messageHandler->message(CBC_GENERAL, generalMessages) << message << CoinMessageEol;
}

CbcParameters

The CbcParameters class serves primarily as a container for a vector of parameters, but can (and will) be used to store other auxiliary information that needs to be accessed by the methods of (soon-to-be) CbcSolver class. The class has methods for setting parameter values, which are pass-throughs to the methods of the CbcParam class. It also defines a [] operator, so that parameter objects contained in the parameter vector, which is a class member, can be directly accessed, e.g.,

  CbcParameters parameters;
  parameters[CbcParam::CLIQUECUTS]->setVal("ifmove");;

The constructor of the class calls a method, which sets up all the parameters (as described above). It is envisioned that different constructors will eventually be used to obtain different sets of parameters for different purposes.

CbcParamUtils

There are a number of utilities contained in the CbcParamUtils namespace. Mostly, these are the push functions that can extend the functionality of the set methods of the CbcParam class.

Input/Output mechanism

The input/output mechanism is used to pass a sequence of parameters to Cbc, including action parameters. This parameter sequence can be set equivalently passed in five different ways.

  • Command line argument
  • Interactive prompt
  • Environment variable
  • Parameter file
  • Programatically from a driver

Regardless of the method, the parameter sequence is stored and accessed as a FIFO queue of strings (inputQueue). This can be passed as an input to what is currently CbcMain1 or constructed within the main while loop by parsing either interactive input, the contents of an environment variable, or the contents of a parameter file.

The main while loop then pops strings off the input queue, looks them up in the parameters list and deals with the result.

Changes to Behavior of the Solver

While the goal was to make the changes completely transparent to end users, there were a few changes.

  • In order to more cleanly separate "actions" from "parameters", several new parameters were introduced for storing default file names for reading and writing models and solutions of various types to disk.
  • Some actions related to reading in files that used to take an argument now just always use whatever file name is stored in the corresponding parameters, thus eliminating the confusion of the $ to stand for using the default file name.
  • Actions related to reading and writing files have been more consistently named.
  • A new action printSolution was introduced that always writes the solution to stdout.

TODO items to consider

  • There are still actions that double as parameters (e.g., import stores the file name of the file that was imported). Ideally, actions would not store anything, but it seems necessary to allow actions to take arguments.
  • A question is whether actions should be able to take optional arguments. This would probably be convenient and would allow some of the actions that used to take an argument (file name) to take one again as an option (thus, eliminating the $ for using the default file name).

@CLAassistant
Copy link

CLAassistant commented Mar 15, 2021

CLA assistant check
All committers have signed the CLA.

@tosttost
Copy link
Contributor

tosttost commented May 6, 2021

Concerning "Eliminate irrelevant/old #ifdefs or change them to parameters instead."

I suggest to drop "NEW_DEBUG_AND_FILL" in CbcSolver.cpp as unitialized reads can be found using valgrind, MemorySanitizer, ... without fiddling with overriding new/delete (and missing any malloc'd memory).

Does anyone know something about "CLP_MALLOC_STATISTICS" in CbcSolver.cpp? The code does not even compile due to

#include "stolen_from_ekk_malloc.cpp"

@tkralphs
Copy link
Member Author

tkralphs commented May 7, 2021

Concerning "Eliminate irrelevant/old #ifdefs or change them to parameters instead."

I suggest to drop "NEW_DEBUG_AND_FILL" in CbcSolver.cpp as unitialized reads can be found using valgrind, MemorySanitizer, ... without fiddling with overriding new/delete (and missing any malloc'd memory).

Does anyone know something about "CLP_MALLOC_STATISTICS" in CbcSolver.cpp? The code does not even compile due to

#include "stolen_from_ekk_malloc.cpp"

I suggest we open a new issue listing all of the ifdefs to see if we can collect information on which ones are important to retain or at least which ones actually build. @tosttost would you take this on?

@tosttost
Copy link
Contributor

tosttost commented May 7, 2021

I'll try to grep the #ifdef and run some parsing on the matches at some point in time, but currently my priority is
1.) I'm working on some uninitialized reads in the new parameter code
2.) kill as many global non-const variables as possible / being able to solve multiple problems in parallel within the same process. I have a use case that produces hundreds of easy mip that can be solved in parallel.

#ifdef is somewhat related to 2.) as some global variables are inside #ifdef, so there might be some quick wins.

@tkralphs tkralphs marked this pull request as ready for review December 30, 2021 19:19
This was referenced Dec 31, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants