Duplicated coefficients in LinearClassifier #1066

Open
wsperat opened this issue Jan 31, 2024 · 3 comments

Comments

@wsperat

wsperat commented Jan 31, 2024

I recently inherited a legacy project that contains a Logistic Regression for binary classification, stored in ONNX format and converted with skl2onnx. When trying to do a bit of model introspection, I was surprised to find that the model's coefficients and its intercept are each stored twice, where the second copy appears to be the negative of the first. Looking into the conversion code, I found the following lines in skl2onnx.operator_converters.linear_classifier:

op = operator.raw_operator
coefficients = op.coef_.flatten().astype(float).tolist()
classes = get_label_classes(scope, op)
number_of_classes = len(classes)
# some more code
if number_of_classes == 2:
    # for the binary case, prepend the negated values so there is one
    # coefficient row and one intercept per class
    coefficients = list(map(lambda x: -1 * x, coefficients)) + coefficients
    intercepts = list(map(lambda x: -1 * x, intercepts)) + intercepts

This certainly explains the duplication, but I was wondering about the reason for storing values in this way.
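For reference, here is a minimal sketch of how the duplication shows up when reading the attributes back; "model.onnx" is just a placeholder for the exported file:

import numpy as np
import onnx

model = onnx.load("model.onnx")  # placeholder path to the exported model
for node in model.graph.node:
    if node.op_type == "LinearClassifier":
        attrs = {a.name: a for a in node.attribute}
        coefs = np.array(attrs["coefficients"].floats)
        intercepts = np.array(attrs["intercepts"].floats)
        # binary model with n features: 2 * n coefficients are stored,
        # and the first half is the negation of the second half
        half = coefs.size // 2
        print(np.allclose(coefs[:half], -coefs[half:]))      # True
        print(np.allclose(intercepts[:1], -intercepts[1:]))  # True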
Thanks!

@sdpython
Contributor

sdpython commented Feb 1, 2024

The first pass of the converters was written to go as fast as possible and cover as many models as possible. It could certainly be improved. Are you willing to contribute?

@wsperat
Author

wsperat commented Feb 2, 2024

@sdpython Absolutely! However, I'd like to understand the reason behind doing it this way, mostly because I'm not sure how the ONNX runtime deals with the duplicated coefficients.
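For context, this is roughly how I have been comparing the two. It is only a sketch, assuming a fitted binary LogisticRegression clf, a float32 feature matrix X, the exported file saved as "model.onnx", and the default ZipMap output:

import numpy as np
import onnxruntime as rt

sess = rt.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])
input_name = sess.get_inputs()[0].name
labels, probas = sess.run(None, {input_name: X.astype(np.float32)})

# probas is a list of {class label: probability} dicts (ZipMap output);
# the runtime turns the two duplicated score rows into a two-class distribution
onnx_probas = np.array([[p[c] for c in clf.classes_] for p in probas])
print(np.allclose(onnx_probas, clf.predict_proba(X), atol=1e-5))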

@xadupre
Collaborator

xadupre commented Feb 8, 2024

This was done many years ago; with everything we did since, I don't remember the reasons. This PR onnx/onnx#5874 changes the way TreeEnsembleRegressor and TreeEnsembleClassifier are defined: they are now a single operator. We could do the same with LinearClassifier and LinearRegressor, or just deprecate them since they can be expressed with regular operators.
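For instance, here is a rough sketch of what a binary logistic regression looks like with standard operators only; the fitted model clf, the shapes, and the names used here are just assumptions for illustration:

import numpy as np
from onnx import TensorProto, helper, numpy_helper

# assumed: clf is a fitted binary sklearn LogisticRegression
n_features = clf.coef_.shape[1]
coef = numpy_helper.from_array(clf.coef_.astype(np.float32).T, name="coef")
intercept = numpy_helper.from_array(clf.intercept_.astype(np.float32), name="intercept")

graph = helper.make_graph(
    [
        helper.make_node("MatMul", ["X", "coef"], ["scores"]),
        helper.make_node("Add", ["scores", "intercept"], ["logits"]),
        helper.make_node("Sigmoid", ["logits"], ["proba"]),  # P(class 1)
    ],
    "binary_logreg",
    [helper.make_tensor_value_info("X", TensorProto.FLOAT, [None, n_features])],
    [helper.make_tensor_value_info("proba", TensorProto.FLOAT, [None, 1])],
    initializer=[coef, intercept],
)
model = helper.make_model(graph)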

A change involves three steps:

  • change the ONNX standard if needed
  • implement a kernel in onnxruntime that runs the modified or added operator
  • update the converting library to use the new operator, or the new way to convert regression or classification

How would you like to contribute?
