Optimised eif_new.py #24

Open
wants to merge 7 commits into
base: master

Conversation

lpryszcz

I've optimised the Python version so that it matches the performance of the C++ version and allows saving models.
A runtime example has been added to Notebooks/comparison_py_cxx.ipynb.
The code was rewritten entirely. Some functions are optimised with numba.
The iForest is now a numpy array, which allows fast computation and model dumps with a low storage footprint.
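
To illustrate the idea of keeping a tree in a numpy array, here is a minimal sketch. The node layout, the axis-parallel splits and the helper names (`path_length`, `dump_tree`, `load_tree`) are assumptions made for illustration and are not necessarily what eif_new.py does:

```python
import io
import numpy as np

# Hypothetical node layout (an assumption, not the layout used in eif_new.py).
# Columns: feature index (-1 marks a leaf), split value, left child row, right child row.
tree = np.array([
    [0, 0.5, 1, 2],      # root: split on feature 0 at 0.5
    [-1, 0.0, -1, -1],   # leaf
    [1, 2.0, 3, 4],      # inner node: split on feature 1 at 2.0
    [-1, 0.0, -1, -1],   # leaf
    [-1, 0.0, -1, -1],   # leaf
], dtype=np.float64)

def path_length(tree, x):
    """Count the edges from the root to the leaf that x falls into."""
    node, depth = 0, 0
    while tree[node, 0] >= 0:              # negative feature index means leaf
        feature = int(tree[node, 0])
        go_left = x[feature] < tree[node, 1]
        node = int(tree[node, 2] if go_left else tree[node, 3])
        depth += 1
    return depth

def dump_tree(tree):
    """Serialise the array to bytes in the .npy format."""
    buf = io.BytesIO()
    np.save(buf, tree)
    return buf.getvalue()

def load_tree(data):
    """Rebuild the array from bytes produced by dump_tree."""
    return np.load(io.BytesIO(data))

restored = load_tree(dump_tree(tree))
```

Because the whole tree is one contiguous array, traversal needs only index arithmetic and the model can be written out without any custom serialisation code.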

@lpryszcz lpryszcz mentioned this pull request Aug 31, 2020
@wundermahn

Is this still an active project?

@lpryszcz
Author

lpryszcz commented Jul 1, 2021

That's a good question, @wundermahn. If you want the optimised Python version, you can get it directly from my fork.

@psmgeelen

Hi there, this would be the fix for my problem as well, wouldn't it? I am currently trying to pickle the isolationForest model and failing due to some Cython issue:

File "stringsource", line 2, in eif.iForest.__reduce_cython__
TypeError: no default __reduce__ due to non-trivial __cinit__

@lpryszcz
Author

lpryszcz commented Dec 6, 2021

Hi @psmgeelen, yes, you can't save models from the Cython version. Try my fork: it performs similarly to the Cython version but is implemented in Python (with Numba optimisations).
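
The traceback above is what Cython raises when it cannot auto-generate pickling support for an extension type. A model whose state lives in plain Python containers does not have that problem. The sketch below uses a hypothetical `TinyForest` class with made-up `to_state` and `from_state` helpers; none of these names come from the eif package or from eif_new.py:

```python
import pickle

class TinyForest:
    """Hypothetical stand-in for a model; not the eif API."""

    def __init__(self, ntrees, nodes):
        self.ntrees = ntrees
        self.nodes = nodes          # plain nested lists, nothing C-level

    def to_state(self):
        """Export everything needed to rebuild the model as a plain dict."""
        return {"ntrees": self.ntrees, "nodes": self.nodes}

    @classmethod
    def from_state(cls, state):
        """Rebuild a model from a dict produced by to_state."""
        return cls(state["ntrees"], state["nodes"])

model = TinyForest(2, [[0, 0.5, 1, 2], [-1, 0.0, -1, -1]])

# Only built-in types are pickled, so no class-specific pickling hooks are needed.
payload = pickle.dumps(model.to_state())
restored = TinyForest.from_state(pickle.loads(payload))
```

Pickling the exported state rather than the object itself keeps the saved file independent of how the class is implemented.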

@psmgeelen

psmgeelen commented Dec 6, 2021

@lpryszcz, you are the best! I will get on it now! So I really only need the eif_new.py file and that's it? Maybe it would be worthwhile to have your version integrated into scikit-learn. I recommended you anyhow in scikit-learn/scikit-learn#16517.

EDIT: It works out of the box, I love the script! One small question though: does it make sense to have a threshold that is always 0.5? Instead you could just pass the values on directly.
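
For context on the 0.5 threshold: in the standard isolation-forest formulation, a score of 0.5 corresponds to a point whose mean path length equals the expected average path length c(n) for the sample size. The sketch below shows that relationship in plain Python. The function names and the `score > threshold` labelling rule are assumptions for illustration; whether eif_new.py applies its threshold in exactly this way is not confirmed here:

```python
import math

EULER_GAMMA = 0.5772156649

def c_factor(n):
    """Expected path length of an unsuccessful binary-search-tree lookup over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    harmonic = math.log(n - 1) + EULER_GAMMA   # approximation of the harmonic number H(n-1)
    return 2.0 * harmonic - 2.0 * (n - 1) / n

def anomaly_score(mean_path_length, sample_size):
    """Score in (0, 1]; shorter paths give higher scores."""
    return 2.0 ** (-mean_path_length / c_factor(sample_size))

def label(scores, threshold=0.5):
    """Turn raw scores into 0/1 labels (hypothetical rule: above threshold is anomalous)."""
    return [1 if s > threshold else 0 for s in scores]

scores = [anomaly_score(2.0, 256), anomaly_score(20.0, 256)]
labels = label(scores)
```

Returning the raw scores alongside (or instead of) the labels leaves the choice of cut-off to the caller, which is what the question above is asking for.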

@lpryszcz
Author

lpryszcz commented Feb 4, 2022

I'm glad it works for you :) And thanks for the recommendation @psmgeelen . I'd be more than happy to contribute to scikit-learn given there is interest from their side.

3 participants