[methods.partition GridPartitioning] #134

Open
1 of 3 tasks
FarnazH opened this issue Jun 12, 2023 · 4 comments

@FarnazH
Member

FarnazH commented Jun 12, 2023

To do:

  • Add tests to reach 100% coverage
  • Add comments in the code
  • Add formulas to docstring and polish docstring (add reference).

@Ali-Tehrani self-assigned this Jun 12, 2023

@FanwangM
Collaborator

FanwangM commented Aug 6, 2023

The current GridPartition method is missing tests (they are commented out in the corresponding tests). The problem is that the usage of compute_diversity in 89f1d5d is outdated: the argument names no longer work for hypersphere_overlap_of_subset in the compute_diversity function. Once #138 is merged, I hope this problem will be resolved automatically. If not, I will provide a quick fix myself.

@FarnazH
Member Author

FarnazH commented Oct 11, 2023

Following PR #162, any comments are welcome:

  • [Feature] The same number of bins is used for partitioning each axis/dimension of feature space. It can be desirable to use different values for each axis. This extension should be easy to add.
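
A minimal sketch of what per-axis bin counts could look like, written in plain NumPy and not taken from the GridPartition code. The helper name grid_bin_indices and its bins_per_axis argument are hypothetical, and the sketch assumes every axis has a nonzero range.

```python
import numpy as np


def grid_bin_indices(X, bins_per_axis):
    """Map each row of X to a grid cell, using a separate bin count per axis.

    Hypothetical helper for illustration; assumes max > min along every axis.
    """
    X = np.asarray(X, dtype=float)
    bins = np.asarray(bins_per_axis, dtype=int)
    if bins.shape != (X.shape[1],):
        raise ValueError("bins_per_axis needs one entry per feature axis")
    axis_minimum = X.min(axis=0)
    # bin width differs per axis because both range and bin count differ
    bin_length = (X.max(axis=0) - axis_minimum) / bins
    bin_index = np.floor_divide(X - axis_minimum, bin_length).astype(int)
    # the maximum along an axis falls on the upper edge; fold it into the last bin
    return np.minimum(bin_index, bins - 1)


X = np.array([[0.0, 0.0], [1.0, 10.0], [0.5, 5.0]])
cells = grid_bin_indices(X, bins_per_axis=[2, 5])
print(cells)
```

Each row of the result is the cell coordinate of one sample, so grouping samples by identical rows gives the contents of each grid cell.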

@FarnazH
Member Author

FarnazH commented Nov 12, 2023

@Ali-Tehrani, while putting together quick_start.ipynb (see PR #186), I encountered the following warning from L124 of selector/methods/partition.py:

RuntimeWarning: invalid value encountered in floor_divide
  bin_index = np.floor_divide(X - axis_minimum, bin_length)

The bin_length ends up being zero. The code snippet below reproduces the warning. I didn't have time to look into it; can you please check what is going on?

from sklearn.datasets import make_blobs
from selector.methods.partition import GridPartition

# generate 500 samples in a 2D feature space forming 2 clusters
X, labels = make_blobs(n_samples=500, n_features=2, centers=2, random_state=42)

selector = GridPartition(numb_bins_axis=5, grid_method="equisized_dependent")
selected = selector.select(X, size=50, labels=labels)
print(len(selected))
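
For reference, a minimal NumPy sketch of how a zero bin_length leads to invalid values in floor_divide, and one possible guard. This is not the code in selector/methods/partition.py: the helper bin_index_along_axis is hypothetical, and it assumes the zero length arises when all points considered along an axis share the same value.

```python
import warnings

import numpy as np


def bin_index_along_axis(x, n_bins):
    """Bin 1D values into n_bins equal-width bins (hypothetical guarded helper)."""
    x = np.asarray(x, dtype=float)
    axis_minimum = x.min()
    bin_length = (x.max() - axis_minimum) / n_bins
    if bin_length == 0:
        # all values coincide along this axis, so they share a single bin
        return np.zeros(x.shape, dtype=int)
    bin_index = np.floor_divide(x - axis_minimum, bin_length).astype(int)
    # fold the upper edge into the last bin
    return np.minimum(bin_index, n_bins - 1)


# Unguarded case: identical values give a range of zero, hence 0.0 // 0.0
x = np.array([2.0, 2.0, 2.0])
with warnings.catch_warnings():
    warnings.simplefilter("ignore")  # silence the RuntimeWarning in this demo
    raw = np.floor_divide(x - x.min(), 0.0)

guarded = bin_index_along_axis(x, n_bins=5)
print(raw, guarded)
```

Placing all coincident points in bin 0 is only one choice; skipping the axis or raising an error would be equally reasonable, depending on what GridPartition should do in that case.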

@FanwangM
Collaborator

The current code coverage is

  Name                               Stmts   Miss  Cover   Missing
  -----------------------------------------------------------------------------
  selector/methods/partition.py      201      4    98%   375, 407, 520, 619
