Add function to calculate batch ARI #175

Closed
cbethell opened this issue Jun 15, 2023 · 3 comments

Comments

@cbethell
Contributor

Is your feature request related to a problem? Please describe.
Per AlexsLemonade/scpca-downstream-analyses#352 (comment):
We may have more than one use case for the integration performance metrics used in the sc-data-integration repo. We should therefore move these functions to a single place so that they can be used across multiple repos after integration.

Describe the solution you'd like
Add a function for calculating batch ARI similar to the script in the sc-data-integration repo: https://github.com/AlexsLemonade/sc-data-integration/blob/main/scripts/utils/calculate-ARI.R

@allyhawkins
Member

This function should calculate the ARI between labels (either batch or cell type) and clustering results for a merged and integrated SCE object. PCs will be downsampled before clustering and calculating the ARI, and this procedure will be repeated for the number of reps specified (default: 20).
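The downsample-and-repeat procedure described above could be sketched roughly as follows. This is only an illustration in Python, not the package's implementation: the function name `repeat_on_downsampled_pcs`, the `metric_fn` callback, the `frac` downsampling fraction (0.8 here), and the `seed` argument are all assumptions, since the issue does not specify them.

```python
import random
from typing import Callable, Hashable, List, Sequence


def repeat_on_downsampled_pcs(
    pcs: Sequence[Sequence[float]],
    labels: Sequence[Hashable],
    metric_fn: Callable[[List[Sequence[float]], List[Hashable]], float],
    n_reps: int = 20,   # default number of reps stated in the issue
    frac: float = 0.8,  # assumption: the issue does not state a fraction
    seed: int = 0,      # assumption: fixed seed for reproducibility
) -> List[float]:
    """Apply metric_fn to n_reps random subsets of cells.

    pcs holds one row of principal components per cell, and labels holds
    the matching batch or cell type label for each cell.
    """
    if len(pcs) != len(labels):
        raise ValueError("pcs and labels must have the same number of cells")

    rng = random.Random(seed)
    n_keep = max(1, int(len(pcs) * frac))

    results = []
    for _ in range(n_reps):
        # Sample cells without replacement, keeping PCs and labels aligned.
        idx = rng.sample(range(len(pcs)), n_keep)
        sub_pcs = [pcs[i] for i in idx]
        sub_labels = [labels[i] for i in idx]
        results.append(metric_fn(sub_pcs, sub_labels))
    return results
```

Passing the metric as a callback keeps the downsampling logic independent of whether ARI or ASW is being calculated.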

As mentioned in #177, the setup to calculate the batch or cell type ARI is the same as setting up for calculating the ASW. So I think we should create a single wrapper function that grabs all the PCs and performs downsampling (see #177 (comment) for a longer description). If the metric specified when running that wrapper function is ARI, then we would run the ARI function that I'm describing below:

  • Input:
    • PCs to cluster prior to computing ARI
    • Vector of labels to compare to clustering results (e.g., cell type, batch, or, for within-batch ARI, clusters from an individual SCE object)
  • Function:
    • Cluster PCs using graph-based clustering
    • Calculate ARI between provided labels and clustering results
  • Output:
    • A single ARI value
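
As an illustration of the ARI step only, here is a minimal Python sketch that computes the adjusted Rand index from the contingency table of two labelings. It assumes the graph-based clustering has already been run elsewhere and that its cluster assignments are passed in as `clusters`. The function name `adjusted_rand_index` is hypothetical and is not the function proposed for the package.

```python
from collections import Counter
from math import comb
from typing import Hashable, Sequence


def adjusted_rand_index(
    labels: Sequence[Hashable], clusters: Sequence[Hashable]
) -> float:
    """Return the ARI between provided labels and cluster assignments."""
    if len(labels) != len(clusters):
        raise ValueError("labels and clusters must have the same length")
    n = len(labels)
    if n < 2:
        raise ValueError("at least two cells are required")

    # Count cells per (label, cluster) pair, per label, and per cluster.
    contingency = Counter(zip(labels, clusters))
    label_sizes = Counter(labels)
    cluster_sizes = Counter(clusters)

    # Number of cell pairs that fall together under each grouping.
    sum_cells = sum(comb(c, 2) for c in contingency.values())
    sum_labels = sum(comb(c, 2) for c in label_sizes.values())
    sum_clusters = sum(comb(c, 2) for c in cluster_sizes.values())

    expected = sum_labels * sum_clusters / comb(n, 2)
    max_index = (sum_labels + sum_clusters) / 2

    # Degenerate case: both partitions are trivial (one group, or all
    # singletons), so they agree perfectly by construction.
    if max_index == expected:
        return 1.0
    return (sum_cells - expected) / (max_index - expected)
```

Because the ARI only depends on which cells are grouped together, the label names themselves do not need to match the cluster names.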

This function could also be used inside the function for calculating within-batch ARI mentioned in #176.
Note that the setup described here slightly changes how we calculate batch ARI. Instead of using k-means clustering and calculating the ARI across a range of k, this function would use graph-based clustering and output a single ARI.

I'm going to tag folks that are out on Monday for their opinions. In the meantime @cbethell let me know if you have any additional thoughts.

@allyhawkins
Member

@sjspielman and @jashapiro could you please leave any additional thoughts or comments by EOD on Wednesday 6/28? Thank you!

@cbethell
Contributor Author

cbethell commented Aug 7, 2023

Addressed by #176

@cbethell cbethell closed this as completed Aug 7, 2023