Add function to calculate iLISI #174

Closed
cbethell opened this issue Jun 15, 2023 · 8 comments · Fixed by #198

@cbethell
Contributor

Is your feature request related to a problem? Please describe.
Per AlexsLemonade/scpca-downstream-analyses#352 (comment):
We are finding that we may have more than one use case for the integration performance metrics used in the sc-data-integration repo. That said, we should get these functions in a single place so that they can be utilized across multiple repos post-integration.

Describe the solution you'd like
Add a function for calculating iLISI similar to the script in the sc-data-integration repo: https://github.com/AlexsLemonade/sc-data-integration/blob/main/scripts/utils/calculate-LISI.R

@cbethell
Contributor Author

cbethell commented Jun 21, 2023

The expected input here will be:

  1. A SingleCellExperiment object containing integrated (or unintegrated) results
  2. A string specifying the batch column
  3. The integration method(s) that were used
  4. TRUE or FALSE indicating whether or not integration has been performed; default will be unintegrated = FALSE (will be useful for comparing integrated to unintegrated results)

The expected output of this function should then be a table of iLISI results per cell, with the following columns:

  • lisi_score
  • cell_barcode
  • integration_method
  • batch
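
To make the proposed output concrete, here is a minimal sketch of the per-cell table shape. It is written in Python purely for illustration (the package itself is R). The helper name `build_ilisi_table` and all example values are hypothetical; only the four column names come from the list above.

```python
from typing import Dict, List, Sequence, Union

Row = Dict[str, Union[str, float]]


def build_ilisi_table(
    lisi_scores: Sequence[float],
    cell_barcodes: Sequence[str],
    batches: Sequence[str],
    integration_method: str,
) -> List[Row]:
    """Assemble one row per cell with the four proposed columns (hypothetical helper)."""
    if not (len(lisi_scores) == len(cell_barcodes) == len(batches)):
        raise ValueError("scores, barcodes, and batches must have one entry per cell")
    return [
        {
            "lisi_score": score,
            "cell_barcode": barcode,
            "integration_method": integration_method,
            "batch": batch,
        }
        for score, barcode, batch in zip(lisi_scores, cell_barcodes, batches)
    ]


# Made-up values, for illustration only
table = build_ilisi_table(
    lisi_scores=[1.8, 1.2],
    cell_barcodes=["AAAC", "TTTG"],
    batches=["library_1", "library_2"],
    integration_method="harmony",
)
```

Keeping `batch` next to `cell_barcode` in every row means a cell can still be identified if the same barcode occurs in more than one batch.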

@allyhawkins
Member

Thanks for putting this together. Based on some of the restructuring I've proposed for the other functions, I actually want to hold off for now on deciding how this function will look. The setup for calculating LISI is different than for ASW or ARI, so we will be able to create a separate function, but the types of arguments should be similar.
I expect that the output you've mentioned here will be mostly the same, but the input might change if people are on board with replacing the unintegrated and integration_method arguments with a pc_name argument. Let's come back to this once we look at #176, #177, and #175.

@sjspielman self-assigned this Jul 26, 2023
@sjspielman
Member

Noting this will also require adding the R package lisi to the image - https://github.com/immunogenomics/LISI/

@sjspielman
Member

I've finally gotten started here, but I've run into a slight hitch! The way that filter_pcs() is written is not really compatible with iLISI. Unlike most other metrics we're calculating, iLISI returns a per-cell score, which means we need to keep track of cell barcodes as well as batch.

What if we add an argument to the function indicating to include cell barcodes in rownames? By default this could be set to FALSE (since for most metrics we only need to track the batch, so other metric functions don't need to be updated), but if set to TRUE, then the returned rownames would be <batch>-<barcode>. We'd set this argument to TRUE from the iLISI function.

@allyhawkins, thoughts here?

@allyhawkins
Member

> What if we add an argument to the function indicating to include cell barcodes in rownames? By default this could be set to FALSE (since for most metrics we only need to track the batch, so other metric functions don't need to be updated), but if set to TRUE, then the returned rownames would be <batch>-<barcode>. We'd set this argument to TRUE from the iLISI function.

What if we change the argument name from batches to pc_rownames or something similar? Then in the iLISI function you would input a vector with the batch-barcode information rather than just batch.
The function would then stay unaltered other than the argument name.

@sjspielman
Member

sjspielman commented Jul 28, 2023

> What if we change the argument name from batches to pc_rownames or something similar? Then in the iLISI function you would input a vector with the batch-barcode information rather than just batch.

Yes, perfect!!

Edit, ugh, well, maybe not :/
I think it will still need batch information in case of NAs -

https://github.com/AlexsLemonade/scpcaTools/blob/62ef2e9ab3893a5f1a1cd0e5f2ec4a53a0753cad/R/processing_pcs.R#L9C1-L10

@allyhawkins
Member

> > What if we change the argument name from batches to pc_rownames or something similar? Then in the iLISI function you would input a vector with the batch-barcode information rather than just batch.
>
> Yes, perfect!!
>
> Edit, ugh, well, maybe not :/ I think it will still need batch information in case of NAs -
>
> https://github.com/AlexsLemonade/scpcaTools/blob/62ef2e9ab3893a5f1a1cd0e5f2ec4a53a0753cad/R/processing_pcs.R#L9C1-L10

Yes, you're right... what about keeping batches and including a rename_pc argument? If TRUE, rename the rows with batches. If FALSE, return just the filtered PCs without renaming.

@sjspielman
Member

I think that should be fine, since the iLISI function will end up returning a data frame anyways with both batch and rownames (barcodes), so we'll be able to match everything up on the off-chance there's a repeated barcode across batches!
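
As a rough illustration of the rename_pc idea discussed above, here is a minimal sketch. It is written in Python only to show the logic and is not the scpcaTools implementation (which is R): the data layout, the use of `None` to stand in for NA, and making `rename_pc` a required argument are all assumptions.

```python
from typing import List, Optional, Sequence, Tuple

PcRow = Tuple[str, List[float]]  # (row name, PC coordinates)


def filter_pcs(
    pcs: Sequence[PcRow],
    batches: Sequence[Optional[str]],
    rename_pc: bool,
) -> List[PcRow]:
    """Drop cells whose batch is missing; optionally rename rows by batch."""
    if len(pcs) != len(batches):
        raise ValueError("pcs and batches must have one entry per cell")
    filtered = []
    for (name, coords), batch in zip(pcs, batches):
        if batch is None:  # stands in for NA in R (assumption)
            continue
        filtered.append((batch if rename_pc else name, coords))
    return filtered


# Made-up data: the same barcode appears in two batches, and one cell has no batch
pcs = [("AAAC", [0.1, 0.2]), ("AAAC", [0.3, 0.4]), ("TTTG", [0.5, 0.6])]
batches = ["library_1", "library_2", None]

by_batch = filter_pcs(pcs, batches, rename_pc=True)     # rows named by batch
by_barcode = filter_pcs(pcs, batches, rename_pc=False)  # rows keep barcodes

# Pairing kept barcodes with their batches disambiguates repeated barcodes
kept_batches = [b for b in batches if b is not None]
cell_keys = list(zip(kept_batches, (name for name, _ in by_barcode)))
```

With `rename_pc=False` the barcodes survive filtering, so a per-cell metric such as iLISI can report batch and barcode side by side, while metrics that only need batch labels can keep using the renamed rows.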

This was referenced Jul 28, 2023