Compare cell type annotations for RMS samples to SingleR annotations #228

Merged: allyhawkins merged 11 commits into main from allyhawkins/rms-singler-analysis on Jul 14, 2023

Conversation

allyhawkins
Member

Closes #224

This PR adds an analysis notebook that compares the cell type annotations obtained from SingleR to the manual annotations in RMS samples. I originally re-ran the notebook we used for AML to compare SingleR annotations from references with and without immune cells. However, after looking at those results, I didn't feel that analysis applied to these samples. Instead, I wanted to directly compare the annotations from the submitter to the annotations from SingleR. Additionally, we are not interested in comparing cell types with and without immune cells here.

  • Before any analysis, I ran all the libraries for SCPCP000005 in scpca-processed-libraries.tsv through the add-celltypes.nf workflow in scpca-nf v0.5.2. These outputs can be found on S3 in s3://nextflow-ccdl-results/scpca/processed/results/SCPCP000005.
  • This means that each SCE object has annotations from all the celldex refs we have used previously. We also have annotations for all label types: label.main, label.fine, and label.ont.
  • The first part of the notebook grabs both the SingleR-annotated SCE object and the manually annotated SCE object from S3. The manual annotations are then added to the SCE object annotated with SingleR.
  • I looked at the overall distribution of cell types for all the references and compared them to the cell types annotated manually.
  • For the HumanPrimaryCellAtlasData and BlueprintEncodeData references, I directly compared the SingleR annotations to the manual annotations. With BlueprintEncodeData, most tumor cells were categorized as myocytes or skeletal muscle cells, which seemed generally appropriate.
  • As before, I calculated the median delta score for all the SingleR annotations.
  • I repeated some plots I had made previously with SingleR results, so I created a singler-helper-functions.R script and added those functions there. I kept any functions specific to the RMS libraries at the top of the notebook.
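
The join-and-compare steps in the list above can be sketched in base R. This is a minimal illustration rather than the notebook's code: the toy data frames, the `barcode`, `singler_label`, and `manual_label` column names, and the score matrix are all assumptions. The delta calculation assumes the assigned label is the top-scoring one, so delta is the top score minus the median score across labels.

```r
# Toy stand-ins for the per-cell annotations pulled from each SCE object.
# All column names and values here are hypothetical.
singler_df <- data.frame(
  barcode = c("AAAC", "AAAG", "AACT", "AAGT", "ACGT"),
  singler_label = c("Skeletal muscle", "Myocytes", "Monocytes", NA, "Myocytes")
)
manual_df <- data.frame(
  barcode = c("AAAC", "AAAG", "AACT", "AAGT", "ACGT"),
  manual_label = c("Tumor", "Tumor", "Immune", "Tumor", "Tumor")
)

# Add the manual annotations to the SingleR-annotated table by barcode.
# all.x = TRUE keeps every SingleR cell even if it has no manual label.
combined_df <- merge(singler_df, manual_df, by = "barcode", all.x = TRUE)

# Cross-tabulate the two label sets; useNA = "ifany" keeps unassigned cells visible.
label_table <- table(
  manual = combined_df$manual_label,
  singler = combined_df$singler_label,
  useNA = "ifany"
)

# Proportion of each manual cell type assigned to each SingleR label.
label_props <- prop.table(label_table, margin = 1)
print(round(label_props, 2))

# Hypothetical per-cell score matrix (cells in rows, reference labels in columns).
score_mtx <- matrix(
  c(0.9, 0.2, 0.1,
    0.6, 0.5, 0.4),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("cell1", "cell2"), c("Myocytes", "Monocytes", "B-cells"))
)

# Delta from median per cell: top score minus the median score across labels.
delta_med <- apply(score_mtx, 1, function(x) max(x) - median(x))

# Single summary value for this reference: the median of the per-cell deltas.
median_delta <- median(delta_med)
print(median_delta)
```

A larger median delta suggests the assigned label stands out more clearly from the other candidate labels, which is one way to compare references against each other.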

Here's a copy of the rendered report for a single library:
singler-rms-comparison.nb.html.zip

Once we like what's here I can easily generate it for the other libraries.

Member

@sjspielman sjspielman left a comment

Super quick round of first review - there are some params$ variables used that don't exist? If you fix those lines, I'll be able to run the notebook, and then I'll leave an in-depth review!

In the meantime, we definitely need a README since there are now 6 notebooks and maybe soon more! Can you add a quick "here's what each notebook does" README into analysis/? Thanks!

celltype_annotation/utils/singler-helper-functions.R (outdated, resolved)
toc: true
toc_float: true
params:
  s3_singler_data_dir: "s3://nextflow-ccdl-results/scpca/processed/results/SCPCP000007"
Member

Suggested change
- s3_singler_data_dir: "s3://nextflow-ccdl-results/scpca/processed/results/SCPCP000007"
+ s3_singler_data_dir: "s3://nextflow-ccdl-results/scpca/processed/results/SCPCP000005"

if(any(!(file.exists(all_paths)))){
  # sync annotated SCE files
  aws_includes <- paste("--include '", all_paths, "'", sep = '', collapse = ' ')
  sync_call <- paste('aws s3 cp', params$s3_ref_dir, params$local_data_dir,
Member

there is no params$s3_ref_dir or params$local_data_dir in this Rmd?
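
One way to catch this kind of mismatch early is to check that every `params` entry the notebook uses is declared before building the AWS call. The sketch below is only an illustration: `check_params()` and `build_cp_call()` are hypothetical helpers, the `local_data_dir` value and file names are made up, and the `--recursive --exclude '*'` flags are an assumption, since the quoted line is truncated and the real call may differ.

```r
# Hypothetical stand-in for the `params` list that R Markdown builds from the YAML header.
params <- list(
  s3_singler_data_dir = "s3://nextflow-ccdl-results/scpca/processed/results/SCPCP000005",
  local_data_dir = "data"
)

# Fail early, with a readable message, if the notebook references undeclared params.
check_params <- function(params, required_params) {
  missing_params <- setdiff(required_params, names(params))
  if (length(missing_params) > 0) {
    stop("Missing params: ", paste(missing_params, collapse = ", "))
  }
  invisible(TRUE)
}

# Build (but do not run) an `aws s3 cp` call restricted to the listed files.
build_cp_call <- function(s3_dir, local_dir, paths) {
  aws_includes <- paste0("--include '", paths, "'", collapse = " ")
  paste("aws s3 cp", s3_dir, local_dir, "--recursive --exclude '*'", aws_includes)
}

check_params(params, c("s3_singler_data_dir", "local_data_dir"))

cp_call <- build_cp_call(
  params$s3_singler_data_dir,
  params$local_data_dir,
  c("lib1_annotated.rds", "lib2_annotated.rds") # hypothetical file names
)
print(cp_call)
```

Keeping the command construction in a function that only returns a string makes it easy to inspect the call before handing it to `system()`.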

- There appears to be a higher proportion of `NA` cells that correspond to tumor cells than we saw with the AML samples.
- Generally I think we could use `SingleR` with solid tumors, but I think we will still want to be careful about which reference we use for each disease type.
I don't think we are going to find a single reference that works across all of our samples, although `BlueprintEncodeData` does seem to generally be doing a decent job here.
Although I'm making that conclusion mostly based on _vibes_...
Member

don't we all..

@allyhawkins
Member Author

@sjspielman sorry about the issues running the notebook! I think that should be fixed now and you should be able to run it. On my end I added op run before the command and then was able to use rmarkdown::render() to render the notebook in R.

Member

@sjspielman sjspielman left a comment

We're on our way!! Left some more in-depth comments, let me know if I need to clarify anything! Mostly just formatting and code simplifying. The overall notebook looks good, though I'd still like to see an analysis/README.md file with a super quick 'n dirty bullet point about each notebook in this directory.

celltype_annotation/analysis/singler-rms-comparison.Rmd (5 outdated, resolved threads)

- `BlueprintEncodeData` seems like the most appropriate reference when looking at the direct comparison of cell type labels with `SingleR` to manual.
This is mostly based on the presence of skeletal muscle cells obtained from `SingleR` annotations, whereas muscle cells don't appear to be present in `HumanPrimaryCellAtlasData`.
- There appears to be a higher proportion of `NA` cells that correspond to tumor cells than we saw with the AML samples.
Member

🎉

stat_summary(
  aes(group = reference),
  color = "black",
  # median and quartiles for point range
Member

this comment is just to say that i don't know why ggplot hasn't just implemented a convenient median/IQR stat_summary() in the first place. mean_se is the default. surely a median_iqr would be nice..!
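
For reference, a small helper along these lines can fill that gap. This is a sketch, not the code from this PR: `median_iqr()` is a hypothetical name, and `delta_df` with its columns in the commented usage is assumed. The function returns the `y`, `ymin`, and `ymax` columns that `stat_summary(fun.data = ...)` expects.

```r
# Summary function in the shape `fun.data` expects: takes a numeric vector and
# returns a one-row data frame with columns y, ymin, and ymax.
median_iqr <- function(x) {
  quartiles <- stats::quantile(
    x,
    probs = c(0.25, 0.5, 0.75),
    na.rm = TRUE,
    names = FALSE
  )
  data.frame(y = quartiles[2], ymin = quartiles[1], ymax = quartiles[3])
}

iqr_summary <- median_iqr(c(1, 2, 3, 4, 5))
print(iqr_summary)

# Usage with ggplot2 (not run here; `delta_df` and its columns are hypothetical):
# ggplot(delta_df, aes(x = reference, y = delta_median)) +
#   stat_summary(fun.data = median_iqr, geom = "pointrange")
```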

@allyhawkins
Member Author

Thank you again for helping with all the annoying errors in getting files from AWS @sjspielman! I implemented all of your suggestions, except the aes_string one. I tried removing the quotes and using aes and had issues. I think you made the same comment when I originally added that function to the other notebook, and we also had trouble.

Here's the new rendered notebook:
singler-rms-comparison.nb.html.zip
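
On the `aes_string` point above: when column names arrive as strings, a commonly used alternative is the `.data` pronoun inside `aes()`. The sketch below assumes ggplot2 3.0.0 or later is installed; `plot_by_column()` and the toy data frame are hypothetical, and whether this resolves the specific issue hit in the notebook is not known.

```r
# Sketch only: the function name, data, and column names are hypothetical.
# `.data[[col]]` looks up a column by its string name inside aes(),
# which avoids both aes_string() and unquoted column names.
plot_by_column <- function(df, x_col, y_col) {
  ggplot2::ggplot(df, ggplot2::aes(x = .data[[x_col]], y = .data[[y_col]])) +
    ggplot2::geom_point() +
    ggplot2::labs(x = x_col, y = y_col)
}

toy_df <- data.frame(
  reference = c("BlueprintEncodeData", "HumanPrimaryCellAtlasData"),
  delta_median = c(0.25, 0.125)
)

# Only build the plot if ggplot2 is available in this R installation.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  toy_plot <- plot_by_column(toy_df, "reference", "delta_median")
}
```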

Member

@sjspielman sjspielman left a comment

Looks great!! Just need a newline at the end of celltype_annotation/utils/cellassign-helper-functions.R, definitely don't need to see again.

Member

🎉

@@ -27,6 +32,7 @@ compare_refs_heatmap <- function(original_assignment,
   pheatmap::pheatmap(label_mtx,
                      cluster_rows = TRUE,
                      width = 10,
-                     fontsize_col = 8)
+                     fontsize_col = 8,
+                     main = title)
 
 }
Member

github new line 🙄
can't suggest unfortunately..

allyhawkins merged commit d81a1c2 into main on Jul 14, 2023
1 check passed
allyhawkins deleted the allyhawkins/rms-singler-analysis branch on July 14, 2023 14:20

Successfully merging this pull request may close these issues.

Rerun SingleR analysis with RMS libraries
2 participants