Compare cell type annotations for RMS samples to SingleR annotations #228
Super quick round of first review - there are some `params$` variables used that don't exist? If you fix those lines, I'll be able to run the notebook, and then I'll leave an in-depth review!
In the meantime, we definitely need a README since there are now 6 notebooks and maybe soon more! Can you add a quick "here's what each notebook does" README into `analysis/`? Thanks!
toc: true
toc_float: true
params:
s3_singler_data_dir: "s3://nextflow-ccdl-results/scpca/processed/results/SCPCP000007"
Suggested change:
-s3_singler_data_dir: "s3://nextflow-ccdl-results/scpca/processed/results/SCPCP000007"
+s3_singler_data_dir: "s3://nextflow-ccdl-results/scpca/processed/results/SCPCP000005"
if(any(!(file.exists(all_paths)))){
# sync annotated SCE files
aws_includes <- paste("--include '", all_paths, "'", sep = '', collapse = ' ')
sync_call <- paste('aws s3 cp', params$s3_ref_dir, params$local_data_dir,
there is no `params$s3_ref_dir` or `params$local_data_dir` in this Rmd?
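A minimal sketch of why this matters, assuming `params` behaves like an ordinary R list built from the YAML header: an undefined entry evaluates to `NULL`, and `paste()` drops it, so the `aws s3 cp` call would be assembled without a source or destination rather than failing loudly. The `find_missing_params()` helper is hypothetical and not part of the notebook.

```r
# Stand-in for the `params` list that rmarkdown builds from the YAML header.
# Only `s3_singler_data_dir` appears in the diff; this list is illustrative.
params <- list(
  s3_singler_data_dir = "s3://nextflow-ccdl-results/scpca/processed/results/SCPCP000005"
)

# Hypothetical helper: return the names in `required` that `params` lacks.
find_missing_params <- function(params, required) {
  setdiff(required, names(params))
}

missing_params <- find_missing_params(params, c("s3_ref_dir", "local_data_dir"))

# Both lookups return NULL, and paste() drops zero-length arguments,
# so the command is built with no source or destination.
broken_call <- paste("aws s3 cp", params$s3_ref_dir, params$local_data_dir)

if (length(missing_params) > 0) {
  message("Undefined params: ", paste(missing_params, collapse = ", "))
}
```

A check like this at the top of the notebook would surface the problem before any S3 call is attempted.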
- There appears to be a higher proportion of `NA` cells that correspond to tumor cells than we saw with the AML samples.
- Generally I think we could use `SingleR` with solid tumors, but I think we will still want to be careful about which reference we use for each disease type.
I don't think we are going to find a single reference that works across all of our samples, although `BlueprintEncodeData` does seem to generally be doing a decent job here.
Although I'm making that conclusion mostly based on _vibes_...
don't we all..
@sjspielman sorry about the issues running the notebook! I think that should be fixed now and you should be able to run it. On my end I added
We're on our way!! Left some more in-depth comments, let me know if I need to clarify anything! Mostly just formatting and code simplifying. The overall notebook looks good, though I'd still like to see an `analysis/README.md` file with a super quick 'n dirty bullet point about each notebook in this directory.
- `BlueprintEncodeData` seems like the most appropriate reference when looking at the direct comparison of cell type labels with `SingleR` to manual.
This is mostly based on the presence of skeletal muscle cells obtained from `SingleR` annotations, whereas muscle cells don't appear to be present in `HumanPrimaryCellAtlasData`.
- There appears to be a higher proportion of `NA` cells that correspond to tumor cells than we saw with the AML samples.
🎉
stat_summary(
aes(group = reference),
color = "black",
# median and quartiles for point range
this comment is just to say that i don't know why ggplot hasn't just implemented a convenient median/IQR `stat_summary()` in the first place. `mean_se` is the default. surely a `median_iqr` would be nice..!
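For illustration, a minimal sketch of what such a helper could look like. `median_iqr()` is hypothetical and not a ggplot2 function; it returns the `y` / `ymin` / `ymax` columns that `stat_summary(fun.data = ...)` expects. The plot uses the built-in `mtcars` data rather than the notebook's data, and only runs if ggplot2 is installed.

```r
# Hypothetical helper, not part of ggplot2: median with first and third
# quartiles, in the y / ymin / ymax layout used by stat_summary(fun.data = ...).
median_iqr <- function(x) {
  quartiles <- stats::quantile(
    x,
    probs = c(0.25, 0.5, 0.75),
    na.rm = TRUE,
    names = FALSE
  )
  data.frame(y = quartiles[2], ymin = quartiles[1], ymax = quartiles[3])
}

# Quick check on a small vector
summary_df <- median_iqr(c(1, 2, 3, 4, 5))

# Usage sketch on built-in data, skipped when ggplot2 is unavailable
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(mtcars, aes(x = factor(cyl), y = mpg)) +
    stat_summary(fun.data = median_iqr, geom = "pointrange", color = "black")
}
```

Passing separate `fun`, `fun.min`, and `fun.max` functions to `stat_summary()` should give the same point range without a helper, at the cost of a slightly longer call.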
Co-authored-by: Stephanie <stephanie.spielman@gmail.com>
Thank you again for helping with all the annoying errors in getting files from AWS @sjspielman! I implemented all of your suggestions, except the
Here's the new rendered notebook:
Looks great!! Just need a newline at the end of `celltype_annotation/utils/cellassign-helper-functions.R`, definitely don't need to see again.
celltype_annotation/README.md
🎉
@@ -27,6 +32,7 @@ compare_refs_heatmap <- function(original_assignment,
 pheatmap::pheatmap(label_mtx,
 cluster_rows = TRUE,
 width = 10,
-fontsize_col = 8)
+fontsize_col = 8,
+main = title)

 }
github new line 🙄
can't suggest unfortunately..
Closes #224
This PR adds an analysis notebook to compare the cell type annotations obtained from `SingleR` to manual annotations in RMS samples. I originally re-ran the notebook we used for AML to compare `SingleR` annotations in references with and without immune cells. However, I didn't feel that analysis applied to these samples when looking at those results. I wanted to directly compare the annotations obtained from the submitter to annotations with `SingleR`. Additionally, we are not interested in looking at cell types with/without immune cells.
- […] `SCPCP000005` in `scpca-processed-libraries.tsv` through the `add-celltypes.nf` workflow in `scpca-nf v0.5.2`. These outputs can be found on S3 in `s3://nextflow-ccdl-results/scpca/processed/results/SCPCP000005`.
- […] `celldex` refs we have used previously. We also have annotations for all labels - `label.main`, `label.fine`, and `label.ont`.
- […] `SingleR` annotated SCE object and the manually annotated SCE object from S3. Then the manual annotations are added to the SCE object annotated from `SingleR`.
- […] `HumanPrimaryCellAtlasData` and `BlueprintEncodeData`, I directly compared the annotations to the manual annotations. The `BlueprintEncodeData` reference categorized most tumor cells as myocytes or skeletal muscle cells, which seemed generally appropriate.
- […] `SingleR` annotations.
- […] `SingleR` results so I created a `singler-helper-functions.R` script and added those there. I kept any functions specific to the RMS libraries at the top of the notebook.

Here's a copy of the rendered report for a single library:
singler-rms-comparison.nb.html.zip
Once we like what's here I can easily generate it for the other libraries.