Peppy #8

vreuter · 2019-04-29T20:41:10Z

Note that units_peppy.tsv is included not out of necessity, but to show how a peppy user may encode a subsample/units table and still have it work with the workflow.


vreuter · 2019-04-29T20:42:36Z

pepkit/peppy#286

nsheff · 2019-04-29T22:12:21Z

that is a thing of beauty. 🍦

johanneskoester · 2019-04-30T08:27:34Z

rules/common.smk

```diff
 validate(samples, schema="../schemas/samples.schema.yaml")

-units = pd.read_table(config["units"], dtype=str).set_index(["sample", "unit"], drop=False)
+units = p.subsample_table
```


Really nice! What about always converting the subsample_table index to strings inside peppy (so that we can get rid of the line below)?

Hi Johannes, it looks like this has been done (the line is now gone).
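The suggestion above is presumably motivated by the fact that Snakemake wildcard values are always strings, so an index built from a numeric-looking `unit` column would not match them on lookup. A minimal, stdlib-only sketch of the pitfall, using plain dicts as a stand-in for a pandas MultiIndex; `index_rows` and the example rows (including the `fq1` column) are hypothetical, not peppy's or the workflow's code:

```python
def index_rows(rows, index_cols, as_str=True):
    """Map a tuple of index-column values to each row (a dict-based stand-in
    for a pandas MultiIndex). With as_str=True every key part is coerced to str."""
    index = {}
    for row in rows:
        key = tuple(str(row[c]) if as_str else row[c] for c in index_cols)
        index[key] = row
    return index


# Hypothetical rows, as a type-inferring reader might return them:
# the numeric-looking "unit" column has been parsed as int.
rows = [
    {"sample": "A", "unit": 1, "fq1": "A.1.fq.gz"},
    {"sample": "A", "unit": 2, "fq1": "A.2.fq.gz"},
]

raw = index_rows(rows, ("sample", "unit"), as_str=False)
coerced = index_rows(rows, ("sample", "unit"))

# Wildcard values reach the workflow as strings, e.g. sample="A", unit="1".
print(("A", "1") in raw)      # False: the stored key is ("A", 1)
print(("A", "1") in coerced)  # True
```

Doing the coercion once, where the table is built, means every downstream lookup can pass wildcard values through unchanged.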

nsheff · 2019-05-03T15:47:10Z

hey @johanneskoester what do you think? What's the next step?

johanneskoester · 2019-05-03T16:36:20Z

So, as I have seen from your repo, there is now a SnakeProject subclass? Can you tell me what the difference from the main Project class is? If possible I would love to not have a special case, but maybe there is some reason I do not see.

vreuter · 2019-05-03T17:01:56Z

> So, as I have seen from your repo, there is now a SnakeProject subclass? Can you tell me what the difference from the main Project class is? If possible I would love to not have a special case, but maybe there is some reason I do not see.

It handles the naming differences that you requested be removed from the workflow.

nsheff · 2019-05-08T21:44:11Z

Hey @johanneskoester, it would be possible to drop the SnakeProject class if you instead adopted the original PEP names (like subsample instead of unit). All the rest of the functionality is in the main Project class.

What do you think? Since we use different terminology for these things, we have to either have an adapter or change one of the standards; in this case we implemented an adapter.

johanneskoester · 2019-05-09T09:36:38Z

Changing from unit to subsample and the other colname changes would be fine for me. Regarding the rest:

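* [this](https://github.com/pepkit/peppy/blob/d2c23497077809b25cd37d70cbf2d4f25e4aca25/peppy/snake_project.py#L64) is not needed for Snakemake. In best practice workflows we ensure the presence of certain columns via JSON schema validation.
* [this](https://github.com/pepkit/peppy/blob/d2c23497077809b25cd37d70cbf2d4f25e4aca25/peppy/snake_project.py#L69) I would love to see in the original peppy project. Are there arguments against it? It could also be configurable, in the sense of `Project(..., subsample_index_cols=[...])`.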

vreuter · 2019-05-09T13:32:33Z

> Changing from unit to subsample and the other colname changes would be fine for me. Regarding the rest:
>
> * [this](https://github.com/pepkit/peppy/blob/d2c23497077809b25cd37d70cbf2d4f25e4aca25/peppy/snake_project.py#L64) is not needed for Snakemake. In best practice workflows we ensure the presence of certain columns via JSON schema validation.

OK

> * [this](https://github.com/pepkit/peppy/blob/d2c23497077809b25cd37d70cbf2d4f25e4aca25/peppy/snake_project.py#L69) I would love to see in the original peppy project. Are there arguments against it? It could also be configurable, in the sense of `Project(..., subsample_index_cols=[...])`.

I'm fine with setting the index. Identifier column name(s) shouldn't be configurable, though. The current solution lifts knowledge of the schema by which standard data (like sample name and subsample name/index) are referenced up to the level of the types themselves (Project, SnakeProject). Furthermore, we'd like to couple the column name that denotes something like the sample identifier to an actual attribute on the Sample objects created from the metadata sheets, so parameterizing the sheet column would break that coupling.

nsheff · 2019-05-09T13:44:00Z

For your first point: we are also planning to adopt Snakemake-style (or similar) validation for PEPs, so that sounds good. We should address that soon.

For your second point: I agree we should set the index in peppy, and I also think @vreuter is right that making the column name parameterizable could lead to some downstream trouble. One possible solution: could we take whatever column name the user provides and internally standardize it to 'sample_name'? That would solve your concern, I think.
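A minimal sketch of that standardization idea, assuming table rows are plain dicts: whatever column name the user supplies is renamed to `sample_name` on input, so the attribute on the resulting sample objects always has the same name. `standardize_sample_column` is a hypothetical helper, not part of peppy, and `SimpleNamespace` merely stands in for peppy's `Sample` class:

```python
from types import SimpleNamespace

CANONICAL_SAMPLE_COL = "sample_name"


def standardize_sample_column(rows, user_col):
    """Return copies of rows with user_col renamed to CANONICAL_SAMPLE_COL.

    Hypothetical helper: it only illustrates mapping a user-chosen column
    name onto one fixed internal name.
    """
    out = []
    for row in rows:
        new = dict(row)
        if user_col != CANONICAL_SAMPLE_COL:
            if CANONICAL_SAMPLE_COL in new:
                # Refuse to silently overwrite an existing canonical column.
                raise ValueError("both {!r} and {!r} present".format(
                    user_col, CANONICAL_SAMPLE_COL))
            new[CANONICAL_SAMPLE_COL] = new.pop(user_col)
        out.append(new)
    return out


rows = [{"sample": "A", "condition": "treated"}]
samples = [SimpleNamespace(**r) for r in standardize_sample_column(rows, "sample")]
# Downstream code can always rely on the same attribute name.
print(samples[0].sample_name)  # A
```

Because the rename happens once at load time, the coupling between the identifier column and the `sample_name` attribute that @vreuter describes is preserved.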

johanneskoester · 2019-05-09T14:09:05Z

I think we have a misunderstanding here. With the parameter to Project I did not mean to make the column name parameterizable, but rather wanted to give the user the possibility to select over which columns the index shall be created (e.g., sample_name by default for the sample table, sample_name + subsample_name by default for the subsample_table).
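To make the distinction concrete (choosing which columns form the index, while the column names stay fixed), here is a minimal sketch. `ProjectSketch` and its `sample_index_cols`/`subsample_index_cols` parameters are hypothetical, modeled loosely on the `Project(..., subsample_index_cols=[...])` idea from this thread; they are not peppy's actual API, and plain dicts stand in for pandas tables:

```python
class ProjectSketch:
    """Hypothetical stand-in for peppy's Project; NOT its actual API.

    Only the choice of index columns is configurable; column names are fixed.
    """

    def __init__(self, sample_rows, subsample_rows,
                 sample_index_cols=("sample_name",),
                 subsample_index_cols=("sample_name", "subsample_name")):
        self.sample_table = self._index(sample_rows, sample_index_cols)
        self.subsample_table = self._index(subsample_rows, subsample_index_cols)

    @staticmethod
    def _index(rows, cols):
        """Key each row by the str values of `cols`, rejecting duplicates."""
        table = {}
        for row in rows:
            missing = [c for c in cols if c not in row]
            if missing:
                raise KeyError("index column(s) not found: {}".format(missing))
            key = tuple(str(row[c]) for c in cols)
            if key in table:
                raise ValueError("duplicate index key: {}".format(key))
            table[key] = row
        return table


samples = [{"sample_name": "A"}, {"sample_name": "B"}]
subsamples = [
    {"sample_name": "A", "subsample_name": "1", "fq1": "A.1.fq.gz"},
    {"sample_name": "A", "subsample_name": "2", "fq1": "A.2.fq.gz"},
    {"sample_name": "B", "subsample_name": "1", "fq1": "B.1.fq.gz"},
]

prj = ProjectSketch(samples, subsamples)
print(prj.subsample_table[("A", "2")]["fq1"])  # A.2.fq.gz
```

The defaults mirror the ones described above (sample_name for the sample table, sample_name + subsample_name for the subsample table); a caller could pass a different tuple of existing columns without any column being renamed.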

vreuter · 2019-05-09T14:30:34Z

> I think we have a misunderstanding here. With the parameter to Project I did not mean to make the column name parameterizable, but rather wanted to give the user the possibility to select over which columns the index shall be created (e.g., sample_name by default for the sample table, sample_name + subsample_name by default for the subsample_table).

Indeed! I can't speak for @nsheff, but I had misinterpreted your suggestion. This sounds fine, but to minimize the burden on Snakemake workflow authors, would you want the default second component of the subsample_table index (besides the sample name) to be unit rather than subsample_name? (Is the column naming in this workflow's units.tsv typical?)

nsheff · 2019-05-28T13:09:24Z

@johanneskoester, @vreuter, where do we stand on this? I agree we can easily make this parameterizable. I think we should use subsample_name rather than unit as the default, but feel free to speak up if you disagree.

vreuter · 2019-05-28T14:22:38Z

The selection of columns to index, or certain names? Indexing flexibility is fine, but naming flexibility for particularly meaningful columns isn't, at least in terms of how they're referenced once the object is built. So yes, we could accept, for example, both unit and subsample_name but have them referenced the same way.

nsheff · 2019-05-28T15:01:44Z

Indexes are parameterizable, not names.

> With the parameter to Project I did not mean to make the column name parameterizable, but rather wanted to give the user the possibility to select over which columns the index shall be created (e.g., sample_name by default for the sample table, sample_name + subsample_name by default for the subsample_table).

I guess you could make it unit in the SnakeProject object if you want; otherwise, @johanneskoester can update it if he wants.

johanneskoester · 2019-05-29T21:08:40Z

I am fine with renaming unit to subsample_name in our pipelines here. I'd prefer not to have a SnakeProject at all. If I'm not mistaken, once indexing can be configured when instantiating Project, there should be no need for a SnakeProject.

johanneskoester · 2019-06-21T07:02:58Z

With the new release, can you give me a quick pointer on how to adapt this PR to the new capabilities? Or do you want to do it yourself?

vreuter · 2019-06-21T14:30:51Z

@johanneskoester it looks like the latest release of Snakemake on PyPI targets (or at least supports) Python 3.5, but this workflow uses f-string formatting, introduced in PEP 498, which requires Python 3.6 or later?

johanneskoester · 2019-06-25T08:44:45Z

> @johanneskoester it looks like the latest release of Snakemake on PyPI targets (or at least supports) Python 3.5, but this workflow uses f-string formatting, introduced in PEP 498, which requires Python 3.6 or later?

Good catch, that was not intended. Fixed in master.
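For reference, a Python 3.5-compatible replacement for an f-string is `str.format()` with the same named fields. A minimal sketch; the path pattern and the `sorted_bam_path` function are hypothetical, not taken from the workflow:

```python
def sorted_bam_path(sample, unit):
    """Build a (hypothetical) output path without f-strings."""
    # Python 3.6+ only (PEP 498):  f"mapped/{sample}-{unit}.sorted.bam"
    # An f-string anywhere in a file is a SyntaxError when Python 3.5 parses
    # it, so a runtime version check cannot guard it; str.format() works on both.
    return "mapped/{sample}-{unit}.sorted.bam".format(sample=sample, unit=unit)


print(sorted_bam_path("A", "1"))  # mapped/A-1.sorted.bam
```

Using named fields keeps the template as readable as the f-string it replaces.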

nsheff · 2020-02-18T13:34:08Z

prjcfg.yaml

```diff
-  sample_annotation: samples.tsv
-  sample_subannotation: units.tsv
+  sample_table: samples.tsv
+  sample_subtable: units.tsv
```


this should be subsample_table, not sample_subtable, right?


vreuter added 23 commits March 13, 2019 12:27

initial peppy imports working

d34726f

more peppy interop

59eb4d4

set the index; use master config

f92ae7c

cleanup

484976b

remove additional print

b3661dc

more cleanup

ba4debb

peppy files

e1af9a1

minimize changes, shorten names

d5b6d46

remove unused import

5964ecf

get back validate

6b54e14

need to check files entry

37ded42

guards and cleanup

ca76544

clear unused KV in project config

e85a876

condense and explain

6711da2

peppy-compatible subannotation / units sheet

6fde15f

Merge branch 'master' of github.com:snakemake-workflows/dna-seq-gatk-variant-calling into peppy_units

72ba626

see about adding units dynamically

bd8f36b

using SnakeProject

6e49167

use Snakemake naming

682bb5d

use base anns file due to identical content

0d2fc1f

add prj cfg that uses the base files

e7301f3

use the native encoding

1750ab6

condense config files

6192d29

vreuter mentioned this pull request Apr 29, 2019

Using peppy #6

Closed

johanneskoester reviewed Apr 30, 2019


vreuter mentioned this pull request Apr 30, 2019

Ensure string type for subsample/units pepkit/peppy#297

Closed

nsheff mentioned this pull request Jun 4, 2019

0.22.0 pepkit/peppy#314

Merged

update to reflect peppy updates; snakemake-workflows#8 (comment)

f728c37

nsheff reviewed Feb 18, 2020


fix name mistake; https://github.com/snakemake-workflows/dna-seq-gatk-variant-calling/pull/8/files#r380673348

0e37cfb

stolarczyk mentioned this pull request Mar 2, 2020

use peppy #22

Closed
