
Step target example broken #983

Closed
gaow opened this issue Jun 15, 2018 · 29 comments


gaow commented Jun 15, 2018

Separated from private correspondence: the RNA-seq DE example workflow is now broken. To reproduce:

cd sos-docs/src/examples
sos run RNASeqDE.ipynb align -j 2 --samples SRR1039508

Error message:

ERROR: Multiple steps hg19_reference_1, hg19_reference_2, hg19_reference_3 to generate target sos_step("hg19_reference")

This used to work, though. I suspect it is similar to the issue in #981, introduced by the recent changes to the target-dependency code.


BoPeng commented Jun 15, 2018

In this example, you have

[hg19_reference_1 (download)]
[hg19_reference_2 (decompress hg19.fa)]
[hg19_reference_3 (gene annotations)]

and then

depends: sos_step('hg19_reference')

so are you depending on a subworkflow?


gaow commented Jun 15, 2018

Yes. I wonder if this is okay, because I believe it used to work.


BoPeng commented Jun 15, 2018

I never knew it worked, so I would say it was undefined behavior that we need to formalize now. It is like triggering a subworkflow from a dependency. I mean, if you can do

[sub_1]
[sub_2]

[step]
depends: sos_step('sub')

instead of

[sub_1]
[sub_2]

[step]
sos_run('sub')

why wouldn't we allow a full workflow here?

[A_1]
[A_2]
[B_1]

[step]
depends: sos_step('A+B')


gaow commented Jun 15, 2018

Sorry, I'm a bit confused. Is this a proposal? It seems reasonable, if not difficult to implement ... right?


BoPeng commented Jun 15, 2018

I do not know how difficult it is. I am just saying that the behavior was undefined before and triggers an error now. If we are to support it, we are essentially allowing an entire workflow to be triggered, not just a single step; and if we are triggering workflows, sos_step('A+B') makes sense, so the name sos_step might be inappropriate.


gaow commented Jun 15, 2018

I see. I'm okay with the name sos_step though if we decide to support it.

But do we want to support it? It seems a natural extension; but if it is not robust and can cause problems, we would rather not have it. Currently we do not have an immediate need for it -- in the RNA-seq DE example I wanted to demonstrate the feature. Though I could also use file targets, the logic seems more intuitive with a dependency on the mini-workflow. What do you think?


BoPeng commented Jun 15, 2018

In our manuscript, I specifically said something like: the trunk of the workflow can be outcome-based, with process-oriented subworkflows triggered ... So conceptually this is what we want to do, but we do not have an explicit syntax for it.


gaow commented Jun 15, 2018

Indeed ... I guess we should formalize it. I think you've made the behavior clear, but do we need new syntax like workflow_target(), since we have file_target()? Or sos_workflow(), similar to sos_variable()?


BoPeng commented Jun 15, 2018

Yes, sos_workflow() is clearer than sos_step(workflow), but sos_step has caused enough trouble and I do not know if I want to dive into sos_workflow now.

BoPeng pushed a commit that referenced this issue Jun 15, 2018

BoPeng commented Jun 15, 2018

a does not match a_1 etc. now. The loophole was for a_0, as far as I remember, but now a does not even match a_0.


BoPeng commented Jun 15, 2018

In your case

[hg19_reference_1 (download)]
[hg19_reference_2 (decompress hg19.fa)]
[hg19_reference_3 (gene annotations)]

[default]
depends: sos_step('hg19_reference')

would not work now, but

[hg19_reference_1 (download)]
[hg19_reference_2 (decompress hg19.fa)]
[hg19_reference_3 (gene annotations)]

[default]
depends: sos_step('hg19_reference_1'), sos_step('hg19_reference_2'), sos_step('hg19_reference_3')

would work. I am wondering if we can allow something like

[hg19_reference_1 (download)]
[hg19_reference_2 (decompress hg19.fa)]
[hg19_reference_3 (gene annotations)]

[default]
depends: sos_step('hg19_reference_*')

It is conceptually not a workflow, but multiple sos_step targets specified using a wildcard character.
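As an aside, such wildcard expansion of step names could be sketched in a few lines of Python. This is a hypothetical illustration (the function name and step list are made up, not SoS internals), using the standard-library fnmatch module:

```python
import fnmatch
import re

def expand_step_pattern(pattern, step_names):
    """Expand a wildcard step pattern (e.g. 'hg19_reference_*') into the
    matching step names, sorted by their numeric suffix when present."""
    def numeric_suffix(name):
        m = re.search(r'_(\d+)$', name)
        return int(m.group(1)) if m else 0
    return sorted(fnmatch.filter(step_names, pattern), key=numeric_suffix)

steps = ['hg19_reference_1', 'hg19_reference_2', 'hg19_reference_3', 'default']
print(expand_step_pattern('hg19_reference_*', steps))
# ['hg19_reference_1', 'hg19_reference_2', 'hg19_reference_3']
```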


gaow commented Jun 15, 2018

I agree with not messing too much with the code at this moment, until there is a real need. For a mini-workflow, one can always do

[C]
sos_run('A+B')

[default]
depends: sos_step('C')

right? Then we could legitimately claim to handle mini-workflows. But I think the _* case is one exception worth special handling, because this is what people will often do -- the most common type of process-oriented subworkflow is one with numeric ordering. So I think it would be good to support sos_step('hg19_reference') directly and, under the hood, expand it to match all steps.


BoPeng commented Jun 15, 2018

There is a minor difference here: we claim the "connectedness" of workflow steps, so in sos_run('a') we know that a_2 inherits output from a_1. With sos_step('a_1'), sos_step('a_2') (or sos_step('a') as you proposed), the steps are not connected and are executed as separate steps.

So in the end there might be a difference between

[C]
sos_run('A')

[default]
depends: sos_step('C')

and

[default]
depends: sos_step('A')

although the latter might just work in the majority of cases.
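The "connectedness" distinction can be modeled in a toy Python sketch (not SoS code; the step functions are hypothetical): in a connected run, each step's default input is the previous step's output, while independent dependencies start from nothing:

```python
def run_connected(steps):
    """Run steps as a workflow: each step's default input is the
    previous step's output (the 'connectedness' sos_run provides)."""
    prev_output = None
    outputs = []
    for step in steps:
        prev_output = step(prev_output)
        outputs.append(prev_output)
    return outputs

def run_independent(steps):
    """Run the same steps as unrelated dependencies: nothing is
    inherited, so every step starts from None."""
    return [step(None) for step in steps]

a_1 = lambda inp: "a_1.out"
a_2 = lambda inp: f"processed({inp})"

print(run_connected([a_1, a_2]))    # ['a_1.out', 'processed(a_1.out)']
print(run_independent([a_1, a_2]))  # ['a_1.out', 'processed(None)']
```

A step like a_2 that only produces fixed output works either way, which is why the unconnected form "might just work in the majority of cases"; a step that reads its inherited input does not.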


gaow commented Jun 15, 2018

Okay, but I guess if sos_step('hg19_reference_*') does not assume connectedness, there might be more confusion than benefit, because my hg19_reference workflow does assume that the steps involved follow numeric ordering ...

@gaow gaow closed this as completed Jun 15, 2018
@gaow gaow reopened this Jun 15, 2018

BoPeng commented Jun 16, 2018

OK, I can ensure order, so sos_step('a') would essentially be sos_workflow('a'), which actually makes the latter unnecessary: sos_workflow('a+b') is rarely needed, and users can use the extra-step trick if necessary.

The problem with the implementation is that we use separate DAGs for sos_run('a'), but in this case the steps would be added to the master DAG, which assumes one numerically ordered, forward-style trunk. Adding multiple numerically ordered mini-workflows would complicate the DAG building (though not necessarily make it difficult).


gaow commented Jun 16, 2018

Hmm, now I guess my nested dependency has made things more complicated:

$ sos run RNASeqDE.ipynb align --samples SRR1039508

ERROR: Defined output fail to produce expected output: /home/gaow/GIT/LargeFiles/RNAseqDE/hg19/hg19.2bit /home/gaow/GIT/LargeFiles/RNAseqDE/hg19/twoBitToFa generated, /home/gaow/GIT/LargeFiles/RNAseqDE/hg19/genomeParameters.txt expected.

But sos run RNASeqDE.ipynb star now works as expected, thanks to the latest patch. I'll try running that step for now, because I know it can take a few hours. I will test the other steps after we sort this out.


gaow commented Jun 16, 2018

Unfortunately, sos_step('a') does not work because it fails to connect the input/output. Here is an MWE:

[hg_1]
output: "1.txt"
run:
  touch 1.txt

[hg_2]
output: "2.txt"
run: expand = True
   echo {_input[0]} > 2.txt

[default]
depends: sos_step('hg')

and it throws IndexError: list index out of range


gaow commented Jun 16, 2018

Okay, now I get this error:

$ sos run RNASeqDE.ipynb star
INFO: download (index=0) is ignored due to saved signature
INFO: output:   /home/gaow/GIT/LargeFiles/RNAseqDE/hg19/hg19.2bit, /home/gaow/GIT/LargeFiles/RNAseqDE/hg19/twoBitToFa
INFO: decompress hg19.fa (index=0) is ignored due to saved signature
INFO: output:   /home/gaow/GIT/LargeFiles/RNAseqDE/hg19/hg19.fa
INFO: gene annotations (index=0) is ignored due to saved signature
INFO: output:   /home/gaow/GIT/LargeFiles/RNAseqDE/hg19/Homo_sapiens.GRCh38.91.gtf.gz
INFO: Target unavailable: sos_step("hg19_reference")
ERROR: Completed target sos_step("hg19_reference") is being re-executed. Please report this bug to SoS developers.


BoPeng commented Jun 16, 2018

Trying to figure out how to trigger this bug with a test case:

[a_1]
output: "a_1"
sh:
  echo whatever > a_1

[a_2]
output: "a_2"
sh: expand=True
  cp {_input} {_output}

[default]
depends: sos_step('a')


BoPeng commented Jun 16, 2018

It was caused by the _ inside hg19_reference. Fixed now.
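For illustration, the kind of parsing bug an underscore inside a workflow name can cause could look like this toy parser (hypothetical code, not the actual SoS fix): splitting a step ID at the first underscore mis-handles names like hg19_reference_1, while splitting only on a trailing numeric index works.

```python
import re

def parse_step_naive(step_id):
    # Buggy: splits at the first underscore, so 'hg19_reference_1'
    # yields workflow 'hg19' instead of 'hg19_reference'.
    name, _, index = step_id.partition('_')
    return name, index

def parse_step_fixed(step_id):
    # Split only on a trailing '_<digits>' index; everything before
    # it, underscores included, is the workflow name.
    m = re.match(r'^(.*)_(\d+)$', step_id)
    if m:
        return m.group(1), int(m.group(2))
    return step_id, None

print(parse_step_naive('hg19_reference_1'))  # ('hg19', 'reference_1')
print(parse_step_fixed('hg19_reference_1'))  # ('hg19_reference', 1)
```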


gaow commented Jun 16, 2018

At this point I was able to test and verify the two commands separately:

sos run RNASeqDE.ipynb star
sos run RNASeqDE.ipynb align

but

sos run RNASeqDE.ipynb align

by itself still fails. See this thread for details: #983 (comment)


BoPeng commented Jun 16, 2018

I see

$ sos run RNASeqDE.ipynb star
INFO: download (index=0) is ignored due to saved signature
INFO: output:   /Users/bpeng1/GIT/LargeFiles/RNAseqDE/hg19/hg19.2bit, /Users/bpeng1/GIT/LargeFiles/RNAseqDE/hg19/twoBitToFa
INFO: decompress hg19.fa (index=0) is ignored due to saved signature
INFO: output:   /Users/bpeng1/GIT/LargeFiles/RNAseqDE/hg19/hg19.fa
INFO: gene annotations (index=0) is ignored due to saved signature
INFO: output:   /Users/bpeng1/GIT/LargeFiles/RNAseqDE/hg19/Homo_sapiens.GRCh38.91.gtf.gz
STAR --runMode genomeGenerate \
   --genomeDir /Users/bpeng1/GIT/LargeFiles/RNAseqDE/hg19 \
   --genomeFastaFiles /Users/bpeng1/GIT/LargeFiles/RNAseqDE/hg19/hg19.fa \
   --sjdbGTFtagExonParentTranscript /Users/bpeng1/GIT/LargeFiles/RNAseqDE/hg19/Homo_sapiens.GRCh38.91.gtf.gz \
   --runThreadN 4
Jun 16 14:13:27 ..... started STAR run
Jun 16 14:13:27 ... starting to generate Genome files

EXITING because of INPUT ERROR: could not open genomeFastaFile: /Users/bpeng1/GIT/LargeFiles/RNAseqDE/hg19/hg19.fa

Jun 16 14:13:27 ...... FATAL ERROR, exiting
ERROR: [star]: Executing script in docker returns an error (exitcode=104).
The script has been saved to /Users/bpeng1/sos/sos-docs/src/examples/.sos/docker_run_76367.sh. To reproduce the error please run:
docker run --rm   -v /private/tmp:/private/tmp -v /Users/bpeng1/sos/sos-docs/src/examples/.sos/docker_run_76367.sh:/var/lib/sos/docker_run_76367.sh    -t -P -w=/private/tmp -u 1985961928:895809667    gaow/debian-ngs /bin/bash -ev /var/lib/sos/docker_run_76367.sh

I suppose this is because the LargeFiles directory is outside of the current working directory, right? Note that we do not mount home by default now.


gaow commented Jun 16, 2018

@BoPeng Yes, and I have already updated the notebook to reflect the change and added a note. Please pull the latest version.

Currently the star pipeline works by itself, though you are welcome to test it. The issue is that we want that step to be triggered by align, so people only have to type one command. It is designed that way, but it is not working now.


BoPeng commented Jun 16, 2018

On my machine, I got the error for 2G of RAM, then also for 11G, and I am increasing it to 20G. The error message is clear enough, though.

ERROR: [star]: Script killed by docker, probably because of lack of RAM (available RAM=11.2GB, exitcode=137). The script has been saved to /Users/bpeng1/sos/sos-docs/src/examples/.sos/docker_run_82344.sh. To reproduce the error please run:


gaow commented Jun 16, 2018

Yes, I have made a note that it requires 30G+ of memory. My computer has 64G. This example is meant to be reproduced, so I didn't make it an external task. The other RNA-seq example is an external task with a 96G memory requirement and is not meant to be reproduced.


BoPeng commented Jun 16, 2018

Come on, I only ordered 32G of RAM for my new Mac and thought that it would be enough even for virtual machines ... Now I cannot run the star step.


gaow commented Jun 16, 2018

Well, I guess that is why many people want to use TopHat instead. But there is a sparse mode in STAR that will halve the memory usage. I believe I have implemented its interface in this notebook too. I assume users can reserve an interactive cluster node for the full version, though? I will make a note.


gaow commented Jun 16, 2018

BTW, there should be no need to use a VM for this one, because I think most of the tools are dockerized. Maybe it fits if you don't use a VM? My desktop is a $2600 Linux machine with 40 threads and 64G of memory ...


BoPeng commented Jun 16, 2018

$ sos run RNASeqDE.ipynb align --samples SRR1039508

should be fixed.

@gaow gaow closed this as completed Jun 16, 2018