Bug report
Many users set the scratch true directive partly to minimize the size of the shared work directory, so that only the files needed by downstream tasks and by the resume mechanism are saved there.
When a process's output glob pattern also matches an input file, that input file is unnecessarily copied back into the shared work directory.
Steps to reproduce the problem
Run the pipeline defined in main.nf. Note that the in.csv file is copied back to the shared work directory:
❯ nextflow run .
N E X T F L O W ~ version 23.04.1
Launching `./main.nf` [hopeful_church] DSL2 - revision: 06d2458686
executor > local (1)
[42/2fa08b] process > GreedyOutputGlob (1) [100%] 1 of 1 ✔
/private/tmp/foo/work/42/2fa08b2ef83cd1799c58833592deed/out.csv
/tmp/foo on ☁️ sts on ☁️ devstar2002@gcplab.me took 2s
❯ tree work
work
└── 42
    └── 2fa08b2ef83cd1799c58833592deed
        ├── in.csv
        └── out.csv
3 directories, 2 files
This is because the nxf_unstage command uses the output glob pattern directly, without regard to the input files.
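The effect can be reproduced outside Nextflow with a small bash simulation. This is a hypothetical sketch, not the actual nxf_unstage code: the directory names are invented, and it assumes that the input is staged into scratch as a symlink and that unstaging copies with link dereferencing.

```shell
#!/usr/bin/env bash
# Hypothetical simulation only; this is not Nextflow's actual nxf_unstage code.
set -euo pipefail

demo=$(mktemp -d)
data="$demo/data"        # where the original input lives
work="$demo/work"        # stands in for the shared task work directory
scratch="$demo/scratch"  # stands in for the node-local scratch directory
mkdir -p "$data" "$work" "$scratch"

# Stage the input into scratch as a symlink (assumed staging behaviour).
echo "a,b" > "$data/in.csv"
ln -s "$data/in.csv" "$scratch/in.csv"

# The "task" writes its real output next to the staged input.
tr 'a-z' 'A-Z' < "$scratch/in.csv" > "$scratch/out.csv"

# Unstage by expanding the output glob directly: in.csv matches too,
# and -L turns the symlink into a full copy in the work directory.
cp -fRL "$scratch"/*.csv "$work"/

ls -l "$work"
```

Under these assumptions the work directory ends up holding a full copy of in.csv alongside out.csv, which matches the tree listing in the report.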
Expected behaviour and actual behaviour
To spare users from storing duplicated input files, it would be better if Nextflow excluded input files from being copied back to the shared work directory (unless the includeInputs: true argument is included in the output: block).
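One way the requested behaviour could look, sketched in bash. This is illustrative only and not a proposed patch to Nextflow: the inputs array, the include_inputs flag, and the is_input helper are all hypothetical names, with include_inputs standing in for the includeInputs: true option.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of an unstage step that skips staged inputs.
set -euo pipefail

demo=$(mktemp -d)
work="$demo/work"        # stands in for the shared task work directory
scratch="$demo/scratch"  # stands in for the node-local scratch directory
mkdir -p "$work" "$scratch"
echo "a,b" > "$scratch/in.csv"
echo "A,B" > "$scratch/out.csv"

inputs=("in.csv")        # hypothetical list of staged input file names
include_inputs=false     # set to true to mimic includeInputs: true

# Return success if the given file name is one of the staged inputs.
is_input() {
    local name=$1 i
    for i in "${inputs[@]}"; do
        [[ "$i" == "$name" ]] && return 0
    done
    return 1
}

# Copy glob matches back, leaving out inputs unless explicitly requested.
for f in "$scratch"/*.csv; do
    name=$(basename "$f")
    if [[ "$include_inputs" != true ]] && is_input "$name"; then
        continue
    fi
    cp -fRL "$f" "$work"/
done
```

With include_inputs left at false, only out.csv is copied back; setting it to true restores the current behaviour of copying every glob match.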
Environment
Nextflow version: 23.04.1
Java version: openjdk version "17.0.5" 2022-10-18
Operating system: all
Bash version: all
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.