Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Updatable inputs #317

Merged
merged 30 commits into from
May 10, 2017
Merged

Updatable inputs #317

merged 30 commits into from
May 10, 2017

Conversation

tetron
Copy link
Member

@tetron tetron commented Mar 2, 2017

Not ready to merge!

This is for tracking work on the feature of being able to safely update input files in-place.

@tetron
Copy link
Member Author

tetron commented Mar 3, 2017

Extends the existing functionality of staging "writable" files to the output directory. If inplaceUpdate is true, files will link (via symlink or bind mount) to the real files and permit updating.

Enforces sequencing of updates by incrementing a "generation" number with each update (*). Files can only be update by one workflow step at a time, and the updated file must be passed explicitly to downstream steps. If two steps try to update a file of the same generation it will raise an error.

(*) In writing this, I realized that while write-write conflicts are detected, read-write conflicts are not. So the lock management probably needs a bit more work.

@tetron
Copy link
Member Author

tetron commented Mar 7, 2017

@gijzelaerr this is ready for testing

@gijzelaerr
Copy link
Member

awesome! Amazing! Thanks!

I'm trying it out now, but I run into troubles running your test/example:

gijs@sherlock:~/Work/cwltool (updatable-inputs) ✗
λ  .virtualenv/bin/cwltool tests/wf/updateval_inplace.cwl
.virtualenv/bin/cwltool 1.0.20170307024643
Resolved 'tests/wf/updateval_inplace.cwl' to 'file:///home/gijs/Work/cwltool/tests/wf/updateval_inplace.cwl'
Tool definition failed validation:
tests/wf/updateval_inplace.cwl:3:1: checking field `requirements`
tests/wf/updateval_inplace.cwl:8:3:   checking item
tests/wf/updateval_inplace.cwl:8:3:     Field `class` contains undefined reference to `file:///home/gijs/Work/cwltool/tests/wf/InplaceUpdateRequirement`

@tetron
Copy link
Member Author

tetron commented Mar 7, 2017

To enable the feature is hidden behind the flag --enable-ext

cwltool/job.py Outdated
if vol.type in ("File", "Directory") or (inplace_update and
vol.type in ("WritableFile", "WritableDirectory")):
if os.path.exists(vol.target):
os.remove(vol.target)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is giving me:

Traceback (most recent call last):
  File "/home/gijs/Work/cwltool/cwltool/job.py", line 209, in _execute
    relink_initialworkdir(self.generatemapper, inplace_update=self.inplace_update)
  File "/home/gijs/Work/cwltool/cwltool/job.py", line 99, in relink_initialworkdir
    os.remove(vol.target)
OSError: [Errno 21] Is a directory: '/tmp/tmpC7NZ5I/1484694604_1284.hh_vv.ms'
[job casa_flagdata.cwl] completed permanentFail
{}

@gijzelaerr
Copy link
Member

This fails in the case of a Directory input/output

Traceback (most recent call last):
  File "/home/gijs/Work/cwltool/cwltool/job.py", line 209, in _execute
    relink_initialworkdir(self.generatemapper, inplace_update=self.inplace_update)
  File "/home/gijs/Work/cwltool/cwltool/job.py", line 99, in relink_initialworkdir
    os.remove(vol.target)
OSError: [Errno 21] Is a directory: '/tmp/tmpC7NZ5I/1484694604_1284.hh_vv.ms'
[job casa_flagdata.cwl] completed permanentFail
{}

here is my CWL file
https://github.com/gijzelaerr/vermeerkat-cwl/blob/99d6e9456f18dc3e0a20f65729b27da6c8b02fb9/cwl/casa_flagdata.cwl

@tetron
Copy link
Member Author

tetron commented Mar 10, 2017

Oh, I forgot to even test it for WritableDirectory. Will take a look.

@tetron
Copy link
Member Author

tetron commented Mar 10, 2017

@gijzelaerr try it now, updatable directories should be fixed.

@mr-c
Copy link
Member

mr-c commented Mar 11, 2017

@tetron FYI: I merged in master

@gijzelaerr
Copy link
Member

I think the merge from master broke something?

 .virtualenv/bin/cwltool --enable-ext --outdir results --cachedir cache cwl/workflow.cwl cwl/job.cwl
.virtualenv/bin/cwltool 1.0.20170307024643
Resolved 'cwl/workflow.cwl' to 'file:///home/gijs/Work/vermeerkat-cwl/cwl/workflow.cwl'
Tool definition failed initialization:
cwl/aoflagger.cwl:15:5: Unsupported requirement InplaceUpdateRequirement

@gijzelaerr
Copy link
Member

yes, f812858 seems to eat the InplaceUpdateRequirement just fine.

@gijzelaerr
Copy link
Member

f812858 gives me

 .virtualenv/bin/cwltool --enable-ext --outdir results --cachedir cache cwl/workflow.cwl cwl/job.cwl
.virtualenv/bin/cwltool 1.0.20170310233005
Resolved 'cwl/workflow.cwl' to 'file:///home/gijs/Work/vermeerkat-cwl/cwl/workflow.cwl'
[job curl] Using cached output in /home/gijs/Work/vermeerkat-cwl/cache/e264380f650712a0bed41bd4e101b960
invalid field `_generation`, expected one of: 'class', 'location', 'path', 'basename', 'dirname', 'nameroot', 'nameext', 'checksum', 'size', 'secondaryFiles', 'format', 'contents'
Got workflow error
Traceback (most recent call last):
  File "/home/gijs/Work/cwltool/cwltool/main.py", line 238, in single_job_executor
    r.run(**kwargs)
  File "/home/gijs/Work/cwltool/cwltool/draft2tool.py", line 144, in run
    kwargs.get("compute_checksum", True)), "success")
  File "/home/gijs/Work/cwltool/cwltool/draft2tool.py", line 474, in collect_output_ports
    for r in readers.values():
AttributeError: 'NoneType' object has no attribute 'values'
Workflow error, try again with --debug for more information:
'NoneType' object has no attribute 'values'

@mr-c
Copy link
Member

mr-c commented Mar 14, 2017

Oops, sorry. Looks like I did the merge from a stale browser window. I'll fix it now

@mr-c
Copy link
Member

mr-c commented Mar 14, 2017

@tetron I've undone the merge from master with a force push and I'll keep my hands off this PR; mea culpa!

qiukunlong pushed a commit to qiukunlong/cwltool that referenced this pull request Mar 25, 2017
aa320ec Merge pull request common-workflow-language#342 from common-workflow-language/fix-mystery-package-errors
3c6eaff Update for rename of argparse2cwl to argparse2tool
b14404b remove errant jsonldPredicate
7f116da Merge pull request common-workflow-language#339 from common-workflow-language/dockerOutputDirectory
4f65d1f Add test for dockerOutputDirectory option of DockerRequirement.
cb1f928 Merge pull request common-workflow-language#335 from common-workflow-language/output-literals
d8b6c8f Add tests for file and directory literals in output as produced by expression tool.
ee3bb1c Merge pull request common-workflow-language#324 from common-workflow-language/secondaryfiles-array
5e6c2d4 Add NLeSC's Xenon
cbe1c8d Merge pull request common-workflow-language#334 from StarvingMarvin/master
8300e43 output ids must start with '#' in draft-2
f87d218 draft-3 test fixes: explicit output for args
8b10319 Fixes for draft-2 tests: adding explicit args output
83ed4b4 Merge remote-tracking branch 'rabix-cwl/master'
e0a5e7c Merge pull request common-workflow-language#328 from StarvingMarvin/master
d906c4a dir4 test fix
21bdcaf Merge remote-tracking branch 'upstream/master'
dee0f4e Disabled checksum check for listing input dir
fc4cb0f Added args output and empty.json for null jobs
a4950f4 Update tools list
8e1d67c Add Rabix Bunny CI badge
2a5b95e fixing mentions to draft3/4
1356c45 Add test for secondaryFiles with arrays
b60125b Single instead of double quotes in doc
179835b Merge pull request common-workflow-language#319 from common-workflow-language/test-cwl-out2
ed40737 Add additional test for cwl.output.json behavior.
a217bb6 Merge pull request common-workflow-language#317 from StarvingMarvin/master
8dd69f2 fixing test in v1.0 and v1.1.0-dev1
e831c20 Merge pull request common-workflow-language#314 from common-workflow-language/test-requirements-on-steps
8cad6f1 Add test for requirements/hints on workflow steps.
7679adb Merge pull request common-workflow-language#308 from common-workflow-language/embedded-subworkflow-test
1582677 Tighten up formatting
779c2d4 Add test for embedded subworkflow
e22a9c2 Merge pull request common-workflow-language#307 from alaindomissy/patch-3
125cb5f Update UserGuide.yml
99f4f9f Merge pull request common-workflow-language#303 from common-workflow-language/include-stdin-in-docs
5324dbb put stdin shortcut in correct place

git-subtree-dir: cwltool/schemas
git-subtree-split: aa320ec
@gijzelaerr
Copy link
Member

just did a rerun without debug, I think i have the same error

[step casa_flagdata] completed success
[workflow workflow.cwl] outdir is /home/gijs/Work/vermeerkat-cwl/cache/R88qhS
Unhandled error, try again with --debug for more information:
  [Errno 21] Is a directory: '/home/gijs/Work/vermeerkat-cwl/cache/cb5bf370951c5f2a7b4538e0e0cb7d41/result.ms'
make: *** [run] Error 1

@tetron
Copy link
Member Author

tetron commented Mar 29, 2017

@gijzelaerr I'm pretty swamped at the moment, would you be able to look into this? It looks like right at the end in relocateOutputs it assumes it is only copying files, it needs to detect if something is a directory and use shutil.copytree.

@gijzelaerr
Copy link
Member

It seems to be a bit more complex than that. When I add support for recursive dir copy there I get:

[workflow workflow.cwl] outdir is /home/gijs/Work/vermeerkat-cwl/cache/kx9MZe
Copying /home/gijs/Work/vermeerkat-cwl/cache/a1f3006db20fae39c82ec21e8297dd1c/result.ms to results/result.ms
Unhandled error, try again with --debug for more information:
  [Errno 2] No such file or directory: u'/result.ms/table.f7_TSM1'
Traceback (most recent call last):
  File "/home/gijs/Work/cwltool/cwltool/main.py", line 768, in main
    **vars(args))
  File "/home/gijs/Work/cwltool/cwltool/main.py", line 260, in single_job_executor
    kwargs["make_fs_access"](""))
  File "/home/gijs/Work/cwltool/cwltool/process.py", line 259, in relocateOutputs
    adjustFileObjs(outputObj, _check_adjust)
  File "/home/gijs/Work/cwltool/cwltool/pathmapper.py", line 41, in adjustFileObjs
    adjustFileObjs(rec[d], op)
  File "/home/gijs/Work/cwltool/cwltool/pathmapper.py", line 41, in adjustFileObjs
    adjustFileObjs(rec[d], op)
  File "/home/gijs/Work/cwltool/cwltool/pathmapper.py", line 44, in adjustFileObjs
    adjustFileObjs(d, op)
  File "/home/gijs/Work/cwltool/cwltool/pathmapper.py", line 39, in adjustFileObjs
    op(rec)
  File "/home/gijs/Work/cwltool/cwltool/process.py", line 256, in _check_adjust
    compute_checksums(fs_access, f)
  File "/home/gijs/Work/cwltool/cwltool/process.py", line 798, in compute_checksums
    with fs_access.open(fileobj["location"], "rb") as f:
  File "/home/gijs/Work/cwltool/cwltool/stdfsaccess.py", line 26, in open
    return open(self._abs(fn), mode)
IOError: [Errno 2] No such file or directory: u'/result.ms/table.f7_TSM1'

@tetron
Copy link
Member Author

tetron commented Mar 31, 2017

@gijzelaerr how is it working now?

@gijzelaerr
Copy link
Member

@mr-c
Copy link
Member

mr-c commented Apr 4, 2017

@gijzelaerr Thanks for the update. Can you try again w/o caching?

@gijzelaerr
Copy link
Member

same.

$   .virtualenv/bin/cwltool --debug --enable-ext --outdir results/ --tmpdir-prefix tmp/ --tmp-outdir-prefix tmp-output/ cwl/workflow.cwl cwl/job.cwl

[...]

[job casa_flagdata] Removing input staging directory /home/gijs/Work/vermeerkat-cwl/tmp/eaDlUr
[job casa_flagdata] Removing temporary directory /home/gijs/Work/vermeerkat-cwl/tmp/ROxfh_
[workflow workflow.cwl] outdir is /home/gijs/Work/vermeerkat-cwl/tmp-output/ORBBrQ
Moving /home/gijs/Work/vermeerkat-cwl/tmp-output/NrCwpP/result.ms to results/result.ms
Unhandled error, try again with --debug for more information:
  [Errno 2] No such file or directory: u'/result.ms/table.f7_TSM1'
Traceback (most recent call last):
  File "/home/gijs/Work/cwltool/cwltool/main.py", line 776, in main
    **vars(args))
  File "/home/gijs/Work/cwltool/cwltool/main.py", line 260, in single_job_executor
    kwargs["make_fs_access"](""))
  File "/home/gijs/Work/cwltool/cwltool/process.py", line 263, in relocateOutputs
    adjustFileObjs(outputObj, _check_adjust)
  File "/home/gijs/Work/cwltool/cwltool/pathmapper.py", line 41, in adjustFileObjs
    adjustFileObjs(rec[d], op)
  File "/home/gijs/Work/cwltool/cwltool/pathmapper.py", line 41, in adjustFileObjs
    adjustFileObjs(rec[d], op)
  File "/home/gijs/Work/cwltool/cwltool/pathmapper.py", line 44, in adjustFileObjs
    adjustFileObjs(d, op)
  File "/home/gijs/Work/cwltool/cwltool/pathmapper.py", line 39, in adjustFileObjs
    op(rec)
  File "/home/gijs/Work/cwltool/cwltool/process.py", line 260, in _check_adjust
    compute_checksums(fs_access, f)
  File "/home/gijs/Work/cwltool/cwltool/process.py", line 802, in compute_checksums
    with fs_access.open(fileobj["location"], "rb") as f:
  File "/home/gijs/Work/cwltool/cwltool/stdfsaccess.py", line 26, in open
    return open(self._abs(fn), mode)
IOError: [Errno 2] No such file or directory: u'/result.ms/table.f7_TSM1'

@tetron
Copy link
Member Author

tetron commented Apr 6, 2017

@gijzelaerr as a workaround, can you try --no-compute-checksum ?

Looks like it is trying to resolve a relative path without having a base directory to start from.

@gijzelaerr
Copy link
Member

No luck, same issue.

$  .virtualenv/bin/cwltool --no-compute-checksum --debug --enable-ext --outdir results/ --tmpdir-prefix tmp/ --tmp-outdir-prefix tmp-output/ cwl/workflow.cwl cwl/job.cwl

[...]
IOError: [Errno 2] No such file or directory: u'/result.ms/table.f7_TSM1'

@mr-c
Copy link
Member

mr-c commented Apr 7, 2017

@gijzelaerr Bummer. How hard would it be to make a complete minimal test case with data included?

@gijzelaerr
Copy link
Member

here you go. https://github.com/gijzelaerr/cwlfail

actually i think I isolated the error a bit more, it only occurs if you specify the --outdir

@tetron
Copy link
Member Author

tetron commented May 10, 2017

@gijzelaerr thanks for the minimal test case, it was extremely helpful.

A bit late on this, but moving Directory outputs to their final location should be much improved now (your test case does the right thing).

@tetron tetron merged commit a91c16c into master May 10, 2017
@tetron tetron deleted the updatable-inputs branch May 10, 2017 15:13
tetron pushed a commit that referenced this pull request Jul 23, 2017
fixing test in v1.0 and v1.1.0-dev1
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants