Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add gomc parser #78

Merged
merged 29 commits into from
Jul 30, 2019
Merged

Add gomc parser #78

merged 29 commits into from
Jul 30, 2019

Conversation

msoroush
Copy link
Contributor

Add GOMC parser to alchemlyb. Energy output file for Free Energy Calculation is similar to GROMCAS. This parser, reads, Total energy, PV, derivative of energy (for current lambda state), and change of energy (between current lambda state and all other lambda states).

@codecov-io
Copy link

codecov-io commented Apr 30, 2019

Codecov Report

Merging #78 into master will decrease coverage by 0.88%.
The diff coverage is 91.91%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master      #78      +/-   ##
==========================================
- Coverage   98.16%   97.27%   -0.89%     
==========================================
  Files          11       12       +1     
  Lines         599      698      +99     
  Branches      116      141      +25     
==========================================
+ Hits          588      679      +91     
- Misses          4        5       +1     
- Partials        7       14       +7
Impacted Files Coverage Δ
src/alchemlyb/parsing/gomc.py 91.91% <91.91%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 86d2d26...325dc3d. Read the comment docs.

@dotsdl dotsdl self-assigned this May 2, 2019
@dotsdl
Copy link
Member

dotsdl commented May 2, 2019

@msoroush thanks for opening this PR! I've scheduled it for review on 2019.05.05.

@orbeckst
Copy link
Member

orbeckst commented May 3, 2019

You'll need tests.

For tests you need to add a data set to alchemtest, so open an issue/PR over there, please. See https://github.com/alchemistry/alchemtest/wiki/contributing

This was referenced May 3, 2019
@msoroush
Copy link
Contributor Author

msoroush commented May 3, 2019

You'll need tests.

For tests you need to add a data set to alchemtest, so open an issue/PR over there, please. See https://github.com/alchemistry/alchemtest/wiki/contributing

I am still validating my implementation in GOMC. Once it validate it, I will create the test. I might cancel this PR because I noticed that slicing function does not work index name other than time.

@orbeckst
Copy link
Member

orbeckst commented May 3, 2019

See #79 (comment) for comments on slicing.

@orbeckst
Copy link
Member

orbeckst commented May 3, 2019

Btw, you don't have to close the PR. You can keep it open and update it.

Thanks for offering to contribute. As I said, it would be good to have code to deal with MC data. We just have to make sure that the code remains maintainable and one of the key solutions there is to do testing really well.

Please also be aware that there's no-one being paid to do development on alchemlyb so any code reviews or comments might take some time. We are all way to busy and we try to carve out some time to move this project forward. So please have some patience with us. It's ok to ping someone with their GitHub handle @-mention (eg @orbeckst for me) after a few days if there follow-up seems to be missing.

@msoroush
Copy link
Contributor Author

msoroush commented May 3, 2019

Please also be aware that there's no-one being paid to do development on alchemlyb so any code reviews or comments might take some time. We are all way to busy and we try to carve out some time to move this project forward. So please have some patience with us. It's ok to ping someone with their GitHub handle @-mention (eg @orbeckst for me) after a few days if there follow-up seems to be missing.

I will keep it open and work on it. I tried to keep the file format to be similar to GROMACS to make everyones life easy. I did not want to write analysis tools from scratch, so I started to use alchemical-analysis and then alchemlyb. This is a new field for me and I am trying to learn it as fast as possible. Thats why I have too many questions.

Copy link
Member

@dotsdl dotsdl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking very good so far @msoroush. I'll second @orbeckst's statement that for new parsers we'll need a corresponding dataset added to alchemtest. I'll also echo that this project ratchets forward slowly with an eye toward producing correct results for everything we release, so bear with us. We definitely want to support as many formats for alchemical free energy data as we can, and this is no exception.

Please feel free to push to this PR as you improve your parser. When ready, also open a PR on alchemtest for a dataset that works well for testing. You'll want to produce a dataset that is substantial enough to yield testing value but small enough to not balloon the testing package (shoot for less than 10MB if you can; definitely use e.g. bzip2 compression for individual files).

Thanks for your patience. Looking forward to reviewing more as it comes!

src/alchemlyb/parsing/gomc.py Outdated Show resolved Hide resolved
@msoroush
Copy link
Contributor Author

msoroush commented Jul 2, 2019

You'll need tests.

For tests you need to add a data set to alchemtest, so open an issue/PR over there, please. See https://github.com/alchemistry/alchemtest/wiki/contributing

Does it matter which forcefield or water model I am using to generate benzene solvation data set?

@orbeckst
Copy link
Member

orbeckst commented Jul 2, 2019

No, ideally it should be something publicly available. You should open an issue and PR in alchemtest and link it to this one by including #78 in the description.

@orbeckst
Copy link
Member

orbeckst commented Jul 2, 2019

See contributing to alchemtest for guide lines. Ask if you have questions.

- close alchemistry#83
- close alchemistry#84
- files were manually created based on the history (git log --format="format:%an %ad" --date="format:%Y" 0.1.0..HEAD | sort | uniq)
  and the merged PRs and closed issues
orbeckst pushed a commit to alchemistry/alchemtest that referenced this pull request Jul 18, 2019
* Add GOMC data sets (for alchemistry/alchemlyb#78)
* update the documentation
* Change the file formatting. Store the all free energy files in inWater directory instead of storing in separate VDW and Coulomb directory.
* compress the free energy files
* close #33
Copy link
Member

@orbeckst orbeckst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@msoroush now that PR alchemistry/alchemtest#34 was merged, can you please add tests for

  • parsing
  • calculating free energies: you have both FEP and TI parsers so you should also calculate your solvation free energy with BAR/MBAR and TI

@msoroush
Copy link
Contributor Author

can you please add tests for

  • parsing
  • calculating free energies: you have both FEP and TI parsers so you should also calculate your
    solvation free energy with BAR/MBAR and TI

Where should I add test for parsing and calculating free energies?

@orbeckst
Copy link
Member

Have a look at the existing tests and model yours after them.

@msoroush
Copy link
Contributor Author

@orbeckst I added the tests for GOMC parser and free energy estimator (BAR, MBAR, and TI). Please let me know If you want me to add additional test for GOMC parser?

I noticed that I would not load benzene data from GOMC datasets. I fixed the access.py file alchemistry/alchemtest#36.

@orbeckst
Copy link
Member

No need to update api_proposal – it's the general ideas and outline. Thanks for checking.

@orbeckst
Copy link
Member

The codecov status is missing because the upload from Travis is failing

Error: HTTPSConnectionPool(host='codecov.io', port=443): Max retries exceeded with url: /codecov/v4/raw/2019-07-26/4CA02E04D860053FE833B077F7D9C963/98bf9291d28dc957b5e82407d222dbcd847eb2c6/b3be7e92-5b6c-4e13-a9e6-223a1411d0b2.txt?AWSAccessKeyId=AKIAIHLZSCQCS4WIHD4A&Expires=1564165667&Signature=ouAwWnD%2FfiB%2FP0u7%2Bidshrlg%2Bqk%3D (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7fb6260d7da0>: Failed to establish a new connection: [Errno 110] Connection timed out'))

Not sure why.

@orbeckst
Copy link
Member

@dotsdl can you have a brief look at this PR and if your requirements were addressed? It seems good to go.

@dotsdl
Copy link
Member

dotsdl commented Jul 26, 2019

I should be able to review this PR before Sunday. Thanks @msoroush for pushing this through to the finish line, and @orbeckst for shepherding it forward.

@dotsdl
Copy link
Member

dotsdl commented Jul 26, 2019

Went ahead and restarted the build; we'll see if Travis succeeds in shipping to codecov this time.

@orbeckst
Copy link
Member

Nope – somethings is screwy with codecov. For some bizarre reason it lists the coverage for this PR as our project's coverage – see https://codecov.io/gh/alchemistry/alchemlyb. Maybe it gets confused because it comes from the master branch of a fork????

Anyway, we can revisit once this PR is merged.

@orbeckst
Copy link
Member

orbeckst commented Jul 26, 2019

I manually run the coverage

py.test --cov alchemlyb src/alchemlyb/tests
coverage html

with the following result:

============================= test session starts ==============================
platform darwin -- Python 3.6.7, pytest-3.2.3, py-1.4.34, pluggy-0.4.0
rootdir: /Volumes/Data/oliver/Biop/Projects/Methods/FreeEnergy/alchemlyb, inifile:
plugins: xdist-1.20.1, forked-0.2, cov-2.5.1
collected 146 items

src/alchemlyb/tests/test_fep_estimators.py .....................
src/alchemlyb/tests/test_import.py .
src/alchemlyb/tests/test_preprocessing.py ................................
src/alchemlyb/tests/test_ti_estimators.py ...........
src/alchemlyb/tests/test_version.py ..
src/alchemlyb/tests/parsing/test_amber.py ................................................................
src/alchemlyb/tests/parsing/test_gmx.py ..........
src/alchemlyb/tests/parsing/test_gomc.py ..
src/alchemlyb/tests/parsing/test_namd.py ..
src/alchemlyb/tests/parsing/test_util.py .

---------- coverage: platform darwin, python 3.6.7-final-0 -----------
Name                                         Stmts   Miss Branch BrPart  Cover
------------------------------------------------------------------------------
src/alchemlyb/__init__.py                        3      0      0      0   100%
src/alchemlyb/convergence/__init__.py            0      0      0      0   100%
src/alchemlyb/convergence/convergence.py         0      0      0      0   100%
src/alchemlyb/convergence/pade.py                0      0      0      0   100%
src/alchemlyb/estimators/__init__.py             3      0      0      0   100%
src/alchemlyb/estimators/bar_.py                41      0     10      0   100%
src/alchemlyb/estimators/mbar_.py               26      1      4      0    97%
src/alchemlyb/estimators/ti_.py                 31      0      4      0   100%
src/alchemlyb/parsing/__init__.py                0      0      0      0   100%
src/alchemlyb/parsing/amber.py                 235      2     98      4    98%
src/alchemlyb/parsing/gmx.py                   156      1     90      3    98%
src/alchemlyb/parsing/gomc.py                  105      5     48      7    92%
src/alchemlyb/parsing/namd.py                   32      0      8      0   100%
src/alchemlyb/parsing/util.py                   25      4      2      0    85%
src/alchemlyb/preprocessing/__init__.py          4      0      0      0   100%
src/alchemlyb/preprocessing/subsampling.py      43      0     16      0   100%
------------------------------------------------------------------------------
TOTAL                                          704     13    280     14    97%

The html pages are attached as coverage.zip; note that the current codecov looks similar, e.g., for alchemlyb/parsing/gomc.py.

I hope that helps @dotsdl in his review, even in the absence of codecov.

Based on these results I'll add my own comments in a moment.

@orbeckst
Copy link
Member

The 85% coverage in parsing/util.py is harmless – it just needs to be tested under Py2, too.

The lack of testing of a number of except statements in gomc.py is an issue, though, it think these lines were copied from the gmx parser without further thinking. I'll make comments in the code.

Copy link
Member

@orbeckst orbeckst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See issues based on the coverage analysis.

src/alchemlyb/parsing/gomc.py Outdated Show resolved Hide resolved
src/alchemlyb/parsing/gomc.py Outdated Show resolved Hide resolved
src/alchemlyb/parsing/gomc.py Outdated Show resolved Hide resolved
src/alchemlyb/parsing/gomc.py Show resolved Hide resolved
@orbeckst
Copy link
Member

Looking good:

(alchemlyb) yngvi:alchemlyb oliver$ py.test --cov alchemlyb src/alchemlyb/tests
============================================================ test session starts ============================================================
platform darwin -- Python 3.6.7, pytest-3.2.3, py-1.4.34, pluggy-0.4.0
rootdir: /Volumes/Data/oliver/Biop/Projects/Methods/FreeEnergy/alchemlyb, inifile:
plugins: xdist-1.20.1, forked-0.2, cov-2.5.1
collected 146 items

src/alchemlyb/tests/test_fep_estimators.py .....................
src/alchemlyb/tests/test_import.py .
src/alchemlyb/tests/test_preprocessing.py ................................
src/alchemlyb/tests/test_ti_estimators.py ...........
src/alchemlyb/tests/test_version.py ..
src/alchemlyb/tests/parsing/test_amber.py ................................................................
src/alchemlyb/tests/parsing/test_gmx.py ..........
src/alchemlyb/tests/parsing/test_gomc.py ..
src/alchemlyb/tests/parsing/test_namd.py ..
src/alchemlyb/tests/parsing/test_util.py .

---------- coverage: platform darwin, python 3.6.7-final-0 -----------
Name                                         Stmts   Miss Branch BrPart  Cover
------------------------------------------------------------------------------
src/alchemlyb/__init__.py                        3      0      0      0   100%
src/alchemlyb/convergence/__init__.py            0      0      0      0   100%
src/alchemlyb/convergence/convergence.py         0      0      0      0   100%
src/alchemlyb/convergence/pade.py                0      0      0      0   100%
src/alchemlyb/estimators/__init__.py             3      0      0      0   100%
src/alchemlyb/estimators/bar_.py                41      0     10      0   100%
src/alchemlyb/estimators/mbar_.py               26      1      4      0    97%
src/alchemlyb/estimators/ti_.py                 31      0      4      0   100%
src/alchemlyb/parsing/__init__.py                0      0      0      0   100%
src/alchemlyb/parsing/amber.py                 235      2     98      4    98%
src/alchemlyb/parsing/gmx.py                   156      1     90      3    98%
src/alchemlyb/parsing/gomc.py                   98      1     48      7    95%
src/alchemlyb/parsing/namd.py                   32      0      8      0   100%
src/alchemlyb/parsing/util.py                   25      4      2      0    85%
src/alchemlyb/preprocessing/__init__.py          4      0      0      0   100%
src/alchemlyb/preprocessing/subsampling.py      43      0     16      0   100%
------------------------------------------------------------------------------
TOTAL                                          697      9    280     14    98%


======================================================= 146 passed in 142.37 seconds ========================================================

@dotsdl
Copy link
Member

dotsdl commented Jul 28, 2019

I've made some small changes, including adding -lambda to index names for the u_nk parser.

One note as an optimization: it might speed up dataframe parsing to use the pandas.read_csv parser for the data rows as we did for Gromacs, since this parser can be quite a bit faster than a parser done at the Python interpreter level (it is written in C). This implementation is great as a first pass, however; we are happy to have it.

Thanks @msoroush for sticking it out through this process!

@orbeckst, looks like we lost a little test coverage(?), but it's not clear to me where it could really be improved on this file. My local test run shows it has 95% coverage. Please merge when satsified; thanks!

@orbeckst
Copy link
Member

Thanks. Because PR #85 was merged here and has not been merged into master, this PR has to wait until PR #85 is officially merged.

@orbeckst
Copy link
Member

orbeckst commented Jul 29, 2019

I also think we'll have to live with the coverage. I'll squash merge once PR #85 has been merged.

Thanks everyone!

@orbeckst orbeckst added this to the release 0.2.0 milestone Jul 29, 2019
@msoroush
Copy link
Contributor Author

Thank you @dotsdl @orbeckst for all your help.

@orbeckst orbeckst merged commit c90bb88 into alchemistry:master Jul 30, 2019
@orbeckst
Copy link
Member

Congratulations @msoroush , your first PR in alchemlyb was merged. Thank you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants