Release candidate for 0.0.4 #146

Merged: 83 commits, Aug 11, 2024
f9d7a0e
Added a start at a quantile implementation in the existing MR framework.
mmcdermott Jul 29, 2024
7685436
Merge branch 'dev' into 51_add_quantile_aggregation
mmcdermott Jul 30, 2024
9c3b497
Got the reducer to return reasonable numbers at least.
mmcdermott Jul 31, 2024
2d87cae
Merged.
mmcdermott Aug 2, 2024
4b3cfec
Added doctests and parametrization.
mmcdermott Aug 2, 2024
e721506
VERY preliminary code. Just committing so as to not lose changes. Not…
mmcdermott Aug 2, 2024
10b741a
Merge branch 'dev' into 101_match_revise
mmcdermott Aug 3, 2024
76ce942
Merge branch 'dev' into alt_51_raw_values
mmcdermott Aug 3, 2024
209abc1
Merged.
mmcdermott Aug 3, 2024
d898cb4
Merge branch 'dev' into alt_51_raw_values
mmcdermott Aug 3, 2024
bb88c1c
Renamed to 'code_modifiers'
mmcdermott Aug 4, 2024
5a39b7c
Merge branch 'dev' into 101_match_revise
mmcdermott Aug 5, 2024
949867a
Merge branch 'dev' into alt_51_raw_values
mmcdermott Aug 5, 2024
8e1a78d
Very preliminary implementation of match and revise
mmcdermott Aug 5, 2024
e260088
Started adding documentation and tests; not yet complete
mmcdermott Aug 5, 2024
8750b93
Tests added for everything except bind compute function.
mmcdermott Aug 5, 2024
8e418ab
Added all doctests.
mmcdermott Aug 6, 2024
f19ecc7
Added an integration test via the filter_measurement transform; also …
mmcdermott Aug 6, 2024
39aebbc
Attempted to update test workflow, add codecov badge, and add a workf…
mmcdermott Aug 6, 2024
33ebdd9
Merge pull request #123 from mmcdermott/update_CI_and_README
mmcdermott Aug 7, 2024
cff193b
Merge branch 'dev' into alt_51_raw_values
mmcdermott Aug 7, 2024
3b7aae0
Merge branch 'dev' into 101_match_revise
mmcdermott Aug 7, 2024
c9c581f
Fix split_and_shard_patients when the full split definition is provided
prenc Aug 8, 2024
0b924b3
External splits go after internal splits
prenc Aug 8, 2024
f352654
Merge branch 'refs/heads/dev' into fix-external-split
prenc Aug 8, 2024
3c69775
Added mkdocs documentation starter code.
mmcdermott Aug 8, 2024
ebabad7
Added readthedocs file.
mmcdermott Aug 8, 2024
3840385
Added docs badge.
mmcdermott Aug 8, 2024
8075963
Updated tests workflow to ignore docs.
mmcdermott Aug 8, 2024
1dedfad
Updated docs to move basic docs into readthedocs.
mmcdermott Aug 8, 2024
9c78213
Correct mdformat issue
mmcdermott Aug 8, 2024
e0d9fea
linted files.
mmcdermott Aug 8, 2024
39c626e
Added revision dates and authors to docs
mmcdermott Aug 8, 2024
3cc758f
Merge pull request #128 from mmcdermott/77_mkdocs_readthedocs
mmcdermott Aug 8, 2024
cca82d4
Merge branch 'dev' into 101_match_revise
mmcdermott Aug 8, 2024
3e68573
Merge branch 'dev' into alt_51_raw_values
mmcdermott Aug 8, 2024
36a172d
Merge pull request #98 from mmcdermott/alt_51_raw_values
mmcdermott Aug 8, 2024
fa1d6f9
Merge branch 'dev' into 101_match_revise
mmcdermott Aug 8, 2024
44f18b7
Add warning when split_fracs_dict not empty but performing the split …
prenc Aug 8, 2024
3fb5e8b
Merge branch 'refs/heads/dev' into fix-external-split
prenc Aug 8, 2024
2e1f875
Throw ValueError when external split lengths contradict n_patient_per…
prenc Aug 8, 2024
2fbb293
Add .editorconfig
prenc Aug 8, 2024
8839f67
Started removing unnecessary references to shards file
mmcdermott Aug 8, 2024
ba6b5b6
Removed usage of shards json file throughout, outside of the direct e…
mmcdermott Aug 9, 2024
d86fe1e
Used a hash function instead of true randomization to order the shard…
mmcdermott Aug 9, 2024
86d1acd
Made it so that docs use submodule README files as the source documen…
mmcdermott Aug 9, 2024
7d6c273
Merge pull request #131 from mmcdermott/add-editorconfig
prenc Aug 9, 2024
f318a4a
Merge branch 'refs/heads/dev' into fix-external-split
prenc Aug 9, 2024
546c763
Allow resharding of external splits, move rng declaration out of the …
prenc Aug 9, 2024
430a950
Merge pull request #124 from mmcdermott/fix-external-split
prenc Aug 9, 2024
d719ab4
Merge pull request #132 from mmcdermott/129_remove_splits_json_file_u…
mmcdermott Aug 9, 2024
224a1a8
Fixed merge conflict.
mmcdermott Aug 9, 2024
f2d2d0e
Fixed lint errors.
mmcdermott Aug 9, 2024
631819f
Merge pull request #119 from mmcdermott/101_match_revise
mmcdermott Aug 9, 2024
36add00
Moved parser around.
mmcdermott Aug 6, 2024
4263592
Updated mandatory types to be separate from mandatory MEDS columns
mmcdermott Aug 6, 2024
8fc4f48
Fixed bug with matcher import.
mmcdermott Aug 10, 2024
0fadd96
Merge pull request #133 from mmcdermott/update_parser_and_type_info
mmcdermott Aug 10, 2024
dd43d5a
Fixed typo
mmcdermott Aug 10, 2024
fd6e77f
Separated merge shards shard iterator out for ease of import.
mmcdermott Aug 10, 2024
dd22eeb
Starting to add integration test for eventual functionality; not yet …
mmcdermott Aug 10, 2024
1b2d023
Improved compliance by removing creation of shards.json file and addi…
mmcdermott Aug 10, 2024
fc877e3
Merge pull request #137 from mmcdermott/136_testing_code_compliance
mmcdermott Aug 10, 2024
4b2dee2
Merge branch 'dev' into 134_reshard_by_split
mmcdermott Aug 10, 2024
6ec7a8f
Updated test.
mmcdermott Aug 10, 2024
dd9ba08
started file; very much not ready yet.
mmcdermott Aug 10, 2024
4cc65b4
Improved some documentation and error handling for splitting patients.
mmcdermott Aug 10, 2024
ba9c3b4
Merge branch 'dev' into 134_reshard_by_split
mmcdermott Aug 10, 2024
d70128f
Implemented preliminary version of reshard to split.
mmcdermott Aug 10, 2024
0db3a23
fixed some more minor errors; test fails for content reasons now.
mmcdermott Aug 10, 2024
829cede
code was actually fine; it was a test error.
mmcdermott Aug 10, 2024
5e47f0c
Merge pull request #135 from mmcdermott/134_reshard_by_split
mmcdermott Aug 10, 2024
e0d1ecb
Corrected a small parallelism issue in reshard.
mmcdermott Aug 10, 2024
ac1587e
Made reshard not error out if the new split is identical to the shard…
mmcdermott Aug 11, 2024
2d1c4cf
Fixed formatting.
mmcdermott Aug 11, 2024
7b02df8
Move more to locked computation.
mmcdermott Aug 11, 2024
b1378bc
changes in progress to improve robustness
mmcdermott Aug 11, 2024
3a95794
changes in progress to improve robustness
mmcdermott Aug 11, 2024
d9ed79a
Made file checker wait for valid parquet files
mmcdermott Aug 11, 2024
d1f5953
Removed unneeded line.
mmcdermott Aug 11, 2024
e0c0f46
Made rwlock more robust and eliminated unused return mode.
mmcdermott Aug 11, 2024
3fc909e
Made resharding use rwlock wrap
mmcdermott Aug 11, 2024
cda4f06
Merge pull request #143 from mmcdermott/139_fix_reshard_to_split_in_p…
mmcdermott Aug 11, 2024
13 changes: 13 additions & 0 deletions .editorconfig
@@ -0,0 +1,13 @@
root = true

[*]
charset = utf-8
end_of_line = lf
indent_size = 4
indent_style = space
insert_final_newline = true
max_line_length = 110
tab_width = 4

[{*.yaml,*.yml}]
indent_size = 2
95 changes: 95 additions & 0 deletions .github/workflows/python-build.yaml
@@ -0,0 +1,95 @@
name: Publish Python 🐍 distribution 📦 to PyPI and TestPyPI

on: push
Review comment (Contributor):

Trigger on specific branches or tags.

Currently, the workflow triggers on all pushes. Consider specifying branches or tags to avoid unnecessary builds.

on:
  push:
    branches:
      - main
      - 'release/*'
    tags:
      - 'v*'

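To make the effect of the suggested filters concrete, here is a minimal Python sketch of how a pushed ref would be checked against them. It is an illustration only, not GitHub's implementation: `glob_to_regex` and `should_trigger` are hypothetical helpers, and only the `*` (any characters except `/`) and `**` (any characters) wildcards are modeled, while the real filter syntax supports more.

```python
import re

# Patterns taken from the suggestion in the review comment.
BRANCH_PATTERNS = ["main", "release/*"]
TAG_PATTERNS = ["v*"]


def glob_to_regex(pattern: str) -> re.Pattern:
    """Translate a simplified filter pattern into a compiled regex."""
    parts = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**", i):
            parts.append(".*")  # `**` may cross `/` boundaries
            i += 2
        elif pattern[i] == "*":
            parts.append("[^/]*")  # `*` stops at `/`
            i += 1
        else:
            parts.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(parts))


def should_trigger(ref: str) -> bool:
    """Return True if a push to `ref` matches a branch or tag filter."""
    if ref.startswith("refs/heads/"):
        name, patterns = ref[len("refs/heads/"):], BRANCH_PATTERNS
    elif ref.startswith("refs/tags/"):
        name, patterns = ref[len("refs/tags/"):], TAG_PATTERNS
    else:
        return False
    return any(glob_to_regex(p).fullmatch(name) for p in patterns)
```

Under these filters a push to `dev` would no longer start the workflow, and a tag would need a `v` prefix to match.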

jobs:
  build:
    name: Build distribution 📦
    runs-on: ubuntu-latest

    steps:
      - uses: actions/checkout@v4
      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: "3.12"
      - name: Install pypa/build
        run: >-
          python3 -m
          pip install
          build
          --user
      - name: Build a binary wheel and a source tarball
        run: python3 -m build
      - name: Store the distribution packages
        uses: actions/upload-artifact@v4
        with:
          name: python-package-distributions
          path: dist/

  publish-to-pypi:
    name: >-
      Publish Python 🐍 distribution 📦 to PyPI
    if: startsWith(github.ref, 'refs/tags/') # only publish to PyPI on tag pushes
    needs:
      - build
    runs-on: ubuntu-latest
    environment:
      name: pypi
      url: https://pypi.org/p/MEDS-transforms # Replace <package-name> with your PyPI project name
    permissions:
      id-token: write # IMPORTANT: mandatory for trusted publishing

    steps:
      - name: Download all the dists
        uses: actions/download-artifact@v4
        with:
          name: python-package-distributions
          path: dist/

      - name: Publish distribution 📦 to PyPI
        uses: pypa/gh-action-pypi-publish@release/v1

  github-release:
    name: >-
      Sign the Python 🐍 distribution 📦 with Sigstore
      and upload them to GitHub Release
    needs:
      - publish-to-pypi
    runs-on: ubuntu-latest

    permissions:
      contents: write # IMPORTANT: mandatory for making GitHub Releases
      id-token: write # IMPORTANT: mandatory for sigstore

    steps:
      - name: Download all the dists
        uses: actions/download-artifact@v4
        with:
          name: python-package-distributions
          path: dist/

      - name: Sign the dists with Sigstore
        uses: sigstore/gh-action-sigstore-python@v2.1.1
        with:
          inputs: >-
            ./dist/*.tar.gz
            ./dist/*.whl
      - name: Create GitHub Release
        env:
          GITHUB_TOKEN: ${{ github.token }}
        run: >-
          gh release create
          '${{ github.ref_name }}'
          --repo '${{ github.repository }}'
          --notes ""
      - name: Upload artifact signatures to GitHub Release
        env:
          GITHUB_TOKEN: ${{ github.token }}
        # Upload to GitHub Release using the `gh` CLI.
        # `dist/` contains the built packages, and the
        # sigstore-produced signatures and certificates.
        run: >-
          gh release upload
          '${{ github.ref_name }}' dist/**
          --repo '${{ github.repository }}'
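The three jobs in this workflow form a chain through `needs`, with a tag-only condition on `publish-to-pypi`. The sketch below models which jobs would be expected to run for a given ref. It assumes the default GitHub Actions behavior that a job is skipped when a job it needs was skipped; `NEEDS` and `jobs_that_run` are hypothetical names used only for illustration.

```python
from graphlib import TopologicalSorter

# Job name -> jobs it lists under `needs`, copied from the workflow above.
NEEDS = {
    "build": [],
    "publish-to-pypi": ["build"],
    "github-release": ["publish-to-pypi"],
}


def jobs_that_run(ref: str) -> list[str]:
    """Return the jobs expected to run for a pushed ref, in dependency order."""
    ran = []
    for job in TopologicalSorter(NEEDS).static_order():
        # Mirrors `if: startsWith(github.ref, 'refs/tags/')` on publish-to-pypi.
        if job == "publish-to-pypi" and not ref.startswith("refs/tags/"):
            continue
        # Assumption: a job is skipped when any job it needs did not run.
        if all(dep in ran for dep in NEEDS[job]):
            ran.append(job)
    return ran
```

With `on: push` as written, every branch push still runs `build`, while only tag pushes reach the publish and release jobs.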
7 changes: 6 additions & 1 deletion .github/workflows/tests.yaml
@@ -33,9 +33,14 @@ jobs:
       #----------------------------------------------
       - name: Run tests
         run: |
-          pytest -v --doctest-modules --cov=src -s
+          pytest -v --doctest-modules --cov=src --junitxml=junit.xml -s --ignore=docs

       - name: Upload coverage to Codecov
         uses: codecov/codecov-action@v4.0.1
         with:
           token: ${{ secrets.CODECOV_TOKEN }}
+      - name: Upload test results to Codecov
+        if: ${{ !cancelled() }}
+        uses: codecov/test-results-action@v1
+        with:
+          token: ${{ secrets.CODECOV_TOKEN }}
12 changes: 5 additions & 7 deletions .pre-commit-config.yaml
@@ -1,7 +1,7 @@
 default_language_version:
   python: python3.12

-exclude: "sample_data|docs/MIMIC_IV_tutorial/wandb_reports"
+exclude: "docs/index.md|MIMIC-IV_Example/README.md|eICU_Example/README.md"

 repos:
   - repo: https://github.com/pre-commit/pre-commit-hooks
@@ -95,10 +95,12 @@ repos:
- mdformat-gfm
- mdformat-tables
- mdformat_frontmatter
- mdformat-myst
- mdformat-black
- mdformat-config
- mdformat-shfmt
- mdformat-mkdocs
- mdformat-toc
- mdformat-admon

# word spelling linter
- repo: https://github.com/codespell-project/codespell
@@ -124,8 +126,4 @@ repos:
       - id: nbqa-isort
         args: ["--profile=black"]
       - id: nbqa-flake8
-        args:
-          [
-            "--extend-ignore=E203,E402,E501,F401,F841",
-            "--exclude=logs/*,data/*",
-          ]
+        args: ["--extend-ignore=E203,E402,E501,F401,F841", "--exclude=logs/*,data/*"]
21 changes: 21 additions & 0 deletions .readthedocs.yaml
@@ -0,0 +1,21 @@
# Read the Docs configuration file
# See https://docs.readthedocs.io/en/stable/config-file/v2.html for details

# Required
version: 2

# Set the version of Python and other tools you might need
build:
  os: ubuntu-22.04
  tools:
    python: "3.12"

python:
  install:
    - method: pip
      path: .
      extra_requirements:
        - docs

mkdocs:
  configuration: mkdocs.yml
32 changes: 16 additions & 16 deletions MIMIC-IV_Example/README.md
@@ -72,14 +72,14 @@ several steps:
 This is a step in a few parts:

 1. Join a few tables by `hadm_id` to get the right times in the right rows for processing. In
-   particular, we need to join:
-   - the `hosp/diagnoses_icd` table with the `hosp/admissions` table to get the `dischtime` for each
-     `hadm_id`.
-   - the `hosp/drgcodes` table with the `hosp/admissions` table to get the `dischtime` for each `hadm_id`.
+    particular, we need to join:
+    - the `hosp/diagnoses_icd` table with the `hosp/admissions` table to get the `dischtime` for each
+        `hadm_id`.
+    - the `hosp/drgcodes` table with the `hosp/admissions` table to get the `dischtime` for each `hadm_id`.
 2. Convert the patient's static data to a more parseable form. This entails:
-   - Get the patient's DOB in a format that is usable for MEDS, rather than the integral `anchor_year` and
-     `anchor_offset` fields.
-   - Merge the patient's `dod` with the `deathtime` from the `admissions` table.
+    - Get the patient's DOB in a format that is usable for MEDS, rather than the integral `anchor_year` and
+        `anchor_offset` fields.
+    - Merge the patient's `dod` with the `deathtime` from the `admissions` table.

 After these steps, modified files or symlinks to the original files will be written in a new directory which
 will be used as the input to the actual MEDS extraction ETL. We'll use `$MIMICIV_PREMEDS_DIR` to denote this
@@ -104,24 +104,24 @@ subdirectories of the same root directory).
 This is a step in 4 parts:

 1. Sub-shard the raw files. Run this command as many times simultaneously as you would like to have workers
-   performing this sub-sharding step. See below for how to automate this parallelism using hydra launchers.
+    performing this sub-sharding step. See below for how to automate this parallelism using hydra launchers.

-   This step uses the `./scripts/extraction/shard_events.py` script. See `joint_script*.sh` for the expected
-   format of the command.
+    This step uses the `./scripts/extraction/shard_events.py` script. See `joint_script*.sh` for the expected
+    format of the command.

 2. Extract and form the patient splits and sub-shards. The `./scripts/extraction/split_and_shard_patients.py`
-   script is used for this step. See `joint_script*.sh` for the expected format of the command.
+    script is used for this step. See `joint_script*.sh` for the expected format of the command.

 3. Extract patient sub-shards and convert to MEDS events. The
-   `./scripts/extraction/convert_to_sharded_events.py` script is used for this step. See `joint_script*.sh` for
-   the expected format of the command.
+    `./scripts/extraction/convert_to_sharded_events.py` script is used for this step. See `joint_script*.sh` for
+    the expected format of the command.

 4. Merge the MEDS events into a single file per patient sub-shard. The
-   `./scripts/extraction/merge_to_MEDS_cohort.py` script is used for this step. See `joint_script*.sh` for the
-   expected format of the command.
+    `./scripts/extraction/merge_to_MEDS_cohort.py` script is used for this step. See `joint_script*.sh` for the
+    expected format of the command.

 5. (Optional) Generate preliminary code statistics and merge to external metadata. This is not performed
-   currently in the `joint_script*.sh` scripts.
+    currently in the `joint_script*.sh` scripts.

## Limitations / TO-DOs:

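The pre-MEDS join described in the MIMIC-IV README hunk (attaching `dischtime` from `hosp/admissions` to each `hosp/diagnoses_icd` row by `hadm_id`) can be sketched as follows. This is an illustration under stated assumptions, not the project's code: it uses the standard-library `sqlite3` module purely to stay self-contained, the `subject_id` and `icd_code` column names are assumed, the rows are made-up toy values, and `join_dischtime` is a hypothetical helper.

```python
import sqlite3


def join_dischtime() -> list[tuple]:
    """Attach each admission's dischtime to its diagnosis rows via hadm_id."""
    conn = sqlite3.connect(":memory:")
    # Toy stand-ins for the two tables; all values are invented for illustration.
    conn.executescript(
        """
        CREATE TABLE admissions (hadm_id INTEGER PRIMARY KEY, dischtime TEXT);
        CREATE TABLE diagnoses_icd (subject_id INTEGER, hadm_id INTEGER, icd_code TEXT);
        INSERT INTO admissions VALUES
            (100, '2150-01-03 12:00:00'),
            (101, '2150-02-10 09:30:00');
        INSERT INTO diagnoses_icd VALUES
            (1, 100, 'I10'),
            (1, 101, 'E119'),
            (2, 102, 'J45');
        """
    )
    rows = conn.execute(
        """
        SELECT d.subject_id, d.hadm_id, d.icd_code, a.dischtime
        FROM diagnoses_icd AS d
        LEFT JOIN admissions AS a ON a.hadm_id = d.hadm_id
        ORDER BY d.hadm_id
        """
    ).fetchall()
    conn.close()
    return rows
```

A left join keeps diagnosis rows whose `hadm_id` has no matching admission (their `dischtime` comes back as `None`), which makes such gaps visible instead of silently dropping them. Whether the actual ETL keeps or drops those rows is not stated in the README.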
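Step 2 of the extraction described in the README (forming patient splits and sub-shards) can be illustrated with a small sketch. Everything here is an assumption made for illustration: `split_and_shard`, its parameters `split_fracs` and `n_patients_per_shard`, and the `"<split>/<index>"` shard naming are hypothetical and are not taken from `split_and_shard_patients.py`.

```python
import random


def split_and_shard(patient_ids, split_fracs, n_patients_per_shard, seed=1):
    """Assign patients to splits by fraction, then chunk each split into shards."""
    ids = list(patient_ids)
    random.Random(seed).shuffle(ids)  # seeded so the assignment is reproducible

    shards = {}
    names = list(split_fracs)
    start = 0
    for i, name in enumerate(names):
        # The last split takes the remainder so rounding never drops a patient.
        if i == len(names) - 1:
            end = len(ids)
        else:
            end = start + round(split_fracs[name] * len(ids))
        split_ids = ids[start:end]
        # Chunk the split into shards of at most n_patients_per_shard patients.
        for j in range(0, len(split_ids), n_patients_per_shard):
            key = f"{name}/{j // n_patients_per_shard}"
            shards[key] = split_ids[j : j + n_patients_per_shard]
        start = end
    return shards
```

For example, `split_and_shard(range(10), {"train": 0.8, "held_out": 0.2}, 3)` yields three `train` shards and one `held_out` shard, and every patient lands in exactly one shard.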