Merge branch 'dev' into 57_extract_numeric_values
mmcdermott committed Aug 8, 2024
2 parents c463548 + 3cc758f commit f8b260d
Showing 16 changed files with 434 additions and 305 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/tests.yaml
@@ -33,7 +33,7 @@ jobs:
#----------------------------------------------
- name: Run tests
run: |
- pytest -v --doctest-modules --cov=src --junitxml=junit.xml -s
+ pytest -v --doctest-modules --cov=src --junitxml=junit.xml -s --ignore=docs
- name: Upload coverage to Codecov
uses: codecov/codecov-action@v4.0.1
6 changes: 4 additions & 2 deletions .pre-commit-config.yaml
@@ -1,7 +1,7 @@
default_language_version:
python: python3.12

- exclude: "sample_data|docs/MIMIC_IV_tutorial/wandb_reports"
+ exclude: "docs/index.md|MIMIC-IV_Example/README.md|eICU_Example/README.md"

repos:
- repo: https://github.com/pre-commit/pre-commit-hooks
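
The new top-level `exclude` value is a single Python regular expression that pre-commit searches for anywhere in each file path. The sketch below illustrates that matching behavior under that assumption; the helper name `is_excluded` and every sample path other than the three named in the pattern are hypothetical. It also shows that the unescaped `.` characters match any character, so the pattern is slightly broader than the three literal file names.

```python
import re

# Pattern copied from the new `exclude` line in .pre-commit-config.yaml.
EXCLUDE = re.compile(
    r"docs/index.md|MIMIC-IV_Example/README.md|eICU_Example/README.md"
)


def is_excluded(path: str) -> bool:
    """Return True if the pattern matches anywhere in the path (re.search semantics)."""
    return EXCLUDE.search(path) is not None


# Paths other than the three named in the pattern are hypothetical examples.
sample_paths = [
    "docs/index.md",
    "MIMIC-IV_Example/README.md",
    "eICU_Example/README.md",
    "README.md",
    "docs/indexXmd",  # the unescaped "." matches any single character
]

for path in sample_paths:
    status = "excluded" if is_excluded(path) else "checked"
    print(f"{path}: {status}")
```

If exact file names are intended, escaping the dots (`docs/index\.md`) would tighten the match; whether that matters here is a judgment call for the repository maintainers.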
@@ -95,10 +95,12 @@ repos:
- mdformat-gfm
- mdformat-tables
- mdformat_frontmatter
- mdformat-myst
- mdformat-black
- mdformat-config
- mdformat-shfmt
- mdformat-mkdocs
- mdformat-toc
- mdformat-admon

# word spelling linter
- repo: https://github.com/codespell-project/codespell
21 changes: 21 additions & 0 deletions .readthedocs.yaml
@@ -0,0 +1,21 @@
# Read the Docs configuration file
# See https://docs.readthedocs.io/en/stable/config-file/v2.html for details

# Required
version: 2

# Set the version of Python and other tools you might need
build:
os: ubuntu-22.04
tools:
python: "3.12"

python:
install:
- method: pip
path: .
extra_requirements:
- docs

mkdocs:
configuration: mkdocs.yml
32 changes: 16 additions & 16 deletions MIMIC-IV_Example/README.md
@@ -72,14 +72,14 @@ several steps:
This is a step in a few parts:

1. Join a few tables by `hadm_id` to get the right times in the right rows for processing. In
- particular, we need to join:
- - the `hosp/diagnoses_icd` table with the `hosp/admissions` table to get the `dischtime` for each
- `hadm_id`.
- - the `hosp/drgcodes` table with the `hosp/admissions` table to get the `dischtime` for each `hadm_id`.
+ particular, we need to join:
+ - the `hosp/diagnoses_icd` table with the `hosp/admissions` table to get the `dischtime` for each
+ `hadm_id`.
+ - the `hosp/drgcodes` table with the `hosp/admissions` table to get the `dischtime` for each `hadm_id`.
2. Convert the patient's static data to a more parseable form. This entails:
- - Get the patient's DOB in a format that is usable for MEDS, rather than the integral `anchor_year` and
- `anchor_offset` fields.
- - Merge the patient's `dod` with the `deathtime` from the `admissions` table.
+ - Get the patient's DOB in a format that is usable for MEDS, rather than the integral `anchor_year` and
+ `anchor_offset` fields.
+ - Merge the patient's `dod` with the `deathtime` from the `admissions` table.

After these steps, modified files or symlinks to the original files will be written in a new directory which
will be used as the input to the actual MEDS extraction ETL. We'll use `$MIMICIV_PREMEDS_DIR` to denote this
@@ -104,24 +104,24 @@ subdirectories of the same root directory).
This is a step in 4 parts:

1. Sub-shard the raw files. Run this command as many times simultaneously as you would like to have workers
- performing this sub-sharding step. See below for how to automate this parallelism using hydra launchers.
+ performing this sub-sharding step. See below for how to automate this parallelism using hydra launchers.

- This step uses the `./scripts/extraction/shard_events.py` script. See `joint_script*.sh` for the expected
- format of the command.
+ This step uses the `./scripts/extraction/shard_events.py` script. See `joint_script*.sh` for the expected
+ format of the command.

2. Extract and form the patient splits and sub-shards. The `./scripts/extraction/split_and_shard_patients.py`
- script is used for this step. See `joint_script*.sh` for the expected format of the command.
+ script is used for this step. See `joint_script*.sh` for the expected format of the command.

3. Extract patient sub-shards and convert to MEDS events. The
- `./scripts/extraction/convert_to_sharded_events.py` script is used for this step. See `joint_script*.sh` for
- the expected format of the command.
+ `./scripts/extraction/convert_to_sharded_events.py` script is used for this step. See `joint_script*.sh` for
+ the expected format of the command.

4. Merge the MEDS events into a single file per patient sub-shard. The
- `./scripts/extraction/merge_to_MEDS_cohort.py` script is used for this step. See `joint_script*.sh` for the
- expected format of the command.
+ `./scripts/extraction/merge_to_MEDS_cohort.py` script is used for this step. See `joint_script*.sh` for the
+ expected format of the command.

5. (Optional) Generate preliminary code statistics and merge to external metadata. This is not performed
- currently in the `joint_script*.sh` scripts.
+ currently in the `joint_script*.sh` scripts.

## Limitations / TO-DOs:
