Merge branch 'dev' into 57_extract_numeric_values

mmcdermott · Aug 8, 2024 · f8b260d
2 parents c463548 + 3cc758f
Showing 16 changed files with 434 additions and 305 deletions.
diff --git a/.github/workflows/tests.yaml b/.github/workflows/tests.yaml
@@ -33,7 +33,7 @@ jobs:
       #----------------------------------------------
       - name: Run tests
         run: |
-          pytest -v --doctest-modules --cov=src --junitxml=junit.xml -s
+          pytest -v --doctest-modules --cov=src --junitxml=junit.xml -s --ignore=docs
 
       - name: Upload coverage to Codecov
         uses: codecov/codecov-action@v4.0.1

diff --git a/.pre-commit-config.yaml b/.pre-commit-config.yaml
@@ -1,7 +1,7 @@
 default_language_version:
   python: python3.12
 
-exclude: "sample_data|docs/MIMIC_IV_tutorial/wandb_reports"
+exclude: "docs/index.md|MIMIC-IV_Example/README.md|eICU_Example/README.md"
 
 repos:
   - repo: https://github.com/pre-commit/pre-commit-hooks
@@ -95,10 +95,12 @@ repos:
           - mdformat-gfm
           - mdformat-tables
           - mdformat_frontmatter
-          - mdformat-myst
           - mdformat-black
           - mdformat-config
           - mdformat-shfmt
+          - mdformat-mkdocs
+          - mdformat-toc
+          - mdformat-admon
 
   # word spelling linter
   - repo: https://github.com/codespell-project/codespell

diff --git a/.readthedocs.yaml b/.readthedocs.yaml
@@ -0,0 +1,21 @@
+# Read the Docs configuration file
+# See https://docs.readthedocs.io/en/stable/config-file/v2.html for details
+
+# Required
+version: 2
+
+# Set the version of Python and other tools you might need
+build:
+  os: ubuntu-22.04
+  tools:
+    python: "3.12"
+
+python:
+  install:
+    - method: pip
+      path: .
+      extra_requirements:
+        - docs
+
+mkdocs:
+  configuration: mkdocs.yml
diff --git a/MIMIC-IV_Example/README.md b/MIMIC-IV_Example/README.md
@@ -72,14 +72,14 @@ several steps:
 This is a step in a few parts:
 
 1. Join a few tables by `hadm_id` to get the right times in the right rows for processing. In
-   particular, we need to join:
-   - the `hosp/diagnoses_icd` table with the `hosp/admissions` table to get the `dischtime` for each
-     `hadm_id`.
-   - the `hosp/drgcodes` table with the `hosp/admissions` table to get the `dischtime` for each `hadm_id`.
+    particular, we need to join:
+    - the `hosp/diagnoses_icd` table with the `hosp/admissions` table to get the `dischtime` for each
+        `hadm_id`.
+    - the `hosp/drgcodes` table with the `hosp/admissions` table to get the `dischtime` for each `hadm_id`.
 2. Convert the patient's static data to a more parseable form. This entails:
-   - Get the patient's DOB in a format that is usable for MEDS, rather than the integral `anchor_year` and
-     `anchor_offset` fields.
-   - Merge the patient's `dod` with the `deathtime` from the `admissions` table.
+    - Get the patient's DOB in a format that is usable for MEDS, rather than the integral `anchor_year` and
+        `anchor_offset` fields.
+    - Merge the patient's `dod` with the `deathtime` from the `admissions` table.
 
 After these steps, modified files or symlinks to the original files will be written in a new directory which
 will be used as the input to the actual MEDS extraction ETL. We'll use `$MIMICIV_PREMEDS_DIR` to denote this
@@ -104,24 +104,24 @@ subdirectories of the same root directory).
 This is a step in 4 parts:
 
 1. Sub-shard the raw files. Run this command as many times simultaneously as you would like to have workers
-   performing this sub-sharding step. See below for how to automate this parallelism using hydra launchers.
+    performing this sub-sharding step. See below for how to automate this parallelism using hydra launchers.
 
-   This step uses the `./scripts/extraction/shard_events.py` script. See `joint_script*.sh` for the expected
-   format of the command.
+    This step uses the `./scripts/extraction/shard_events.py` script. See `joint_script*.sh` for the expected
+    format of the command.
 
 2. Extract and form the patient splits and sub-shards. The `./scripts/extraction/split_and_shard_patients.py`
-   script is used for this step. See `joint_script*.sh` for the expected format of the command.
+    script is used for this step. See `joint_script*.sh` for the expected format of the command.
 
 3. Extract patient sub-shards and convert to MEDS events. The
-   `./scripts/extraction/convert_to_sharded_events.py` script is used for this step. See `joint_script*.sh` for
-   the expected format of the command.
+    `./scripts/extraction/convert_to_sharded_events.py` script is used for this step. See `joint_script*.sh` for
+    the expected format of the command.
 
 4. Merge the MEDS events into a single file per patient sub-shard. The
-   `./scripts/extraction/merge_to_MEDS_cohort.py` script is used for this step. See `joint_script*.sh` for the
-   expected format of the command.
+    `./scripts/extraction/merge_to_MEDS_cohort.py` script is used for this step. See `joint_script*.sh` for the
+    expected format of the command.
 
 5. (Optional) Generate preliminary code statistics and merge to external metadata. This is not performed
-   currently in the `joint_script*.sh` scripts.
+    currently in the `joint_script*.sh` scripts.
 
 ## Limitations / TO-DOs: