Issue 700 (#774)

* model_metadata -> triage_metadata
* Alembic version. Closes #700
* For some reason, the python version has a "\n" in the middle of the string
* Replacing the function, given that the schema name changed
* Pasted code in wrong file 🤦
* Missing import. Unnecessary import. (A tale about two imports)
* Ignoring directories created by running triage
* Bump version: 4.0.1 → 4.1.0
* Closes #756
dssg · Jun 30, 2020 · e804cf3
1 parent 4e76e3a
Showing 46 changed files with 284 additions and 220 deletions.
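
The rename in this commit also affects any saved queries or config files that still point at the old schema. As a rough sketch only, and not part of `triage`, the same substitution could be applied to your own SQL strings as shown here. The helper name `rename_schema_refs` and both constants are hypothetical, chosen for illustration.

```python
import re

# Hypothetical helper, not part of triage: mirrors the rename made in this
# commit (model_metadata -> triage_metadata) for user-owned SQL strings.
OLD_SCHEMA = "model_metadata"
NEW_SCHEMA = "triage_metadata"

# Match the old schema name only as a whole word, so identifiers such as
# "my_model_metadata" or "model_metadata_v2" are left untouched.
_OLD_SCHEMA_WORD = re.compile(r"\b" + re.escape(OLD_SCHEMA) + r"\b")


def rename_schema_refs(sql: str) -> str:
    """Return `sql` with every whole-word use of the old schema name replaced."""
    return _OLD_SCHEMA_WORD.sub(NEW_SCHEMA, sql)


if __name__ == "__main__":
    query = (
        "select model_id, model_group_id, model_type "
        "from model_metadata.models"
    )
    print(rename_schema_refs(query))
```

This is a plain text substitution, not a SQL parser: it also rewrites string literals or columns that happen to be named `model_metadata`, so review the result before running it against a database.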
diff --git a/.gitignore b/.gitignore
@@ -18,4 +18,6 @@ my_db_config.yaml
 database.yaml
 dirtyduck/triage/**
 
-*~
+*~
+**/trained_models/**
+**/matrices
diff --git a/dirtyduck/food_db/05_nuke_triage.sql b/dirtyduck/food_db/05_nuke_triage.sql
@@ -7,8 +7,8 @@ create or replace function nuke_triage()
 
     begin
 
-    execute 'drop schema if exists model_metadata cascade';
-    raise notice 'model_metadata deleted';
+    execute 'drop schema if exists triage_metadata cascade';
+    raise notice 'triage_metadata deleted';
     execute 'drop schema if exists features cascade';
     raise notice 'features deleted';
     execute 'drop schema if exists train_results cascade';

diff --git a/docs/sources/audition/audition.md b/docs/sources/audition/audition.md
@@ -53,13 +53,13 @@ We included a [simple configuration file](file:///home/nanounanue/projects/dsapp
 model_groups:
     query: |
         select distinct(model_group_id)
-        from model_metadata.model_groups
+        from triage_metadata.model_groups
         where model_config ->> 'experiment_type' ~ 'inspection'
 # CHOOSE TIMESTAMPS/TRAIN END TIMES
 time_stamps:
     query: |
         select distinct train_end_time
-        from model_metadata.models
+        from triage_metadata.models
         where model_group_id in ({})
         and extract(day from train_end_time) in (1)
         and train_end_time >= '2015-01-01'

diff --git a/docs/sources/dirtyduck/data_preparation.md b/docs/sources/dirtyduck/data_preparation.md
@@ -104,7 +104,7 @@ The transformation "road" that we will take in this tutorial is as follows:
 2.  Apply some simple transformations and store the resulting data in the `cleaned` schema.
 3.  Organize the data into two *unnormalized*[^3] tables in the
     semantic schema: `events` and `entities`.
-4.  Run `triage`. It will create several schemas (`model_metadata`, `test_results`, `train_results`).
+4.  Run `triage`. It will create several schemas (`triage_metadata`, `test_results`, `train_results`).
 
 ![img](images/data_road.png)
 

diff --git a/docs/sources/dirtyduck/dirty_duckling.md b/docs/sources/dirtyduck/dirty_duckling.md
@@ -241,14 +241,14 @@ it would mean that `triage` actually built (in this order) cohort
 (table `cohort_all_entities...`),
 labels (table `labels_failed_inspections...`), features (schema
 `features`), matrices (table `model_metdata.matrices` and folder
-`matrices`), models (tables `model_metadata.models` and
-`model_metadata.model_groups`; folder `trained_models`), predictions
+`matrices`), models (tables `triage_metadata.models` and
+`triage_metadata.model_groups`; folder `trained_models`), predictions
 (table `test_results.predictions`)
 and evaluations (table `test_results.evaluations`).
 
 ### 5. Look at results of your duckling!
 
-Next, let's quickly check the tables in the schemas `model_metadata` and
+Next, let's quickly check the tables in the schemas `triage_metadata` and
 `test_results` to make sure everything worked. There you will find a lot 
 of information related to the performance of your models.
 
@@ -261,15 +261,15 @@ Again, you should see the postgreSQL prompt:
 
     food=#
 
-Tables in the `model_metadata` schema have some general information about
+Tables in the `triage_metadata` schema have some general information about
 experiments that you've run and the models they created. The `quickstart`
 model grid preset should have built 3 models. Let's check with:
 
 ```sql
 select 
   model_id, model_group_id, model_type 
   from 
-      model_metadata.models;
+      triage_metadata.models;
 ```
 
 This should give you a result that looks something like:

diff --git a/docs/sources/dirtyduck/eis.md b/docs/sources/dirtyduck/eis.md
@@ -429,7 +429,7 @@ After the experiment finishes, we can create the following table:
         model_group_id,
         split_part(unnest(feature_list), '_', 1) as feature_groups
     from
-        model_metadata.model_groups
+        triage_metadata.model_groups
     ),
 
     features_arrays as (
@@ -452,7 +452,7 @@ After the experiment finishes, we can create the following table:
         array_agg(to_char(stochastic_value, '0.999') order by
     train_end_time asc) as "precision@10% (stochastic)"
     from
-        model_metadata.models
+        triage_metadata.models
         join
         features_arrays using(model_group_id)
         join
@@ -593,13 +593,13 @@ compared to the inspection’s one:
 model_groups:
     query: |
         select distinct(model_group_id)
-        from model_metadata.model_groups
+        from triage_metadata.model_groups
         where model_config ->> 'experiment_type' ~ 'eis'
 # CHOOSE TIMESTAMPS/TRAIN END TIMES
 time_stamps:
     query: |
         select distinct train_end_time
-        from model_metadata.models
+        from triage_metadata.models
         where model_group_id in ({})
         and extract(day from train_end_time) in (1)
         and train_end_time >= '2014-01-01'
@@ -767,7 +767,7 @@ and we will use the complete set of model groups selected by audition:
              m.num_labeled_above_threshold,
              m.num_positive_labels
        from test_results.evaluations m
-       left join model_metadata.models g
+       left join triage_metadata.models g
        using(model_id)
        where g.model_group_id = 20
              and metric = 'precision@'
@@ -866,9 +866,9 @@ select
     mg.hyperparameters,
     array_agg(model_id order by train_end_time) as models
 from
-    model_metadata.model_groups as mg
+    triage_metadata.model_groups as mg
     inner join
-    model_metadata.models
+    triage_metadata.models
     using (model_group_id)
 where model_group_id = 76
 group by 1,2,3

diff --git a/docs/sources/dirtyduck/index.md b/docs/sources/dirtyduck/index.md
@@ -11,7 +11,7 @@ implementation are prone to error, given their multi-dimensional,
 multi-entity, time-series structure.
 
 !!! info Triage version
-    This tutorial is in sync with the latest version of `triage`. At this moment [v4.0.1](https://github.com/dssg/triage/releases/tag/v4.0.1).
+    This tutorial is in sync with the latest version of `triage`. At this moment [v4.1.0](https://github.com/dssg/triage/releases/tag/v4.1.0).
 
 !!! info "How you can help to improve this tutorial"
 

diff --git a/docs/sources/dirtyduck/inspections.md b/docs/sources/dirtyduck/inspections.md
@@ -183,7 +183,7 @@ The config file for this first experiment is located in
 The first lines of the experiment config file specify the config-file
 version (`v7` at the moment of writing this tutorial), a comment
 (`model_comment`, which will end up as a value in the
-`model_metadata.models` table), and a list of user-defined metadata
+`triage_metadata.models` table), and a list of user-defined metadata
 (`user_metadata`) that can help to identify the resulting model
 groups. For this example, if you run experiments that share a temporal
 configuration but that use different label definitions (say, labeling
@@ -480,7 +480,7 @@ select
     total_features,
     matrices_needed,
     models_needed
-from model_metadata.experiments;
+from triage_metadata.experiments;
 ```
 
   experiment  | description | total_features | matrices_needed | models_needed
@@ -518,7 +518,7 @@ select
     model_type,
     hyperparameters
 from
-    model_metadata.model_groups
+    triage_metadata.model_groups
 where
     model_config ->> 'experiment_type' ~ 'inspection'
 ```
@@ -536,7 +536,7 @@ select
     array_agg(model_id) as models,
     array_agg(train_end_time::date) as train_end_times
 from
-    model_metadata.models
+    triage_metadata.models
 where
     model_comment ~ 'inspection'
 group by
@@ -566,9 +566,9 @@ select
     ma.num_observations as observations,
     ma.lookback_duration as feature_lookback_duration,  ma.feature_start_time
 from
-    model_metadata.models as mo
+    triage_metadata.models as mo
     join
-    model_metadata.matrices as ma
+    triage_metadata.matrices as ma
     on train_matrix_uuid = matrix_uuid
 where
     mo.model_comment ~ 'inspection'
@@ -603,11 +603,11 @@ select distinct
     substring(ev.matrix_uuid,1,5) as test_matrix_uuid,
     ma.num_observations as observations
 from
-    model_metadata.models as mo
+    triage_metadata.models as mo
     join
     test_results.evaluations as ev using (model_id)
     join
-    model_metadata.matrices as ma on ev.matrix_uuid = ma.matrix_uuid
+    triage_metadata.matrices as ma on ev.matrix_uuid = ma.matrix_uuid
 where
     mo.model_comment ~ 'inspection'
 order by
@@ -654,11 +654,11 @@ select distinct
     to_char(ev.num_positive_labels*1.0 / ev.num_labeled_examples, '0.999') as baserate,
     :k * 100 as "k%"
 from
-    model_metadata.models as mo
+    triage_metadata.models as mo
     join
     test_results.evaluations as ev using (model_id)
     join
-    model_metadata.matrices as ma on ev.matrix_uuid = ma.matrix_uuid
+    triage_metadata.matrices as ma on ev.matrix_uuid = ma.matrix_uuid
 where
     ev.metric || ev.parameter = 'precision@15_pct'
     and
@@ -956,7 +956,7 @@ select
     total_features,
     matrices_needed,
     models_needed
-from model_metadata.experiments;
+from triage_metadata.experiments;
 ```
 
 
@@ -979,7 +979,7 @@ select
     model_type,
     hyperparameters
 from
-    model_metadata.model_groups
+    triage_metadata.model_groups
 where
     model_group_id not in (1);
 ```
@@ -1015,7 +1015,7 @@ select
     array_agg(model_id) as models,
     array_agg(train_end_time) as train_end_times
 from
-    model_metadata.models
+    triage_metadata.models
 where
     model_group_id not in (1)
 group by
@@ -1061,9 +1061,9 @@ select
     array_agg(to_char(ev.num_positive_labels, '999,999') order by ev.evaluation_start_time asc) as total_positive_labels,
     array_agg(to_char(ev.stochastic_value, '0.999') order by ev.evaluation_start_time asc) as "precision@15%"
 from
-    model_metadata.models as mo
+    triage_metadata.models as mo
     inner join
-    model_metadata.model_groups as mg using(model_group_id)
+    triage_metadata.model_groups as mg using(model_group_id)
     inner join
     test_results.evaluations  as ev using(model_id)
 where
@@ -1298,7 +1298,7 @@ select
     model_group_id,
     split_part(unnest(feature_list), '_', 1) as feature_groups
 from
-    model_metadata.model_groups
+    triage_metadata.model_groups
 ),
 
 features_arrays as (
@@ -1319,7 +1319,7 @@ select
      array_agg(to_char(stochastic_value, '0.999') order by train_end_time asc) filter (where metric = 'precision@') as "precision@15%",
     array_agg(to_char(stochastic_value, '0.999') order by train_end_time asc) filter (where metric = 'recall@') as "recall@15%"
 from
-    model_metadata.models
+    triage_metadata.models
     join
     features_arrays using(model_group_id)
     join
@@ -1424,13 +1424,13 @@ with some rules:
 model_groups:
     query: |
         select distinct(model_group_id)
-        from model_metadata.model_groups
+        from triage_metadata.model_groups
         where model_config ->> 'experiment_type' ~ 'inspection'
 # CHOOSE TIMESTAMPS/TRAIN END TIMES
 time_stamps:
     query: |
         select distinct train_end_time
-        from model_metadata.models
+        from triage_metadata.models
         where model_group_id in ({})
         and extract(day from train_end_time) in (1)
         and train_end_time >= '2014-01-01'
@@ -1629,7 +1629,7 @@ baseline_query: | # SQL query for defining a baseline for comparison in plots. I
              m.num_labeled_above_threshold,
              m.num_positive_labels
        from test_results.evaluations m
-       left join model_metadata.models g
+       left join triage_metadata.models g
        using(model_id)
        where g.model_group_id = 1
              and metric = 'precision@'
@@ -1748,9 +1748,9 @@ select
     mg.hyperparameters,
     array_agg(model_id order by train_end_time) as models
 from
-    model_metadata.model_groups as mg
+    triage_metadata.model_groups as mg
     inner join
-    model_metadata.models
+    triage_metadata.models
     using (model_group_id)
 where model_group_id = 39
 group by 1,2,3