
Commit

Issue 700 (#774)
* model_metadata -> triage_metadata

* Alembic version. Closes #700

* For some reason, the python version has a "\n" in the middle of the string

* Replacing the function, given that the schema name changed

* Pasted code in wrong file 🤦

* Missing import. Unnecessary import. (A tale about two imports)

* Ignoring directories created by running triage

* Bump version: 4.0.1 → 4.1.0

* Closes #756
nanounanue authored Jun 30, 2020
1 parent 4e76e3a commit e804cf3
Showing 46 changed files with 284 additions and 220 deletions.
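The commit message mentions an Alembic version for the `model_metadata` → `triage_metadata` rename (#700). The migration file is not shown on this page, so what follows is only a minimal sketch, assuming the rename is done in place with PostgreSQL's `ALTER SCHEMA ... RENAME TO ...`. `upgrade_statements` and `downgrade_statements` are hypothetical helpers, not functions from `triage`.

```python
from typing import List

# Hypothetical helpers, not the migration shipped in this commit: they only
# build the SQL that an Alembic migration could hand to op.execute().
OLD_SCHEMA = "model_metadata"
NEW_SCHEMA = "triage_metadata"


def upgrade_statements() -> List[str]:
    """SQL that renames the results schema to its new name."""
    return [f"ALTER SCHEMA {OLD_SCHEMA} RENAME TO {NEW_SCHEMA}"]


def downgrade_statements() -> List[str]:
    """SQL that undoes the rename."""
    return [f"ALTER SCHEMA {NEW_SCHEMA} RENAME TO {OLD_SCHEMA}"]


if __name__ == "__main__":
    for statement in upgrade_statements():
        print(statement)
```

In an Alembic revision, each statement would be passed to `op.execute()` inside `upgrade()` and `downgrade()`. Renaming a schema does not rewrite SQL kept as string literals, such as the `'drop schema if exists model_metadata cascade'` inside `nuke_triage()`, so that function has to be replaced separately, as the commit message notes.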
4 changes: 3 additions & 1 deletion .gitignore
@@ -18,4 +18,6 @@ my_db_config.yaml
database.yaml
dirtyduck/triage/**

-*~
+*~
+**/trained_models/**
+**/matrices
4 changes: 2 additions & 2 deletions dirtyduck/food_db/05_nuke_triage.sql
@@ -7,8 +7,8 @@ create or replace function nuke_triage()

begin

-execute 'drop schema if exists model_metadata cascade';
-raise notice 'model_metadata deleted';
+execute 'drop schema if exists triage_metadata cascade';
+raise notice 'triage_metadata deleted';
execute 'drop schema if exists features cascade';
raise notice 'features deleted';
execute 'drop schema if exists train_results cascade';
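Saved queries and notebooks written against earlier versions will still say `model_metadata.`. A minimal sketch for updating them, assuming a plain textual rename is enough. `rename_schema_references` is a hypothetical helper, not part of `triage`. It matches the schema name only as a whole word, so longer identifiers that merely contain it are left alone.

```python
import re

# \b makes this a whole-word match, so longer identifiers that merely contain
# "model_metadata" (e.g. "my_model_metadata") are left untouched.
_OLD_SCHEMA = re.compile(r"\bmodel_metadata\b")


def rename_schema_references(sql: str, new_schema: str = "triage_metadata") -> str:
    """Return `sql` with every whole-word `model_metadata` replaced by `new_schema`."""
    # A function replacement avoids re.sub interpreting backslashes in new_schema.
    return _OLD_SCHEMA.sub(lambda _match: new_schema, sql)


if __name__ == "__main__":
    print(rename_schema_references("select model_id from model_metadata.models;"))
```

A textual rename cannot catch misspellings. The unchanged context in the `dirty_duckling.md` hunk below still reads `model_metdata.matrices`, which this helper would not touch either.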
4 changes: 2 additions & 2 deletions docs/sources/audition/audition.md
@@ -53,13 +53,13 @@ We included a [simple configuration file](file:///home/nanounanue/projects/dsapp
model_groups:
query: |
select distinct(model_group_id)
-from model_metadata.model_groups
+from triage_metadata.model_groups
where model_config ->> 'experiment_type' ~ 'inspection'
# CHOOSE TIMESTAMPS/TRAIN END TIMES
time_stamps:
query: |
select distinct train_end_time
-from model_metadata.models
+from triage_metadata.models
where model_group_id in ({})
and extract(day from train_end_time) in (1)
and train_end_time >= '2015-01-01'
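In the Audition config, the `{}` in the `time_stamps` query is a placeholder for the model groups returned by the `model_groups` query. Here is a sketch of that substitution, assuming Audition fills it with a comma-separated list of IDs using Python's `str.format`. That mechanism is our assumption, and `fill_model_groups` is a hypothetical name. Only the query lines visible in the hunk are reproduced.

```python
from typing import Iterable

# Lines of the time_stamps query shown in the hunk above (the hunk is cut
# off after the train_end_time filter, so the real query may continue).
TIME_STAMPS_QUERY = """
select distinct train_end_time
from triage_metadata.models
where model_group_id in ({})
and extract(day from train_end_time) in (1)
and train_end_time >= '2015-01-01'
"""


def fill_model_groups(query: str, model_group_ids: Iterable[int]) -> str:
    """Substitute the model group IDs into the single `{}` placeholder."""
    # int() guards against anything other than integer IDs reaching the SQL.
    return query.format(", ".join(str(int(i)) for i in model_group_ids))


if __name__ == "__main__":
    print(fill_model_groups(TIME_STAMPS_QUERY, [1, 2, 3]))
```

The `extract(day from train_end_time) in (1)` condition then keeps only train end times that fall on the first day of a month.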
2 changes: 1 addition & 1 deletion docs/sources/dirtyduck/data_preparation.md
@@ -104,7 +104,7 @@ The transformation "road" that we will take in this tutorial is as follows:
2. Apply some simple transformations and store the resulting data in the `cleaned` schema.
3. Organize the data into two *unnormalized*[^3] tables in the
semantic schema: `events` and `entities`.
-4. Run `triage`. It will create several schemas (`model_metadata`, `test_results`, `train_results`).
+4. Run `triage`. It will create several schemas (`triage_metadata`, `test_results`, `train_results`).

![img](images/data_road.png)

10 changes: 5 additions & 5 deletions docs/sources/dirtyduck/dirty_duckling.md
@@ -241,14 +241,14 @@ it would mean that `triage` actually built (in this order) cohort
(table `cohort_all_entities...`),
labels (table `labels_failed_inspections...`), features (schema
`features`), matrices (table `model_metdata.matrices` and folder
-`matrices`), models (tables `model_metadata.models` and
-`model_metadata.model_groups`; folder `trained_models`), predictions
+`matrices`), models (tables `triage_metadata.models` and
+`triage_metadata.model_groups`; folder `trained_models`), predictions
(table `test_results.predictions`)
and evaluations (table `test_results.evaluations`).

### 5. Look at results of your duckling!

-Next, let's quickly check the tables in the schemas `model_metadata` and
+Next, let's quickly check the tables in the schemas `triage_metadata` and
`test_results` to make sure everything worked. There you will find a lot
of information related to the performance of your models.

@@ -261,15 +261,15 @@ Again, you should see the postgreSQL prompt:

food=#

-Tables in the `model_metadata` schema have some general information about
+Tables in the `triage_metadata` schema have some general information about
experiments that you've run and the models they created. The `quickstart`
model grid preset should have built 3 models. Let's check with:

```sql
select
model_id, model_group_id, model_type
from
-model_metadata.models;
+triage_metadata.models;
```

This should give you a result that looks something like:
14 changes: 7 additions & 7 deletions docs/sources/dirtyduck/eis.md
@@ -429,7 +429,7 @@ After the experiment finishes, we can create the following table:
model_group_id,
split_part(unnest(feature_list), '_', 1) as feature_groups
from
-model_metadata.model_groups
+triage_metadata.model_groups
),
features_arrays as (
@@ -452,7 +452,7 @@ After the experiment finishes, we can create the following table:
array_agg(to_char(stochastic_value, '0.999') order by
train_end_time asc) as "precision@10% (stochastic)"
from
-model_metadata.models
+triage_metadata.models
join
features_arrays using(model_group_id)
join
@@ -593,13 +593,13 @@ compared to the inspection’s one:
model_groups:
query: |
select distinct(model_group_id)
-from model_metadata.model_groups
+from triage_metadata.model_groups
where model_config ->> 'experiment_type' ~ 'eis'
# CHOOSE TIMESTAMPS/TRAIN END TIMES
time_stamps:
query: |
select distinct train_end_time
-from model_metadata.models
+from triage_metadata.models
where model_group_id in ({})
and extract(day from train_end_time) in (1)
and train_end_time >= '2014-01-01'
@@ -767,7 +767,7 @@ and we will use the complete set of model groups selected by audition:
m.num_labeled_above_threshold,
m.num_positive_labels
from test_results.evaluations m
-left join model_metadata.models g
+left join triage_metadata.models g
using(model_id)
where g.model_group_id = 20
and metric = 'precision@'
@@ -866,9 +866,9 @@ select
mg.hyperparameters,
array_agg(model_id order by train_end_time) as models
from
-model_metadata.model_groups as mg
+triage_metadata.model_groups as mg
inner join
-model_metadata.models
+triage_metadata.models
using (model_group_id)
where model_group_id = 76
group by 1,2,3
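The eis.md feature-group table takes each feature name with `split_part(unnest(feature_list), '_', 1)`, which keeps the text before the first underscore, one row per feature. Here is a Python sketch of the same two steps for readers who want to check the logic outside the database. The function names and the sample feature names are hypothetical.

```python
from typing import List, Tuple


def feature_group(feature_name: str) -> str:
    """Mirror split_part(feature_name, '_', 1): the text before the first '_'.

    Like split_part, a name without '_' is returned unchanged.
    """
    return feature_name.split("_", 1)[0]


def feature_group_rows(model_group_id: int, feature_list: List[str]) -> List[Tuple[int, str]]:
    """Mirror `model_group_id, split_part(unnest(feature_list), '_', 1)`.

    unnest() emits one row per array element, so repeated prefixes are kept.
    """
    return [(model_group_id, feature_group(name)) for name in feature_list]


if __name__ == "__main__":
    # Made-up feature names, for illustration only.
    print(feature_group_rows(7, ["inspections_entity_id_all_total_count", "results_entity_id_1y_fail_sum"]))
```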
2 changes: 1 addition & 1 deletion docs/sources/dirtyduck/index.md
@@ -11,7 +11,7 @@ implementation are prone to error, given their multi-dimensional,
multi-entity, time-series structure.

!!! info Triage version
-This tutorial is in sync with the latest version of `triage`. At this moment [v4.0.1](https://github.com/dssg/triage/releases/tag/v4.0.1).
+This tutorial is in sync with the latest version of `triage`. At this moment [v4.1.0](https://github.com/dssg/triage/releases/tag/v4.1.0).

!!! info "How you can help to improve this tutorial"

44 changes: 22 additions & 22 deletions docs/sources/dirtyduck/inspections.md
@@ -183,7 +183,7 @@ The config file for this first experiment is located in
The first lines of the experiment config file specify the config-file
version (`v7` at the moment of writing this tutorial), a comment
(`model_comment`, which will end up as a value in the
-`model_metadata.models` table), and a list of user-defined metadata
+`triage_metadata.models` table), and a list of user-defined metadata
(`user_metadata`) that can help to identify the resulting model
groups. For this example, if you run experiments that share a temporal
configuration but that use different label definitions (say, labeling
@@ -480,7 +480,7 @@ select
total_features,
matrices_needed,
models_needed
-from model_metadata.experiments;
+from triage_metadata.experiments;
```

experiment | description | total_features | matrices_needed | models_needed
@@ -518,7 +518,7 @@ select
model_type,
hyperparameters
from
-model_metadata.model_groups
+triage_metadata.model_groups
where
model_config ->> 'experiment_type' ~ 'inspection'
```
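Several of these queries select model groups with `model_config ->> 'experiment_type' ~ 'inspection'`. Here, `->>` pulls a JSON field out as text, and `~` is a case-sensitive regular-expression match that is not anchored. Below is a Python sketch of that filter applied to in-memory rows. The function names and sample rows are hypothetical. Python's `re` differs from PostgreSQL's regex flavor in some details, but for a plain word such as `inspection` they agree.

```python
import json
import re
from typing import Any, Dict, List


def matches_experiment_type(model_config: Dict[str, Any], pattern: str) -> bool:
    """Mirror `model_config ->> 'experiment_type' ~ pattern` for one row."""
    value = model_config.get("experiment_type")
    if value is None:
        # ->> gives NULL for a missing key or a JSON null, and NULL ~ pattern
        # is NULL, which a WHERE clause treats as "not selected".
        return False
    # ->> returns text; non-string JSON values are rendered as JSON text.
    text = value if isinstance(value, str) else json.dumps(value)
    # ~ is not anchored, so 'inspection' also matches 'inspections'.
    return re.search(pattern, text) is not None


def select_model_groups(groups: List[Dict[str, Any]], pattern: str) -> List[int]:
    """IDs of the model groups whose experiment_type matches `pattern`."""
    return [
        group["model_group_id"]
        for group in groups
        if matches_experiment_type(group["model_config"], pattern)
    ]


if __name__ == "__main__":
    # Made-up rows, for illustration only.
    rows = [
        {"model_group_id": 1, "model_config": {"experiment_type": "inspections"}},
        {"model_group_id": 2, "model_config": {"experiment_type": "eis"}},
    ]
    print(select_model_groups(rows, "inspection"))
```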
@@ -536,7 +536,7 @@ select
array_agg(model_id) as models,
array_agg(train_end_time::date) as train_end_times
from
-model_metadata.models
+triage_metadata.models
where
model_comment ~ 'inspection'
group by
@@ -566,9 +566,9 @@ select
ma.num_observations as observations,
ma.lookback_duration as feature_lookback_duration, ma.feature_start_time
from
-model_metadata.models as mo
+triage_metadata.models as mo
join
-model_metadata.matrices as ma
+triage_metadata.matrices as ma
on train_matrix_uuid = matrix_uuid
where
mo.model_comment ~ 'inspection'
@@ -603,11 +603,11 @@ select distinct
substring(ev.matrix_uuid,1,5) as test_matrix_uuid,
ma.num_observations as observations
from
-model_metadata.models as mo
+triage_metadata.models as mo
join
test_results.evaluations as ev using (model_id)
join
-model_metadata.matrices as ma on ev.matrix_uuid = ma.matrix_uuid
+triage_metadata.matrices as ma on ev.matrix_uuid = ma.matrix_uuid
where
mo.model_comment ~ 'inspection'
order by
@@ -654,11 +654,11 @@ select distinct
to_char(ev.num_positive_labels*1.0 / ev.num_labeled_examples, '0.999') as baserate,
:k * 100 as "k%"
from
-model_metadata.models as mo
+triage_metadata.models as mo
join
test_results.evaluations as ev using (model_id)
join
-model_metadata.matrices as ma on ev.matrix_uuid = ma.matrix_uuid
+triage_metadata.matrices as ma on ev.matrix_uuid = ma.matrix_uuid
where
ev.metric || ev.parameter = 'precision@15_pct'
and
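Two expressions in this query are easy to misread. `ev.metric || ev.parameter` concatenates the metric name and its parameter, so `'precision@'` followed by `'15_pct'` gives `'precision@15_pct'`. `baserate` is the share of labeled examples that are positive. The sketch below shows both in Python, with made-up counts. The function names are hypothetical. `to_char(..., '0.999')` also reserves a leading space for the sign, which the Python formatting below does not.

```python
def metric_key(metric: str, parameter: str) -> str:
    """Mirror `ev.metric || ev.parameter` (plain string concatenation)."""
    return metric + parameter


def baserate(num_positive_labels: int, num_labeled_examples: int) -> float:
    """Mirror `num_positive_labels*1.0 / num_labeled_examples`.

    In PostgreSQL the *1.0 prevents integer division; Python's / is
    already true division.
    """
    return num_positive_labels / num_labeled_examples


if __name__ == "__main__":
    print(metric_key("precision@", "15_pct"))  # the filter value used in the query
    print(f"{baserate(150, 1000):.3f}")        # made-up counts
```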
@@ -956,7 +956,7 @@ select
total_features,
matrices_needed,
models_needed
-from model_metadata.experiments;
+from triage_metadata.experiments;
```


@@ -979,7 +979,7 @@ select
model_type,
hyperparameters
from
-model_metadata.model_groups
+triage_metadata.model_groups
where
model_group_id not in (1);
```
@@ -1015,7 +1015,7 @@ select
array_agg(model_id) as models,
array_agg(train_end_time) as train_end_times
from
-model_metadata.models
+triage_metadata.models
where
model_group_id not in (1)
group by
@@ -1061,9 +1061,9 @@ select
array_agg(to_char(ev.num_positive_labels, '999,999') order by ev.evaluation_start_time asc) as total_positive_labels,
array_agg(to_char(ev.stochastic_value, '0.999') order by ev.evaluation_start_time asc) as "precision@15%"
from
-model_metadata.models as mo
+triage_metadata.models as mo
inner join
-model_metadata.model_groups as mg using(model_group_id)
+triage_metadata.model_groups as mg using(model_group_id)
inner join
test_results.evaluations as ev using(model_id)
where
@@ -1298,7 +1298,7 @@ select
model_group_id,
split_part(unnest(feature_list), '_', 1) as feature_groups
from
-model_metadata.model_groups
+triage_metadata.model_groups
),
features_arrays as (
@@ -1319,7 +1319,7 @@ select
array_agg(to_char(stochastic_value, '0.999') order by train_end_time asc) filter (where metric = 'precision@') as "precision@15%",
array_agg(to_char(stochastic_value, '0.999') order by train_end_time asc) filter (where metric = 'recall@') as "recall@15%"
from
-model_metadata.models
+triage_metadata.models
join
features_arrays using(model_group_id)
join
@@ -1424,13 +1424,13 @@ with some rules:
model_groups:
query: |
select distinct(model_group_id)
-from model_metadata.model_groups
+from triage_metadata.model_groups
where model_config ->> 'experiment_type' ~ 'inspection'
# CHOOSE TIMESTAMPS/TRAIN END TIMES
time_stamps:
query: |
select distinct train_end_time
-from model_metadata.models
+from triage_metadata.models
where model_group_id in ({})
and extract(day from train_end_time) in (1)
and train_end_time >= '2014-01-01'
@@ -1629,7 +1629,7 @@ baseline_query: | # SQL query for defining a baseline for comparison in plots. I
m.num_labeled_above_threshold,
m.num_positive_labels
from test_results.evaluations m
-left join model_metadata.models g
+left join triage_metadata.models g
using(model_id)
where g.model_group_id = 1
and metric = 'precision@'
@@ -1748,9 +1748,9 @@ select
mg.hyperparameters,
array_agg(model_id order by train_end_time) as models
from
-model_metadata.model_groups as mg
+triage_metadata.model_groups as mg
inner join
-model_metadata.models
+triage_metadata.models
using (model_group_id)
where model_group_id = 39
group by 1,2,3
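This last query collapses each model group into one row and lists its models with `array_agg(model_id order by train_end_time)`, which puts them in chronological order of training end time. Here is a Python sketch of that aggregation over in-memory rows. The function name and the sample tuples are hypothetical, not output from the tutorial.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, Iterable, List, Tuple

# One row per model: (model_group_id, model_id, train_end_time).
Row = Tuple[int, int, date]


def models_by_group(rows: Iterable[Row]) -> Dict[int, List[int]]:
    """Mirror `group by model_group_id` + `array_agg(model_id order by train_end_time)`."""
    grouped: Dict[int, List[Tuple[date, int]]] = defaultdict(list)
    for group_id, model_id, train_end_time in rows:
        grouped[group_id].append((train_end_time, model_id))
    # Sorting (train_end_time, model_id) pairs orders by time; model_id only
    # breaks ties, where SQL would leave the order unspecified.
    return {
        group_id: [model_id for _, model_id in sorted(pairs)]
        for group_id, pairs in grouped.items()
    }


if __name__ == "__main__":
    sample = [(39, 12, date(2016, 1, 1)), (39, 5, date(2015, 1, 1))]
    print(models_by_group(sample))
```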