Only Build Features for Cohort [Resolves #513] #567

thcrock · 2019-01-07T22:38:47Z

By default, only build/save features for the given cohort. This speeds
things up for new users (and for visit-based problems) by computing only
what is necessary.

For other users, the save_all_features flag is introduced. If it is
set to True, the old behavior of building features independently of the
cohort is used.
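As a rough illustration of the default versus the flag (not the code in this PR), here is a minimal sketch using an in-memory SQLite database. The `events` and `cohort` tables, their columns, and the `feature_rows` helper are all hypothetical, and a simple event count stands in for the real aggregations.

```python
import sqlite3


def feature_rows(conn, as_of_date, save_all_features=False):
    """Count events per entity before as_of_date.

    Hypothetical helper: by default the query is joined to the cohort so
    only cohort members get rows; save_all_features=True skips the join.
    """
    cohort_join = "" if save_all_features else (
        "JOIN cohort ON cohort.entity_id = events.entity_id "
        "AND cohort.as_of_date = :as_of_date "
    )
    query = (
        "SELECT events.entity_id, COUNT(*) AS event_count "
        "FROM events " + cohort_join +
        "WHERE events.knowledge_date < :as_of_date "
        "GROUP BY events.entity_id ORDER BY events.entity_id"
    )
    return conn.execute(query, {"as_of_date": as_of_date}).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE events (entity_id INTEGER, knowledge_date TEXT);
    CREATE TABLE cohort (entity_id INTEGER, as_of_date TEXT);
    INSERT INTO events VALUES
        (1, '2018-06-01'), (1, '2018-07-01'), (2, '2018-06-15'), (3, '2018-08-01');
    INSERT INTO cohort VALUES (1, '2019-01-01'), (3, '2019-01-01');
""")

# Entity 2 has events but is not in the cohort, so it only appears
# when the flag is set.
cohort_only = feature_rows(conn, "2019-01-01")
all_entities = feature_rows(conn, "2019-01-01", save_all_features=True)
```

In this toy setup the cohort-only result has no row for entity 2, which is the disk and time saving the default is after.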

To make this flag safe to combine with the replace flag, the feature
table "needs features?" check now looks for cohort rows that are not
present in the imputed table and rebuilds if any are missing.
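The idea behind that check can be sketched roughly as below. This is an assumption-laden illustration, not the implementation in `feature_generators.py`: the `cohort` and `features_imputed` tables, their columns, and the `needs_features` function are hypothetical names, and SQLite stands in for the real database.

```python
import sqlite3


def needs_features(conn, replace):
    """Return True if the imputed feature table should be (re)built.

    Hypothetical sketch: rebuild when replace is requested, or when any
    cohort row has no matching row in the imputed table.
    """
    if replace:
        return True
    missing = conn.execute("""
        SELECT 1
        FROM cohort c
        LEFT JOIN features_imputed f
          ON f.entity_id = c.entity_id AND f.as_of_date = c.as_of_date
        WHERE f.entity_id IS NULL
        LIMIT 1
    """).fetchone()
    return missing is not None


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE cohort (entity_id INTEGER, as_of_date TEXT);
    CREATE TABLE features_imputed (entity_id INTEGER, as_of_date TEXT, event_count INTEGER);
    INSERT INTO cohort VALUES (1, '2019-01-01'), (3, '2019-01-01');
    INSERT INTO features_imputed VALUES (1, '2019-01-01', 2);
""")

# Entity 3 is in the cohort but has no imputed features yet.
stale = needs_features(conn, replace=False)

conn.execute("INSERT INTO features_imputed VALUES (3, '2019-01-01', 1)")
# Every cohort row is now covered, so nothing needs rebuilding.
fresh = needs_features(conn, replace=False)

# An explicit replace always forces a rebuild.
forced = needs_features(conn, replace=True)
```

The point of the anti-join is that a table built earlier for a smaller cohort is not mistaken for a complete one when replace is off.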

jesteria

Can't really comment beyond a couple superficial things -- up to @saleiro to be more profound -- but, in case this helps....

src/tests/collate_tests/test_from_obj.py

src/tests/collate_tests/test_spacetime.py

src/triage/component/architect/feature_generators.py

codecov-io · 2019-01-18T17:20:20Z

Codecov Report

Merging #567 into master will decrease coverage by 0.05%.
The diff coverage is 76.66%.

@@            Coverage Diff             @@
##           master     #567      +/-   ##
==========================================
- Coverage   83.13%   83.07%   -0.06%     
==========================================
  Files          83       83              
  Lines        4755     4780      +25     
==========================================
+ Hits         3953     3971      +18     
- Misses        802      809       +7

| Impacted Files | Coverage Δ | |
|---|---|---|
| src/triage/cli.py | `0% <0%> (ø)` | ⬆️ |
| src/triage/experiments/base.py | `94.34% <100%> (+0.04%)` | ⬆️ |
| ...c/triage/component/architect/feature_generators.py | `84.47% <100%> (+0.64%)` | ⬆️ |
| src/triage/component/collate/spacetime.py | `99% <100%> (+0.04%)` | ⬆️ |
| src/triage/component/results_schema/__init__.py | `65.78% <0%> (-0.88%)` | ⬇️ |

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 7d617d7...dc3cd42.

saleiro · 2019-01-24T05:16:21Z

docs/sources/experiments/algorithm.md

```diff
+JOIN {cohort_table} ON (
+    {cohort_table.entity_id} = {from_obj.entity_id}
+    AND {cohort_table.date} = {as_of_time}
+)
 WHERE {knowledge_date_column} >= {as_of_time} - interval {interval}
 GROUP BY {group}
```


Yes! I tested with explain analyze and everything looks good!

saleiro · 2019-01-24T05:21:25Z

docs/sources/experiments/algorithm.md

@@ -110,11 +110,16 @@ a column or SQL expression representing a numeric quantity present in the `from_
 of aggregate functions we want to use. The aggregate function is applied to the quantity.
 * Each `group` is a column applied to the GROUP BY clause. Generally this is 'entity_id', but higher-level groupings
 (for instance, 'zip_code') can be used as long as they can be rolled up to 'entity_id'.
+* By default the query is joined with the cohort table (see 'state table' above) to remove unnecessary rows. If `save_all_features` is passed to the Experiment this is not done.



Personally, save_all_features makes me think of columns and not rows... Maybe something like 'features_ignore_cohort', 'features_besides_cohort', or 'features_for_all' :) ?

thcrock · Jan 24, 2019

I agree. My preference is features_ignore_cohort; I'll change that.

saleiro

This is great. It greatly improved my user experience and the triage db disk requirements.

thcrock assigned saleiro Jan 7, 2019

jesteria reviewed Jan 8, 2019

thcrock added 2 commits January 18, 2019 10:47

Set save_all_features to True if no cohort config is sent

891dbe5

thcrock force-pushed the cohort_features branch from a852981 to 891dbe5 Compare January 18, 2019 17:06

Changes from review

3a766e7

saleiro reviewed Jan 24, 2019

saleiro approved these changes Jan 24, 2019

Rename from save_all_features to features_ignore_cohort

dc3cd42

thcrock merged commit 970badb into master Jan 24, 2019

thcrock deleted the cohort_features branch January 24, 2019 17:32
