Potential swapping of columns on save to the DB (was: Triage swaps importance score and feature value in the test_results.individual_importances) #744

Closed
adunmore opened this issue May 27, 2020 · 1 comment · Fixed by #755
@adunmore
Contributor

When calculating individual importances, Triage appears to store values for importance score in test_results.individual_importances.feature_value, and values for feature value in test_results.individual_importances.importance_score.

nanounanue self-assigned this Jun 3, 2020
@nanounanue
Contributor

Ugh!

This is true!

And possibly with deeper implications.

The culprit is:

@db_retry
def save_db_objects(db_engine, db_objects):
    """Saves a collection of SQLAlchemy model objects to the database using a COPY command

    Args:
        db_engine (sqlalchemy.engine)
        db_objects (iterable) SQLAlchemy model objects, corresponding to a valid table
    """
    db_objects = iter(db_objects)
    first_object = next(db_objects)
    type_of_object = type(first_object)
    with PipeTextIO(partial(
        _write_csv,
        db_objects=chain((first_object,), db_objects),
        type_of_object=type_of_object
    )) as pipe:
        postgres_copy.copy_from(pipe, type_of_object, db_engine, format="csv")

The last line, postgres_copy.copy_from, inserts the values in the column order given by the SQLAlchemy model's __table__.columns.

In our case:

In [15]: [col.name for col in i.__table__.columns]
Out[15]:
['model_id',
 'entity_id',
 'as_of_date',
 'feature',
 'method',
 'feature_value',
 'importance_score']

But the columns of the table test_results.individual_importances are in the following order:

food# \d test_results.individual_importances ;
                   Table "test_results.individual_importances"
      Column      │            Type             │ Collation │ Nullable │ Default
══════════════════╪═════════════════════════════╪═══════════╪══════════╪═════════
 model_id         │ integer                     │           │ not null │
 entity_id        │ bigint                      │           │ not null │
 as_of_date       │ timestamp without time zone │           │ not null │
 feature          │ character varying           │           │ not null │
 method           │ character varying           │           │ not null │
 importance_score │ double precision            │           │          │
 feature_value    │ double precision            │           │          │
Indexes:
    "individual_importances_pkey" PRIMARY KEY, btree (model_id, entity_id, as_of_date, feature, method)
Foreign-key constraints:
    "individual_importances_model_id_fkey" FOREIGN KEY (model_id) REFERENCES model_metadata.models(model_id)
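
To make the mismatch concrete, here is a minimal stdlib-only sketch (not Triage code) of what happens when CSV rows are written in the model's column order but read back positionally in the table's column order. The sample row values and the helper names `write_rows` and `load_positionally` are made up for illustration; the real path goes through `postgres_copy.copy_from` and a PostgreSQL COPY.

```python
import csv
import io

# Column order as reported by the SQLAlchemy model (from the output above).
MODEL_COLUMNS = ["model_id", "entity_id", "as_of_date", "feature", "method",
                 "feature_value", "importance_score"]

# Column order of the actual table (from the \d output above).
TABLE_COLUMNS = ["model_id", "entity_id", "as_of_date", "feature", "method",
                 "importance_score", "feature_value"]


def write_rows(rows, column_order):
    """Serialize dict rows to headerless CSV using the given column order."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    for row in rows:
        writer.writerow([row[name] for name in column_order])
    buffer.seek(0)
    return buffer


def load_positionally(buffer, column_order):
    """Read headerless CSV, assigning values to columns purely by position."""
    return [dict(zip(column_order, values)) for values in csv.reader(buffer)]


# Hypothetical row: the values are invented for this example.
row = {"model_id": 1, "entity_id": 42, "as_of_date": "2020-01-01",
       "feature": "age", "method": "uniform",
       "feature_value": 37.0, "importance_score": 0.25}

# Written in model order, read in table order: the last two values swap.
swapped = load_positionally(write_rows([row], MODEL_COLUMNS), TABLE_COLUMNS)[0]

# Written and read with the same explicit order: values land where they belong.
correct = load_positionally(write_rows([row], TABLE_COLUMNS), TABLE_COLUMNS)[0]
```

In this sketch `swapped` ends up with the feature value stored under `importance_score` and vice versa, which matches the symptom reported in this issue, while `correct` keeps each value under its own name. Writing the rows in an explicit, agreed column order is one possible way out; it is not necessarily what the eventual fix does.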

I am worried that this also happens in other tables, or that we were just lucky so far!
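
One way to check the other tables would be to compare, per table, the column order the model reports with the order the database reports. The sketch below only does the comparison on plain lists; how those lists get collected (for example from `__table__.columns` on the model side and from the database catalog on the other) is left out. `find_order_mismatches`, the `orders` input, and the second table entry are hypothetical names for this example.

```python
def find_order_mismatches(orders):
    """Return tables whose model column order differs from the database order.

    `orders` maps a table name to a (model_columns, db_columns) pair of lists.
    The result maps each mismatching table to a list of
    (position, model_column, db_column) tuples where the two orders disagree.
    """
    mismatches = {}
    for table, (model_columns, db_columns) in orders.items():
        # Different column sets are a separate problem; fail loudly.
        if sorted(model_columns) != sorted(db_columns):
            raise ValueError(f"{table}: model and database columns differ")
        diffs = [
            (position, model_name, db_name)
            for position, (model_name, db_name)
            in enumerate(zip(model_columns, db_columns))
            if model_name != db_name
        ]
        if diffs:
            mismatches[table] = diffs
    return mismatches


orders = {
    # Orders taken from the model and table output in this issue.
    "test_results.individual_importances": (
        ["model_id", "entity_id", "as_of_date", "feature", "method",
         "feature_value", "importance_score"],
        ["model_id", "entity_id", "as_of_date", "feature", "method",
         "importance_score", "feature_value"],
    ),
    # Hypothetical second table whose orders agree, for contrast.
    "example_schema.example_table": (["id", "value"], ["id", "value"]),
}

report = find_order_mismatches(orders)
```

Here `report` contains only `test_results.individual_importances`, with positions 5 and 6 flagged. Any table that shows up in such a report would be a candidate for the same silent swap.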

nanounanue changed the title from "Triage swaps importance score and feature value in the test_results.individual_importances" to "Potential swapping of columns on save to the DB (was: Triage swaps importance score and feature value in the test_results.individual_importances)" on Jun 4, 2020
nanounanue added a commit that referenced this issue Jun 4, 2020
nanounanue mentioned this issue Jun 4, 2020
nanounanue added a commit that referenced this issue Jun 5, 2020
* Python 3 interprets string literals as Unicode strings. We should
prefix the strings used in a regex with 'r'

* Closes #744