Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Working solution serialization bug. #5874

Merged
merged 3 commits into from
Oct 18, 2022

Conversation

VersusFacit
Copy link
Contributor

@VersusFacit VersusFacit commented Sep 18, 2022

resolves #5436
resolves #5385

Description

Ho, boy! What an odyssey. This PR has some reworks to the adapter logging functions for dbt. These changes help us conform to the intended spec of the logging classes. The resulting behavior makes our log functions reminiscent of python's print().

reproducing the errors

5385 models/asdf.sql:

{% set some_list = ['a', 'b', 'c'] %}
{{ log(some_list, info = true) }}
select 1 as id
DBT_ENV_SECRET_WHATEVER=1234 dbt run -s asdf

5436 asdf2.sql:

dbt --no-use-colors --log-format json run -s asdf2 | jq .msg
select * from foo.bar

——————

dbt-bigquery==1.0.0
dbt-core==1.0.8

1 of 1 START table model my_dataset.asdf........................................ [RUN]
Unhandled error while executing model.jaffle_shop.asdf
Pickling client objects is explicitly not supported.
Clients have non-trivial state that is local and unpickleable.
1 of 1 ERROR creating table model my_dataset.asdf............................... [ERROR in 0.61s]
Finished running 1 table model in 1.47s.
Completed with 1 error and 0 warnings:
Pickling client objects is explicitly not supported.
Clients have non-trivial state that is local and unpickleable.

dbt-bigquery==1.1.0
dbt-core==1.1.2

dbt-bigquery==1.2.0
dbt-bigquery==1.3.0b2

dbt-core==1.3.0b2
dbt-core==1.2.1

1 of 1 START table model my_dataset.asdf ....................................... [RUN]
1 of 1 ERROR creating table model my_dataset.asdf .............................. [ERROR in 0.53s]
Finished running 1 table model in 1.17s.
Completed with 1 error and 0 warnings:
Compilation Error in macro statement (macros/etc/statement.sql)
Object of type NotFound is not JSON serializable

in macro materialization_table_bigquery (macros/materializations/table.sql)
called by macro statement (macros/etc/statement.sql)

Error 1: Serialization Woes

json.dumps presumes that all elements within a dictionary can be simply rendered as a json string. But complex objects inside the argument will cause a serialization error. This happens precisely at line 209 of core/dbt/events/functions.

Several classes have msg parameter. This spec does not always match reality: Some arguments are not strings. This leads to trouble when logger member function calls are made. So, if we force the message given to logger functions to be a string, the spec matches reality and the error vanishes. Local proof of it working on development.

image

This error is what the user OwenKephart graciously documented over in Bigquery issue 206 as a desired output for my test case model, designed to fail.

QED.

Error 2: Secret scrubbing implodes when provided a non-str type

Jinja log hands a type to Python which should but isn't always a string. Although, we have a function scrub_secrets which expects that.

image

By making sure these MacroEvent* types are all in fact strings, we solve the current problem and hopefully, future ones.

Checklist

@VersusFacit VersusFacit self-assigned this Sep 18, 2022
@VersusFacit VersusFacit requested a review from a team as a code owner September 18, 2022 10:31
@cla-bot cla-bot bot added the cla:yes label Sep 18, 2022
@github-actions
Copy link
Contributor

Thank you for your pull request! We could not find a changelog entry for this change. For details on how to document a change, see the contributing guide.

Copy link
Contributor

@gshank gshank left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're converting all of these Exception object to strings in the next structured logging update anyway, because everything in the event has to be encoded into a protobuf message.

@VersusFacit VersusFacit added backport 1.1.latest backport 1.2.latest This PR will be backported to the 1.2.latest branch labels Sep 20, 2022
Copy link
Contributor

@colin-rogers-dbt colin-rogers-dbt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@nathaniel-may
Copy link
Contributor

nathaniel-may commented Sep 22, 2022

So it looks like #5385 has the exact same problem (passing a non-string to log). Since it's jinja log, the first the value time exits Jinja-land into our control is in fire_event. My question to you is- do you think we should solve this lower in the call stack to knock both of these bugs out?

…utes.

Before this change, the spec of various classes expected base_msg and msg params to be str's. This assumption did not always hold true. post_init hooks ensures the spec is obeyed.
@VersusFacit
Copy link
Contributor Author

Revised in light of the recent structured logging change.

PR description revised, mostly expanded to describe the second bug.

Employs the same subclass-as-trait paradigm to . And for good measure, I did check the new structured logging code on main still exhibits the bugs described in the linked issues before folding these changes into the code.

@VersusFacit VersusFacit force-pushed the CT-803/serialization_errors_in_AdapterLogger branch from 7e59699 to 2dae7e7 Compare October 17, 2022 01:20
class AdapterEventStringFunctor:
def __post_init__(self):
super().__post_init__()
if not isinstance(self.base_msg, str):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we need to check if base_msg exists?

Copy link
Contributor Author

@VersusFacit VersusFacit Oct 17, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thought about this. We're talking data classes so these objects must be instantiated with params. A little toy example:

from dataclasses import dataclass

@dataclass
class Item:
    a:str
    b:str
    c:str

asdf = Item()

Try running this and it complains:

TypeError: init() missing 3 required positional arguments: 'a', 'b', and 'c'

Hence, if a dataclass exists, it's gotta have the params. And this functor should only be used in classes that have this param. I think that renders an existential check redundant? (if it's not, tell me)

Copy link
Contributor

@colin-rogers-dbt colin-rogers-dbt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense, just one question. Also, do we need unit tests to validate this behavior?

@VersusFacit
Copy link
Contributor Author

Unsure about unit tests. Is it worth having tests to check that the intended spec holds for something like the structured logging? That feels extra to me, but I'm also open to writing some quick ones if it seems like a good idea from your POV!

Copy link
Contributor

@colin-rogers-dbt colin-rogers-dbt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@VersusFacit VersusFacit merged commit ff2f1f4 into main Oct 18, 2022
@VersusFacit VersusFacit deleted the CT-803/serialization_errors_in_AdapterLogger branch October 18, 2022 19:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport 1.1.latest backport 1.2.latest This PR will be backported to the 1.2.latest branch backport 1.3.latest cla:yes
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[CT-803] Serialization errors in AdapterLogger [CT-756] Stringify user-provided messages to {{ log() }}
4 participants