
Add Jinja Sampling to Stable Static Parser #3970

Merged: 1 commit into develop from sampling2, Oct 6, 2021

Conversation

@nathaniel-may (Contributor) commented Sep 29, 2021

Description

Infrequently sample jinja parsing to compare against the new static parser. Without this, we'll have no data on the parity between jinja rendering and the stable static parser for the 1.0 release. This sampling is separate from the sampling pathway for the experimental parser, and both can happen at the same time. Since we expect a huge number of people to upgrade to 1.0, this tracking information will let us validate our prior work while the feature is turned on by default, even though we have done our due diligence and expect the static parser to reach parity with jinja rendering.

The trade-off here is that adding a second sampling path adds another bit of complexity to the render_update method, and jinja rendering is often two orders of magnitude slower than the static parser, which makes sampling extremely expensive. Right now the plan is to sample 1 in 5000 model files instead of 1 in 100, but this sample rate is flexible: a lower rate gives us less feedback and impacts users less, while a higher rate gives us more feedback sooner at a performance cost to users.
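As a rough illustration of the shape of that gate, here is a minimal sketch of a 1-in-5000 check. The constant and function names, and the random.randint mechanism, are illustrative assumptions rather than the actual implementation:

```python
import random

# Illustrative only: the real rate and wiring live in render_update.
STABLE_SAMPLE_RATE = 5000  # roughly 1 in 5000 model files

def should_take_stable_sample() -> bool:
    """Return True for roughly 1 in every STABLE_SAMPLE_RATE calls."""
    return random.randint(1, STABLE_SAMPLE_RATE) == STABLE_SAMPLE_RATE
```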

This is not critical to ship in an early 1.0 pre-release.

Reviewers

First and foremost, do we even need this? The biggest drawback is that sampling jinja rendering is expensive and would add additional variation to project load times. The benefit, of course, is that we get some warning if we have deviated from the existing jinja semantics. That said, unless the project source is available to us, we won't really have the ability to reproduce any discrepancy. However, if there are no problems after many samples, we can have great confidence that it works as designed across the wide variety of users that will inevitably upgrade to the big 1.0. Personally, if you're leaning towards "we don't need this," I'm leaning towards "what if we sample REALLY infrequently?"

In order to jinja render separately, I deep copy both the node and config objects and pass them to the super class's render_update method when we're taking a jinja sample. What do you think about this approach? I'm trying to avoid mutating the objects I'm currently working on, for correctness, and I'm also not using the jinja rendering result for the run even when I have it, so that subsequent user runs of dbt are deterministic and not dependent on the sampling RNG. These deep copies only happen during a sample, so they won't have a huge impact on overall performance, but they do make taking the sample more expensive.
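To make that shape concrete, here is a minimal sketch (not the PR's actual code): `jinja_render_update` stands in for the super class's render_update, and `compare` stands in for something like _get_stable_sample_result.

```python
from copy import deepcopy
from typing import Any, Callable, List

def sample_jinja_render(
    node: Any,
    config: Any,
    jinja_render_update: Callable[[Any, Any], None],
    compare: Callable[[Any, Any, Any, Any], List[str]],
) -> List[str]:
    # Deep-copy so the node/config used for the actual run are never
    # mutated by the (expensive) jinja sample path.
    sample_node = deepcopy(node)
    sample_config = deepcopy(config)
    # The full jinja render happens only on the copies.
    jinja_render_update(sample_node, sample_config)
    # Compare the jinja result against the static parser's result and
    # return discrepancy codes for tracking; the copies are then discarded.
    return compare(sample_node, sample_config, node, config)
```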

There's no great way to test the tracking logic, so please take a good look at _get_stable_sample_result.

Checklist

  • I have signed the CLA
  • I have run this code in development and it appears to resolve the stated issue
  • This PR includes tests, or tests are not required/relevant for this PR
  • I have updated the CHANGELOG.md and added information about my change to the "dbt next" section.

```python
# look for false positive configs: config calls the static parser picked up
# that the jinja-rendered sample did not
for k in config._config_call_dict:
    if k not in sample_config._config_call_dict:
        result += ["82_stable_false_positive_config_value"]
```
Contributor:
ah, a different error code! really good. I was way overcomplicating this in my head

Contributor (Author):
I try to make things easy for you when I can 🙂

Base automatically changed from static-parser-by-default to develop September 29, 2021 20:01
@jtcohen6 (Contributor):

Small thing, while we're hanging out in the model parser's render_update: I just realized that this line got lost in the shuffle (or did we remove it intentionally?):

https://github.com/dbt-labs/dbt/blob/733f35f5e133327e8e54477e94166ec76eec926e/core/dbt/parser/models.py#L32

We include that value, the total # of files "eligible" for static parsing, when we fire off tracking; it serves as the denominator when calculating the efficacy of our current grammar for one project.

@nathaniel-may (Contributor, Author):

@jtcohen6, it's definitely not the plan to remove that line. It looks like it's still there on develop and in this branch, so I suspect it's just the PR view that makes it look like it disappeared.

@jtcohen6 (Contributor):

@nathaniel-may That's static_analysis_parsed_path_count (numerator), which is distinct from static_analysis_path_count (denominator). I think static_analysis_path_count was incidentally removed in 3944705#diff-dc357e8548e098dd9f293a19d206d90baf4796d1f2e05e237342334fd6657356L32.

I observed this because I parsed a project with develop, inspected target/perf_info.json, and noticed that static_analysis_path_count is 0, at the same time that static_analysis_parsed_path_count is >0 and is_static_analysis_enabled is True.

@nathaniel-may (Contributor, Author):

My mistake. Good catch though! I opened PR#3981.

@nathaniel-may (Contributor, Author):

Rebased, so PR#3981 is reflected here now.

@nathaniel-may nathaniel-may marked this pull request as ready for review October 1, 2021 17:11
@nathaniel-may nathaniel-may merged commit b501f43 into develop Oct 6, 2021
@nathaniel-may nathaniel-may deleted the sampling2 branch October 6, 2021 13:48
iknox-fa pushed a commit that referenced this pull request Feb 8, 2022
automatic commit by git-black, original commits:
  b501f43