Rename "schema" + "data" tests #3234

Closed
jtcohen6 opened this issue Apr 7, 2021 · 18 comments · Fixed by #3880
Labels: 1.0.0 (issues related to the 1.0.0 release of dbt), dbt tests (issues related to built-in dbt testing functionality), enhancement (new feature or request)

Comments

@jtcohen6
Contributor

jtcohen6 commented Apr 7, 2021

For v0.20.0, we want to make tests more powerful and (crucially) much more consistent. Today, there are real functional differences between "schema" tests and "data" tests. Instead, tests should just be tests. They'll still have two points of entry, but this should reflect only the dbt developer's choices to trade off clarity and reusability.

As such, we're thinking about renaming:

  • schema test → generic test
    • alternatives: "macro test," "reusable test"
  • data test → singular test (originally proposed as "bespoke test")
    • alternatives: "file test," "one-off test"

This renaming has real implications for:

  • Internal objects (e.g. ParsedDataTestNode, ParsedSchemaTestNode). This shouldn't have any implications for end users.
  • File names, e.g. builtin generic tests are defined in global_project/macros/schema_tests/. This is cosmetic only.
  • Resource FQNs + output of dbt ls. (Today, the unique_id is test.my_project.unique_model_a_fun, but the fqn is my_project.schema_test.unique_model_a_fun.)
  • Selection criteria: this represents a functional change, so I'll say more below

Selection criteria

  • dbt test --data, dbt test -m test_type:data
  • dbt test --schema, dbt test -m test_type:schema

Should instead become:

  • dbt test -m test_type:singular
  • dbt test -m test_type:generic

Notes:

  • We should make this backward compatible by continuing to support schema + data as aliases for the names we ultimately decide on
  • We should nix the CLI flag form entirely, in favor of the test_type method. I'm open to whether we should continue supporting the --schema and --data flags for backwards compatibility.
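
One way the alias idea in the notes above could work is sketched below in Python. This is a hypothetical illustration, not dbt's actual selector code: the names `LEGACY_ALIASES`, `normalize_test_type`, and `parse_selector` are invented for the example, and it assumes the final names are generic and singular.

```python
from typing import Tuple

# Assumption: old names stay accepted and map onto the new canonical names.
LEGACY_ALIASES = {"schema": "generic", "data": "singular"}
VALID_TEST_TYPES = {"generic", "singular"}


def normalize_test_type(value: str) -> str:
    """Return the canonical test type, accepting the legacy names as aliases."""
    canonical = LEGACY_ALIASES.get(value, value)
    if canonical not in VALID_TEST_TYPES:
        raise ValueError(f"unknown test_type: {value!r}")
    return canonical


def parse_selector(selector: str) -> Tuple[str, str]:
    """Split a 'test_type:<value>' selector and normalize its value."""
    method, _, value = selector.partition(":")
    if method != "test_type":
        raise ValueError(f"unsupported selector method: {method!r}")
    return method, normalize_test_type(value)
```

With a mapping like this, `test_type:schema` and `test_type:generic` would select the same tests, so existing invocations keep working while the docs move to the new names.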

Describe alternatives you've considered

  • Leaving the old names in place
@jtcohen6 added the enhancement and dbt tests labels Apr 7, 2021
@jtcohen6 jtcohen6 added this to the Margaret Mead milestone Apr 7, 2021
@boxysean
Contributor

boxysean commented May 6, 2021

I agree with this point: tests should just be tests. But I am concerned that the rename (schema -> generic, data -> bespoke) could confuse existing users without adding value. What do you think?

@jtcohen6
Contributor Author

jtcohen6 commented May 6, 2021

I really appreciate you raising the concern @boxysean! For what it's worth, I'm not in love with the terms generic or bespoke either.

Part of the dbt v1.0 project is acknowledging that, given current growth trajectory in weekly active projects, the majority of post-v1.0 dbt users have still never used dbt. If we think there's something, anything, that could be improved for the sake of making it more intuitive and less confusing, the sooner the better.

My perspective here is that "data test" and "schema test" are themselves quite confusing names today; they're just the confusing names that we're used to. (All tests are on data; "schema" is an overloaded term, and schema tests don't need to be defined in files named schema.yml.)

I'm not after a 1:1 rename, exactly. Within the codebase, I'd like to see us redefine these to all just be test, wherever possible, and then in our documentation we can differentiate between tests defined via one-off queries vs. tests defined via reusable/generic/parametrized queries, as different implementation mechanisms of the same functionality.

There are a few places where we need to pick specific words, however, for the sake of existing functional parity. The biggest one that comes to mind is the selection criteria:

$ dbt test -m test_type:data
$ dbt test -m test_type:schema

We're trying to make this distinction less explicit, but it still exists, so I'd like to find some words that would sit comfortably in the codebase, in the selection syntax docs, and in the hearts & minds of our community members.

@jtcohen6 jtcohen6 removed this from the Margaret Mead milestone Jun 2, 2021
@joellabes
Contributor

Can I propose that "in the hearts & minds of our community members" is actually a two-parter, as follows?

  • a pair of words that are effective opposites to aid discoverability (if you know what a generic test does, you can infer the behaviour of a bespoke test, as opposed to schema vs data)
  • each word should work individually in conversation (eg one-off/single purpose is better than bespoke - on its own, does bespoke imply that's the only type of test I can make for myself?)

I'm assuming the code and selection constraints boil down to short and unique is good, one word is better.

FWIW, I'm ok with generic as a name, but don't care for bespoke. Some alternative options: targeted/focused/single-use/single-purpose

Templated vs fixed?

Templated vs non-templated? A bit 🤢, but maybe just crazy enough to work. Particularly inelegant around node selection unless it changed to --exclude:test_type:templated

@noel

noel commented Jun 12, 2021

I agree that the current terms can be confusing to users. If you think about it, is a unique test not testing data? New users might understand terms like built-in or reusable vs custom. I think that is the main difference for a newcomer. You can use these simple words in your schema, pass in some parameters and leave the SQL to us, whether that comes from dbt, dbt-utils, dbt-expectations, etc. OR if you want, you can create some custom test over here, but it is likely not reusable. When you advance, you can take that custom test and make it reusable via a macro.

@boxysean
Contributor

Okay this just hit me, and perhaps it's too late for additional input, but...

Splitting the two types of tests on a different axis: what if they were called YAML (file) tests and SQL (file) tests?

@jtcohen6
Contributor Author

Ok, I think it's high time to make a final call here:

  • I'm quite happy with generic as a name for reusable macro-like tests
  • I'm significantly less happy with bespoke. We don't shy away from strange words when they're the right ones (hello ephemeral), but this one doesn't feel right. I'm now thinking that one_off or single_use (thanks @joellabes!) is much closer to the mark.

I'm open to one last round of persuading, if anyone wants to get a final word in. Then it's going west, going east, gone till 2.0 at least :)

@MartinGuindon

MartinGuindon commented Aug 10, 2021

I agree with @jtcohen6 , generic test is fine. Less of a fan of bespoke, and harder to explain to new users.

I'm not a fan of one_off or single_use either, to me they sound like you'd run these tests only once.

How about the antonym of generic: specific tests?

A generic test is re-usable across multiple models.
A specific test is written to test one specific use case.

@jtcohen6
Contributor Author

Good point about potential confusion around running only once.

I think specific is a solid option. It does make me think of https://en.wikipedia.org/wiki/Sensitivity_and_specificity

@MartinGuindon What do you think of singular? Better/worse?

@MartinGuindon

Singular is better than one off or single use, but I'm not sure I like it more than specific. Still a good option, I think.

@joellabes
Contributor

joellabes commented Aug 11, 2021

My current order of preference, having read all the comments:

  1. Single purpose
  2. Singular
  3. Specific
  4. Single use

999: Bespoke

I originally had specific and singular flipped, but then thought about the verbal gymnastics I'd have to do in slack trying to talk about a specific individually named test 😰

@boxysean
Contributor

"Generic test" 👍.

I am strongly against bespoke test. My favorite of the recent options is "singular".

I'll throw in different options of a slightly different dimension: "SQL statement test", "statement test", "unparameterized test", "custom test" (throwback to @noel's earlier comment).

@joellabes
Contributor

Ooh I don't mind unparameterized. At that point I'd make a play for renaming generic tests to parameterized ones though.

Also there are like 4 different ways to spell parametrised by the time you sneak the extra E in and get the Ss and Zs in the mix 😥

@noel

noel commented Aug 11, 2021

I think one big difference is having a macro for generic test and NOT having a macro in a bespoke test.
So... Generic Macro Test and Custom SQL test

@jtcohen6
Contributor Author

jtcohen6 commented Aug 11, 2021

Thank you all! Sounds like we can really agree about "not bespoke" 😅

@noel : It's a fair distinction, and it's definitely how I think deep-down about the difference between the two: one of these is macro-like, one of these is just my SQL in a file. I struggle with names like custom, sql_test, or sql_file_test because it's desirable for users to write their own custom generic tests, as (parametrized) SQL, in .sql files!

@boxysean Parametrized is a really good word to use when communicating the underlying implementation. I've been using this word to explain how a generic test works:

[Image: Screen Shot 2021-08-11 at 7 41 13 AM]
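
To make "parametrized" concrete, here is a minimal Python sketch of the contrast under discussion. It is an illustration only: dbt renders generic tests with Jinja, and `str.format` merely stands in for that here. The template text, the `render_generic_test` helper, and the `orders`/`amount` names are all assumptions made up for the example.

```python
# A "generic" test: one query template with slots for the model and column.
UNIQUE_TEMPLATE = """
select {column}
from {model}
group by {column}
having count(*) > 1
""".strip()


def render_generic_test(template: str, model: str, column: str) -> str:
    """Fill the template's slots to get the SQL for one concrete test."""
    return template.format(model=model, column=column)


# A "singular" test: one fixed query with nothing left to fill in.
# (Hypothetical table and column names.)
SINGULAR_TEST_SQL = "select * from orders where amount < 0"

rendered = render_generic_test(UNIQUE_TEMPLATE, model="my_model", column="id")
```

The same template can be rendered for any model and column, while the singular query only ever checks the one thing it was written for.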

I don't love unparametrized for the test-formerly-known-as-data, though, for the same reason I wouldn't like non_generic: I'd rather find a positive word that stands in sharper contrast.

@joellabes That's a really good point about how specific might make it even harder to talk about another test-related thing that I find myself struggling to find good words for, "a specific instance of a generic test," i.e.

models:
  - name: my_model
    columns:
      - name: id
        tests:
          - unique  # <---- this one
      - name: another_col
        tests:
          - unique # <---- not this one
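
The distinction between a generic test and one specific instance of it can be sketched in Python as follows. This is a rough illustration, not dbt's parser: `GenericTestInstance` and `instances_from_config` are hypothetical names, the config is written as a plain dict rather than loaded from YAML, and the instance naming (`<test>_<model>_<column>`) is an assumption modeled on the unique_model_a_fun example earlier in this issue.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class GenericTestInstance:
    """One application of a generic test to a single model column."""
    test_name: str
    model: str
    column: str

    @property
    def name(self) -> str:
        # Assumed naming pattern, for illustration only.
        return f"{self.test_name}_{self.model}_{self.column}"


def instances_from_config(config: dict) -> List[GenericTestInstance]:
    """Expand every column-level test entry into its own instance."""
    found = []
    for model in config.get("models", []):
        for column in model.get("columns", []):
            for test_name in column.get("tests", []):
                found.append(
                    GenericTestInstance(test_name, model["name"], column["name"])
                )
    return found


config = {
    "models": [
        {
            "name": "my_model",
            "columns": [
                {"name": "id", "tests": ["unique"]},
                {"name": "another_col", "tests": ["unique"]},
            ],
        }
    ]
}
instances = instances_from_config(config)
```

Here the single generic test unique yields two separate instances, which is exactly the thing that is awkward to name if "specific" is also taken as a test type.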

So I'm leaning toward singular because:

  • It's an odd-enough word that it's unlikely to collide with other words we need to use in everyday conversation
  • Its meaning is intuitive for a person who's relatively new to dbt
  • By its other meaning, a singular test is also a really excellent one: standout, one-of-a-kind, sui generis
  • As its antonym, generic tests ought to be plural, which they sort of are!

I edited the issue above, replacing bespoke with singular. Take a look, check the vibes, feel out whether you can live with it for a couple of years :)

@jtcohen6 added the 1.0.0 label Aug 11, 2021
@noel

noel commented Aug 11, 2021

@jtcohen6 when you say "it's desirable for users to write their own custom generic tests, as (parametrized) SQL" at this point they become a Generic Macro test, no? It's like a custom generic macro test or a custom sql only test.

Maybe custom is not the "right" word. Could be something like personalized or tailored

  • tailored reusable test (in macros dir)
  • tailored specific test (in tests dir)

@MartinGuindon

@jtcohen6 I'd be happy with singular + generic.

@boxysean
Contributor

Agreed that singular + generic is good. :-) I agree singular won't collide with other words and is reasonably related.

@joellabes
Contributor

Lock it in 👍 🔒

@jtcohen6 jtcohen6 self-assigned this Sep 1, 2021
5 participants