
[CT-403] [Feature] Allow programmatic access to macro properties #4919

Closed
tiffanygwilson opened this issue Mar 22, 2022 · 11 comments
Assignees: jtcohen6
Labels: enhancement (New feature or request), Refinement (Maintainer input needed)

Comments

@tiffanygwilson

Is there an existing feature request for this?

  • I have searched the existing issues

Describe the Feature

I would like to be able to access the properties of macros, such as names and tags, at runtime, similar to how the properties of sources can be accessed using graph.sources.values().

Describe alternatives you've considered

It is possible to write a Python script that parses a macros.yml file containing macro properties and locates specific keywords. However, since dbt cannot execute arbitrary Python scripts during a run, any functionality gained from such scripts requires developers to run them manually as an extra step, which is easy to forget because it is not part of the standard dbt workflow.

Who will this benefit?

This feature would benefit users who want to use Jinja to dynamically generate SQL based on the current properties of macros. For example, I am using the method described here to manage Snowflake user-defined functions in dbt. While it is not difficult to add the create_function macro to a list in a create_udfs() macro, it would be really nice to save developers that step. Using Jinja templating in the create_udfs() macro, one could loop through the macros configured in a YAML file (which, while not strictly necessary, we want anyway to document the arguments and output of each macro/function) and dynamically add any function tagged udf to the create_udfs() macro.
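
For context, the manual pattern looks roughly like this today (the create_udf_* macro names are illustrative, not taken from the linked article):

{% macro create_udfs() %}
    {# each create_udf_* macro returns a `create or replace function ...` statement; new UDF macros must be added here by hand #}
    {{ create_udf_parse_phone_number() }};
    {{ create_udf_normalize_email() }};
{% endmacro %}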

I am currently using a similar method to populate a macro using the tags applied to sources via graph.sources.values(). It would be useful for anything configured in a yaml file to have those properties available when running dbt.
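
As a reference point, a minimal sketch of that source-tag pattern (the macro name and tag argument are illustrative):

{% macro log_tagged_sources(tag_name) %}
{% if execute %}
    {%- for source in graph.sources.values() -%}
        {%- if tag_name in source.tags -%}
            {# source nodes expose source_name, name, tags, etc. #}
            {%- do log(source.source_name ~ '.' ~ source.name, info=True) -%}
        {%- endif -%}
    {%- endfor -%}
{% endif %}
{% endmacro %}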

Are you interested in contributing this feature?

Unfortunately I do not have the bandwidth right now to try this as a first contribution.

Anything else?

Illustrative example of how this could work if macros were in the graph:

{% macro execute_macros() %}
{% if execute %}
        {%- for macro in graph.macros.values()
                | selectattr('resource_type', 'equalto', 'macro')
                | selectattr('tags') %}
            {%- if 'udf' in macro.tags -%}
                {{ macro.name }}();
                {%- do log("Ran operation " ~ macro.name , info=True) -%}
            {%- endif -%}
        {%- endfor -%}
        commit;
{% endif %}
{% endmacro %}
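
A macro like this could then be invoked with dbt run-operation execute_macros (or from an on-run-start hook), so newly tagged macros would be picked up without editing a list by hand.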
@tiffanygwilson tiffanygwilson added enhancement New feature or request triage labels Mar 22, 2022
@github-actions github-actions bot changed the title [Feature] Allow programmatic access to macro properties [CT-403] [Feature] Allow programmatic access to macro properties Mar 22, 2022
@gshank
Contributor

gshank commented Mar 22, 2022

You've provided a nice example of how this would be useful. We don't include macro nodes in the "graph" (i.e. the DAG used for processing nodes in dbt) at this time, and adding them to the graph per se would throw off how a fairly large number of things work. There might be some other place that would work to store macro information, perhaps a separate macro manifest. Do you only want to loop through macro nodes?

@gshank gshank removed the triage label Mar 22, 2022
@tiffanygwilson
Author

Yes, sorry for the confusion - I wasn't proposing adding them to the graph, only having access to them in something like the graph, such as the separate macro manifest you mentioned. Looping through a filtered set of macro nodes is my current use case, although I'm sure some creative developers would figure out another way to use it!

@jtcohen6
Contributor

@tiffanygwilson Very cool issue, thanks for opening!

I think the specific use case you're outlining, which I'll call "UDF management," deserves to be considered and tackled as its own problem—a step further than what's outlined in the linked discourse article. Should dbt be able to create arbitrary object types in the database, with arbitrary configurations (#3391)? Should we support creating UDFs/functions as their own dedicated node type (#136)? Or even, as a kind of model materialization? They're DAG-aware, logic-bearing, addressable as database objects... it sorta makes sense!

More generally, not thinking about UDFs in particular — the problem you're describing is kind of like a custom dbt task: For each DAG node / project resource meeting certain selection criteria, run some SQL, using that node's configuration as input. It has a lot in common with dbt-external-tables, and with the recent discussion around run-operation: #5005. There are things that a real dbt task can do, that a run-operation can't—DAG ordering, multi-threaded execution, writing run_results.json.

To pull the two thoughts together: those are the same advantages that UDF-as-model-materialization has over UDF-as-macro-of-macros, since the former uses a "real" task (run) rather than the more-limited run-operation. In either case, I think there are better solutions available to us than macro-on-macro introspection, though I totally appreciate the ergonomic improvement that would offer for the approach you're using right now.

@francescomucio

👍 on UDF management (but not only UDFs), it is definitely a needed feature. I ended up in this thread because it is the most recent one on the topic and I felt I needed to add my voice/use case for this feature.

A UDF is a way to keep queries cleaner by storing the logic somewhere else, keeping it compact and easier to maintain. Not everything can be solved with a macro that expands to inline SQL, because sometimes UDFs are not even written in SQL, for example an external UDF in Databricks.

In other threads I saw concerns about UDF performance; I would leave that to the developer or team creating the UDFs. Maybe the trade-off between speed and code maintainability is acceptable for them.

We are also using the above-mentioned method to create UDFs with macros, but what would be really nice to have is the ability to invoke UDFs like

select {{ my_udf_function (column1, column2) }}

or

select {{ my_udf_function }}(column1, column2)

But I understand that this can be tricky and requires some work. The benefit would be having UDFs (and other objects too) as first-class citizens in dbt.

@mroy-seedbox

@francescomucio: With something like DDO (or your own equivalent implementation), you can reference UDFs (or any other Snowflake object) like a regular dbt model:

select {{ ref('my_udf_function') }}(column1, column2)

@tiffanygwilson: This might also be interesting for you. 👆

@karunpoudel
Contributor

karunpoudel commented Sep 22, 2022

We could expose a "macros" variable in the Jinja context, similar to other Jinja context variables (not dbt project variables).
I am also trying to create a macro that needs access to the macro_sql and dependent macros of some other macro.

We may not want to allow updating macro properties with this, but just reading the properties should be OK.
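
Purely as a hypothetical sketch of what read-only access might look like (no such macros mapping exists today; the field names are borrowed from the manifest's macro schema):

{% macro inspect_macro(macro_name) %}
{% if execute %}
    {# hypothetical: assumes a read-only `macros` mapping keyed by macro name were exposed in the Jinja context #}
    {%- set m = macros[macro_name] -%}
    {%- do log(m.macro_sql, info=True) -%}
    {%- for dep in m.depends_on.macros -%}
        {%- do log('depends on: ' ~ dep, info=True) -%}
    {%- endfor -%}
{% endif %}
{% endmacro %}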

@jtcohen6
Contributor

jtcohen6 commented Sep 23, 2022

We're thinking more about UDF management, and it's increasingly likely we treat these as a first-class resource type in a future release: #5741

We may not want to update macro properties with this, but just reading the properties should be ok.

Definitely read-only, like the other properties in flat_graph.

@karunpoudel To be honest, I'm still not convinced that this is a good change. Could you say more about your intended use case? What are you hoping to be able to do here? Given that UDF management was the original motivation in this issue, I'd much rather we tackle that problem head-on, as I think it deserves.

@karunpoudel
Contributor

karunpoudel commented Sep 23, 2022

My main goal is to capture the state of a model and its dependent macros when the model was executed, and use that information with the --state parameter during Slim CI.

Different models in a project can be refreshed on different schedules (some daily, some hourly, some weekly, etc.), so the manifest.json from any single run would not represent the current state of the database. I am trying to save the definition of each model and its macros in a database using a macro in a post_hook. This post_hook could be set in dbt_project.yml.

Then, I am going to use the above data in the database to generate a manifest.json for the --state parameter. Since the same macro can have different definitions (at different points in time) for different models, the macro definition should be part of the resource node for this use case. I am thinking of adding a field to ParsedNode called "dependent_macro_state" that will hold the definitions of its dependent macros. When the --state parameter is used together with a new parameter called --macro-state-in-respective-node, the --select state:modified selector would read the macro_sql from the "dependent_macro_state" property instead of from the root macros. The default behavior would remain how it works currently. I am going to create an issue for this to start a discussion. (There might be other, better ways of solving this.)

For the first part, we can currently already get the model definition from the model Jinja variable, but not the macro definitions.
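
A rough sketch of that post_hook, assuming a hypothetical audit.model_state table and using the built-in model context variable (field names follow the manifest node schema):

{% macro capture_model_state() %}
    {# sketch only: audit.model_state is a hypothetical table; checksum is what state:modified compares #}
    insert into audit.model_state (unique_id, captured_at, checksum)
    values (
        '{{ model.unique_id }}',
        current_timestamp,
        '{{ model.checksum.checksum }}'
    )
{% endmacro %}

This could be attached as a post-hook in dbt_project.yml so it runs after every model.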

@karunpoudel
Contributor

Even if UDFs become first-class objects, people might want to access macro definitions in the Jinja context for various use cases.
We may also need to expose UDF definitions in the Jinja context.

@jtcohen6 jtcohen6 self-assigned this Nov 17, 2022
@jtcohen6 jtcohen6 added the Refinement Maintainer input needed label Nov 17, 2022
@jtcohen6 jtcohen6 mentioned this issue Feb 15, 2023
@markproctor1

I'd like programmatic access to macro properties, so I can make a list of macros based on their folder location.

@gshank
Contributor

gshank commented Jun 12, 2023

The code we use to load macros, and the order in which we load them, is fragile, complicated, and subject to change (we're working on a future change to that right now). Unfortunately, allowing this to work the way you request would further complicate an already complicated area of the code, and would constitute a programmatic interface that would make some changes harder or impossible. For example, most macros cannot be counted on to return correct values at parse time. We do often discuss ways we could make macros more useful, particularly for simpler string-related settings, but what's proposed here would only make those solutions more difficult.

Embedding data in macro files to be retrieved elsewhere is an anti-pattern that can be solved in a number of other ways that don't involve macros.
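
As an illustration only, one such alternative keeps the data (the list of UDF macro names) in a project var rather than embedded in macro files, and dispatches by name via the documented var() and context variables (the udf_macros var name is hypothetical):

{% macro create_udfs() %}
    {# illustration only: `udf_macros` is a hypothetical var in dbt_project.yml listing macro names #}
    {% for udf_macro_name in var('udf_macros', []) %}
        {{ context[udf_macro_name]() }};
    {% endfor %}
{% endmacro %}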

We're going to close this ticket for now. If you have additional information that you'd like to add to the discussion, please do.

@gshank gshank closed this as completed Jun 12, 2023
7 participants