In reviewing #4, I echo @comaraDOTcom's sentiment when they say they'd be "more inclined to use the output of the macro as a CTE and select a subset of cols in a subsequent CTE." To that end, I'd like to propose two macros (names debatable):
1. `one_hot_encoder_wrapper`: the existing functionality, with `include_columns` and `exclude_columns`, and
2. `one_hot_encoder`: takes only the `source_table`, `source_column`, `category_values`, and `handle_unknown` params.
To me, the benefits would be:
- more direct alignment with `sklearn.preprocessing.OneHotEncoder` and `pandas.get_dummies()`
- a smaller code footprint for adapters that require dispatching
- a better complement to existing package functionality such as `dbt_utils.star()`
- dbt models that leverage `one_hot_encoder` will look like other dbt models, instead of a single macro call with no `SELECT` statement
Example usage
Suppose a table, `fruits`, in which:
- there are 3 columns: `id`, `species`, and `color`; and
- the `color` column has two values: `orange` and `yellow`
goal compiled SQL
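As a rough sketch of what the compiled SQL for the `fruits` example might look like: the `is_color_*` column names, the 1/0 encoding, and the pass-through of the source columns are all assumptions made for illustration, not something the proposal fixes.

```sql
-- Sketch only: output column names and the 1/0 encoding are assumed.
select
    id,
    species,
    color,
    -- one indicator column per value listed in category_values
    case when color = 'orange' then 1 else 0 end as is_color_orange,
    case when color = 'yellow' then 1 else 0 end as is_color_yellow
from fruits
```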
possible uses
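One way a model could use the proposed macro as a CTE is sketched here. It assumes the macro renders a complete `SELECT` statement, that `handle_unknown` accepts `'ignore'` as in sklearn, and that the encoded columns are named `is_color_*`; the model file name is hypothetical, and in a real project the macro call would likely need the package namespace.

```sql
-- models/fruits_encoded.sql (hypothetical model name)
-- Assumes the proposed one_hot_encoder macro renders a full SELECT statement.
with encoded as (

    {{ one_hot_encoder(
        source_table=ref('fruits'),
        source_column='color',
        category_values=['orange', 'yellow'],
        handle_unknown='ignore'
    ) }}

)

-- Select only the columns needed downstream (names are assumed).
select
    id,
    is_color_orange,
    is_color_yellow
from encoded
```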
alternatively if one would like to include or exclude certain columns from the source table, they could do so like this
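A possible shape for this, sketched with `dbt_utils.star()` and its `except` argument. It assumes the proposed `one_hot_encoder` macro renders a full `SELECT` that passes the source columns through alongside the encoded ones, and that the encoded columns are named `is_color_*`; both are illustrative assumptions.

```sql
with encoded as (

    -- Assumed: the macro output includes the original fruits columns.
    {{ one_hot_encoder(
        source_table=ref('fruits'),
        source_column='color',
        category_values=['orange', 'yellow'],
        handle_unknown='ignore'
    ) }}

),

final as (

    select
        -- every column of fruits except the ones listed
        {{ dbt_utils.star(from=ref('fruits'), except=['species', 'color']) }},
        is_color_orange,
        is_color_yellow
    from encoded

)

select * from final
```

Written this way, column selection stays in ordinary SQL and the encoder macro does not need its own `include_columns` / `exclude_columns` arguments.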