Evaluate open source data catalog options for integration into this platform #35
Per the feature-set comparisons on awesome-data-catalogs, it looks like my assessment of this product space was accurate: DataHub and OpenMetadata are the most feature-rich and well-developed options, but there's one more comparably feature-rich project, OpenDataDiscovery. That project only has 680 stars at the moment, is a month older than OpenMetadata, and is growing much less rapidly than either OpenMetadata or DataHub.
Looks like I'll have to upgrade.
Misc notes: DataHub metadata enrichment
- Shift-left enrichment: enrich at the source (e.g., via comments in SQL table definitions, meta blocks in dbt schema.yml files, description fields in LookML dimension/metric definitions, etc.)
- Transform enrichment: useful when there are patterns in the source data (e.g., common terms, field names, or concepts)
- CSV bulk enrichment import: if you have a Google Doc or something similar defining ownership and definitions, you can ingest that
- API enrichment: for programmatic metadata (e.g., outputs from CI/CD processes)
- DataHub UI: the initial option shown, where you add info through the UI
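To make the transform-enrichment idea a bit more concrete, here is a minimal Python sketch of pattern-based tagging, where regular-expression rules on column names map to glossary terms. The rules, the term names (PII.Email, etc.), and the `enrich_columns` function are all hypothetical illustrations; DataHub itself configures this kind of logic through transformers in its ingestion recipes, not with this code.

```python
import re

# Hypothetical rules: a regex on the (lowercased) column name -> glossary term.
# Underscore/anchor boundaries keep matches to whole name segments.
RULES = [
    (re.compile(r"(^|_)email($|_)"), "PII.Email"),
    (re.compile(r"(^|_)(phone|mobile)($|_)"), "PII.Phone"),
    (re.compile(r"_id$"), "Identifier"),
]


def enrich_columns(columns: list[str], rules=RULES) -> dict[str, list[str]]:
    """Return a mapping of column name -> terms whose pattern matches it.

    Columns that match no rule are left out of the result.
    """
    result: dict[str, list[str]] = {}
    for col in columns:
        # Match case-insensitively, but report the original column name.
        terms = [term for pattern, term in rules if pattern.search(col.lower())]
        if terms:
            result[col] = terms
    return result


if __name__ == "__main__":
    print(enrich_columns(["customer_email", "phone", "order_id", "amount"]))
```

Matching on underscore-delimited segments keeps a column like emailed_flag from being tagged as an email column, which is the kind of false positive that makes naive substring rules noisy.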
Did you consider Amundsen? https://github.com/amundsen-io/amundsen Not entirely sure if it's considered a data catalog, but just a callout: it has 3,700 stars.
@bbrewington I gave it a brief look, but the relatively modest activity on the Amundsen repo put it below DataHub and OpenMetadata on my list of things to check. I'll confess, I couldn't get a great sense of the feature sets of either of those tools from their websites, so I decided to just spin up test deployments of both and scan through the features. Here's my test setup of OpenMetadata, and I'll probably spin up a DataHub test run tomorrow. Have you used Amundsen? If so, what did you think of it? I checked through your repos and commits to see if you were a contributor, but I didn't check too far. By the way, it looks like we've been looking at a lot of the same docs and projects over the past few months, and I like the commit messages on your dbt-BQ-info_schema repo.
@MattTriano haha, sounds like the clickbait hooked you in (having some fun with that one). Here's the link for reference: https://github.com/bbrewington/dbt-bigquery-information-schema TBH I'm still pretty new to metadata tools... actually, the repo linked above might be a good use case for trying some of these out. I assumed Amundsen was best in class, but now I'll consider the three against each other.
A data catalog should have:
dbt's built-in doc server does include most of that functionality (even access control, apparently https://www.getdbt.com/blog/teaching-dbt-about-grants/), but it doesn't allow users to edit things through the portal, and I think it's intended more as a dev tool than a production option.
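dbt's docs site is rendered from JSON artifacts that `dbt docs generate` writes to the target/ directory (manifest.json for project metadata such as descriptions and meta blocks, catalog.json for warehouse-introspected details). As a rough sketch of how a separate catalog could reuse that same documentation, the Python below pulls model and column descriptions plus meta out of a manifest. It assumes the `nodes` layout of recent dbt manifest versions, and the function names are hypothetical; in practice both DataHub and OpenMetadata ship their own dbt ingestion connectors.

```python
import json
from pathlib import Path


def extract_dbt_docs(manifest: dict) -> dict[str, dict]:
    """Collect model/column descriptions and meta blocks from a dbt manifest dict.

    Assumes the manifest's top-level "nodes" mapping of unique_id -> node.
    """
    docs: dict[str, dict] = {}
    for unique_id, node in manifest.get("nodes", {}).items():
        if node.get("resource_type") != "model":
            continue  # skip tests, seeds, snapshots, etc.
        docs[unique_id] = {
            "description": node.get("description", ""),
            "meta": node.get("meta", {}),
            # Column name -> column description (empty string if undocumented).
            "columns": {
                name: col.get("description", "")
                for name, col in node.get("columns", {}).items()
            },
        }
    return docs


def load_manifest(path: str = "target/manifest.json") -> dict:
    """Read a manifest.json produced by dbt (default: the project's target/ dir)."""
    return json.loads(Path(path).read_text())
```

After running `dbt docs generate` in a project, something like `extract_dbt_docs(load_manifest())` would return one entry per model, ready to be mapped onto whatever the target catalog's ingestion format expects.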
There are two options I want to evaluate:
I've looked at Amundsen, but its community is about 5% as active as OpenMetadata's community, and I don't think it will keep up.