
[WIP] Draft grouphashmetadata table #75980

Closed
wants to merge 2 commits

Conversation


@lobsterkatie lobsterkatie commented Aug 12, 2024

NOTE: This is an overview of what the eventual state of the table might look like, but is not meant to be merged. It's just a place to have conversation about the overall plan.


My first stab at creating the new table. General feedback welcome, but also, I have two general questions:

  • These are all of the fields that, as of now, are likely to be added in this first go-round. I'm sure there'll be changes, though. Do you think it's better to create the table with all the fields now (and possibly write migrations to change them later), or to add fields as I add the code related to them?

  • Are there any bits I'm missing, things besides just the fields?

@github-actions github-actions bot added the Scope: Backend Automatically applied to PRs that change backend components label Aug 12, 2024

github-actions bot commented Aug 12, 2024

This PR has a migration; here is the generated SQL for src/sentry/migrations/0748_create_grouphashmetadata_table.py:

--
-- Create model GroupHashMetadata
--
CREATE TABLE "sentry_grouphashmetadata" (
    "id" bigint NOT NULL PRIMARY KEY GENERATED BY DEFAULT AS IDENTITY,
    "date_added" timestamp with time zone NOT NULL,
    "grouping_method" integer NOT NULL CHECK ("grouping_method" >= 0),
    "grouping_config" varchar NOT NULL,
    "hash_basis" varchar NULL,
    "hashing_metadata" text NULL,
    "enhancements" text NULL,
    "fingerprint" text NULL,
    "seer_model" varchar NULL,
    "seer_date_sent" timestamp with time zone NULL,
    "seer_results" text NULL,
    "event_sent" varchar(32) NULL,
    "grouphash_id" bigint NOT NULL UNIQUE,
    "secondary_hash_match_id" bigint NULL
);
ALTER TABLE "sentry_grouphashmetadata" ADD CONSTRAINT "sentry_grouphashmeta_grouphash_id_c47122d9_fk_sentry_gr" FOREIGN KEY ("grouphash_id") REFERENCES "sentry_grouphash" ("id") DEFERRABLE INITIALLY DEFERRED NOT VALID;
ALTER TABLE "sentry_grouphashmetadata" VALIDATE CONSTRAINT "sentry_grouphashmeta_grouphash_id_c47122d9_fk_sentry_gr";
ALTER TABLE "sentry_grouphashmetadata" ADD CONSTRAINT "sentry_grouphashmeta_secondary_hash_match_f564cbf0_fk_sentry_gr" FOREIGN KEY ("secondary_hash_match_id") REFERENCES "sentry_grouphash" ("id") DEFERRABLE INITIALLY DEFERRED NOT VALID;
ALTER TABLE "sentry_grouphashmetadata" VALIDATE CONSTRAINT "sentry_grouphashmeta_secondary_hash_match_f564cbf0_fk_sentry_gr";
CREATE INDEX CONCURRENTLY "sentry_grouphashmetadata_secondary_hash_match_id_f564cbf0" ON "sentry_grouphashmetadata" ("secondary_hash_match_id");


codecov bot commented Aug 12, 2024

Test Failures Detected: Due to failing tests, we cannot provide coverage reports at this time.

❌ Failed Test Results:

Completed 21748 tests with 5 failed, 21542 passed and 201 skipped.

View the full list of failed tests

pytest

  • Class name: tests.sentry.backup.test_comparators
    Test name: test_default_comparators
    Flags:
    • backend

    .../sentry/backup/test_comparators.py:2162: in test_default_comparators
        insta_snapshot(serialized)
    E   Failed: Snapshot .../snapshots/test_comparators/test_default_comparators.pysnap changed!
    E
    E   Re-run pytest with SENTRY_SNAPSHOTS_WRITEBACK=new and then use 'make review-python-snapshots' to review.
    E   Or: Use SENTRY_SNAPSHOTS_WRITEBACK=1 to update snapshots directly.
    E
    E   ---
    E   +++
    E   @@ -569,6 +569,12 @@
    E        - group_tombstone_id
    E        - project
    E      model_name: sentry.grouphash
    E   +- comparators:
    E   +  - class: ForeignKeyComparator
    E   +    fields:
    E   +    - grouphash
    E   +    - secondary_hash_match
    E   +  model_name: sentry.grouphashmetadata
    E    - comparators:
    E      - class: ForeignKeyComparator
    E        fields:
  • Class name: tests.sentry.backup.test_dependencies
    Test name: test_detailed
    Flags:
    • backend

    .../sentry/backup/test_dependencies.py:54: in test_detailed
        assert_model_dependencies(expect, actual)
    .../sentry/backup/test_dependencies.py:42: in assert_model_dependencies
        raise AssertionError(
    E   AssertionError: Model dependency graph does not match fixture. This means that you have changed the model dependency graph in some load bearing way. If you are seeing this in CI, and the dependency changes are intentional, please run `bin/generate-model-dependency-fixtures` and re-upload:
    E
    E   ---
    E   +++
    E   @@ -2306,6 +2306,33 @@
    E            ]
    E        ]
    E    },
    E   +"sentry.grouphashmetadata": {
    E   +    "dangling": false,
    E   +    "foreign_keys": {
    E   +        "grouphash": {
    E   +            "kind": "FlexibleForeignKey",
    E   +            "model": "sentry.grouphash",
    E   +            "nullable": false
    E   +        },
    E   +        "secondary_hash_match": {
    E   +            "kind": "FlexibleForeignKey",
    E   +            "model": "sentry.grouphash",
    E   +            "nullable": true
    E   +        }
    E   +    },
    E   +    "model": "sentry.grouphashmetadata",
    E   +    "relocation_dependencies": [],
    E   +    "relocation_scope": "Excluded",
    E   +    "silos": [
    E   +        "Region"
    E   +    ],
    E   +    "table_name": "sentry_grouphashmetadata",
    E   +    "uniques": [
    E   +        [
    E   +            "grouphash"
    E   +        ]
    E   +    ]
    E   +},
    E    "sentry.grouphistory": {
    E        "dangling": false,
    E        "foreign_keys": {
  • Class name: tests.sentry.backup.test_dependencies
    Test name: test_flat
    Flags:
    • backend

    .../sentry/backup/test_dependencies.py:63: in test_flat
        assert_model_dependencies(expect, actual)
    .../sentry/backup/test_dependencies.py:42: in assert_model_dependencies
        raise AssertionError(
    E   AssertionError: Model dependency graph does not match fixture. This means that you have changed the model dependency graph in some load bearing way. If you are seeing this in CI, and the dependency changes are intentional, please run `bin/generate-model-dependency-fixtures` and re-upload:
    E
    E   ---
    E   +++
    E   @@ -316,6 +316,9 @@
    E        "sentry.grouptombstone",
    E        "sentry.project"
    E    ],
    E   +"sentry.grouphashmetadata": [
    E   +    "sentry.grouphash"
    E   +],
    E    "sentry.grouphistory": [
    E        "sentry.group",
    E        "sentry.organization",
  • Class name: tests.sentry.backup.test_dependencies
    Test name: test_sorted
    Flags:
    • backend

    .../sentry/backup/test_dependencies.py:72: in test_sorted
        assert_model_dependencies(expect, actual)
    .../sentry/backup/test_dependencies.py:42: in assert_model_dependencies
        raise AssertionError(
    E   AssertionError: Model dependency graph does not match fixture. This means that you have changed the model dependency graph in some load bearing way. If you are seeing this in CI, and the dependency changes are intentional, please run `bin/generate-model-dependency-fixtures` and re-upload:
    E
    E   ---
    E   +++
    E   @@ -217,6 +217,7 @@
    E        "sentry.sentryappinstallationtoken",
    E        "sentry.sentryappinstallationforprovider",
    E        "sentry.incident",
    E   +    "sentry.grouphashmetadata",
    E        "sentry.dashboardwidgetqueryondemand",
    E        "sentry.alertruletriggerexclusion",
    E        "sentry.alertruletriggeraction",
  • Class name: tests.sentry.backup.test_dependencies
    Test name: test_truncate
    Flags:
    • backend

    .../sentry/backup/test_dependencies.py:81: in test_truncate
        assert_model_dependencies(expect, actual)
    .../sentry/backup/test_dependencies.py:42: in assert_model_dependencies
        raise AssertionError(
    E   AssertionError: Model dependency graph does not match fixture. This means that you have changed the model dependency graph in some load bearing way. If you are seeing this in CI, and the dependency changes are intentional, please run `bin/generate-model-dependency-fixtures` and re-upload:
    E
    E   ---
    E   +++
    E   @@ -217,6 +217,7 @@
    E        "sentry_sentryappinstallationtoken",
    E        "sentry_sentryappinstallationforprovider",
    E        "sentry_incident",
    E   +    "sentry_grouphashmetadata",
    E        "sentry_dashboardwidgetqueryondemand",
    E        "sentry_alertruletriggerexclusion",
    E        "sentry_alertruletriggeraction",

@armenzg armenzg left a comment


This is pretty cool to see. Good start 👍🏻

Member:

[optional] You can add a module comment to help folks get the gist of what this module is about.

FULL_STACKTRACE = "stack"
APP_STACKTRACE = "app_stack"
FINGERPRINT = "fingerprint"
# TODO: Are there others?
Member:

Custom fingerprint.

Today I found this new way of grouping:
[image]

Member Author:

Yeah, I actually think I need to go through all of the strategies, variants, and components to figure out which of them can turn into an over-arching reason for grouping. Will definitely take some more research, but I'll cross that bridge when I come to it.

# Most recent config to produce this hash
# TODO: do we need first, also?
# TODO: Is there a way to get choices for this from grouping.strategies.configurations.CONFIGURATIONS?
grouping_config = models.CharField()
Member:

Beware: I'm handwaving here!

Maybe something like this:

from foo import CONFIGURATIONS

class ProjectConfig(models.TextChoices):
    CONFIG_0 = CONFIGURATIONS[0]
    CONFIG_1 = CONFIGURATIONS[1]
    ...
    CONFIG_N = CONFIGURATIONS[N]

Also, we should add an assertion for the size of it somewhere.
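To make the handwaved sketch above concrete, here is one way to derive the choices from the config registry, plus the size guard armenzg mentions. This is illustrative only; `CONFIGURATIONS` here is a stand-in dict, not the real registry in `grouping.strategies.configurations`.

```python
# Stand-in for sentry.grouping.strategies.configurations.CONFIGURATIONS,
# which maps config ids to strategy configuration classes.
CONFIGURATIONS = {
    "legacy:2019-03-12": object(),
    "newstyle:2023-01-11": object(),
}

# Derive (value, label) choice tuples from the registry keys instead of
# hand-writing CONFIG_0, CONFIG_1, ... entries one by one.
GROUPING_CONFIG_CHOICES = [(config_id, config_id) for config_id in CONFIGURATIONS]

# Guard against the registry and the choices drifting apart.
assert len(GROUPING_CONFIG_CHOICES) == len(CONFIGURATIONS)
```

Note that computing the choices at import time like this is exactly what raises the circular-import concern discussed in the reply below, since the model module would need to import from the grouping code.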

Member Author:

That makes sense. I think the big issue here is going to be circular dependencies - lots of the grouping modules import from models, and in this model I'd have to be importing from one of those modules. Even if it technically works right now (because whatever I need doesn't happen to come from modules which import from models), I'd be nervous to rely on that continuing to be the case. So move it to a neutral location? But then it's populated by grouping code...


@region_silo_model
class GroupHashMetadata(Model):
    __relocation_scope__ = RelocationScope.Excluded
Member:

I do not understand the implications of using this.

Member Author:

This is part of that backup/relocation project Alex has been working on. When you migrate an org from one region to another, or from on-prem to SaaS, not all of your data comes along for the ride; in particular, none of your event/group data does.

# Eg. message, in-app stacktrace, full stacktrace, fingerprint, etc
hash_basis = models.CharField(choices=HashBasis, null=True)
# Eg. if the message was parameterized, what type of parameterization? etc.
hashing_metadata: models.Field[dict[str, Any] | None, dict[str, Any] | None] = JSONField(
Member:

Do you have an example to help me understand?

Member Author:

There are examples of all of these in the Notion doc, but one possibility for this field might be:

{
  "message_parametrized": true,
  "parameterization_types": ["int", "date"]
}
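As a sketch of how such a payload might be given a documented shape (names here are hypothetical, not Sentry's actual types), a `TypedDict` could describe the expected keys while the column still stores plain JSON:

```python
from typing import TypedDict


class MessageHashingMetadata(TypedDict, total=False):
    # Whether the message was run through parameterization before hashing
    message_parametrized: bool
    # Which parameterization rules matched, e.g. ["int", "date"]
    parameterization_types: list[str]


# The example payload from the comment above, as a typed dict
hashing_metadata: MessageHashingMetadata = {
    "message_parametrized": True,
    "parameterization_types": ["int", "date"],
}
```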

)
# Built-in and custom stacktrace rules used
enhancements: models.Field[dict[str, Any] | None, dict[str, Any] | None] = JSONField(null=True)
# Client fingerprint, value, matched rule, rule source
Member:

Do you have examples of each property you are referring to?

Member Author:

Yup - in the Notion doc linked in the above comment.

# Client fingerprint, value, matched rule, rule source
fingerprint: models.Field[dict[str, Any] | None, dict[str, Any] | None] = JSONField(null=True)
# If this hash was associated to its group via a grouping config transition, what secondary hash
# made the link?
Member:

I'm having trouble picturing when this would happen.

Member Author:

In a config transition, if the secondary config calculates hash A and the new config calculates hash B, we use the group associated with hash A for hash B as well. This field would exist on hash B's metadata, and would point to hash A.
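A minimal sketch of that transition logic, with illustrative names rather than Sentry's actual ingest code (grouphash records are modeled here as plain dicts with a "group_id" key):

```python
def resolve_group(primary, secondary):
    """Decide which group a new-config hash (primary) should use.

    Returns (group_id, secondary_match): if the group came from the old
    config's hash, secondary_match is that hash's record, and would be
    stored as secondary_hash_match on the primary hash's metadata.
    """
    if primary and primary.get("group_id") is not None:
        # Hash B already has a group of its own; no transition link needed.
        return primary["group_id"], None
    if secondary and secondary.get("group_id") is not None:
        # Reuse hash A's group for hash B, and remember hash A as the link.
        return secondary["group_id"], secondary
    # Neither hash is known; a new group would be created elsewhere.
    return None, None
```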

)

# SEER
seer_model = models.CharField(null=True)
Member:

What could the values be?

Member Author:

Right now we're going with vX-type versioning:

SEER_SIMILARITY_MODEL_VERSION = "v0"

# This will be different than `date_added` if we send it to Seer as part of a backfill
seer_date_sent = models.DateTimeField(null=True)
# TODO Figure out how to best reuse the types in seer.similarity.types
seer_results: models.Field[list[Any] | None, list[Any] | None] = JSONField(
Member:

What can the seer results be?

Member Author:

It'll be a list of dict versions of SeerSimilarIssueData instances. For now there'll only ever be one item in the list, since we're only planning to store the single best match (if any), but here and elsewhere in the code it's a list just in case that changes in the future.

class SeerSimilarIssueData:
    stacktrace_distance: float
    message_distance: float
    should_group: bool
    parent_group_id: int
    parent_hash: str
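A sketch of how instances of that class could be serialized into the one-element list stored in `seer_results`, assuming the class is a dataclass (the values here are made up for illustration):

```python
from dataclasses import asdict, dataclass


@dataclass
class SeerSimilarIssueData:
    stacktrace_distance: float
    message_distance: float
    should_group: bool
    parent_group_id: int
    parent_hash: str


# Only the single best match is stored for now, but the field holds a list
# so that more matches can be added later without a schema change.
best_match = SeerSimilarIssueData(0.01, 0.05, True, 123, "abc123")
seer_results = [asdict(best_match)]
```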

@lobsterkatie lobsterkatie changed the title [WIP] Create grouphashmetadata table [WIP] Draft grouphashmetadata table Aug 13, 2024

getsantry bot commented Sep 4, 2024

This pull request has gone three weeks without activity. In another week, I will close it.

But! If you comment or otherwise update it, I will reset the clock, and if you add the label WIP, I will leave it alone unless WIP is removed ... forever!


"A weed is but an unloved flower." ― Ella Wheeler Wilcox 🥀

@getsantry getsantry bot added the Stale label Sep 4, 2024
@getsantry getsantry bot closed this Sep 11, 2024
@lobsterkatie lobsterkatie deleted the kmclb-grouphashmetadata-rough-draft branch September 20, 2024 05:14
@github-actions github-actions bot locked and limited conversation to collaborators Oct 5, 2024