Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat/fmd-206 add ingestion timing #213

Merged
merged 5 commits into from
Aug 8, 2024
Merged

Conversation

tom-webber
Copy link
Contributor

  • feat: add calls to time unix command to datahub ingest calls
  • fix: Notify on failure GHA steps referencing incorrect input var for environment
  • chore: type checker linting (fixes pylance type checking errors)
  • feat: add st_time decorator for timing function runs
  • chore: use non-depricated suggested methods from datahub utils

@tom-webber tom-webber force-pushed the feat/fmd-201-add-ingestion-timing branch from 5a82e82 to 6e484f0 Compare July 23, 2024 14:33
@tom-webber tom-webber marked this pull request as ready for review July 23, 2024 14:34
@tom-webber tom-webber linked an issue Jul 24, 2024 that may be closed by this pull request
@tom-webber tom-webber changed the title feat/fmd-201 add ingestion timing feat/fmd-206 add ingestion timing Jul 24, 2024
ingestion/utils.py Outdated Show resolved Hide resolved
Copy link
Contributor

@MatMoore MatMoore left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good overall, I'm just a bit confused about a few bits - see inline comments

@MatMoore MatMoore force-pushed the feat/fmd-201-add-ingestion-timing branch from ccba08e to 4eda154 Compare August 6, 2024 14:47
hjribeiro-moj
hjribeiro-moj previously approved these changes Aug 7, 2024
Copy link
Contributor

@hjribeiro-moj hjribeiro-moj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

- add calls to `time` unix command to `datahub ingest` calls
- add decorators for timing function/iterator runs
- use the timers for
  - create_cadet_databases_source
  - justice_data_source
  - `AssignCadetDatabases transformer

Co-authored-by: Mat Moore <mat.moore@noreply.github.com>
@MatMoore
Copy link
Contributor

MatMoore commented Aug 8, 2024

This is working on dev, and output confirms our hypothesis that AssignCadetDomains is what is slowing things down. Pretty sure we can significantly optimise this!

  • "create cadet domains" 2m 48s
    • __init__ 0s
    • get_workunits 2m 37s
      • get_cadet_mainfest 2m 37s
      • _get_databases_with_domains_and_display_tags 0s
  • "push metadata to datahub" 3h 11m 32s
    • __init__ 0s
    • get_workunits 2m 27s
      • _get_cadet_manifest 2m 36s
      • _get_domain_mapping 0s
    • AssignCadetDomains transformer 2h 32m 10s
    • AssignCadetDatabases 2m 42s
      • get_cadet_manifest 2m 41s
      • _get_table_database_mappings 0s

@MatMoore
Copy link
Contributor

MatMoore commented Aug 8, 2024

@hjribeiro-moj would you mind rereviewing? I had to push up a small fix yesterday

Copy link
Contributor

@mitchdawson1982 mitchdawson1982 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@MatMoore MatMoore merged commit ffcbfe9 into main Aug 8, 2024
4 checks passed
@MatMoore MatMoore deleted the feat/fmd-201-add-ingestion-timing branch August 8, 2024 15:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add logging to ingestion so we know how long each process takes
4 participants