torngit.github.list_repos() uses GitHub's REST API to enumerate the repos visible to a given user (or to the currently signed-in one). The REST API responds with pages of 100 repos, each with all of its details filled out (example). If we convert these requests to GitHub's GraphQL API, we can request only the fields we need and may see a performance boost.
GitHub has a GraphQL Explorer app that you can use to draft your queries: https://docs.github.com/en/graphql/overview/explorer
An example query would look something like:
query {
  viewer {
    login
    id
    repositories(
      ownerAffiliations: [OWNER, COLLABORATOR, ORGANIZATION_MEMBER]
      first: 10
      after: "Y3Vyc29yOnYyOpHOAXbfGQ=="
    ) {
      pageInfo {
        endCursor
        hasNextPage
      }
      edges {
        cursor
        node {
          owner {
            login
            id
          }
          id
          name
        }
      }
    }
  }
}
first is the page size, and after is the "cursor" marking the end of the previous page. pageInfo gives you endCursor, the cursor for the end of the current page, and hasNextPage, which says whether another page follows.
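The pagination described above can be sketched in Python as a loop that feeds each page's endCursor into the next request. This is a minimal sketch, not how torngit does it: run_query is a hypothetical callable standing in for whatever HTTP client sends the query, and the query is the example above with first and after turned into variables.

```python
from typing import Any, Callable, Dict, Iterator, Optional

# The example query, with page size and cursor passed as GraphQL variables.
QUERY = """
query($first: Int!, $after: String) {
  viewer {
    repositories(
      ownerAffiliations: [OWNER, COLLABORATOR, ORGANIZATION_MEMBER]
      first: $first
      after: $after
    ) {
      pageInfo { endCursor hasNextPage }
      edges { node { owner { login id } id name } }
    }
  }
}
"""

# Hypothetical transport (an assumption): takes (query, variables) and
# returns the decoded JSON response body.
RunQuery = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def iter_repos(run_query: RunQuery, page_size: int = 100) -> Iterator[Dict[str, Any]]:
    """Yield repo nodes page by page until hasNextPage is false."""
    cursor: Optional[str] = None  # a null cursor requests the first page
    while True:
        body = run_query(QUERY, {"first": page_size, "after": cursor})
        repos = body["data"]["viewer"]["repositories"]
        for edge in repos["edges"]:
            yield edge["node"]
        if not repos["pageInfo"]["hasNextPage"]:
            return
        # The next request cannot be built until this page has arrived.
        cursor = repos["pageInfo"]["endCursor"]
```

Each iteration needs the previous response before it can send the next request, so the pages are necessarily fetched one after another.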
ownerAffiliations tells GitHub which repos to fetch. We want to test and understand any differences between what this returns and what the REST API returns.
Hopefully, this will be a significant performance improvement for users who, across all their orgs, are syncing hundreds of repos.
There's a timer wrapping the call to list_repos() in sync_repos. It is viewable on this Elastic dashboard and is available in Grafana under the name worker_SyncReposTask_sync_repos_list_repos. Watch this metric after deploying to see whether the median, P75, or P95 gets lower or flatter. If so, look for more opportunities!
If networking is not the bottleneck here, or if the bloated REST API responses actually compress very well, then we won't see much of an impact. But it's worth investigating!
i compared three approaches using a token for my account:
1. current behavior
2. pre-fetching the total + concurrently downloading
3. current behavior but using the GraphQL API
with 140 repos to fetch, the GraphQL API performed about as well as the current behavior (tried page sizes of 10 and 100). the pre-fetching approach, which can't be done with the GraphQL API, performed significantly better.
the REST API uses numbered pages, so if you know how many repos you have to fetch you can precompute the URL for each page and download them all concurrently. the GraphQL API uses "cursors" for pagination, so each page's query depends on data from the previous page (its endCursor field) and the pages can't all be downloaded concurrently.
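a rough sketch of the prefetch/concurrent idea, to make the contrast concrete. this is not the code from the comparison: fetch_page is a hypothetical async callable that downloads one page, total_repos is assumed to come from whatever earlier request pre-fetches the total, and the only GitHub-specific parts are the per_page and page query parameters.

```python
import asyncio
import math
from typing import Any, Awaitable, Callable, Dict, List

PER_PAGE = 100  # REST page size mentioned in the issue description

# Hypothetical fetcher (an assumption): given a URL, returns that page's
# decoded JSON list of repos.
FetchPage = Callable[[str], Awaitable[List[Dict[str, Any]]]]


def page_urls(base_url: str, total_repos: int, per_page: int = PER_PAGE) -> List[str]:
    """Precompute one URL per numbered page; no page depends on another."""
    page_count = math.ceil(total_repos / per_page)
    return [f"{base_url}?per_page={per_page}&page={n}" for n in range(1, page_count + 1)]


async def fetch_all_repos(
    fetch_page: FetchPage, base_url: str, total_repos: int
) -> List[Dict[str, Any]]:
    """Download every page concurrently and flatten the results in page order."""
    urls = page_urls(base_url, total_repos)
    # asyncio.gather returns results in the same order as its arguments.
    pages = await asyncio.gather(*(fetch_page(url) for url in urls))
    return [repo for page in pages for repo in page]
```

with 140 repos and a page size of 100, page_urls produces two URLs (page=1 and page=2), and both requests can be in flight at the same time.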
closing this issue. the prefetch/concurrent approach will probably be done in codecov/shared#24 when i'm back from vacation