Generate a weekly report of activity #24672

Closed
atoulme opened this issue Jul 29, 2023 · 21 comments

@atoulme
Contributor

atoulme commented Jul 29, 2023

I'd love to automate some of the begrudging work of looking after the project.

  • Once a week on Wednesday, open an issue for review by maintainers with the following data:
    • Number of new issues (with trend compared to week prior)
    • Number of issues that need triage (with trend compared to week prior)
    • Number of issues in ready to merge
    • Number of sponsor needed issues (with trend compared to week prior)
      • New sponsor needed issues

We use this issue as a report during the SIG call, and close it in the call.

@atoulme
Contributor Author

atoulme commented Jul 29, 2023

This would run as a GitHub Action. It might make sense to store historical data somewhere. Or just have a way to piggyback on the last closed issue to get the prior numbers.
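
A minimal sketch of the "piggyback on the last closed issue" idea. It assumes the input is an array shaped like the GitHub REST "list repository issues" response (`state`, `created_at`, `labels`, and a `pull_request` key on pull requests) and that report issues carry a label, here `report`. The helper name `findPreviousReport` is hypothetical and is not taken from any actual implementation.

```javascript
// Hypothetical helper: return the most recently created closed report issue, or null.
// `issues` is assumed to look like the REST "list repository issues" payload.
function findPreviousReport(issues, reportLabel = "report") {
  const candidates = issues
    // The issues endpoint also returns pull requests; those carry a `pull_request` key.
    .filter((issue) => !issue.pull_request)
    .filter((issue) => issue.state === "closed")
    // Labels may be plain strings or objects with a `name` field.
    .filter((issue) =>
      (issue.labels || []).some(
        (label) => (typeof label === "string" ? label : label.name) === reportLabel
      )
    )
    // Newest first, by creation time.
    .sort((a, b) => new Date(b.created_at) - new Date(a.created_at));
  return candidates.length > 0 ? candidates[0] : null;
}
```

Reading the prior numbers from that issue avoids a separate data store, at the cost of having to parse them back out of the issue body.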

@atoulme
Contributor Author

atoulme commented Jul 31, 2023

Additional information to report: the unmaintained components and deprecated components.

@kevinslin
Contributor

Is anyone working on this? Happy to take this on if not

@atoulme
Contributor Author

atoulme commented Aug 15, 2023

Go for it @kevinslin :)

@kevinslin
Contributor

On it :)

Any preferred output? (eg. email, csv, markdown table added to the repo)

@atoulme
Contributor Author

atoulme commented Aug 16, 2023

The description of the issue says "open an issue for review" - however you want to present the info in the issue is up to you.

@kevinslin
Contributor

Got an initial version of the OTEL weekly report going.

Currently the implementation is a single yaml file which uses https://github.com/actions/github-script

Note that the following information is currently missing as I wanted to check in before implementing:

  1. delta for last week
  2. maintained components and deprecated components

Questions:

  • preference on tracking issue deltas
  • format for visualizing maintained/deprecated components
  • for maintained/deprecated components, which components we are tracking

1. Delta For Last Week

Some ways of getting the delta:

  • a) writing stats to a file in the repo and reading from file to get previous values

  • b) using cache action to cache previous values

  • c) querying from github issues previous values

  • a) writing stats to a file

    • pros:
      • simple
    • cons:
      • side effects (require a commit)
      • will require permissions to push to repo or a pull request
  • b) using cache action

    • pros:
      • simple & no side effects
    • cons:
      • fragile, we're relying on cache as a database
  • c) querying from github issues

    • pros:
      • no side effects
    • cons:
      • more complicated logic

Recommendations: c.

Non-fragile and side effect free.
The plan:

- add a `JSON Data` section to each issue that has data in json format
- in gh action, fetch previous week issue
    - optional but recommended: weekly report issues should have the custom label `report` to filter down results
- use regex to find and parse the json data
- calculate deltas

Example of JSON Payload at the bottom of the issue

## Issues
...
 
## JSON Data
<!-- MACHINE GENERATED: DO NOT EDIT -->
<details>
<summary>Expand</summary>
<pre>
{
  ...
}
</pre>
</details>
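
The plan above (embed a JSON payload, find it again with a regex, compute deltas) can be sketched roughly as follows. This is an illustration under assumptions, not the code from the eventual PR: the function names and the stat keys (`newIssues`, `needsTriage`, `sponsorNeeded`) are hypothetical, and the markup mirrors the example section shown above.

```javascript
const MARKER = "<!-- MACHINE GENERATED: DO NOT EDIT -->";

// Render the "JSON Data" section that is appended to the report issue body.
function renderJsonSection(data) {
  return [
    "## JSON Data",
    MARKER,
    "<details>",
    "<summary>Expand</summary>",
    "<pre>",
    JSON.stringify(data, null, 2),
    "</pre>",
    "</details>",
  ].join("\n");
}

// Find the payload inside a previous issue body; returns null if absent or invalid.
function parseJsonSection(body) {
  const match = body.match(/## JSON Data[\s\S]*?<pre>\s*([\s\S]*?)\s*<\/pre>/);
  if (!match) return null;
  try {
    return JSON.parse(match[1]);
  } catch {
    return null;
  }
}

// Delta per key: current minus previous, or null when there is no prior number.
function computeDeltas(current, previous) {
  const deltas = {};
  for (const [key, value] of Object.entries(current)) {
    const prior = previous && typeof previous[key] === "number" ? previous[key] : null;
    deltas[key] = prior === null ? null : value - prior;
  }
  return deltas;
}
```

Returning `null` for a missing prior value lets the first report, or a report that adds a new stat, render without a trend instead of failing.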

2. maintained components and deprecated components

For format, I was thinking of the following:

  • maintained components: X1 (Y1)
  • deprecated components: X2 (Y2)

NOTE: X* is number of components, Y* is change from last week

In terms of implementation, the plan is to crawl the repo and get status via metadata.yaml. The following components have a metadata.yaml file:

  • confmap/provider/s3provider/metadata.yaml
  • examples/demo/metadata.yaml
  • cmd/*
  • connector/*
  • exporter/*
  • extension/*
  • internal/*
  • pkg/*
  • processor/*
  • receiver/*

Questions:

  • Are we reporting on the status of everything? Or just the public facing components? (connector/exporter/extension/processor/receiver)
  • Do we care about a breakdown per component type?
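
As a rough illustration of the counting and the `X (Y)` format, the sketch below tallies components per stability level and renders one report line. It assumes each `metadata.yaml` has already been parsed into an object (for example with `js-yaml`) and that the status lives under `status.stability` as a map from level to signals. That shape and both function names are assumptions made for illustration.

```javascript
// Assumed parsed shape: { status: { stability: { <level>: [<signals>...] } } }.
// Counts each component once per stability level it declares.
function countByStability(metadataList) {
  const counts = {};
  for (const metadata of metadataList) {
    const stability = (metadata && metadata.status && metadata.status.stability) || {};
    for (const level of Object.keys(stability)) {
      counts[level] = (counts[level] || 0) + 1;
    }
  }
  return counts;
}

// Render "label: X (Y)" where Y is the signed change from the previous week.
function formatLine(label, count, previousCount) {
  const delta = count - previousCount;
  const sign = delta > 0 ? "+" : "";
  return `${label}: ${count} (${sign}${delta})`;
}
```

A breakdown per component type would need an extra grouping key, which is one reason the scoping questions above matter before settling on the output.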

@codeboten
Contributor

Is the data for this not already available in devstats? Curious if there's a way we can leverage what's already there: https://opentelemetry.devstats.cncf.io/dashboards

@kevinslin
Contributor

@codeboten the raw data for devstats is stored in a postgres database and queried via grafana: https://github.com/cncf/devstats-helm#architecture

since they store everything from gh-archive, i assume that the data is there somewhere in the postgres.
the issue with devstats is that unless you are publishing a new dashboard to https://opentelemetry.devstats.cncf.io/dashboards, I didn't find an accessible way to access that info from outside the hosted grafana

@codeboten
Contributor

Would it not be possible to produce a dashboard there that could include the same information? Not sure what the process to do that is, but it would be super useful to have that information there

@atoulme
Contributor Author

atoulme commented Aug 23, 2023

This is not a matter of data or access to data, it's a matter of process. My idea of this report is that we generate it every week, make comments on the issue to resolve or acknowledge the state of the project, and close it. This helps us reduce the overhead of having some of those conversations.

@codeboten
Contributor

@atoulme I see the benefit of reducing the overhead of bringing this information to light.

I guess I was thinking about this in aggregate, where we could see the overall trend in the project health if we had access to the data over a longer period of time, which seemed like it would be more easily done in a dashboard like devstats.

Was using a project to facilitate the discussions investigated as an alternative?

@kevinslin
Contributor

one other issue with using devstats is that we are also generating business specific stats based on metadata.yaml for components that are maintained/deprecated. I think devstats visualizes information based off the raw github data, but I'm not aware of it doing custom repo crawling for custom metadata

@atoulme
Contributor Author

atoulme commented Aug 24, 2023

I know nothing of those dashboards. I wasn't aware of the grafana instance before you mentioned it today :| It never came up in SIG meetings AFAICT. I don't know how to manage it and I'm just interested in getting topical information in a timely manner with a bias for action through an issue. I'm not sure how we'd use a project to get the same info.

Upon digging a bit, there are no dashboards related to labels. The reason is that labels are decoupled into a separate table:
https://github.com/cncf/devstats/blob/a61d46b04aae75ddddb073fc12bf701ad55554b7/docs/tables/gha_issues_labels.md

I can create a chart of occurrences of the event but in effect it tells me little. Maybe there's a better way to go about it, here is a chart of occurrences of the "needs triage" label:

https://opentelemetry.devstats.cncf.io/dashboard/new?orgId=1&from=1672560000000&to=1704095999999&editPanel=1

Pictures of the chart this year and last 7 days:
[Image: Screenshot 2023-08-23 at 18 01 57]
[Image: Screenshot 2023-08-23 at 18 01 35]

If we can instead have a cumulative count or keep track of it in some way, that'd work ok. Otherwise, I am lazy and like the directness of getting an issue with the info I need to query for on the morning of the SIG meeting.

@kevinslin
Contributor

submitted pr > #26111

@kevinslin
Contributor

the report for this week > kevinslin#19

@atoulme
Contributor Author

atoulme commented Aug 30, 2023

Kevin, can you change the style of the list to use GitHub markdown so that lists show as checklists?
- [ ] Issue blah
This might allow us to check issues off if we review them.
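
A small sketch of what that rendering could look like, assuming each issue object has `title` and `number` fields as in the GitHub REST API. The function name `renderChecklist` is hypothetical.

```javascript
// Render issues as a GitHub task list; "#<number>" is auto-linked by GitHub in issue bodies.
function renderChecklist(issues) {
  return issues
    .map((issue) => `- [ ] ${issue.title} (#${issue.number})`)
    .join("\n");
}
```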

@kevinslin
Contributor

updated issue format with gh style checkboxes: kevinslin#21

@kevinslin
Contributor

report for this week > kevinslin#22

@kevinslin
Contributor

PR has been reviewed and approved. ready to merge :)

report for this week > kevinslin#25

TylerHelmuth added a commit that referenced this issue Sep 29, 2023
**Description:** 

This PR creates a gh action that generates a weekly report on repo
statistics.
It delivers on the requirements specified in
#24672
You can see the sample output here:
kevinslin#16

**Link to tracking Issue:** #24672

**Testing:** 

Manual testing in fork:
https://github.com/kevinslin/opentelemetry-collector-contrib/actions
Example output:
kevinslin#17

**Documentation:** 


The architecture:
- we use `actions/github-script@v6` to make calls to the gh apis
- we require installing `js-yaml` in order to parse `metadata.yaml`
files in order to get components. this dependency is installed during
the github action run and not persisted

Some caveats about the logic:
- when this action runs, it looks back the previous 7 days and gets
issues created in that time period, normalizing times to UTC
- eg. if running this at 2023-08-25 17:35:00, it will scan issues from
the start of the same weekday one week earlier (2023-08-18 00:00:00Z) to
the beginning of the current day (2023-08-25 00:00:00Z)
- this action writes the json payload of the report inside the issue.
the payload is parsed by future reports to calculate deltas
- the report issue has a custom label: `report` - this is used so we can
properly filter previous issues when calculating deltas. the [github
issues
api](https://docs.github.com/en/rest/issues/issues?apiVersion=2022-11-28#list-repository-issues)
only does filtering based on labels and `since` date

This action currently runs every Tuesday at 1AM UTC

---------

Co-authored-by: Antoine Toulme <antoine@toulme.name>
Co-authored-by: Tyler Helmuth <12352919+TylerHelmuth@users.noreply.github.com>
jmsnll pushed a commit to jmsnll/opentelemetry-collector-contrib that referenced this issue Nov 12, 2023
Contributor

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.
