chore: update flamingo rotation manual (ethereum#1376)
morph-dev committed Aug 20, 2024
1 parent f3e19fb commit b4a81e3
Showing 2 changed files with 38 additions and 20 deletions.
2 changes: 1 addition & 1 deletion book/src/developers/contributing/releases/deployment.md
@@ -48,7 +48,7 @@ This step directs Ansible to use the current master version of trin. Read [about

### Run ansible
- Check monitoring tools to understand network health before deploying, so you can compare it against the post-deployment state, e.g.:
- [Glados](http://glados.ethportal.net/content/)
- [Glados](https://glados.ethdevops.io/)
- [Grafana](https://trin-bench.ethdevops.io/d/e23mBdEVk/trin-metrics?orgId=1)
- Activate the virtual environment in the cluster repo: `. venv/bin/activate`
- Make sure you've pulled the latest master branch of the deployment scripts, to include any recent changes: `git pull origin master`
56 changes: 37 additions & 19 deletions book/src/developers/contributing/rotation/index.md
@@ -5,6 +5,7 @@ Recently, we started a new process. It is evolving, so this doc will evolve with
Every Monday, a full-time member of trin rotates into a maintenance position. The person in that role is called the "Flamingo" ([truly amazing animals](https://www.reddit.com/r/Flamingo/comments/odzxry/are_flamingos_extremophiles/)).

Some responsibilities for the Flamingo are:

- monitor glados & hive to watch for regressions; investigate & resolve them
- answer questions from other Portal teams
- run releases & deployments
@@ -20,31 +21,54 @@ There are some daily tasks, and some transition tasks: Kickoff on Monday & Hando

Unlike your code-heavy weeks, this can be an interruption-heavy week. Although it's usually good practice not to get distracted by inbound messages while coding, this week is the opposite: as soon as you see new messages (like on Discord), jump in.

Start with Kickoff on Monday morning:

### Kickoff

At the end of the previous week, you will get the [Flamingo notes](https://notes.ethereum.org/YAiNsmc8SSq05TwYdk-8Eg) from the previous Flamingo. They will include incidents from the prior weeks and any in-progress tasks. This will be crucial to get a quick understanding of the state of the network and any in-progress tasks to resume.

Read through the notes, and then generate the new checklist for the week, by [creating a note from this template](https://notes.ethereum.org/?nav=overview&template=b35733cd-b374-4b79-bc57-f2bb58ee651e).
Have a discussion with the previous Flamingo on any ongoing issues or in-progress tasks. This is crucial to get a quick understanding of the state of the network and any in-progress tasks to resume.

Link the generated checklist into the Flamingo notes for your week. Make sure your status is "online" in Discord. Make sure you're tagged under the `trin-flamingo` role. Put on your favorite pink shirt. Watch a [silly flamingo video](https://www.youtube.com/watch?v=gWNWtbPEWw0). Fly.
Make sure your status is "online" in Discord. Make sure you're tagged under the `trin-flamingo` role (ping a Discord admin). Put on your favorite pink shirt. Watch a [silly flamingo video](https://www.youtube.com/watch?v=gWNWtbPEWw0). Fly.

### First

Read through the "Setup" section of the [Deployment Instructions](../releases/deployment.md) and follow the steps to make sure that your PGP and SSH keys are in place and ready for a deployment.
Read through the "Setup" section of the [Deployment Instructions](../releases/deployment.md) and follow the steps to make sure that your PGP and SSH keys are in place and ready for a deployment.

### Daily

#### Checklist

Every day, go down the checklist for that day of the week, resolve items and check them off. If you think of new daily things to add to the checklist, comment on the template to suggest it.
Every day, go through the daily items and the items for that day of the week. If you think of new daily things to add to the checklist, create a PR.

- **Daily**
  - Read Discord, especially for help requests or signs of network issues
  - Monitor [Glados](https://glados.ethdevops.io/) changes, and correlate with releases (trin, glados, other clients?)
  - Monitor [portal-hive](https://portal-hive.ethdevops.io/) changes
    - Check the dates: did all test suites run on the previous cycle?
    - For each suite, did the expected number of tests run?
    - Did clients start failing any new tests?
      - If trin is failing, create a rotation issue to pursue it
      - If another client is failing, notify that client's team
  - Look for [inspiration for Flamingo projects](../rotation/index.md#maintenance-inspiration)

- **Monday** - kickoff
  - Announce that you are on rotation in Discord
  - Give yourself the Discord role `@trin-flamingo`
  - Discuss ongoing issues and in-progress tasks with the previous Flamingo
  - Give a weekly summary update in the all-Portal call
  - Pick a day of the week for deployment (usually Wednesday or Thursday), and discuss it in `#trin`

- **Wednesday or Thursday** - release
  - [Release](../releases/release_checklist.md) and [deploy](../releases/deployment.md) a new version of trin

- **Friday** - wrap-up
  - Haven't deployed yet? Oops, a bit late. Get it done as early as possible.
  - Comment on the checklist to add/update/delete anything?
  - Identify the next Flamingo and prepare for [handoff](#handoff)

As long as there aren't any major incidents, you should finish the checklist with plenty of time left in your day. See [Maintenance Inspiration](#maintenance-inspiration) for what to do for the rest of the day.

#### Maintenance Inspiration

When you get to the end of your checklist, here are ideas for what to work on next:

- Respond to [GitHub Participating Notifications](https://github.com/notifications?query=reason%3Aparticipating)
- Review PRs that have been stuck for >24 hours
- Find a [Flamingo Issue](https://github.com/ethereum/trin/issues?q=is%3Aopen+is%3Aissue+label%3Aflamingo) that seems promising, and assign it to yourself (and add it to the project dashboard)
@@ -66,22 +90,15 @@ When you get to the end of your checklist, here are ideas for what to work on ne
- Add new integration tools (e.g., portal-hive -> Discord bot?)
- Scratch your own itch! What do you wish was easier? Now is the time to build it.

Be sure to update the week's notes as you go, adding: notable incidents, summaries of what you are working on, and ideas for future work.

### Handoff

Before handing off the notes on Friday, be sure to read through the notes to cleanly summarize the week's activities. Link to any helpful issues, and delete any text that is only meaningful to you.
Look back at your week as Flamingo and summarize it in the notes if needed. Prepare for a kickoff discussion with the next Flamingo, and update them on your work from the past week.

Double check the [Flamingo schedule](https://notes.ethereum.org/@njgheorghita/r1angO2lT) and make sure you're available for your next rotation. If not, please switch with somebody asap.

Think through: what kinds of things do you wish were on the notes that you got on Monday?
Especially while we continue to develop the procedure, try to be available the following Monday to help the subsequent Flamingo transition in, and get started.

After a while, everyone should have the Flamingo Notes link. Sending the link anyway helps accomplish a few things:
1. Signal to the next Flamingo that you're done writing up the notes
2. Give the next Flamingo a reminder that their turn is coming up
3. Act as a failsafe to make sure we didn't forget to pick a Flamingo for the following week
Think through: what kinds of things do you think should be on the checklist?

Especially while we continue to develop the procedure, try to be available the following Monday to help the subsequent Flamingo transition in, and get started.
Double check the [Flamingo schedule](https://notes.ethereum.org/@njgheorghita/r1angO2lT) and make sure you're available for your next rotation. If not, please switch with somebody asap.

## Being the Announcer

@@ -90,6 +107,7 @@ The Flamingo is often the first to notice downtime. Whether it's something that
## *Not* Flamingo

If you're not Flamingo currently, there are some situations where it is important to ping the Flamingo in Discord with `@flamingo-trin`. For example:

- You notice the network behaving abnormally
- You notice any of our tooling behaving abnormally (GitHub, Circle, Glados, etc.)
- You have a PR that has had no review activity more than 24 hours after you asked for a review