Automating initialization of on-demand self-hosted CNCF CIL runner #39

Merged 1 commit on Mar 15, 2022
133 changes: 133 additions & 0 deletions .github/workflows/configurable-benchmark-test-self-hosted.yaml
@@ -0,0 +1,133 @@
name: Configurable Benchmark Test on Self-hosted Runner
on:
  workflow_dispatch:
    inputs:
      profile_name:
        description: "performance profile to use"
        required: false
      profile_filename:
        description: "test configuration file"
        required: false
      service_mesh:
        type: choice
        required: false
        description: "service mesh being tested"
        options:
          - istio
          - linkerd
      load_generator:
        type: choice
        required: false
        description: "load generator to run tests with"
        options:
          - fortio
          - wrk2
          - nighthawk

jobs:
  start-runner:
    name: Start self-hosted CNCF CIL runner
    runs-on: ubuntu-latest
    if: ${{ github.event_name == 'workflow_dispatch' }}
    outputs:
      hostname: ${{ steps.start-cil-runner.outputs.hostname }}
      label: ${{ steps.start-cil-runner.outputs.label }}
      device_id: ${{ steps.start-cil-runner.outputs.device_id }}
    steps:
      - name: Checkout Code
        uses: actions/checkout@v2

      - name: Configure CNCF CIL credentials
        run: |
          chmod +x .github/workflows/scripts/self-hosted-credentails.sh
          .github/workflows/scripts/self-hosted-credentails.sh ${{ secrets.CNCF_CIL_TOKEN }}
        shell: bash

      - name: Create registration token for CNCF CIL runner
        id: getRegToken
        run: |
          reg_token=$(curl -s -X POST -H "Accept: application/vnd.github.v3+json" \
            -H 'Authorization: token ${{ secrets.PAT }}' \
            https://api.github.com/repos/${{github.repository}}/actions/runners/registration-token | jq -r .token)
          echo REG_TOKEN=$reg_token >> $GITHUB_ENV
          echo REPOSITORY=${{github.repository}} >> $GITHUB_ENV
        shell: bash

      - name: Start CNCF CIL runner
        id: start-cil-runner
        run: |
          chmod +x .github/workflows/scripts/start-cil-runner.sh
          .github/workflows/scripts/start-cil-runner.sh ${{ secrets.cncf_cil_token }} ${{ github.event.inputs.service_mesh }}-${{ github.event.inputs.load_generator }}
        shell: bash

  run-benchmarks:
    name: Run the configurable benchmarks on the runner
    needs:
      - start-runner # required to start the main job when the runner is ready
    runs-on: ${{ needs.start-runner.outputs.label }} # run the job on the newly created runner
    steps:
      - name: Install dependencies
        run: |
          echo "Current user: $(whoami)"
          echo "Installing kubectl..."
          curl -LO https://dl.k8s.io/release/v1.23.2/bin/linux/amd64/kubectl
          sudo install -o smp -g smp -m 0755 kubectl /usr/local/bin/kubectl
          echo "Installing docker..."
          sudo apt update -y
          sudo apt install -y jq unzip apt-transport-https ca-certificates software-properties-common
          curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
          sudo add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/ubuntu focal stable"
          sudo apt-cache policy docker-ce
          sudo apt install -y docker-ce
          sudo systemctl status docker

      - name: Setup Kubernetes
        uses: manusa/actions-setup-minikube@v2.4.3
        with:
          minikube version: 'v1.23.2'
          kubernetes version: 'v1.23.2'
          driver: docker

      - name: Checkout Code
        uses: actions/checkout@v2

      - name: Install Service Mesh and Deploy Application
        run: |
          chmod +x .github/workflows/scripts/${{ github.event.inputs.service_mesh }}_deploy.sh
          .github/workflows/scripts/${{ github.event.inputs.service_mesh }}_deploy.sh
        shell: bash

      - name: Run Benchmark Tests
        uses: layer5io/meshery-smp-action@self-hosted
        with:
          provider_token: ${{ secrets.MESHERY_TOKEN }}
          platform: docker
          profile_name: ${{ github.event.inputs.profile_name }}
          profile_filename: ${{ github.event.inputs.profile_filename }}
          endpoint_url: ${{env.ENDPOINT_URL}}
          service_mesh: ${{env.SERVICE_MESH}}
          load_generator: ${{ github.event.inputs.load_generator }}
          test_name: '${{ github.event.inputs.service_mesh }}-${{ github.event.inputs.load_generator }}-${{ github.event.inputs.profile_filename }}${{ github.event.inputs.profile_name }}'

  stop-runner:
    name: Stop self-hosted runner
    needs:
      - start-runner # required to get output from the start-runner job
      - run-benchmarks # required to wait when the main job is done
    runs-on: ubuntu-latest
    if: ${{ always() }} # required to stop the runner even if the error happened in the previous jobs
    steps:
      - name: Checkout Code
        uses: actions/checkout@v2

      - name: Stop CNCF CIL runner
        run: |
          chmod +x .github/workflows/scripts/stop-cil-runner.sh
          .github/workflows/scripts/stop-cil-runner.sh ${{ secrets.cncf_cil_token }} ${{ needs.start-runner.outputs.device_id }} ${{ needs.start-runner.outputs.hostname }}
        shell: bash

      - name: Remove CNCF CIL runner from github repository
        run: |
          runner_id=$(curl -s -H 'Authorization: token ${{ secrets.PAT }}' -H "Accept: application/vnd.github.v3+json" https://api.github.com/repos/${{github.repository}}/actions/runners | jq '.runners[] | select(.name == "${{ needs.start-runner.outputs.hostname }}") | {id}' | jq .id)
          curl -X DELETE -H 'Authorization: token ${{ secrets.PAT }}' -H "Accept: application/vnd.github.v3+json" https://api.github.com/repos/${{github.repository}}/actions/runners/$runner_id
        shell: bash
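
Note on the runner lookup in the last step: the workflow pipes the "list runners" response through two `jq` calls and splices the hostname into the filter text. A minimal sketch of the same lookup in one filter is shown here, assuming `jq` is available on the runner. The function name `find_runner_id` is hypothetical and not part of this PR; the hostname is passed with `--arg` so it is treated as data rather than filter source.

```bash
#!/usr/bin/env bash
# Sketch only: find_runner_id is a hypothetical helper, not part of this PR.
# It reads the JSON returned by GET /repos/{owner}/{repo}/actions/runners on
# stdin and prints the id of the runner whose name equals $1.
find_runner_id() {
  jq -r --arg name "$1" '.runners[] | select(.name == $name) | .id'
}

# Possible wiring inside the workflow step (same request as above):
#   runner_id=$(curl -s -H "Authorization: token $PAT" \
#     -H "Accept: application/vnd.github.v3+json" \
#     "https://api.github.com/repos/$REPOSITORY/actions/runners" \
#     | find_runner_id "$hostname")
```

If no runner matches, the function prints nothing, so the caller can test for an empty `runner_id` before issuing the DELETE request.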
4 changes: 2 additions & 2 deletions .github/workflows/configurable-benchmark-test.yaml
@@ -31,7 +31,7 @@ on:
 jobs:
   manual-test:
     name: Configurable Benchmark Test
-    runs-on: self-hosted
+    runs-on: ubuntu-latest
     if: ${{ github.event_name == 'workflow_dispatch' }}
     steps:
       - name: Setup Kubernetes
@@ -55,7 +55,7 @@ jobs:
         uses: layer5io/meshery-smp-action@self-hosted
         with:
           provider_token: ${{ secrets.MESHERY_TOKEN }}
-          platform: docker
+          platform: kubernetes
           profile_name: ${{ github.event.inputs.profile_name }}
           profile_filename: ${{ github.event.inputs.profile_filename }}
           endpoint_url: ${{env.ENDPOINT_URL}}
14 changes: 14 additions & 0 deletions .github/workflows/scripts/self-hosted-credentails.sh
@@ -0,0 +1,14 @@
#!/usr/bin/env bash

# This script is used to configure and verify the token of self-hosted runner
# Token must be set as a github repo secret named "CNCF_CIL_TOKEN"

token=$1

# https://metal.equinix.com/developers/api/authentication/#authentication
result=$(curl -I -s -w %{http_code} -o /dev/null -H "X-Auth-Token: $token" https://api.equinix.com/metal/v1)
if [[ $result != "200" ]]; then
  echo "ERROR: Failed to authenticate the CNCF CIL token"
  exit 1
fi
echo "Authenticated the CNCF CIL token successfully!"
49 changes: 49 additions & 0 deletions .github/workflows/scripts/start-cil-runner.sh
@@ -0,0 +1,49 @@
#!/usr/bin/env bash

# This script is used to start a CNCF CIL runner

token=$1
hostname=$2

# Use the nanosecond part of the current timestamp as a quasi-unique suffix for the runner hostname
label=$(date +%N)

hostname="$hostname-$label"
echo "Creating CNCF CIL machine: $hostname..."

# Use user_data_scripts to register the CNCF CIL runner as a self-hosted runner
user_data_scripts="#cloud-config\nusers:\n - default\n - name: smp\n groups: sudo, docker\n sudo: ALL=(ALL) NOPASSWD:ALL\n lock_passwd: true\nruncmd:\n - [runuser, -l, smp, -c, \'mkdir actions-runner && cd actions-runner\']\n - [runuser, -l, smp, -c, \'curl -o actions-runner-linux-x64-2.287.1.tar.gz -L https://github.com/actions/runner/releases/download/v2.287.1/actions-runner-linux-x64-2.287.1.tar.gz\']\n - [runuser, -l, smp, -c, \'tar xzf ./actions-runner-linux-x64-2.287.1.tar.gz\']\n - [runuser, -l, smp, -c, \'export RUNNER_ALLOW_RUNASROOT=1\']\n - [runuser, -l, smp, -c, \'./config.sh --url https://github.com/$REPOSITORY --token $REG_TOKEN --labels $hostname >> github-action-registeration.log\']\n - [runuser, -l, smp, -c, \'./run.sh >> github-action-registeration.log\']"

# TODO: the options "operating_system", "facility", "plan" are hardcoded now, we should make them configurable
Contributor:
Is this TODO still pending? Does it fall within the scope of this PR, or should we open a new issue and document it there until someone picks it up?

Member Author:
It should go in another PR rather than this one. This PR is the initial automation of the self-hosted runner, so I don't want to include too many changes. As for this TODO, I think there is still something to discuss, so it's a good idea to create a new issue and see if anyone can pick it up.

Contributor:
Sure! Let's create another issue for this.

Contributor:
@gyohuangxin I'll let you take care of creating an issue for this one, as you understand the details better.

Member Author:
@hershd23 I created issue #41 to track it.
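
For reference while the TODO is tracked separately, one way the three hardcoded options could become configurable is sketched here. It rests on assumptions: the environment variable names `CIL_OS`, `CIL_FACILITY` and `CIL_PLAN` and the function `build_device_payload` are hypothetical and not part of this PR. The defaults are the values hardcoded in `start-cil-runner.sh`, and the user data string is expected to be JSON-escaped already, as it is in the script.

```bash
#!/usr/bin/env bash
# Sketch only: CIL_OS, CIL_FACILITY, CIL_PLAN and build_device_payload are
# hypothetical names. Defaults mirror the values hardcoded in the script.
build_device_payload() {
  local hostname="$1"
  local userdata="$2"   # must already be JSON-escaped, as in the script
  local os="${CIL_OS:-ubuntu_20_04}"
  local facility="${CIL_FACILITY:-da11}"
  local plan="${CIL_PLAN:-c3.small.x86}"
  # %s inserts each value verbatim, so literal "\n" sequences in the
  # user data are passed through unchanged.
  printf '{"operating_system": "%s", "facility": "%s", "plan": "%s", "hostname": "%s", "userdata": "%s"}' \
    "$os" "$facility" "$plan" "$hostname" "$userdata"
}

# Possible use in the device-creation request:
#   curl -X POST ... -d "$(build_device_payload "$hostname" "$user_data_scripts")" ...
```

With none of the variables set, the payload is identical to the one the script sends today, so the change would be backward compatible.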

# https://metal.equinix.com/developers/api/devices/#devices-createdevice
device_id=$(curl -X POST -H "X-Auth-Token: $token" -s -H "Content-Type: application/json" \
  -d '{"operating_system": "ubuntu_20_04", "facility": "da11", "plan": "c3.small.x86", "hostname": "'"${hostname}"'", "userdata": "'"${user_data_scripts}"'"}' \
  https://api.equinix.com/metal/v1/projects/96a9d336-541b-42f7-9827-d845010da550/devices | jq -r .id)
# jq -r prints the literal string "null" when the response has no "id" field
if [[ -z $device_id || $device_id == "null" ]]; then
  echo "ERROR: Failed to create CNCF CIL machine: $hostname"
  exit 1
fi

# Wait up to 10 minutes for the machine to become active
Contributor:
Any reason to wait 10 minutes? Is a standard waiting time for an instance to come up documented somewhere?

Member Author:
There is no standard waiting time, in my opinion. My reasoning for waiting 10 minutes is that if the machine takes longer than that to start running, something must be wrong with it.
However, on second thought, it would be better to make use of the "state" field instead of just waiting 10 minutes:

  1. If "state" == "provisioning", sleep 10s...
  2. If "state" == "active", echo "Machine successfully created!" and continue.
  3. If "state" == "failed", echo "Failed to create machine" and exit.

What do you think about this?

Contributor:
@gyohuangxin this new flow makes a lot of sense. But let's capture this in another issue. Some experimentation might be required here.

Contributor:
I understand this and will create an issue for it.

Member Author:
Thanks, please go ahead.
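
The state-based flow proposed in this thread could look roughly like the sketch below. It is not part of this PR. The function name `wait_for_active`, its parameters, and the overall attempt limit are assumptions; the three state names and the messages come from the comment above. The state lookup is passed in as a command name so the loop can be exercised without calling the API.

```bash
#!/usr/bin/env bash
# Sketch only: wait_for_active is a hypothetical helper.
# Usage: wait_for_active GET_STATE_CMD [INTERVAL_SECONDS] [MAX_ATTEMPTS]
# GET_STATE_CMD is the name of a command or function that prints the
# device's current "state" value.
wait_for_active() {
  local get_state="$1"
  local interval="${2:-10}"
  local max_attempts="${3:-60}"
  local attempt=0
  local state
  while [ "$attempt" -lt "$max_attempts" ]; do
    state=$("$get_state")
    case "$state" in
      active)
        echo "Machine successfully created!"
        return 0
        ;;
      failed)
        echo "Failed to create machine" >&2
        return 1
        ;;
      *)
        # "provisioning" or any other transitional state: keep waiting
        sleep "$interval"
        ;;
    esac
    attempt=$((attempt + 1))
  done
  echo "Timed out waiting for the machine to become active" >&2
  return 1
}

# Possible wiring, reusing the request from start-cil-runner.sh:
#   get_device_state() {
#     curl -s -H "X-Auth-Token: $token" \
#       "https://api.equinix.com/metal/v1/devices/$device_id" | jq -r .state
#   }
#   wait_for_active get_device_state 10 60 || exit 1
```

The attempt limit keeps an upper bound on the wait, so a machine stuck in "provisioning" still ends the job instead of blocking it indefinitely.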

echo "Waiting for $hostname to run..."
n=0
while [[ $n -le 10 ]]
do
  if [[ $n -eq 10 ]]; then
    echo "Waiting too long for $hostname to start, exiting..."
    exit 1
  fi
  sleep 1m
  state=$(curl -s -H "X-Auth-Token: $token" https://api.equinix.com/metal/v1/devices/$device_id | jq -r .state)
  if [[ $state == "active" ]]; then
    echo "$hostname successfully created!"
    break
  fi
  echo "Still waiting..."
  let n++
done

# Set the outputs
echo "::set-output name=hostname::$hostname"
echo "::set-output name=label::$hostname"
echo "::set-output name=device_id::$device_id"
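
Note on the outputs: GitHub has since deprecated the `::set-output` workflow command in favour of appending `name=value` lines to the file named by `$GITHUB_OUTPUT`. A minimal sketch of a helper that works with either mechanism follows; the function name `set_output` is hypothetical and not part of this PR, and it assumes single-line values.

```bash
#!/usr/bin/env bash
# Sketch only: set_output is a hypothetical helper for single-line values.
# It writes to the $GITHUB_OUTPUT file when the runner provides one and
# falls back to the older workflow command otherwise.
set_output() {
  if [ -n "${GITHUB_OUTPUT:-}" ]; then
    printf '%s=%s\n' "$1" "$2" >> "$GITHUB_OUTPUT"
  else
    echo "::set-output name=$1::$2"
  fi
}

# Possible replacement for the three echo lines above:
#   set_output hostname "$hostname"
#   set_output label "$hostname"
#   set_output device_id "$device_id"
```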
17 changes: 17 additions & 0 deletions .github/workflows/scripts/stop-cil-runner.sh
@@ -0,0 +1,17 @@
#!/usr/bin/env bash

# This script is used to stop a CNCF CIL runner

token=$1
device_id=$2
hostname=$3

echo "Removing CNCF CIL machine: $hostname..."

# https://metal.equinix.com/developers/api/devices/#devices-deletedevice
remove_cil_result=$(curl -X DELETE -I -s -w %{http_code} -o /dev/null -H "X-Auth-Token: $token" https://api.equinix.com/metal/v1/devices/$device_id)

if [[ $remove_cil_result != "204" ]]; then
  echo "ERROR: Failed to remove CNCF CIL machine: $hostname."
  exit 1
fi
2 changes: 1 addition & 1 deletion meshery.sh
@@ -35,7 +35,7 @@ main() {
 kubectl config view --minify --flatten > ~/minified_config
 mv ~/minified_config ~/.kube/config

-curl -L https://git.io/meshery | PLATFORM=$PLATFORM bash -
+curl -L https://git.io/meshery | sudo PLATFORM=$PLATFORM bash - &
Contributor:
Any reason for running this command in the background?

Member Author:
When I tested on the self-hosted runner, the command stayed pending indefinitely unless it ran in the background. I don't know why this doesn't happen on the GitHub-hosted runner.

Contributor:
Hmm, okay. It's a minor thing; let's just keep a note of this behaviour and come back to it later.

Member Author:
Yes, it's a good reminder.


sleep 60
}