Auto osd cache drop #570
Conversation
use KUBECONFIG if defined else use cluster config
quiet down logging
use single-quote within double-quote
For the reviewers, @chaitanyaenr and @dry923: the easiest way to review this is by looking at the "Files" tab instead of each commit.
The problem with both of these is that I get an HTTP 403 error code from the K8s API when I attempt to start them in roles/ceph_osd_cache_drop/tasks/main.yml. I know it's just privileges, because the same pod startups work with $KUBECONFIG (admin) as my authorization, and when I start the pods by hand using "oc", the code all works. I've tried to start the cache dropper pod in my-ripsaw, but then it can't access the secrets that are only available within the openshift-storage namespace. And I don't have privileges to start a pod within the openshift-storage namespace.
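One way to confirm that this is a permissions issue is to ask the API server directly (a sketch; the service-account and namespace names here are assumptions for illustration, not necessarily what benchmark-operator actually runs as):

```
# Check whether the operator's service account may create pods in
# openshift-storage (the --as identity below is illustrative only)
oc auth can-i create pods -n openshift-storage \
  --as=system:serviceaccount:my-ripsaw:benchmark-operator
```

A "no" here matches the 403 seen from the K8s API.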
/rerun all
    namespace: "{{ rook_ceph_namespace }}"
  register: drop_pod_already_exists

#- debug:
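For context, the check this hunk belongs to presumably looks something like the following (a sketch; apart from `rook_ceph_namespace` and `drop_pod_already_exists`, the task and pod names are assumptions):

```yaml
# Sketch: look up the cache-dropper pod so the role can skip creation
# if it already exists; the pod name below is an assumed illustration
- name: check whether the cache dropper pod is already running
  k8s_info:
    kind: Pod
    name: rook-ceph-osd-cache-drop
    namespace: "{{ rook_ceph_namespace }}"
  register: drop_pod_already_exists
```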
We good to remove this?
@jtaleric I want to be able to remove the rook_ceph_namespace bit, but I don't know how to get authentication and authorization to get the secrets yet; still working that out. Same with the oc patch OCSInitialization task in the same file.
Results for ec2_jjb
I'd merge it now, but this depends on bohica quay.io/cloud-bulldozer/ceph-cache-dropper being updated, and that hasn't rebuilt. How do I trigger a rebuild? Also, I'd like to figure out how to put the ceph_cache_dropper pod in the my-ripsaw namespace instead of the openshift-storage/rook-ceph namespace.
Is this still relevant? If so, what is blocking it? Is there anything you can do to help move it forward? This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.
I think I'm going to have to change this so that the user is responsible for actually starting the cache dropper pod and the ceph toolbox pod, but the rest of it can stay the same. This is because the benchmark-operator doesn't have authorization to start pods in the openshift-storage namespace. Will try to get to this next week and finish up.
@bengland2 ack - so are we good with your current implementation? LGTM.
@jtaleric The current implementation will not work, for the reasons discussed above, but if I make the user create the ceph toolbox pod and the cache dropper pod using a provided YAML file, then the authorization problem goes away and the rest of it should work fine. I still think there is a way to get some sort of token and do it, but for now that's beyond what I know how to do. At least the CR will not need to change to contain the IP address of the cache dropper pod - benchmark-operator will discover the pod and make use of it automatically if the user specifies that Ceph OSD cache dropping is desired. This change is small and I should be able to test it this week and merge it. Sorry for the delays; the Multus Alias allocation required me to postpone work on this. Thanks for the other merge.
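The automatic discovery described above could work roughly like this (a sketch; the label selector is an assumption for illustration, not necessarily what the role actually queries on):

```
# Find the cache-dropper pod's IP by label so the CR never has to
# carry it; the app label below is illustrative only
oc get pod -n openshift-storage \
  -l app=rook-ceph-osd-cache-drop \
  -o jsonpath='{.items[0].status.podIP}'
```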
Results for ec2_jjb
Results for ec2_jjb
I'm merging because the test_fiod.sh script passed when run against an AWS cluster, and the test result in terms of passes and fails is identical to what it was before I included OSD cache dropping tests in the PR. Specifically, there was an fio failure 28 days ago, when none of the test_crs used OSD cache dropping, so none of the OSD cache dropping code actually executed, and the fio test passed the second time. This makes me think there was a timeout due to slow image load rather than an actual problem with the code. I would like to see the CI save the logs for each test run so that it would be possible to further diagnose things instead of guessing.
only do ceph osd cache dropping if user requests it
default to openshift for benchmark-operator
add option to drop Ceph OSD cache to CR
document Ceph OSD cache dropping
user must start cache dropper and ceph toolbox pod
test both OSD cache dropping and kernel cache dropping at same time only if openshift-storage namespace is defined
Description
Depends on benchmark-wrapper PR 269.
This adds support for Ceph OSD cache dropping, without each workload having to specify a pod IP.
At present, it cannot automatically start the Ceph toolbox or the cache-dropping pod, because that requires authorization
in the openshift-storage/rook-ceph namespaces, but the user can start these pods and leave them running, and the rest works.
For OpenShift, the Ceph toolbox can be started with:
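The conversation above mentions an "oc patch OCSInitialization" task; on OCS the toolbox is typically enabled that way (a sketch; the resource name `ocsinit` is the OCS default, verify it on your cluster):

```
# Enable the Ceph toolbox pod via the OCSInitialization resource
oc patch OCSInitialization ocsinit -n openshift-storage \
  --type json \
  --patch '[{"op": "replace", "path": "/spec/enableCephTools", "value": true}]'
```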
For the cache drop pod, you can start it by filling in the vars in roles/ceph_osd_cache_drop/rook_ceph_drop_cache_pod.yaml
and running the pod with kubectl/oc.
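For example (a sketch; the openshift-storage namespace is assumed here, per the discussion above):

```
# Start the cache-dropper pod from the filled-in YAML
oc apply -n openshift-storage \
  -f roles/ceph_osd_cache_drop/rook_ceph_drop_cache_pod.yaml
```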
Fixes
Lack of Ceph cache dropping makes benchmark-operator incomplete for OCS.