Uperf benchmark fails: Unable to run uperf as benchmark #646
Can you please provide the log / description of the client pod to help us diagnose the issue?
@jtaleric I am using these CRs as mentioned in the benchmark operator's workloads. The main difference between the two is the resource definition for the server and client pods. CR2 is the actual CR given in the docs at https://github.com/cloud-bulldozer/benchmark-operator/blob/master/docs/uperf.md. For CR2, the uperf benchmark fails to run any pod (server or client). Ideally, CR2 should work as given in the above link.
Well, you have pin set to true, so effectively CR1 and CR2 are the same, minus the message size? Can you please provide more detailed output from the operator to help diagnose this. If the client never came up, that tells me there is a configuration issue.
@jtaleric The output stats are given below. These are obtained by describing the resources.
Benchmark status
Operator logs
Kindly let me know if you need more details; I'll grab those for you.
Describing isn't the log. Please capture the ansible log.
For CR2, you need to have a valid "runtime_class". If you don't have one, just leave this variable out of CR2; otherwise the pod will not start. For CR1, can you capture "oc logs [your-operator-pod] -c manager"?
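In CR terms, that means either deleting the key or pointing it at a RuntimeClass that actually exists on the cluster. A minimal sketch of the relevant args section (the class name is a placeholder, not a real value from this thread):

```yaml
spec:
  workload:
    name: uperf
    args:
      # Either omit runtime_class entirely, or set it to a RuntimeClass
      # that exists on your cluster (check with: oc get runtimeclass).
      # runtime_class: my-runtime-class   # placeholder name
      serviceip: false
```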
@HughNhan & @jtaleric Benchmark status
Pods status
Sriov Operator pods list
Gathering logs for operator's pod (sriov-network-operator-6bf9ccff5c-tcn5h).
Without the manager container argument, the logs are obtained as follows.
I had to exclude part of the logs because they were too lengthy to add here. The last part of the logs is:
Also, @jtaleric, I couldn't understand your concern about capturing the ansible logs. Could you please elaborate? Thanks.
@MuhammadMunir12, find your benchmark operator pod, i.e. "benchmark-controller-manager-xxxxx-xxxx", then capture the log as I mentioned above: "oc logs [your-operator-pod] -c manager".
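A minimal sketch of that capture; the pod name below is hypothetical, and in practice "oc get pods -n benchmark-operator" lists the real one:

```shell
# Hypothetical pod listing; "oc get pods -n benchmark-operator" shows the real names.
pods="benchmark-controller-manager-7d9f8-abcde uperf-server-0"

# Pick out the controller pod by its name prefix.
mgr=$(echo "$pods" | tr ' ' '\n' | grep '^benchmark-controller-manager')
echo "$mgr"

# Then capture the manager container's log (requires cluster access):
# oc logs "$mgr" -c manager -n benchmark-operator
```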
@HughNhan Here's the output, and it does not show any issues.
@MuhammadMunir12 - the logs above were from the beginning, deploying the operator. Can you capture the logs while applying your CR? Having said that, I see that your earlier logs were: Then you showed:
@HughNhan What I understand is: I have to delete the existing benchmark operator, redeploy it, and then run my uperf CR. Right?
@MuhammadMunir12 - Let us start fresh.
@HughNhan Hi, I have recreated the environment from scratch.
Helm doesn't install benchmark-operator with the command mentioned in the guide:
I am still facing the same issue as mentioned above. The logs are given below:
Looking forward to your response.
@MuhammadMunir12 - currently, there might be issue(s) with the helm method. Not sure if you are seeing the same, but please do steps 1-4 as recommended above, especially step 3, "make deploy" (no helm).
@HughNhan Helm does not work with this command as mentioned in the operator's guide: "Deleted namespace my-ripsaw"
I have cloned the operator from the repo, then ran "make deploy" in the benchmark-operator directory.
Deployed the uperf CR. It is not working.
So, what I understand is that we have to run it via helm charts.
@MuhammadMunir12 - The namespace to use after "make deploy" is "benchmark-operator". Make sure your CR uses it. Also include the logs as you did before ("oc logs benchmark-controller-manager-xxxx -c manager") if there is ANY issue. We cannot debug with just "It is not working".
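Concretely, the CR's metadata should point at that namespace. A minimal sketch (the benchmark name is arbitrary):

```yaml
apiVersion: ripsaw.cloudbulldozer.io/v1alpha1
kind: Benchmark
metadata:
  name: uperf-benchmark          # arbitrary name
  namespace: benchmark-operator  # must match the namespace "make deploy" creates
```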
@HughNhan I have updated the namespace; I earlier ran it under the openshift-sriov-network-operator namespace. The benchmark is now running under the benchmark-operator namespace. Still, the client pods are stuck in starting state and only the server pod is running. The logs end at:
The detailed logs are:
@MuhammadMunir12 - your CR is not right. For now, if you do not have ES, just remove both "elasticsearch" and "url". TASK [Start Client(s) w/o serviceIP] ********************************
@HughNhan I am not using any elasticsearch or url parameter in my CR. It's the same CR1 as mentioned above.
After removing the hostnetwork and serviceip arguments, the error message appears to be the same:
@MuhammadMunir12 - Let's try this: add a bogus elasticsearch URL into your CR.
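A sketch of that workaround; the URL is deliberately bogus and the host name is a made-up placeholder, just to satisfy the CR schema:

```yaml
spec:
  elasticsearch:
    # Placeholder endpoint (hypothetical host); nothing real is contacted.
    url: http://fake-es.example.com:9200
```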
@HughNhan With the bogus value for ES, it is still failing on the same ansible task for starting the client pod.
@HughNhan I have been able to run it; both client and server pods are running.
Could you please guide me on how to gather traffic information. If it needs an ES setup with Grafana, how can I do that, to see how much traffic is being sent and what the throughput is at that time?
@MuhammadMunir12 - After you pick up this merge cloud-bulldozer/benchmark-wrapper#335, you should see UPERF stats in the client logs. |
@HughNhan So, here is what I'll do:
Also, kindly tell me the main difference between these two repos.
@MuhammadMunir12 - Correction, the stats PR is cloud-bulldozer/benchmark-wrapper#332. The wrapper builds the container images; the operator creates the environment and pods. You can wait for a new uperf container image to be available after the PR merges, or you can use this private (no support) uperf image for now by adding:
@MuhammadMunir12 - if you have no more issues, please close this issue.
Same here.
Then created a Benchmark (default config taken from the doc in git version 0.1 https://github.com/cloud-bulldozer/benchmark-operator/blob/5757262604727addea352c7142726aac53840a91/docs/uperf.md):

- name: Deploy Uperf benchmark
  kubernetes.core.k8s:
    kubeconfig: "{{ ocp_ignition_file_path }}/auth/kubeconfig"
    state: present
    definition:
      apiVersion: ripsaw.cloudbulldozer.io/v1alpha1
      kind: Benchmark
      metadata:
        name: uperf-benchmark
        namespace: benchmark-operator
      spec:
        workload:
          name: uperf
          args:
            client_resources:
              requests:
                cpu: 500m
                memory: 500Mi
              limits:
                cpu: 500m
                memory: 500Mi
            server_resources:
              requests:
                cpu: 500m
                memory: 500Mi
              limits:
                cpu: 500m
                memory: 500Mi
            serviceip: false
            runtime_class: class_name
            hostnetwork: false
            networkpolicy: false
            pin: false
            kind: pod
            pin_server: "{{ two_first_worker_nodes[0] }}"
            pin_client: "{{ two_first_worker_nodes[1] }}"
            pair: 1
            multus:
              enabled: false
            samples: 1
            test_types:
              - stream
            protos:
              - tcp
            sizes:
              - 16384
            nthrs:
              - 1
            runtime: 30
            colocate: false
            density_range: [low, high]
            node_range: [low, high]
            step_size: addN, log2
Logs:
@Sispheor For a start, you may want to baby-step with the simpler mode, "pin=true", to get some familiarity. In your failed run you have "pin=false", hence you activated "scale" mode, in which all of colocate, node_range, density_range and step_size must be valid. However, your CR has "low, high, addN, log2", which are documentation placeholders and not valid params. Please read the scale description section when you start using scale mode.
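A sketch of the pinned-mode args, assuming hypothetical worker node names; with pin set to true, the scale-mode keys can simply be dropped:

```yaml
# Pinned mode: pin=true avoids scale mode entirely, so colocate,
# density_range, node_range and step_size can be removed.
pin: true
pin_server: "worker-0"   # assumption: your first worker node's name
pin_client: "worker-1"   # assumption: your second worker node's name
```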
Thanks for your answer. Actually, I took the default doc config to have a simple mode. I updated the pin flag to true and removed the last flags that provide the scalability feature. The error is not the same now.
I removed the
Same as @MuhammadMunir12
And
@Sispheor FYI, I am out of the office for a couple of days and will not be able to help you until I am back. Hope you can figure it out yourself, or someone else will. With a few corrections in your CR, it will work.
I've added the Elasticsearch URL, but there is still an issue:
How are you installing the operator? I would recommend not using OperatorHub, but the install method mentioned on GitHub. As a quick example, see https://github.com/cloud-bulldozer/benchmark-operator/blob/master/tests/test_uperf.sh - that is what runs in our CI with each PR. We also have some scripting to automate the entire install/run/cleanup process. Also, if you have ideas on how to make the docs better, happy to see a PR 😃
Hi,
I am trying to run uperf as a benchmark using the following two CRs. The first one creates the server only and gets stuck creating the client pod, whereas the second one fails to run as a benchmark at all.
CR1
CR2
I need some clarity on running it as a benchmark. Help from the community will be highly appreciated.