Dask helm deployment not working in AKS #386
Could you share your Dask Gateway config? Particularly your auth config.
Hi @jacobtomlinson, thanks for looking into this. Please find the values.yaml and the helm debug output below.

```yaml
gateway:
  # Number of instances of the gateway-server to run
  replicas: 1

  # Annotations to apply to the gateway-server pods.
  annotations: {}

  # Resource requests/limits for the gateway-server pod.
  resources: {}

  # Path prefix to serve dask-gateway api requests under
  # This prefix will be added to all routes the gateway manages
  # in the traefik proxy.
  prefix: /

  # The gateway server log level
  loglevel: INFO

  # The image to use for the gateway-server pod.
  image:
    name: <azure_container_registry>/daskgateway/dask-gateway-server
    tag: 0.9.0
    pullPolicy: IfNotPresent

  # Image pull secrets for gateway-server pod
  imagePullSecrets: []

  # Configuration for the gateway-server service
  service:
    annotations: {}

  auth:
    # The auth type to use. One of {simple, kerberos, jupyterhub, custom}.
    type: simple

    simple:
      # A shared password to use for all users.
      password: null

    kerberos:
      # Path to the HTTP keytab for this node.
      keytab: null

    jupyterhub:
      # A JupyterHub api token for dask-gateway to use. See
      # https://gateway.dask.org/install-kube.html#authenticating-with-jupyterhub.
      apiToken: null

      # JupyterHub's api url. Inferred from JupyterHub's service name if running
      # in the same namespace.
      apiUrl: null

    custom:
      # The full authenticator class name.
      class: null

      # Configuration fields to set on the authenticator class.
      options: {}

  livenessProbe:
    # Enables the livenessProbe.
    enabled: true
    # Configures the livenessProbe.
    initialDelaySeconds: 5
    timeoutSeconds: 2
    periodSeconds: 10
    failureThreshold: 6

  readinessProbe:
    # Enables the readinessProbe.
    enabled: true
    # Configures the readinessProbe.
    initialDelaySeconds: 5
    timeoutSeconds: 2
    periodSeconds: 10
    failureThreshold: 3

  backend:
    # The image to use for both schedulers and workers.
    image:
      name: <azure_container_registry>/daskgateway/dask-gateway
      tag: 0.9.0
      pullPolicy: IfNotPresent

    # The namespace to launch dask clusters in. If not specified, defaults to
    # the same namespace the gateway is running in.
    namespace: null

    # A mapping of environment variables to set for both schedulers and workers.
    environment: null

    scheduler:
      # Any extra configuration for the scheduler pod. Sets
      # `c.KubeClusterConfig.scheduler_extra_pod_config`.
      extraPodConfig: {}

      # Any extra configuration for the scheduler container.
      # Sets `c.KubeClusterConfig.scheduler_extra_container_config`.
      extraContainerConfig: {}

      # Cores request/limit for the scheduler.
      cores:
        request: null
        limit: null

      # Memory request/limit for the scheduler.
      memory:
        request: null
        limit: null

    worker:
      # Any extra configuration for the worker pod. Sets
      # `c.KubeClusterConfig.worker_extra_pod_config`.
      extraPodConfig: {}

      # Any extra configuration for the worker container. Sets
      # `c.KubeClusterConfig.worker_extra_container_config`.
      extraContainerConfig: {}

      # Cores request/limit for each worker.
      cores:
        request: null
        limit: null

      # Memory request/limit for each worker.
      memory:
        request: null
        limit: null

  # Settings for nodeSelector, affinity, and tolerations for the gateway pods
  nodeSelector: {}
  affinity: {}
  tolerations: []

  # Any extra configuration code to append to the generated `dask_gateway_config.py`
  # file. Can be either a single code-block, or a map of key -> code-block
  # (code-blocks are run in alphabetical order by key, the key value itself is
  # meaningless). The map version is useful as it supports merging multiple
  # `values.yaml` files, but is unnecessary in other cases.
  extraConfig: {}

# Configuration for the gateway controller
controller:
  # Whether the controller should be deployed. Disabling the controller allows
  # running it locally for development/debugging purposes.
  enabled: true

  # Any annotations to add to the controller pod
  annotations: {}

  # Resource requests/limits for the controller pod
  resources: {}

  # Image pull secrets for controller pod
  imagePullSecrets: []

  # The controller log level
  loglevel: INFO

  # Max time (in seconds) to keep around records of completed clusters.
  # Default is 24 hours.
  completedClusterMaxAge: 86400

  # Time (in seconds) between cleanup tasks removing records of completed
  # clusters. Default is 5 minutes.
  completedClusterCleanupPeriod: 600

  # Base delay (in seconds) for backoff when retrying after failures.
  backoffBaseDelay: 0.1

  # Max delay (in seconds) for backoff when retrying after failures.
  backoffMaxDelay: 300

  # Limit on the average number of k8s api calls per second.
  k8sApiRateLimit: 50

  # Limit on the maximum number of k8s api calls per second.
  k8sApiRateLimitBurst: 100

  # The image to use for the controller pod.
  image:
    name: <azure_container_registry>/daskgateway/dask-gateway-server
    tag: 0.9.0
    pullPolicy: IfNotPresent

  # Settings for nodeSelector, affinity, and tolerations for the controller pods
  nodeSelector: {}
  affinity: {}
  tolerations: []

# Configuration for the traefik proxy
traefik:
  # Number of instances of the proxy to run
  replicas: 1

  # Any annotations to add to the proxy pods
  annotations: {}

  # Resource requests/limits for the proxy pods
  resources: {}

  # The image to use for the proxy pod
  image:
    name: traefik
    tag: 2.1.3

  # Any additional arguments to forward to traefik
  additionalArguments: []

  # The proxy log level
  loglevel: WARN

  # Whether to expose the dashboard on port 9000 (enable for debugging only!)
  dashboard: false

  # Additional configuration for the traefik service
  service:
    type: LoadBalancer
    annotations: {}
    spec: {}
    ports:
      web:
        # The port HTTP(s) requests will be served on
        port: 80
        nodePort: null
      tcp:
        # The port TCP requests will be served on. Set to `web` to share the
        # web service port
        port: web
        nodePort: null

  # Settings for nodeSelector, affinity, and tolerations for the traefik pods
  nodeSelector: {}
  affinity: {}
  tolerations: []

rbac:
  # Whether to enable RBAC.
  enabled: true

  # Existing names to use if ClusterRoles, ClusterRoleBindings, and
  # ServiceAccounts have already been created by other means (leave set to
  # `null` to create all required roles at install time)
  controller:
    serviceAccountName: null
  gateway:
    serviceAccountName: null
  traefik:
    serviceAccountName: null
```

Best Regards,
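One detail from the values above worth illustrating: `extraConfig` accepts either a single code block or a map of key -> code block, where the map form merges cleanly across multiple `values.yaml` files. A hedged sketch of the map form (the key name and the `log_level` setting are illustrative, not taken from this thread):

```yaml
gateway:
  extraConfig:
    # Keys are meaningless except for ordering; blocks are appended to the
    # generated dask_gateway_config.py in alphabetical key order.
    01-logging: |
      c.DaskGateway.log_level = "DEBUG"
```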
You're getting a 403 error. How are you authenticating with Dask Gateway?
I was trying to test this component in AKS by following this document: https://gateway.dask.org/install-kube.html. So I haven't configured any authenticator, and by default I believe the simple authenticator would be used. I was trying to connect to the dask-gateway from a Jupyter notebook instance deployed in the same cluster in another namespace, starting with `from dask_gateway import Gateway`. I presume that since no password is configured I can omit the auth parameter.
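As a debugging aid for 403s like this (this is not the dask-gateway client API itself, just a stdlib sketch): with the default "simple" authenticator and no password set, the client sends HTTP Basic credentials of the form `username:` with an empty password. Building the same request by hand makes it easy to compare against curl. The service URL below is a hypothetical placeholder:

```python
import base64
import urllib.request

# Hypothetical in-cluster address; substitute your gateway service's URL.
url = "http://example-dask-gateway/api/v1/clusters"

# dask-gateway's "simple" authenticator uses HTTP Basic auth; with no
# password configured, the password part of "user:password" is empty.
token = base64.b64encode(b"myuser:").decode()

req = urllib.request.Request(url)
req.add_header("Authorization", f"Basic {token}")

# Inside the cluster, urllib.request.urlopen(req) should return the
# cluster list as JSON; a 403 here too would point away from aiohttp.
```

An equivalent curl probe would pass `-u myuser:` to produce the same header.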
Yeah, I would expect this to work. Could you also share the pod logs for the gateway server?
@jacobtomlinson please find the logs below: |
@aravindp ah, I missed that! So you've modified the chart? I think you will need the Traefik components here; Traefik is used to do some specific proxying of the scheduler.
@jacobtomlinson, yes, I modified it since we already have a traefik ingress configured in our cluster. In that case, how should I do it? From the documentation it was not clear to me how I should integrate it with an already existing traefik ingress.
Dask gateway does not use traefik as an ingress, just as a service to proxy traffic. Configure it the same way you would any other service. |
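Purely as a sketch of "configure it the same way you would any other service": an existing ingress (traefik or otherwise) can forward to the proxy Service that the unmodified chart creates. All names below are hypothetical; check `kubectl get svc` in the release's namespace for the actual Service name your Helm release produced.

```yaml
# Hypothetical names; the actual proxy Service name depends on your release.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: dask-gateway
  namespace: dask-gateway
spec:
  rules:
    - host: gateway.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: traefik-dask-gateway  # the chart's proxy Service
                port:
                  number: 80
```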
@jacobtomlinson I added back the below files from the traefik templates folder.
I believe the deployment & RBAC YAMLs are not required, as the cluster already has a traefik ingress. After the new deployment, the traefik load balancer service is also available. But I still get a 403 error when I try to connect to the URL using the external IP of the service.
@aravindrp please do not modify the YAML in the chart. It makes it much harder for us to test and support it. If you need to disable things please do so in the config, and if it's not possible to disable in config then please raise an issue so we can get that fixed. Please could you try installing the vanilla chart without modifications and let us know how you get on. |
@jacobtomlinson sorry for the delay in responding. I used this shortcut of manually modifying the helm chart because I was still in the phase of evaluating dask gateway; I am planning to do it the proper way for the final implementation. I did some further analysis and it feels like a problem with the aiohttp package used within dask-gateway, because when I try to curl or use the urllib3 package to connect to the API server, it works without any issues. I have raised a question on Stack Overflow for this. Meanwhile, have you seen any similar behavior before?
@aravindrp thanks for summarizing that this may be related to aiohttp. As we have not arrived at a clear action point to take with regard to the code in this repo, I suggest we close this issue at this point.
What happened:
I am trying to deploy dask gateway to Azure following the documentation: https://gateway.dask.org/install-kube.html
We already have an AKS cluster which is configured to use a traefik ingress. To avoid a duplicate deployment of traefik, I downloaded the latest version of the chart and created a modified version by removing the contents inside the template/traefik folder. Everything else is the same as the official helm chart.
I deployed dask gateway successfully and the pods are running without crashing. Then I tried to access the deployed dask gateway instance from a Jupyter notebook also deployed within the same cluster. Since I only need to access it within the cluster, I tried directly accessing the ClusterIP service api--dask-gateway, but it's failing with a 403 Forbidden error.
Could you please help in resolving this issue?
What you expected to happen:
Minimal Complete Verifiable Example:
# Put your MCVE code here
Anything else we need to know?:
Environment: