
Metrics not being persisted in single binary mode #6119

Open
balajisa09 opened this issue Jul 26, 2024 · 1 comment

Comments


balajisa09 commented Jul 26, 2024

Describe the bug
I am running Cortex in single binary mode in Kubernetes with a PVC, and I have noticed that metrics are not being persisted for more than 5 hours. I have attached the config. A Prometheus instance is sending metrics to Cortex via remote write. There is enough space on the disk too.

To Reproduce
Steps to reproduce the behavior:

  1. Start Cortex v1.17.1 and Prometheus v2.52.0.
  2. Visualize the metrics via Grafana or any other tool.

Expected behavior
The metrics should stay for the retention period given in the Cortex config.

Environment:

  • Infrastructure: Kubernetes
  • Deployment tool: Helm

Additional Context

Cortex config:

config.yaml: |
  auth_enabled: false

  server:
    http_listen_port: 9009
  
    # Configure the server to allow messages up to 100MB.
    grpc_server_max_recv_msg_size: 104857600
    grpc_server_max_send_msg_size: 104857600
    grpc_server_max_concurrent_streams: 1000

    http_tls_config:
      client_auth_type: RequireAndVerifyClientCert

    grpc_tls_config:
      client_auth_type: RequireAndVerifyClientCert
    log_level: debug
      
  distributor:
    shard_by_all_labels: true
    pool:
      health_check_ingesters: true

  ingester_client:
    grpc_client_config:
      # Configure the client to allow messages up to 100MB.
      max_recv_msg_size: 104857600
      max_send_msg_size: 104857600
      grpc_compression: gzip
  
  ingester:
    lifecycler:
      # The address to advertise for this ingester.  Will be autodiscovered by
      # looking up address on eth0 or en0; can be specified if this fails.
      # address: 127.0.0.1
  
      # We want to start immediately and flush on shutdown.
      min_ready_duration: 0s
      final_sleep: 0s
      num_tokens: 512
  
      # Use an in memory ring store, so we don't need to launch a Consul.
      ring:
        kvstore:
          store: inmemory
        replication_factor: 1
  
  blocks_storage:
    tsdb:
      dir: /data
      retention_period: 168h
  
    bucket_store:
      sync_dir: /data

    backend: filesystem
    filesystem:
      dir: /data/fake

  compactor:
    data_dir: /tmp/cortex/compactor
    sharding_ring:
      kvstore:
        store: inmemory
  
  frontend_worker:
    match_max_concurrent: true

Prometheus remote write config:

additionalRemoteWrite: 
- url: http://ingest.abc.com/metrics/v1/push
  writeRelabelConfigs:
  - sourceLabels: [__name__]
    regex: '.*'
    action: 'replace'
    targetLabel: 'captain_domain'
    replacement: {{ .Values.captain_domain }}
  - sourceLabels: [__name__]
    regex: '.*'
    action: 'replace'
    targetLabel: 'abc_platform_version'
    replacement: {{ .Chart.Version }}

@danielblando
Contributor

Do you know if the data is being deleted or just not being queried? Can you see blocks older than 5h on disk if you check /data?

Also, we should have some logs when deleting blocks:

msg="Deleting obsolete block" block=blockId

Can you see those logs? Is it possible to check how old the blockIDs being deleted are? You can try to look for other logs with the blockID or, if you're lucky, still get info from the blocks on disk.
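
For reference, a minimal sketch for listing block IDs and their time ranges on disk, assuming the blocks live under the configured /data directory and follow the standard TSDB block layout (one meta.json per block); check_blocks.py is a hypothetical helper, not part of Cortex:

# check_blocks.py - hypothetical diagnostic script, not shipped with Cortex.
# Walks the TSDB directory and prints each block's ID and time range from its
# meta.json (minTime/maxTime are milliseconds since the Unix epoch).
import json
import pathlib
from datetime import datetime, timezone

data_dir = pathlib.Path("/data")  # adjust to match blocks_storage.tsdb.dir

for meta_path in sorted(data_dir.glob("**/meta.json")):
    try:
        meta = json.loads(meta_path.read_text())
    except (OSError, json.JSONDecodeError):
        continue  # skip partially written or unrelated meta.json files
    min_t = datetime.fromtimestamp(meta["minTime"] / 1000, tz=timezone.utc)
    max_t = datetime.fromtimestamp(meta["maxTime"] / 1000, tz=timezone.utc)
    print(f"{meta_path.parent.name}  {min_t.isoformat()} -> {max_t.isoformat()}")

Running something like this inside the pod (against the mounted PVC) should show whether blocks older than ~5h are still present on disk or have already been deleted.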
