
Provide detailed memory metrics via prometheus plugin #11743

Closed
der-eismann opened this issue Jul 18, 2024 · 6 comments · Fixed by #11746
@der-eismann

Is your feature request related to a problem? Please describe.

Hey everyone, we are currently working on replacing the soon-to-be-EOL https://github.com/kbudde/rabbitmq_exporter with the built-in prometheus plugin. With that exporter it was possible to get detailed memory statistics from the management plugin, which have helped us debug issues:
https://github.com/rabbitmq/rabbitmq-server/blob/main/deps/rabbitmq_management/priv/www/js/tmpl/memory.ejs#L9-L31

Unfortunately I was unable to get these metrics from the prometheus plugin; the only thing that came close was process_resident_memory_bytes (https://github.com/rabbitmq/rabbitmq-server/blob/main/deps/rabbitmq_prometheus/src/collectors/prometheus_rabbitmq_core_metrics_collector.erl#L72).

Describe the solution you'd like

Provide all memory metrics from the management UI via prometheus plugin

Describe alternatives you've considered

No response

Additional context

No response

@michaelklishin
Member

This is open source software; you are welcome to contribute what you find missing.

The data comes from rabbit_vm:memory/0 and the metrics belong to this group.
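For context, rabbit_vm:memory/0 returns a proplist of per-category byte counts (the same categories the management UI memory page breaks down). A minimal sketch for inspecting it on a running node, assuming you have shell access and can use rabbitmqctl eval; the exact categories vary by version and workload:

```erlang
%% Print the per-category memory breakdown returned by rabbit_vm:memory/0.
%% Run it against a live node, for example with:
%%   rabbitmqctl eval 'lists:foreach(fun({K, V}) -> io:format("~p: ~p~n", [K, V]) end, rabbit_vm:memory()).'
%% Most values are integers (bytes); a few entries may themselves be proplists.
lists:foreach(
  fun({Category, Value}) -> io:format("~p: ~p~n", [Category, Value]) end,
  rabbit_vm:memory()).
```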

michaelklishin added commits that referenced this issue Jul 18, 2024

@michaelklishin michaelklishin added this to the 3.13.5 milestone Jul 18, 2024

mergify bot pushed commits that referenced this issue Jul 18, 2024:

    (cherry picked from commit c361edd)

    Closes #11743.
    (cherry picked from commit 5dad0f8)
    (cherry picked from commit d1a7167)
    # Conflicts:
    #	deps/rabbitmq_prometheus/src/collectors/prometheus_rabbitmq_core_metrics_collector.erl

    (cherry picked from commit c361edd)
    (cherry picked from commit 5396599)

michaelklishin added commits that referenced this issue Jul 19, 2024
@der-eismann
Author

Wow, I wasn't even able to finish my Erlang introductory course in that short time. Thanks for adding these metrics so quickly!

@michaelklishin michaelklishin modified the milestones: 3.13.5, 4.0.0 Jul 19, 2024
@michaelklishin
Member

These metrics are fairly expensive with many queues and streams, so we will limit this to 4.0 and look for ways to optimize this or make this opt-in.

@mkuratczyk
Contributor

@der-eismann We have now merged this into main/4.0 (but not 3.13). There's a dedicated endpoint for these metrics: https://www.rabbitmq.com/docs/next/prometheus#memory-breakdown-endpoint
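If you just want to eyeball the data before building panels, here is a minimal sketch for pulling it straight from the plugin. 15692 is the plugin's default port, and the endpoint path below is only a guess based on the linked docs page, so verify both against the docs for your version:

```erlang
#!/usr/bin/env escript
%% Fetch the memory breakdown metrics over HTTP and print the raw
%% Prometheus text exposition output.
%% Assumptions: the plugin listens on its default port (15692) and the
%% endpoint path matches the docs linked above; adjust both as needed.
main(_Args) ->
    {ok, _} = application:ensure_all_started(inets),
    Url = "http://localhost:15692/metrics/memory-breakdown",
    {ok, {{_Version, 200, _Reason}, _Headers, Body}} =
        httpc:request(get, {Url, []}, [], []),
    io:format("~s~n", [Body]).
```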

However, I struggle to find a nice Grafana visualization for these metrics. There are quite a few of them and, multiplied by the number of nodes in the cluster, that's a lot of data points. Are you currently visualizing these metrics from the exporter?
Can you share what that looks like? Ideally, if you could contribute a panel for them, that'd be great.

The RabbitMQ Overview dashboard JSON source file is here if you want to give it a try: https://github.com/rabbitmq/rabbitmq-server/blob/main/deps/rabbitmq_prometheus/docker/grafana/dashboards/RabbitMQ-Overview.json

@der-eismann
Author

Hey @mkuratczyk, we used these metrics to figure out why our memory consumption was so high, and with them we noticed that a huge chunk was allocated but unused. The visualization is more of a quick-and-dirty kind, but I can try to polish it and contribute it.
These are still from the old exporter, though; we don't have the 4.0 beta running yet. I need to invest some time in that, and I'm not sure when I can find it in the next two weeks.

[Screenshot screenshot-20240723-153120: memory visualization from the old exporter]

@mkuratczyk
Contributor

That's ok, no rush. It seems like the external exporter provided fewer metrics, and you still presented them separately for each node (which totally makes sense). As usual, the problem for us is that when we provide something, users expect it to "just work everywhere", and some users have 9 or more nodes in the cluster, so that's suddenly quite a few new panels. Perhaps a separate dashboard would be useful. Then we can just do it per node and use Grafana's repeat option.
