
[PoC] TPU receiver prometheus metrics #76

Draft: wants to merge 2 commits into base: prometheus
Conversation

Szymongib

Summary of Changes

This is a proof of concept of how we could get some other metrics into Prometheus.

This approach works and does not change the behaviour of the existing Solana metrics, but it has some downsides, mainly:

  • We need to pipe some struct to all relevant places when the stats are not accessible outside the module.
  • Quite a bit of boilerplate is required to create the additional _total variables and update them correctly (perhaps we could do some of that with proc macros, building on @enriquefynn's idea).

This is a draft so we can discuss this or other possible approaches.

let max_channel_len = self.packets_count.swap(0, Ordering::Relaxed);

{
    let mut stats = self.total_stats.lock().unwrap();
Member

Nice! We accumulate these values now 👌

Member

What are packet_batches_count, full_packet_batches_count and max_channel_len?

Author

So this is all kind of messy, but from what I understand:

  • packets_count here is just the number of UDP packets received specifically on the TPU sockets.
  • packet_batches_count is the number of "batches" received - Solana reads packets from the socket in batches of up to 64 packets.
  • full_packet_batches_count is the number of batches that were full (i.e. containing 64 packets).
  • max_channel_len - after reading from the socket, packets are forwarded to an unbounded channel. This value is the maximum length of that channel (across all current measurements), measured each time before forwarding new packets. So it is basically the max value over 1 second, since the stats are then reset 🤷 (see the sketch below).
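
For illustration, a minimal sketch (not the actual Solana code; the struct and field names here are made up to mirror the discussion) of how such a max-over-interval value can be tracked with atomics and reset on each report:

use std::sync::atomic::{AtomicUsize, Ordering};

struct ReceiverStats {
    max_channel_len: AtomicUsize,
}

impl ReceiverStats {
    // Called right before forwarding a new batch into the channel.
    fn observe_channel_len(&self, current_len: usize) {
        // fetch_max keeps the largest length seen since the last reset.
        self.max_channel_len.fetch_max(current_len, Ordering::Relaxed);
    }

    // Called once per reporting interval (~1 second): returns the max and resets it.
    fn take_max_channel_len(&self) -> usize {
        self.max_channel_len.swap(0, Ordering::Relaxed)
    }
}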


Can you add those as HELP lines in the exported metrics?
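
As a sketch only (the metric names and help texts below are placeholders, not the ones used in this PR), the exporter could write something like:

use std::fmt::Write;

fn write_tpu_receiver_metrics(out: &mut String, packets: u64, batches: u64, full_batches: u64) {
    // Writing to a String never fails, so unwrap is fine here.
    writeln!(out, "# HELP solana_tpu_packets_total UDP packets received on the TPU sockets.").unwrap();
    writeln!(out, "# TYPE solana_tpu_packets_total counter").unwrap();
    writeln!(out, "solana_tpu_packets_total {}", packets).unwrap();
    writeln!(out, "# HELP solana_tpu_packet_batches_total Packet batches read from the TPU sockets.").unwrap();
    writeln!(out, "# TYPE solana_tpu_packet_batches_total counter").unwrap();
    writeln!(out, "solana_tpu_packet_batches_total {}", batches).unwrap();
    writeln!(out, "# HELP solana_tpu_full_packet_batches_total Batches that contained the full 64 packets.").unwrap();
    writeln!(out, "# TYPE solana_tpu_full_packet_batches_total counter").unwrap();
    writeln!(out, "solana_tpu_full_packet_batches_total {}", full_batches).unwrap();
}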


So this will basically be a max value over 1 second since then the stats are reset

I don’t know how it measures this, but if the channel is backed by something similar to a Vec (I think it might be a deque in practice, but the same applies), and the max measures the capacity of the vec/deque, then it would persist even if the metrics are reset, unless they shrink_to_fit. But this is pure speculation, I haven’t looked into how it is implemented.

@@ -379,6 +380,7 @@ impl Validator {
socket_addr_space: SocketAddrSpace,
use_quic: bool,
tpu_connection_pool_size: usize,
prometheus_collector: Option<PrometheusCollector>,
Member

Although it seems natural to put new arguments at the end of the parameter list, to avoid conflicts it's better to put new arguments in the middle of the other arguments; that makes rebasing easier if something changes in the arguments.
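
For illustration only (a shortened, made-up signature with stand-in types, not the real Validator parameter list):

// Stand-in types just to make the sketch compile.
struct SocketAddrSpace;
struct PrometheusCollector;

fn new_validator(
    socket_addr_space: SocketAddrSpace,
    // New argument placed in the middle of the list rather than at the end,
    // so that other changes appended at the end rebase cleanly.
    prometheus_collector: Option<PrometheusCollector>,
    use_quic: bool,
    tpu_connection_pool_size: usize,
) {
    let _ = (socket_addr_space, prometheus_collector, use_quic, tpu_connection_pool_size);
}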

ruuda left a comment

This looks pretty clean and not very invasive, which is a good thing 👍 I wonder though what would happen if we turned it around: instead of making StreamerReceiveStats have a reference to the metrics and push to them, could the Prometheus config have a reference to StreamerReceiveStats and read from it? I can't predict what the upstream Solana people will say, but so far it looks like they are very concerned about this slowing down the validator. (Though a single branch is negligible either way compared to all the string formatting for the logs ...) I genuinely don't know how it would turn out, but it may be worth trying, to see which approach is cleaner.

Also, it would be nice if we could avoid the polling/pushing delay, but I don't know if that would make it much more complex. Hooking the report function is definitely tempting because it makes this a small and self-contained change.
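
A minimal sketch of that inverted direction, assuming the counters are kept as running totals (the stats type name matches the one discussed here, but the fields and metric names are illustrative):

use std::sync::{Arc, atomic::{AtomicU64, Ordering}};

#[derive(Default)]
struct StreamerReceiveStats {
    packets_count: AtomicU64,
    packet_batches_count: AtomicU64,
}

struct PrometheusEndpoint {
    tpu_receiver_stats: Arc<StreamerReceiveStats>,
}

impl PrometheusEndpoint {
    // Called on every /metrics scrape; reads the atomics directly,
    // so there is no push loop and no polling delay.
    fn render(&self) -> String {
        let stats = &self.tpu_receiver_stats;
        format!(
            "solana_tpu_packets_total {}\nsolana_tpu_packet_batches_total {}\n",
            stats.packets_count.load(Ordering::Relaxed),
            stats.packet_batches_count.load(Ordering::Relaxed),
        )
    }
}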

let stats = {
    tpu_stats.total_stats.lock().unwrap().clone()
};
collector.lock().unwrap().save_tpu_receiver_stats(stats)

If I understand correctly, this will keep one copy of the metrics with the Prometheus config, periodically “push” to it, and then the endpoint serves whatever the latest value is that it received.

This creates a polling delay: the value that we serve on /metrics can be outdated by the cycle interval of this loop. It looks like that interval is 1 second, so it's not too bad, but it would be nicer if we could get the actual current values. If we can get the StreamerReceiveStats into the Prometheus endpoint, it could do the atomic reads there, but that means we would need to make the counters totals; the resetting could then be handled here at this point.

Something like this:

use std::sync::atomic::{AtomicU64, Ordering};

struct GoatCounter<T> {
    num_goats_teleported: T,
    num_goats_received: T,
}

type GoatCounterCurrent = GoatCounter<AtomicU64>;
type GoatCounterSnapshot = GoatCounter<u64>;

impl GoatCounterCurrent {
    pub fn snapshot(&self) -> GoatCounterSnapshot {
        GoatCounterSnapshot {
            num_goats_teleported: self.num_goats_teleported.load(Ordering::Relaxed),
            num_goats_received: self.num_goats_received.load(Ordering::Relaxed),
        }
    }
}

impl std::ops::Sub for GoatCounterSnapshot {
    type Output = Self;

    fn sub(self, rhs: Self) -> Self {
        Self {
            num_goats_teleported: self.num_goats_teleported - rhs.num_goats_teleported,
            num_goats_received: self.num_goats_received - rhs.num_goats_received,
        }
    }
}

// In the place where we print the metrics:
let mut last_logged: GoatCounterSnapshot = todo!();
let current = goat_counter.snapshot();
log(current - last_logged);
last_logged = current;

// In the Prometheus endpoint:
let current = goat_counter.snapshot();
write_metrics(&current, ...);

Not sure if this is feasible in practice though; it looks like there is something going on with accumulation over multiple threads? Also, having the separate fields as atomic ints can lead to surprising results. For example, if you are trying to compute an error rate and you have counters num_requests and num_errors, where it should be the case that num_errors <= num_requests, then if we load those counters with Ordering::Relaxed, that inequality could be violated. It requires some thought, both at the place where the counters are read and the place where they are incremented, to get this right. I'm not sure Solana does this, since they prioritize speed over correctness, and Ordering::Relaxed does give the compiler and CPU more freedom to optimize than some of the other orderings ...

Author

Hmm, I see, so that would basically mean changing how the current Solana metrics logic is implemented. I would be a bit hesitant to change it, because to keep things consistent we would likely need to do the same in all the places this pattern is followed (which seems to be a lot).

Not sure if this is feasible in practice though, it looks like there is something going on with accumulation over multiple threads?

Yes, that is what is happening.

Also, having the separate fields as atomic ints can lead to surprising results. For example, if you are trying to compute an error rate, and you have counters num_requests and num_errors

Yeah, there are definitely possible inconsistencies, but I guess that is really common in metrics. Using the example of num_requests and num_errors in some other application: num_requests is likely to be incremented at the beginning of handling the request, while num_errors for obvious reasons can only be incremented later, so it won't be an atomic operation.

P.S. I now want to see this whole application for teleporting goats 🤣


num_requests is likely to be incremented at the beginning of handling the request, while num_errors for obvious reasons can only be incremented later, so it won't be an atomic operation.

That by itself is not an issue, since you still couldn’t get a >100% error rate. If you increment the error counter before the request counter, then you could get a >100% error rate, which is really counter-intuitive. (Or if you read the request counter before the error counter.)
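
To make that concrete, here is a sketch (not something Solana does today, just one possible scheme) of increment and read orderings that keep the observed rate at or below 100%:

use std::sync::atomic::{AtomicU64, Ordering};

struct RequestStats {
    num_requests: AtomicU64,
    num_errors: AtomicU64,
}

impl RequestStats {
    fn record_request(&self) {
        // Counted at the start of handling a request.
        self.num_requests.fetch_add(1, Ordering::Relaxed);
    }

    fn record_error(&self) {
        // Release: any reader that observes this increment also observes
        // the earlier request increment made by the same thread.
        self.num_errors.fetch_add(1, Ordering::Release);
    }

    fn error_rate(&self) -> f64 {
        // Read errors first, with Acquire, then requests: every error we
        // see implies its request increment is visible too, so errors <= requests.
        let errors = self.num_errors.load(Ordering::Acquire);
        let requests = self.num_requests.load(Ordering::Relaxed);
        if requests == 0 {
            0.0
        } else {
            errors as f64 / requests as f64
        }
    }
}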

I now want to see this whole application for teleporting goats

You might enjoy these :-)
https://crbug.com/31482
https://codereview.chromium.org/543045

ruuda commented Jan 26, 2023

@Szymongib do you still plan to pick this up some time? It’s not high prio, but I think it would be nice to have.

Szymongib (Author)

@Szymongib do you still plan to pick this up some time?

Hmm, I thought we did not really want to pursue this further, but if you think it is worth it, I will add it back to my todo list.

hritique pushed a commit that referenced this pull request Apr 29, 2024