
[Stack Monitoring] Logstash Pipelines view can trigger OOM #37246

Closed
pickypg opened this issue May 28, 2019 · 31 comments

Assignees: igoristic
Labels: bug, discuss, Team:Monitoring

@pickypg (Member) commented May 28, 2019

Kibana version:

Observed in 6.5.x, 6.7.x, and 7.x.

Elasticsearch version:

Matching.

Description of the problem including expected versus actual behavior:

Loading the Logstash Pipeline listing view in the Stack Monitoring UI with a relatively large number of pipelines (96 in the screenshot below) can trigger a very large amount of memory utilization across Elasticsearch and, if you are unlucky, result in an OOM.

Steps to reproduce:

  1. Create a lot of Logstash Pipelines.
  2. Use Stack Monitoring to monitor them all as part of the same deployment.
  3. Load the Logstash Pipeline listing.

[screenshot: pipeline listing]

Each heap-usage spike above 75% was from me allowing the listing to load (otherwise I had it paused to avoid the heap usage).

[screenshot: JVM heap usage]

@pickypg added the Team:Monitoring label May 28, 2019
@elasticmachine (Contributor)

Pinging @elastic/stack-monitoring

@pickypg added the bug label May 28, 2019
@cachedout (Contributor)

@ycombinator You worked on something similar. Anything come to mind here?

@ycombinator (Contributor)

Without a proper investigation, my guess is this has to do with the sparklines on the Logstash pipeline listing page. I suspect the aggs for the queries that generate those are the source of this issue.

@pickypg (Member, Author) commented Jun 17, 2019

my guess is this has to do with the sparklines on the Logstash pipeline listing page. I suspect the aggs for the queries that generate those are the source of this issue.

Same.

@cachedout (Contributor)

This could be an interesting issue for @igoristic to pick up. @chrisronline what do you think?

@igoristic (Contributor)

This is definitely something I can take a look at

@igoristic self-assigned this Jun 18, 2019
@ycombinator (Contributor)

Thanks @igoristic. I'm happy to answer any questions you might have about the code or the history here.

@cachedout (Contributor)

Thanks, @igoristic. Since this is negatively affecting some production systems, please consider this issue a higher priority.

@igoristic (Contributor) commented Jul 8, 2019

The current PR addresses the issue, but now the Pipelines will only refresh when the time range changes.

The problem is that x-pack/legacy/plugins/monitoring/server/lib/logstash/get_pipeline_stats_aggregation.js gets all the pipelines regardless of pagination.

I initially went with a different approach by adding "from": 0, "size": 10, but that did not change the memory usage; I'm guessing that's because it still aggregates across all buckets and then returns the offset data.
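
For what it's worth, from and size only page the hits that come back; aggregations still run over every document matching the query, which is why the memory profile doesn't change. A minimal Dev Tools sketch (index pattern and field names are borrowed from the queries later in this thread, <CLUSTER_UUID> is a placeholder, and the time filter is omitted for brevity):

GET .monitoring-logstash-6-*,.monitoring-logstash-7-*/_search
{
  "from": 0,
  "size": 10,
  "query": {
    "term": { "cluster_uuid": "<CLUSTER_UUID>" }
  },
  "aggs": {
    "pipelines_nested": {
      "nested": { "path": "logstash_stats.pipelines" },
      "aggs": {
        "by_pipeline_id": {
          "terms": { "field": "logstash_stats.pipelines.id", "size": 1000 }
        }
      }
    }
  }
}

Here "size": 10 trims the hits array in the response, but the by_pipeline_id buckets are still computed across all matching documents, so the heavy lifting (and the heap usage) is unchanged.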

Another idea I had was to maybe add summaryOnly: true bool to all */pipelines routes and only return clusterStatus or nodeSummary in this case. This way we can load all the pipelines once and then use that flag to only update the summary (and then reload the pipelines again if ever the date range changes).

Let me know what you guys think; I would really appreciate any suggestions. @ycombinator @chrisronline

@ycombinator (Contributor) commented Jul 8, 2019

The problem is that x-pack/legacy/plugins/monitoring/server/lib/logstash/get_pipeline_stats_aggregation.js gets all the pipelines regardless of pagination.

Indeed, and this is a general problem we have with all our listing pages. What we probably want is server-side pagination but we should implement that for all our listing pages, not just one.

[EDIT] When I say "server-side pagination", I mean the following:

  • Only requesting one "page" of data from Elasticsearch at a time.
  • For each returned "row", making additional calls to ES to retrieve expensive data for that row.

I initially went with a different approach by adding "from": 0, "size": 10, but that did not change the memory usage; I'm guessing that's because it still aggregates across all buckets and then returns the offset data.

Interesting. Does this mean server-side pagination would not work in this case?

Another idea I had was to maybe add summaryOnly: true bool to all */pipelines routes and only return clusterStatus/nodeSummary in this case. This way we can load all the pipelines once and then use that flag to only update the summary (and then reload the pipelines again if ever the date range changes).

If we do this, wouldn't any of the data in each row of the pipeline listing not update every 10s (or whatever interval the user sets)? That would be at odds with the rest of the listing pages in the Stack Monitoring UI, leading to an inconsistent UX, no?


Further up in this issue both @pickypg and I speculated that the memory usage has to do with the sparkline-related aggregations. Did you do any investigation along those lines? For instance, did you try to remove just the aggs needed for the sparklines' data and see if that helps?

More generally (i.e. disregarding our specific speculation about sparklines for a moment), it would be good to know what the long pole in the tent is on this page, w.r.t. performance. Specifically which element in the UI is having the most performance impact? Once we've narrowed that down, it might become clearer how to proceed by perhaps handling that element differently than we do today.

@igoristic (Contributor)

I don't think anyone ever watches the sparklines anticipating their next tick, or expects them to be 100% accurate. The general assumption with all sparklines (at least from my trading crypto experience) is that they are static. This was my main motivation behind the solution.

If we do this, wouldn't any of the data in each row of the pipeline listing not update every 10s (or whatever interval the user sets)? That would be at odds with the rest of the listing pages in the Stack Monitoring UI, leading to an inconsistent UX, no?

True, but I think we can start introducing values that don't get updated/aggregated, similar in spirit to #39308; this can be either performance or capability driven. But, of course, we will still express that somehow based on the outcome of the mentioned ticket.

  • Only requesting one "page" of data from Elasticsearch at a time.

I was initially thinking the same, but even if we do go with the "only aggregate what you see" approach this only fixes the issue partially, since their row count can be set to 50.

  • For each returned "row", making additional calls to ES to retrieve expensive data for that row

Wouldn't this make it worse? Now we'll be making X more requests, multiplied by the row count (every ten seconds). We would also need to do this each time they get a result from the search field. Maybe I'm misunderstanding something?

Specifically which element in the UI is having the most performance impact? Once we've narrowed that down, it might become clearer how to proceed by perhaps handling that element differently than we do today.

@ycombinator @pickypg I'm kind of confused as to which memory hiccup we are concerned about? Browser, JVM, or Both? The ticket has a screenshot of JVM Heap chart, so that's what I've been focusing on

@pickypg (Member, Author) commented Jul 9, 2019

I was initially thinking the same, but even if we do go with the "only aggregate what you see" approach this only fixes the issue partially, since their row count can be set to 50.

Even for large pages, this would still be far superior to the existing approach that we've stuck ourselves with in the past. We'd still suffer with pages of 50, but that would be dramatically better than unlimited pages.

Wouldn't this make it worse? Now we'll be making X more requests, multiplied by the row count (every ten seconds).

Most likely not. The problem with the existing request is that it has to hold onto a massive amount of memory and pass that between nodes, until it finally is able to respond to the caller (Kibana). If you "walked" the list via the browser and just requested them on-demand, even in batches, it would be superior to a single massive request -- probably even if you managed to send them all in parallel because it can throw away the memory in pieces (and browsers limit the number of calls, so it couldn't fire 50 at once).

The bigger problem is that we would have to be intelligent about the paging so that it was efficient and also fast. We don't want to fire 10 in parallel at a time because that could end up, under defaults, being at least 70 shards getting touched per request so you'd quickly hit search rejections.

We would also need to do this each time they get a result from the search field.

It's the downside to having a more dynamic API, but paging also implies that the search field is actually using ES search rather than local search like it does today, which kind of further implies that we'd have to do that anyway.

I'm kind of confused as to which memory hiccup we are concerned about? Browser, JVM, or Both? The ticket has a screenshot of JVM Heap chart, so that's what I've been focusing on

100% JVM. Taking down the node(s) is problematic to the entire Stack. Taking down the browser would be pretty bad, but it's not the problem that I've been concerned about at all. My assumption is that @ycombinator meant "performance" in the timing sense rather than browser memory. If you removed the sparklines from the pipeline list, for debugging purposes, I can practically guarantee that the memory pressure would disappear with 96 pipelines or even 293 as I saw recently.

Massive aggregations caused by combinatorially large requests are my major concern here.

@ycombinator (Contributor)

Thanks @pickypg. I was traveling so couldn't get to this soon enough but you covered everything. :)

@igoristic (Contributor)

I forgot to mention, but there is a direct correlation between pipeline aggregation and JVM memory usage based on the simple tests I did in the beginning by removing:

pipelines: await getPipelines(req, lsIndexPattern, metricSet),

from x-pack/legacy/plugins/monitoring/server/routes/api/v1/logstash/pipelines/cluster_pipelines.js (and node_pipelines.js)


I'm thinking the per-row aggregation might not be such a bad idea here, especially if we limit the max row count for pagination to 20 (and set the default to 5).

@ycombinator (Contributor) commented Jul 10, 2019

Doesn't getPipelines get all pipelines for the page with each pipeline object containing sparkline data but also other data for that pipeline? I don't think removing getPipelines to test the memory impact is granular enough. I'll repeat what I've said earlier in this issue: please try removing just the aggregations related to sparklines. That way we can narrow it down even further. Without this how do we know for sure it's the sparklines that are the problem vs. other data?

Also, narrowing down the problem target like this could potentially open up other solutions, like lazily fetching just sparkline data after the initial page load (we did something along these lines in another performance-related issue not too long ago).

In general, I'd like us to be able to characterize the root cause of this problem as narrowly as possible before we start considering any solutions.

@chrisronline (Contributor) commented Jul 10, 2019

Late to the party, but it's not clear to me what request we're doing that is causing the performance issue. It'd probably be useful to outline all the requests made from Kibana server -> ES (which starts with an XHR request from Kibana client -> Kibana server) and then comment out each individual request separately, and test which one has the biggest impact on performance. It feels like we're jumping the gun a bit on what to actually do here.

EDIT: In the case that it might actually only be a single request from Kibana server -> ES, we should break down the individual aggregations done in the single request and figure out which one is impacting performance the most

@pickypg (Member, Author) commented Jul 10, 2019

Ran into a new variation of this issue today, where there were only 41 pipelines, but in 7.x the page simply would not load because the default max buckets (10000) were exceeded. That protection exists to save ES from consuming too much JVM heap, and in this case is saving the user, but also making the entire pipeline listing unusable.

[screenshot: max buckets error]

I couldn't even get to the page to load without increasing search.max_buckets to 15000. To see 4 hours worth of data, I had to increase it to 30000 (for only 41 pipelines!).
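
For anyone needing the same stopgap, search.max_buckets is a dynamic cluster setting, so it can be raised without a restart. A sketch (pick a value your cluster can actually tolerate):

PUT _cluster/settings
{
  "persistent": {
    "search.max_buckets": 15000
  }
}

Note that this only raises the guardrail; it does not reduce the amount of work the aggregation performs.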

@inqueue (Member) commented Jul 11, 2019

... 7.x the page simply would not load because the default max buckets (10000) were exceeded

Added 7.x to the initial issue description.

For anyone hitting this issue:

(from the initial issue description):

Use Stack Monitoring to monitor them all as part of the same deployment.

It is highly recommended to set up a separate monitoring cluster for production environments, which avoids the conflict of having search.max_buckets set too small for monitoring requests to complete successfully.

@inqueue (Member) commented Jul 11, 2019

Ran into a new variation of this issue today, where there were only 41 pipelines, but in 7.x the page simply would not load because the default max buckets (10000) were exceeded.

Same as #36892?

@pickypg (Member, Author) commented Jul 11, 2019

@inqueue No, that's for the ES Node listing. It's the same symptom (blocked request because the monitoring listings try to return everything in one pass), but it's for a functionally different reason.

@igoristic (Contributor) commented Jul 12, 2019

I was able to test this more granularly. I created about 40 fake pipelines using:

- pipeline.id: "random_0"
  pipeline.workers: 1
  pipeline.batch.size: 10
  config.string: "input { generator {} } filter { sleep { time => 1 } } output { stdout { codec => dots } }"

Gave it some time to bake in (for Logstash to create in/out events, etc.), and was able to confirm that the aggregation does spike the JVM heap usage in Elasticsearch.

I created a new API call with a simple non-aggregated query for polling, e.g.:

GET .monitoring-logstash-6-*,.monitoring-logstash-7-*/_search
{
	"query": {
		"bool": {
			"filter": [{
				"term": {
					"cluster_uuid": "0Fl90z31QCmpxOY3SCbiyw"
				}
			}, {
				"range": {
					"logstash_stats.timestamp": {
						"format": "epoch_millis",
						"gte": 1562778618444,
						"lte": 1562941838109
					}
				}
			}]
		}
	},
	"_source": ["logstash_stats.pipelines"]
}

And saw that the JVM usage went down significantly.

My testing:

  1. While observing ES's JVM memory I opened another window and went to: Stack Monitoring > Logstash Pipelines to see the memory spikes

  2. Navigated from pipelines list to a different app altogether (eg Dev Tools).

  3. Waited, and watched the spikes go down

  4. Then opened up the pipelines list again to see the JVM memory spikes go back up

One thing I also discovered is that watching the Monitoring UI at http://localhost:5601/app/monitoring#/elasticsearch/nodes... causes small JVM heap spikes in itself. To test it: go to another app besides Stack Monitoring for a while; then go to the ES node monitoring charts and notice the JVM usage is somewhat flat, and then starts to spike up more and more (to a somewhat consistent level).

I'm thinking maybe it'll be a lot cheaper to do some of this aggregation on the frontend? Thoughts @ycombinator @pickypg @chrisronline


EDIT:

Forgot to mention that narrowing the aggregation to just IDs, e.g.:

GET .monitoring-logstash-6-*,.monitoring-logstash-7-*/_search
{
	"query": {
		"bool": {
			"filter": [{
				"term": {
					"cluster_uuid": "0Fl90z31QCmpxOY3SCbiyw"
				}
			}, {
				"range": {
					"logstash_stats.timestamp": {
						"format": "epoch_millis",
						"gte": 1562778618444,
						"lte": 1562941838109
					}
				}
			}]
		}
	},

	"aggs": {
		"check": {
			"date_histogram": {
				"field": "logstash_stats.timestamp",
				"interval": "30s"
			},
			"aggs": {
				"pipelines_nested": {
					"nested": {
						"path": "logstash_stats.pipelines"
					},
					"aggs": {
						"by_pipeline_id": {
							"terms": {
								"field": "logstash_stats.pipelines.id",
								"size": 1000
							}
						}
					}
				}
			}
		}
	}
}

This also spiked the memory usage a little bit. I couldn't test it thoroughly, because I eventually ran into a too_many_buckets_exception.

@pickypg (Member, Author) commented Jul 12, 2019

@igoristic Yeah, once we start doing arbitrarily sized aggregations (e.g., date_histogram with a fixed interval) with sub-aggregations, then you enter a dangerous combinatorial problem.

Looking at your aggregation, your date range is about 45 hours, but your aggregation is for 30s. That's 5,400 top-level buckets. From there, you're requesting [up to] 1,000 sub-buckets for the ID. That would have 5,400,000 total buckets (!) if you had that many pipelines. As it happens, you only have 40, so it's actually 5,400 * 40 (216,000!) total buckets (which gets stopped at 10,001 by default).

Looking at the edit, there are two things that I think we should consider moving forward for this solution and three things overall based on your comment.

The first is unrelated to the Logstash Pipelines issue here, but related to:

http://localhost:5601/app/monitoring#/elasticsearch/nodes... causes small JVM heap spikes in itself.

I think it's safe to discuss this problem generically because whatever you all do to introduce paging here should extend to the other screens. However, beyond that, I think it would be ideal to discuss the node listing's own problems (specifically, it aggregates the shards because node_stats documents don't have that data in them yet, amongst a few other lightweight values) in its own issue: #36892.

Relative to the LS pipeline issues:

I'm thinking maybe it'll be a lot cheaper to do some of this aggregation on the frontend? Thoughts

For a long time, I have thought that we need to completely change how our listings work and I think this is fundamentally what we must do. We should page by taking advantage of ES, only selecting the subset that we want to display, then performing a follow-on aggregation that only applies to those documents if we even have to aggregate. Be aware that your example polling query only fetched the top 10 hits, which isn't 100% comparable to what the existing UI functionally does: it effectively fetches every hit it can.

Although this would fix the memory issues that we face (and bucket quantities), this does introduce some problems relative to the existing UX. First, anything aggregated would become unsortable because we wouldn't have all values in memory to know if we were sorting properly. Second, any search / filter boxes would have to dynamically work against the generated ES query rather than the in-memory data because, again, we don't have it all. That's not an easy thing to implement because it has implications for understanding the mappings of the data and what's therefore possible (the EUI search bar examples should help here).
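
A rough sketch of that two-step flow (illustrative only, not the actual Kibana implementation; the index pattern and fields mirror the queries above, and the pipeline IDs in the second request are hypothetical). First, a cheap request that only lists pipeline IDs, which the UI can page and sort:

GET .monitoring-logstash-6-*,.monitoring-logstash-7-*/_search
{
  "size": 0,
  "query": {
    "bool": {
      "filter": [
        { "term": { "cluster_uuid": "0Fl90z31QCmpxOY3SCbiyw" } },
        { "range": { "logstash_stats.timestamp": { "gte": "now-1h", "lte": "now" } } }
      ]
    }
  },
  "aggs": {
    "pipelines_nested": {
      "nested": { "path": "logstash_stats.pipelines" },
      "aggs": {
        "all_ids": {
          "terms": { "field": "logstash_stats.pipelines.id", "size": 1000 }
        }
      }
    }
  }
}

Then, for only the IDs on the current page, a second request runs the expensive time-series aggregations, restricted via the terms agg's include list:

GET .monitoring-logstash-6-*,.monitoring-logstash-7-*/_search
{
  "size": 0,
  "query": {
    "bool": {
      "filter": [
        { "term": { "cluster_uuid": "0Fl90z31QCmpxOY3SCbiyw" } },
        { "range": { "logstash_stats.timestamp": { "gte": "now-1h", "lte": "now" } } }
      ]
    }
  },
  "aggs": {
    "timeseries": {
      "date_histogram": { "field": "logstash_stats.timestamp", "interval": "30s" },
      "aggs": {
        "pipelines_nested": {
          "nested": { "path": "logstash_stats.pipelines" },
          "aggs": {
            "by_pipeline_id": {
              "terms": {
                "field": "logstash_stats.pipelines.id",
                "include": ["random_0", "random_1"],
                "size": 20
              }
            }
          }
        }
      }
    }
  }
}

With only one page of IDs in play, the bucket count (and the memory each request has to hold) scales with the page size instead of with the total number of pipelines.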

"date_histogram": {

Having just gone through my explanation of the costs associated with bucketing massive amounts, a very easy win for us would be to stop using date_histogram altogether and, instead, use the relatively new auto_date_histogram so that we can say how many buckets we want rather than how big each bucket should be.

Particularly with the sparkline, which has a lot of dots, this should help tremendously, and we should reduce the number of buckets we request to probably something on the order of 10. Using your own edited query / agg as an example, this would have reduced it from 5,400 top-level buckets to 10 and made even the 1,000 sub-bucket request allowable within the context of the ES max bucket cap.
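
Concretely, the edited query above would look something like this with auto_date_histogram (a sketch; same index pattern, filters, and fields, just swapping the histogram and asking for at most 10 buckets):

GET .monitoring-logstash-6-*,.monitoring-logstash-7-*/_search
{
  "size": 0,
  "query": {
    "bool": {
      "filter": [
        { "term": { "cluster_uuid": "0Fl90z31QCmpxOY3SCbiyw" } },
        { "range": { "logstash_stats.timestamp": { "format": "epoch_millis", "gte": 1562778618444, "lte": 1562941838109 } } }
      ]
    }
  },
  "aggs": {
    "check": {
      "auto_date_histogram": {
        "field": "logstash_stats.timestamp",
        "buckets": 10
      },
      "aggs": {
        "pipelines_nested": {
          "nested": { "path": "logstash_stats.pipelines" },
          "aggs": {
            "by_pipeline_id": {
              "terms": { "field": "logstash_stats.pipelines.id", "size": 1000 }
            }
          }
        }
      }
    }
  }
}

Elasticsearch then picks the interval itself so that no more than 10 top-level buckets come back, keeping the worst case around 10 * 1,000 sub-buckets regardless of the selected time range.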

We could then extend this auto_date_histogram usage to the rest of the Monitoring UI's charts to make things significantly more predictable across the UI (note: the response from the aggregation returns the generated interval that it used, so we can still display that in the tooltips) both in terms of performance as well as visually.

@igoristic (Contributor) commented Jul 22, 2019

Upon thorough investigation into auto_date_histogram vs date_histogram, I concluded that auto_date_histogram does use less JVM memory. My investigation wasn't fully conclusive, since auto_date_histogram might have a bug dealing with bucket_script (the ES team is aware). This prevented me from truly stressing the system with the pipelines' min/max aggregations:

"aggs": {
  "events_stats": {
    "stats": {
      "field": "logstash_stats.pipelines.events.out"
    }
  },
  "throughput": {
    "bucket_script": {
      "script": "params.max - params.min",
      "buckets_path": {
        "min": "events_stats.min",
        "max": "events_stats.max"
      }
    }
  }
}

I was still able to derive a comparison just from observing the Stack Monitoring overview page, since it also uses date_histogram for pipelines count/type etc.

We are actually dealing with two different types of issues here:

  1. Very fine granularity on date_histogram can trigger a max buckets error (default is 10,000 buckets) depending on range/aggs/interval. And, if you're lucky enough to get past it, you'll most likely hit the 2nd issue...

  2. date_histogram can significantly spike ES's JVM memory usage (especially with pipeline aggregations). This is made worse because the query runs every 10 seconds.

I also tested and investigated different possible solutions for both issues:

  1. Use auto_date_histogram instead of date_histogram once the issue with bucket_script is resolved.

  2. Calculate an optimal interval for date_histogram based on range eg: if range is around 30min use 1m interval; if range is several hours use 10min interval; if days use 1h; etc...

  3. Get and aggregate each data item (in this case a pipeline) separately. Might still need to do some chunking here since a single pipeline can cause a max buckets error if range is 5 days: 2880 half minutes (30s) in a day * 5 days = 14400 buckets

  4. Split requests by hours (this is if we still want to keep our 30s granularity). Eg: Break down the range into hours (or smaller depending on the total items) and get each hour individually, then stitch it all together. Please note this will only bypass max bucket exception, and might not yield much improvements in JVM memory usage

  5. Load some data only once (instead of every 10 seconds). Since the pipeline sparklines are not very legible and provide almost no value as a real-time component, it's worth considering making them static. This still does not solve the max buckets issue, but it can go in conjunction with solution #1. I already have a small example here: Issue 37246: Removed pipeline refresh to avoid jvm OOM errors #40549

EDIT: Forgot to mention that in addition to all these solutions we can do some of the aggregations on the frontend to take the load off ES

One thing to note: paging the hits has no impact on aggregations, so we always get all the items (pipelines). It would be great to also include our own paging solution in addition to the solutions described above (since there could be 100 pipelines, but only 5 visible at a time).

Would love to hear your guys' opinion: @pickypg @ycombinator @chrisronline @cachedout

@pickypg (Member, Author) commented Jul 22, 2019

1. Use auto_date_histogram instead of date_histogram once the issue with bucket_script is resolved.

This is the ideal solution to me, assuming the ES team can fix the pipeline issue that we observed. We get direct control over the number of buckets, removing 100% of the unpredictable nature of it that I'm about to discuss in the second point.

2. Calculate an optimal interval for date_histogram based on range eg: if range is around 30min use 1m interval; if range is several hours use 10min interval; if days use 1h; etc...

We actually already try to do this. I'm not sure that the sparklines take advantage of the same logic as the rest of the charts, but we try to set "intelligent" intervals based on the requested time range. The inherent variability here is half of what our problem is, though, because the number of buckets increases as the range does until we effectively shift gears to the next higher interval.

3. Get and aggregate each data item (in this case a pipeline) separately.

I think we want to do this, to some degree. #37246 (comment) Instead of one-by-one, I think we could still safely do it page-by-page (e.g., 20 - 50 at a time). Note, this means we'd have to do search via hits, then do a second request that aggregates against those specific IDs.

4. Split requests by hours (this is if we still want to keep our 30s granularity).

This would end up generating roughly the same number of buckets, but over a set of requests instead of one. That would be helpful to the GC being able to kick in and save ES, but with auto_date_histogram, it should simply be unnecessary.

5. Load some data only once (instead of every 10 seconds).

This sounds like just a mechanism for hoping to avoid the issue, but it wouldn't safely work. Two users exploring pipeline issues together could still trigger massive heap usage in parallel.

@ycombinator (Contributor) commented Sep 19, 2019

It's been a while since there's been activity on this issue and IMO it's a pretty critical one as it keeps coming up. So I wanted to try and summarize the discussion so far and see if we could move forward, at least with a short-term fix.

Long-term fix: I think the consensus here seems to be smarter pagination. That is, delegate pagination to ES and only request enough data for one page at a time. Even in that data, we might want to only request just enough to render enough useful data on a row initially, and then do follow-on aggregations per row to render more data asynchronously.

Short-term fix #1: Using auto_date_histogram instead of date_histogram. Currently blocked on ES.

Short-term fix #2: For sparklines used in listing page (AFAIK the LS Pipeline Listing page is the only one doing this today), use a different mapping of time picker intervals => bucket intervals than the ones we use for other timeseries charts in the Stack Monitoring UI. The idea here is to come up with larger bucket intervals, therefore reducing the # of buckets. I think this could help alleviate the problem to a certain extent, but if the # of pipelines (i.e. # of sparklines) in the listing grows beyond a certain point we will hit the same problem again.

Short-term fix #3: Split requests by hours. This means fewer buckets being requested in each request. Again, similar to #2, this could help alleviate the problem to a certain extent, but if the # of pipelines (i.e. # of sparklines) in the listing grows beyond a certain point we will hit the same problem again.

Short-term fix #4: AFAICT this hasn't been proposed yet but what about providing users with a checkbox on the LS Pipeline Listing page to show/hide the sparklines? There are some UX details to work out here but the general idea here is to give users an escape hatch to avoid the OOM by avoiding requesting data for the sparklines altogether.

I think we should vote on either short term fix #2, #3, or #4 and make progress on that, just to move this issue forward. Meanwhile we should continue to follow up with the ES folks on the blocker for short term fix #1 (if we still need it) and also work on the long term fix.

@igoristic @chrisronline @cachedout @pickypg WDYT?

@cachedout (Contributor)

I like option 2. I don't think that granularity in the history is as critical as it might be in a larger time-series graph.

@pickypg (Member, Author) commented Sep 19, 2019

Long-term fix

It's unclear to me why we can't start moving in this direction (beyond scheduling of course), especially for the Pipeline page as the first pass. This would ultimately fix the issue across the board, but also serve as the basis for every other part of Stack Monitoring.

Short-term fix #2
Short-term fix #3

I like both of these options, but I am not sure that either of them works well enough for the larger users out there to warrant the effort versus the long-term fix, and we wouldn't know how much they helped for users with 40+ LS pipelines without implementing them.

Short-term fix #4

I am not sure that this would work since the page would sometimes never actually load to give them the opportunity.

Meanwhile we should continue to follow up with the ES folks on the blocker for short term fix #1 (if we still need it)

100% this. auto_date_histogram should be the preferred date histogram moving forward, both for its general memory benefits and because it simplifies Stack Monitoring by removing the need for us to calculate an interval based on the data size we wish we had, versus the data size we actually have.

All of Kibana should be moving in that direction.

@ycombinator (Contributor)

It's unclear to me why we can't start moving in this direction (beyond scheduling of course)

Yeah, I think it's scheduling more than anything else, given how we haven't been able to attend to this issue for ~2 months. If we can prioritize this fix over other commitments, I agree we should focus on this over any of the short term options mentioned here.

I am not sure that this would work since the page would sometimes never actually load to give them the opportunity.

True, if the default is to show the sparklines. We could flip the default to be to hide the sparklines. Again, keeping in mind that this is meant to be more of a stop gap.

@cachedout (Contributor)

True, if the default is to show the sparklines. We could flip the default to be to hide the sparklines. Again, keeping in mind that this is meant to be more of a stop gap.

I think this is a perfectly reasonable course of action for a short-term fix.

@chrisronline (Contributor)

Long-term fix for this particular page: #46587

@chrisronline (Contributor)

I'm going to close this out, as #46587 should fix this for effectively everyone. Please reopen if you see the behavior persist after the fix is released
