Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Memory budget exceeded error when attempting to create a new vpc #4283

Closed
askfongjojo opened this issue Oct 16, 2023 · 10 comments · Fixed by #4298
Closed

Memory budget exceeded error when attempting to create a new vpc #4283

askfongjojo opened this issue Oct 16, 2023 · 10 comments · Fixed by #4298
Assignees
Milestone

Comments

@askfongjojo
Copy link

The error was encountered on rack2 when I tried to create a new project:

17:28:52.650Z INFO 65a11c18-7f59-41ac-b9e7-680627f996e7 (dropshot_external): request completed
    error_message_external = Internal Server Error
    error_message_internal = saga ACTION error at node "vpc": database error (kind = Unknown): root: memory budget exceeded: 50328352 bytes requested, 117717552 currently allocated, 134217728 bytes in budget\nHINT: Consider increasing --max-sql-memory startup parameter.\n
    file = /home/build/.cargo/git/checkouts/dropshot-a4a923d29dccc492/fa728d0/dropshot/src/server.rs:841
    latency_us = 4283076
    local_addr = 172.30.2.5:443
    method = POST
    remote_addr = 172.20.17.42:58746
    req_id = 68591636-0a4e-4e81-80b3-0a846f0e9aa5
    response_code = 500
    uri = https://oxide-local.sys.rack2.eng.oxide.computer/v1/projects

Given the error was related to vpc, I then tried to create a new vpc in an existing project and hit the same error.

@askfongjojo
Copy link
Author

This is a CockroachDB error, indicating that it's at risk of crashing due to OOM.
@davepacheco - any insight into what we can do to tune the max-sql-memory setting?

@davepacheco
Copy link
Collaborator

I haven't seen this before but we haven't done much tuning of CockroachDB and I don't remember this coming up in the stress testing we did back in 2020. (But those were relatively simple queries. It sounds like memory usage here depends on the complexity of the queries.) I found some information here:

@askfongjojo I assume you're referring to this quote from that second link:

A memory budget exceeded error is an early warning that the cockroach process on a node is at risk of crashing due to an out-of-memory (OOM) crash.

I think that's a Linux-ism and does not apply to us. (Helios avoids overcommitting memory rather than terminates processes when memory gets tight.)

It's quite possible that their default value for --max-sql-memory of "25%" [of what? what CockroachDB thinks the machine has?] is too low for our deployment. In calculating a default, they may be assuming lots of overhead for other parts of the system. Linux's behavior makes it particularly risky for them not to do this since getting OOM-killed has a major impact. But if CockroachDB in our system is looking at the memory available to the zone, and that's only the slice of the system we've carved off for CockroachDB, then we should probably tune this way up.

The big question is really: are we spending too much memory or have we just tuned it too tightly? To answer these I'm wondering:

  • what query was this running? do we expect it to require so much memory?
  • what other sessions were open and how much memory were they using?
  • when CockroachDB is configured to use "25% of memory", in our deployment, how does it determine how much total memory there is? does it get the amount of memory we've allocated to the zone, or the total memory on the system, or something else?

(not asking anybody to dig into these -- though that would be helpful! -- just writing these down)

@askfongjojo
Copy link
Author

Thanks dap for taking a look. I've been trying to look for any useful log message on the CRDB side but haven't figured out the exact query triggering the error. This will help answer the question of whether it's reasonable for it to require so much memory. Meanwhile, I got only some heap usage report from the database health logs to get a sense of the consumption. I'll keep digging the logs for a bit more.

Sled 9

BRM44220005 # zlogin oxz_cockroachdb_4c3ef132-ec83-4b1b-9574-7c7d3035f9e9
root@oxz_cockroachdb_4c3ef132-ec83-4b1b-9574-7c7d3035f9e9# tail /data/logs/cockroach-health.log
I231016 17:55:42.549978 436 2@server/status/runtime_log.go:47 ⋮ [n2] 56932  runtime stats: 0 B RSS, 729 goroutines (stacks: 14 MiB), 296 MiB/488 MiB Go alloc/total(stale) (heap fragmentation: 52 MiB, heap reserved: 56 MiB, heap released: 258 MiB), 0 B/0 B CGO alloc/total (16.4 CGO/sec), 0.0/0.0 %(u/s)time, 0.0 %gc (23904x), 0 B/0 B (r/w)net
I231016 17:55:52.551207 436 2@util/log/event_log.go:32 ⋮ [n2] 56933 ={"Timestamp":1697478952551205701,"EventType":"runtime_stats","GoroutineCount":729,"MemStackSysBytes":15040512,"GoAllocBytes":333823640,"GoTotalBytes":511136936,"GoStatsStaleness":0.000196751,"HeapFragmentBytes":49701224,"HeapReservedBytes":40132608,"HeapReleasedBytes":270139392,"CGoCallRate":0.39994633,"GCRunCount":23904}
I231016 17:55:52.551286 436 2@server/status/runtime_log.go:47 ⋮ [n2] 56934  runtime stats: 0 B RSS, 729 goroutines (stacks: 14 MiB), 318 MiB/488 MiB Go alloc/total(stale) (heap fragmentation: 47 MiB, heap reserved: 38 MiB, heap released: 258 MiB), 0 B/0 B CGO alloc/total (0.4 CGO/sec), 0.0/0.0 %(u/s)time, 0.0 %gc (23904x), 0 B/0 B (r/w)net
I231016 17:56:02.551800 436 2@util/log/event_log.go:32 ⋮ [n2] 56935 ={"Timestamp":1697478962551798234,"EventType":"runtime_stats","GoroutineCount":729,"MemStackSysBytes":12877824,"GoAllocBytes":181682976,"GoTotalBytes":544511144,"GoStatsStaleness":0.000278336,"HeapFragmentBytes":90823904,"HeapReservedBytes":186687488,"HeapReleasedBytes":236765184,"CGoCallRate":3113.6155,"GCPausePercent":0.020839885,"GCRunCount":23911}
I231016 17:56:02.551914 436 2@server/status/runtime_log.go:47 ⋮ [n2] 56936  runtime stats: 0 B RSS, 729 goroutines (stacks: 12 MiB), 173 MiB/519 MiB Go alloc/total(stale) (heap fragmentation: 87 MiB, heap reserved: 178 MiB, heap released: 226 MiB), 0 B/0 B CGO alloc/total (3113.6 CGO/sec), 0.0/0.0 %(u/s)time, 0.0 %gc (23911x), 0 B/0 B (r/w)net
I231016 17:56:12.552637 436 2@util/log/event_log.go:32 ⋮ [n2] 56937 ={"Timestamp":1697478972552635175,"EventType":"runtime_stats","GoroutineCount":730,"MemStackSysBytes":14090240,"GoAllocBytes":211921136,"GoTotalBytes":544847016,"GoStatsStaleness":0.000258732,"HeapFragmentBytes":83384080,"HeapReservedBytes":163012608,"HeapReleasedBytes":236429312,"CGoCallRate":6.6994395,"GCRunCount":23911}
I231016 17:56:12.552749 436 2@server/status/runtime_log.go:47 ⋮ [n2] 56938  runtime stats: 0 B RSS, 730 goroutines (stacks: 13 MiB), 202 MiB/520 MiB Go alloc/total(stale) (heap fragmentation: 80 MiB, heap reserved: 156 MiB, heap released: 226 MiB), 0 B/0 B CGO alloc/total (6.7 CGO/sec), 0.0/0.0 %(u/s)time, 0.0 %gc (23911x), 0 B/0 B (r/w)net
I231016 17:56:22.553072 436 2@util/log/event_log.go:32 ⋮ [n2] 56939 ={"Timestamp":1697478982553070981,"EventType":"runtime_stats","GoroutineCount":730,"MemStackSysBytes":14581760,"GoAllocBytes":235549016,"GoTotalBytes":544847016,"GoStatsStaleness":0.000200899,"HeapFragmentBytes":74370728,"HeapReservedBytes":147906560,"HeapReleasedBytes":236429312,"CGoCallRate":12.999433,"GCRunCount":23911}
I231016 17:56:22.553152 436 2@server/status/runtime_log.go:47 ⋮ [n2] 56940  runtime stats: 0 B RSS, 730 goroutines (stacks: 14 MiB), 225 MiB/520 MiB Go alloc/total(stale) (heap fragmentation: 71 MiB, heap reserved: 141 MiB, heap released: 226 MiB), 0 B/0 B CGO alloc/total (13.0 CGO/sec), 0.0/0.0 %(u/s)time, 0.0 %gc (23911x), 0 B/0 B (r/w)net

Sled 10

BRM42220009 # zlogin oxz_cockroachdb_a3628a56-6f85-43b5-be50-71d8f0e04877
root@oxz_cockroachdb_a3628a56-6f85-43b5-be50-71d8f0e04877#  tail /data/logs/cockroach-health.log
I231016 18:01:09.539609 342 2@util/log/event_log.go:32 ⋮ [n4] 56959 ={"Timestamp":1697479269539607727,"EventType":"runtime_stats","GoroutineCount":712,"MemStackSysBytes":14450688,"GoAllocBytes":231082696,"GoTotalBytes":460818584,"GoStatsStaleness":0.000165172,"HeapFragmentBytes":65828152,"HeapReservedBytes":78790656,"HeapReleasedBytes":259964928,"CGoCallRate":0.7999532,"GCRunCount":33931}
I231016 18:01:09.539688 342 2@server/status/runtime_log.go:47 ⋮ [n4] 56960  runtime stats: 0 B RSS, 712 goroutines (stacks: 14 MiB), 220 MiB/440 MiB Go alloc/total(stale) (heap fragmentation: 63 MiB, heap reserved: 75 MiB, heap released: 248 MiB), 0 B/0 B CGO alloc/total (0.8 CGO/sec), 0.0/0.0 %(u/s)time, 0.0 %gc (33931x), 0 B/0 B (r/w)net
I231016 18:01:19.540209 342 2@util/log/event_log.go:32 ⋮ [n4] 56961 ={"Timestamp":1697479279540208063,"EventType":"runtime_stats","GoroutineCount":712,"MemStackSysBytes":14614528,"GoAllocBytes":252233808,"GoTotalBytes":460818584,"GoStatsStaleness":0.00016907,"HeapFragmentBytes":61118384,"HeapReservedBytes":62185472,"HeapReleasedBytes":259964928,"CGoCallRate":0.59996396,"GCRunCount":33931}
I231016 18:01:19.540285 342 2@server/status/runtime_log.go:47 ⋮ [n4] 56962  runtime stats: 0 B RSS, 712 goroutines (stacks: 14 MiB), 240 MiB/440 MiB Go alloc/total(stale) (heap fragmentation: 58 MiB, heap reserved: 59 MiB, heap released: 248 MiB), 0 B/0 B CGO alloc/total (0.6 CGO/sec), 0.0/0.0 %(u/s)time, 0.0 %gc (33931x), 0 B/0 B (r/w)net
I231016 18:01:29.541768 342 2@util/log/event_log.go:32 ⋮ [n4] 56963 ={"Timestamp":1697479289541766640,"EventType":"runtime_stats","GoroutineCount":713,"MemStackSysBytes":14778368,"GoAllocBytes":273127336,"GoTotalBytes":460818584,"GoStatsStaleness":0.000172336,"HeapFragmentBytes":55199832,"HeapReservedBytes":47046656,"HeapReleasedBytes":259964928,"CGoCallRate":0.7998753,"GCRunCount":33931}
I231016 18:01:29.541845 342 2@server/status/runtime_log.go:47 ⋮ [n4] 56964  runtime stats: 0 B RSS, 713 goroutines (stacks: 14 MiB), 260 MiB/440 MiB Go alloc/total(stale) (heap fragmentation: 53 MiB, heap reserved: 45 MiB, heap released: 248 MiB), 0 B/0 B CGO alloc/total (0.8 CGO/sec), 0.0/0.0 %(u/s)time, 0.0 %gc (33931x), 0 B/0 B (r/w)net
I231016 18:01:39.541367 342 2@util/log/event_log.go:32 ⋮ [n4] 56965 ={"Timestamp":1697479299541365913,"EventType":"runtime_stats","GoroutineCount":713,"MemStackSysBytes":14942208,"GoAllocBytes":302698296,"GoTotalBytes":460818584,"GoStatsStaleness":0.000186813,"HeapFragmentBytes":47771848,"HeapReservedBytes":24739840,"HeapReleasedBytes":259964928,"CGoCallRate":17.600706,"GCRunCount":33931}
I231016 18:01:39.541441 342 2@server/status/runtime_log.go:47 ⋮ [n4] 56966  runtime stats: 0 B RSS, 713 goroutines (stacks: 14 MiB), 289 MiB/440 MiB Go alloc/total(stale) (heap fragmentation: 46 MiB, heap reserved: 24 MiB, heap released: 248 MiB), 0 B/0 B CGO alloc/total (17.6 CGO/sec), 0.0/0.0 %(u/s)time, 0.0 %gc (33931x), 0 B/0 B (r/w)net
I231016 18:01:49.541914 342 2@util/log/event_log.go:32 ⋮ [n4] 56967 ={"Timestamp":1697479309541912987,"EventType":"runtime_stats","GoroutineCount":713,"MemStackSysBytes":13434880,"GoAllocBytes":186844328,"GoTotalBytes":460818584,"GoStatsStaleness":0.000230005,"HeapFragmentBytes":82172760,"HeapReservedBytes":107700224,"HeapReleasedBytes":259964928,"CGoCallRate":21.498825,"GCPausePercent":0.009896369,"GCRunCount":33932}
I231016 18:01:49.541999 342 2@server/status/runtime_log.go:47 ⋮ [n4] 56968  runtime stats: 0 B RSS, 713 goroutines (stacks: 13 MiB), 178 MiB/440 MiB Go alloc/total(stale) (heap fragmentation: 78 MiB, heap reserved: 103 MiB, heap released: 248 MiB), 0 B/0 B CGO alloc/total (21.5 CGO/sec), 0.0/0.0 %(u/s)time, 0.0 %gc (33932x), 0 B/0 B (r/w)net
I231016 18:01:59.542513 342 2@util/log/event_log.go:32 ⋮ [n4] 56969 ={"Timestamp":1697479319542511981,"EventType":"runtime_stats","GoroutineCount":712,"MemStackSysBytes":14188544,"GoAllocBytes":211739952,"GoTotalBytes":460818584,"GoStatsStaleness":0.000178658,"HeapFragmentBytes":73710288,"HeapReservedBytes":90513408,"HeapReleasedBytes":259964928,"CGoCallRate":0.299982,"GCRunCount":33932}
I231016 18:01:59.542589 342 2@server/status/runtime_log.go:47 ⋮ [n4] 56970  runtime stats: 0 B RSS, 712 goroutines (stacks: 14 MiB), 202 MiB/440 MiB Go alloc/total(stale) (heap fragmentation: 70 MiB, heap reserved: 86 MiB, heap released: 248 MiB), 0 B/0 B CGO alloc/total (0.3 CGO/sec), 0.0/0.0 %(u/s)ti
me, 0.0 %gc (33932x), 0 B/0 B (r/w)net

Sled 14

BRM42220051 # zlogin oxz_cockroachdb_8bbea076-ff60-4330-8302-383e18140ef3
root@oxz_cockroachdb_8bbea076-ff60-4330-8302-383e18140ef3# tail /data/logs/cockroach-health.logI231016 
I231016 18:13:11.393470 179 2@util/log/event_log.go:32 ⋮ [n3] 57097 ={"Timestamp":1697479991393468752,"EventType":"runtime_stats","GoroutineCount":762,"MemStackSysBytes":13926400,"GoAllocBytes":207280440,"GoTotalBytes":530019496,"GoStatsStaleness":0.000203494,"HeapFragmentBytes":89065160,"HeapReservedBytes":146522112,"HeapReleasedBytes":268820480,"GCPausePercent":0.010586003,"GCRunCount":25083}
I231016 18:13:11.393548 179 2@server/status/runtime_log.go:47 ⋮ [n3] 57098  runtime stats: 0 B RSS, 762 goroutines (stacks: 13 MiB), 198 MiB/506 MiB Go alloc/total(stale) (heap fragmentation: 85 MiB, heap reserved: 140 MiB, heap released: 256 MiB), 0 B/0 B CGO alloc/total (0.0 CGO/sec), 0.0/0.0 %(u/s)time, 0.0 %gc (25083x), 0 B/0 B (r/w)net
I231016 18:13:21.393161 179 2@util/log/event_log.go:32 ⋮ [n3] 57099 ={"Timestamp":1697480001393160294,"EventType":"runtime_stats","GoroutineCount":763,"MemStackSysBytes":13795328,"GoAllocBytes":226189104,"GoTotalBytes":534770856,"GoStatsStaleness":0.000196331,"HeapFragmentBytes":71655632,"HeapReservedBytes":149905408,"HeapReleasedBytes":264069120,"CGoCallRate":2090.6646,"GCPausePercent":0.005709536,"GCRunCount":25088}
I231016 18:13:21.393241 179 2@server/status/runtime_log.go:47 ⋮ [n3] 57100  runtime stats: 0 B RSS, 763 goroutines (stacks: 13 MiB), 216 MiB/510 MiB Go alloc/total(stale) (heap fragmentation: 68 MiB, heap reserved: 143 MiB, heap released: 252 MiB), 0 B/0 B CGO alloc/total (2090.7 CGO/sec), 0.0/0.0 %(u/s)time, 0.0 %gc (25088x), 0 B/0 B (r/w)net
I231016 18:13:31.394871 179 2@util/log/event_log.go:32 ⋮ [n3] 57101 ={"Timestamp":1697480011394869821,"EventType":"runtime_stats","GoroutineCount":763,"MemStackSysBytes":14811136,"GoAllocBytes":264942352,"GoTotalBytes":534770856,"GoStatsStaleness":0.000200188,"HeapFragmentBytes":67112176,"HeapReservedBytes":114679808,"HeapReleasedBytes":264069120,"CGoCallRate":12.297898,"GCRunCount":25088}
I231016 18:13:31.394951 179 2@server/status/runtime_log.go:47 ⋮ [n3] 57102  runtime stats: 0 B RSS, 763 goroutines (stacks: 14 MiB), 253 MiB/510 MiB Go alloc/total(stale) (heap fragmentation: 64 MiB, heap reserved: 109 MiB, heap released: 252 MiB), 0 B/0 B CGO alloc/total (12.3 CGO/sec), 0.0/0.0 %(u/s)time, 0.0 %gc (25088x), 0 B/0 B (r/w)net
I231016 18:13:41.394476 179 2@util/log/event_log.go:32 ⋮ [n3] 57103 ={"Timestamp":1697480021394475228,"EventType":"runtime_stats","GoroutineCount":763,"MemStackSysBytes":15138816,"GoAllocBytes":297956560,"GoTotalBytes":534770856,"GoStatsStaleness":0.000176704,"HeapFragmentBytes":57862960,"HeapReservedBytes":90587136,"HeapReleasedBytes":264069120,"CGoCallRate":6.0002365,"GCRunCount":25088}
I231016 18:13:41.394557 179 2@server/status/runtime_log.go:47 ⋮ [n3] 57104  runtime stats: 0 B RSS, 763 goroutines (stacks: 14 MiB), 284 MiB/510 MiB Go alloc/total(stale) (heap fragmentation: 55 MiB, heap reserved: 86 MiB, heap released: 252 MiB), 0 B/0 B CGO alloc/total (6.0 CGO/sec), 0.0/0.0 %(u/s)time, 0.0 %gc (25088x), 0 B/0 B (r/w)net
I231016 18:13:51.394921 179 2@util/log/event_log.go:32 ⋮ [n3] 57105 ={"Timestamp":1697480031394919848,"EventType":"runtime_stats","GoroutineCount":762,"MemStackSysBytes":14286848,"GoAllocBytes":198669712,"GoTotalBytes":534770856,"GoStatsStaleness":0.000199981,"HeapFragmentBytes":84666992,"HeapReservedBytes":163921920,"HeapReleasedBytes":264069120,"CGoCallRate":5.9997334,"GCPausePercent":0.0018859762,"GCRunCount":25089}
I231016 18:13:51.395000 179 2@server/status/runtime_log.go:47 ⋮ [n3] 57106  runtime stats: 0 B RSS, 762 goroutines (stacks: 14 MiB), 190 MiB/510 MiB Go alloc/total(stale) (heap fragmentation: 81 MiB, heap reserved: 156 MiB, heap released: 252 MiB), 0 B/0 B CGO alloc/total (6.0 CGO/sec), 0.0/0.0 %(u/s)time, 0.0 %gc (25089x), 0 B/0 B (r/w)net

Sled 16

I231016 18:13:11.390096 520 2@server/status/runtime_log.go:47 ⋮ [n5] 58475  runtime stats: 0 B RSS, 744 goroutines (stacks: 14 MiB), 217 MiB/410 MiB Go alloc/total(stale) (heap fragmentation: 47 MiB, heap reserved: 71 MiB, heap released: 151 MiB), 0 B/0 B CGO alloc/total (0.0 CGO/sec), 0.0/0.0 %(u/s)time, 0.0 %gc (36823x), 0 B/0 B (r/w)net
I231016 18:13:21.389646 520 2@util/log/event_log.go:32 ⋮ [n5] 58476 ={"Timestamp":1697480001389644828,"EventType":"runtime_stats","GoroutineCount":745,"MemStackSysBytes":14942208,"GoAllocBytes":253801816,"GoTotalBytes":430377080,"GoStatsStaleness":0.000200026,"HeapFragmentBytes":40544936,"HeapReservedBytes":56623104,"HeapReleasedBytes":158375936,"GCRunCount":36823}
I231016 18:13:21.389729 520 2@server/status/runtime_log.go:47 ⋮ [n5] 58477  runtime stats: 0 B RSS, 745 goroutines (stacks: 14 MiB), 242 MiB/410 MiB Go alloc/total(stale) (heap fragmentation: 39 MiB, heap reserved: 54 MiB, heap released: 151 MiB), 0 B/0 B CGO alloc/total (0.0 CGO/sec), 0.0/0.0 %(u/s)time, 0.0 %gc (36823x), 0 B/0 B (r/w)net
I231016 18:13:31.390014 520 2@util/log/event_log.go:32 ⋮ [n5] 58478 ={"Timestamp":1697480011390013340,"EventType":"runtime_stats","GoroutineCount":745,"MemStackSysBytes":13402112,"GoAllocBytes":156265312,"GoTotalBytes":431392888,"GoStatsStaleness":0.000206319,"HeapFragmentBytes":86398112,"HeapReservedBytes":110862336,"HeapReleasedBytes":157360128,"GCPausePercent":0.010703575,"GCRunCount":36824}
I231016 18:13:31.390095 520 2@server/status/runtime_log.go:47 ⋮ [n5] 58479  runtime stats: 0 B RSS, 745 goroutines (stacks: 13 MiB), 149 MiB/411 MiB Go alloc/total(stale) (heap fragmentation: 82 MiB, heap reserved: 106 MiB, heap released: 150 MiB), 0 B/0 B CGO alloc/total (0.0 CGO/sec), 0.0/0.0 %(u/s)time, 0.0 %gc (36824x), 0 B/0 B (r/w)net
I231016 18:13:41.390999 520 2@util/log/event_log.go:32 ⋮ [n5] 58480 ={"Timestamp":1697480021390998060,"EventType":"runtime_stats","GoroutineCount":745,"MemStackSysBytes":14319616,"GoAllocBytes":188534144,"GoTotalBytes":431392888,"GoStatsStaleness":0.000193715,"HeapFragmentBytes":73790080,"HeapReservedBytes":90284032,"HeapReleasedBytes":157360128,"GCRunCount":36824}
I231016 18:13:41.391079 520 2@server/status/runtime_log.go:47 ⋮ [n5] 58481  runtime stats: 0 B RSS, 745 goroutines (stacks: 14 MiB), 180 MiB/411 MiB Go alloc/total(stale) (heap fragmentation: 70 MiB, heap reserved: 86 MiB, heap released: 150 MiB), 0 B/0 B CGO alloc/total (0.0 CGO/sec), 0.0/0.0 %(u/s)time, 0.0 %gc (36824x), 0 B/0 B (r/w)net
I231016 18:13:51.391604 520 2@util/log/event_log.go:32 ⋮ [n5] 58482 ={"Timestamp":1697480031391602935,"EventType":"runtime_stats","GoroutineCount":744,"MemStackSysBytes":14647296,"GoAllocBytes":211845344,"GoTotalBytes":431392888,"GoStatsStaleness":0.00018553,"HeapFragmentBytes":65314592,"HeapReservedBytes":75120640,"HeapReleasedBytes":157360128,"GCRunCount":36824}
I231016 18:13:51.391686 520 2@server/status/runtime_log.go:47 ⋮ [n5] 58483  runtime stats: 0 B RSS, 744 goroutines (stacks: 14 MiB), 202 MiB/411 MiB Go alloc/total(stale) (heap fragmentation: 62 MiB, heap reserved: 72 MiB, heap released: 150 MiB), 0 B/0 B CGO alloc/total (0.0 CGO/sec), 0.0/0.0 %(u/s)time, 0.0 %gc (36824x), 0 B/0 B (r/w)net
I231016 18:14:01.393108 520 2@util/log/event_log.go:32 ⋮ [n5] 58484 ={"Timestamp":1697480041393107130,"EventType":"runtime_stats","GoroutineCount":744,"MemStackSysBytes":14909440,"GoAllocBytes":236410592,"GoTotalBytes":431392888,"GoStatsStaleness":0.000278154,"HeapFragmentBytes":58607904,"HeapReservedBytes":56999936,"HeapReleasedBytes":157360128,"GCRunCount":36824}
I231016 18:14:01.393205 520 2@server/status/runtime_log.go:47 ⋮ [n5] 58485  runtime stats: 0 B RSS, 744 goroutines (stacks: 14 MiB), 226 MiB/411 MiB Go alloc/total(stale) (heap fragmentation: 56 MiB, heap reserved: 54 MiB, heap released: 150 MiB), 0 B/0 B CGO alloc/total (0.0 CGO/sec), 0.0/0.0 %(u/s)time, 0.0 %gc (36824x), 0 B/0 B (r/w)net

sled 17

BRM42220017 # zlogin oxz_cockroachdb_3237a532-acaa-4ebe-bf11-dde794fea739
root@oxz_cockroachdb_3237a532-acaa-4ebe-bf11-dde794fea739# tail /data/logs/cockroach-health.log
I231016 18:14:45.901969 882 2@util/log/event_log.go:32 ⋮ [n1] 57137 ={"Timestamp":1697480085901967994,"EventType":"runtime_stats","GoroutineCount":762,"MemStackSysBytes":14876672,"GoAllocBytes":322385112,"GoTotalBytes":766459128,"GoStatsStaleness":0.000206774,"HeapFragmentBytes":132008744,"HeapReservedBytes":189292544,"HeapReleasedBytes":402595840,"CGoCallRate":6.0996985,"GCPausePercent":0.017132644,"GCRunCount":18056}
I231016 18:14:45.902049 882 2@server/status/runtime_log.go:47 ⋮ [n1] 57138  runtime stats: 0 B RSS, 762 goroutines (stacks: 14 MiB), 308 MiB/731 MiB Go alloc/total(stale) (heap fragmentation: 126 MiB, heap reserved: 180 MiB, heap released: 384 MiB), 0 B/0 B CGO alloc/total (6.1 CGO/sec), 0.0/0.0 %(u/s)time, 0.0 %gc (18056x), 0 B/0 B (r/w)net
I231016 18:14:55.903231 882 2@util/log/event_log.go:32 ⋮ [n1] 57139 ={"Timestamp":1697480095903230024,"EventType":"runtime_stats","GoroutineCount":761,"MemStackSysBytes":15859712,"GoAllocBytes":380480432,"GoTotalBytes":766459128,"GoStatsStaleness":0.000177769,"HeapFragmentBytes":97170512,"HeapReservedBytes":165052416,"HeapReleasedBytes":402595840,"CGoCallRate":0.39994976,"GCRunCount":18056}
I231016 18:14:55.903310 882 2@server/status/runtime_log.go:47 ⋮ [n1] 57140  runtime stats: 0 B RSS, 761 goroutines (stacks: 15 MiB), 363 MiB/731 MiB Go alloc/total(stale) (heap fragmentation: 93 MiB, heap reserved: 157 MiB, heap released: 384 MiB), 0 B/0 B CGO alloc/total (0.4 CGO/sec), 0.0/0.0 %(u/s)time, 0.0 %gc (18056x), 0 B/0 B (r/w)net
I231016 18:15:05.902890 882 2@util/log/event_log.go:32 ⋮ [n1] 57141 ={"Timestamp":1697480105902888844,"EventType":"runtime_stats","GoroutineCount":761,"MemStackSysBytes":16089088,"GoAllocBytes":420336048,"GoTotalBytes":766459128,"GoStatsStaleness":0.000175825,"HeapFragmentBytes":78704208,"HeapReservedBytes":143433728,"HeapReleasedBytes":402595840,"CGoCallRate":2.500084,"GCRunCount":18056}
I231016 18:15:05.902972 882 2@server/status/runtime_log.go:47 ⋮ [n1] 57142  runtime stats: 0 B RSS, 761 goroutines (stacks: 15 MiB), 401 MiB/731 MiB Go alloc/total(stale) (heap fragmentation: 75 MiB, heap reserved: 137 MiB, heap released: 384 MiB), 0 B/0 B CGO alloc/total (2.5 CGO/sec), 0.0/0.0 %(u/s)time, 0.0 %gc (18056x), 0 B/0 B (r/w)net
I231016 18:15:15.903443 882 2@util/log/event_log.go:32 ⋮ [n1] 57143 ={"Timestamp":1697480115903441723,"EventType":"runtime_stats","GoroutineCount":757,"MemStackSysBytes":14680064,"GoAllocBytes":324097872,"GoTotalBytes":759323896,"GoStatsStaleness":0.000232113,"HeapFragmentBytes":134940848,"HeapReservedBytes":177709056,"HeapReleasedBytes":409731072,"CGoCallRate":0.5999668,"GCPausePercent":0.0017595526,"GCRunCount":18057}
I231016 18:15:15.903520 882 2@server/status/runtime_log.go:47 ⋮ [n1] 57144  runtime stats: 0 B RSS, 757 goroutines (stacks: 14 MiB), 309 MiB/724 MiB Go alloc/total(stale) (heap fragmentation: 129 MiB, heap reserved: 170 MiB, heap released: 391 MiB), 0 B/0 B CGO alloc/total (0.6 CGO/sec), 0.0/0.0 %(u/s)time, 0.0 %gc (18057x), 0 B/0 B (r/w)net
I231016 18:15:25.904138 882 2@util/log/event_log.go:32 ⋮ [n1] 57145 ={"Timestamp":1697480125904136722,"EventType":"runtime_stats","GoroutineCount":758,"MemStackSysBytes":16449536,"GoAllocBytes":365738888,"GoTotalBytes":757325048,"GoStatsStaleness":0.000175615,"HeapFragmentBytes":112288888,"HeapReservedBytes":154951680,"HeapReleasedBytes":411729920,"CGoCallRate":0.7999444,"GCRunCount":18057}
I231016 18:15:25.904222 882 2@server/status/runtime_log.go:47 ⋮ [n1] 57146  runtime stats: 0 B RSS, 758 goroutines (stacks: 16 MiB), 349 MiB/722 MiB Go alloc/total(stale) (heap fragmentation: 107 MiB, heap reserved: 148 MiB, heap released: 393 MiB), 0 B/0 B CGO alloc/total (0.8 CGO/sec), 0.0/0.0 %(u/s)time, 0.0 %gc (18057x), 0 B/0 B (r/w)net

@bnaecker
Copy link
Collaborator

bnaecker commented Oct 16, 2023

If you are able to generate an API request which triggers the error, then it should be possible to use DTrace to extract the SQL queries which result from that. The trace-db-queries D script prints all the SQL queries, including their latency. I'm not sure that will work for you out of the box, because the query fails. But if it does not work, you can use this line to copy in and then print the raw SQL itself.

There will be many SQL queries from that one API call, but the last one will probably be the relevant one, though I'm not positive about that. I can also help narrowing down the queries further if that's useful.

@davepacheco
Copy link
Collaborator

I have not traced the running system, but based on the error message, it seems we were in svc_create_vpc() (the "vpc" node in the vpc_create saga), which calls DataStore::project_create_vpc(), etc. It looks pretty straightforward except for the NextItem part of the query. It'd be worth confirming this was the problematic query but it seems promising. Whatever the current query was: it was apparently trying to use at least another 50 MB of our 134 MB budget, which seems like a lot for inserting a VPC.

@bnaecker
Copy link
Collaborator

The NextItem query is definitely a good guess. I made this issue when I wrote that query, which points out the possible memory limitations of the query. Assuming this is the issue, we could restrict the search range and add a retry loop at the query as a workaround. That should be relatively quick to implement.

An alternative would be to restructure the query, although it doesn't look like the underlying CRDB behavior of storing an entire subquery in memory has been improved. Another alternative is to implement the "freelist" style query that @davepacheco and I have talked about for a long time. It's not clear how that would work exactly, but the gist is: store free ranges in some table; select the next item from one of those ranges, reducing its size by 1; use the now-allocated value in the desired insertion query.

@askfongjojo askfongjojo added this to the 3 milestone Oct 17, 2023
@askfongjojo askfongjojo self-assigned this Oct 17, 2023
@askfongjojo
Copy link
Author

FWIW, here are the firewall rules I had at the time which do involve exploding an IP address range:
fwrules

The rules were still there after dogfood update but I haven't run into the error since. It's possible that CRDB will eventually get into the situation when the memory available is insufficient for the VPC creation again.

@bnaecker
Copy link
Collaborator

I'm tracing the connection probes we have inside Nexus, and trying to create a project. I get what looks like the same 500 error described here, and the probes show this:

 18  24214 _ZN91_$LT$diesel_dtrace..DTraceConnection$LT$C$GT$$u20$as$u20$diesel..connection..Connection$GT$23execute_returning_count17h482:query-start WITH found_row AS MATERIALIZED (SELECT "project"."id", "project"."name", "project"."description", "project"."time_created", "project"."time_modified", "project"."time_deleted", "project"."rcgen", "project"."silo_id" FROM "project" WHERE (("project"."id" = $1) AND ("project"."time_deleted" IS NULL)) FOR UPDATE), dummy AS MATERIALIZED (SELECT IF(EXISTS(SELECT "id" FROM found_row), TRUE, CAST(1/0 AS BOOL))), updated_row AS MATERIALIZED (UPDATE "project" SET "rcgen" = "rcgen" + 1 WHERE "id" IN (SELECT "id" FROM found_row) RETURNING 1), inserted_row AS (INSERT INTO "vpc" ("id", "name", "description", "time_created", "time_modified", "time_deleted", "project_id", "system_router_id", "dns_name", "vni", "ipv6_prefix", "firewall_gen", "subnet_gen")SELECT $2 AS "id", $3 AS "name", $4 AS "description", $5 AS "time_created", $6 AS "time_modified", $7 AS "time_deleted", $8 AS "project_id", $9 AS "system_router_id", $10 AS "dns_name", (SELECT $11 + "shift" AS "vni" FROM (SELECT generate_series(0, $12) AS "shift" UNION ALL SELECT generate_series($13, -1) AS "shift") LEFT OUTER JOIN "vpc" ON ("vni", "time_deleted" IS NULL) = ($14 + "shift", TRUE)  WHERE "vni" IS NULL LIMIT 1) AS "vni", $15 AS "ipv6_prefix", $16 AS "firewall_gen", $17 AS "subnet_gen" RETURNING "vpc"."id", "vpc"."name", "vpc"."description", "vpc"."time_created", "vpc"."time_modified", "vpc"."time_deleted", "vpc"."project_id", "vpc"."system_router_id", "vpc"."vni", "vpc"."ipv6_prefix", "vpc"."dns_name", "vpc"."firewall_gen", "vpc"."subnet_gen") SELECT * FROM inserted_row -- binds: [5d9b52f1-52ef-4a95-a659-e27c1718e87f, 4b3c8260-04d6-4fb9-9c62-56f303675215, Name(Name("default")), "Default VPC", 2023-10-17T23:17:43.158576625Z, 2023-10-17T23:17:43.158576625Z, None, 5d9b52f1-52ef-4a95-a659-e27c1718e87f, 32436cfc-3723-4208-ab68-59d13a5b34d4, Name(Name("default")), Vni(Vni(3321368)), 13455847, -3320344, Vni(Vni(3321368)), V6(Ipv6Network { addr: fd53:6f90:e478::, prefix: 48 }), Generation(Generation(1)), Generation(Generation(1))]

 18  24214 _ZN91_$LT$diesel_dtrace..DTraceConnection$LT$C$GT$$u20$as$u20$diesel..connection..Connection$GT$23execute_returning_count17h482:query-start INSERT INTO "saga_node_event" ("saga_id", "node_id", "event_type", "data", "event_time", "creator") VALUES ($1, $2, $3, $4, $5, $6) -- binds: [SagaId(988d7b54-d5ff-4929-90c6-6b7ba6736063), SagaNodeId(7), "failed", Object {"ActionFailed": Object {"source_error": Object {"InternalError": Object {"internal_message": String("database error (kind = Unknown): root: memory budget exceeded: 8396856 bytes requested, 127586776 currently allocated, 134217728 bytes in budget\nHINT: Consider increasing --max-sql-memory startup parameter.\n")}}}}, 2023-10-17T23:17:46.770951380Z, 65a11c18-7f59-41ac-b9e7-680627f996e7]

So that does appear that the NextItem query to generate a VNI is failing. Digging more now.

@bnaecker
Copy link
Collaborator

Running the generate_series subquery manually with the same parameters, we see this:

root@[fd00:1122:3344:105::3]:32221/omicron> select 3321368 + shift as vni from (select generate_series(0, 13455847) as shift union all select generate_series(-3320344, -1) as shift);
warning: buffering more than 10000 result rows in client - RAM usage growing, consider another formatter instead

That takes many seconds to complete, and then spews to the console. What we're really doing here is generating a sequence of all possible VNIs and then doing a LEFT OUTER JOIN the extant VNIs in the vpc table. We then filter the join down to those without a match in the vpc table, i.e., a free VNI. We can do that manually, limiting the query to only 100 possible VNIs to try:

root@[fd00:1122:3344:105::3]:32221/omicron> select 3321368 + shift as vni from (select generate_series(0, 100) as shift) left outer join vpc on (vni, time_deleted is NULL) = (3321368 + shift, TRUE) where vni IS NULL LIMIT 1;
    vni
-----------
  3321368
(1 row)


Time: 6ms total (execution 6ms / network 0ms)

So the first VNI we chose (randomly by the application) is actually available. I think we should probably implement the quick fix of trying only a small range, with a few application-level retries.

@bnaecker
Copy link
Collaborator

bnaecker commented Oct 17, 2023

For some more context, here are the VNIs we've currently allocated. There ain't much there, so the query that just picked the random VNI directly would have succeeded fine:

root@[fd00:1122:3344:105::3]:32221/omicron> select vni from vpc where time_deleted is null;
    vni
------------
       100
   1203680
   1298904
   1409142
   1508093
   1732935
   2848183
   2960565
   3355675
   5692613
  11250956
  12142779
  12368039
  12673371
  13435426
  13834134
  14230741
  14253103
  14363509
  14928278
  15267980
  16542806
(22 rows)


Time: 3ms total (execution 2ms / network 0ms)

VNIs are 24-bit numbers, so we've allocated basically none of the available range.

For completeness though, let's consider retrying at the application level, using a NextItem query that only searches from the starting (random) VNI through the next 100 elements. What are the chances of a retry? We need to retry if all items in that range are allocated. The probability of any one VNI being taken is 1 / (2 ** 24). These are a bunch of independent Bernoulli trials, so the distribution of the full 100 trials follows a binomial distribution. The probability of at least one success (a free item) in a list of 100 in this case is: 0.9999999999824142. So, uhh, I think retrying is fine. We'll almost certainly get an item on the first try anyway, until we start adding a lot more VPCs. We need to allocate more than 25k VNIs beofre we have more than a 1% chance of needing to retry at all.

bnaecker added a commit that referenced this issue Oct 18, 2023
- Fixes #4283.
- Adds a relatively small limit to the `NextItem` query used for
  finding a free VNI during VPC creation. This limits the memory
  consumption to something very reasonable, but is big enough that we
  should be extremely unlikely to find _no_ available VNIs in the range.
- Add an application-level retry loop when inserting _customer_ VPCs,
  which catches the unlikely event that there really are no VNIs
  available, and retries a few times.
- Adds tests for the computation of the limited search range.
- Adds tests for the actual exhaustion-detection and retry behavior.
bnaecker added a commit that referenced this issue Oct 18, 2023
- Fixes #4283.
- Adds a relatively small limit to the `NextItem` query used for
  finding a free VNI during VPC creation. This limits the memory
  consumption to something very reasonable, but is big enough that we
  should be extremely unlikely to find _no_ available VNIs in the range.
- Add an application-level retry loop when inserting _customer_ VPCs,
  which catches the unlikely event that there really are no VNIs
  available, and retries a few times.
- Adds tests for the computation of the limited search range.
- Adds tests for the actual exhaustion-detection and retry behavior.
bnaecker added a commit that referenced this issue Oct 20, 2023
- Fixes #4283.
- Adds a relatively small limit to the `NextItem` query used for
  finding a free VNI during VPC creation. This limits the memory
  consumption to something very reasonable, but is big enough that we
  should be extremely unlikely to find _no_ available VNIs in the range.
- Add an application-level retry loop when inserting _customer_ VPCs,
  which catches the unlikely event that there really are no VNIs
  available, and retries a few times.
- Adds tests for the computation of the limited search range.
- Adds tests for the actual exhaustion-detection and retry behavior.
bnaecker added a commit that referenced this issue Oct 21, 2023
- Fixes #4283.
- Adds a relatively small limit to the `NextItem` query used for
  finding a free VNI during VPC creation. This limits the memory
  consumption to something very reasonable, but is big enough that we
  should be extremely unlikely to find _no_ available VNIs in the range.
- Add an application-level retry loop when inserting _customer_ VPCs,
  which catches the unlikely event that there really are no VNIs
  available, and retries a few times.
- Adds tests for the computation of the limited search range.
- Adds tests for the actual exhaustion-detection and retry behavior.

Review feedback

- Add const-assert that VNI search range is valid
- Rename `Vpc::with_random_vni()`

Search entire VNI range in chunks

- Remove `IncompleteVpc::with_random_vni()`
- Add iterator to search ranges of VNIs sequentially, yielding all
  range starts in the valid guest VNI range from a random starting VNI.
- Instead of a limited retry loop when creating VPCs, search until the
  iterator is exhausted and we've searched the whole range.

Throw in some DTrace probes

fmt
bnaecker added a commit that referenced this issue Oct 21, 2023
- Fixes #4283.
- Adds a relatively small limit to the `NextItem` query used for finding
a free VNI during VPC creation. This limits the memory consumption to
something very reasonable, but is big enough that we should be extremely
unlikely to find _no_ available VNIs in the range.
- Add an application-level retry loop when inserting _customer_ VPCs,
which catches the unlikely event that there really are no VNIs
available, and retries a few times.
- Adds tests for the computation of the limited search range.
- Adds tests for the actual exhaustion-detection and retry behavior.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants