
Memory leak #107

Closed
amorskoy opened this issue Dec 21, 2016 · 21 comments

@amorskoy

amorskoy commented Dec 21, 2016

I have ~23K objects of multipolygon type, representing California boundaries.
An Enter hook is assigned to each polygon --> ~23K webhooks.

When I start point insertion with SET commands (2 points per second), I see that the memory consumed by the tile38 server grows from the initial 200MB to 700MB (and it keeps growing if I continue ingestion).
The bad thing is that after I stop ingestion, the memory is not freed.

When I restart Tile38, the AOF is reloaded and the initial memory is again 200MB.

P.S. Working with the master branch.
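The ingestion pattern described above (one SET every 500ms for a steady 2 points per second) can be sketched as follows. This is a minimal illustration of the load shape only; the key/id names are hypothetical, and actually delivering the commands to a live Tile38 server over TCP is out of scope here:

```go
package main

import (
	"fmt"
	"time"
)

// buildSet formats a Tile38 SET command for a point object.
// Tile38 points are given as "POINT lat lon".
func buildSet(key, id string, lat, lon float64) string {
	return fmt.Sprintf("SET %s %s POINT %f %f", key, id, lat, lon)
}

func main() {
	ticker := time.NewTicker(500 * time.Millisecond) // 2 points per second
	defer ticker.Stop()
	for i := 0; i < 4; i++ {
		<-ticker.C
		cmd := buildSet("people", fmt.Sprintf("person:%d", i), 38.38, -121.07)
		fmt.Println(cmd) // a real client would send this over the wire
	}
}
```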

@tidwall
Owner

tidwall commented Dec 21, 2016

Could you send over the response to a SERVER command? Also try sending the server a GC command. It's possible that Tile38 is hanging on to the memory a little longer than it needs to. Usually the memory is freed periodically or when the OS requests it.
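Tile38 speaks the RESP wire protocol (the same format Redis uses), so SERVER and GC can be issued from any Redis-compatible client or a raw socket. As a minimal sketch, the request encoding for those commands looks like this (illustration only, not Tile38's actual client code; no live connection is made):

```go
package main

import (
	"fmt"
	"strings"
)

// respEncode builds a RESP array-of-bulk-strings request,
// e.g. respEncode("SERVER") or respEncode("GC").
func respEncode(args ...string) string {
	var b strings.Builder
	fmt.Fprintf(&b, "*%d\r\n", len(args)) // array header: argument count
	for _, a := range args {
		fmt.Fprintf(&b, "$%d\r\n%s\r\n", len(a), a) // bulk string: length, payload
	}
	return b.String()
}

func main() {
	fmt.Printf("%q\n", respEncode("SERVER")) // "*1\r\n$6\r\nSERVER\r\n"
	fmt.Printf("%q\n", respEncode("GC"))     // "*1\r\n$2\r\nGC\r\n"
}
```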

@amorskoy
Author

@tidwall GC did not change the memory acquired (neither res nor virt, according to top).
Tomorrow I will obtain the SERVER output on my work laptop.

@amorskoy
Author

amorskoy commented Dec 22, 2016

Here it is. Currently, after ingestion, the initial consumption has increased from 200MB to 577MB. Then I stopped ingestion (otherwise it would grow more, up to 10GB RAM, as I saw in yesterday's experiment):

127.0.0.1:9851> server {"ok":true,"stats":{"aof_size":21310261,"avg_item_size":713,"heap_size":379549576,"id":"3ac8137746a82d6c77259bc0e307a05c","in_memory_size":13227334,"max_heap_size":0,"mem_alloc":379549576,"num_collections":1,"num_hooks":23198,"num_objects":23198,"num_points":532270,"num_strings":0,"pointer_size":8,"read_only":false},"elapsed":"1.855124ms"}

root@debian:/tmp/s# ps aux | grep tile38-se
adminis+ 13598 1.1 3.5 1031420 576680 pts/7 Sl+ Dec21 12:19 ./tile38-server -vv

@tidwall
Owner

tidwall commented Dec 22, 2016

@amorskoy This information is very useful. I'm going to try to reproduce the problem on my side and I have a few questions.

When I am starting point insertion with SET commands (2 points per second) I see that memory, consumed by tile38 server grows from initial 200MB to 700MB (and it will grow if I continue ingestion).

From the SERVER data you provided, I see the 23,198 objects -> 23,198 hooks, but what about the point objects that you are creating 2x per sec with the SET command? Are you sending a DEL command to remove them?

Roughly how much time has elapsed before the Tile38 instance has grown to 10GB? Is the point insertion a steady 2x per sec during that duration? About how many Enter hook notifications fire during that time?

Thanks for your help with this.

@amorskoy
Author

@tidwall No, it is not DEL; I am using SET ... EX 5 instead.

It grows from 200MB to 10GB in ~5 minutes.
It seems like absolutely normal queue behavior; nothing gets stuck. The only issue is the memory consumption.

@tidwall
Owner

tidwall commented Dec 22, 2016

I'm still investigating, but I haven't been able to reproduce the leak yet. I've tried thousands of hooks and hundreds of thousands of points (using SET EX), yet the reported stat.heap_memory always seemed to return to a sane value.

I may be wrong, but perhaps the memory isn't leaking. Rather, Tile38 is notifying the OS that it can reclaim unused memory, and Linux is like "thanks, I don't need it right now, but I'll grab it later". The memory then stays with the Tile38 process until the OS reclaims it.

Here's more info on the topic:
golang/go#14521
https://groups.google.com/forum/#!topic/golang-nuts/vfmd6zaRQVs
http://stackoverflow.com/questions/24376817
http://stackoverflow.com/questions/27497512

I'm wondering if this is happening in your case.

I added the new stats.heap_released field to the SERVER command. This will tell how much memory has been released to the OS. I also added the more aggressive debug.FreeOSMemory() call to the GC command. I'm hoping that these will give us a little more insight.

If you don't mind trying out the memoptz branch, I would very much appreciate it.

Please let me know your thoughts on this and I'll keep investigating.

Thanks!

@amorskoy
Author

@tidwall Thanks - sure I will. I hope to find a free time slot for this as soon as possible.

@amorskoy
Author

@tidwall Just finished an experiment with the memoptz branch.

So:

  • Initial Tile38 memory is 200 MB (~23,000 Multipolygons, 1 hook per Multipolygon)
  • Ingestion for 2 minutes: s"SET people $hName EX 10 POINT ${step.y} ${step.x}"
  • Ingestion rate ~6 messages/second
  • Total JSON objects generated by hooks: 426
  • Memory rose to 4.5G res (4.8G virt) according to top
  • SERVER command output:

127.0.0.1:9851> server {"ok":true,"stats":{"aof_size":20346998,"avg_item_size":3357,"heap_released":0,"heap_size":1787047424,"id":"281b8d0f24a5a1a3f73b8a31acdefbcd","in_memory_size":13227334,"max_heap_size":0,"mem_alloc":1787047424,"num_collections":1,"num_hooks":23198,"num_objects":23198,"num_points":532270,"num_strings":0,"pid":15301,"pointer_size":8,"read_only":false},"elapsed":"11.337898ms"}

@amorskoy
Author

P.S. If I kill and restart tile38 (so the AOF is loaded), memory is at the initial 200MB again.

@tidwall
Owner

tidwall commented Dec 23, 2016

Good info. Could you also execute the GC command after full ingestion, then send over the SERVER responses from before and after the GC?
I'm curious about the heap_size and heap_released values specifically. This will help tell me whether the Go runtime is requesting the OS to reclaim the memory or if there's a leak.

@amorskoy
Author

Ok - I will try to make it asap

@amorskoy
Author

Btw, another interesting thing: before the ingestion, I just ran the SERVER command a few times, with a 1 minute delay between commands:

127.0.0.1:9851> server {"ok":true,"stats":{"aof_size":20346998,"avg_item_size":175,"heap_released":0,"heap_size":93272256,"id":"281b8d0f24a5a1a3f73b8a31acdefbcd","in_memory_size":13227334,"max_heap_size":0,"mem_alloc":93272256,"num_collections":1,"num_hooks":23198,"num_objects":23198,"num_points":532270,"num_strings":0,"pid":6300,"pointer_size":8,"read_only":false},"elapsed":"587.509µs"}
127.0.0.1:9851> server {"ok":true,"stats":{"aof_size":20346998,"avg_item_size":175,"heap_released":0,"heap_size":93315008,"id":"281b8d0f24a5a1a3f73b8a31acdefbcd","in_memory_size":13227334,"max_heap_size":0,"mem_alloc":93315008,"num_collections":1,"num_hooks":23198,"num_objects":23198,"num_points":532270,"num_strings":0,"pid":6300,"pointer_size":8,"read_only":false},"elapsed":"473.415µs"}
127.0.0.1:9851> server {"ok":true,"stats":{"aof_size":20346998,"avg_item_size":175,"heap_released":0,"heap_size":93350872,"id":"281b8d0f24a5a1a3f73b8a31acdefbcd","in_memory_size":13227334,"max_heap_size":0,"mem_alloc":93350872,"num_collections":1,"num_hooks":23198,"num_objects":23198,"num_points":532270,"num_strings":0,"pid":6300,"pointer_size":8,"read_only":false},"elapsed":"1.392265ms"}

You can see that the heap increased, yet nothing was actually sent to the Tile38 server.

@amorskoy
Author

And after 3 minutes I see that the heap is released, so it is probably ok:
127.0.0.1:9851> server {"ok":true,"stats":{"aof_size":20346998,"avg_item_size":175,"heap_released":38576128,"heap_size":93266504,"id":"281b8d0f24a5a1a3f73b8a31acdefbcd","in_memory_size":13227334,"max_heap_size":0,"mem_alloc":93266504,"num_collections":1,"num_hooks":23198,"num_objects":23198,"num_points":532270,"num_strings":0,"pid":6300,"pointer_size":8,"read_only":false},"elapsed":"439.991µs"}

@amorskoy
Author

Ok, so repeating the experiment:

  1. Before ingestion:
    127.0.0.1:9851> server {"ok":true,"stats":{"aof_size":20346998,"avg_item_size":175,"heap_released":38608896,"heap_size":93337736,"id":"281b8d0f24a5a1a3f73b8a31acdefbcd","in_memory_size":13227334,"max_heap_size":0,"mem_alloc":93337736,"num_collections":1,"num_hooks":23198,"num_objects":23198,"num_points":532270,"num_strings":0,"pid":6300,"pointer_size":8,"read_only":false},"elapsed":"1.314077ms"}

  2. Starting ingestion, 3 minutes. After ingestion:
    127.0.0.1:9851> server {"ok":true,"stats":{"aof_size":20474936,"avg_item_size":9466,"heap_released":0,"heap_size":5039052712,"id":"281b8d0f24a5a1a3f73b8a31acdefbcd","in_memory_size":13227545,"max_heap_size":0,"mem_alloc":5039052712,"num_collections":2,"num_hooks":23198,"num_objects":23205,"num_points":532277,"num_strings":0,"pid":6300,"pointer_size":8,"read_only":false},"elapsed":"2.515753704s"}

  3. Then after GC
    127.0.0.1:9851> server {"ok":true,"stats":{"aof_size":20475259,"avg_item_size":7838,"heap_released":1817870336,"heap_size":4171938192,"id":"281b8d0f24a5a1a3f73b8a31acdefbcd","in_memory_size":13227334,"max_heap_size":0,"mem_alloc":4171938192,"num_collections":1,"num_hooks":23198,"num_objects":23198,"num_points":532270,"num_strings":0,"pid":6300,"pointer_size":8,"read_only":false},"elapsed":"13.790705ms"}

  4. ps aux shows that it is still at 3.9 GB after the GC

  5. After 5 minutes it is still 3.9GB

  6. GC again - nothing changed, same 3.9GB. SERVER is:
    127.0.0.1:9851> server {"ok":true,"stats":{"aof_size":20475259,"avg_item_size":7838,"heap_released":1817886720,"heap_size":4171937552,"id":"281b8d0f24a5a1a3f73b8a31acdefbcd","in_memory_size":13227334,"max_heap_size":0,"mem_alloc":4171937552,"num_collections":1,"num_hooks":23198,"num_objects":23198,"num_points":532270,"num_strings":0,"pid":6300,"pointer_size":8,"read_only":false},"elapsed":"7.987286ms"}

So we have 2 issues:
a. Incremental memory consumption during ingestion
b. Memory is not freed after ingestion

@tidwall
Owner

tidwall commented Dec 27, 2016

Yes, there is a leak. :(

After (6), the heap_size of 3.88GB is too high. Most of that should have moved over to heap_released.

If allowable, could you share the SETHOOK and SET EX commands? Perhaps a gzipped AOF file would be enough. I want to set up an identical Linux environment and pump the same data through.

Hopefully I can resolve this in short order.

@amorskoy
Author

@tidwall Sure, during the nearest workday I will try to figure out what can be extracted, and will share it.

@amorskoy
Author

@tidwall here is zipped AOF. Please let me know if you need something else. Thanks
appendonly.aof.zip

tidwall added a commit that referenced this issue Dec 28, 2016
@tidwall
Owner

tidwall commented Dec 28, 2016

I just pushed an update that I believe fixes the leak. There are also a few other optimizations thrown in. Please check it on your side and let me know if this works for you.

Also, I created a utility for this specific issue, located in the test/107 directory. It downloads the AOF file that you provided in this thread, and starts an embedded tile38 server and an http-webhook endpoint. Then it pushes many thousands of points that are randomly chosen within the BOUNDS boundaries, such as:

SET people person:5103 EX 10 POINT 38.3854766013488 -121.07936355637995
SET people person:5104 EX 10 POINT 34.4005906275879 -123.3521164634834

It prints memory status info every second to a log file in the format:

2016-12-28T12:33:45-07:00      92.63 MB (heap)      56.52 MB (released)     221.27 MB (system)
2016-12-28T12:33:45-07:00      91.95 MB (heap)      56.48 MB (released)     216.67 MB (system)
2016-12-28T12:33:46-07:00      92.64 MB (heap)      56.48 MB (released)     226.71 MB (system)
2016-12-28T12:33:47-07:00     105.35 MB (heap)      51.12 MB (released)     227.79 MB (system)
2016-12-28T12:33:48-07:00      92.64 MB (heap)      56.45 MB (released)     219.17 MB (system)

The heap value is what Tile38 actively holds in the heap.
released is what Tile38 has released to the OS, but the OS has yet to reclaim.
system is the actual memory the OS reports, from a ps aux call.

Every second the test application sends the server a GC followed by a SERVER call. In my tests these values stay within a reasonable range, with system never going above 300 MB.

In the previous commit b8a0f59 the log looked like:

2016-12-28T12:46:44-07:00      88.94 MB (heap)       8.39 MB (released)     187.26 MB (system)
2016-12-28T12:46:48-07:00     268.40 MB (heap)      24.13 MB (released)     410.94 MB (system)
2016-12-28T12:46:51-07:00     254.30 MB (heap)     144.84 MB (released)     458.75 MB (system)
2016-12-28T12:46:56-07:00     447.80 MB (heap)       0.00 MB (released)     655.41 MB (system)
2016-12-28T12:46:58-07:00     336.24 MB (heap)     319.91 MB (released)     591.32 MB (system)
2016-12-28T12:47:04-07:00     427.05 MB (heap)     267.33 MB (released)     661.24 MB (system)
2016-12-28T12:47:09-07:00     372.64 MB (heap)     128.87 MB (released)     714.84 MB (system)
2016-12-28T12:47:15-07:00     812.47 MB (heap)       0.00 MB (released)    1065.01 MB (system)
2016-12-28T12:47:21-07:00     545.35 MB (heap)     487.55 MB (released)     902.20 MB (system)
2016-12-28T12:47:27-07:00     640.15 MB (heap)     426.56 MB (released)    1037.03 MB (system)
2016-12-28T12:47:32-07:00     843.21 MB (heap)     261.32 MB (released)    1098.46 MB (system)
2016-12-28T12:47:37-07:00     640.62 MB (heap)     398.92 MB (released)    1013.51 MB (system)
2016-12-28T12:47:45-07:00     660.77 MB (heap)     383.43 MB (released)    1248.36 MB (system)
2016-12-28T12:47:53-07:00    1105.44 MB (heap)     266.32 MB (released)    1557.04 MB (system)
2016-12-28T12:47:59-07:00    1173.08 MB (heap)     311.15 MB (released)    1709.92 MB (system)
2016-12-28T12:48:11-07:00    1538.30 MB (heap)     367.74 MB (released)    1810.65 MB (system)
2016-12-28T12:48:20-07:00    1333.08 MB (heap)     539.38 MB (released)    1684.70 MB (system) 

Please take it for a spin and let me know if you have any questions.

Thanks!

@amorskoy
Author

Great! Ok, on the nearest workday (it is GMT+2 + DST here) I will start on this and let you know ASAP. @tidwall

@amorskoy
Author

amorskoy commented Dec 29, 2016

@tidwall Great results.
Initial memory is 220 MB.
Max memory consumption during ingestion (5 minute period) is 290MB.
After final GC - 223 MB

127.0.0.1:9851> server {"ok":true,"stats":{"aof_size":20787387,"avg_item_size":231,"heap_released":69656576,"heap_size":122970800,"id":"281b8d0f24a5a1a3f73b8a31acdefbcd","in_memory_size":13227334,"max_heap_size":0,"mem_alloc":122970800,"num_collections":1,"num_hooks":23198,"num_objects":23198,"num_points":532270,"num_strings":0,"pid":19745,"pointer_size":8,"read_only":false},"elapsed":"588.426µs"}

@tidwall please feel free to close the ticket - cool result!

@tidwall
Owner

tidwall commented Dec 29, 2016

I just pushed these changes to the master branch. Let me know if you run into problems along the way. Thanks!
