Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MRG, ENH: Fix memory on CircleCI #8379

Merged
merged 26 commits into from
Oct 22, 2020
Merged

MRG, ENH: Fix memory on CircleCI #8379

merged 26 commits into from
Oct 22, 2020

Conversation

larsoner
Copy link
Member

@larsoner larsoner commented Oct 16, 2020

Closes #8375

@larsoner
Copy link
Member Author

CircleCI now gives memory outputs which for the mixed source space example is:

I'll try to get this down somehow

@larsoner larsoner changed the title WIP: Fix memory on CircleCI MRG, ENH: Fix memory on CircleCI Oct 16, 2020
@larsoner
Copy link
Member Author

Looks like the culprit was our volume source space rr/nn/interpolator redundancy (and interpolator even when add_interpolator=False!). I wrote some code that makes it not copy these and not create them (the "empty" ones took 70 MB apiece, which for 9 regions leads to > 600 MB used to store "nothing"), looks much better now:

Ready for review/merge from my end.

@larsoner
Copy link
Member Author

Okay we seem to have some ever-increasing memory problem:

Memory should of course increase as examples run and Sphinx / Sphinx-gallery accumulate objects, by not by so much. This makes me think that plot_source_space_morphing.py is just a symptom and not the underlying problem.

@larsoner
Copy link
Member Author

I don't expect this [circle full] to succeed, but we should still merge this. It should help us see the memory increases / failures in master better. I also added a fix for latest matplotlib.

@GuillaumeFavelier can you take a look and approve if you're happy? Then I'll merge once the latest-matplotlib fix is merged upstream (nilearn).

@GuillaumeFavelier
Copy link
Contributor

I'll review right now

Copy link
Contributor

@GuillaumeFavelier GuillaumeFavelier left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 for tools/circleci_download.sh

Just one question but except from that LGTM

doc/Makefile Show resolved Hide resolved
@larsoner
Copy link
Member Author

Okay looks like #8352 was probably the culprit as it added a reference cycle due to bound methods. Locally before 8abd6ff getting partway through a make html_dev-memory I got:

Screenshot from 2020-10-20 12-17-05

Now after 20 minutes of running I get a baseline around ~2GB instead of the ~5GB above:

Screenshot from 2020-10-20 12-48-31

I've added a brain_gc fixture that yells at us when we don't end up with properly GC'ed brains -- for example if brain.close isn't called we see a list of the type and repr(...)[:100] of the objects that are keeping the n brain instances alive:

_________________________________________________________ ERROR at teardown of test_plot_sparse_source_estimates[pyvista] __________________________________________________________
mne/conftest.py:490: in brain_gc
    assert n == 0, f'{n} after:\n{new.join(ref)}'
E   AssertionError: 1 after:
E     list: [[], {'__repr__': <slot wrapper '__repr__' of 'object' objects>, '__hash__': <slot wrapper '__hash__
E     frame: <frame at 0x7f5db684aa40, file '/home/larsoner/python/mne-python/mne/conftest.py', line 489, code br
E     method: <bound method Brain._on_mouse_move of <mne.viz._brain._brain.Brain object at 0x7f5da999af70>>
E     method: <bound method Brain._on_button_press of <mne.viz._brain._brain.Brain object at 0x7f5da999af70>>
E     method: <bound method Brain._on_button_release of <mne.viz._brain._brain.Brain object at 0x7f5da999af70>>
E     method: <bound method Brain._on_pick of <mne.viz._brain._brain.Brain object at 0x7f5da999af70>>
E     dict: {'fig': <Figure size 800x267 with 1 Axes>, 'canvas': <matplotlib.backends.backend_qt5agg.FigureCanva
E     dict: {'plotter': <pyvistaqt.plotting.BackgroundPlotter object at 0x7f5dadba1d30>, 'brain': <mne.viz._brai
E     dict: {'plotter': <pyvistaqt.plotting.BackgroundPlotter object at 0x7f5dadba1d30>, 'brain': <mne.viz._brai
E     dict: {'plotter': <pyvistaqt.plotting.BackgroundPlotter object at 0x7f5dadba1d30>, 'brain': <mne.viz._brai
E     dict: {'plotter': <pyvistaqt.plotting.BackgroundPlotter object at 0x7f5dadba1d30>, 'brain': <mne.viz._brai
E     dict: {'plotter': <pyvistaqt.plotting.BackgroundPlotter object at 0x7f5dadba1d30>, 'brain': <mne.viz._brai
E     dict: {'plotter': <pyvistaqt.plotting.BackgroundPlotter object at 0x7f5dadba1d30>, 'brain': <mne.viz._brai
E     method: <bound method Brain.set_data_smoothing of <mne.viz._brain._brain.Brain object at 0x7f5da999af70>>
E     method: <bound method Brain.plot_time_line of <mne.viz._brain._brain.Brain object at 0x7f5da999af70>>
E     method: <bound method Brain.set_playback_speed of <mne.viz._brain._brain.Brain object at 0x7f5da999af70>>
E     cell: <cell at 0x7f5da0dd0fa0: Brain object at 0x7f5da999af70>
E     cell: <cell at 0x7f5da10e10a0: Brain object at 0x7f5da999af70>
E     cell: <cell at 0x7f5da10e1220: Brain object at 0x7f5da999af70>
E   assert 1 == 0

@GuillaumeFavelier can you take another look? It looks like what really needed to happen was cleaning up the bound methods of Brain that were attached to the iren.

If the circle full passes, I'm going to add this GC reference check for Brain so that hopefully these get flagged in the future. I tried it already to see if we could fix "plot_montage.py", but it turns out that there are no references to pyvista.Plotter at all at the end of the example, indicating that it really is a problem at the VTK end with some objects not being cleaned up properly somehow.

@GuillaumeFavelier
Copy link
Contributor

Very good catch on 8abd6ff @larsoner !

@larsoner
Copy link
Member Author

Getting closer:

Need to figure out why there is a ~600 MB loss around 800 sec next

@GuillaumeFavelier
Copy link
Contributor

How can I make sure that close() is called even in this situation?

Nevermind, all of this can be closed afterwards with renderer.backend._close_all().

@larsoner
Copy link
Member Author

larsoner commented Oct 22, 2020

13ce561 passed with [circle full]!

memory

Clearly there are some examples that could be improved, but at least it got us back to green.

I pushed an empty commit with another [circle full] to see if it at least builds again, maybe even with lower memory usage after your commits @GuillaumeFavelier. Okay to merge from your end @GuillaumeFavelier ? If so I'll look at your latest commits and merge once everything comes back green.

@GuillaumeFavelier
Copy link
Contributor

Travis is far from happy, I am working on plot_source_estimates to improve the situation and there is the failure with notebook azure job. I would not say it's ready for merge yet.

@GuillaumeFavelier
Copy link
Contributor

Worst case scenario, we can remove brain_gc from test_plot_source_estimates and test_plot_sparse_source_estimates and do that later?

@larsoner
Copy link
Member Author

Worst case scenario, we can remove brain_gc from test_plot_source_estimates and test_plot_sparse_source_estimates and do that later?

Let's do that for now with a TODO or XXX comment. Feel free to push a commit with [circle full]. It would be good to get this in so that master CircleCI builds work again

@GuillaumeFavelier
Copy link
Contributor

That's it for the TODO, should I push now or wait that Circle is done?

@larsoner
Copy link
Member Author

Feel free to push now, don't bother with [circle full] because I have something to push as well

Comment on lines -149 to +150
resmat = np.absolute(resmat) # only use absolute values
# we want to use absolute values, but doing abs() mases a copy and this
# can be quite expensive in memory. So let's just use abs() in place below.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI @olafhauk these matrices can be quite large (~8000x8000), so instead of doing this operation here, I've moved it to be done column-wise. It shouldn't be an appreciable performance difference but it should lower memory usage by ~800 MB in the plot_resolution_metrics.py example.

@larsoner
Copy link
Member Author

@GuillaumeFavelier I worry that the FakeIren stuff is fragile, locally I get this:

Traceback (most recent call last):
  File "/home/larsoner/.local/lib/python3.8/site-packages/vtkmodules/qt/QVTKRenderWindowInteractor.py", line 463, in enterEvent
    self._Iren.EnterEvent()
AttributeError: 'FakeIren' object has no attribute 'EnterEvent'

I think we should prioritize fixing PyVista, getting a PR merged there, and then hopefully soon-ish get rid of our workarounds here.

@GuillaumeFavelier
Copy link
Contributor

I worry that the FakeIren stuff is fragile

This is homemade hack basically 😓

we should prioritize fixing PyVista

Agreed 👍

@GuillaumeFavelier
Copy link
Contributor

I can reproduce the failures of test_interactor_style locally:

E       AssertionError: Not all objects GCed:
E         vtkLightKit
E       assert 1 == 0
E        +  where 1 = len([(vtkmodules.vtkRenderingCore.vtkLightKit)0x7fa68995b6a0])

@larsoner
Copy link
Member Author

This sounds like a comment for pyvista/pyvista#958 :)

@GuillaumeFavelier
Copy link
Contributor

Whoops

@GuillaumeFavelier
Copy link
Contributor

@larsoner
Copy link
Member Author

Yes probably in the change from focalpoint=None to focalpoint=(0., 0., 0.) default that happened at some point. Let's fix that in another PR and get this in so master is green!

@larsoner larsoner merged commit 0629871 into mne-tools:master Oct 22, 2020
@larsoner larsoner deleted the mem branch October 22, 2020 14:59
@larsoner
Copy link
Member Author

@GuillaumeFavelier maybe we should add focalpoint='auto' which means "use the center of the bounding box"? There are some other examples where this could be used instead of custom focalpoint=(...) calls (git grep focalpoint should expose them)

@GuillaumeFavelier
Copy link
Contributor

I think that's the default when focalpoint=None

@GuillaumeFavelier
Copy link
Contributor

if focalpoint is not None:
focalpoint = np.asarray(focalpoint)
else:
focalpoint = (bounds[1::2] + bounds[::2]) * 0.5

@larsoner
Copy link
Member Author

Okay then let's just update the broken example and the ones that manually set focalpoint=(...) to use focalpoint=None instead (assuming it works as well for those)

larsoner added a commit to adam2392/mne-python that referenced this pull request Oct 23, 2020
* upstream/master:
  Fix separate canvas (mne-tools#8408)
  FIX: focalpoint (mne-tools#8405)
  WIP: Refs (mne-tools#8406)
  tiny cosmetic improvements to BEM code (mne-tools#8404)
  MRG, ENH: Fix memory on CircleCI (mne-tools#8379)
  MRG: Update backend parameter in stc.plot() (mne-tools#8395)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG: CircleCI crashes on full build
2 participants