-
Notifications
You must be signed in to change notification settings - Fork 276
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[ramses] serious performance regression #2766
Comments
I looked over the code that may cause it, and I think the issue is in
and
I believe that inside Cython, these will always be expanded, and will result in many many python objects being created and interpreted. For instance,
This is pretty expensive. Unfortunately, the fix for it is to do something like this:
A similar thing will have to be done for the second operation. I'm happy to make these changes, but want to give @cphyc the opportunity first. |
oops, those two were my suggestions. Learning everyday ! Thanks for figuring this out you guys ! |
I had a look and this already looks good to me. We may gain an extra bit by using memory views instead of numpy arrays, see the patch below. diff --git a/yt/geometry/oct_container.pyx b/yt/geometry/oct_container.pyx
index db2e13490..9c6ab9867 100644
--- a/yt/geometry/oct_container.pyx
+++ b/yt/geometry/oct_container.pyx
@@ -720,24 +720,34 @@ cdef class OctreeContainer:
@cython.wraparound(False)
@cython.cdivision(True)
def fill_level(self, int level,
- np.ndarray[np.uint8_t, ndim=1] levels,
- np.ndarray[np.uint8_t, ndim=1] cell_inds,
- np.ndarray[np.int64_t, ndim=1] file_inds,
+ np.uint8_t[::1] levels,
+ np.uint8_t[::1] cell_inds,
+ np.int64_t[::1] file_inds,
dest_fields, source_fields,
np.int64_t offset = 0):
cdef np.ndarray[np.float64_t, ndim=2] source
cdef np.ndarray[np.float64_t, ndim=1] dest
- cdef int i
+ cdef np.float64_t[::1] dest_view
+ cdef np.float64_t[::1, :] source_view
+
+ cdef int i, lvl, Nlevel, file_ind
+ Nlevel = len(levels)
for key in dest_fields:
dest = dest_fields[key]
source = source_fields[key]
- for i, lvl in enumerate(levels):
+
+ dest_view = dest
+ source_view = source
+ for i in range(Nlevel):
+ lvl = levels[i]
if lvl != level: continue
- if file_inds[i] < 0:
- dest[i + offset] = np.nan
+
+ file_ind = file_inds[i]
+ if file_ind < 0:
+ dest_view[i + offset] = np.nan
else:
- dest[i + offset] = source[file_inds[i], cell_inds[i]]
+ dest_view[i + offset] = source_view[file_ind, cell_inds[i]]
def fill_index(self, SelectorObject selector = AlwaysSelector(None)):
"""Get the on-file index of each cell"""
@@ -802,10 +812,10 @@ cdef class OctreeContainer:
@cython.cdivision(True)
def fill_level_with_domain(
self, int level,
- np.uint8_t[:] levels,
- np.uint8_t[:] cell_inds,
- np.int64_t[:] file_inds,
- np.int32_t[:] domains,
+ np.uint8_t[::1] levels,
+ np.uint8_t[::1] cell_inds,
+ np.int64_t[::1] file_inds,
+ np.int32_t[::1] domains,
dict dest_fields,
dict source_fields,
np.int32_t domain,
@@ -819,19 +829,29 @@ cdef class OctreeContainer:
"""
cdef np.ndarray[np.float64_t, ndim=2] source
cdef np.ndarray[np.float64_t, ndim=1] dest
- cdef int i, count
+ cdef np.float64_t[::1] dest_view
+ cdef np.float64_t[::1, :] source_view
+
+ cdef int i, count, lvl, dom, Nlevel, file_ind
+
+ Nlevel = len(levels)
for key in dest_fields:
dest = dest_fields[key]
+ dest_view = dest
source = source_fields[key]
+ source_view = source
count = 0
- for i, (lev, dom) in enumerate(zip(levels, domains)):
+ for i in range(Nlevel):
+ lev = levels[i]
+ dom = domains[i]
if lev != level or dom != domain: continue
count += 1
- if file_inds[i] < 0:
- dest[i + offset] = np.nan
+ file_ind = file_inds[i]
+ if file_ind < 0:
+ dest_view[i + offset] = np.nan
else:
- dest[i + offset] = source[file_inds[i], cell_inds[i]]
+ dest_view[i + offset] = source_view[file_ind, cell_inds[i]]
return count
@cython.boundscheck(False) |
Bug report
Current master suffers from a serious regression for Ramses frontend. Take the following code as an example:
On master (6daf9cd):
$ time python3.8 regression.py real 0m43.792s user 0m43.704s sys 0m0.969s
on :
$ time python3.8 regression.py real 0m3.882s user 0m3.786s sys 0m0.799s
Here's the basic profile:
The text was updated successfully, but these errors were encountered: