EnergyPlus Build Time Increasing #8598

Myoldmopar · 2021-03-09T15:07:28Z

Issue overview

The build time for EnergyPlus has steadily increased as of late. It's not a crazy amount, but it is becoming annoying, and starting to impact our CI's ability to produce tests in reasonable time.

lefticus · 2021-03-09T23:37:57Z

Research and prototyping of possibilities is commencing on this branch:

https://github.com/NREL/EnergyPlus/tree/prototype_simplified_array1d

A full description of the initial findings is available on this commit:

b036efa

amirroth · 2021-03-10T00:41:50Z

Could this be related to the state transition? I don’t see why it would be, but that is a big change that has been happening recently and incrementally.

lefticus · 2021-03-10T00:45:36Z

It is related, yes. It's partly due to increased visibility of Array1D<> template instantiations across translation units.

See my analysis on the this commit b036efa

amirroth · 2021-03-10T00:59:29Z

I see. Thanks. I am also a big proponent of thinning out whichever ObjexxFCL array libraries we end up keeping. There are a bunch of classes and operators that need to be in there for FORTRAN coverage purposes, but that we don't actually want to use in EnergyPlus, like array arithmetic, initialization (we use this a lot but should move away from it), and slicing (we use this less, but we need to stop). I'm not sure that these are great language features and I am pretty sure that they are terrible template library features.

lefticus · 2021-03-25T23:11:24Z

I fully agree that it does not make sense to remove Objexx's multi-dimensional arrays where we use them and their math functions. it would be far too much work to move to something different and pull in a new dependency.

Here is a proposal for decreasing build times while moving the E+ code base to more standard C++ (eg, 0-based indexing) where it makes sense to.

Prune all of ObjexxFCL that is not currently used
- Goal: Reduce the size of header files that need to be parsed across EnergyPlus.
- Impact: measurable but small change in compile times
- Changes required to EnergyPlus: 0
Move to energyplus::SimpleArray1D from objexx::Array1D where possible
- Goal: Identify uses of objexx::Array1D that do not take advantage advanced features of Objexx and are primary candidates for replacing with std::vector
- Impact: very large impact on compile times, as proven already
- Changes required to EnergyPlus: minimal, it is a drop-in replace for Array1D for the simple use cases we want to target
Move from energyplus::SimpleArray1D to std::vector (note that we have already identified cases where this should be easy in step 2)
- Goal: remove special containers where only simple containers are needed
- Impact: nominal impact on compile times
- Changes required to EnergyPlus: Larger - all accesses will need to be adjusted for 0-based indexing (or better, updated to ranged-based and algorithm accesses), allocate and deallocate member function calls will need to be updated
Move from objexx::Array1S slices to std::span where it makes sense
- Goal: identify remaining uses of objexx::Array1D that were only held back by slice requirements
- Impact: nominal impact on compile times
- Changes required to EnergyPlus: unknown
Goto 2

Note: reduction of ObjexxFCL and Array1D in particular should have a significant impact on binary sizes. It is hopeful that binary size reduction will also increase performance due to less cache pressure

Further reduction or removal of slices are up to the primary maintainers and developers of EnergyPlus, I don't know what impact this would have on the code, either from a compile-time perspective or a maintainability perspective.

Note 2: The goal is not to eliminate or even reduce any of the uses of objexx::Array1D that take advantage of matrix/math/vector functionality provided. The goal is only to remove those simple use cases that can easily become std::vector ultimately.

amirroth · 2021-03-26T00:06:37Z

Slicing is infrequently used and most of the instances that I've looked at I would characterize as "lazy programming". Eliminating the use of slicing in EnergyPlus is a reasonable goal that should be achievable in a week or so (not an exaggeration).
I don't know the extent to which the arithmetic and logical array functions are used as those are harder to grep for. I assume they are used more than slicing, but I have not run into them that frequently. However, I would characterize the instances I have run into as similarly "lazy programming". Eliminating the use of all (or maybe all except for matrix multiplication and matrix-vector multiplication if those are in fact used) ObjexxFCL arithmetic functions in EnergyPlus is a reasonable goal that should take several weeks to accomplish (probably also not an exaggeration). The ObjexxFCL array library is way over-engineered for the subset of FORTRAN that EnergyPlus actually uses. (Remember, EnergyPlus was not developed by FORTRAN programmers, it was developed by mechanical engineers cosplaying FORTRAN programmers.)
I think removing objexxFCL::Array1D in favor of std::array (why std::vector? the size of almost everything in EnergyPlus is known the start of the program) is a reasonable goal that should be achievable before the September 9.6 release.
Goto's are harmful, don't they teach this in comp-sci school anymore? In the future, please structure your narratives using reducible control flow constructs like for loops so that the narrative compiler can optimize them.

jasondegraw · 2021-03-26T04:41:23Z

Since it seems to have been forgotten in this context, I'll remind everyone that there was to be a cooperative effort on the Array1D replacement front that seems to have largely fallen through. Which is a shame, because we even had arrays of bools working. Anyway, PRs #7813 and #7841 cleaned up a lot of the unnecessary uses of the fancier array objects from ObjexxFCL. Some of the remainders are in code that we would have had to really understand to do a diff-free replacement, so I'm a little skeptical about a week being enough time.

I know I'll regret engaging on this, but once more into the breach. I believe it'd be more accurate to say that EnergyPlus was written in Fortran by FORTRAN programmers with many parts based on older FORTRAN code.

std::array requires the size be known at compile time, so it'll probably have to be std::vector. I wish there was a std::fixed_size_vector, but I don't think there is one. There are a lot of places where the replacement is really easy, see #7893.

lefticus · 2021-03-26T13:36:06Z

@jasondegraw agreed re: std::array vs std::vector, I've observed few cases where the size is known fully at compile time. My understanding is that the sizes are known at IDF parsing time, which is too late, but I am not an expert on the internals of EnergyPlus.

I was not aware of the existing PR's, but they could be very helpful here, I was just tasked with improving build times and found that Array1D is one of our main problems.

I do, of course, prefer std::array when that's possible (that's actually covered in chapter 13 of my book).

@amirroth I have no strongly held opinions on slices, because I do not know how they are used in the EnergyPlus code. My one interaction with slices in EnergyPlus was a 4D slice view being created from a 5D Array, which appeared to be a non-trivial use of it.

But, there can be very good performance and API reasons to using slice-like things (C++20's std::span, for example, which is very easy to implement in C++17), because they give you nice ranged access to contiguous data which is decoupled from the underlying data structure, without needing to templatize the function being called.

This could be very helpful for easily allowing common functions to be called with a Array1D, std::vector or std::array, with no loss in performance and possibly performance increases for those cases where we can use std::array.

amirroth · 2021-03-26T14:58:49Z

Jason. Jason. Should we invite Jason Glazer to this thread?

Sorry, I don't know what I was thinking about std::array. I could make 20 excuses. Thankfully my position means that I don't have to take the time.

I guess the current use of slicing in EnergyPlus appears to be minimal because @jasondegraw has gotten rid of many of the original uses. Thanks @jasondegraw! (I didn't mean to discount your contributions, I just lost track of them). Several of the remaining instances I have run into (actually @xuanluo113 ran into them) can be eliminated by transposing the indices of the parent matrix (there are some multi-dimensional arrays that still assume column-major ordering apparently) and replacing the transposed 2D array by a 1D array of 1D arrays. This would also save some "virtual space" because only a small fraction of the inner 1D arrays would need to be created. A few other instances could be handled by the std::span object, by passing the first and last indices as additional arguments, or even by manually inlining the body of the function which is usually a one liner, maybe two.

I apologize for the blanket slander of all EnergyPlus and predecessor FORTRAN developers. I will be sure to redact all names from the "Crimes against EnergyPlus" card series I am developing.

jasondegraw · 2021-03-30T02:32:19Z

@lefticus In the last PR I referenced we had gotten to the point that we were running up against multidimensional arrays as the major impediment to progress. Our stand-in for Array1D (EPVector in EnergyPlus.hh) had grown to have a few methods we didn't really want, but at least in a few cases we were able to add a method to get things compiling, then go back and do the correct thing, and finally remove the method. We had some progress spreadsheets someplace, I'll try to find those.

@amirroth I hope to be featured prominently as "that airflow guy". More seriously though, FORTRAN77 and previous had a limited set of data structures available, and some of us were taught to avoid things too far from common usage. The variations between compiler could be pretty extreme, but arrays would always be there, so... yeah... arrays. The plain vanilla F77 CFD code that I inherited for use on my masters still compiles and works and it dates to the 1980s, before the invention of lower case.

My recollection is that there were places where "arrays of arrays" would slot right in and eliminate more Array1Ss and thus any need to think about slicing. I tend to doubt that we "need" slicing, and while I could be wrong I think it would make for better code if we restructured things to not need to do that.

Myoldmopar added the DoNotPublish Includes changes that shouldn't be reported in the changelog label Mar 9, 2021

Myoldmopar assigned lefticus Mar 9, 2021

lefticus mentioned this issue Apr 9, 2021

Prototype simplified array1d #8707

Merged

Myoldmopar closed this as completed in #8707 Apr 14, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

EnergyPlus Build Time Increasing #8598

EnergyPlus Build Time Increasing #8598

Myoldmopar commented Mar 9, 2021

lefticus commented Mar 9, 2021

amirroth commented Mar 10, 2021

lefticus commented Mar 10, 2021

amirroth commented Mar 10, 2021

lefticus commented Mar 25, 2021

amirroth commented Mar 26, 2021

jasondegraw commented Mar 26, 2021

lefticus commented Mar 26, 2021

amirroth commented Mar 26, 2021

jasondegraw commented Mar 30, 2021

EnergyPlus Build Time Increasing #8598

EnergyPlus Build Time Increasing #8598

Comments

Myoldmopar commented Mar 9, 2021

Issue overview

lefticus commented Mar 9, 2021

amirroth commented Mar 10, 2021

lefticus commented Mar 10, 2021

amirroth commented Mar 10, 2021

lefticus commented Mar 25, 2021

amirroth commented Mar 26, 2021

jasondegraw commented Mar 26, 2021

lefticus commented Mar 26, 2021

amirroth commented Mar 26, 2021

jasondegraw commented Mar 30, 2021