Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

EnergyPlus Build Time Increasing #8598

Closed
Myoldmopar opened this issue Mar 9, 2021 · 10 comments · Fixed by #8707
Closed

EnergyPlus Build Time Increasing #8598

Myoldmopar opened this issue Mar 9, 2021 · 10 comments · Fixed by #8707
Assignees
Labels
DoNotPublish Includes changes that shouldn't be reported in the changelog

Comments

@Myoldmopar
Copy link
Member

Issue overview

The build time for EnergyPlus has steadily increased as of late. It's not a crazy amount, but it is becoming annoying, and starting to impact our CI's ability to produce tests in reasonable time.

@Myoldmopar Myoldmopar added the DoNotPublish Includes changes that shouldn't be reported in the changelog label Mar 9, 2021
@lefticus
Copy link
Contributor

lefticus commented Mar 9, 2021

Research and prototyping of possibilities is commencing on this branch:

https://github.com/NREL/EnergyPlus/tree/prototype_simplified_array1d

A full description of the initial findings is available on this commit:

b036efa

@amirroth
Copy link
Collaborator

Could this be related to the state transition? I don’t see why it would be, but that is a big change that has been happening recently and incrementally.

@lefticus
Copy link
Contributor

It is related, yes. It's partly due to increased visibility of Array1D<> template instantiations across translation units.

See my analysis on the this commit b036efa

@amirroth
Copy link
Collaborator

I see. Thanks. I am also a big proponent of thinning out whichever ObjexxFCL array libraries we end up keeping. There are a bunch of classes and operators that need to be in there for FORTRAN coverage purposes, but that we don't actually want to use in EnergyPlus, like array arithmetic, initialization (we use this a lot but should move away from it), and slicing (we use this less, but we need to stop). I'm not sure that these are great language features and I am pretty sure that they are terrible template library features.

@lefticus
Copy link
Contributor

I fully agree that it does not make sense to remove Objexx's multi-dimensional arrays where we use them and their math functions. it would be far too much work to move to something different and pull in a new dependency.

Here is a proposal for decreasing build times while moving the E+ code base to more standard C++ (eg, 0-based indexing) where it makes sense to.

  1. Prune all of ObjexxFCL that is not currently used
    • Goal: Reduce the size of header files that need to be parsed across EnergyPlus.
    • Impact: measurable but small change in compile times
    • Changes required to EnergyPlus: 0
  2. Move to energyplus::SimpleArray1D from objexx::Array1D where possible
    • Goal: Identify uses of objexx::Array1D that do not take advantage advanced features of Objexx and are primary candidates for replacing with std::vector
    • Impact: very large impact on compile times, as proven already
    • Changes required to EnergyPlus: minimal, it is a drop-in replace for Array1D for the simple use cases we want to target
  3. Move from energyplus::SimpleArray1D to std::vector (note that we have already identified cases where this should be easy in step 2)
    • Goal: remove special containers where only simple containers are needed
    • Impact: nominal impact on compile times
    • Changes required to EnergyPlus: Larger - all accesses will need to be adjusted for 0-based indexing (or better, updated to ranged-based and algorithm accesses), allocate and deallocate member function calls will need to be updated
  4. Move from objexx::Array1S slices to std::span where it makes sense
    • Goal: identify remaining uses of objexx::Array1D that were only held back by slice requirements
    • Impact: nominal impact on compile times
    • Changes required to EnergyPlus: unknown
  5. Goto 2

Note: reduction of ObjexxFCL and Array1D in particular should have a significant impact on binary sizes. It is hopeful that binary size reduction will also increase performance due to less cache pressure

Further reduction or removal of slices are up to the primary maintainers and developers of EnergyPlus, I don't know what impact this would have on the code, either from a compile-time perspective or a maintainability perspective.

Note 2: The goal is not to eliminate or even reduce any of the uses of objexx::Array1D that take advantage of matrix/math/vector functionality provided. The goal is only to remove those simple use cases that can easily become std::vector ultimately.

@amirroth
Copy link
Collaborator

  1. Slicing is infrequently used and most of the instances that I've looked at I would characterize as "lazy programming". Eliminating the use of slicing in EnergyPlus is a reasonable goal that should be achievable in a week or so (not an exaggeration).

  2. I don't know the extent to which the arithmetic and logical array functions are used as those are harder to grep for. I assume they are used more than slicing, but I have not run into them that frequently. However, I would characterize the instances I have run into as similarly "lazy programming". Eliminating the use of all (or maybe all except for matrix multiplication and matrix-vector multiplication if those are in fact used) ObjexxFCL arithmetic functions in EnergyPlus is a reasonable goal that should take several weeks to accomplish (probably also not an exaggeration). The ObjexxFCL array library is way over-engineered for the subset of FORTRAN that EnergyPlus actually uses. (Remember, EnergyPlus was not developed by FORTRAN programmers, it was developed by mechanical engineers cosplaying FORTRAN programmers.)

  3. I think removing objexxFCL::Array1D in favor of std::array (why std::vector? the size of almost everything in EnergyPlus is known the start of the program) is a reasonable goal that should be achievable before the September 9.6 release.

  4. Goto's are harmful, don't they teach this in comp-sci school anymore? In the future, please structure your narratives using reducible control flow constructs like for loops so that the narrative compiler can optimize them.

@jasondegraw
Copy link
Member

Since it seems to have been forgotten in this context, I'll remind everyone that there was to be a cooperative effort on the Array1D replacement front that seems to have largely fallen through. Which is a shame, because we even had arrays of bools working. Anyway, PRs #7813 and #7841 cleaned up a lot of the unnecessary uses of the fancier array objects from ObjexxFCL. Some of the remainders are in code that we would have had to really understand to do a diff-free replacement, so I'm a little skeptical about a week being enough time.

I know I'll regret engaging on this, but once more into the breach. I believe it'd be more accurate to say that EnergyPlus was written in Fortran by FORTRAN programmers with many parts based on older FORTRAN code.

std::array requires the size be known at compile time, so it'll probably have to be std::vector. I wish there was a std::fixed_size_vector, but I don't think there is one. There are a lot of places where the replacement is really easy, see #7893.

@lefticus
Copy link
Contributor

@jasondegraw agreed re: std::array vs std::vector, I've observed few cases where the size is known fully at compile time. My understanding is that the sizes are known at IDF parsing time, which is too late, but I am not an expert on the internals of EnergyPlus.

I was not aware of the existing PR's, but they could be very helpful here, I was just tasked with improving build times and found that Array1D is one of our main problems.

I do, of course, prefer std::array when that's possible (that's actually covered in chapter 13 of my book).

@amirroth I have no strongly held opinions on slices, because I do not know how they are used in the EnergyPlus code. My one interaction with slices in EnergyPlus was a 4D slice view being created from a 5D Array, which appeared to be a non-trivial use of it.

But, there can be very good performance and API reasons to using slice-like things (C++20's std::span, for example, which is very easy to implement in C++17), because they give you nice ranged access to contiguous data which is decoupled from the underlying data structure, without needing to templatize the function being called.

This could be very helpful for easily allowing common functions to be called with a Array1D, std::vector or std::array, with no loss in performance and possibly performance increases for those cases where we can use std::array.

@amirroth
Copy link
Collaborator

Jason. Jason. Should we invite Jason Glazer to this thread?

Sorry, I don't know what I was thinking about std::array. I could make 20 excuses. Thankfully my position means that I don't have to take the time.

I guess the current use of slicing in EnergyPlus appears to be minimal because @jasondegraw has gotten rid of many of the original uses. Thanks @jasondegraw! (I didn't mean to discount your contributions, I just lost track of them). Several of the remaining instances I have run into (actually @xuanluo113 ran into them) can be eliminated by transposing the indices of the parent matrix (there are some multi-dimensional arrays that still assume column-major ordering apparently) and replacing the transposed 2D array by a 1D array of 1D arrays. This would also save some "virtual space" because only a small fraction of the inner 1D arrays would need to be created. A few other instances could be handled by the std::span object, by passing the first and last indices as additional arguments, or even by manually inlining the body of the function which is usually a one liner, maybe two.

I apologize for the blanket slander of all EnergyPlus and predecessor FORTRAN developers. I will be sure to redact all names from the "Crimes against EnergyPlus" card series I am developing.

@jasondegraw
Copy link
Member

@lefticus In the last PR I referenced we had gotten to the point that we were running up against multidimensional arrays as the major impediment to progress. Our stand-in for Array1D (EPVector in EnergyPlus.hh) had grown to have a few methods we didn't really want, but at least in a few cases we were able to add a method to get things compiling, then go back and do the correct thing, and finally remove the method. We had some progress spreadsheets someplace, I'll try to find those.

@amirroth I hope to be featured prominently as "that airflow guy". More seriously though, FORTRAN77 and previous had a limited set of data structures available, and some of us were taught to avoid things too far from common usage. The variations between compiler could be pretty extreme, but arrays would always be there, so... yeah... arrays. The plain vanilla F77 CFD code that I inherited for use on my masters still compiles and works and it dates to the 1980s, before the invention of lower case.

My recollection is that there were places where "arrays of arrays" would slot right in and eliminate more Array1Ss and thus any need to think about slicing. I tend to doubt that we "need" slicing, and while I could be wrong I think it would make for better code if we restructured things to not need to do that.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
DoNotPublish Includes changes that shouldn't be reported in the changelog
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants