Dates as datetime64[ms] - remove driving_institution
#222
Pull Request Checklist:
`number`) and pull request (:pull:`number`) has been added.

What kind of change does this PR introduce?
- `date_start` and `date_end` columns are cast with a `datetime64[ms]` dtype (not a Period).
- Adapted `date_parser`.
- Adapted `subset_file_coverage`.
- Removed `driving_institution` as an official xscen column.

Pandas 2 now supports datetime columns with s, ms and us resolutions, instead of only the old ns default. This allows storing dates from before 1677 and after 2262. However, this support is still partial, as many of the datetime manipulation methods still fail on "out of bounds" dates. This includes:
`pd.read_csv` and `pd.to_datetime`... Because of this bug, I had to implement the parsing directly in the `DataCatalog`'s init, using a solution proposed on Stack Overflow.

Even with this strange workaround, opening `simulation.json` went from 3 s to 800 ms on my machine!

The change had repercussions in other parts of xscen, especially `date_parser` and `subset_file_coverage`. I adapted the former to output `pd.Timestamp` objects by default and the latter to use more of the `Interval` magic pandas can already do with datetime bounds.

I also used this PR to remove `driving_institution` from the official columns, as discussed.

Does this PR introduce a breaking change?
- The default output of `date_parser` has changed.
- The default dtype of `date_start` and `date_end` has changed.
- The `driving_institution` column has been removed.

Other information:
This required pinning pandas >= 2, clisops >= 0.10. The latter pin allowed unpinning python.
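
For context on the nanosecond bounds mentioned in the description: they follow from storing timestamps as a signed 64-bit count of ticks since 1970. The sketch below is only an illustration using the standard library (it is not code from this PR); it reproduces the ns limits and shows how much wider the range becomes with ms ticks.

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)
INT64_MAX = 2**63 - 1  # largest tick count a signed 64-bit integer can hold

# With nanosecond ticks, the representable span is INT64_MAX ns on each side of 1970.
# datetime only has microsecond precision, so the tick count is floored to microseconds.
ns_span = timedelta(microseconds=INT64_MAX // 1_000)
ns_min, ns_max = EPOCH - ns_span, EPOCH + ns_span
print(ns_min.year, ns_max.year)  # 1677 2262

# With millisecond ticks, the same integer range covers roughly 292 million years
# on each side of the epoch, far beyond what datetime itself can represent.
ms_span_years = INT64_MAX / 1_000 / 86_400 / 365.2425
print(f"{ms_span_years:.3g} years")
```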
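
To illustrate the kind of parsing that avoids `pd.to_datetime` for out-of-bounds dates, here is a minimal sketch. It assumes pandas >= 2 and ISO 8601 date strings; the DataFrame and its values are hypothetical, and this is not necessarily the exact workaround used in `DataCatalog`.

```python
import numpy as np
import pandas as pd

# Hypothetical catalog rows; the second one lies outside the nanosecond bounds.
df = pd.DataFrame(
    {
        "id": ["sim-a", "sim-b"],
        "date_start": ["1950-01-01", "1500-01-01"],
        "date_end": ["2100-12-31", "2500-12-31"],
    }
)

for col in ["date_start", "date_end"]:
    # NumPy parses ISO 8601 strings directly at millisecond resolution,
    # and pandas >= 2 keeps that resolution instead of converting to ns.
    values = np.array(df[col].tolist(), dtype="datetime64[ms]")
    df[col] = pd.Series(values, index=df.index)

print(df.dtypes)
```

With pandas < 2 the same assignment would try to convert to nanoseconds and fail on the 1500 and 2500 dates, which is consistent with the pandas >= 2 pin.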
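
As a rough idea of what the `Interval` machinery offers for coverage checks, the sketch below tests which file periods overlap a requested period. The periods are hypothetical and this is not the actual implementation of `subset_file_coverage`, only an example of the pandas calls involved.

```python
import pandas as pd

# Hypothetical file periods (date_start, date_end), kept inside the ns bounds.
starts = pd.to_datetime(["2000-01-01", "2010-01-01", "2030-01-01"])
ends = pd.to_datetime(["2009-12-31", "2019-12-31", "2039-12-31"])
files = pd.IntervalIndex.from_arrays(starts, ends, closed="both")

# Requested period, closed on both ends like the file periods.
wanted = pd.Interval(
    pd.Timestamp("2005-01-01"), pd.Timestamp("2015-12-31"), closed="both"
)

# Boolean mask: True where a file period shares at least one instant with `wanted`.
overlapping = files.overlaps(wanted)
print(overlapping)
```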