Pandas 1.5.3 breaks xs.DataCatalog #161

Closed
1 task
juliettelavoie opened this issue Mar 3, 2023 · 3 comments · Fixed by #162
Labels
bug Something isn't working

Comments

@juliettelavoie
Contributor

Setup Information

  • xscen version: 5.0
  • Python version: 3.9.13
  • Operating System: linux

Description

I think the new pandas (1.5.3) is breaking the reading of the catalogs.
I have updated the libraries in my env and I can't read catalogs anymore :'(.
It still works in another env that has pandas 1.4.3.

Steps To Reproduce

xs.DataCatalog('simulation.json')

.../site-packages/intake_esm/cat.py:262: FutureWarning: 
        Use pd.to_datetime instead.

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
File /exec/jlavoie/.conda/espo-g/lib/python3.9/site-packages/pandas/_libs/tslibs/period.pyx:1485, in pandas._libs.tslibs.period._extract_ordinal()

AttributeError: 'str' object has no attribute 'ordinal'

During handling of the above exception, another exception occurred:

OutOfBoundsDatetime                       Traceback (most recent call last)
File /exec/jlavoie/.conda/espo-g/lib/python3.9/site-packages/pandas/io/parsers/base_parser.py:1080, in _make_date_converter.<locals>.converter(*date_cols)
   1078 try:
   1079     result = tools.to_datetime(
-> 1080         date_parser(*date_cols), errors="ignore", cache=cache_dates
   1081     )
   1082     if isinstance(result, datetime.datetime):

File /exec/jlavoie/.conda/espo-g/lib/python3.9/site-packages/xscen/catalog.py:131, in _parse_dates(elem)
    129 # Only where we have NaT (parser errors and empty fields), parse into a Period
    130 # This will raise DateParseError as expected if the string is not parsable.
--> 131 time[nat] = pd.PeriodIndex(elem[nat], freq="H")
    132 return pd.PeriodIndex(time)

File /exec/jlavoie/.conda/espo-g/lib/python3.9/site-packages/pandas/core/indexes/period.py:273, in PeriodIndex.__new__(cls, data, ordinal, freq, dtype, copy, name, **fields)
    271     else:
    272         # don't pass copy here, since we copy later.
--> 273         data = period_array(data=data, freq=freq)
    275 if copy:

File /exec/jlavoie/.conda/espo-g/lib/python3.9/site-packages/pandas/core/arrays/period.py:977, in period_array(data, freq, copy)
    975 data = ensure_object(arrdata)
--> 977 return PeriodArray._from_sequence(data, dtype=dtype)

File /exec/jlavoie/.conda/espo-g/lib/python3.9/site-packages/pandas/core/arrays/period.py:274, in PeriodArray._from_sequence(cls, scalars, dtype, copy)
    273 freq = freq or libperiod.extract_freq(periods)
--> 274 ordinals = libperiod.extract_ordinals(periods, freq)
    275 return cls(ordinals, freq=freq)

File /exec/jlavoie/.conda/espo-g/lib/python3.9/site-packages/pandas/_libs/tslibs/period.pyx:1459, in pandas._libs.tslibs.period.extract_ordinals()

File /exec/jlavoie/.conda/espo-g/lib/python3.9/site-packages/pandas/_libs/tslibs/period.pyx:1494, in pandas._libs.tslibs.period._extract_ordinal()

File /exec/jlavoie/.conda/espo-g/lib/python3.9/site-packages/pandas/_libs/tslibs/period.pyx:2579, in pandas._libs.tslibs.period.Period.__new__()

File /exec/jlavoie/.conda/espo-g/lib/python3.9/site-packages/pandas/_libs/tslibs/parsing.pyx:367, in pandas._libs.tslibs.parsing.parse_time_string()

File /exec/jlavoie/.conda/espo-g/lib/python3.9/site-packages/pandas/_libs/tslibs/parsing.pyx:416, in pandas._libs.tslibs.parsing.parse_datetime_string_with_reso()

File /exec/jlavoie/.conda/espo-g/lib/python3.9/site-packages/pandas/_libs/tslibs/timestamps.pyx:1698, in pandas._libs.tslibs.timestamps.Timestamp.__new__()

File /exec/jlavoie/.conda/espo-g/lib/python3.9/site-packages/pandas/_libs/tslibs/conversion.pyx:249, in pandas._libs.tslibs.conversion.convert_to_tsobject()

File /exec/jlavoie/.conda/espo-g/lib/python3.9/site-packages/pandas/_libs/tslibs/conversion.pyx:523, in pandas._libs.tslibs.conversion._convert_str_to_tsobject()

File /exec/jlavoie/.conda/espo-g/lib/python3.9/site-packages/pandas/_libs/tslibs/conversion.pyx:506, in pandas._libs.tslibs.conversion._convert_str_to_tsobject()

File /exec/jlavoie/.conda/espo-g/lib/python3.9/site-packages/pandas/_libs/tslibs/np_datetime.pyx:212, in pandas._libs.tslibs.np_datetime.check_dts_bounds()

OutOfBoundsDatetime: Out of bounds nanosecond timestamp: 2291-01-01 00:00:00

During handling of the above exception, another exception occurred:

AttributeError                            Traceback (most recent call last)
File /exec/jlavoie/.conda/espo-g/lib/python3.9/site-packages/pandas/io/parsers/base_parser.py:1088, in _make_date_converter.<locals>.converter(*date_cols)
   1086 try:
   1087     return tools.to_datetime(
-> 1088         parsing.try_parse_dates(
   1089             parsing.concat_date_cols(date_cols),
   1090             parser=date_parser,
   1091             dayfirst=dayfirst,
   1092         ),
   1093         errors="ignore",
   1094     )
   1095 except Exception:

File /exec/jlavoie/.conda/espo-g/lib/python3.9/site-packages/pandas/_libs/tslibs/parsing.pyx:718, in pandas._libs.tslibs.parsing.try_parse_dates()

File /exec/jlavoie/.conda/espo-g/lib/python3.9/site-packages/xscen/catalog.py:127, in _parse_dates(elem)
    125 # Cast to normal datetime as this is much faster than to period for in-bounds dates
    126 # errors are coerced to NaT, we convert to a PeriodIndex and then to a (mutable) series
--> 127 time = pd.to_datetime(elem, errors="coerce").astype(pd.PeriodDtype("H")).to_series()
    128 nat = time.isnull()

AttributeError: 'Timestamp' object has no attribute 'astype'

During handling of the above exception, another exception occurred:

AttributeError                            Traceback (most recent call last)
Input In [19], in <cell line: 7>()
      5 print(pd.__version__)
      6 print(sys.version)
----> 7 dcat = xs.DataCatalog('/tank/scenario/catalogues/simulation.json')

File /exec/jlavoie/.conda/espo-g/lib/python3.9/site-packages/xscen/catalog.py:171, in DataCatalog.__init__(self, check_valid, drop_duplicates, *args, **kwargs)
    166 kwargs["read_csv_kwargs"] = recursive_update(
    167     csv_kwargs.copy(), kwargs.get("read_csv_kwargs", {})
    168 )
    169 args = args_as_str(args)
--> 171 super().__init__(*args, **kwargs)
    172 if check_valid:
    173     self.check_valid()

File /exec/jlavoie/.conda/espo-g/lib/python3.9/site-packages/intake_esm/core.py:94, in esm_datastore.__init__(self, obj, progressbar, sep, registry, read_csv_kwargs, storage_options, intake_kwargs)
     92     self.esmcat = ESMCatalogModel.from_dict(obj)
     93 else:
---> 94     self.esmcat = ESMCatalogModel.load(
     95         obj, storage_options=self.storage_options, read_csv_kwargs=read_csv_kwargs
     96     )
     98 self.derivedcat = registry or default_registry
     99 self._entries = {}

File /exec/jlavoie/.conda/espo-g/lib/python3.9/site-packages/intake_esm/cat.py:262, in ESMCatalogModel.load(cls, json_file, storage_options, read_csv_kwargs)
    260         csv_path = f'{os.path.dirname(_mapper.root)}/{cat.catalog_file}'
    261     cat.catalog_file = csv_path
--> 262     df = pd.read_csv(
    263         cat.catalog_file,
    264         storage_options=storage_options,
    265         **read_csv_kwargs,
    266     )
    267 else:
    268     df = pd.DataFrame(cat.catalog_dict)

File /exec/jlavoie/.conda/espo-g/lib/python3.9/site-packages/pandas/util/_decorators.py:211, in deprecate_kwarg.<locals>._deprecate_kwarg.<locals>.wrapper(*args, **kwargs)
    209     else:
    210         kwargs[new_arg_name] = new_arg_value
--> 211 return func(*args, **kwargs)

File /exec/jlavoie/.conda/espo-g/lib/python3.9/site-packages/pandas/util/_decorators.py:331, in deprecate_nonkeyword_arguments.<locals>.decorate.<locals>.wrapper(*args, **kwargs)
    325 if len(args) > num_allow_args:
    326     warnings.warn(
    327         msg.format(arguments=_format_argument_list(allow_args)),
    328         FutureWarning,
    329         stacklevel=find_stack_level(),
    330     )
--> 331 return func(*args, **kwargs)

File /exec/jlavoie/.conda/espo-g/lib/python3.9/site-packages/pandas/io/parsers/readers.py:950, in read_csv(filepath_or_buffer, sep, delimiter, header, names, index_col, usecols, squeeze, prefix, mangle_dupe_cols, dtype, engine, converters, true_values, false_values, skipinitialspace, skiprows, skipfooter, nrows, na_values, keep_default_na, na_filter, verbose, skip_blank_lines, parse_dates, infer_datetime_format, keep_date_col, date_parser, dayfirst, cache_dates, iterator, chunksize, compression, thousands, decimal, lineterminator, quotechar, quoting, doublequote, escapechar, comment, encoding, encoding_errors, dialect, error_bad_lines, warn_bad_lines, on_bad_lines, delim_whitespace, low_memory, memory_map, float_precision, storage_options)
    935 kwds_defaults = _refine_defaults_read(
    936     dialect,
    937     delimiter,
   (...)
    946     defaults={"delimiter": ","},
    947 )
    948 kwds.update(kwds_defaults)
--> 950 return _read(filepath_or_buffer, kwds)

File /exec/jlavoie/.conda/espo-g/lib/python3.9/site-packages/pandas/io/parsers/readers.py:611, in _read(filepath_or_buffer, kwds)
    608     return parser
    610 with parser:
--> 611     return parser.read(nrows)

File /exec/jlavoie/.conda/espo-g/lib/python3.9/site-packages/pandas/io/parsers/readers.py:1778, in TextFileReader.read(self, nrows)
   1771 nrows = validate_integer("nrows", nrows)
   1772 try:
   1773     # error: "ParserBase" has no attribute "read"
   1774     (
   1775         index,
   1776         columns,
   1777         col_dict,
-> 1778     ) = self._engine.read(  # type: ignore[attr-defined]
   1779         nrows
   1780     )
   1781 except Exception:
   1782     self.close()

File /exec/jlavoie/.conda/espo-g/lib/python3.9/site-packages/pandas/io/parsers/c_parser_wrapper.py:320, in CParserWrapper.read(self, nrows)
    316         self._check_data_length(names, alldata)
    318     data = {k: v for k, (i, v) in zip(names, data_tups)}
--> 320     names, date_data = self._do_date_conversions(names, data)
    321     index, column_names = self._make_index(date_data, alldata, names)
    323 return index, column_names, date_data

File /exec/jlavoie/.conda/espo-g/lib/python3.9/site-packages/pandas/io/parsers/base_parser.py:822, in ParserBase._do_date_conversions(self, names, data)
    814 def _do_date_conversions(
    815     self,
    816     names: Sequence[Hashable] | Index,
    817     data: Mapping[Hashable, ArrayLike] | DataFrame,
    818 ) -> tuple[Sequence[Hashable] | Index, Mapping[Hashable, ArrayLike] | DataFrame]:
    819     # returns data, columns
    821     if self.parse_dates is not None:
--> 822         data, names = _process_date_conversion(
    823             data,
    824             self._date_conv,
    825             self.parse_dates,
    826             self.index_col,
    827             self.index_names,
    828             names,
    829             keep_date_col=self.keep_date_col,
    830         )
    832     return names, data

File /exec/jlavoie/.conda/espo-g/lib/python3.9/site-packages/pandas/io/parsers/base_parser.py:1183, in _process_date_conversion(data_dict, converter, parse_spec, index_col, index_names, columns, keep_date_col)
   1180         continue
   1181     # Pyarrow engine returns Series which we need to convert to
   1182     # numpy array before converter, its a no-op for other parsers
-> 1183     data_dict[colspec] = converter(np.asarray(data_dict[colspec]))
   1184 else:
   1185     new_name, col, old_names = _try_convert_dates(
   1186         converter, colspec, data_dict, orig_names
   1187     )

File /exec/jlavoie/.conda/espo-g/lib/python3.9/site-packages/pandas/io/parsers/base_parser.py:1096, in _make_date_converter.<locals>.converter(*date_cols)
   1087     return tools.to_datetime(
   1088         parsing.try_parse_dates(
   1089             parsing.concat_date_cols(date_cols),
   (...)
   1093         errors="ignore",
   1094     )
   1095 except Exception:
-> 1096     return generic_parser(date_parser, *date_cols)

File /exec/jlavoie/.conda/espo-g/lib/python3.9/site-packages/pandas/io/date_converters.py:106, in generic_parser(parse_func, *cols)
    104 for i in range(N):
    105     args = [c[i] for c in cols]
--> 106     results[i] = parse_func(*args)
    108 return results

File /exec/jlavoie/.conda/espo-g/lib/python3.9/site-packages/xscen/catalog.py:127, in _parse_dates(elem)
    124 """Parse an array of dates (strings) into a PeriodIndex of hourly frequency."""
    125 # Cast to normal datetime as this is much faster than to period for in-bounds dates
    126 # errors are coerced to NaT, we convert to a PeriodIndex and then to a (mutable) series
--> 127 time = pd.to_datetime(elem, errors="coerce").astype(pd.PeriodDtype("H")).to_series()
    128 nat = time.isnull()
    129 # Only where we have NaT (parser errors and empty fields), parse into a Period
    130 # This will raise DateParseError as expected if the string is not parsable.

AttributeError: 'Timestamp' object has no attribute 'astype'
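
For anyone landing here, the last frame can be reproduced outside of xscen. The traceback shows `generic_parser` calling the parser one element at a time (`parse_func(*args)`), so `pd.to_datetime` receives a single string and returns a `Timestamp`, which has no `.astype`. Below is a minimal sketch of that behaviour and of one possible defensive wrapper. `to_hourly_periods` is a hypothetical helper, not the fix that was merged, and it assumes the lowercase `"h"` frequency alias is accepted by the installed pandas.

```python
import numpy as np
import pandas as pd

# A scalar string gives back a single Timestamp, which has no .astype method.
scalar_result = pd.to_datetime("2000-01-01", errors="coerce")

# An array of strings gives back a DatetimeIndex, which can be cast to periods.
array_result = pd.to_datetime(np.array(["2000-01-01", "2001-06-15"]), errors="coerce")

# Nanosecond-precision timestamps stop in the year 2262, so a date such as
# 2291-01-01 cannot be stored as a nanosecond Timestamp.
last_ns_year = pd.Timestamp.max.year


def to_hourly_periods(elem):
    """Hypothetical helper: parse a string or an array of strings into hourly periods."""
    # np.atleast_1d wraps a lone string into a 1-element array, so the
    # result of pd.to_datetime is always a DatetimeIndex.
    values = np.atleast_1d(elem)
    # Unparsable entries become NaT here; this sketch does not retry them.
    return pd.to_datetime(values, errors="coerce").to_period("h")


periods_from_scalar = to_hourly_periods("2000-01-01")
periods_from_array = to_hourly_periods(["2000-01-01", "2001-06-15"])
```

The wrapper only covers the scalar-versus-array difference. Out-of-bounds dates would still need a separate path, as the original `_parse_dates` does with `pd.PeriodIndex`.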

Additional context

No response

Contribution

  • I would be willing/able to open a Pull Request to address this bug.
@juliettelavoie added the bug label Mar 3, 2023
@aulemahal
Collaborator

pandas-dev/pandas#51004

I think something is wrong in the conda Linux build. Hopefully a new pandas release will simply make this disappear because, with 3555 open issues, the pandas dev team might not have time to look into this...

@juliettelavoie
Contributor Author

Yikes! Thanks for pointing me to the issue. Should we pin the requirements to 1.5.2 until this is fixed?

@aulemahal
Collaborator

I guess so... Or maybe "!=1.5.3"? From my quick tests, it really looks like a problem only in conda.
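
Either option can be written as a standard version specifier in the requirements: `pandas==1.5.2` for the pin, or `pandas!=1.5.3` for the exclusion. As a rough sketch of what the exclusion means at runtime (the names `KNOWN_BAD_PANDAS` and `is_pandas_version_supported` are made up for illustration and are not part of xscen):

```python
import pandas as pd

# Versions reported as broken in this thread (assumption: only 1.5.3).
KNOWN_BAD_PANDAS = frozenset({"1.5.3"})


def is_pandas_version_supported(version=None):
    """Return False if the given (or installed) pandas version is known to be broken."""
    if version is None:
        version = pd.__version__
    return version not in KNOWN_BAD_PANDAS


if not is_pandas_version_supported():
    print(f"pandas {pd.__version__} is known to break catalog reading; consider another version.")
```

Excluding a single release keeps later bug-fix versions installable, whereas an exact pin has to be lifted by hand once a fixed pandas is out.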

@juliettelavoie mentioned this issue Mar 6, 2023