Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: Series.map may raise TypeError in Categorical or DatetimeTz #12532

Closed
wants to merge 1 commit into from

Conversation

sinhrks
Copy link
Member

@sinhrks sinhrks commented Mar 5, 2016

Needs to decide what categorical dtype map should return. Even though apply returns object dtype, but returning category is more consistent?

@sinhrks sinhrks added Dtype Conversions Unexpected or buggy dtype conversions Timezones Timezone data dtype Categorical Categorical Data Type labels Mar 5, 2016
@sinhrks sinhrks added this to the 0.18.1 milestone Mar 5, 2016
if needs_i8_conversion(values.dtype):
boxer = i8_boxer(values)
values = lib.map_infer(values, boxer)
not_ndarray = (is_categorical_dtype(self.dtype) or
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use is_internal_type (this gets sparse as well, but I think thats correct here as well), prob not tested

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually should prob rename is_internal_type -> is_extension_type

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, renamed.

@sinhrks sinhrks force-pushed the datetime_map branch 2 times, most recently from 2aa510a to 7b6c178 Compare March 14, 2016 06:49
@sinhrks
Copy link
Member Author

sinhrks commented Mar 15, 2016

Let me confirm the result of map/apply against categorical dtype. I think there are 3 options.

  1. Keep categorical dtype. If number of category is changed, raise error.
  2. Coerce to normal (object) dtype befoe map/apply
  3. Try 1st option, then 2nd option if catches error

and current impl are as below and has inconsistencies. I think option #1 is preferable.

map (option 1)

It raises AttributeError on current master.

s = pd.Series([1, 2, 3], dtype='category')
s.map(lambda x: x)
# 0    1
# 1    2
# 2    3
# dtype: category
# Categories (3, int64): [1, 2, 3]

apply (option 2)

no changes from current master

s.apply(lambda x: x)

@jreback
Copy link
Contributor

jreback commented Mar 15, 2016

yeah I think if you can construct a category with exactly the same dtype (IOW, categories & ordered) then do it. Otherwise just coerce to object.

@sinhrks sinhrks force-pushed the datetime_map branch 3 times, most recently from 5a3f5eb to b3c1caf Compare March 20, 2016 12:00
@sinhrks
Copy link
Member Author

sinhrks commented Mar 20, 2016

@jreback OK, both categorical apply/map are fixed to return category if possible, otherwise coerce to appropriate dtype.

else:
map_f = lib.map_infer
values = _values_from_object(self)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wish we could just make this simpler. IOW the difference between extension and non-extension types is glaring here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, how about moving boxing logic to Block? At a glance, i8_boxer is only used few times in series.py and frame.py.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, that would be better, you would have to dispatch to a new method on block, but it would be much cleaner. Ok with this (minor comment). and can do that later (or can refactor here). lmk.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, let me do it separately because it is likely to need API discussion (#12741).

categories=new_categories,
ordered=self.ordered)
except ValueError:
return np.take(new_categories, self._codes)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldn't this be:

Index(new_categories).take(self._codes)? maybe that's the same

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the result should be the same. I chose numpy logic to omit Index creation, but Index.take is preferable?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

its fine, more of a style things, but yes should be the same

if is_extension_type(self.dtype):
values = self._values
if na_action is not None:
raise NotImplementedError
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm, testing this?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@@ -468,6 +468,24 @@ def take(self, indices, axis=0, allow_fill=True, fill_value=None):
na_value=-1)
return self._create_from_codes(taken)

def map(self, mapper):
"""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe in another PR can consolidate the doc-strings of map into base (when we combine them all there)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left it as it is, because Categorical.map needs some supplementary info. Using Appender doesn't make it very simple.

@sinhrks sinhrks force-pushed the datetime_map branch 2 times, most recently from 9b15940 to 031c8e6 Compare April 8, 2016 21:22
@sinhrks
Copy link
Member Author

sinhrks commented Apr 8, 2016

@jreback Because #12798 relies on this, can we consider to merge this in prior to fixing whole Index.map API ( #12756)?

@jreback
Copy link
Contributor

jreback commented Apr 9, 2016

this looks fine. maybe we can come back and try to simplify logic even more w.r.t. the extension types.

we have lots of if/then's handing around extension/ndarrays. An idea (maybe not trivial), is to have an Array internally which is really an extension type that simply passes thru everything, then we can define all of the methods on this type (and just pass thru everything else), e.g. map, .value_counts etc. This gets into libpandas territory (well it actually makes it simplerI think) as everything would then have a uniform API (from pandas perspective). The implementation can then be different.

But that's for another day.

@kawochen any comments on this code?

@jreback
Copy link
Contributor

jreback commented Apr 17, 2016

this lgtm. pls rebase

@sinhrks
Copy link
Member Author

sinhrks commented Apr 17, 2016

Rebased and now green.

@jreback jreback closed this in 4c84f2d Apr 18, 2016
@jreback
Copy link
Contributor

jreback commented Apr 18, 2016

thanks!

@sinhrks sinhrks deleted the datetime_map branch April 18, 2016 20:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Categorical Categorical Data Type Dtype Conversions Unexpected or buggy dtype conversions Timezones Timezone data dtype
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Pandas datetime64 series no longer has map function when localized
3 participants