Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(transform): task to apply reduction ops to a container #241

Merged
merged 7 commits into from
Jun 28, 2023

Conversation

ljgray
Copy link
Contributor

@ljgray ljgray commented Apr 27, 2023

Apply a reduction op across up to n-1 axes of a rank-n dataset. The reduction can be applied to more than one dataset in the container, and axis downselection can be applied to n-1 axes as well. A weight mask can optionally be generated, which masks values where the weights are zero and is broadcast to the dataset shape if possible. This mask can also be applied before doing the reduction, using a masked array.

The task tries to optimize redistribution by first trying to minimize the number of redistributions that happen and then trying to choose the best possible axes to redistribute over if needed. This is just being done by an inverse sort by axis length.

I've tested this a fair bit, but I wouldn't be surprised if there are still some edge cases that could break. It generally works as expected when being used to generate variance-over-freq and variance-over-el datasets in the chime daily pipeline.

Requires radiocosmology/caput#242

Copy link
Contributor

@sjforeman sjforeman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see any obvious flaws in this implementation (modulo a few points where I got confused and wrote comments), but the logic is sufficiently complicated that it would be good to have someone else take a look

draco/analysis/transform.py Outdated Show resolved Hide resolved
draco/analysis/transform.py Outdated Show resolved Hide resolved
draco/analysis/transform.py Outdated Show resolved Hide resolved
draco/analysis/transform.py Outdated Show resolved Hide resolved
Copy link
Contributor

@jrs65 jrs65 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm going to echo Simon's comments here. This is nice, and I think could be broadly useful, but I fear it might be too smart for its own good. The actual task we want to do (variance over freq, variance over elevation) could be implemented in probably < 20 lines. This generic version is 300, and might not be flexible enough for what we want.

One of my fears is that it's pretty hard to specify a sensible generic reduction of the data when you would like to be able to use different weightings (unweighted, masked, and weighted) and have things turn out sensibly. Some of the common reductions have fairly sensible definitions (e.g. mean, variance), but others don't, i.e. how do you do a weighted prod, or a weighted min?

I'm not sure how exactly to go about simplifying this. One thing I might suggest is defining a common API, like:

def reduction(data: np.ndarray, weight: np.ndarray, axis: int, weighting: Literal["none", "masked", "weighted"]) -> tuple[np.ndarray, np.ndarray]:
    """Perform a reduction for data and weight to return a reduced data and weight."""
    ...

and then having specific implementations for each of a more limited set of operations you actually want to support and be well defined.

Something like:

def reduction_var(data: np.ndarray, weight: np.ndarray, axis: int, weighting: Literal["none", "masked", "weighted"]) -> tuple[np.ndarray, np.ndarray]:
    if weighting == "none":
        v = data.var(axis=axis)
    else:
        if weighting == "masked":
            w = (weight > 0).astype(weight.dtype)
        else:
            w = weight.copy()
        w *= w.sum(axis=axis) / w.shape[axis]
        v = np.var(w**0.5 * v, axis=axis)
    return v, np.ones_like(v)

That above is a little ugly, so maybe there's a better approach, but I imagine you get the vague idea.

draco/analysis/transform.py Outdated Show resolved Hide resolved
@ljgray ljgray force-pushed the ljg/reduce-across-axis branch 5 times, most recently from df01fe1 to d9eaf0f Compare May 2, 2023 20:05
draco/analysis/transform.py Outdated Show resolved Hide resolved
@ljgray ljgray force-pushed the ljg/reduce-across-axis branch 3 times, most recently from 2e64493 to 7da4aed Compare May 2, 2023 23:37
draco/analysis/transform.py Outdated Show resolved Hide resolved
draco/analysis/transform.py Outdated Show resolved Hide resolved
draco/core/containers.py Show resolved Hide resolved
test/test_containers.py Show resolved Hide resolved
@ljgray ljgray force-pushed the ljg/reduce-across-axis branch 7 times, most recently from 9d31e3d to 5eabac1 Compare June 22, 2023 20:06
@ljgray ljgray requested a review from jrs65 June 22, 2023 20:12
@ljgray ljgray force-pushed the ljg/reduce-across-axis branch 4 times, most recently from 7076057 to b145a27 Compare June 22, 2023 23:31
Copy link
Member

@ketiltrout ketiltrout left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this looks okay. It might be worth adding a comment to _make_output_container pointing out the choice of index_map[0] for the value of the rolled-up axis.

@ljgray ljgray merged commit cdbbfb9 into master Jun 28, 2023
4 checks passed
@ljgray ljgray deleted the ljg/reduce-across-axis branch June 28, 2023 22:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants