Introduce clusterProperties option for aggregated cluster properties #7584

mourner · 2018-11-12T18:25:29Z

Closes #2412. An alternative to #7004.

Finally got my hands on exploring this feature, building on an extremely helpful implementation by @redbmk (sorry for not jumping in earlier!). Here I'm proposing a different API, which is very terse and easy to use, although it might look weird at first:

"clusterProperties": {
  "max": ["max", 0, ["get", "scalerank"]],
  "sum": ["+", 0, ["get", "scalerank"]],
  "has_island": ["any", false, ["==", ["get", "featureclass"], "island"]]
}

The syntax is property: [operator, initialExpression, mapExpression]. The array is itself a valid expression, though using it for cluster properties imposes additional limitations.

The initial value is calculated by evaluating initialExpression.
The value of a single point is calculated with mapExpression (which can reference feature properties).
Reduction is basically accumulated = evaluate([operator, accumulated, value]).
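The three rules above can be sketched for one flat group of points under the proposed [operator, initialExpression, mapExpression] form. This is a model, not GL JS code. Plain JavaScript functions stand in for expressions, and the names `operators` and `aggregateClusterProperties` are hypothetical.

```javascript
// Minimal model of the proposed [operator, initialExpression, mapExpression]
// semantics for one flat group of points. Assumption: expressions are stood in
// for by plain JavaScript functions; GL JS evaluates style-spec expressions.
const operators = {
  max: (a, b) => Math.max(a, b),
  '+': (a, b) => a + b,
  any: (a, b) => a || b,
};

function aggregateClusterProperties(points, clusterProperties) {
  const result = {};
  for (const [name, [operator, initial, map]] of Object.entries(clusterProperties)) {
    let accumulated = initial(); // initialExpression
    for (const properties of points) {
      const value = map(properties); // mapExpression, once per point
      accumulated = operators[operator](accumulated, value); // [operator, accumulated, value]
    }
    result[name] = accumulated;
  }
  return result;
}

// Mirrors the clusterProperties example at the top of this PR.
const exampleClusterProperties = {
  max: ['max', () => 0, (p) => p.scalerank],
  sum: ['+', () => 0, (p) => p.scalerank],
  has_island: ['any', () => false, (p) => p.featureclass === 'island'],
};

const points = [
  { scalerank: 2, featureclass: 'island' },
  { scalerank: 5, featureclass: 'country' },
  { scalerank: 3, featureclass: 'country' },
];

const aggregated = aggregateClusterProperties(points, exampleClusterProperties);
// aggregated → { max: 5, sum: 10, has_island: true }
```

Real clustering merges clusters into larger clusters instead of folding a flat list. The result is therefore only independent of merge order when the operator is associative and commutative, as +, max and any are.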

To implement this, I had to introduce an internal-use only ["accumulated"] expression. An alternative would be to expose it in the syntax, making it more verbose but also maybe more self-descriptive:

"max": ["max", ["accumulated", 0], ["get", "scalerank"]]

Regarding Infinity and -Infinity, a point raised in #7004 (they serialize to null because JSON doesn't support Infinity): I think it would make sense to introduce an ["infinity"] expression, similar to how we have ["pi"] and ["e"].
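The JSON limitation mentioned above is easy to reproduce. A JavaScript Infinity written into a style or source object does not survive serialization. The `sourceOptions` object below is a hypothetical example.

```javascript
// JSON has no literal for Infinity or -Infinity, so JSON.stringify writes null.
const sourceOptions = {
  clusterProperties: {
    min: ['min', Infinity, ['get', 'scalerank']], // hypothetical property
  },
};

const json = JSON.stringify(sourceOptions);
// json → '{"clusterProperties":{"min":["min",null,["get","scalerank"]]}}'

const parsed = JSON.parse(json);
// parsed.clusterProperties.min[1] → null, so the intended initial value is lost
```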

Regarding access to ["zoom"] in the accumulating expression: I think this wouldn't be useful because of how map/reduce works. Imagine that out of 10 points, 5 merge into one cluster on z16, and the remaining 5 get merged into that cluster on z15. The accumulated value of such a cluster would depend on multiple zoom values (depending on where it acquired new points), which doesn't make much sense semantically.

Opening up the PR to start a discussion, but it still needs some work before a proper review:

TODO

Proper Flow types
Proper error handling
Validation of cluster property expressions (specific to this use case)
Documentation

Launch Checklist

briefly describe the changes in this PR
write tests for all new functionality
document any changes to public APIs
~~post benchmark scores~~ we don't benchmark clustering
manually test the debug page
tagged @mapbox/studio if this PR includes style spec changes

sheerun · 2018-11-28T20:01:08Z

Looks awesome, I hope you'll have time to push this feature through :)

behuda · 2018-12-04T11:58:03Z

Maybe off topic, but can I change a GeoJSON source's clusterRadius without creating a new GeoJSON source?
I want to change clusterRadius through an input slider.
The problem is that the old GeoJSON source has to be destroyed and the corresponding layers removed.
If only the same GeoJSON source could be updated while retaining its source id,
the error "source cannot be removed since layers are using it" could be avoided.

asheemmamoowala

@mourner - In the event that one of the map-reduce expressions is invalid, it is no longer possible to report that in style validation. I think it would be useful to move the validation from the geojson worker to the style-spec module and make it part of geojson source validation.

asheemmamoowala · 2019-01-04T18:05:45Z

src/source/geojson_worker_source.js

+
+        const initialExpressionParsed = createExpression(initialExpression);
+        const mapExpressionParsed = createExpression(mapExpression);
+        const reduceExpressionParsed = createExpression([operator, ['accumulated'], ['get', key]]);


Do you think the map/reduce expressions need to support the feature-state operator as well? Given that these expressions are evaluated just once, it probably doesn't make sense to support that operator, since feature state can change through interactivity on the map. An additional isStateConstant check would be required.

If we want to support feature-state, it would require re-computing the map/reduce on every change of a feature's state, or at least the clusters affected by a feature. Not sure that there is a need for it right now.

mourner · 2019-01-04

Good point. I think we shouldn't support feature-state, at least initially, because it would add a lot of complexity for a very obscure use case. I'll add the isStateConstant check to the validation.
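The isStateConstant check itself lives in GL JS. The sketch below only illustrates the idea with a hypothetical tree walk over expressions treated as nested JSON arrays. `referencesOperator` and `validateClusterProperties` are made-up names.

```javascript
// Hypothetical helper: does a style-spec expression (nested arrays) call
// `operator` anywhere? Arrays inside ["literal", ...] are data, not calls.
function referencesOperator(expression, operator) {
  if (!Array.isArray(expression) || expression.length === 0) return false;
  if (expression[0] === operator) return true;
  if (expression[0] === 'literal') return false;
  // Recurse into every element, including a custom reduce expression used
  // in place of the operator name. Plain string arguments are not calls.
  return expression.some((child) => referencesOperator(child, operator));
}

// Hypothetical validation rule: cluster property expressions are evaluated
// once during clustering, so they cannot react to feature-state changes.
function validateClusterProperties(clusterProperties) {
  const errors = [];
  for (const [name, expression] of Object.entries(clusterProperties)) {
    if (referencesOperator(expression, 'feature-state')) {
      errors.push(`clusterProperties.${name}: "feature-state" is not supported`);
    }
  }
  return errors;
}
```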

willviles · 2019-06-17

@asheemmamoowala @mourner I'm wondering whether you plan to support feature-state in clusterProperties in the near future.

I've got a use case where I need to highlight the marker cluster for a given feature when a reference to it is selected in a list view outside the map canvas.

This works nicely for non-clustered points, but I need a workaround for this situation:

map.addSource('foos', {
  type: 'geojson',
  data,
  cluster: true,
  clusterMaxZoom: 14,
  clusterRadius: 50,
  clusterProperties: {
    'has_hovered_foo': ['==', ['feature-state', 'hover'], true]
  }
});

map.addLayer({
  id: 'clusters',
  type: 'circle',
  source: 'foos',
  filter: ['has', 'point_count'],
  paint: {
    'circle-color': ['case', ['get', 'has_hovered_foo'], '#000', '#FFF']
  }
});
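One possible workaround, not proposed in this thread and sketched here under assumptions: since cluster properties can't read feature-state, copy the selection into an ordinary feature property and re-upload the data with GeoJSONSource#setData. clusterProperties can then aggregate that property. `withSelection` is a hypothetical helper, and it assumes every feature has a top-level GeoJSON id.

```javascript
// Hypothetical helper: return a copy of a GeoJSON FeatureCollection in which
// only the feature with `selectedId` has properties.selected === true.
// The input collection is not mutated.
function withSelection(featureCollection, selectedId) {
  return {
    ...featureCollection,
    features: featureCollection.features.map((feature) => ({
      ...feature,
      properties: { ...feature.properties, selected: feature.id === selectedId },
    })),
  };
}

// [operator, map_expression] form: a cluster is flagged if any point in it
// is selected.
const clusterProperties = {
  has_selected_foo: ['any', ['==', ['get', 'selected'], true]],
};

// In the app (not run here):
// map.getSource('foos').setData(withSelection(data, selectedId));
```

Each setData call re-clusters the whole source, so this approach is likely only practical for modest datasets or infrequent selection changes.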

asheemmamoowala · 2019-01-04T18:11:31Z

src/style-spec/reference/v8.json

+    "clusterProperties": {
+      "type": "*",
+      "doc": "An object defining custom properties on the generated clusters if clustering is enabled, aggregating values from clustered points. Has the form `{\"property_name\": [operator, initial_expression, map_expression]}`. `operator` is any expression function that accepts at least 2 operands (e.g. `\"+\"` or `\"max\"`) — it accumulates the property value from clusters/points the cluster contains; `initial_expression` evaluates the initial value of the property before accummulating other points/clusters; `map_expression` produces the value of a single point.\n\nExample: `{\"sum\": [\"+\", 0, [\"get\", \"scalerank\"]]}`."
+    },


Do you think it would be useful to add an expression support block for each of the expressions (initial, map, reduce)? Something similar to what the layer properties have:

mapbox-gl-js/src/style-spec/reference/v8.json

Lines 3500 to 3508 in 1208cfc

"expression": {
  "interpolated": true,
  "parameters": [
    "zoom",
    "feature",
    "feature-state"
  ]
},
"property-type": "data-driven"

mourner · 2019-01-04

I'm not sure how to separate those into separate properties. There are two expressions (map, initial) and an operator that accepts them as parameters. Map expression would be just ["feature"], and initial expression would be [].

Given the non-standard syntax, and that the support block would be dictated more by property aggregation semantics than by technical limitations (as it is for many layer properties), perhaps we should leave that to validation and maybe add a clarifying note in the doc string.

redbmk · 2019-01-04T23:07:19Z

src/source/geojson_worker_source.js

+        return properties;
+    };
+    superclusterOptions.map = (pointProperties) => {
+        feature.properties = pointProperties;


Why is this (and the reduce below) using a shared feature and overwriting its properties, instead of defining a new const feature = { properties: pointProperties } here?

mourner · 2019-01-05

The map function and expression evaluation are synchronous, so there are no side effects from using mutable state here (it is reset at the beginning). At the same time, we avoid constantly creating new feature objects, which improves performance and avoids lots of garbage collection passes.

redbmk · 2019-01-06

Ah, OK, that makes sense. I thought it might have something to do with performance optimization.
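The object-reuse pattern discussed in this thread can be sketched in isolation. `createMapFunction` and the evaluator functions are hypothetical stand-ins for the worker code and the compiled expressions.

```javascript
// Sketch of the pattern described above (names are hypothetical).
// One feature object is reused across calls instead of allocating
// { properties } per point. This is only safe because evaluation is
// synchronous and no evaluator keeps a reference to `feature`.
function createMapFunction(evaluators) {
  const feature = { properties: null };
  return (pointProperties) => {
    feature.properties = pointProperties; // overwrite, don't allocate
    const result = {};
    for (const [key, evaluate] of Object.entries(evaluators)) {
      result[key] = evaluate(feature);
    }
    return result;
  };
}

const mapPoint = createMapFunction({
  sum: (feature) => feature.properties.scalerank,
  is_island: (feature) => feature.properties.featureclass === 'island',
});
```

Each call still returns a fresh result object, so only the temporary wrapper is shared.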

redbmk · 2019-01-04T23:29:12Z

src/source/geojson_worker_source.js

+    superclusterOptions.reduce = (accumulated, clusterProperties) => {
+        feature.properties = clusterProperties;
+        for (const key of propertyNames) {
+            globals.accumulated = accumulated[key];


Does this mean a reducer could access the accumulated property if it wanted to, and do something weird like the following?

"total": [["case", [">", ["get", "accumulated"], 100], "too many", ["+", ["get", "accumulated"], ["get", "total"]]], 0, ["get", "total"]]

mourner · 2019-01-05

Currently not, but I think this could be added (check whether operator is a string, and if it isn't, treat it as a reduce expression). It would add quite a bit of complexity to the docs, but also some flexibility. Not sure which way to go here.

redbmk · 2019-01-06

Ah OK, I see: the reduce expression has to be something that fits into [operator, ['accumulated'], ['get', key]]. So that's why max and any work.

👍

redbmk · 2019-01-04T23:46:31Z

Thanks so much for taking this on! I'm looking forward to having native Mapbox support for cluster aggregation! I left a couple of comments regarding some of the code, but overall it looks great and I think this is easier to read than what I had come up with in #7004.

redbmk

Looks good to me! I'm looking forward to seeing this in a release!


mourner · 2019-01-09T17:47:34Z

@asheemmamoowala moved the validation logic to style-spec and added the feature-state check — ready for another review.

@redbmk also added support for custom expressions (referencing ["accumulated"]) in place of operator. The example above won't work because of type mismatches, but I guess you could still do the same with something more verbose.
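The rule discussed in this thread can be sketched as a small helper. A string operator expands into [operator, ['accumulated'], ['get', key]], and anything else is treated as a custom reduce expression. `reduceExpressionFor` is a hypothetical name, not the function in the PR.

```javascript
// Hypothetical helper. A string operator expands to
// [operator, ["accumulated"], ["get", key]], where ["get", key] reads the value
// the merged point or cluster carries under `key`. An array is assumed to be a
// custom reduce expression and is returned unchanged.
function reduceExpressionFor(key, operator) {
  return typeof operator === 'string'
    ? [operator, ['accumulated'], ['get', key]]
    : operator;
}

const builtIn = reduceExpressionFor('sum', '+');
// builtIn → ["+", ["accumulated"], ["get", "sum"]]

const customReduce = ['+', ['accumulated'], ['get', 'sum']];
const custom = reduceExpressionFor('sum', customReduce);
// custom is the same array that was passed in
```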

asheemmamoowala · 2019-01-09T18:14:25Z

docs/components/expression-metadata.js

@@ -199,6 +199,7 @@ for (const name in CompoundExpression.definitions) {
 }

 delete types['error'];
+delete types['accumulated']; // skip documenting `accumulated` since it is internal use only


Should this still be deleted, given that it is now usable as part of a custom reduce expression?

mourner · 2019-01-09

Right, just re-added it.

mourner · 2019-01-15T13:15:20Z

I'm thinking that maybe I made a mistake by requiring an initial value, since in most cases we could make it optional (see mapbox/supercluster#114). If that's merged, we could simplify the minimal syntax to:

"clusterProperties": {
  "max": ["max", ["get", "scalerank"]],
  "sum": ["+", ["get", "scalerank"]],
  "has_island": ["any", ["==", ["get", "featureclass"], "island"]]
}

And make the initial value the third optional argument. We could still make that change if we hurry to get this in before the next GL JS beta.

mourner · 2019-01-15T13:47:19Z

Just realized that it would mean we can define or omit initial values on a per-property basis, which doesn't fit the implementation in Supercluster (which is all or nothing). So our options are to either leave things as is, or ditch initial values completely (which I'm increasingly leaning towards).

redbmk · 2019-01-16T01:21:48Z

I'm inclined to leave initial as an optional param after the map expression. I shared an example in mapbox/supercluster#114 (comment) of using an initial value that wouldn't work by just using the first mapped value. However, I think in most cases you wouldn't need initial, so it makes sense to have it be optional.

mourner · 2019-01-16T17:06:18Z

My motivation behind removing it completely is reducing complexity as much as possible. Given technical details of how Supercluster implements map/reduce, we can either always require the initial argument for each of the specified properties, or remove that argument completely — we can't allow the syntax to accept initial in some of the properties but not others. So given that the need for initial is pretty marginal (which is my take at mapbox/supercluster#114 (comment)), that's why I'm still inclined to remove it. Curious to hear other opinions — tough decision when it's 1-1.
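Here is a hypothetical model of this trade-off, not Supercluster's actual implementation. With an `initial` option, a single function must return starting values for every property at once. Without it, the first point's mapped values seed the accumulator. That is why `initial` cannot be optional for only some properties.

```javascript
// Hypothetical model of the two ways a reducer can start (not Supercluster's
// code). `initial` returns one object covering every property; without it,
// a copy of the first point's mapped properties seeds the accumulator.
function reduceCluster(points, { initial, map, reduce }) {
  const mapped = points.map(map);
  const accumulated = initial ? initial() : { ...mapped[0] };
  for (const properties of initial ? mapped : mapped.slice(1)) {
    reduce(accumulated, properties); // mutates `accumulated` in place
  }
  return accumulated;
}

const options = {
  map: (p) => ({ max: p.value }),
  reduce: (accumulated, p) => { accumulated.max = Math.max(accumulated.max, p.max); },
};
const points = [{ value: -3 }, { value: -1 }];

const withInitial = reduceCluster(points, { ...options, initial: () => ({ max: 0 }) });
const withoutInitial = reduceCluster(points, options);
// withInitial.max → 0, withoutInitial.max → -1
```

The negative values show one case where the two approaches differ: seeding max with 0 masks the real maximum when every value is below zero.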

behuda · 2019-03-03T14:56:06Z

"clusterProperties": { 
    "my_point_count" : [["+", ["accumulated"], 1], 1]
}

When I compare it with point_count, the my_point_count property seems to differ on some clusters.

For more advanced use cases, in place of operator, you can use a custom reduce expression that references a special ["accumulated"] value, e.g.: {"sum": [["+", ["accumulated"], ["get", "sum"]], ["get", "scalerank"]]}

Why ["get", "sum"]? "accumulated" needs more explanation; an analogy with the JavaScript reduce function would be great. Why not:
{"sum": [["+", ["accumulated"], ["get", "scalerank"]], ["get", "scalerank"]]}

Also, how do I calculate the average of a field, say salary?
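Here is a likely explanation for the my_point_count mismatch, sketched as a model rather than GL JS code. The reduce step runs once per merged child, and a child can itself be a cluster that already holds an aggregated value. So ["accumulated"] plays the role of the accumulator in Array.prototype.reduce, and ["get", "sum"] reads the value that the child being merged carries under the same property name. `aggregate` and the tree shape below are hypothetical. The salary property names at the end are examples.

```javascript
// Hypothetical model of hierarchical clustering: a node is either a point
// ({ properties }) or a cluster ({ children }). A cluster's value is seeded
// with its first child's value, and the other children are reduced into it.
function aggregate(node, map, reduce) {
  if (!node.children) return map(node.properties);
  const values = node.children.map((child) => aggregate(child, map, reduce));
  return values.slice(1).reduce(reduce, values[0]);
}

const point = { properties: {} };
const inner = { children: [point, point, point] }; // 3 points
const outer = { children: [point, inner, point] }; // 5 points in total

// Like [["+", ["accumulated"], 1], 1]: adds 1 per merged child,
// so the inner cluster's 3 points only count once.
const naive = aggregate(outer, () => 1, (accumulated) => accumulated + 1);

// Like [["+", ["accumulated"], ["get", "my_point_count"]], 1]: adds the
// count the child already carries, as ["get", "sum"] does in the docs example.
const correct = aggregate(outer, () => 1, (accumulated, child) => accumulated + child);
// naive → 3, correct → 5

// Average salary: aggregate a sum and a count, then divide at styling time.
const salaryClusterProperties = {
  salary_sum: ['+', ['get', 'salary']],
  salary_count: ['+', 1],
};
const averageSalary = ['/', ['get', 'salary_sum'], ['get', 'salary_count']];
```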

Empty2k12 · 2020-02-11T19:59:04Z

Sorry to revive this old thread; I have another interesting reduce problem where the reduce syntax doesn't work as I expected.

Basically, all my markers have a custom property which is a string. Let's say the property is ethnicity and the values all come from the set [Hispanic, White, Black, Asian]. Let's also assume this set is dynamic, so I don't know the ethnicities in advance, but I do know all possible values once I work with the data. I would like to aggregate the ethnicities contained in each cluster.

What I am doing right now, is the following:

clusterProperties: {
    ethnicities: [
        [
            "concat",
            ["accumulated"],
            [
                "case",
                [
                    "!",
                    [
                        "in",
                        ["get", "ethnicity"],
                        ["accumulated"]
                    ]
                ],
                ["concat", "-", ["get", "ethnicity"]],
                ""
            ]
        ],
        ["get", "ethnicity"]
    ]
}

Basically in pseudo code:

concat to "accumulated" (if "ethnicity" is not a substring of "accumulated" use "-" + "ethnicity", else an empty string)

Basically, add all the ethnicities in the cluster into a string, but do not repeat an ethnicity if it is already in the string. I want to use this to generate a fitting icon for the cluster.

However, it appears only the first ethnicity is appended into the string, and for each member of the cluster only the minus is appended, not -${ethnicity}. Additionally, I'd only expect as many minuses as there are ethnicities in the cluster, not one for every cluster member.

How can I aggregate into a single property without repeating values?
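A likely explanation: in the reduce step, ["get", ...] reads the properties of the point or cluster being merged in. Those carry only the mapped cluster properties (here ethnicities), not the original ethnicity field. ["get", "ethnicity"] would then evaluate to null, which matches only "-" being appended after the first value. Since the possible values are known once the data is loaded, one alternative is a boolean flag per category, which cannot repeat. The helper names below are hypothetical, and the sketch uses the [operator, map_expression] form.

```javascript
// Hypothetical helper: one boolean cluster property per known category, e.g.
// has_Hispanic: ["any", ["==", ["get", "ethnicity"], "Hispanic"]].
function categoryClusterProperties(field, categories) {
  const clusterProperties = {};
  for (const category of categories) {
    clusterProperties[`has_${category}`] = ['any', ['==', ['get', field], category]];
  }
  return clusterProperties;
}

// Hypothetical helper: builds a string such as "-Hispanic-Asian" from those
// flags, with each category appearing at most once (e.g. to pick an icon).
function categoryKeyExpression(categories) {
  return ['concat', ...categories.map((category) =>
    ['case', ['get', `has_${category}`], `-${category}`, ''])];
}

// Example data; in practice, collect the categories once the data is loaded.
const features = [
  { properties: { ethnicity: 'Hispanic' } },
  { properties: { ethnicity: 'Asian' } },
  { properties: { ethnicity: 'Hispanic' } },
];
const categories = [...new Set(features.map((f) => f.properties.ethnicity))];
const clusterProperties = categoryClusterProperties('ethnicity', categories);
const iconKey = categoryKeyExpression(categories);
```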

mourner added the under development 🚧 label Nov 12, 2018

mourner mentioned this pull request Nov 12, 2018: Support property aggregation on clustered features #2412 (Closed)

ryanbaumann mentioned this pull request Dec 1, 2018: Can a heatmap represent the value of a property across a map rather than points density? #7218 (Closed)

mourner force-pushed the cluster-map-reduce branch 3 times, most recently from 7c55489 to f1fc585, December 10, 2018 17:23

mourner added 3 commits January 4, 2019 18:12

7624804 introduce clusterProperties option for aggregating cluster properties
276bed8 fix flow; more robust error handling
6032e41 fix unit tests

mourner force-pushed the cluster-map-reduce branch from f1fc585 to 6032e41, January 4, 2019 16:13

mourner added 2 commits January 4, 2019 18:54

52fff4a add docs for clusterProperties
96969e0 add a render test for clusterProperties

mourner requested review from asheemmamoowala and ChrisLoer January 4, 2019 17:18

mourner removed the under development 🚧 label Jan 4, 2019

asheemmamoowala suggested changes Jan 4, 2019

redbmk reviewed Jan 4, 2019

redbmk approved these changes Jan 6, 2019

mourner mentioned this pull request Jan 9, 2019: [WIP] add cluster mapreduce feature to style spec #7004 (Closed)

41ad877 move cluster properties validation logic to style-spec; allow custom reduce fn

mourner requested a review from asheemmamoowala January 9, 2019 17:37

asheemmamoowala reviewed Jan 9, 2019

f7bf955 expose accumulated expression in the docs

asheemmamoowala approved these changes Jan 10, 2019

mourner merged commit 47925a6 into master Jan 10, 2019

mourner deleted the cluster-map-reduce branch January 10, 2019 09:38

mourner mentioned this pull request Jan 17, 2019: Simplify clusterProperties expression #7784 (Merged)

martin-kieliszek mentioned this pull request Feb 17, 2019: Feature Request: Add clusterProperties option for aggregated cluster properties alex3165/react-mapbox-gl#697 (Open)

pozdnyakov mentioned this pull request Mar 6, 2019: Introduce clusterProperties option for aggregated cluster properties mapbox/mapbox-gl-native#14043 (Closed)
