Support joining external attributes to geometries directly #4261

Closed
stevage opened this issue Feb 13, 2017 · 13 comments

@stevage
Contributor

stevage commented Feb 13, 2017

Currently there are only two ways to join geometry from one tileset with an external dataset:

  1. Merge it offline and upload to Mapbox
  2. Compute all the visualisation properties directly for each row, as per the "Join local JSON data" example.

The first is inefficient, difficult to manage, and removes the possibility of using third-party geometries.

The second is cumbersome, forgoes all of the data vis capabilities of mapbox-gl-js (i.e., you have to use "category" with a pre-calculated value instead of interpolating), and is slower.

Given that this is a very common use case (eg, visualising any kind of census data, per city/county/state/whatever), how about supporting external datasets directly?

It could be done fairly elegantly in the style spec:

"states": {
  "type": "vector",
  "url": "...",
  "attributes": {
    "population": {
      "data": [ { "shortname": "VIC", "count": 5000000 }, { "shortname": "NSW", "count": 6000000 } ...],
      "data-join-field": "shortname",
      "geometry-join-field": "id"
    }, ...
  }
}

Then you'd access the population value as .properties.population_shortname, for instance.
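
As a rough illustration of the join semantics proposed above, here is a minimal sketch in plain JavaScript. `joinAttributes` is a hypothetical helper, not part of mapbox-gl-js or the style spec. It assumes the geometry join field lives in each feature's `properties`, and that joined columns are named `<attribute>_<column>`, following the `population_shortname` pattern.

```javascript
// Hypothetical helper, NOT a mapbox-gl-js API: joins attribute rows to
// features and returns new features whose `properties` also contain the
// joined columns, prefixed with the attribute name.
function joinAttributes(features, attributeName, config) {
  const dataField = config['data-join-field'];
  const geometryField = config['geometry-join-field'];

  // Index the rows by join key so each feature lookup is O(1).
  const rowsByKey = new Map();
  for (const row of config.data) {
    rowsByKey.set(row[dataField], row);
  }

  return features.map((feature) => {
    // Assumption: the geometry join field is a feature property.
    const row = rowsByKey.get(feature.properties[geometryField]);
    if (!row) return feature; // no matching row: leave the feature untouched

    const joined = {};
    for (const [column, value] of Object.entries(row)) {
      joined[`${attributeName}_${column}`] = value;
    }
    return { ...feature, properties: { ...feature.properties, ...joined } };
  });
}
```

Called with the `population` block from the proposal, a feature whose `id` property is `"VIC"` would gain `population_shortname` and `population_count` properties.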

@anandthakker
Contributor

@stevage thanks for this suggestion! I agree that this would be a really useful feature, and I think the most promising route for implementation is alongside the custom source type project. A key goal of that project is to introduce more flexibility into the Source API to allow for plugin-like extensibility for data sources. I see this as a principal use case for that API.

@andrewharvey
Collaborator

Previously requested in #2671

@stevage
Contributor Author

stevage commented Feb 14, 2017

Ah yep - good discussion there.

@anandthakker Custom source types look promising. It's a bit hard to tell from the various issues what the expected end state is at the moment, but I'll keep an eye on it.

@lucaswoj
Contributor

We are hesitant to provide data join functionality in the style spec because

  • there is little performance advantage to joining data within GL JS rather than external to GL JS (during source generation or by building a geojson source at runtime)
  • users are likely to demand increasingly more powerful data join operations, which will drastically complicate our style specification; these users would likely be happier with the ability to write native code

For these reasons I am closing this ticket in favor of the custom source types concept. We do not currently have anybody working on its implementation. The closest thing we have to an interface proposal is #3186.

@stevage
Contributor Author

stevage commented Feb 15, 2017

> there is little performance advantage to joining data within GL JS rather than external to GL JS (during source generation or by building a geojson source at runtime)

Joining during source generation is OK if the same person manages both the geometries and the attributes. But it's common for geometries to be provided by a government body such as the Australian Bureau of Statistics, which updates them only every couple of years, and for the attributes to come from a wide variety of sources. If the person with the attributes never has to touch any spatial format or deal with vector tiles directly, it greatly increases the range of applications of this kind of visualisation.

As for building a GeoJSON at runtime, that's prohibitively expensive in many cases. For instance, Australia's SA4 boundaries (that is, the most coarse-grained boundaries, with 100,000 to 500,000 people each) come to a 98MB GeoJSON file.

Various use cases are simply not possible at present:

  • attributes that change frequently (because the tiles would have to be constantly regenerated and uploaded)
  • selecting from large numbers of attributes (because the vector tiles would become too heavy)

So, basically I'm arguing on the basis of flexibility, not performance. :)

> users are likely to demand increasingly more powerful data join operations

Hmm, what kind of "increasingly more powerful data join operations" are you envisaging? Joining a table of attributes to a set of polygons is extremely common, but I can't off the top of my head think what the next step of this slippery slope would be.

@lucaswoj
Contributor

> As for building a GeoJSON at runtime, that's prohibitively expensive in many cases.

So is loading a large dataset of joined values!

> but I can't off the top of my head think what the next step of this slippery slope would be.

  • I want to downcase / strip whitespace from the joined values
  • I want to downcase / strip whitespace from the existing values
  • I only want to load joined values for tiles in the viewport
  • I want to add/multiply/concat/... the joined value to the existing value

Having a way to programmatically create/manipulate data is going to be more generally useful and future-proof than addressing this one use case with a particular feature.

@stevage
Contributor Author

stevage commented Feb 24, 2017

> I want to downcase / strip whitespace from the joined values
> I want to downcase / strip whitespace from the existing values
> I want to add/multiply/concat/... the joined value to the existing value

Those all fit within a category of data munging that can be carried out before the geometry join. I don't see any need to ever support those. I'm assuming that:

  • geometry data is already hosted in Mapbox (perhaps by a third party) and is hard to manipulate
  • non-geometry data is loaded into the browser by the user, and is easy to manipulate
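
A minimal sketch of that kind of pre-join munging, assuming the attribute rows are plain objects already loaded in the browser. `normaliseJoinKeys` is a hypothetical helper, and the default normaliser (trim, then lower-case) is only an example; in practice it would need to match whatever key format the hosted geometry uses.

```javascript
// Hypothetical helper: cleans up the join key on each attribute row before
// the rows are joined to geometry. The input rows are not mutated.
function normaliseJoinKeys(
  rows,
  joinField,
  normalise = (key) => String(key).trim().toLowerCase()
) {
  return rows.map((row) => ({
    ...row,
    [joinField]: normalise(row[joinField]),
  }));
}
```

Passing a different `normalise` function (for example one that upper-cases) keeps the same shape while targeting a different key convention.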

> I only want to load joined values for tiles in the viewport

Sounds like an optimisation that the API might want to make at some point. I don't know enough about the internals of GLJS to comment really.

So...so far no scary steps down the slippery slope :)

@andrewharvey
Collaborator

If your tileset has a lot of features (and potentially a large number of attributes), your browser won't hold them all in a single object; rather, you'd have an API to get these attributes for a list of IDs. So the data join would need to support this somehow, perhaps by being asynchronous so you can make that AJAX call to get the extra attribute values for the feature IDs in a tile. If it's per tile, is it up to the client to cache them, or does GL JS handle that? These are all things which, in my view, would need to be considered if joining attributes to a vector source is supported.
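
One way the client-side caching question could play out is sketched below. This is an illustration only: `createAttributeCache` is hypothetical, and the request to the attribute API is left to the caller, since no such API is defined in this thread. The cache simply tracks which feature IDs have already been fetched so that a newly loaded tile only requests the ones still missing.

```javascript
// Hypothetical sketch: a client-side cache of joined attributes keyed by
// feature id. Fetching is deliberately left out; the caller asks which ids
// are missing, requests them however it likes, then stores the result.
function createAttributeCache() {
  const cache = new Map();
  return {
    // Ids from a tile that have not been fetched yet.
    missing(featureIds) {
      return featureIds.filter((id) => !cache.has(id));
    },
    // Store rows returned by the (assumed) attribute API.
    store(rows, idField) {
      for (const row of rows) {
        cache.set(row[idField], row);
      }
    },
    // Look up the cached row for one feature, or undefined.
    get(id) {
      return cache.get(id);
    },
  };
}
```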

@mountainMath

I am very interested in this scenario. The custom source type project looks like it would perfectly fit my bill. Is that project still ongoing? Are there any example implementations of a custom source type that I could look at and try to adapt for my purposes? Ideally it would be a custom source that consumes GeoJSON tiles, which would make it relatively easy for me to adapt.

@anandthakker
Contributor

@mountainMath the project is on the roadmap, although not (yet!) under active development.

@pestrov

pestrov commented Apr 10, 2019

Hi everyone!

I've tried hard to find out the current state of efficient ways to join vector tile geometry with an external data source by a field.
Has anything changed, is anything planned to change, or is match still the best way to do that?

Thank you!
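
For readers unfamiliar with the `match` approach mentioned here, a minimal sketch follows. `buildMatchExpression` is a hypothetical helper and the option names are assumptions; only the `match` and `get` expression operators come from the Mapbox style specification. The idea is to turn the external rows into one large `match` expression keyed on a feature property.

```javascript
// Hypothetical helper: builds a style-spec `match` expression that maps a
// feature property to a per-row output value (e.g. a colour).
function buildMatchExpression(rows, options) {
  const { featureProperty, dataJoinField, valueFor, fallback } = options;

  // `match` needs at least one label/output pair, so with no rows we
  // return the fallback value on its own.
  if (rows.length === 0) return fallback;

  const expression = ['match', ['get', featureProperty]];
  const seen = new Set(); // `match` labels must be unique
  for (const row of rows) {
    const key = row[dataJoinField];
    if (seen.has(key)) continue;
    seen.add(key);
    expression.push(key, valueFor(row));
  }
  expression.push(fallback); // output for features with no matching row
  return expression;
}
```

The result could then be passed to something like `map.setPaintProperty(layerId, 'fill-color', expression)`, where `layerId` is whatever fill layer draws the joined geometry. The expression grows with the number of rows, which is part of why this approach is described as cumbersome earlier in the thread.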

@asheemmamoowala
Contributor

@pestrov The only other update to this is to use Map#setFeatureState in conjunction with feature-state expressions. This approach requires unique feature ids on every feature in the vector tile source-layer that needs to be joined to.
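
A minimal sketch of that approach, under stated assumptions: `joinViaFeatureState` is a hypothetical helper, the `count` column and the colour stops are made up for illustration, and each row's join field is assumed to hold the same id that the matching feature carries in the tile. `Map#setFeatureState` and the `feature-state`, `coalesce` and `interpolate` expressions are the actual GL JS pieces being exercised.

```javascript
// Hypothetical helper: copies every non-key column of each row into the
// feature state of the feature whose id equals the row's join field.
function joinViaFeatureState(map, rows, options) {
  const { source, sourceLayer, dataJoinField } = options;
  for (const row of rows) {
    // Split the row into the feature id and the remaining columns.
    const { [dataJoinField]: id, ...state } = row;
    map.setFeatureState({ source, sourceLayer, id }, state);
  }
}

// Example paint value that reads the joined `count` back out of feature
// state and interpolates a colour from it; features without state get 0.
const populationColor = [
  'interpolate', ['linear'],
  ['coalesce', ['feature-state', 'count'], 0],
  0, '#ffffcc',
  10000000, '#800026',
];
```

Because the value is read through an expression, interpolation works directly on the joined number, which addresses the "pre-calculated category" limitation raised at the top of the thread.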
