
Dropping subpixel accuracy for areas #2874

Merged
merged 1 commit into gravitystorm:master on May 17, 2018

Conversation

@kocio-pl (Collaborator) commented Sep 29, 2017

Follow up to #2873.
Related to #1604.

It needs some testing, but it looks like dropping subpixel accuracy from the SQL queries has no visible effect on rendering - or hardly any - while improving performance (at least reducing memory usage).

Affected objects: water, landcovers, buildings and nature reserves.

@kocio-pl (Collaborator Author)

The differences are real, but they are not obvious, which is good.

I made two Poland exports from Kosmtik at z10 (no buildings) - each about 40 MB, so I won't upload them. Then I compared them with ImageMagick (after disabling its policy limits in /etc):

compare -compose src before.png after.png difference.png

Click to see the full-scale result (~0.9 MB): [image: difference]

Then I opened all the files in GIMP as layers and zoomed in where the changes are biggest according to this image. I was able to see the change when toggling layer visibility at 800% zoom, but it doesn't look like an error to me - just a bit less detail:

Before: [image: gimp-before]
After: [image: gimp-after]
Difference: [image: gimp-difference]

@pnorman (Collaborator) commented Sep 30, 2017

If doing this then the custom indexes will need adjusting.

My recollection was that I did see differences with a 1px filter, but that was a long time ago when I tested.

@kocio-pl (Collaborator Author)

If doing this then the custom indexes will need adjusting.

I'm not aware of this problem. I don't see any specific value in indexes.sql/indexes.yml except way_area > 59750 for planet_osm_polygon_way_area_z6 - or do you mean something different?

My recollection was that I did see differences with a 1px filter, but that was a long time ago when I tested.

This query change just means that we don't want very small areas to have any impact on the rendering; it doesn't distort anything. In my opinion, any object about 1 pixel in size is still insignificant for the image as a whole. In some cases it might even be better to hide them, but that is a styling problem rather than a database-filtering one.

@aceman444

What if somebody maps farmland as small polygons representing each parcel? Would it still be rendered with the farmland colour before the patch and be completely missing after the patch?

@kocio-pl (Collaborator Author)

It depends on how small they are.

If they are smaller than ~1 px at zoom level n, they just won't be rendered at that level, but they will be visible at n+1, and z19 should be more than enough for any area, no matter how small.

@kocio-pl (Collaborator Author) commented Sep 30, 2017

@rrzefox Could you also apply this patch on your server?

It's just about being on the safe side, because we already know that:

  • the changes are small and show up only for some minor objects, as tested with water, and they don't break anything (as expected when just filtering out small objects, which we already do in some places)
  • the code says we're getting fewer objects from these queries; we just don't know the exact numbers
  • in the worst case I saw, memory consumption with just water was significantly lower, by 25% or more (~6 GB max instead of 6 GB + 2 GB of swap - possibly more, because I ran out of swap), but that was with Kosmtik, not renderd

So we know that we should end up using fewer resources, we just don't know to what degree. My take is that as long as there are no bad rendering changes (fewer small objects is not bad in itself), it's worth merging this, even if only to save some memory.

@kocio-pl changed the title from "Dropping subpixel accuracy for areas" to "[WIP] Dropping subpixel accuracy for areas" on Sep 30, 2017
@pnorman (Collaborator) commented Sep 30, 2017

If they are smaller than ~1 px at zoom level n, they just won't be rendered at that level, but they will be visible at n+1, and z19 should be more than enough for any area, no matter how small.

No, because a large area mapped with blocks of 0.5 px would disappear completely. This isn't an issue with 0.01px because by the time that happens, roundoff errors will have overwhelmed anything else.

@kocio-pl (Collaborator Author)

That was shorthand for areas "smaller than but close to 1 px". The rest will just become visible at n+2 - and so on. We don't even know how much such small areas change the rendering when there are more of them.

The real question, however, is not "how does Mapnik work with a lot of subpixel areas", but "do you know of examples where this could be a problem"?

@pnorman (Collaborator) commented Sep 30, 2017

I'm not aware of this problem. I don't see any specific value in indexes.sql/indexes.yml except way_area > 59750 for planet_osm_polygon_way_area_z6 - or do you mean something different?

That value is based on current queries.

That was shorthand for areas "smaller than but close to 1 px". The rest will just become visible at n+2 - and so on.

I'm not sure what the relevance of zooming in is.

We don't even know how much such small areas change the rendering when there are more of them.

Yes, we do - unless you get so many areas that there's a roundoff error, it's the same as if they were joined.

@pnorman (Collaborator) left a comment


I'm not convinced that going this far is safe, and would like to see the difference in a region with lots of almost 1px areas before having confidence it's okay.

The way_area partial indexes will need adjusting to match any change.

@kocio-pl (Collaborator Author)

That value is based on current queries.

What should it be after this PR? I will change it in the code.

I'm not sure what the relevance of zooming in is.

It's good to remember that this style doesn't have just one scale, like a map on paper. That means we don't "drop" any data, we just show it later.

@pnorman (Collaborator) commented Sep 30, 2017

What should it be after this PR? I will change it in the code.

It should be about 0.99 of the cutoff threshold at some zoom, rounded down. This makes sure that the partial condition closely matches the queries. As for what zoom, it probably won't be z6 anymore. I looked at index size vs zoom and made a call on when the tradeoffs were best. I'd guess the area from z7-z9 is more appropriate with these changes.
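
A minimal sketch of how that rule could be applied, assuming the 0.01 px cutoff used by the rendering queries and the web mercator pixel-size formula given later in this thread (the zoom level and the alias name are only illustrations, not decisions from this discussion):

-- Hedged sketch: partial-index threshold ~ floor(0.99 * query cutoff at the chosen zoom).
-- Pixel size at zoom z in web mercator metres: 40075342.49 / 256 / 2^z.
-- Example for z8:
SELECT floor(0.99 * 0.01 * power(40075342.49 / 256 / power(2, 8), 2)) AS index_cutoff_z8;  -- about 3701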

It's good to remember that this style doesn't have just one scale, like a map on paper. That means we don't "drop" any data, we just show it later.

Using the example of water, the problem isn't deciding not to render very small lakes. The problem is having gaps appear in rivers where the individual polygons are under the threshold but the overall size is large.

@kocio-pl (Collaborator Author)

OK, so either I need some numbers for the custom indexes, or you can make an additional patch if this PR gets merged - I leave it to you.

Using the example of water, the problem isn't deciding not to render very small lakes. The problem is having gaps appear in rivers where the individual polygons are under the threshold but the overall size is large.

Thanks, now I see. For complex objects made of smaller parts - like rivers - I would rely on relations, not individual water areas. Using any threshold is just guessing (probability of error) and safeguarding (eating resources to correct possible errors). Does it make sense or am I missing something?

@pnorman (Collaborator) commented Oct 1, 2017

Thanks, now I see. For complex objects made of smaller parts - like rivers - I would rely on relations, not individual water areas. Using any threshold is just guessing (probability of error) and safeguarding (eating resources to correct possible errors). Does it make sense or am I missing something?

No, relations are not a solution. Water is a bit different because it has a name, but for forests, someone shouldn't make a relation unless there's an inner hole or some other reason.

@pnorman (Collaborator) commented Oct 1, 2017

OK, so either I need some numbers for the custom indexes, or you can make an additional patch if this PR gets merged - I leave it to you.

What are the area values at different zooms, calculated as above?

@kocio-pl (Collaborator Author) commented Oct 1, 2017

No, relations are not a solution. Water is a bit different because it has a name, but for forests, someone shouldn't make a relation unless there's an inner hole or some other reason.

For me there's a big difference between a river and small lakes/forests/buildings etc. I'm not worried about small lakes or forests - they don't need to be shown at any particular level, because they don't need to be continuous.

A river is different when drawn as water areas - those should be collected in a relation anyway. The same is true for nature reserves - they are also tagged this way when they consist of multiple areas. So I guess we are safe already. Are there any landuse areas or buildings that should be rendered continuously?

What are the area values at different zooms, calculated as above?

I'm not sure how to apply the rule you gave and get the numbers.

@matthijsmelissen (Collaborator)

So in two areas we see significant differences:

  • At this cemetery each plot is tagged as a separate cemetery. Therefore at lower zoom levels the plots/cemeteries fall under 0.01 px and are no longer rendered. To me this is clearly wrong tagging.
  • Here in Ruda every piece of grass is mapped separately. These pieces also fall under the 0.01 px limit at low zoom levels, causing the area to be rendered as residential rather than grass in the new rendering.

To me all these effects are rather small, and they remove a lot of noise (and blurry/mixed colors) from the map. So I'm in favour of merging this as it is now (but I haven't tested it myself on Luxembourg yet).

Note also that if we ever want to move to vector maps, probably we will need to do something similar.

By the way, just for fun, I would also be interested to see how the map changes if we increase the pixel limit even further.

@kocio-pl (Collaborator Author) commented Oct 3, 2017

The cemetery tagging is just overzealous - the cemetery=* key does not require amenity=cemetery.

@kocio-pl (Collaborator Author) commented Oct 3, 2017

By the way, just for fun, I would also be interested to see how the map changes if we increase the pixel limit even further.

Challenge accepted! Poland, z8 (click to view the pictures at full scale):

0.01: [image: 001]
1: [image: 1]
5: [image: 5]
10: [image: 10]
Difference 0.01-10: [image: difference]

@kocio-pl mentioned this pull request on Oct 3, 2017
@rrzefox commented Oct 3, 2017

@rrzefox Could you also apply this patch on your server?

This was applied two days ago and all tiles should have been re-rendered by now.

@matthijsmelissen (Collaborator)

This was applied two days ago and all tiles should have been re-rendered by now.

Great, thanks a lot for your help!

@matthijsmelissen (Collaborator)

This PR makes this forest disappear from z7.

@matthijsmelissen (Collaborator)

@matkoniecz Thanks, moving the cut-off from 1 to 5 pixels clearly does not work. Many forests in Poland seem to have been imported in small blocks, such as this one: https://www.openstreetmap.org/#map=13/52.7647/16.5602

@kocio-pl (Collaborator Author) commented Oct 3, 2017

@rrzefox: I'm very glad of your help with testing the medium and low zoom levels! It's an important tool, since this is risky ground that is hard to explore in our home environments (for database size and performance reasons). It's also a lot of fun for me to discover the visual differences live!

@kocio-pl (Collaborator Author) commented Oct 3, 2017

It probably has nothing to do with pixel accuracy, but do you have any idea why The Cotswolds national park (GB) disappears?

@kocio-pl (Collaborator Author) commented May 9, 2018

I'm not a database guy, but maybe you know how to tune the custom indexes according to what Paul proposed:

It should be about 0.99 of the cutoff threshold at some zoom, rounded down.

@matthijsmelissen (Collaborator)

No, I have no idea.

@pnorman (Collaborator) commented May 9, 2018

I'm not having much luck explaining this, so here are the calculations:

Our current index is designed for a cutoff of 0.01 of a pixel at z6 and below. The web mercator plane is 40075342.49 mercator meters in each direction, so the pixel width and height is 40075342.49/256*2^-z, giving a !pixel_width! and !pixel_height! of 2446.00479 at z6. This makes the planner evaluate the condition way_area > 0.01*!pixel_width!::real*!pixel_height!::real as way_area > 59829.3943.
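
As a sanity check, these numbers can be reproduced directly in PostgreSQL (a minimal sketch using only the constants from this comment; the column aliases are illustrative):

-- Pixel width/height at z6 and the resulting area cutoff:
SELECT 40075342.49 / 256 / power(2, 6) AS pixel_width_z6,                   -- about 2446.00479
       0.01 * power(40075342.49 / 256 / power(2, 6), 2) AS z6_area_cutoff;  -- about 59829.3943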

Next the planner looks at how to run the query. One way is to use the planet_osm_polygon_index index with the way && !bbox! part of the query to find only the geometries that intersect the bounding box; this index is 33 GB. Another is the planet_osm_polygon_way_area_z6 partial index, which can be used the same way, but only if the query's condition implies way_area > 59750. Since anything that matches way_area > 59829.3943 also matches the index's condition, the planner can use it. This index is 1284 MB.

Because one index is roughly 30 times the size of the other, the planner expects the planet_osm_polygon_way_area_z6 index to be faster.
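
A rough sketch of what this looks like in practice, assuming a rendering query of the usual form (the envelope coordinates below are placeholders, not values from this thread):

-- The z6 query condition, with the values substituted in:
--   way && !bbox! AND way_area > 59829.3943
-- Because way_area > 59829.3943 implies way_area > 59750, the partial index
-- planet_osm_polygon_way_area_z6 (WHERE way_area > 59750) is usable, and being much
-- smaller than planet_osm_polygon_index, it is usually preferred. Which index actually
-- gets picked can be checked with EXPLAIN:
EXPLAIN
SELECT way FROM planet_osm_polygon
WHERE way && ST_MakeEnvelope(1000000, 6000000, 2000000, 7000000, 3857)
  AND way_area > 59829.3943;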

The reason we don't make an index for each zoom is threefold. In reverse order of importance, they are:

  • Each index slows down updates. This doesn't matter much for the number of indexes we have, but it means an index which is never used still has costs.
  • Indexes which are used push other stuff out of the RAM cache. If you had a z5 index, it would be 707 MB, and all of its contents would also be in the z6 index.
  • Index sizes need to differ by about an order of magnitude for a performance difference. Performance is more closely related to the depth of the index, which is logarithmic in its size. If you had a 1 GB and a 2 GB index, both usable and everything else the same, the planner wouldn't care which was used because they're so close.

I'll run some queries to get index sizes.

@pnorman (Collaborator) commented May 12, 2018

zoom | pixel area | area (x1 multiplier) | index size (MB) | area (x0.01 multiplier) | index size (MB)
-----|------------|----------------------|-----------------|-------------------------|----------------
5    | 2.4E+07    | 23900000             | 44              | 239000                  | 628
6    | 6.0E+06    | 5980000              | 90              | 59800                   | 1275
7    | 1.5E+06    | 1490000              | 218             | 14900                   | 2312
8    | 3.7E+05    | 373000               | 494             | 3730                    | 3629
9    | 9.3E+04    | 93400                | 1026            | 934                     |
10   | 2.3E+04    | 23300                | 1936            | 233                     |
11   | 5.8E+03    | 5840                 |                 | 58                      |
12   | 1.5E+03    | 1460                 |                 | 14                      |
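
For reference, index sizes like those in the table can be measured by building the candidate partial index and asking PostgreSQL for its on-disk size (a sketch; the index name and the z8 threshold of 373000 are just one example taken from the table):

CREATE INDEX planet_osm_polygon_way_area_z8_test
  ON planet_osm_polygon USING GIST (way)
  WHERE way_area > 373000;
SELECT pg_size_pretty(pg_relation_size('planet_osm_polygon_way_area_z8_test'));

The test index can be dropped again once its size has been recorded.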

@kocio-pl (Collaborator Author)

How should these indexes be applied now? I thought maybe in indexes.yml, but the only area-related number there is this:

CREATE INDEX planet_osm_polygon_way_area_z6
ON planet_osm_polygon USING GIST (way)
WHERE way_area > 59750;

However, 59800 * 0.99 = 59202 (no rounding down needed), so it's not the same as 59750. But maybe this was calculated in a different way, and we should now replace it with 5920200?

Sorry, it's not that you're explaining it wrong, I'm just not familiar with database indexing.

@matthijsmelissen (Collaborator)

However, 59800 * 0.99 = 59202 (no rounding down needed), so it's not the same as 59750. But maybe this was calculated in a different way, and we should now replace it with 5920200?

This was explained by @pnorman above: if we have an index for way_area > 59750, the planner can use it for any query whose condition is at least as strict - at z6 the query evaluates to way_area > 59829, so everything it asks for is already in the index, and each of those ways is then checked one by one against the exact query threshold. The 0.99 factor just keeps the index condition safely below whatever the queries evaluate to.

@kocio-pl (Collaborator Author)

I admit I still don't get all the details, but I'm not too interested in the inner workings of the database. I'd just like to know what number I should put, and where, to align the indexing with the new accuracy.

@kocio-pl (Collaborator Author)

@pnorman Would this change be right, or should it be something else:

WHERE way_area > 5920200

@pnorman (Collaborator) commented May 17, 2018

It might not have been clear, but the values in my table are rounded appropriately, and not the precise values that appear in the query conditions.

@pnorman Would this change be right, or should it be something else:

It depends on what we're doing. I suspect one index for z6 and below and one for z10 and below would be best. But I don't know, and that's the type of consideration that needs to go into selecting a different area cutoff threshold for performance reasons.

@kocio-pl (Collaborator Author)

You have a lot of experience with database matters, especially performance-related ones, and I trust your choices here. Given that indexes for z6 and z10 sound good to you, I believe we can go with that. How should they be implemented in the code?

@pnorman (Collaborator) commented May 17, 2018

Have two indexes, one for z6 with the 5980000 limit, one for z10 with the 23300 limit.

@kocio-pl (Collaborator Author)

Thanks!

Should the code simply look like this?:

 CREATE INDEX planet_osm_polygon_way_area_z6 
   ON planet_osm_polygon USING GIST (way) 
   WHERE way_area > 5980000; 

 CREATE INDEX planet_osm_polygon_way_area_z10 
   ON planet_osm_polygon USING GIST (way) 
   WHERE way_area > 23300; 

@matthijsmelissen (Collaborator)

Yes, that's correct.

@kocio-pl merged commit 41939bc into gravitystorm:master on May 17, 2018
@kocio-pl deleted the subpixel-performance branch on May 17, 2018 22:26
@kocio-pl mentioned this pull request on May 23, 2018
Phyks added a commit to cyclosm/cyclosm-cartocss-style that referenced this pull request Mar 4, 2023
Z9 was quite an anomaly in terms of rendering time. It was taking much longer than Z8 and Z10 during pre-computation.

The dominant layer was landuse_gen0. It was using subpixel rendering for landuse, which is useful at very low zoom but seems to be safely discardable at Z9.

See also gravitystorm/openstreetmap-carto#2874 and gravitystorm/openstreetmap-carto#3458 (comment).

See #644.
Phyks added a commit to cyclosm/cyclosm-cartocss-style that referenced this pull request Mar 4, 2023
Z9 was quite an anomaly in terms of rendering time. It was taking much longer than Z8 and Z10 during pre-computation.

The dominant layer was landuse_gen0. It was using subpixel rendering for landuse, which is useful at very low zoom but seems to be safely discardable at Z9.

Also add checks between way geometries and the bbox to ensure correct usage of the geospatial indexes in the db.

See also gravitystorm/openstreetmap-carto#2874 and gravitystorm/openstreetmap-carto#3458 (comment).

See #644.