Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Suggested additional indexes #207

Closed
pnorman opened this issue Oct 1, 2013 · 19 comments
Closed

Suggested additional indexes #207

pnorman opened this issue Oct 1, 2013 · 19 comments

Comments

@pnorman
Copy link
Collaborator

pnorman commented Oct 1, 2013

#16 suggests non-standard indexes.

What non-standard indexes should we be suggesting for performance reasons?

CREATE INDEX water_areas_idx ON planet_osm_polygon USING gist (way) 
  WHERE (((waterway IS NOT NULL) 
      OR (landuse = ANY (ARRAY['reservoir'::text, 'water'::text, 'basin'::text]))) 
    OR ("natural" IS NOT NULL));
CREATE INDEX ferry_idx ON planet_osm_line USING gist (way) WHERE (route = 'ferry'::text)
@pnorman
Copy link
Collaborator Author

pnorman commented Oct 3, 2013

I did some testing on the partial gist (way) index for waterways, it decreases total time spent rendering by 10% in a cpu-bound set of 1k metas.

Detailed breakdown:
partial water gist

@pnorman
Copy link
Collaborator Author

pnorman commented Oct 4, 2013

I'm going to start a PR adding a SQL file which will add suggested indexes, but I completed some benchmarking to recommend a new index.

I tried this index

CREATE INDEX "planet_osm_polygon_nobuilding_index" 
  ON "planet_osm_polygon" 
  USING gist ("way") 
  WHERE "building" IS NULL;

I proceeded to benchmark with 1000 meta tiles taken from yevaud's log and got a reduction from 504.5 seconds to 453.8 seconds (11.8% speed increase). For reference, the partial index for water areas was 453.9 seconds (11.5% increase). Both indexes combined offer reduce the time to 437.3 seconds (16% increase).

planet_osm_polygon_nobuilding_index is 2.2GB, water_areas_idx is .7GB and planet_osm_polygon_index is 11GB. Sizes may differ on your database from using a different date planet, index bloat, or automatic use of fillfactor=100 on imports with --drop which are static.

It is likely that there are additional queries where a building IS NULL condition can be added, which would further increase speed gains.

I performed some additional analysis by zoom. Both indexes decrease rendering time at all zooms by at least 2%.
indexes1

Edit: Because of the simplicity of this index's WHERE condition and the significant performance gains, I'm going to be including it in my baseline for further testing.

Additional disclaimer: work_mem was its default of 1MB during this testing, which is sub-optimal, but impacted all four sets of data.

@dieterdreist
Copy link

Paul, that's interesting research. Do you know how much additional indexes decrease diff update time and if they degrade after a while when applying diffs?

@pnorman
Copy link
Collaborator Author

pnorman commented Oct 4, 2013

Paul, that's interesting research. Do you know how much additional indexes decrease diff update time

No. They obviously do some amount - all indexes increase insert time, including the standard geometry ones. Unfortunately, the test sequence for this is load the full planet data, build indexes, time applying a large set of diffs, repeat, then do the same for a different set of indexes. I do not expect these would have a significant negative impact on update time, and for a server like the tile.osm.org ones where rendering is most of the load, it should be hugely helpful.

and if they degrade after a while when applying diffs?

All indexes do - it's essential to REINDEX indexes when they bloat, or to do a concurrent rebuild. These should bloat no faster than the other gist indexes. That being said, reindexing is something that I do every 3 months or so on my database, and probably don't even need to do that often. GIN indexes on hstores do bloat significantly faster.

See postgresql index storage parameters for more information.

@jburgess777
Copy link

Did you try the ferry index too?

These are the index commands I used when the DB was reloaded recently on Yevaud. They were first derived a couple of years ago and may no longer match the queries generated by the Carto style but they did seem to reduce the database disk IO.

gis=# CREATE INDEX ferry_idx ON planet_osm_line USING gist (way) 
WHERE (route = 'ferry'::text);
CREATE INDEX
Time: 212686.663 ms

gis=# CREATE INDEX water_lines_idx ON planet_osm_line USING gist (way) 
WHERE (waterway = 'river'::text);
CREATE INDEX
Time: 245527.051 ms

gis=# CREATE INDEX national_park_idx ON planet_osm_polygon USING gist (way) 
WHERE (boundary = 'national_park'::text);
CREATE INDEX
Time: 200265.090 ms

gis=# CREATE INDEX water_areas_idx ON planet_osm_polygon USING gist (way) 
WHERE (((waterway IS NOT NULL) 
OR (landuse = ANY (ARRAY['reservoir'::text, 'water'::text, 'basin'::text]))) 
OR ("natural" IS NOT NULL));
CREATE INDEX
Time: 362130.281 ms

gis=# CREATE INDEX leisure_polygon_idx ON planet_osm_polygon USING gist (way) 
WHERE (((((((((landuse IS NOT NULL) 
OR (leisure IS NOT NULL)) 
OR (aeroway IS NOT NULL)) 
OR (amenity IS NOT NULL)) 
OR (military IS NOT NULL)) 
OR ("natural" IS NOT NULL)) 
OR (power IS NOT NULL)) 
OR (tourism IS NOT NULL)) 
OR (highway IS NOT NULL));
CREATE INDEX
Time: 692723.212 ms

gis=# analyze planet_osm_polygon;
ANALYZE
Time: 3631.508 ms

gis=# ANALYZE planet_osm_line;
ANALYZE
Time: 9027.941 ms
gis=# 

@pnorman
Copy link
Collaborator Author

pnorman commented Oct 4, 2013

Did you try the ferry index too?

Working my way there :)

These are the index commands I used when the DB was reloaded recently on Yevaud. They were first derived a couple of years ago and may no longer match the queries generated by the Carto style but they did seem to reduce the database disk IO

I'm hoping that the building IS NULL index can replace some of these, perhaps with the addition of btree indexes on columns, which have shown good results in some cases.

Can you check the number of scans with SELECT relname, indexrelname, idx_scan, idx_tup_read, idx_tup_fetch FROM pg_stat_all_indexes WHERE schemaname = 'public';?

@jburgess777
Copy link

I did spend a while thinking about using 'building is null' once before. I think I came to the conclusion that it would not be useful on its own.

  • It isn't selective enough to be useful as a partial index. Unless you can get the amount of data selected to less than say 10% then chances are you will end up fetching most pages from disk even if you only read one or two rows from each. This is where proper partitioning would be much more effective at reducing disk IO than a partial index since you don't have to worry about the data being fragmented.
  • The possibility that a building might have other tags which need to be seen by other non-building queries. To solve this the data would would either have to be duplicated into building and non-building tables, or the data would go into the table for data with non-building tags and the building queries would need to fetch from both tables. Either way this probably requires partitioning support which does not exist in osm2pgsql today. I don't know how many buildings fall into this category of having non-building tags which need rendering by other rules, perhaps the number is too small to worry about.
gis=# SELECT relname, indexrelname, idx_scan, idx_tup_read, idx_tup_fetch FROM pg_stat_all_indexes WHERE 
schemaname = 'public';
      relname       |       indexrelname       |  idx_scan  | idx_tup_read  | idx_tup_fetch 
--------------------+--------------------------+------------+---------------+---------------
 spatial_ref_sys    | spatial_ref_sys_pkey     |         58 |         42451 |            34
 planet_osm_nodes   | planet_osm_nodes_pkey    |          0 |             0 |             0
 planet_osm_rels    | planet_osm_rels_parts    |  113021697 |      26921754 |             0
 planet_osm_rels    | planet_osm_rels_idx      |      79139 |       2323268 |       1039537
 planet_osm_rels    | planet_osm_rels_pkey     |    2503925 |       3850694 |       2385537
 planet_osm_ways    | planet_osm_ways_nodes    |   88829336 |      49482411 |             0
 planet_osm_ways    | planet_osm_ways_idx      |      78981 |     139448491 |       9345884
 planet_osm_ways    | planet_osm_ways_pkey     |  478998833 |     483055856 |     470937294
 planet_osm_roads   | planet_osm_roads_pkey    |   26146323 |      11303403 |       2586441
 planet_osm_roads   | planet_osm_roads_index   |   81408599 |   14710768229 |    2301551862
 planet_osm_point   | planet_osm_point_pkey    |   97114726 |       1498889 |       1356590
 planet_osm_point   | planet_osm_point_index   |  780215985 |  293482811310 |   42034817139
 planet_osm_line    | water_lines_idx          |     220329 |      18794281 |       9230207
 planet_osm_line    | ferry_idx                |   19912711 |       8027972 |       6257047
 planet_osm_line    | planet_osm_line_pkey     |   26146330 |      47317105 |      26643291
 planet_osm_line    | planet_osm_line_index    | 1244288415 | 1209981428900 |  211810654512
 planet_osm_polygon | leisure_polygon_idx      |  117129517 |   29672063052 |    7492540844
 planet_osm_polygon | water_areas_idx          |  118773853 |    9762978659 |    2286235650
 planet_osm_polygon | national_park_idx        |   20787338 |       6961639 |       3075306
 planet_osm_polygon | planet_osm_polygon_pkey  |   26146335 |      27158346 |       4420594
 planet_osm_polygon | planet_osm_polygon_index |  830025980 |  879337830585 |  136794125324
(21 rows)

@pnorman
Copy link
Collaborator Author

pnorman commented Oct 5, 2013

I did spend a while thinking about using 'building is null' once before. I think I came to the conclusion that it would not be useful on its own.

Well, I was unsure, that's why I ran the tests, which showed the noticeable improvement.

For reference, the 5 layers with building is null in the SQL are water-areas, water-areas-overlay, glaciers-text, landuse-overlay, and national-park-boundaries. There are probably more queries where it can be added.

@pnorman
Copy link
Collaborator Author

pnorman commented Oct 7, 2013

Indexes ruled out: Anything general on boundary, anything on boundary on the roads table, partial gist on man_made=cutline.

Giving postgres enough work_mem to perform all sorts in memory and to not need to compress bitmaps from bitmap index scans really helps performance and also makes it hard to get additional gains from indexes.

@rny
Copy link

rny commented Sep 8, 2014

Ferry is the top time-consuming SQL for me.

BEFORE

gis=> EXPLAIN ANALYZE SELECT ST_AsBinary("way") AS geom FROM (select way from planet_osm_line where route='ferry' ) as ferry_routes WHERE "way" && ST_SetSRID('BOX3D(13130046.97072148 18765596.19213398,13482268.79705977 19117818.01847227)'::box3d, 900913);
                                                                                                                           QUERY PLAN                                         
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Seq Scan on planet_osm_line  (cost=0.00..5065485.38 rows=95738 width=32) (actual time=70068.252..70068.252 rows=0 loops=1)
   Filter: ((way && '010300002031BF0D0001000000050000007E2610DF270B694115FB12C371E571417E2610DF270B694194A94BA06F3B72417E83819923B7694194A94BA06F3B72417E83819923B7694115FB12C371E571417E2610DF270B694115FB12C371E57141'::geometry) AND (route = 'ferry'::text))
   Rows Removed by Filter: 95737936
 Total runtime: 70068.283 ms
(4 rows)

AFTER

CREATE INDEX ferry_idx ON planet_osm_line USING gist (way)
WHERE (route = 'ferry'::text);


gis=> EXPLAIN ANALYZE SELECT ST_AsBinary("way") AS geom FROM (select way from planet_osm_line where route='ferry' ) as ferry_routes WHERE "way" && ST_SetSRID('BOX3D(13130046.97072148 18765596.19213398,13482268.79705977 19117818.01847227)'::box3d, 900913)
gis-> ;
                                                                                                                              QUERY PLAN                                      
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on planet_osm_line  (cost=544.21..334446.19 rows=95738 width=32) (actual time=0.035..0.035 rows=0 loops=1)
   Recheck Cond: ((way && '010300002031BF0D0001000000050000007E2610DF270B694115FB12C371E571417E2610DF270B694194A94BA06F3B72417E83819923B7694194A94BA06F3B72417E83819923B7694115FB12C371E571417E2610DF270B694115FB12C371E57141'::geometry) AND (route = 'ferry'::text))
   ->  Bitmap Index Scan on ferry_idx  (cost=0.00..520.27 rows=95738 width=0) (actual time=0.030..0.030 rows=0 loops=1)
         Index Cond: (way && '010300002031BF0D0001000000050000007E2610DF270B694115FB12C371E571417E2610DF270B694194A94BA06F3B72417E83819923B7694194A94BA06F3B72417E83819923B7694115FB12C371E571417E2610DF270B694115FB12C371E57141'::geometry)
 Total runtime: 0.124 ms
(5 rows)

@mrwojo
Copy link
Contributor

mrwojo commented Sep 8, 2014

Seq Scan on planet_osm_line

That should be doing Index Scan using planet_osm_line_index. osm2pgsql should've created geometry indexes for each of the tables.

@pnorman
Copy link
Collaborator Author

pnorman commented Sep 8, 2014

That should be doing Index Scan using planet_osm_line_index

It depends entirely on the zoom. But yes, at this zoom, it should be index scanning. If the index exists, you're getting a horrible query plan, either from bad statistics or cost parameters.

@rny
Copy link

rny commented Sep 9, 2014

the index does exist, but I don't why postgres does not use it.

this is the index before I created a ferry one

CREATE INDEX planet_osm_line_index
  ON planet_osm_line
  USING gist
  (way);

I am using ubuntu 14.04, everything from the packages, and imported the whole planet using

osm2pgsql -d gis ~/path/to/data.osm.pbf --style openstreetmap-carto.style

@rny
Copy link

rny commented Sep 10, 2014

I found a similar issue

gis=> EXPLAIN ANALYZE SELECT ST_AsBinary("way") AS geom,"barrier" FROM (select way,barrier from planet_osm_polygon where barrier is not null) as area_barriers WHERE "way" && ST_SetSRID('BOX3D(-19940891.93904749 -6576030.417433815,-19918878.07490134 -6554016.553287672)'::box3d, 900913);
                                                                                                                          QUERY PLAN                                                                        
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Seq Scan on planet_osm_polygon  (cost=0.00..6879888.83 rows=29933854 width=64) (actual time=52122.745..52122.745 rows=0 loops=1)
   Filter: ((barrier IS NOT NULL) AND (way && '010300002031BF0D000100000005000000A95606BF610473C1523CB79AE71559C1A95606BF610473C1B2106923680059C1BFCB32E101FF72C1B2106923680059C1BFCB32E101FF72C1523CB79AE71559C1A95606BF610473C1523CB79AE71559C1'::geometry))
   Rows Removed by Filter: 150421368
 Total runtime: 52122.775 ms
(4 rows)

gis=>
gis=> set enable_seqscan = off;
SET
gis=> EXPLAIN ANALYZE SELECT ST_AsBinary("way") AS geom,"barrier" FROM (select way,barrier from planet_osm_polygon where barrier is not null) as area_barriers WHERE "way" && ST_SetSRID('BOX3D(-19940891.93904749 -6576030.417433815,-19918878.07490134 -6554016.553287672)'::box3d, 900913);
                                                                                                                 QUERY PLAN                                                                                 
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on planet_osm_polygon  (cost=1534352.07..6910027.15 rows=29933854 width=64) (actual time=0.070..0.070 rows=0 loops=1)
   Recheck Cond: (way && '010300002031BF0D000100000005000000A95606BF610473C1523CB79AE71559C1A95606BF610473C1B2106923680059C1BFCB32E101FF72C1B2106923680059C1BFCB32E101FF72C1523CB79AE71559C1A95606BF610473C1523CB79AE71559C1'::geometry)
   Filter: (barrier IS NOT NULL)
   Rows Removed by Filter: 1
   ->  Bitmap Index Scan on planet_osm_polygon_index  (cost=0.00..1526868.61 rows=30084275 width=0) (actual time=0.062..0.062 rows=1 loops=1)
         Index Cond: (way && '010300002031BF0D000100000005000000A95606BF610473C1523CB79AE71559C1A95606BF610473C1B2106923680059C1BFCB32E101FF72C1B2106923680059C1BFCB32E101FF72C1523CB79AE71559C1A95606BF610473C1523CB79AE71559C1'::geometry)
 Total runtime: 0.610 ms
(7 rows)

@rny
Copy link

rny commented Sep 10, 2014

Finally found the reason:

the problem is index not used by query planner, and query planner need table statistics to work, but when autovacuum is off, ANALYZE is NOT run automatically. this is on by default so most people not encounter this problem.

so the solution for slow rendering is:

do a manual ANALYZE

ANALYZE;

everything works.

@pnorman
Copy link
Collaborator Author

pnorman commented Sep 10, 2014

As I indicated above, these don't appear to be issues with the stylesheet, but with your database statistics, or cost parameters in postgresql.conf.

@pnorman
Copy link
Collaborator Author

pnorman commented Aug 17, 2015

For #1736 I loaded data into my testing server, and created some partial indexes, to see what is tested.

The indexes created, paramaterized for furry-sanza are

CREATE INDEX %prefix%_polygon_area_233 ON %prefix%_polygon USING gist (way) WHERE way_area > 233;
CREATE INDEX %prefix%_polygon_area_239000 ON %prefix%_polygon USING gist (way) WHERE way_area > 239000;

CREATE INDEX %prefix%_polygon_name ON %prefix%_polygon USING gist (way) WHERE name IS NOT NULL;
CREATE INDEX %prefix%_polygon_nobuilding ON %prefix%_polygon USING gist (way) WHERE building IS NULL;

CREATE INDEX %prefix%_polygon_military ON %prefix%_polygon USING gist (way) WHERE landuse = 'military';


CREATE INDEX %prefix%_line_ferry ON %prefix%_line USING gist (way) WHERE route = 'ferry';
CREATE INDEX %prefix%_line_river ON %prefix%_line USING gist (way) WHERE waterway = 'river';
CREATE INDEX %prefix%_line_ref ON %prefix%_line USING gist (way) WHERE ref IS NOT NULL;
CREATE INDEX %prefix%_line_name ON %prefix%_line USING gist (way) WHERE name IS NOT NULL;

CREATE INDEX %prefix%_roads_admin_low ON %prefix%_roads USING gist (way) WHERE boundary = 'administrative' AND admin_level IN ('0', '1', '2', '3', '4');
CREATE INDEX %prefix%_roads_admin ON %prefix%_roads USING gist (way) WHERE boundary = 'administrative';

CREATE INDEX %prefix%_roads_road_ref ON %prefix%_roads USING gist (way) WHERE highway IS NOT NULL AND ref IS NOT NULL;

CREATE INDEX %prefix%_point_place_low ON %prefix%_point USING gist (way) WHERE place IN ('city', 'town') AND name IS NOT NULL;
CREATE INDEX %prefix%_point_place ON %prefix%_point USING gist (way) WHERE place IS NOT NULL AND name IS NOT NULL;

CREATE INDEX %prefix%_point_name ON %prefix%_point USING gist (way) WHERE name IS NOT NULL;

@pnorman
Copy link
Collaborator Author

pnorman commented Aug 17, 2015

I pre-rendered in two stages: z0-z6 worldwide and z7-z12 in the US northeast. I captured the index usage stats, took differences, and got the index stats for each stage.

z0-z6

      relname       |         indexrelname          | idx_scan | idx_tup_read | idx_tup_fetch
--------------------+-------------------------------+----------+--------------+---------------
 planet_osm_line    | planet_osm_line_ferry         |       15 |            4 |             4
 planet_osm_line    | planet_osm_line_index         |      145 |       183222 |             0
 planet_osm_line    | planet_osm_line_name          |       14 |         9266 |            92
 planet_osm_line    | planet_osm_line_ref           |        0 |            0 |             0
 planet_osm_line    | planet_osm_line_river         |        6 |           44 |            44
 planet_osm_point   | planet_osm_point_index        |       35 |        14417 |             0
 planet_osm_point   | planet_osm_point_name         |       14 |         1483 |           107
 planet_osm_point   | planet_osm_point_place        |       18 |          144 |           144
 planet_osm_point   | planet_osm_point_place_low    |      621 |        28712 |          2675
 planet_osm_polygon | planet_osm_polygon_area_230   |        4 |         3578 |             0
 planet_osm_polygon | planet_osm_polygon_area_3700  |        0 |            0 |             0
 planet_osm_polygon | planet_osm_polygon_area_59000 |      768 |      1489986 |         20133
 planet_osm_polygon | planet_osm_polygon_index      |       70 |        20010 |           744
 planet_osm_polygon | planet_osm_polygon_military   |       14 |            3 |             3
 planet_osm_polygon | planet_osm_polygon_name       |      371 |      2179021 |         14042
 planet_osm_polygon | planet_osm_polygon_nobuilding |       77 |        30981 |           720
 planet_osm_roads   | planet_osm_roads_admin        |       12 |          187 |            30
 planet_osm_roads   | planet_osm_roads_admin_low    |      339 |        20830 |          4474
 planet_osm_roads   | planet_osm_roads_index        |      320 |      1437295 |          7439
 planet_osm_roads   | planet_osm_roads_road_ref     |       14 |         1113 |             4
 spatial_ref_sys    | spatial_ref_sys_pkey          |        0 |            0 |             0

z7-z12

      relname       |         indexrelname          | idx_scan | idx_tup_read | idx_tup_fetch
--------------------+-------------------------------+----------+--------------+---------------
 planet_osm_line    | planet_osm_line_ferry         |      644 |          859 |           373
 planet_osm_line    | planet_osm_line_index         |     6422 |     73209807 |             0
 planet_osm_line    | planet_osm_line_name          |      622 |      3389041 |          2752
 planet_osm_line    | planet_osm_line_ref           |        0 |            0 |             0
 planet_osm_line    | planet_osm_line_river         |      182 |        15413 |          6282
 planet_osm_point   | planet_osm_point_index        |     1665 |      5437437 |           294
 planet_osm_point   | planet_osm_point_name         |      622 |       541007 |           642
 planet_osm_point   | planet_osm_point_place        |      916 |        59600 |         34086
 planet_osm_point   | planet_osm_point_place_low    |     1922 |        13964 |          5187
 planet_osm_polygon | planet_osm_polygon_area_230   |       74 |      2512684 |             0
 planet_osm_polygon | planet_osm_polygon_area_3700  |       56 |      4653445 |             0
 planet_osm_polygon | planet_osm_polygon_area_59000 |        0 |            0 |             0
 planet_osm_polygon | planet_osm_polygon_index      |     3330 |     33611972 |         38704
 planet_osm_polygon | planet_osm_polygon_military   |      622 |          548 |           331
 planet_osm_polygon | planet_osm_polygon_name       |     1288 |      1543724 |         86461
 planet_osm_polygon | planet_osm_polygon_nobuilding |     3194 |      7345258 |        112205
 planet_osm_roads   | planet_osm_roads_admin        |      585 |        41707 |          2217
 planet_osm_roads   | planet_osm_roads_admin_low    |       59 |         3694 |          1048
 planet_osm_roads   | planet_osm_roads_index        |       22 |       933452 |             0
 planet_osm_roads   | planet_osm_roads_road_ref     |      622 |       294928 |           530
 spatial_ref_sys    | spatial_ref_sys_pkey          |        0 |            0 |             0

@pnorman
Copy link
Collaborator Author

pnorman commented Nov 22, 2015

Updated polygon way area recommendations. It turns out that you need a bigger difference between index size than I realized for it to matter, except for cache contention.

233 is 0.01*0.9999 of a z10 pixel, and z5 for 239000. For the planet I had, the indexes are 10GB and 494 MB. The non-partial index is 20GB.

pnorman added a commit to pnorman/openstreetmap-carto that referenced this issue Mar 30, 2016
These custom indexes should speed up rendering, particularly at low
and middle zooms.

Fixes gravitystorm#207
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

6 participants