Develop Pre-dissolved Basins Layer for Faster Delineations #9

ptomasula · 2024-07-23T14:11:52Z

Summary

During the development of dissolve logic under #7, it was discovered that dissolving modestly large watersheds is resource intensive and slow. In a test on decent hardware, combining ~850 basins took ~12 seconds (this is just the dissolve operation and excludes overhead like loading the file, subsetting, etc.).

One way to mitigate this performance issue would be to develop a layer of pre-dissolved polygons that represent chunks of upstream watershed. When a dissolve operation needs to be performed on a large watershed, each pre-dissolved polygon can be substituted for the n polygons that it represents. This will drastically reduce the total number of polygons in the dissolve of larger watersheds, thereby increasing performance.
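As a rough illustration of the substitution idea, here is a minimal sketch using plain Python sets instead of real geometries. The function name, the group ids, and the `groups` mapping are all hypothetical and are not taken from the repository; the sketch only assumes that each pre-dissolved polygon records which basins it was built from.

```python
def substitute_predissolved(upstream_ids, groups):
    """Swap raw basin ids for pre-dissolved group ids where possible.

    upstream_ids: set of basin ids that make up the watershed.
    groups: dict mapping a (hypothetical) group id to the set of basin ids
        already merged into that pre-dissolved polygon.
    Returns (group ids to use, basin ids that still need dissolving).
    """
    remaining = set(upstream_ids)
    used = []
    # Try larger groups first so each substitution removes as many polygons as possible.
    for group_id, members in sorted(groups.items(), key=lambda kv: -len(kv[1])):
        # A group is usable only if every one of its basins is still needed.
        if members and members <= remaining:
            used.append(group_id)
            remaining -= members
    return used, remaining


upstream = set(range(1, 11))  # a watershed of 10 basins
groups = {"A": {1, 2, 3, 4}, "B": {5, 6}, "C": {9, 10, 11}}
used, remaining = substitute_predissolved(upstream, groups)
print(len(used) + len(remaining), "polygons to dissolve instead of", len(upstream))
```

Group "C" is skipped because it contains basin 11, which lies outside the watershed, so using it would add area that does not belong.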

Closure Criteria

Logic is developed for pre-dissolving polygons into meaningful chunks.
Optimization for chunking has been tested and implemented.
Additional pre-dissolved basin layers have been developed and exported to compressed geoparquet.
Layers have been uploaded to sharing location and are available for development of dissolve logic.

ptomasula · 2024-07-23T17:27:33Z

I performed some initial testing to determine an appropriate upstream subshed threshold for performing a pre-dissolve. The goal is to identify a value (n) such that reaches with n or fewer upstream subsheds are pre-dissolved.

Threshold in this case refers to the count of subshed polygons upstream of a reach; reaches whose count is less than or equal to the threshold are pre-dissolved.

One important note for interpreting the results is that testing was conducted at the root of each watershed. For results with larger thresholds and smaller watersheds, the pre-dissolve logic would have resulted in an entire watershed being pre-dissolved. This makes the results for those scenarios look extremely performant, but the reality is different. Had we tested at just one reach upstream of the root, the pre-dissolve logic would not have been used at all, and the results would be closer to the control scenario.
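One possible reading of the threshold rule can be sketched in plain Python. Everything here is an assumption made for illustration: the function names, the `downstream_of` mapping, the choice to count a reach's own subshed, and the rule that a pre-dissolve node is the most downstream reach whose count still fits under the threshold. It is not the repository's implementation.

```python
def upstream_counts(downstream_of):
    """Count subsheds at or upstream of each reach (own subshed included).

    downstream_of maps reach id -> next reach downstream (None at the root).
    """
    counts = dict.fromkeys(downstream_of, 0)
    for reach in downstream_of:
        node = reach
        # Each reach contributes one subshed to itself and to every reach below it.
        while node is not None:
            counts[node] += 1
            node = downstream_of[node]
    return counts


def predissolve_nodes(downstream_of, threshold):
    """Reaches whose count fits the threshold but whose downstream reach does not."""
    counts = upstream_counts(downstream_of)
    return sorted(
        reach
        for reach, down in downstream_of.items()
        if counts[reach] <= threshold and (down is None or counts[down] > threshold)
    )


# Toy network of 7 reaches: 1 is the root, 2 and 3 drain into 1, and so on.
network = {1: None, 2: 1, 3: 1, 4: 2, 5: 2, 6: 3, 7: 6}
print(predissolve_nodes(network, threshold=3))  # the two tributaries
print(predissolve_nodes(network, threshold=7))  # the root itself
```

With a threshold of 7 the whole toy watershed collapses into a single pre-dissolved polygon at the root, which is the situation described above where timings at the root look better than they would one reach upstream.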

| Threshold | 857 subsheds | 855 subsheds | 689 subsheds | 443 subsheds | 263 subsheds | 159 subsheds | 131 subsheds | 93 subsheds | 57 subsheds | 37 subsheds |
|---|---|---|---|---|---|---|---|---|---|---|
| No pre-dissolve (control) | 11.1982 | 10.7074 | 8.6145 | 4.9172 | 2.8233 | 1.2009 | 1.1691 | 1.0206 | 0.4646 | 0.3533 |
| 5 | 9.3891 | 9.7361 | 7.4810 | 4.6031 | 2.5932 | 1.2613 | 1.2640 | 0.9707 | 0.4871 | 0.3199 |
| 10 | 8.0661 | 8.2562 | 6.5070 | 3.8708 | 2.1506 | 0.9305 | 0.9579 | 0.7909 | 0.4117 | 0.2047 |
| 25 | 5.5936 | 5.4039 | 4.5478 | 2.7599 | 1.3417 | 0.6659 | 0.6535 | 0.4152 | 0.2134 | 0.1485 |
| 50 | 4.2615 | 5.2853 | 3.9101 | 2.6000 | 0.9594 | 0.6750 | 0.5557 | 0.2917 | 0.0716 | 0.0057 |
| 100 | 3.2576 | 4.1618 | 2.2441 | 2.2099 | 0.7508 | 0.4158 | 0.2937 | 0.0043 | 0.0354 | 0.0033 |
| 200 | 2.8408 | 2.7946 | 1.5041 | 2.0656 | 0.5336 | 0.0529 | 0.1380 | 0.0036 | 0.0378 | 0.0062 |
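To make the table easier to compare across thresholds, the run times can be turned into speedup ratios relative to the control. The sketch below copies the control, 100, and 200 rows from the table (units are assumed to be seconds) and flags the watersheds whose subshed count is at or below the threshold, since those would have been pre-dissolved in their entirety. The variable and function names are illustrative only.

```python
# Watershed sizes (subshed counts) and run times copied from the table above.
sizes = [857, 855, 689, 443, 263, 159, 131, 93, 57, 37]
run_times = {
    "control": [11.1982, 10.7074, 8.6145, 4.9172, 2.8233,
                1.2009, 1.1691, 1.0206, 0.4646, 0.3533],
    100: [3.2576, 4.1618, 2.2441, 2.2099, 0.7508,
          0.4158, 0.2937, 0.0043, 0.0354, 0.0033],
    200: [2.8408, 2.7946, 1.5041, 2.0656, 0.5336,
          0.0529, 0.1380, 0.0036, 0.0378, 0.0062],
}


def speedups(threshold):
    """Control run time divided by the run time at the given threshold."""
    return [c / t for c, t in zip(run_times["control"], run_times[threshold])]


def fully_predissolved(threshold):
    """Watershed sizes small enough to be pre-dissolved as a single polygon."""
    return [s for s in sizes if s <= threshold]


for threshold in (100, 200):
    flagged = set(fully_predissolved(threshold))
    for size, ratio in zip(sizes, speedups(threshold)):
        note = " (entire watershed pre-dissolved)" if size in flagged else ""
        print(f"threshold {threshold}, {size} subsheds: {ratio:.1f}x{note}")
```

The flagged rows are the ones where the very large ratios reflect a single polygon lookup rather than a faster dissolve.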

ptomasula · 2024-07-23T17:34:40Z


From these initial results, I think a threshold of around 100 seems like a reasonable setting, though I should also run this again with a 200 threshold. Looking at the control (no pre-dissolve) case, it seems like 263 subsheds gives a reasonable runtime, and 443 starts to get into too-slow territory. I think with anything higher than 200 we run the risk of having reaches upstream of a pre-dissolve node with subshed counts large enough to impact delineation performance.

ptomasula · 2024-07-23T17:43:05Z

I added results for a threshold of 200. The performance gain does seem worth bumping the processing threshold up to 200, noting that the baseline performance for 200 polygons (without pre-dissolve) is reasonable.

aufdenkampe · 2024-07-23T19:31:56Z

@ptomasula, thanks for sharing all these results! They're great to see.
How are you selecting which subsheds to pre-dissolve? Also, I'm wondering how pre-simplifying the pre-dissolved boundaries might further speed things up. My thinking is that fine resolution boundaries are no longer necessary once you move to larger watersheds.
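For reference, boundary simplification is commonly done with the Douglas-Peucker algorithm, which drops vertices that deviate from the simplified line by less than a tolerance. The sketch below is a dependency-free version for an open polyline, written only to show the idea; the function names and the tolerance value are illustrative. In practice a geometry library's own simplify routine (for example, the `simplify` method on Shapely geometries) would likely be used instead.

```python
import math


def _point_segment_distance(p, a, b):
    """Shortest distance from point p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp to its endpoints.
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def simplify_line(points, tolerance):
    """Douglas-Peucker simplification of an open polyline."""
    if len(points) < 3:
        return list(points)
    keep = [False] * len(points)
    keep[0] = keep[-1] = True
    stack = [(0, len(points) - 1)]  # explicit stack avoids deep recursion
    while stack:
        start, end = stack.pop()
        worst, worst_dist = None, tolerance
        for i in range(start + 1, end):
            d = _point_segment_distance(points[i], points[start], points[end])
            if d > worst_dist:
                worst, worst_dist = i, d
        if worst is not None:
            # The farthest vertex exceeds the tolerance: keep it and split there.
            keep[worst] = True
            stack.append((start, worst))
            stack.append((worst, end))
    return [p for p, k in zip(points, keep) if k]


boundary = [(0, 0), (1, 0.01), (2, 0), (3, 1), (4, 0)]
simplified = simplify_line(boundary, tolerance=0.1)
print(len(boundary), "->", len(simplified), "vertices")
```

One caveat: simplifying adjacent polygons independently can open gaps or overlaps along shared boundaries, so simplification would probably need to happen after the pre-dissolve rather than on individual subsheds.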

Related issue #9. This adds a function that leverages the MNSI information to group upstream basins into meaningful groups that can be pre-dissolved. Pre-dissolving will allow for fewer total polygons in the final dissolve.


@ptomasula, I figured out, fixed, and tested the issue with the batch pipeline. The short story is that we used `compute_dissolve_groups()` in the wrong sequence of the workflow. I also added a few other fixes, such as for dtypes and adding ELEMENT_COUNT back to the output fields.

aufdenkampe · 2024-09-11T22:57:51Z

@ptomasula, I figured out, fixed, and tested the issue with the batch pipeline in 3d441b5 (see commit notes).

Try running it through our files on our modeling computer!

ptomasula added the enhancement New feature or request label Jul 23, 2024

ptomasula self-assigned this Jul 23, 2024

aufdenkampe mentioned this issue Jul 29, 2024

Preprocessing pipeline for TDX Hydro Files #8

Open

8 tasks
