misc: reduce memory usage of performance recalculation script #412

Open
TrueRou opened this issue Feb 23, 2023 · 6 comments
Labels
performance: Improvements to resource usage without changing functionality
Priority 2: A typical task - new functionality, organizational work on our software

Comments

@TrueRou
Contributor

TrueRou commented Feb 23, 2023

Caching every map during recalculation can exhaust memory on a low-end server if there are too many maps (it may eat many gigabytes of RAM). It would be better to make the cache optional, so it can easily be enabled or disabled with a command-line argument (--use-cache or something similar).
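
A minimal sketch of what such an opt-in switch could look like, assuming argparse is used for the script's arguments. The flag name comes from the suggestion above; `parse_args`, `make_map_loader`, and the `load` callable are hypothetical names for illustration, not the script's actual API.

```python
import argparse
from typing import Callable, Dict, List, Optional


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="pp recalculation (sketch)")
    # Hypothetical flag, named after the suggestion in this issue.
    parser.add_argument(
        "--use-cache",
        action="store_true",
        help="keep loaded maps in memory (faster, but uses more RAM)",
    )
    return parser.parse_args(argv)


def make_map_loader(
    load: Callable[[int], bytes],
    use_cache: bool,
) -> Callable[[int], bytes]:
    """Return `load` unchanged, or wrapped in a dict cache if use_cache is set."""
    if not use_cache:
        return load

    cache: Dict[int, bytes] = {}

    def cached_load(map_id: int) -> bytes:
        # Only hit the underlying loader the first time a map is requested.
        if map_id not in cache:
            cache[map_id] = load(map_id)
        return cache[map_id]

    return cached_load
```

With the flag off, every request goes back to the underlying loader, so memory stays flat at the cost of repeated reads.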

@TrueRou TrueRou added the triage label ("This issue or pull request needs sorting.") Feb 23, 2023
@tsunyoku
Contributor

maybe too high-effort a solution, but we could use a cache which keeps the most frequently accessed maps and evicts the others to keep memory usage down
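
For illustration, a rough sketch of a size-bounded cache that evicts the least frequently requested entry. This is a simple least-frequently-used policy, not anything taken from the project; `FrequencyCache` and the `load` callable are hypothetical names.

```python
from collections import Counter
from typing import Callable, Dict, Generic, Hashable, TypeVar

K = TypeVar("K", bound=Hashable)
V = TypeVar("V")


class FrequencyCache(Generic[K, V]):
    """Keep at most `max_entries` values, evicting the least requested one."""

    def __init__(self, load: Callable[[K], V], max_entries: int) -> None:
        if max_entries < 1:
            raise ValueError("max_entries must be at least 1")
        self._load = load
        self._max_entries = max_entries
        self._values: Dict[K, V] = {}
        # Request counts are kept for every key ever seen, cached or not.
        self._hits: Counter = Counter()

    def get(self, key: K) -> V:
        self._hits[key] += 1
        if key in self._values:
            return self._values[key]
        value = self._load(key)
        if len(self._values) >= self._max_entries:
            # Linear scan for the cached key with the fewest requests.
            coldest = min(self._values, key=lambda k: self._hits[k])
            del self._values[coldest]
        self._values[key] = value
        return value
```

The eviction scan is O(n) in the number of cached entries, which is likely fine for a one-off script but is a deliberate simplification.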

@tsunyoku tsunyoku added the performance label ("Improvements to resource usage without changing functionality") Feb 23, 2023
@minisbett
Contributor

maybe instead of recalculating pp in ascending score id order, calculate it per map? so it goes through all maps in the maps table and recalcs all scores for that particular map?
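
A sketch of the per-map idea, assuming the scores arrive already sorted by map_md5 (for example via ORDER BY in the query). `recalc_by_map`, `load_map`, and `calc_pp` are hypothetical placeholders for whatever the script actually uses to read a map and compute pp.

```python
from itertools import groupby
from operator import itemgetter
from typing import Any, Callable, Dict, Iterable, Iterator, Tuple

Score = Dict[str, Any]


def recalc_by_map(
    scores_sorted_by_map: Iterable[Score],
    load_map: Callable[[str], Any],
    calc_pp: Callable[[Any, Score], float],
) -> Iterator[Tuple[int, float]]:
    """Yield (score_id, new_pp), loading each map once for all of its scores.

    The input must be sorted by "map_md5", otherwise groupby will split a
    map's scores into several groups and the map will be loaded repeatedly.
    """
    for map_md5, group in groupby(scores_sorted_by_map, key=itemgetter("map_md5")):
        beatmap = load_map(map_md5)  # one load per map
        for score in group:
            yield score["id"], calc_pp(beatmap, score)
        # `beatmap` is rebound on the next iteration, so only one map
        # needs to be held in memory at a time.
```

Because each map is only needed while its own scores are processed, this approach would not need a cache at all.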

@TrueRou
Contributor Author

TrueRou commented Feb 24, 2023

> maybe instead of recalculating pp on a score id going upwards basis calculate it per map? so it goes through all maps in the maps table and recalcs all scores for that particular map?

I agree with that; it's just like the old way, when we recalculated via command.

@cmyui
Member

cmyui commented Feb 24, 2023

is this query the problem?

bancho.py/tools/recalc.py

Lines 183 to 190 in 701c462

scores = [
    dict(row)
    for row in await ctx.database.fetch_all(
        "SELECT scores.id, scores.mode, scores.mods, scores.acc, nmiss, scores.max_combo, scores.map_md5, scores.pp, maps.id as map_id FROM scores INNER JOIN maps ON scores.map_md5 = maps.md5 "
        "WHERE scores.status = 2 AND scores.mode = :mode ORDER BY scores.pp DESC",
        {"mode": mode},
    )
]

looks like it's only chunked for processing once it's already in memory, which is not particularly useful; it should be pulled from SQL in chunks to keep memory usage down
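
One possible way to pull the rows in chunks is keyset pagination on the score id, sketched below. Several things here are assumptions: it uses the standard-library sqlite3 module as a stand-in for the script's actual database layer, it orders by scores.id rather than scores.pp so the "last seen id" can serve as the page cursor, it assumes ids are positive, and `iter_score_chunks` is a hypothetical name.

```python
import sqlite3
from typing import Any, Dict, Iterator, List


def iter_score_chunks(
    conn: sqlite3.Connection,
    mode: int,
    chunk_size: int = 1000,
) -> Iterator[List[Dict[str, Any]]]:
    """Yield lists of at most `chunk_size` score rows, one query per chunk."""
    last_id = 0  # assumes score ids start above 0
    while True:
        cur = conn.execute(
            "SELECT scores.id AS id, scores.mode, scores.mods, scores.acc, "
            "scores.nmiss, scores.max_combo, scores.map_md5, scores.pp, "
            "maps.id AS map_id "
            "FROM scores INNER JOIN maps ON scores.map_md5 = maps.md5 "
            "WHERE scores.status = 2 AND scores.mode = :mode "
            "AND scores.id > :last_id "
            "ORDER BY scores.id LIMIT :limit",
            {"mode": mode, "last_id": last_id, "limit": chunk_size},
        )
        columns = [col[0] for col in cur.description]
        rows = cur.fetchall()
        if not rows:
            return
        yield [dict(zip(columns, row)) for row in rows]
        # The first selected column is scores.id; resume after the last one.
        last_id = rows[-1][0]
```

Only one chunk of rows is held at a time, so peak memory depends on chunk_size rather than on the total number of scores.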

@cmyui cmyui changed the title misc: optional cache when recalculation misc: reduce memory usage of performance recalculation script Feb 24, 2023
@minisbett
Contributor

I think the problem is that every time a map gets recalculated, the whole map needs to be read by rosu-pp and is therefore loaded into RAM

@TrueRou
Contributor Author

TrueRou commented Feb 25, 2023

> is this query the problem?
>
> bancho.py/tools/recalc.py
>
> Lines 183 to 190 in 701c462
>
> scores = [
>     dict(row)
>     for row in await ctx.database.fetch_all(
>         "SELECT scores.id, scores.mode, scores.mods, scores.acc, nmiss, scores.max_combo, scores.map_md5, scores.pp, maps.id as map_id FROM scores INNER JOIN maps ON scores.map_md5 = maps.md5 "
>         "WHERE scores.status = 2 AND scores.mode = :mode ORDER BY scores.pp DESC",
>         {"mode": mode},
>     )
> ]
>
> looks like it's only chunked for processing once in memory which is not particularly useful, it should be pulled from sql in chunks to keep memory usage down

This won't take too much memory, because millions of scores will only cost about 100 MB. It's worthwhile to spend that memory in exchange for some I/O performance.
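
The 100 MB figure is an estimate from this comment. One way to check it on a given machine is to measure how much memory a list of score dicts actually takes, for example with tracemalloc. The sketch below uses synthetic rows with the same nine columns as the query; `make_fake_scores` and `bytes_per_score` are hypothetical helpers, and real rows may differ in size.

```python
import tracemalloc
from typing import Any, Dict, List


def make_fake_scores(n: int) -> List[Dict[str, Any]]:
    """Build n synthetic rows shaped like the recalc query's result."""
    return [
        {
            "id": i,
            "mode": 0,
            "mods": 0,
            "acc": 98.5 + i % 2,
            "nmiss": 0,
            "max_combo": 500 + i,
            "map_md5": f"{i:032x}",
            "pp": 100.0 + i,
            "map_id": i,
        }
        for i in range(n)
    ]


def bytes_per_score(n: int = 10_000) -> float:
    """Return the average traced allocation per synthetic score row."""
    tracemalloc.start()
    before, _ = tracemalloc.get_traced_memory()
    scores = make_fake_scores(n)
    after, _ = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    assert len(scores) == n  # keep the list alive until after measuring
    return (after - before) / n
```

Multiplying the result by the number of rows the query returns gives a rough figure to compare against the estimate above.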

@NiceAesth NiceAesth added the Priority 2 label ("A typical task - new functionality, organizational work on our software") and removed the triage label ("This issue or pull request needs sorting.") Jul 19, 2023

5 participants