Overlays appear to be overly resource heavy #10688

Open
jamesaslett1985 opened this issue May 22, 2020 · 8 comments

@jamesaslett1985

jamesaslett1985 commented May 22, 2020

Description

On a Pi 4 (4GB), when overlays are applied in conjunction with a shader there is noticeable frame skipping/stuttering, despite the FPS readout showing a steady 60 FPS. This has been discussed on the forums, and a few users have indicated that the overlays appear to be overly resource heavy and are the root cause of the issue (as opposed to the shaders). This is with the system resolution set to 1080p and a 1080p overlay image.

Expected behavior

Image should be fluid with no apparent frame dropping/stuttering.

Steps to reproduce the bug

  1. Ensure the Pi 4 resolution is set to 1080p
  2. Boot into RetroPie/EmulationStation
  3. Open the Streets of Rage 2 ROM in either lr-genesis-plus-gx or picodrive
  4. Set any 1080p overlay
  5. Set the zfast-crt shader

Environment information

  • OS: RetroPie 4.6 (md5: 9154d998cba5219ddf23de46d8845f6c) from official website
@hizzlekizzle
Contributor

Do you have threaded video enabled? That option can cause performance drops (usually from shaders, but I guess overlays could potentially cause it, too) to manifest as frame drops/skips while maintaining 60 fps in the frame counter.

@jamesaslett1985
Author

jamesaslett1985 commented May 23, 2020

Just tried disabling threaded video, and the game was literally unplayable, with no audio.

@hizzlekizzle
Contributor

heh, ok, that's fine. So zfast-crt is full speed at 1080p on its own and the overlay is fine at 1080p on its own, but together they slow it down significantly?

@dankcushions
Contributor

davej did a write up on why overlays + shaders can cause slowdown on low-memory-bandwidth devices like Raspberry Pi here: https://retropie.org.uk/forum/topic/11150/720p-or-1080p/15

Some notes on memory usage.

The examples shown are for a 4:3 game displayed on a 1920x1080 screen. The game screen is upscaled to 1440x1080 to keep the aspect ratio the same. Game screens vary in resolution so I've used a rough average.

a) GPU upscales image: GPU reads game image, upscales it and sends it to the display. It can upscale (with linear or nearest filtering) to any supported resolution without extra memory accesses. Relative memory accesses = 1.

b) With overlay: GPU reads game image, combines it with overlay (which it also has to read) and sends it to the display. Needs memory from a) plus size of overlay (overlay is 30 times as big as image from game). Relative memory accesses = 31.

c) Using shader: GPU upscales game image using shader and writes it out to memory. GPU reads upscaled image and sends it to display. Needs memory from a) plus 2 * upscaled image size (upscaled image is 23 times as big as image from game). Relative memory accesses = 47.

d) Shader with overlay: As c) but also has to read the overlay whilst sending image to display. Relative memory accesses = 78.

All Pis have relatively slow memory and the CPU and GPU can end up fighting over access to it. My recommendation is to overclock your memory as fast as it will go whilst still remaining stable.

assuming this is correct, is this some kind of efficiency issue where the entire overlay is being transferred from system memory to the GPU memory every single frame? it would seem like if it is unchanged, it shouldn't need to be constantly transferred, but i have no idea how these things work!
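
To make the arithmetic in those notes easier to follow, here is a minimal Python sketch that rebuilds the four relative figures from the two quoted ratios (overlay about 30x the game image, upscaled image about 23x). The function names are made up for illustration, and the bandwidth helper assumes a 32-bit RGBA overlay read once per frame at 60 fps, which the notes do not state. Summing the quoted ratios gives 77 for case d), one short of the quoted 78; presumably that comes from rounding in the rough averages.

```python
# Relative memory traffic per frame, in units of "one read of the game image".
# Both ratios are the rough averages quoted in the notes, not measurements.
OVERLAY_RATIO = 30   # 1920x1080 overlay vs. the game image
UPSCALED_RATIO = 23  # 1440x1080 upscaled image vs. the game image


def relative_accesses(shader: bool, overlay: bool) -> int:
    """Sum the memory accesses for cases a) to d) in the notes."""
    total = 1  # the GPU always reads the game image once
    if shader:
        # The shader pass writes the upscaled image, then it is read back.
        total += 2 * UPSCALED_RATIO
    if overlay:
        # The overlay is read while composing the final image.
        total += OVERLAY_RATIO
    return total


def overlay_read_bandwidth(width: int = 1920, height: int = 1080,
                           bytes_per_pixel: int = 4, fps: int = 60) -> int:
    """Bytes per second needed to read the overlay once per frame.

    Assumption: 32-bit RGBA pixels and one full read per frame.
    """
    return width * height * bytes_per_pixel * fps


if __name__ == "__main__":
    for shader in (False, True):
        for overlay in (False, True):
            print(f"shader={shader!s:5} overlay={overlay!s:5} "
                  f"relative accesses = {relative_accesses(shader, overlay)}")
    print(f"overlay read traffic: {overlay_read_bandwidth() / 1e6:.0f} MB/s")
```

Under those assumptions the overlay read alone works out to roughly 498 MB/s, which gives a feel for why it matters on a device where the CPU and GPU share slow memory.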

@jamesaslett1985
Author

Thanks guys. Based on those findings it does indeed sound as though there are some inefficiencies that could be addressed. But I personally have no idea how!

@hizzlekizzle
Contributor

I think what he was saying is that the combined performance hit is just part of the deal.

@dankcushions
Contributor

dankcushions commented May 24, 2020

this is interesting: https://stackoverflow.com/a/25498119

seems like if overlays never overlapped the rendered scene, it might be possible to do it faster. however, retroarch overlays can (and often do) overlap, and be partially transparent, etc, so would have to be blended with the rendered scene even if unchanged (unless there was a way to calculate that by looking at the overlay and settings at init).

however, i wonder if that link's alternative approach (glBlitFramebuffer) could be a benefit to GLES 3.0 devices (such as Pi 4)...

(caveat: still no idea what i'm talking about)
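
For what it's worth, the "calculate it at init" idea could be sketched roughly like this in Python. This is purely hypothetical and not RetroArch code: `overlay_covers_viewport`, the `Rect` tuple and the nested-list alpha plane are all invented for illustration, and whether skipping the blend would actually save memory bandwidth on a Pi is not something this shows.

```python
from typing import List, Tuple

# Hypothetical rectangle type: (x, y, width, height) in overlay pixels.
Rect = Tuple[int, int, int, int]


def overlay_covers_viewport(alpha: List[List[int]], viewport: Rect) -> bool:
    """Return True if any overlay pixel inside the viewport is not fully transparent.

    `alpha` is the overlay's alpha channel as rows of 0..255 values
    (0 = fully transparent). If this returns False, the overlay never
    touches the game image, so in principle it would not need blending
    over it every frame.
    """
    x0, y0, width, height = viewport
    for row in alpha[y0:y0 + height]:
        if any(a != 0 for a in row[x0:x0 + width]):
            return True
    return False


if __name__ == "__main__":
    # Tiny 6x4 "bezel": opaque border, transparent 4x2 window in the middle.
    bezel = [
        [255, 255, 255, 255, 255, 255],
        [255, 0, 0, 0, 0, 255],
        [255, 0, 0, 0, 0, 255],
        [255, 255, 255, 255, 255, 255],
    ]
    print(overlay_covers_viewport(bezel, (1, 1, 4, 2)))

    # Add a semi-transparent "glare" pixel inside the window.
    bezel[1][2] = 128
    print(overlay_covers_viewport(bezel, (1, 1, 4, 2)))
```

A check like this would only need to run once when the overlay or viewport settings change, but any overlay with glare or scanline effects over the game area would still need full blending.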

@jamesaslett1985
Copy link
Author

Hmm, interesting. Aren't the screen areas in an overlay essentially just a transparent portion of the .PNG file though? Are we thinking that if we keep it within the transparent area only then it might be faster? Guess that won't work for those shaders that incorporate screen scratches/glare over the transparent area though.
