Object detection speed in a high-resolution camera #14

Closed
mmyros opened this issue Apr 15, 2016 · 16 comments

@mmyros
Contributor

mmyros commented Apr 15, 2016

We are getting slow object detection using a 5MP, 2448 x 2048 Point Grey Blackfly camera.
Frameserve is fast, about 22 fps I think, which I set manually since it is the declared capability of the device. MOG slows it down dramatically, and object detection does too, though not as much as MOG. I did not time it, but MOG+detection feels like 1 fps, and bsub+detection seems like 5 fps.

I'd prefer to keep the resolution if possible, we ordered a Point Grey for that reason. I am considering keeping real-time detection on a webcam and saving the video from the Point Grey for further off-line analysis.

Thanks!

@jonnew
Owner

jonnew commented Apr 15, 2016

Here are my thoughts:

  1. 5 MP is very high resolution for real-time image processing without dedicated hardware, difficult and potentially obfuscating code optimizations, or a move to a domain-specific image processing language (e.g. one that compiles to OpenCL or similar). To ease your future pain, I would seriously consider reducing the resolution of the camera unless it's absolutely necessary. That said, it's worth pinpointing exactly what is holding things up and by how much. There are definitely portions of Oat that are very slow compared to others.
  2. With respect to MOG background subtraction: this is definitely a component that will benefit greatly from having GPU support enabled. Are you using Oat with a CUDA-capable GPU?
  3. That said, on my laptop (which is very nice, I will admit), I am seeing 14.8 FPS with MOG and no GPU support on a 5 MP color image. What does your hardware look like? If you are trying to use an old machine for this processing, it's probably not going to cut it. I will try to remember to test this on my GPU-enabled desktop tomorrow to see how good it could be. For a 1 MP image, I saw a 6.2X speedup with the GPU and I expect much better than linear scaling with MP.
  4. More on that last comment: Oat is extremely parallel in nature. Every component is a complete program, each of which can have several threads. For the image processing portions of a detection chain (e.g. oat-frameserve, oat-framefilt, oat-posidet), it's almost worth having 1 virtual core per program. In the case of oat-framefilt mog, that pretty much kept each of my laptop's 4 cores at about 80% load for the duration of the test. If I tried to throw something on top of that, it would be much worse. There are now 12-core CPUs available at a reasonable price and Oat will make full use of them.

@jonnew
Owner

jonnew commented Apr 15, 2016

Also, if you want the ultimate in frame rate @ 5 MP, there are options, but they are not going to be super easy:

  • Dedicated hardware (e.g. implement the whole thing on a monster FPGA)
  • Somehow get the whole tracking chain running on the GPU with minimal communication with the CPU (e.g. an implementation in https://github.com/eholk/harlan). That said, at 5 MP, I think the processing is actually what is holding things up rather than the memcpys.

@jonnew
Owner

jonnew commented Apr 15, 2016

OK, on a custom desktop (Intel Core i7-5820K CPU @ 3.30GHz, GeForce GTX 970 GPU, CUDA 7.5), I get 108 FPS with oat-framefilt mog on a 5 MP image.

However, during the course of testing I uncovered some very strange behavior that is likely related to your problem and that I don't understand yet:

  1. Compiling Oat with optimizations (in release mode) seems to prevent the GPU from working and causes oat-framefilt mog to run on a single thread. This results in about 3 FPS on my desktop...
  2. Even when I can get everything working (in debug mode), the initial frames take orders of magnitude longer to process than the rest and account for 500 msec or so of processing time. I think there is some JIT compilation going on to program the GPU that is causing this. I recall seeing things about this around the OpenCV forums. I think there are workarounds.

The first issue is puzzling and annoying and definitely needs to be fixed. Can you try to reproduce this issue on your end?

@jonnew jonnew added bug and removed enhancement labels Apr 15, 2016
@mmyros
Contributor Author

mmyros commented Apr 15, 2016

To your points in the first comment:

  1. I agree. I have a webcam that works reasonably well, and the current plan is to keep using it and record the 5MP for future off-line work.
  2. On the webcam, MOG works fine without CUDA. On Point Grey, there is no need for MOG if I can't get fast enough object detection.
  3. I am using an admittedly old machine, but I'd prefer sticking with it for the time being, since the project at hand needs to keep my nice machine free of CPU load. I do have a 24-core machine for future work, but I'd prefer to stick with low-end hardware right now. That said, here are my specs: 4-core Core 2 Quad 2.8 GHz, GPU: GF119 [NVS 310]

I tried to compile in debug, but looks like there is a problem with boost:

Linking CXX executable Node_test
CMakeFiles/Node_test.dir/Node_test.cpp.o: In function `boost::interprocess::ipcdetail::semaphore_init(sem_t*, unsigned int)':
/usr/include/boost/interprocess/sync/posix/semaphore_wrapper.hpp:145: undefined reference to `sem_init'
CMakeFiles/Node_test.dir/Node_test.cpp.o: In function `boost::interprocess::ipcdetail::semaphore_destroy(sem_t*)':
/usr/include/boost/interprocess/sync/posix/semaphore_wrapper.hpp:157: undefined reference to `sem_destroy'
CMakeFiles/Node_test.dir/Node_test.cpp.o: In function `boost::interprocess::ipcdetail::semaphore_post(sem_t*)':
/usr/include/boost/interprocess/sync/posix/semaphore_wrapper.hpp:167: undefined reference to `sem_post'
CMakeFiles/Node_test.dir/Node_test.cpp.o: In function `boost::interprocess::ipcdetail::semaphore_wait(sem_t*)':
/usr/include/boost/interprocess/sync/posix/semaphore_wrapper.hpp:176: undefined reference to `sem_wait'
collect2: error: ld returned 1 exit status
test/shmemdf/CMakeFiles/Node_test.dir/build.make:97: recipe for target 'test/shmemdf/Node_test' failed
make[2]: *** [test/shmemdf/Node_test] Error 1
CMakeFiles/Makefile2:946: recipe for target 'test/shmemdf/CMakeFiles/Node_test.dir/all' failed
make[1]: *** [test/shmemdf/CMakeFiles/Node_test.dir/all] Error 2
Makefile:126: recipe for target 'all' failed
make: *** [all] Error 2

I also ran into an issue with RAM and encoding. I have 8 GB, which fills up within about 20 seconds, and then the machine goes into swap. Memory load persists for up to a minute after oat is killed (if I catch it before RAM runs out): the writing threads keep going, then exit and free the memory after a while. This only happens when I am writing video to the hard drive. Would adding memory help with this?

@jonnew
Owner

jonnew commented Apr 19, 2016

  • Webcam: cool. Have you gotten control over the frame rate by chance? It's something I have not really worked out thus far because webcams very often have all kinds of automatic exposure control, which changes the shutter time etc. and makes explicit control over frame rate hard.

  • Debug compile seems to be failing because the linker cannot find pthread symbols on your machine. Why does this only occur in debug? 😩 I just don't know for the time being...

  • With respect to RAM, I suspect the following is the issue:

    (22 FPS * 5 MP * 24 bits/pixel) / ( 8 bits/ byte) = 330 MB/sec

    This (ignoring compression, which is probably offset by the time the compression itself takes...) is the write speed your hard disk must actually sustain, not just its theoretical rating, to avoid memory overflow.

    8 GB / (0.330 GB/sec) =~ 24 seconds.

    The RAM is filling because your hardware writes are not occurring fast enough. Oat pushes frames to be written into a FIFO in main memory that the recorder threads are desperately trying to write to disk. Getting more RAM will just make the process persist a bit longer before failing. I would get an SSD for streaming video to and then transfer those videos to slower long-term storage after recording. This is that 5 MP camera starting its slew of issues: your hard disk is not fast enough, your CPU might not be fast enough even if your hard disk is faster ('recording' involves a compression step, not just writing), and RAM takes a beating because of all those pixels :).
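
The arithmetic above can be written as a small sketch so it is easy to rerun with other frame rates, resolutions, or disk speeds. The function names are made up for illustration, and the model is deliberately crude: it assumes all of RAM is available as a frame buffer and that the drain rate is constant.

```cpp
#include <cassert>

// Uncompressed video data rate in bytes per second.
double raw_rate_bytes_per_sec(double fps, double pixels, double bits_per_pixel)
{
    assert(fps > 0 && pixels > 0 && bits_per_pixel > 0);
    return fps * pixels * bits_per_pixel / 8.0;
}

// Seconds until a buffer of `buffer_bytes` fills when data arrives at
// `in_rate` and is drained at `out_rate` (both in bytes per second).
// Returns a negative value if the buffer never fills.
double seconds_to_fill(double buffer_bytes, double in_rate, double out_rate)
{
    const double net = in_rate - out_rate;
    if (net <= 0.0)
        return -1.0;  // the writer keeps up, so the backlog does not grow
    return buffer_bytes / net;
}
```

With the numbers from this comment, `raw_rate_bytes_per_sec(22, 5e6, 24)` gives 330 MB/sec, and `seconds_to_fill(8e9, 330e6, 0)` gives roughly 24 seconds. Passing a nonzero `out_rate` shows how much sustained write speed is needed before the backlog stops growing.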

jonnew pushed a commit that referenced this issue May 3, 2016
- This addresses part of #14
- Linker was not being provided with a complete set of libraries during
  test compilation and this was causing linking issues in some build
  environments.
@jonnew
Owner

jonnew commented May 3, 2016

OK, I just fixed the debug linking issue in 0e67705. You should now be able to compile/link in debug mode. I have not yet looked into why the GPU support is not functional in release mode. That is next.

@mmyros
Contributor Author

mmyros commented May 3, 2016

Awesome, thanks! I should be able to test that soon. I ordered an SSD as well, so we'll see if it mitigates the issue of writing.

The framerate of the webcam is currently 15 FPS, I don't think I can influence it at all.

Off-topic question: Have you figured out a way to sync frames to open-ephys data? Thanks!

@jonnew
Owner

jonnew commented May 4, 2016

Yeah, it seems like webcams should in general be treated as asynchronous IO. Basically, they are gonna do what they're gonna do and you just have to write down the timestamps of frames. In 89498c0 I introduced a monotonic timer that measures and packs the arrival time with each frame (in microseconds) when using oat frameserve wcam.

The oat::Frame type is derived from cv::Mat, but has timing information integrated (sample number, period, and time of acquisition). This info is pretty much locked into the processing pipeline until frames are transformed to positions using oat-posidet, though. At that point, timing info can be seen by dumping positions to stdout using oat-posisock or by saving positions using oat-record. Some timing info can be written on the frame using oat-decorate, but currently the sample time is not implemented, so this is not really too helpful for post-processing.

For these reasons, it might be good to have a text file saved with each video that specifies timing information of each frame within the video. Packing them with the video seems complicated and not simple to parse. What do you think?

I wonder if we will have to save uncompressed video to ensure no frame loss. I am not sure.

Answer to off-topic question: Yes, but let's move that discussion to the OE forums so others may benefit more directly from the answer.

@jonnew
Owner

jonnew commented May 5, 2016

I have begun to understand why CUDA does not seem to be working in release mode. In fact, it is working; it's just that the context initialization for the GPU takes a huge amount of time (like 1 minute) on my machine. This is the same problem as the delay that was occurring on the first few frames in the debug version of the code, only greatly amplified. The only workaround right now is to start your processing chain about 1 minute before it's needed so that CUDA has time to create the context. Pretty bad, but I don't see an easy solution other than that for the time being...

@mmyros
Contributor Author

mmyros commented May 5, 2016

> For these reasons, it might be good to have a text file saved with each video that specifies timing information of each frame within the video. Packing them with the video seems complicated and not simple to parse. What do you think?

How about saving the timing in json along with position information? A separate file might be ok as well, but since in most cases there will be a tracking file already, why not use it at least.

As far as CUDA goes, I am not in a good position to test this right now, but I can try in a couple of weeks. It does not seem like a huge hurdle to wait a minute. I wonder if this varies with GPU.

Discussion in forums created here

@jonnew
Owner

jonnew commented May 5, 2016

  1. Timing is certainly saved with position information in the json files. I was just thinking about the case where the user is not saving position info, but just video.
  2. OK, no rush. It will probably vary with GPU, I would imagine. Yeah -- I couldn't believe it either. I've started looking into OpenGL options that are useful across GPU suppliers and don't seem to have this problem.

@mmyros
Contributor Author

mmyros commented Jul 24, 2016

I finally set up my hi-res camera on a nicer computer with 2 processors (24 threads) and 16 GB of RAM. It seems that Oat can indeed keep up with tracking, but recording video still does not work. Strangely, it records about the same amount of video to a regular hard drive and to an SSD (1m52s vs 2m06s) before all 16 GB fill up. However, all 24 threads are working at 100% capacity throughout... I wonder if they just can't keep up with the encoding.

The resulting file is 861 MB, so 861 MB / 130 s ≈ 7 MB/s, right?

Is there a way to try a less demanding codec or less compression perhaps?
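
A back-of-the-envelope check of these numbers, as a sketch only. It assumes the 330 MB/sec raw input rate estimated earlier in this thread (22 FPS, 5 MP, 24 bits/pixel) and that essentially all 16 GB acted as frame buffer, neither of which I have verified. The function names are made up for illustration.

```cpp
#include <cassert>

// Average on-disk write rate in bytes per second.
double write_rate(double file_bytes, double seconds)
{
    assert(seconds > 0);
    return file_bytes / seconds;
}

// Raw-frame rate (bytes per second) that must have been consumed from the
// buffer if `buffer_bytes` took `fill_seconds` to fill at input rate `in_rate`.
double implied_drain_rate(double in_rate, double buffer_bytes, double fill_seconds)
{
    assert(fill_seconds > 0);
    return in_rate - buffer_bytes / fill_seconds;
}
```

Under those assumptions, `write_rate(861e6, 130)` is about 6.6 MB/s, far below what either disk can sustain, and `implied_drain_rate(330e6, 16e9, 130)` is roughly 207 MB/s of raw frames, or about 14 frames per second at 15 MB per frame. That would fit with the encoder, not the disk, being the limit, and with the HDD and SSD giving nearly the same result.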

@mmyros
Contributor Author

mmyros commented Jul 29, 2016

All right, recording uncompressed video seems to work even with a HDD. I just changed this line (Recorder.cpp, line 287):

int fourcc = CV_FOURCC('H', '2', '6', '4'); // original

to

int fourcc = 0; // Uncompressed

The resulting files are huge, but hey, it works.

@mmyros mmyros closed this as completed Jul 29, 2016
@mmyros mmyros reopened this Sep 6, 2016
@mmyros
Contributor Author

mmyros commented Sep 6, 2016

Hi Jon,

Since you are refactoring things, what do you think about adding an option to change the encoding type? I've been recording non-compressed, raw video files. That works quite well and leaves processing power for real-time stuff. I just compress everything at the end of the day using ffmpeg. However, it would be nice to have the option to have some frameserves being compressed on the fly and some raw.

The compression option would just switch from

int fourcc = CV_FOURCC('H', '2', '6', '4');

in Recorder.cpp, line 287 to

int fourcc = 0; // Uncompressed
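
As a sketch of how such an option might map to the value above: the function names and the "raw" keyword below are hypothetical, not oat-record's interface. The byte packing mirrors what OpenCV's CV_FOURCC macro does, and 0 is the uncompressed value I have been using.

```cpp
#include <cstdint>
#include <stdexcept>
#include <string>

// Pack four characters the way OpenCV's CV_FOURCC macro does:
// first character in the lowest byte, last character in the highest.
int pack_fourcc(char c1, char c2, char c3, char c4)
{
    const std::uint32_t v = (static_cast<std::uint32_t>(c1) & 255u)
                          | ((static_cast<std::uint32_t>(c2) & 255u) << 8)
                          | ((static_cast<std::uint32_t>(c3) & 255u) << 16)
                          | ((static_cast<std::uint32_t>(c4) & 255u) << 24);
    return static_cast<int>(v);
}

// Hypothetical option parser: "raw" selects uncompressed output (fourcc 0);
// any other four-character string is treated as a codec code such as "H264".
int fourcc_from_option(const std::string &opt)
{
    if (opt == "raw")
        return 0;
    if (opt.size() != 4)
        throw std::invalid_argument("expected \"raw\" or a 4-character codec code");
    return pack_fourcc(opt[0], opt[1], opt[2], opt[3]);
}
```

A per-recorder option like this would let some streams be compressed on the fly while others are written raw, without recompiling.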

@jonnew
Owner

jonnew commented Sep 6, 2016

Yes! Great idea. I'm refactoring oat-record now and will be sure to do that. I'm almost done here: all but two components (oat-record and oat-posigen) to go. I hope it won't take too much longer.

@jonnew
Owner

jonnew commented Nov 7, 2016

OK, I'm closing this since oat-record now supports uncompressed video and this should solve your issue. Let me know if this is not the case and we can reopen.

@jonnew jonnew closed this as completed Nov 7, 2016